Course title Big Data Storage and Tools
Course code KI/EBIG
Organizational form of instruction Lecture + Lesson
Level of course Master
Year of study not specified
Semester Winter
Number of ECTS credits 7
Language of instruction English
Status of course unspecified
Form of instruction Face-to-face
Work placements This is not an internship
Recommended optional programme components None
Course availability The course is available to visiting students
  • Fišer Jiří, Mgr. Ph.D.
  • Kubera Petr, RNDr. Ph.D.
Course content
1. Virtualization principles and tool overview
2. Creating unified environments (application isolation): Docker, CoreOS, rkt
3. Cluster architecture: an overview of Hadoop
4. Storage: HDFS, HiveQL
5. MapReduce framework (principles)
6. Spark and its architecture
7. Spark modules: MLlib (machine learning), GraphX, Spark Streaming (data streaming)
8.-9. NoSQL databases and big data (MongoDB, Neo4j, Caché)

Learning activities and teaching methods
Learning outcomes
The course focuses on processing large and rapidly growing volumes of data using Hadoop technology or selected types of NoSQL databases. The lectures cover the basic principles of distributed storage and distributed data processing, while the exercises focus on implementing worked examples. The introductory lectures are dedicated to installing the individual software components and making them work together using virtualized containers.


Assessment methods and criteria
Exam: preparation and oral defense of a seminar program that processes big data in a distributed system; verification of general factual knowledge.
Prerequisites: programming (Python, C, or Matlab), relational databases.
Recommended literature

