Lecturer(s)
- Fišer Jiří, Mgr. Ph.D.
- Kubera Petr, RNDr. Ph.D.

Course content
1. Virtualization principles and an overview of tools
2. Creating unified environments (application isolation): Docker, CoreOS, rkt
3. Cluster architecture: an overview of Hadoop
4. Storage: HDFS; HiveQL
5. Principles of the MapReduce framework
6. Spark and its architecture
7. Spark modules: MLlib (machine learning), GraphX (graph processing), Spark Streaming (data streaming)
8. -
9. NoSQL databases and big data (MongoDB, Neo4j, Caché)

Learning activities and teaching methods
unspecified

Learning outcomes
The course focuses on processing large and rapidly growing volumes of data using Hadoop technologies or selected types of NoSQL databases. The lectures cover the basic principles of distributed storage and distributed data processing, while the exercises focus on implementing worked examples. The introductory lectures cover installing the individual software components and getting them to work together in virtualized containers.

Prerequisites
Programming (Python, C, or MATLAB); relational databases.

Assessment methods and criteria
unspecified
Preparation and oral defense of a seminar program that processes big data in a distributed system; verification of general factual knowledge.

Recommended literature