Lecturers
|
-
Fišer Jiří, Mgr. Ph.D.
-
Kubera Petr, RNDr. Ph.D.
|
Course content
|
1. Virtualization principles and tool overview
2. Creating unified environments (application isolation): Docker, CoreOS, rkt
3. Cluster architecture: an overview of Hadoop
4. Storage: HDFS, HiveQL
5. MapReduce framework (principles)
6. Spark and its architecture
7. Spark modules: MLlib (machine learning), GraphX, Spark Streaming (data streaming)
8. -
9. NoSQL databases and big data (MongoDB, Neo4j, Caché)
|
Learning activities and teaching methods
|
Not specified
|
Learning outcomes
|
The course focuses on processing large, rapidly growing volumes of data using Hadoop technology or certain types of NoSQL databases. The lectures cover the basic principles of distributed storage and distributed data processing, while the exercises focus on implementing sample examples. The introductory lectures are devoted to installing the individual software components and making them work together using virtualized containers.
|
Prerequisites
|
Programming (Python, C, or MATLAB), relational databases
|
Assessment methods and criteria
|
Not specified
Preparation and oral defense of a seminar program that processes big data in a distributed system; verification of general factual knowledge
|
Recommended literature
|
|