Course: Big Data Storage and Tools

Course title: Big Data Storage and Tools
Course code: KI/EBIG
Organizational form of instruction: Lecture + Lesson
Level of course: Master
Year of study: not specified
Semester: Winter
Number of ECTS credits: 7
Language of instruction: English
Status of course: unspecified
Form of instruction: Face-to-face
Work placements: This is not an internship
Recommended optional programme components: None
Course availability: The course is available to visiting students
Lecturer(s)
  • Fišer Jiří, Mgr. Ph.D.
  • Kubera Petr, RNDr. Ph.D.
Course content
1. Virtualization principles and tool overview
2. Creating unified environments (application isolation): Docker, CoreOS, rkt
3. Cluster architecture: an overview of Hadoop
4. Storage: HDFS, HiveQL
5. MapReduce framework (principles)
6. Spark and its architecture
7. Spark modules: MLlib (machine learning), GraphX, Spark Streaming (data streaming)
8-9. NoSQL databases and big data (MongoDB, Neo4j, Caché)

Learning activities and teaching methods
unspecified
Learning outcomes
The course focuses on processing large and rapidly growing volumes of data using Hadoop technology or certain types of NoSQL databases. The lectures cover the basic principles of distributed storage and distributed data processing, while the exercises focus on implementing sample examples. The introductory lectures are devoted to installing the individual software components and making them work together using virtualized containers.

Prerequisites
Programming (Python, C, or MATLAB); relational databases

Assessment methods and criteria
unspecified
Preparation and oral defense of a seminar program that processes big data in a distributed system; verification of general factual knowledge
Recommended literature

