FID3019 Advanced course in Data-Intensive Computing
KTH Royal Institute of Technology
Enrolled as a doctoral student.
Topics:
• Distributed file systems
• NoSQL databases
• Scalable messaging systems
• Big Data execution engines: MapReduce, Spark
• High-level queries and interactive processing: Hive and Spark SQL
• Stream processing
• Graph processing
• Scalable machine learning
• Resource management
The course complements distributed systems courses, with a focus on processing, storing and analyzing massive data. It prepares the students for Ph.D. studies in the area of data-intensive computing systems.
The main objective of this course is to provide the students with a solid foundation for understanding large-scale distributed systems used for storing and processing massive data.
More specifically, after completing the course the student will be able to:
• Explain the architecture and properties of the computer systems needed to store, search and index large volumes of data.
• Describe the different computational models for processing large data sets for data at rest (batch processing) and data in motion (stream processing).
• Use various computational engines to design and implement nontrivial analytics on massive data.
• Explain the different models for scheduling computational tasks and allocating resources to them on large computing clusters.
• Elaborate on the tradeoffs when designing efficient algorithms for processing massive data in a distributed computing setting.