Data analytics and machine learning with Spark

Krishna Kumar, Department of Engineering, University of Cambridge

Course description

In this course, we will cover the basics of using Spark for data analytics. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are typically more concise than equivalent Hadoop MapReduce jobs and can run 10-100 times faster, particularly when the working data fits in memory.

The course teaches the basics of working with Spark through PySpark, its Python API, and gives you the foundation for diving deeper into Spark. You will learn about Spark’s architecture and programming model, including commonly used APIs. After completing the course, you will be able to write and debug basic Spark applications. The focus is on Spark Core, Spark SQL, and Spark MLlib. The course also covers real-time data streaming and processing, as well as data visualisation techniques.


Prerequisites

  • Knowledge of the Unix/Linux command line and SSH

  • Python programming
