# Introduction

## Data analytics and machine learning with Spark

> Krishna Kumar, Department of Engineering, University of Cambridge

[![License](https://img.shields.io/badge/license-cc--by--nc--4.0-brightgreen.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

![Build Status](https://travis-ci.org/kks32-courses/data-analytics.svg?branch=master)

[![Read on GitBook](https://img.shields.io/badge/read--at-gitbooks-brightgreen.svg)](https://www.gitbook.com/read/book/kks32-courses/data-analytics)

## Course description

In this course, we will first cover the basics of using Spark for data analytics. Spark has become one of the most widely used compute engines for big data. Spark programs are typically more concise than equivalent Hadoop MapReduce jobs and, for workloads that benefit from in-memory processing, can run 10 to 100 times faster.

This course will teach you how to work with Spark through its Python API (PySpark) and will give you the foundation needed to dive deeper into Spark. You will learn about Spark’s architecture and programming model, including its most commonly used APIs. By the end of the course, you will be able to write and debug basic Spark applications. The course focuses on Spark Core, Spark SQL, and Spark MLlib, and also covers real-time data streaming and processing as well as data visualisation techniques.

### Prerequisites

* Familiarity with the Unix/Linux command line and SSH
* Python programming
