Fast data processing pipeline for predicting flight delays using Apache APIs: Kafka, Spark ML, Drill, with MapR-DB JSON
AI & Deep/Machine Learning | Deep-dive conference | Beginner | EN

The ability to blend machine learning with real-time transactional data flowing through a single platform opens up new possibilities, such as enabling organizations to act on opportunities as they arise. Acting on these opportunities requires fast, scalable data processing pipelines that process, analyze, and store events as they arrive.

In this deep dive, we will look at the architecture of a data pipeline that combines streaming data with machine learning to predict flight delays. You will see the end-to-end process required to build this application using Apache APIs for Kafka, Spark, Drill, and other technologies:

Part 1: Using Apache Spark Machine Learning to build a model to predict flight delays.

Part 2: Kafka and Spark Streaming: using the ML model with streaming data to do real-time analysis of flight delays.

Part 3: Spark Streaming and fast storage with MapR-DB JSON.

Part 4: Analysis of flight delay data and predictions stored in MapR-DB with Apache Spark, Apache Drill, and OJAI.

The session will consist of lecture and demo. Code and a developer container will be provided for download so that developers can try out the code on their own after the lecture or at home.

Deep-Dive Conference 2 [Amphi 139]
16 May 2018
14:00 - 17:00