IoT Real-time ML with CDAP
In this article, we demonstrate that MQTT and CDAP are complementary technologies for building smart end-to-end IoT applications, from the edge to visualization and business intelligence, either on-premises or in the public cloud.
Why not Kafka?
Apache Kafka, with all its components and connectors, offers a truly impressive ecosystem for real-time streaming. We nevertheless decided to look for an alternative because we wanted to reduce and simplify our end-to-end technology stack, with seamless integration from data ingestion and transformation up to machine learning and deep learning.
All facets of modern data processing are covered by the same technology and the same point-and-click user experience, without writing a single line of code.
Our customers complain about a confusing and ever-growing landscape of complex data technologies and platforms. Adoption most often leads to a plethora of tools in the business, demanding many different skills.
Leveraging Google CDAP is an important step towards standardization & unification of data processing due to its integrated plugin mechanism.
We have built an eXtension Pack with 200+ data connectors and operators, and this pack turns CDAP into the ultimate Swiss Army knife for a broad spectrum of AI use cases.
This article describes a hand-picked automotive IoT use case and, along the way, introduces a small subset of the plugins in our open source eXtension Pack.
Architecture
For this sample, we simulate vehicle streaming data at scale using a Car Data Simulator.
This simulator sends vehicle sensor readings in real time to a HiveMQ broker, an open source MQTT broker that supports MQTT v3 and v5.
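As a rough illustration of what such a simulator emits, the sketch below builds a JSON sensor reading together with the MQTT topic it would be published to. The topic layout `car/<id>/sensors`, the field names and the value ranges are assumptions made for this example and not the actual format of the Car Data Simulator. The `publish` argument is a placeholder for whichever MQTT client call is used to send the message to the broker.

```python
import json
import random
import time
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical topic layout; the real simulator may use a different one.
TOPIC_TEMPLATE = "car/{car_id}/sensors"


def make_reading(car_id: str, rng: random.Random, now: float) -> Tuple[str, str]:
    """Build one (topic, JSON payload) pair for a simulated vehicle."""
    reading: Dict[str, object] = {
        "car_id": car_id,
        "timestamp_ms": int(now * 1000),
        # Field names and ranges below are illustrative assumptions.
        "speed_kmh": round(rng.uniform(0.0, 180.0), 1),
        "engine_temp_c": round(rng.uniform(70.0, 110.0), 1),
        "battery_v": round(rng.uniform(11.5, 14.5), 2),
    }
    return TOPIC_TEMPLATE.format(car_id=car_id), json.dumps(reading)


def simulate(car_ids: Iterable[str],
             publish: Callable[[str, str], None],
             seed: int = 42) -> int:
    """Emit one reading per car through the supplied publish function."""
    rng = random.Random(seed)
    sent = 0
    for car_id in car_ids:
        topic, payload = make_reading(car_id, rng, time.time())
        publish(topic, payload)
        sent += 1
    return sent


if __name__ == "__main__":
    # Print instead of publishing; pass an MQTT client's publish call
    # here to send the readings to a broker.
    simulate(["car-001", "car-002"], lambda topic, payload: print(topic, payload))
```

Keeping the transport behind a plain callable means the same generator can be pointed at a console, a test harness or a real broker without changes.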

HiveMQ Plugin
For data ingestion, the HiveMQ streaming connector is used. This connector is built on top of Apache Spark Streaming and the associated event receiver leverages the HiveMQ client to subscribe to the car sensor topic.
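To make the subscription step more concrete, here is a minimal sketch of how an MQTT topic filter selects messages, following the wildcard rules of the MQTT specification (`+` for a single level, `#` for all remaining levels). In practice the broker performs this matching. The function and the topic names are hypothetical and are not part of the HiveMQ client or of the connector.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Topics starting with "$" are not matched by filters that start
    # with a wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # The multi-level wildcard is only valid as the last level;
            # it matches the parent level and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False

    # Without "#", filter and topic must have the same depth.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    # Hypothetical topic names for the car sensor use case.
    print(topic_matches("car/+/sensors", "car/car-001/sensors"))
    print(topic_matches("car/#", "car/car-001/sensors/engine"))
    print(topic_matches("car/+/sensors", "truck/t-17/sensors"))
```

A single filter such as `car/+/sensors` would let one receiver consume the readings of every simulated vehicle.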
The HiveMQ plugin marks the starting point for data preprocessing and (analytic) model building & inference, where all tasks are mapped to and executed by appropriate plugins of the CDAP eXtension Pack.
Model Building Plugins
For model building, car sensor readings are streamed to an internal CDAP data table. This approach removes the need for an additional (external) data lake for historical sensor data, and reduces development, testing and operating costs.
Analytics models are built with integrated deep learning plugins, based on Intel’s Analytics-Zoo. These analytic plugins deploy trained models to an integrated model management component that works as a model hub for seamless model building and inference.
This approach ensures that built or trained machine learning models are directly available for CDAP streaming applications without re-development or extra costly effort for model deployment.
For real-time machine learning, two different analytic models are used:
- Anomaly detection with an Auto-encoder Neural Network as an example of unsupervised learning with Intel’s Analytics-Zoo.
- Prediction of sensor events with an LSTM Neural Network as an example of supervised learning with Intel’s Analytics-Zoo.
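The two model types can be pictured with a small data-preparation sketch. It does not use Analytics-Zoo and does not train a network. It only shows how a sensor series is typically cut into (window, next value) pairs for a sequence model such as an LSTM, and how reconstruction errors from an autoencoder can be turned into anomaly flags. The threshold rule (mean plus `k` standard deviations of the errors seen on normal data) is a common heuristic chosen for this example, not necessarily what the plugins do, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float],
                 window: int) -> List[Tuple[List[float], float]]:
    """Turn a series into (input window, next value) training pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        (list(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


def fit_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Derive an anomaly threshold from errors observed on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag every reconstruction error that exceeds the threshold."""
    return [e > threshold for e in errors]


if __name__ == "__main__":
    # Supervised case: each window of 3 readings predicts the next one.
    for inputs, target in make_windows([1.0, 2.0, 3.0, 4.0, 5.0], window=3):
        print(inputs, "->", target)

    # Unsupervised case: errors would come from a trained autoencoder;
    # the numbers here are made up for illustration.
    threshold = fit_threshold([0.10, 0.12, 0.08, 0.11, 0.09])
    print(flag_anomalies([0.11, 0.50], threshold))
```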
Model Inference Plugins
Model inference plugins reuse the preprocessing plugins used for model building and have access to trained and published models. This eliminates re-development effort and cost.
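The idea of sharing preprocessing and published models between the two phases can be sketched in a few lines. This is a toy stand-in and not the CDAP or eXtension Pack API: `ModelRegistry`, `preprocess`, `train_baseline_model` and the model name are all hypothetical, and the "model" is a trivial baseline rather than a neural network.

```python
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def preprocess(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max scale raw readings into [0, 1]; used by both phases."""
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    return [min(1.0, max(0.0, (v - lo) / span)) for v in values]


class ModelRegistry:
    """In-memory stand-in for a model hub: versioned models by name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Model]] = {}

    def publish(self, name: str, model: Model) -> int:
        versions = self._versions.setdefault(name, [])
        versions.append(model)
        return len(versions)  # 1-based version number

    def latest(self, name: str) -> Model:
        return self._versions[name][-1]


def train_baseline_model(scaled_history: Sequence[float]) -> Model:
    """Toy model: scores a window by its drift from the training mean."""
    baseline = sum(scaled_history) / len(scaled_history)

    def model(window: Sequence[float]) -> float:
        return abs(sum(window) / len(window) - baseline)

    return model


if __name__ == "__main__":
    registry = ModelRegistry()

    # Model building: preprocess historical readings, train, publish.
    history = preprocess([80.0, 90.0, 100.0], lo=60.0, hi=120.0)
    registry.publish("engine_temp_drift", train_baseline_model(history))

    # Inference: the same preprocess function, the published model.
    live = preprocess([118.0, 119.0], lo=60.0, hi=120.0)
    print(registry.latest("engine_temp_drift")(live))
```

Because inference calls the same `preprocess` function and fetches the model by name, nothing has to be re-implemented when a newly trained version is published.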
Crate DB Plugin
In this example, inferred real-time signals are sent to Crate DB for analysis, monitoring and visualization purposes. Crate DB is a next-generation SQL database for machine data at IoT scale.
Its PostgreSQL-compliant JDBC interface supports integration with tools such as Grafana for time series visualization, and Metabase or even Tableau for business intelligence tasks.
Takeaways
This article intends to share a next-generation open source approach for (Industrial) IoT business cases, based on a single platform technology.
Instead of adopting a variety of complex, individual technologies, companies can leverage a single easy-to-use technology and a variety of plugins covering a broad range of data connectors and operators.
This approach is neither restricted to data-driven vehicle use cases nor to the IoT domain.
Our next article demonstrates a similar use case for Cyber Defense: car sensor readings are replaced by readings from endpoints (via Facebook's osquery) and network traffic monitors (via Zeek). The common setup for model building & inference, however, remains the same.
Cyber Defense, E-Commerce, Internet of Things, Marketing, you name it: it is just a matter of choosing the right CDAP plugins, and 200+ plugins provide a solid and robust foundation for many use cases.
Originally published at https://www.linkedin.com.