Tag: big-data

Filtered selection of tools tagged big-data.

Apache Flink

Apache Flink is an open-source platform for low-latency, stateful stream processing in real time, with support for batch workloads, fault tolerance, and multiple APIs.

AI Open Source

Apache Spark

Apache Spark is a strong fit when a team runs distributed processing of large datasets and ML workloads on an ongoing basis rather than as a one-off experiment. It is especially relevant for data platforms with large volumes and clearly defined pipelines. The key question is whether the team, cluster operations, and data model actually suit Spark in practice.

AI Open Source

Hadoop MapReduce

Hadoop MapReduce is a data and automation tool for classic distributed batch processing of large datasets in the Hadoop ecosystem.

AI Open Source