Tag: data-engineering

Filtered selection of tools tagged data-engineering.

Apache Spark

Apache Spark is a strong fit when a team needs distributed processing of large datasets and ML workloads on a recurring basis, not as a one-off experiment. It is especially relevant for data platforms with large data volumes and well-defined pipelines. The key question is whether the team, cluster operations, and data model actually suit Spark in practice.

AI Open Source