Awesome Streaming

A curated list of awesome streaming (stream processing) frameworks, applications, readings and other resources. Inspired by other awesome projects.
Website
https://manuzhang.github.io/awesome-streaming/ is a more dynamic website where you can find updates on the awesome projects listed here.
Table of Contents
- Streaming Engine
- Streaming Library
- Streaming Application
- IoT
- DSL
- Data Pipeline
- Online Machine Learning
- Streaming SQL
- Toolkit
- Benchmark
- Closed Source
- Readings
Streaming Engine
- Apache Apex [Java] - unified platform for big data stream and batch processing.
- Apache Ballista [Rust] - distributed compute platform powered by Apache Arrow.
- Apache Flink [Java] - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
- Apache Heron (incubating) [Java] - a realtime, distributed, fault-tolerant stream processing engine from Twitter.
- Apache Samza [Scala/Java] - distributed stream processing framework built on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
- Apache Spark Streaming [Scala] - makes it easy to build scalable fault-tolerant streaming applications.
- Apache Storm [Clojure/Java] - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
- ArkFlow [Rust] - High-performance stream processing engine supporting multiple input/output sources and processors.
- Arroyo [Rust] - a distributed stream processing engine. Supports SQL and Rust pipelines. Scales up to millions of events per second. Supports stateful operations like windows and joins, with state checkpointing for fault tolerance and pipeline recovery. Uses the Timely Dataflow model.
- AthenaX [Java] - Uber's stream analytics framework, used in production.
- Bytewax [Python] - data parallel, distributed, stateful stream processing framework.
- CocoIndex [Rust/Python] - ETL framework for building fresh indexes for AI, with real-time incremental updates.
- Faust [Python] - stream processing library, porting the ideas from Kafka Streams to Python.
- Gearpump [Scala] - lightweight real-time distributed streaming engine built on Akka.
- Hazelcast Jet [Java] - A general-purpose distributed data processing engine, built on top of Hazelcast.
- hailstorm [Haskell] - distributed stream processing with exactly-once semantics based on Storm.
- Maki Nage [Python] - A stream processing framework for data scientists, based on Kafka and ReactiveX.
- mantis [Java] - Netflix's platform to build an ecosystem of realtime stream processing applications.
- mupd8 (muppet) [Scala/Java] - MapReduce-style framework for processing fast/streaming data.
- NebulaStream [C++] - High-performance, general-purpose, end-to-end data-management system for cloud-edge-sensor environments.
- Numaflow [Java/Python/Go/Rust] - Kubernetes-native stream processing platform with a language-agnostic framework. Scalable and cost-efficient.
- Onyx [Clojure] - Distributed, masterless, high performance, fault tolerant data processing.
- Pathway [Python] - High-performance data processing engine supporting unified workflows for batch, streaming data, and LLM applications.
- s4 [Java] - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.