Awesome Data Engineering 
A curated list of awesome things related to Data Engineering.
Contents
- Databases
- Data Comparison
- Data Ingestion
- File System
- Serialization format
- Stream Processing
- Batch Processing
- Charts and Dashboards
- Workflow
- Data Lake Management
- ELK Elastic Logstash Kibana
- Docker
- Datasets
- Realtime
- Data Dumps
- Monitoring
- Prometheus
- Profiling
- Data Profiler
- Testing
- Community
- Forums
- Conferences
- Podcasts
- Books
Databases
- Relational
- RQLite - Replicated SQLite using the Raft consensus protocol.
- MySQL - The world's most popular open source database.
- TiDB - A distributed NewSQL database compatible with the MySQL protocol.
- Percona XtraBackup - A free, open source, complete online backup solution for all versions of Percona Server, MySQL® and MariaDB®.
- mysql_utils - Pinterest MySQL Management Tools.
- MariaDB - An enhanced, drop-in replacement for MySQL.
- PostgreSQL - The world's most advanced open source database.
- Amazon RDS - Makes it easy to set up, operate, and scale a relational database in the cloud.
- Crate.IO - Scalable SQL database with NoSQL goodies.
- Key-Value
- Redis - An open source, BSD licensed, advanced key-value cache and store.
- Riak - A distributed database designed to deliver maximum data availability by distributing data across multiple servers.
- AWS DynamoDB - A fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
- HyperDex - A scalable, searchable key-value store. Deprecated.
- SSDB - A high performance NoSQL database supporting many data structures, an alternative to Redis.
- Kyoto Tycoon - A lightweight network server on top of the Kyoto Cabinet key-value database, built for high performance and concurrency.
- IonDB - A key-value store for microcontroller and IoT applications.
- Column
- Cassandra - A distributed database for when you need scalability and high availability without compromising performance.
- Cassandra Calculator - A simple form for trying out different values for an Apache Cassandra cluster and seeing their impact on your application.
- CCM - A script to easily create and destroy an Apache Cassandra cluster on localhost.
- ScyllaDB - NoSQL data store built on the Seastar framework, compatible with Apache Cassandra.
- HBase - The Hadoop database, a distributed, scalable, big data store.
- AWS Redshift - A fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.
- FiloDB - Distributed. Columnar. Versioned. Streaming. SQL.
- Vertica - A distributed, MPP columnar database with extensive analytic SQL support.
- ClickHouse - A distributed columnar DBMS for OLAP with SQL support.
- Document
- MongoDB - An open-source document database designed for ease of development and scaling.
- Percona Server for MongoDB - A free, enhanced, fully compatible, open source, drop-in replacement for the MongoDB® Community Edition that includes enterprise-grade features and functionality.
- MemDB - Distributed Transactional In-Memory Database (based on MongoDB).
- Elasticsearch - Search & Analyze Data in Real Time.
- Couchbase - The highest performing NoSQL distributed database.
- RethinkDB - The open-source database for the realtime web.