Awesome Hadoop 
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources. Inspired by Awesome PHP, Awesome Python and Awesome Sysadmin.
- Awesome Hadoop
- Hadoop
- YARN
- NoSQL
- SQL on Hadoop
- Data Management
- Workflow, Lifecycle and Governance
- Data Ingestion and Integration
- DSL
- Libraries and Tools
- Realtime Data Processing
- Distributed Computing and Programming
- Packaging, Provisioning and Monitoring
- Monitoring
- Search
- Security
- Benchmark
- Machine learning and Big Data analytics
- Misc.
- Resources
- Other Awesome Lists
Hadoop
- Apache Hadoop - Open-source framework for distributed storage and processing of large datasets across clusters of computers
- Apache Hadoop Ozone - An Object Store for Apache Hadoop
- Apache Tez - A framework for YARN-based data processing applications in Hadoop
- SpatialHadoop - SpatialHadoop is a MapReduce extension to Apache Hadoop designed specifically to work with spatial data.
- GIS Tools for Hadoop - Big Data Spatial Analytics for the Hadoop Framework
- Elasticsearch Hadoop - Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive and Apache Pig.
- hadoopy - Python MapReduce library written in Cython.
- mrjob - mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.
- pydoop - Pydoop is a package that provides a Python API for Hadoop.
- hdfs-du - HDFS-DU is an interactive visualization of the Hadoop distributed file system.
- White Elephant - Hadoop log aggregator and dashboard
- Genie - Genie provides REST-ful APIs to run Hadoop, Hive and Pig jobs, and to manage multiple Hadoop resources and perform job submissions across them.
- Apache Kylin - Apache Kylin is an open source distributed analytics engine from eBay Inc. that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
- Crunch - Go-based toolkit for ETL and feature extraction on Hadoop
- Apache Ignite - Distributed in-memory platform
YARN
- Apache Slider - Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications onto a YARN cluster.
- Apache Twill - Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their application logic.
- mpich2-yarn - Running MPICH2 on YARN
NoSQL
Next-generation databases that mostly address some of the following points: being non-relational, distributed, open-source and horizontally scalable.
- Apache HBase - The Hadoop database: a distributed, scalable big data store
- Apache Phoenix - A SQL skin over HBase supporting secondary indices
- happybase - A developer-friendly Python library to interact with Apache HBase.
- Hannibal - Hannibal is a tool to help monitor and maintain HBase clusters that are configured for manual splitting.
- Haeinsa - Haeinsa is a linearly scalable multi-row, multi-table transaction library for HBase
- hindex - Secondary Index for HBase
- Apache Accumulo - The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system.
- OpenTSDB - The Scalable Time Series Database
- Apache Cassandra