Natural Language Generation
Natural Language Generation (NLG) is a broad domain with applications in chatbots, story generation, and data descriptions. A wide spectrum of technologies addresses parts or the whole of the NLG process. This list aims to represent this diversity of NLG applications and techniques by providing links to various projects, tools, research papers, and learning materials.
Contents
- Datasets
- Dialog
- Evaluation
- Grammar
- Libraries
- Narrative Generation
- Neural Natural Language Generation
- Papers and Articles
- Products
- Realizers
- Templating Languages
- Videos
Datasets
- Alex Context NLG Dataset - A dataset for NLG in dialogue systems in the public transport information domain.
- Box-score data - This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.
- E2E - This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data.
- Neural-Wikipedian - The repository contains the code along with the required corpora that were used in order to build a system that "learns" how to generate English biographies for Semantic Web triples.
- WeatherGov - Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.
- WebNLG - An enriched version of the WebNLG corpus, a resource for evaluating common NLG tasks, including discourse ordering, lexicalization, and referring expression generation.
- WikiBio - A Wikipedia biography dataset that gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms.
- The Schema-Guided Dialogue Dataset - The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant.
- The Wikipedia company corpus - Company descriptions collected from Wikipedia. The dataset contains semantic representations along with short and long descriptions for 51K companies in English.
- YelpNLG - YelpNLG provides resources for natural language generation of restaurant reviews.
Dialog
- Chatito - Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
- NNDIAL - NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models.
- Plato - This is the Plato Research Dialogue System, a flexible platform for developing conversational AI agents.
- RNNLG - RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains.
- TGen - Statistical NLG for spoken dialogue systems.
Evaluation
- BLEURT - A transfer learning-based metric for Natural Language Generation.
- compare-mt - A tool for holistic analysis of language generation systems.
- GEM - A benchmark environment for NLG with a focus on its Evaluation, both through human annotations and automated Metrics.
- NLG-eval - Evaluation code for various unsupervised automated metrics for Natural Language Generation.
- VizSeq - A Visual Analysis Toolkit for Text Generation Tasks.
Grammar
- OpenCCG - A library for parsing and realization with CCG.
- GrammaticalFramework - A programming language for multilingual grammar applications.
- EasyCCG - A CCG parser created by Mike Lewis.
- CCG Lab - All combinators, common grammar format, parsing to logical form, parameter estimation for probabilistic CCG.
- CCGweb - A Web platform for parsing and annotation.