What is BioIE? It includes any effort to extract structured information from unstructured (or at least inconsistently structured) biological, clinical, or other biomedical data. The data source is often some collection of text documents written in technical language. If the resulting information is verifiable and consistent across sources, we may then consider it knowledge. Extracting information and producing knowledge from biomedical data requires adapting methods developed for other types of unstructured data.
BioIE has undergone massive changes since the introduction of language models like BERT and, more recently, Large Language Models (LLMs; e.g., GPT-3/4, Llama 2/3, and Gemini).
Resources included here are preferentially those available at no monetary cost and with limited license requirements. Methods and datasets should be publicly accessible and actively maintained.
See also awesome-nlp, awesome-biology and Awesome-Bioinformatics.
Please read the contribution guidelines before contributing. Please add your favourite resource by raising a pull request.
Contents
- Research Overviews
- Groups Active in the Field
- Organizations
- Journals and Events
  - Journals
  - Conferences and Other Events
  - Challenges
- Tutorials
  - Guides
  - Video Lectures and Online Courses
- Code Libraries
  - Repos for Specific Datasets
- Tools, Platforms, and Services
  - Annotation Tools
- Techniques and Models
- Datasets
  - Biomedical Text Sources
  - Annotated Text Data
  - Protein-protein Interaction Annotated Corpora
  - Other Datasets
- Ontologies and Controlled Vocabularies
- Data Models
- Credits
Research Overviews
LLMs in Biomedical IE
- Large language models in healthcare: A comprehensive benchmark - a statistical and human evaluation of sixteen different LLMs applied to medical language tasks.
- Assessing the research landscape and clinical utility of large language models: a scoping review - a high-level review of LLM applications in medicine as of March 2024.
- Ethical and regulatory challenges of large language models in medicine - a review of ethical issues arising from applications of LLMs in biomedicine.
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 - a frequently cited and still relevant work concerning the roles, applications, and risks of language models.
Pre-LLM Overviews
- Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine - An overview of how BioIE and bioinformatics workflows can be applied to questions in cardiovascular health and medicine research.
- Clinical information extraction applications: A literature review - A review of clinical IE papers published as of September 2016. From the Mayo Clinic group (see below).
- Literature Based Discovery: Models, methods, and trends - A review of Literature Based Discovery (LBD), or the philosophy that meaningful connections may be found between seemingly unrelated pieces of scientific literature.
- For some historical context on LBD, see papers by University of Chicago's Don Swanson and Neil Smalheiser, including Undiscovered Public Knowledge (paywalled) and Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery.
- Mining Electronic Health Records (EHRs): A Survey - A review of the methods and philosophy behind mining electronic health records, including using them for adverse event detection. See Table 2 for a list of relevant papers as of mid-2017.