Awesome Audit Algorithms

A curated list of algorithms for auditing black-box algorithms. Nowadays, many algorithms (recommendation, scoring, classification) are operated by third-party providers, without users or institutions having any insight into how they operate on their data. The audit algorithms in this list apply to this setting, known as the "black-box" setup, in which an auditor wants to gain some insight into these remote algorithms.

A user queries a remote algorithm (e.g., through available APIs) to infer information about that algorithm.
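As a rough illustration of this setup, the sketch below audits a simulated remote model using query access only. It is not taken from any paper in this list: `remote_model` is a hypothetical stand-in for a provider's API, the profile fields (`group`, `income`) are invented for the example, and the audited property (the gap in acceptance rates between two groups) is just one possible audit target.

```python
import random


def remote_model(profile):
    """Hypothetical stand-in for a remote API; the auditor cannot inspect it."""
    # Assumption for illustration: the hidden scoring rule favors group "A".
    bonus = 0.2 if profile["group"] == "A" else 0.0
    score = 0.8 * profile["income"] + bonus
    return 1 if score >= 0.5 else 0


def audit_demographic_parity(query, n_queries=2000, seed=0):
    """Estimate the acceptance-rate gap between groups using queries only."""
    rng = random.Random(seed)
    asked = {"A": 0, "B": 0}
    accepted = {"A": 0, "B": 0}
    for i in range(n_queries):
        # Alternate groups so both receive the same number of queries.
        group = "A" if i % 2 == 0 else "B"
        profile = {"group": group, "income": rng.random()}
        asked[group] += 1
        accepted[group] += query(profile)
    rates = {g: accepted[g] / asked[g] for g in asked}
    return rates, abs(rates["A"] - rates["B"])


rates, gap = audit_demographic_parity(remote_model)
print("acceptance rates:", rates)
print("estimated demographic parity gap:", round(gap, 3))
```

The auditor only ever calls `query`, so the same function could wrap a real HTTP client instead of the simulated model. Real audits must also account for query budgets and for platforms that answer auditors differently from ordinary users, which several papers below study.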

Contents

Papers

Related Events (conferences/workshops)

Papers

2025

Auditing Pay-Per-Token in Large Language Models - (arXiv) Develops an auditing framework based on martingale theory that enables a trusted third-party auditor who sequentially queries a provider to detect token misreporting.

P2NIA: Privacy-Preserving Non-Iterative Auditing - (ECAI) Proposes a mutually beneficial collaboration for both the auditor and the platform: a privacy-preserving and non-iterative audit scheme that enhances fairness assessments using synthetic or local data, avoiding the challenges associated with traditional API-based audits.

The Fair Game: Auditing & debiasing AI algorithms overtime - (Cambridge Forum on AI: Law and Governance) Aims to simulate the evolution of ethical and legal frameworks in society by creating an auditor that sends feedback to a debiasing algorithm deployed around an ML system.

Robust ML Auditing using Prior Knowledge - (ICML) Formally establishes the conditions under which an auditor can prevent audit manipulations using prior knowledge about the ground truth.

CALM: Curiosity-Driven Auditing for Large Language Models - (AAAI) Frames auditing as a black-box optimization problem where the goal is to automatically uncover input-output pairs of the target LLMs that exhibit illegal, immoral, or unsafe behaviors.

Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes - (AAAI) Divides model fingerprinting into three core components, to identify ∼100 previously unexplored combinations of these and gain insights into their performance.

2024

Hardware and software platform inference - (arXiv) A method for identifying the underlying GPU architecture and software stack of a black-box machine learning model solely based on its input-output behavior.

Auditing Local Explanations is Hard - (NeurIPS) Gives the (prohibitive) query complexity of auditing explanations.

LLMs hallucinate graphs too: a structural perspective - (Complex Networks) Queries LLMs for known graphs and studies topological hallucinations. Proposes a structural hallucination rank.

Fairness Auditing with Multi-Agent Collaboration - (ECAI) Considers multiple agents working together, each auditing the same platform for different tasks.

Mapping the Field of Algorithm Auditing: A Systematic Literature Review Identifying Research Trends, Linguistic and Geographical Disparities - (arXiv) Systematic review of algorithm auditing studies and identification of trends in their methodological approaches.

FairProof: Confidential and Certifiable Fairness for Neural Networks - (arXiv) Proposes an alternative paradigm to traditional auditing using cryptographic tools like Zero-Knowledge Proofs; gives a system called FairProof for verifying fairness of small neural networks.

Under manipulations, are some AI models harder to audit? - (SaTML) Relates the difficulty of black-box audits to the capacity of the targeted models, using the Rademacher complexity.

Improved Membership Inference Attacks Against Language Classification Models - (ICLR) Presents a framework for running membership inference attacks against classifiers, in audit mode.

Auditing Fairness by Betting - (NeurIPS) [Code] Sequential methods that allow for the continuous monitoring of incoming data from a black-box classifier or regressor.

2023

Privacy Auditing with One (1) Training Run - (NeurIPS - best paper) A scheme for auditing differentially private machine learning systems with a single training run.
