The justice system, banks, and private companies use algorithms to make decisions that have profound impacts on people’s lives. Unfortunately, those algorithms are sometimes biased — disproportionately impacting people of colour as well as individuals in lower-income classes when they apply for loans or jobs, or even when courts decide what bail should be set while a person awaits trial.
U.S. researchers have developed a new artificial intelligence (AI) programming language that can assess the fairness of algorithms more exactly, and more quickly, than available alternatives. Their Sum-Product Probabilistic Language (SPPL) is a probabilistic programming system.
Probabilistic programming is an emerging field at the intersection of programming languages and artificial intelligence that aims to make AI systems much easier to develop, with early successes in computer vision, common-sense data cleaning, and automated data modelling. Probabilistic programming languages make it much easier for programmers to define probabilistic models and carry out probabilistic inference — that is, work backwards to infer probable explanations for observed data.
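The idea of "working backwards" from observed data can be made concrete with a tiny example. The sketch below is illustrative only (it is plain Python, not SPPL's API): a two-variable generative model is defined, and the posterior probability of a hidden cause given an observation is computed exactly by enumerating both possible worlds.

```python
from fractions import Fraction

# Toy generative model: it rains with probability 0.2; the grass is wet
# with probability 0.9 if it rained and 0.1 otherwise.
P_RAIN = Fraction(2, 10)
P_WET_GIVEN_RAIN = Fraction(9, 10)
P_WET_GIVEN_DRY = Fraction(1, 10)

def posterior_rain_given_wet():
    """Exact Bayesian inference: P(rain | wet) by enumeration."""
    joint_rain_wet = P_RAIN * P_WET_GIVEN_RAIN        # P(rain, wet)
    joint_dry_wet = (1 - P_RAIN) * P_WET_GIVEN_DRY    # P(no rain, wet)
    return joint_rain_wet / (joint_rain_wet + joint_dry_wet)

print(posterior_rain_given_wet())  # -> 9/13
```

Because the model is enumerated exhaustively and arithmetic is done with exact fractions, the answer carries no approximation error, which is the property SPPL guarantees for its restricted class of programmes.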
According to one of the researchers, SPPL is not the first system able to answer such fairness questions. But because it is specialised and optimised for a certain class of models, it can deliver solutions up to 3,000 times faster than previous approaches.
SPPL gives fast, exact solutions to probabilistic inference questions. These inference results are based on SPPL programmes that encode probabilistic models of what kinds of applicants are likely, a priori, and also how to classify them. Fairness questions that SPPL can answer include "Is there a difference between the probability of recommending a loan to an immigrant and a non-immigrant applicant with the same socioeconomic status?" or "What's the probability of a hire, given that the candidate is qualified for the job and from an underrepresented group?".
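A fairness query of this kind can be sketched in plain Python (all names here are hypothetical illustrations, not SPPL code): a prior over applicant features plus a deliberately biased toy decision rule, with the conditional approval probabilities computed exactly by enumeration.

```python
from fractions import Fraction

# Hypothetical prior over applicant features: income bracket is
# independent of immigrant status.
P_HIGH_INCOME = Fraction(4, 10)

def classifier(immigrant, high_income):
    # Toy decision tree: approve all high earners; among low earners,
    # approve only non-immigrants (a biased rule, for illustration).
    if high_income:
        return True
    return not immigrant

def p_approve_given(immigrant):
    """Exact P(approve | immigrant status), enumerating income brackets."""
    total = Fraction(0)
    for high_income in (True, False):
        p_income = P_HIGH_INCOME if high_income else 1 - P_HIGH_INCOME
        if classifier(immigrant, high_income):
            total += p_income
    return total

print(p_approve_given(True), p_approve_given(False))  # -> 2/5 1
```

The gap between the two conditional probabilities (2/5 for immigrants versus 1 for non-immigrants) exposes the bias in the decision rule; this is exactly the kind of disparity an SPPL fairness query quantifies.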
SPPL differs from most probabilistic programming languages in that it only allows users to write probabilistic programmes for which it can automatically deliver exact probabilistic inference results. It also makes it possible for users to check how fast inference will be, and therefore avoid writing slow programmes.
In contrast, other probabilistic programming languages allow users to write down probabilistic programmes where the only known ways to do inference are approximate — that is, the results include errors whose nature and magnitude can be hard to characterise.
Error from approximate probabilistic inference is tolerable in many AI applications. But it is undesirable to have inference errors corrupting results in socially impactful applications of AI, such as automated decision-making, and especially in fairness analysis.
SPPL avoids errors by restricting to a carefully designed class of models that still includes a broad class of AI algorithms, including the decision tree classifiers that are widely used for algorithmic decision-making. SPPL works by compiling probabilistic programmes into a specialised data structure called a “sum-product expression.” SPPL further builds on the emerging theme of using probabilistic circuits as a representation that enables efficient probabilistic inference.
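The structure of a sum-product expression can be sketched in a few lines of Python. This is a simplified illustration of the probabilistic-circuit idea, not SPPL's internal representation: sum nodes encode weighted mixtures, product nodes encode independence, and each leaf reports the probability that its variable satisfies the query event, so a marginal query reduces to evaluating the expression bottom-up.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    # Probability that this leaf's variable lies in the query event.
    p: float
    def prob(self):
        return self.p

@dataclass
class Sum:
    # Weighted mixture of sub-circuits: sum of weight * child probability.
    parts: list = field(default_factory=list)  # list of (weight, child)
    def prob(self):
        return sum(w * c.prob() for w, c in self.parts)

@dataclass
class Product:
    # Independent sub-circuits: probabilities multiply.
    children: list = field(default_factory=list)
    def prob(self):
        out = 1.0
        for c in self.children:
            out *= c.prob()
        return out

# Two-component mixture over independent variables X and Y:
# 0.7 * [P_A(X in E)=0.5, P_A(Y in F)=0.4]
# + 0.3 * [P_B(X in E)=0.2, P_B(Y in F)=0.9]
circuit = Sum([(0.7, Product([Leaf(0.5), Leaf(0.4)])),
               (0.3, Product([Leaf(0.2), Leaf(0.9)]))])

print(circuit.prob())  # 0.7*0.5*0.4 + 0.3*0.2*0.9 ≈ 0.194
```

Because evaluation is a single pass over the circuit, inference cost is linear in the size of the compiled expression, which is why SPPL can answer queries so much faster than approximate samplers.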
This approach extends prior work on sum-product networks to models and queries expressed via a probabilistic programming language, but it comes with limitations. SPPL is substantially faster for analysing the fairness of a decision tree, for example, yet it cannot analyse models like neural networks. Other systems can analyse both neural networks and decision trees, but they tend to be slower and give inexact answers.
SPPL shows that exact probabilistic inference is practical, not just theoretically possible, for a broad class of probabilistic programmes. The researchers have been applying SPPL to probabilistic programmes learned from real-world databases, to quantify the probability of rare events, generate synthetic proxy data given constraints, and automatically screen data for probable anomalies.