by Clare Joy, Strategy & Expansion Lead at Onfido
When developing new technologies, we must ensure that they operate fairly. At a time when identity is increasingly the key to digital access, any technology built on identity must work fairly and equally for everyone, regardless of race, age, gender or any other characteristic of human physical diversity. While digital services have proliferated across many industries, the issue is particularly acute in the financial sector, as Covid-19 accelerates the shift towards automated, remotely delivered platforms from banks and other providers – and biases in AI can have stark consequences, unfairly favouring certain groups over others.
How does AI bias creep into machine learning models?
Algorithmic decision making relies on machine learning techniques that recognise patterns in historical data. While often successful, this becomes a significant risk when those patterns reflect biases in the data – and such biases can emerge in two scenarios. First, a standard machine learning model can absorb the biases present in its training data, so that its subsequent predictions reproduce those biases.
Second, even where the data is not itself biased, there may simply be less of it available for a minority group. With less data to learn from, especially for modern machine learning techniques, the model is more likely to be inaccurate for that group.
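To make this concrete, here is a minimal sketch (using a synthetic dataset and made-up group labels – not real identity data or Onfido's models) of how under-representation alone can hurt a group: a standard classifier is trained on imbalanced data and its accuracy is then broken down per group.

```python
# Sketch: how under-representation alone can degrade accuracy for a minority group.
# Synthetic data and group labels are illustrative assumptions, not real identity data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_group(n, shift):
    """Generate features for one group; each group's label depends on the features differently."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + shift * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# Majority group: 10,000 examples; minority group: 300 examples with a different pattern.
X_maj, y_maj = make_group(10_000, shift=0.2)
X_min, y_min = make_group(300, shift=1.5)

X = np.vstack([X_maj, X_min])
y = np.concatenate([y_maj, y_min])
group = np.array([0] * len(y_maj) + [1] * len(y_min))  # 0 = majority, 1 = minority

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=group
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Overall accuracy looks fine, but the per-group breakdown reveals the gap.
for g in (0, 1):
    mask = g_te == g
    print(f"group {g}: n={mask.sum()}, accuracy={model.score(X_te[mask], y_te[mask]):.3f}")
```

Because the minority group contributes so few training examples, the model largely learns the majority group's pattern, and the per-group breakdown surfaces a gap that the headline accuracy hides.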
When this happens, it has a real-world impact. For instance, a risk-scoring algorithm used in Florida's criminal justice system was found to mislabel African-American defendants as ‘high risk’ at a much higher rate than white defendants. We also saw Amazon scrap an AI-powered recruitment tool after it was shown to favour male applicants based on the language used in their CVs.
In the financial industry, many processes that underpin much of society – from credit scoring to mortgage approvals – are simply not as fair as they should be, because decisions are based on historically biased data. Every individual and group should have the same set of opportunities, regardless of gender, age and ethnicity. For those of us who work with machine learning models, it is imperative that we try to minimise cases of unjust bias and understand how bias arises in our models.
Measuring and mitigating bias
Several tech giants are already attempting to do this by releasing software that supports various parts of the machine learning lifecycle and helps mitigate bias. Google, for instance, has released fairness diagnosis tools such as the What-If Tool along with libraries for training fairer models, while Microsoft and IBM have released Fairlearn and AI Fairness 360 respectively for assessing and improving algorithmic fairness. However, it is incumbent on all businesses to optimise their own AI processes to eliminate bias.
This is something that we at Onfido focused on in the Information Commissioner’s Office (ICO) Regulatory Sandbox, where we systematically measured and mitigated algorithmic bias in our artificial intelligence technology, with a particular focus on racial and other data-related bias effects in biometric facial recognition. The work narrowed the performance gap between ethnicity groups for our facial recognition algorithm, including a 60x improvement in false acceptance rate for users in the “Africa” category.
Part of the solution is for companies using AI to review their machine learning models and ensure that they are not trained on biased data. Regularisation during training is one way of adding fairness, although it assumes that both the model and the relevant data are available to the vendor or practitioner. By denoting a notion of fairness mathematically, the chosen fairness constraints can be added to the objective function and optimised for.
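As an illustrative sketch of this approach (not Onfido's implementation), the snippet below trains a simple logistic regression by gradient descent and adds a demographic-parity penalty – the squared difference in mean predicted score between two groups – to the usual cross-entropy objective; the model choice and the particular penalty are assumptions made for the example.

```python
# Sketch: fairness regularisation during training.
# Loss = cross-entropy + lam * (demographic-parity gap)^2, where the gap is the
# difference in mean predicted score between the two groups. Purely illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fair_logreg(X, y, group, lam=5.0, lr=0.1, epochs=500):
    """Gradient-descent logistic regression with a soft demographic-parity penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    mask0, mask1 = group == 0, group == 1
    for _ in range(epochs):
        p = sigmoid(X @ w + b)

        # Gradient of the average cross-entropy term.
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)

        # Demographic-parity gap and its gradient (chain rule through the sigmoid).
        gap = p[mask0].mean() - p[mask1].mean()
        dp_dz = p * (1 - p)
        dgap_dw = (X[mask0] * dp_dz[mask0, None]).mean(axis=0) - \
                  (X[mask1] * dp_dz[mask1, None]).mean(axis=0)
        dgap_db = dp_dz[mask0].mean() - dp_dz[mask1].mean()

        # Add the gradient of lam * gap**2 to the loss gradient.
        grad_w += lam * 2 * gap * dgap_dw
        grad_b += lam * 2 * gap * dgap_db

        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Increasing lam trades some predictive accuracy for a smaller gap in average scores between the two groups; other fairness notions can be accommodated by swapping the penalty term.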
Alternatively, pre-processing the training data means decorrelating features from sensitive information before training, while keeping the impact on the data and decision rules to a minimum. This is particularly applicable when the training pipeline itself cannot be modified but the data can. Another strategy is post-processing: adjusting the classifier after training, which is useful when the pipeline is unavailable or re-training is costly. Here the classifier is recalibrated so that its decision thresholds satisfy a chosen fairness criterion.
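As a hedged sketch of the post-processing route (the equal-TPR criterion and the grid search below are illustrative choices, not a definitive method), one can take the scores of an already-trained classifier and search for group-specific decision thresholds that roughly equalise true positive rates while preserving as much accuracy as possible.

```python
# Sketch: post-processing by choosing group-specific thresholds so that true positive
# rates are (approximately) equal across two groups. Scores come from an already-trained
# classifier; the brute-force threshold search is for illustration only.
import numpy as np

def equalised_tpr_thresholds(scores, y, group, tol=0.02):
    """Grid-search one threshold per group that roughly equalises TPR, maximising accuracy."""
    grid = np.linspace(0.05, 0.95, 19)
    best = None
    for t0 in grid:
        for t1 in grid:
            pred = np.where(group == 0, scores >= t0, scores >= t1)
            # True positive rate for each group at its candidate threshold.
            tprs = []
            for g, t in ((0, t0), (1, t1)):
                pos = (group == g) & (y == 1)
                tprs.append((scores[pos] >= t).mean())
            if abs(tprs[0] - tprs[1]) <= tol:
                acc = (pred == y).mean()
                if best is None or acc > best[0]:
                    best = (acc, t0, t1)
    return best  # (accuracy, threshold for group 0, threshold for group 1)
```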
Championing fairness
Ultimately, by formalising a mathematical notion of algorithmic fairness, we give ourselves a way to remove biases at the data stage, during model training and through post-processing adjustments. Integrating and monitoring fairness constraints in this way helps ensure that algorithms provide the same level of opportunity for every group throughout society.
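Two of the most widely used formalisations in the fairness literature – given here as illustrative examples rather than the specific criteria behind any particular system – are demographic parity and equal opportunity, written for a binary predictor Ŷ, label Y and sensitive attribute A:

```latex
% Demographic parity: the positive prediction rate is the same for every group.
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \forall \, a, b

% Equal opportunity: the true positive rate is the same for every group.
P(\hat{Y} = 1 \mid Y = 1, A = a) = P(\hat{Y} = 1 \mid Y = 1, A = b) \quad \forall \, a, b
```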
For all businesses that work with AI, deploying machine learning models is a responsibility as much as it is a tool. Removing bias is essential for improving customer onboarding and user journeys, but we also have an ethical imperative to minimise cases of AI bias and to understand how it arises in our models. In particular, global financial services providers, with their influence over the distribution of wealth, must ensure their models do not exhibit biases that could hinder the opportunities of certain groups – something that becomes all the more critical as we enter a privacy-preserving world.