by Boyke Baboelal, Strategic Solutions Director Americas at Asset Control
“Financial services firms have harnessed the power of Machine Learning (ML) for years to drive new business, increase profitability and reduce risk.
However, within data management, adoption has yet to become widespread. One issue is that the use cases and capabilities of ML in data management are not always well understood by operational teams. Another is that the obvious use cases require high levels of accuracy, while the accuracy of ML methods is currently seen as difficult to predict. Most importantly, there is a strong day-to-day focus on delivering cleansed data to downstream applications such as risk, trade support and compliance engines, leaving little time to improve processes or to embark on what are perceived as large projects.
There are, however, many potential use cases for ML in data management: it can reduce operational cost through improved productivity, deliver a better user experience through context-driven user interfaces, reduce risk and, most importantly, improve services and data quality through more effective operations.
One area where ML can add value quickly is in measuring and controlling data risks and checking the effectiveness of controls. The time is right for this because references to “data quality” have increased dramatically in regulatory guidelines over recent years. For example, within the Targeted Review of Internal Models (TRIM), the ECB dedicated a section to the importance of a “data quality management process” for its National Competent Authorities. TRIM states that “Institutions should establish and implement an effective data quality framework.” Similar guidelines exist within Solvency II, requiring the identification and management of data risks, including management of the corresponding controls.
Within control frameworks, ML can help reduce the cost of checking large data volumes through performant big data analytics, increase the effectiveness of controls by utilising deep learning techniques, and improve compliance with policies by using ML algorithms that process unstructured data and discover processes and anomalous user activity from the work performed.
The benefit of starting with data risks and controls is that all of these improvements can be made with little investment and without impacting Business-As-Usual activities.
One use case where ML adds significant value to key controls is exception handling, perhaps the most important control in data management. Its key function of timely and accurate data checks helps find anomalies, which subsequently require validation by a data cleanser. Exception handling can only be effective if the right rules are applied to the right data objects. The consistent application of checks across the data universe, especially a large one, can be difficult to assess. This is where ML, specifically anomaly detection, can make a difference: by identifying data objects that are not properly checked, it allows operational users to assign the appropriate rules and improve the effectiveness of the exception handling control.
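To make this concrete, the sketch below illustrates a hypothetical first step: encoding the rule assignments of each data object as numeric feature vectors that an anomaly detector can work with. The instrument names, rule names and the pandas-based encoding are assumptions for illustration only, not taken from any particular data management system.

```python
# Illustrative only: encode which validation rules are assigned to which
# data objects, so that inconsistently configured objects can be detected.
import pandas as pd

# Each row: a data object (e.g. an instrument) and the validation rules
# currently assigned to it in the exception handling control.
assignments = pd.DataFrame({
    "instrument": ["BOND_001", "BOND_002", "BOND_003", "EQ_001", "EQ_002"],
    "asset_class": ["bond", "bond", "bond", "equity", "equity"],
    "rules": [
        {"stale_price", "price_jump", "missing_coupon"},
        {"stale_price", "price_jump", "missing_coupon"},
        {"stale_price"},                      # suspiciously sparse for a bond
        {"stale_price", "price_jump"},
        {"stale_price", "price_jump"},
    ],
})

# One-hot encode the rule sets: each data object becomes a numeric vector,
# and objects whose vectors deviate from peers of the same asset class are
# candidates for review by operational users.
all_rules = sorted(set().union(*assignments["rules"]))
features = pd.DataFrame(
    [[int(rule in rules) for rule in all_rules] for rules in assignments["rules"]],
    columns=all_rules,
    index=assignments["instrument"],
)
print(features)
```

Feature vectors like these, possibly enriched with numeric control parameters such as tolerance thresholds, are the kind of input the anomaly detection techniques described next can operate on.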
There are many ML algorithms that can be used for anomaly detection, e.g. based on distance, density, clustering and classification methods, and all have their pros and cons. One of the most effective is deep anomaly detection using autoencoders, also known as replicator neural networks (RNNs), to check for inconsistencies in control settings. RNNs encode the input data, using multiple layers within the neural network, into a compact summary representation. A decoder then tries to recreate the original data from that summary representation. The idea is that if most of the data universe is normal, the neural network will bias the decoding towards normal values. The difference between the original data and the decoded value is the basis for detecting an anomaly: the more these values differ, the higher the anomaly score.
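The following is a minimal sketch of this scoring idea, using scikit-learn’s MLPRegressor trained to reproduce its own standardised input as a stand-in for an autoencoder. The synthetic “control settings” data and the network shape are assumptions chosen purely for illustration; a production setup would train on real control-setting features such as those encoded above.

```python
# Illustrative autoencoder-style anomaly scoring on synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy universe: the control settings of 500 data objects follow a simple
# two-factor structure (i.e. the settings are internally consistent), while
# five objects break that structure.
latent = rng.normal(size=(500, 2))
normal = np.column_stack([
    latent[:, 0],
    0.8 * latent[:, 0],       # setting correlated with the first factor
    latent[:, 1],
    -0.5 * latent[:, 1],      # setting correlated with the second factor
]) + rng.normal(scale=0.05, size=(500, 4))
anomalous = 2.0 * rng.normal(size=(5, 4))   # settings that break the correlations
X = StandardScaler().fit_transform(np.vstack([normal, anomalous]))

# A multilayer network trained to reproduce its own input acts as an
# autoencoder: the narrow middle layer forces a compressed summary
# representation of the dominant, "normal" structure in the data.
autoencoder = MLPRegressor(hidden_layer_sizes=(8, 2, 8), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(X, X)

# Reconstruction error per data object: the more the decoded values differ
# from the originals, the higher the anomaly score.
reconstruction = autoencoder.predict(X)
anomaly_score = np.mean((X - reconstruction) ** 2, axis=1)

# Objects with the highest scores are flagged for review; on this toy data
# they should largely be the five synthetic anomalies (rows 500-504).
print("Highest anomaly scores:", np.argsort(anomaly_score)[-5:])
```

The narrow middle layer is the design choice that matters: because it can only retain the patterns shared by the bulk of the universe, objects that deviate from those patterns are reconstructed poorly and receive high scores.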
RNNs have many advantages over traditional ML methods. They scale well to large data sets, can use many features to detect anomalies, are effective in discovering non-linear features, and allow anomaly scores to be calculated. More importantly, RNNs do not need labelled data to learn the difference between normal and anomalous values, and are therefore easier to set up and maintain, and more cost-effective. In the case of market data or reference data management, the disadvantages of RNNs are limited: the number of features is small compared to, for example, working with images, and market and reference data is comparatively well structured.
With the right use cases, data management teams can, with little investment, quickly experience the benefits that ML brings. Cost can be kept low because analytical libraries are widely available and ML expertise is increasingly common across industries, including financial services. By using these new analytics, data management productivity will increase, controls will improve, risks will reduce and data quality will rise. At the same time, teams take an important step in preparing for stricter data quality regulations, in which data quality frameworks are required, data risks need to be identified, monitored and controlled, and controls need to be regularly evaluated for effectiveness and improved upon.
Once data teams embrace ML within their daily processes, further improvements can be made in areas such as exception handling and user interfaces, to better detect suspect data and to advance the user experience through context-driven UIs and dynamic workflows.
Given that many organisations have started data quality initiatives and that ML use has matured in other industries, this is a good time to start looking into artificial intelligence for data management. Data quality intelligence with ML is the next step towards achieving data quality with operational efficiency.”