Refinitiv Labs focuses on harnessing the power of Big Data and Machine Learning (ML) to drive innovation and shape the future of financial services. In this article, we showcase a tool that pushes the boundaries of data science, driving positive change in the industry by focusing on predicting potential environmental, social and governance (ESG) controversies.
- When Refinitiv analysts review an article, they manually look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals.
- Tim Nugent’s team within Refinitiv Labs have used Google’s open-source NLP model, BERT, which has demonstrated state-of-the-art performance in a range of classification tasks.
- High-quality data is crucial for supervised machine learning tasks. Refinitiv’s ESG controversy model has been trained using 30,000+ positive articles alongside a set of negative examples, with further work ongoing.
Why is ESG important to financial institutions?
Refinitiv has seen growing demand for ESG data as part of investment analysis. Fund managers look to the ESG factors reported by companies, but they also want to understand whether information not reported by companies may indicate ESG controversies. Does company A have a spotless record on ESG factors that could positively influence their investments? Are there any potential ESG controversies brewing that could have a negative influence?
To uncover this kind of information, Refinitiv analysts search for news stories about a specific company using a set of ESG-related keywords, and if there’s a positive match, the story is subject to further scrutiny. For example, an analyst would identify a potential governance controversy in the following snippet:
CHICAGO (Reuters) – The agricultural unit of German chemicals company Bayer AG will halt future U.S. sales of an insecticide that can be used on more than 200 crops after losing a fight with the U.S. Environmental Protection Agency, the company said on Friday.
The analyst would read the story and determine whether there is an apparent or potential ESG controversy.
Automating the hunt for ESG controversy data
This can all take a long time. As Tim Nugent, Senior Research Scientist at Refinitiv Labs, explains, “the problem we need to solve is that it’s time-consuming to search and read news articles”.
When the Refinitiv analysts review an article manually, they look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals.
For a specific company, the analysts examine each of the ESG topics and decide whether the article suggests a controversy for that topic. In essence, they perform document classification, something that can be reframed as a supervised machine learning task. An algorithm can be trained to make the same decision and output a probability score for each of the ESG controversy topics. Where the probability sits above a confidence threshold, the prediction proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
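The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Refinitiv implementation: the topic names, the 0.9 threshold, and the `route_predictions` helper are all assumptions made for the example, since the article does not list the 20 in-house topics or the cut-off actually used.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class RoutingDecision:
    topic: str
    probability: float
    route: str  # "pipeline" or "analyst_review"


def route_predictions(topic_probabilities, threshold=CONFIDENCE_THRESHOLD):
    """Route each per-topic score to the pipeline or to a human analyst."""
    decisions = []
    for topic, probability in topic_probabilities.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability for {topic!r} must be in [0, 1]")
        # Scores above the threshold go straight through; the rest are reviewed.
        route = "pipeline" if probability > threshold else "analyst_review"
        decisions.append(RoutingDecision(topic, probability, route))
    return decisions


# Hypothetical topic names and scores for a single article.
scores = {"environmental_product_impact": 0.97, "business_ethics": 0.41}
decisions = route_predictions(scores)
for decision in decisions:
    print(f"{decision.topic}: {decision.probability:.2f} -> {decision.route}")
```

Where the threshold sits is a design choice: a higher value sends more articles to analysts and reduces the risk of an unreviewed mistake, while a lower value saves more analyst time.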
This is another illustration of the hybrid approach described by Refinitiv CRO Debra Walton.<4> As she says, “we are witnessing the evolution of smarter humans accompanied by smarter machines… why not see AI’s ultimate goal as assisting people in doing their jobs better and more effectively than before?”
BERT-RNA: a domain-specific model
Tim Nugent’s team within Refinitiv Labs have used Google’s open-source NLP model, BERT,<5> which has demonstrated state-of-the-art performance in a range of classification tasks. BERT is pre-trained on 3.3 billion words of general-domain text, drawn from English Wikipedia and the open BookCorpus dataset,<6> so it has a good general understanding of the English language.
The Refinitiv team further trained BERT on a business- and finance-specific corpus: the Reuters News Archive, which contributed a further 715 million words from about 2 million articles. The extra training gives the model a better understanding of the domain-specific terminology of business and financial news and improves its prediction confidence downstream. Once this step was complete, they “fine-tuned” the domain-specific model to deal with the ESG controversy classification task.
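To give a sense of what “further training” on unlabelled news text involves, here is a minimal sketch of how a sentence becomes a training example under BERT’s standard masked-language-model objective. The article does not say which objective the team used, so this is an assumption. The `mask_tokens` helper is hypothetical, and selecting each token independently with probability 0.15 is a simplification of choosing 15% of positions.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocabulary, mask_rate=0.15, rng=None):
    """Return (masked_tokens, labels) for one masked-language-model example.

    labels[i] holds the original token where the model must predict it,
    and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_rate:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN  # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocabulary)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return masked, labels


# Whitespace tokenisation is used here only to keep the sketch short;
# BERT itself uses WordPiece sub-word tokens.
tokens = "bayer will halt us sales of an insecticide".split()
vocabulary = sorted(set(tokens))
masked, labels = mask_tokens(tokens, vocabulary, rng=random.Random(0))
print(masked)
print(labels)
```

Because the labels come from the text itself, no analyst annotation is needed at this stage. Annotated examples only become necessary for the later fine-tuning step.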
“The field is highly competitive and giving customers an edge can be profoundly impactful,” says Tim Nugent. BERT is a state-of-the-art model for language processing, but further pre-training on Reuters News data has made it smarter still. BERT-RNA, as Nugent styles the adapted model, shows an improvement in confidence over generic BERT (82% vs 78%) because it has been adapted to the nuances of financially focussed language. While four percentage points may not appear significant on the surface, the gain has the potential to translate into a huge competitive advantage.
High-quality data is crucial for supervised machine learning tasks. The ESG controversy model was trained using approximately 30,000 “positive” articles that Refinitiv analysts had already annotated, alongside a corresponding set of negative examples. Further work will focus on training the model with additional sources of ESG data, such as a company’s self-reported data, that are typically less structured than traditional market and index data.
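One way to picture the training set is as article text paired with one label per ESG topic. The sketch below assumes a multi-label framing, which fits a model that outputs a probability for each topic but is not confirmed by the article. The three topic names and the `encode_labels` and `build_examples` helpers are hypothetical stand-ins for the 20 in-house topics and whatever tooling the team actually used.

```python
# Hypothetical topic list; the real taxonomy has 20 in-house topics.
TOPICS = ["environmental_product_impact", "business_ethics", "employee_health_safety"]


def encode_labels(annotated_topics, topics=TOPICS):
    """Turn an analyst's topic annotations into a 0/1 vector, one slot per topic."""
    unknown = set(annotated_topics) - set(topics)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in annotated_topics else 0 for topic in topics]


def build_examples(positive_articles, negative_articles):
    """Combine annotated positives with negatives that carry no controversy."""
    examples = [(text, encode_labels(topics)) for text, topics in positive_articles]
    examples += [(text, encode_labels([])) for text in negative_articles]
    return examples


# Placeholder article texts, not real data.
positives = [("Regulator halts sales of insecticide ...", ["environmental_product_impact"])]
negatives = ["Company reports quarterly earnings in line with forecasts ..."]
examples = build_examples(positives, negatives)
for text, labels in examples:
    print(labels, text)
```

A negative example is simply an article with every topic set to 0, which gives the classifier something to contrast the annotated controversies against.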
The Refinitiv Labs team have used machine learning and NLP to positive effect, allowing the company’s ESG analysts to be more productive and efficient. With BERT-RNA, human expertise and domain-specific machine learning work alongside each other. The analyst team now get to do what they know best: offering the client base insightful information about ESG controversies surrounding their companies of interest.
Refinitiv™ Labs collaborate with customers around the world to solve big problems with trusted Refinitiv data.
- Environmental, social and corporate governance (Investopedia)
- An introduction to natural language processing
- Policy: Five priorities for the UN Sustainable Development Goals
- How smarter machines make us smarter humans
- Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing
- The BookCorpus dataset https://arxiv.org/pdf/1506.06724.pdf