
Next-level NLP and potential ESG controversies

Jo Stichbury
Freelance Technical Writer

Refinitiv Labs focuses on harnessing the power of Big Data and Machine Learning (ML) to drive innovation and shape the future of financial services. In this article, we showcase a tool that pushes the boundaries of data science, driving positive change in the industry by focusing on predicting potential environmental, social and governance (ESG) controversies.


  1. When Refinitiv analysts review an article, they manually look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals.
  2. Tim Nugent’s team within Refinitiv Labs have used Google’s open-source NLP model, BERT, which has demonstrated state-of-the-art performance in a range of classification tasks.
  3. High-quality data is crucial for supervised machine learning tasks. Refinitiv’s ESG controversy model has been trained using 30,000+ positive articles alongside a set of negative examples, with further work ongoing.

ESG metrics measure the sustainability and the environmental and societal impact of a company or business, and in finance, ESG is a hot topic. Before committing to a company, investors want to know if there are any potential controversies brewing, or if the company shows particular leadership in an area of ESG, such as diversity in the workforce. The use of machine learning and data science technologies in the financial services industry is surging too. Refinitiv Labs are working at the intersection of these areas to build powerful models that give their clients a competitive edge.

Why is ESG important to financial institutions?

Refinitiv has seen growing demand for ESG data as part of investment analysis. Fund managers look to the ESG factors reported by companies, but they also want to understand whether information not reported by companies may indicate ESG controversies. Does company A have a spotless record on ESG factors that could positively influence their investments? Are there any potential ESG controversies brewing that could have a negative influence?

To uncover this kind of information, Refinitiv analysts search for news stories about a specific company using a set of ESG-related keywords, and if there’s a positive match, the story is subject to further scrutiny. For example, an analyst would identify a potential governance controversy in the following snippet:

CHICAGO (Reuters) – The agricultural unit of German chemicals company Bayer AG will halt future U.S. sales of an insecticide that can be used on more than 200 crops after losing a fight with the U.S. Environmental Protection Agency, the company said on Friday.

The analyst would read the story and determine whether there is apparent or potential ESG controversy.

Automating the hunt for ESG controversy data

This manual review takes a long time. As Tim Nugent, Senior Research Scientist at Refinitiv Labs, explains, “the problem we need to solve is that it’s time-consuming to search and read news articles”.

Demand from fund managers for ESG data continues to grow. To meet this demand for expanded coverage, Refinitiv Labs have come up with a new way to optimize the process and build a more efficient workflow. Using machine learning and natural language processing (NLP),[2] they have trained a model to review a news stream and triage news stories for potential ESG controversies, which speeds up the process.

When the Refinitiv analysts review an article manually, they look for controversies in 20 ESG topics defined in-house, many of which align with the UN Sustainable Development Goals.[3]

For a specific company, the analysts examine each of the ESG topics and decide whether or not the article suggests controversy for that topic. In essence, they perform document classification, which can be reframed as a supervised machine learning task. An algorithm can be trained to make the same decision and output a probability score for each of the ESG controversy topics. Where the probability sits above a confidence threshold, the prediction proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
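The routing step described above can be sketched in a few lines of Python. This is a minimal illustration, not Refinitiv's implementation: the `triage` function, the topic names and the 0.8 threshold are all assumptions made for the example, since the article does not list the 20 in-house topics or state the threshold used.

```python
from typing import Dict, List

# Illustrative value only; the article does not state the actual threshold.
CONFIDENCE_THRESHOLD = 0.8


def triage(
    topic_probabilities: Dict[str, float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Dict[str, List[str]]:
    """Route each topic prediction by comparing its probability to a threshold.

    Predictions above the threshold go straight to the pipeline; everything
    else is queued for a human analyst.
    """
    routed: Dict[str, List[str]] = {"pipeline": [], "analyst_review": []}
    for topic, probability in topic_probabilities.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability for {topic!r} must be in [0, 1]")
        destination = "pipeline" if probability > threshold else "analyst_review"
        routed[destination].append(topic)
    return routed


if __name__ == "__main__":
    # Hypothetical model output for one article (topic names are made up).
    scores = {"environment": 0.93, "governance": 0.41, "workforce_diversity": 0.85}
    print(triage(scores))
```

In this sketch, raising the threshold sends more predictions to analysts and lowering it automates more of them, so the value would need to be chosen to balance analyst workload against the cost of an incorrect automatic decision.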

This is another illustration of the hybrid approach described by Refinitiv CRO, Debra Walton.[4] As she says, “we are witnessing the evolution of smarter humans accompanied by smarter machines…why not see AI’s ultimate goal as assisting people in doing their jobs better and more effectively than before?”

BERT-RNA: a domain-specific model

Tim Nugent’s team within Refinitiv Labs have used Google’s open-source NLP model, BERT,[5] which has demonstrated state-of-the-art performance in a range of classification tasks. BERT is pre-trained on 3.3 billion words from a general domain corpus, such as Wikipedia and the open BookCorpus dataset,[6] so has a good, native understanding of the English language.

The Refinitiv team further trained BERT using a business and finance-specific corpus: the Reuters News Archive, which contributed a further 715 million words from about 2 million articles. The extra training gives the model a better understanding of the domain-specific terminology of business and financial news and improves its prediction confidence downstream. Once this step was complete, they “fine-tuned” the domain-specific model to deal with the ESG controversy classification task.

“The field is highly competitive and giving customers an edge can be profoundly impactful,” says Tim Nugent. BERT is a state-of-the-art model for language processing, but further pre-training on data from Reuters News has made it smarter still. BERT-RNA, as Nugent styles the adapted model, shows an improvement in confidence over generic BERT (82% vs 78%) because it is adapted to the nuances of financially focussed language. While four percentage points may not appear significant on the surface, the gain has the potential to translate into a huge competitive advantage.

High-quality data is crucial for supervised machine learning tasks. The ESG controversy model was trained using approximately 30,000 “positive” articles that Refinitiv analysts had already annotated, alongside a corresponding set of negative examples. Further work will focus on training the model with additional sources of ESG data that are typically less structured than the traditional market and index data, such as a company’s self-reported data.
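As a rough illustration of how positive and negative articles might be combined into a labelled set for a supervised classifier, consider the sketch below. It is an assumption-laden example rather than a description of Refinitiv's data pipeline: the `build_labelled_set` helper, the binary 1/0 labels, the 10% validation split and the fixed seed are all hypothetical choices.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (article text, label): 1 = controversy, 0 = none


def build_labelled_set(
    positive_texts: List[str],
    negative_texts: List[str],
    validation_fraction: float = 0.1,  # assumed split, not from the article
    seed: int = 42,
) -> Tuple[List[Example], List[Example]]:
    """Label, shuffle and split articles into training and validation sets."""
    examples: List[Example] = [(text, 1) for text in positive_texts]
    examples += [(text, 0) for text in negative_texts]

    # A seeded generator keeps the shuffle reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(examples)

    n_validation = int(len(examples) * validation_fraction)
    return examples[n_validation:], examples[:n_validation]


if __name__ == "__main__":
    # Placeholder texts standing in for analyst-annotated articles.
    positives = [f"positive article {i}" for i in range(30)]
    negatives = [f"negative article {i}" for i in range(30)]
    train, validation = build_labelled_set(positives, negatives)
    print(len(train), len(validation))
```

Holding back a validation set is a common way to check that a fine-tuned model generalises to articles it has not seen; the article does not say how the team evaluated its model.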

Conclusion

The Refinitiv Labs team have used machine learning and NLP to positive effect, allowing the company’s ESG analysts to be more productive and efficient. The BERT-RNA model allows human expertise and a domain-specific model to work alongside each other. The analyst team now get to do what they do best: offering the client base insightful information about ESG controversies surrounding their companies of interest.

Refinitiv™ Labs collaborate with customers around the world to solve big problems with trusted Refinitiv data.

See what we’re working on: https://www.refinitiv.com/en/labs

References

  1. Environmental, social and corporate governance (Investopedia)
  2. An introduction to natural language processing
  3. Policy: Five priorities for the UN Sustainable Development Goals
  4. How smarter machines make us smarter humans
  5. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing
  6. The BookCorpus dataset https://arxiv.org/pdf/1506.06724.pdf