Which alternative dataset is right for you?

Saeed Amen
Founder of Cuemacro

Choosing the right alternative dataset will depend on many factors, including investment style and length of data history. Cuemacro founder Saeed Amen — co-author of The Book of Alternative Data — highlights what to look for in the search for alt data.


  1. Which alternative dataset is right for you depends on investment size and style, for example quant versus discretionary trading.
  2. Factors like length of data history are important, and alternative data should be used to augment existing data sources.
  3. Refinitiv is in partnership with BattleFin, an alternative data discovery and technology company. The Ensemble platform streamlines the alternative data discovery process through access to many different alternative data providers.

The investment industry has many buzzwords at present, chief among them artificial intelligence and machine learning. Alternative data, however, isn’t far behind. After all, without data you can’t do much machine learning.

The promise of alternative data for investors is that it can give you an edge. You might uncover interesting observations about markets that others miss. Alternative datasets can also give you the chance to generate forecasts more quickly than the rest of the market.

On the flipside, if an alternative dataset has become very popular and you are not using it, you essentially have a blind spot in your analysis. Certain alternative datasets have become must-have items, for example those that help to forecast sales of U.S. retailers ahead of earnings releases.

The difficulty is knowing which alternative dataset is right for you. Just because a dataset is unusual doesn’t necessarily mean it will help to enhance alpha.

Choosing the right alternative dataset

There are many criteria we need to apply to a dataset before we can do any number crunching on it. There are simply too many datasets to test them all, so we need a way of shortlisting the ones we’d like to look at first.

For example, a large quant fund will be looking for datasets that can be used to trade many different tickers. By contrast, a discretionary firm will tend to look at a smaller number of assets and dig down into each of them in more detail. A dataset that covers only a limited number of tickers will still be of interest to such a firm.

Related to this ticker question is the capacity of a strategy that uses alternative datasets. If a strategy using alternative data can only absorb several million dollars, it is unlikely to be of interest to a fund with billions of dollars of assets under management, even if its Sharpe ratio is very high.

For a small prop trading firm, such a strategy might be very attractive. The value of a dataset therefore can differ significantly between firms.
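
The capacity point can be made concrete with some simple arithmetic. The sketch below is purely illustrative: the function name, the capacity, the strategy return and the two asset sizes are all hypothetical numbers chosen for the example, not figures from any real fund or dataset.

```python
def contribution_to_firm_return(capacity, strategy_return, aum):
    """Return the strategy's contribution to the firm's overall return.

    capacity        -- most capital the strategy can absorb (hypothetical)
    strategy_return -- annual return earned on the capital deployed
    aum             -- the firm's total assets under management
    """
    deployed = min(capacity, aum)  # cannot deploy more than the firm has
    return deployed * strategy_return / aum


# Hypothetical strategy: absorbs at most $5m and returns 20% a year.
capacity = 5e6
strategy_return = 0.20

large_fund = contribution_to_firm_return(capacity, strategy_return, aum=5e9)
small_firm = contribution_to_firm_return(capacity, strategy_return, aum=20e6)

print(f"Large fund ($5bn AUM): {large_fund:.2%}")   # 0.02%
print(f"Small firm ($20m AUM): {small_firm:.2%}")   # 5.00%
```

Under these assumed numbers, the same strategy adds two basis points to the large fund's return but five percentage points to the small firm's, which is why the two would value the underlying dataset so differently.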

We also need to ask how structured the data is. If it is in a very raw form, it can be challenging to use for all but the largest quant firms. By contrast, datasets in a much more structured form are easier to integrate into workflows.

For example, Refinitiv has many alternative datasets which have been structured, including a machine-readable news product portfolio featuring Reuters News, along with advanced NLP products in News Analytics and MarketPsych indices.

Understanding the due diligence process

The length of the data history is another factor. Without a large data history, it can be difficult to evaluate a dataset.

The flipside is that if we wait for a long time until enough history is available, there might have been alpha decay in the signal as more and more market participants use it.

There are also many other questions that are part of the due diligence process. In particular, there are legal questions. Is the data GDPR (the EU law on data protection and privacy) compliant? Have the datasets been sufficiently scrubbed so that the data from individual users is not identifiable?

After you’ve chosen a dataset you’d like to test, it then takes time to sign non-disclosure agreements and other agreements with the data supplier. Only then can you get to the fun bit of crunching the data!

Combining alternative datasets

Typically, we want to see whether adding the alternative data to existing models helps to increase predictability. The key point of using alternative data is not to replace existing data sources, but instead to augment them.

If the signal from an alternative dataset is heavily correlated with existing “traditional” datasets, then it doesn’t add much value. By contrast, if we find, for example, that the R-squared of our forecast increases significantly when we add an alternative dataset, then it is likely to be of more use.
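
One way to check this is to compare the R-squared of a regression before and after adding the alternative signal. The sketch below is a minimal illustration on synthetic data, not the method used in the book: the variable names, the coefficients and the noise levels are all assumptions, and it relies on NumPy for the least-squares fit.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R-squared of an OLS fit of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Synthetic data (all assumptions, for illustration only).
rng = np.random.default_rng(0)
n = 500
traditional = rng.normal(size=n)                         # existing signal
alt_redundant = traditional + 0.1 * rng.normal(size=n)   # near-copy of it
alt_novel = rng.normal(size=n)                           # independent signal
target = 0.5 * traditional + 0.5 * alt_novel + rng.normal(size=n)

r2_base = r_squared(traditional, target)
r2_redundant = r_squared(np.column_stack([traditional, alt_redundant]), target)
r2_novel = r_squared(np.column_stack([traditional, alt_novel]), target)

print(f"Traditional only:        {r2_base:.3f}")
print(f"+ redundant alt dataset: {r2_redundant:.3f}")
print(f"+ novel alt dataset:     {r2_novel:.3f}")
```

In this setup the redundant dataset barely moves the R-squared, while the independent one raises it noticeably. In-sample R-squared can never fall when a regressor is added, so in practice the comparison is better made out of sample or with an adjusted measure.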

There is also nothing stopping you from combining many different alternative datasets together.

In The Book of Alternative Data, my co-author Alex Denev and I look at many use cases that combine alternative datasets. The book, which is due to be published next year, uses the example of combining news sentiment for retailers with car counts in their car parks, taken from satellite imagery.

We find that using these two alternative datasets together helps to forecast earnings per share better than using either one in isolation. Hence, it’s not just a question of which alternative dataset is useful, but how we can combine them.

A streamlined approach to alt data

So, let’s say you want to look at alternative data. How do you get started? BattleFin’s Ensemble platform is an excellent way to explore what is possible. Ensemble helps to streamline the process at many levels, including onboarding.

Rather than negotiating separate non-disclosure agreements and testing agreements with every data vendor, the process is done just once, saving time and money. Ensemble also helps in the evaluation and testing process for data, enabling you to test the historical data in a private sandbox in the cloud.

While finding alternative datasets might seem daunting, tools like Ensemble can help significantly with the process.
