Friday, 21 January 2022

Alternative data and sentiment analysis: prospecting non-standard data in machine learning-driven finance

Kristian Bondo Hansen (@BondoHansen) and Christian Borch (@CboMpp)

The first three months of 2021 saw two remarkable events shake the global economy and the world of finance. One was the meme-stock short squeeze, in which retail investors squeezed hedge funds that had shorted aggressively. Mobilising on social media platforms, most notably Reddit’s sub-site WallStreetBets, and using the fingertip market access provided by trading apps like Robinhood, they rallied around and effectively ran up the share prices of ailing American companies such as the video game store franchise GameStop and the cinema chain AMC Theatres. This event turned conventional views on what drives prices in financial markets upside down and directed investors’ attention to market chatter in social media communities. The other event was the Ever Given’s clogging of the Suez Canal, a freak accident that caused a week-long blockage of the narrow Egyptian waterway through which 30% of all global container traffic travels.
What unites these two events? From an investor perspective, both showed the predictive potential of data sources beyond standard market data (price, trade, volume, etc.). Closely monitoring freight routes via GPS tracking, and analysing inventories using satellite imagery of container terminals, can help detect frictions along the supply chain that may eventually have a negative impact on individual firms’ financial statements. Similarly, keeping an ear to the buzz on social media might give investors an inside track on predicting retail rallies in individual stocks.

In our article ‘Alternative data and sentiment analysis: prospecting non-standard data in machine learning-driven finance’, we examine this disparate category of heterogeneous data sources—including social media data, GPS tracking data, sensor data, satellite imagery, credit card transaction data, and more—which market professionals call ‘alternative data’. We focus mainly on social media data, scraped from the web, and parsed through Natural Language Processing (NLP) machine learning algorithms. Drawing on interviews with investment managers and traders using alternative data, as well as intermediaries sourcing and vending such data, we show how alternative data are viewed, used, monetised, and exploited by market professionals.

Our key argument is that alternative data should always be considered in relation to the analytics tools with which patterns are extracted, signals discovered, or anomalies detected in big data sets. A crucial task in alternative data use is to render data amenable to analysis with the analytics tools available, a type of standardisation effort that data scientists call ‘prospecting’. As the financial industry’s interest in alternative data grows, ever more data sources undergo prospecting and are thus rendered finance-relevant. We propose to see this development as financialisation on the data level, that is, the continuous appropriation of data that could potentially be transformed into valuable market insights. We argue that it prompts new concerns about how to govern this ever-expanding category of heterogeneous data, both inside individual firms and across the industry.