How To Make Data Worth It

The best solution is a direct API data feed with as much automated transformation and structuring as possible. But entity mapping and ticker tagging are major challenges. Ticker tagging means mapping a company reference or brand alias back to its unique stock symbol and proper name. For example, “Verizon” needs to map back to VZ and Verizon Communications Inc. And not all references are so direct. Maybe a Twitter user sarcastically references Verizon’s slogan while including a typo: “that’s powerful.” A hedge fund might want that sentiment included in its investment analysis, but it would need sophisticated AI to even detect the reference.
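To make the tagging problem concrete, here is a minimal sketch of alias-based ticker tagging with some tolerance for typos. The alias table, the `tag_tickers` function and the 0.85 similarity cutoff are all assumptions made for illustration; this is not how Thinknum or any other vendor does it.

```python
import difflib
import re

# Hypothetical alias table: lowercase alias -> (ticker, proper name).
ALIASES = {
    "verizon": ("VZ", "Verizon Communications Inc."),
    "verizon wireless": ("VZ", "Verizon Communications Inc."),
    "vz": ("VZ", "Verizon Communications Inc."),
}


def tag_tickers(text, cutoff=0.85):
    """Return the set of (ticker, name) pairs whose aliases appear in text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Candidate phrases: single words plus adjacent word pairs.
    phrases = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    found = set()
    for phrase in phrases:
        # get_close_matches tolerates small typos such as "verizzon".
        match = difflib.get_close_matches(phrase, list(ALIASES), n=1, cutoff=cutoff)
        if match:
            found.add(ALIASES[match[0]])
    return found


print(tag_tickers("Verizzon dropped my call again"))
print(tag_tickers("that's powerful"))
```

The first call recovers VZ despite the typo. The second returns an empty set, which is the point of the example: an indirect slogan reference contains no alias at all, so string matching alone cannot catch it.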

And it doesn’t stop at ticker symbols. Some fund managers also want data mapped to CUSIPs (nine-character alphanumeric codes for North American securities) or ISINs (12-character international securities identifiers).
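The two identifier schemes are related: an ISIN for a U.S. security is the country code, the nine-character CUSIP and one check digit. The sketch below validates a CUSIP and derives the ISIN from it. The function names are hypothetical, the special CUSIP characters `*`, `@` and `#` are deliberately not handled, and the sample CUSIP is used only as an illustration.

```python
def _char_value(ch):
    """Digits map to themselves; letters map to 10..35 (A=10, Z=35)."""
    if ch.isdigit():
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"unsupported character: {ch!r}")


def cusip_check_digit(base8):
    """Check digit for the first eight characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(base8):
        v = _char_value(ch)
        if i % 2 == 1:  # every second character is doubled
            v *= 2
        total += v // 10 + v % 10
    return (10 - total % 10) % 10


def isin_check_digit(base11):
    """Luhn check digit over the letter-expanded first 11 characters."""
    digits = "".join(str(_char_value(ch)) for ch in base11)
    total = 0
    for i, d in enumerate(reversed(digits)):
        n = int(d)
        if i % 2 == 0:  # double every other digit, starting from the right
            n *= 2
        total += n // 10 + n % 10
    return (10 - total % 10) % 10


def cusip_to_isin(cusip, country="US"):
    """Validate a nine-character CUSIP and build the matching ISIN."""
    if len(cusip) != 9 or cusip_check_digit(cusip[:8]) != int(cusip[8]):
        raise ValueError("invalid CUSIP")
    base = country + cusip
    return base + str(isin_check_digit(base))


print(cusip_to_isin("92343V104"))
```

A mapping layer would typically store all three keys (ticker, CUSIP, ISIN) against one internal entity ID, since tickers get reused and renamed while the other identifiers follow the security.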

One of the leading alternative-data providers — and one of the standouts in handling the tagging and mapping challenge, according to Ekster — is Thinknum.

“There’s an opportunity in the market to have what they call referential data — having all these different ways of referencing a given entity, company or security, mapped back in a way that facilitates the data analysis,” said Boris Spiwak, director of marketing at Thinknum. “And I think we’re all sort of trying to figure out the best way to do that.”

Thinknum sells up to 35 data sets for each company that it tracks. Those include social media and job listing data sets, but also more niche information like car inventory, retail store growth, hotel web traffic data and vendor-specific product pricing by location. The information is publicly available; anyone with the know-how could, say, scrape Glassdoor in hopes of detecting hiring patterns. But that ability to map and tag referential data as a direct feed has major value. Thinknum’s API data feeds cost between $25,000 and $50,000 per data set, per year, Spiwak said.


Any ability to cut down turnaround time between acquisition and analysis is valuable, especially because many data intermediaries go for a quantity-over-quality approach: aggregated data sets with high ticker coverage, but not necessarily insightful ticker coverage.

“The problem today is … how do we know if a data set is going to be valuable? It could take six months of R&D, [and] you have to buy it first. You don’t know how much alpha it’s going to generate until much later,” Ekster said.

Neuravest, formerly known as Lucena Research, is one of the companies focused on cracking that conundrum. Neuravest is something of an intermediary after the intermediaries. It partners with 42 select alternative-data providers and works to validate data sets before passing them along and incorporating them into machine-learning investment models for fund managers.

Raw data is piped into the system, which generates what the company calls a data qualification report. The platform measures the data along 12 checkpoints before it’s allowed to be incorporated into a model. Checkpoints include an indicator of the length of time before a signal loses value, plus a distribution of price action following a given event, such as a news announcement that generates social-media chatter.
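Two of those checkpoints can be illustrated with a toy event study. The sketch below is not Neuravest's method; the function names, the "peak horizon" definition of signal decay and the price series are all assumptions invented for illustration.

```python
from statistics import mean, median


def forward_returns(prices, event_days, horizon):
    """Simple return from each event day to `horizon` days later."""
    return [
        prices[d + horizon] / prices[d] - 1.0
        for d in event_days
        if d + horizon < len(prices)
    ]


def summarize(returns):
    """Coarse picture of the distribution of post-event returns."""
    return {"min": min(returns), "median": median(returns), "max": max(returns)}


def peak_horizon(prices, event_days, max_horizon):
    """Horizon with the highest mean post-event return.

    A crude stand-in for "how long before the signal loses value":
    holding past this horizon no longer adds to the average return.
    """
    curve = {}
    for h in range(1, max_horizon + 1):
        rets = forward_returns(prices, event_days, h)
        if rets:  # skip horizons that run past the end of the series
            curve[h] = mean(rets)
    return max(curve, key=curve.get)


# Synthetic prices, invented for illustration only.
prices = [100, 100, 102, 103, 103, 101, 100, 102, 103, 102, 101, 101]
events = [1, 6]  # hypothetical days on which a news event occurred

print(peak_horizon(prices, events, max_horizon=4))
print(summarize(forward_returns(prices, events, horizon=3)))
```

On real data the same idea would be run across many tickers and events, and against a benchmark, before drawing any conclusion about whether a data set carries signal.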

After validation, the data is scrubbed, ticker-tagged and normalized before a model is built to generate back-testable investment theses. By bringing together uncorrelated data sets, the models aim to identify constituent stocks and assets that are about to move abnormally compared to similar stocks.

But it begins with that first step — proving a data set is even worth the time. It’s about “identifying which ones are good for certain scenarios, and really providing them on a silver platter to customers, so they don’t have to deal with all these other purchases and evaluations and hiring quants and infrastructure,” said Erez Katz, co-founder and CEO of Neuravest.

The Future of Alternative Data

Even with well-structured feeds and benchmarked data sets, the need for skilled data analysts in finance isn’t going anywhere. Fundamental firms incorporate alt data to help interrogate their existing investment hypotheses, while quants input the alternative stuff into models alongside reams of traditional data. That is, alternative data will always be an ingredient, not the whole stew.

That’s also why experts sometimes push back on the idea that a widely distributed data set necessarily means diminishing alpha, particularly if it’s non-aggregated. “If you give the same raw data set to 20 different funds and analysts, they’ll come up with 20 different ways to make money on it,” Ekster said. “So in that sense, there will be no alpha decay.”

Katz struck a similar note, emphasizing the need for subject matter expertise and innovative thinking. “You need people who have very strong analytical skills, but also people who understand Wall Street, what it takes to move markets and how to circumvent what the common crowd knowledge presents.”


It’s also important to note that firms are no longer looking at alternative data strictly as an alpha generator. Data sets can also be used more like insurance — information to help limit loss in the face of potential upheaval. For instance, Spiwak said Thinknum saw “unprecedented” inbound demand when, at the height of the GameStop saga, it released its Reddit Mentions data set — which tracks, in real time, how often ticker symbols are mentioned in the top 100 posts on r/WallStreetBets and r/Stocks.
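A mention counter of that kind can be sketched in a few lines. This is only an assumption about the general approach, not Thinknum's implementation: the watchlist, the `count_mentions` function and the sample post titles are hypothetical, and fetching posts from Reddit is left out entirely.

```python
import re
from collections import Counter

# Hypothetical watchlist; a real feed would use a full symbol master.
KNOWN_TICKERS = {"GME", "AMC", "VZ"}

# An optional "$" followed by 1-5 capital letters, not embedded in a longer word.
TICKER_PATTERN = re.compile(r"(?<![A-Za-z])\$?([A-Z]{1,5})(?![A-Za-z])")


def count_mentions(posts):
    """Count every occurrence of a known ticker across a list of post texts."""
    counts = Counter()
    for text in posts:
        for symbol in TICKER_PATTERN.findall(text):
            if symbol in KNOWN_TICKERS:
                counts[symbol] += 1
    return counts


# Made-up post titles for illustration.
posts = [
    "$GME to the moon, GME holders unite",
    "Thinking about AMC and gme",
    "VZ dividend looks fine",
]
print(count_mentions(posts))
```

Filtering against a known symbol list matters because many capitalized words ("I", "CEO", "YOLO") look like tickers. The sketch also ignores lowercase mentions such as "gme", which a production tagger would need to handle with more context.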

It was alternative data as risk management. If a hedge fund was shorting a stock, here was a way to maybe know if a short squeeze was imminent.

The Greensill episode offered a similar lesson. Sentiment analysis of Greensill employee reviews on job sites revealed turmoil prior to the finance company’s eventual collapse.

“There were some pretty clear signals from people working there that something wasn’t kosher,” Spiwak said.

Progress in the industry also means that sectors beyond finance are paying attention to the value of alternative data. Thinknum offers a more user-friendly, web-based interface for its data sets that’s less expensive than the API feed. The bulk of customers who use it come from companies outside finance, according to Spiwak.


Once a data set has enough historical data and true representativeness, it becomes attractive to enterprises too, and sometimes even governments. “You see a lot of non-institutional-investor interest in data sets that are mature enough and developed enough,” Ekster said. “And they’re using the fact that the institutional investment community uses them as a validation point.”

So far, that’s perhaps most evident in the fast-growing people analytics industry. Companies want up-to-the-minute data on employee sentiment, both at their own firms and at their competitors. And real-time tracking of competitors’ job listings can give a company a better picture of those competitors’ growth strategies. While finance remains Thinknum’s beachhead, this kind of broader adoption of alternative data represents the future of the industry, Spiwak said.

Plus, there are always new kinds of data sets emerging. For example, ESG (environmental, social and governance) data has been the subject of much activity and chatter lately. It’s essentially a way of quantifying, through three main criteria, how sustainable a given enterprise is. That has broad appeal for governments tracking climate-related information, for companies looking to prove their green bona fides and for investors who’ve noticed the studies that indicate sustainable funds have performed as well as or better than conventional funds.

ESG data isn’t perfect. The Organization for Economic Cooperation and Development recently called for more consistent standards to ensure across-the-board verifiability. But it’s clear nonetheless that — whether incorporating satellite data of construction practices or flood risk analysis or some other telling metric — alternative inputs will be key.

“To achieve that ESG goal, for the most part, alternative data is the only source of information that you have,” Ekster said. “You won’t get that from stock prices or company filings. You get that from alternative sources.”

Culled from by Stephen Gossett
