Barclays Research Open Source Alternative Data Tools

Barclays Research Data Science has published its first open-source coding tools to help users query the most popular financial research databases.

Ryan Preclaw, head of investment science research, and Adam Kelleher, chief data scientist for research at Barclays investment bank, said in a report that analysts use a variety of data sources alongside traditional financial ones such as filings and results call transcripts, which requires them to use modern technology for data analysis.

“A single project may require SQL queries, boto get queries, Spark read operations, manipulation of local Excel sheets, etc.,” they said.

As a result, Barclays’ research data science team built a tool to make working with new datasets more efficient, which they say has been a major contributing factor to the more-than-linear growth of their productivity over time.

“When we work with a new dataset, usually as part of a research project, we wrap the ETL (extract, transform, load) code and add it to the tool,” they added. “It makes it a push-button feature that’s ready the next time we need to use it.”

Later, they can usually perform the same operation with a single line of code.
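The report's own code is not reproduced here, but the idea of wrapping ETL once and reusing it can be sketched roughly as follows. The function name `load_prices`, the column names and the sample data are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical raw extract; stands in for a vendor file or a query result.
RAW_CSV = """ticker,date,close
ABC,2021-01-04,10.5
ABC,2021-01-05,
XYZ,2021-01-04,20.0
"""


def load_prices(raw_text=RAW_CSV):
    """Hypothetical wrapped ETL step: extract rows, clean them, return records."""
    rows = csv.DictReader(io.StringIO(raw_text))  # extract
    cleaned = []
    for row in rows:
        if not row["close"]:  # transform: skip rows with a missing price
            continue
        cleaned.append(
            {
                "ticker": row["ticker"],
                "date": row["date"],
                "close": float(row["close"]),  # transform: text to number
            }
        )
    return cleaned  # load: hand clean records back to the caller


# Once the ETL is wrapped, reusing it is a single line.
prices = load_prices()
```

The cleaning rules live in one place, so every later project that calls the function gets the same pre-processing.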

Barclays believes these tools will be useful for others doing financial research as they create better quality control of queries and enable reproducibility by standardizing pre-processing on widely available datasets.

“Our software is designed to be extensible,” the report says. “It’s divided into sections by data source, so it’s easy to add code to support new providers or datasets.”

The tool is designed to abstract data sources away from the end user and to present a standardized interface for working with financial data, through a data access layer and an API layer.

The report pointed out that a data access layer provides a clean interface to raw data. One of the main advantages is maintenance: if 100 applications use a database and the database changes, only the data access layer has to change, rather than 100 sets of queries embedded in application code.

The API layer uses the data access layer to grab the data and then reformats it into a standard format for analysis.
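A minimal sketch of this two-layer split, assuming an in-memory SQLite table as the raw source, might look like the following. The class `PriceDataAccess`, the function `get_price_history` and the table and column names are hypothetical and are not taken from the Barclays code.

```python
import sqlite3


class PriceDataAccess:
    """Data access layer (hypothetical): the only code that knows the raw schema."""

    def __init__(self, conn):
        self._conn = conn

    def fetch_closes(self, ticker):
        # If the raw table or its columns change, only this query changes.
        cur = self._conn.execute(
            "SELECT trade_dt, px_close FROM raw_prices "
            "WHERE tkr = ? ORDER BY trade_dt",
            (ticker,),
        )
        return cur.fetchall()


def get_price_history(dal, ticker):
    """API layer (hypothetical): reformat raw rows into one standard shape."""
    return [
        {"ticker": ticker, "date": trade_dt, "close": float(px_close)}
        for trade_dt, px_close in dal.fetch_closes(ticker)
    ]


# Build a tiny stand-in database so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_prices (tkr TEXT, trade_dt TEXT, px_close REAL)")
conn.executemany(
    "INSERT INTO raw_prices VALUES (?, ?, ?)",
    [
        ("ABC", "2021-01-05", 10.7),
        ("ABC", "2021-01-04", 10.5),
        ("XYZ", "2021-01-04", 20.0),
    ],
)

history = get_price_history(PriceDataAccess(conn), "ABC")
```

Application code only ever calls `get_price_history`, so a change to the raw table touches the query in `fetch_closes` and nothing else.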

“The simplicity of the interface makes data logistics trivial, so analysts can focus their time and mental energy on those aspects of their work where their unique skills are additive, rather than rote data logistics,” added Preclaw and Kelleher.