Mark Dredze, associate professor of computer science at Johns Hopkins, discusses BloombergGPT, the first large language model built specifically for the finance industry. This work was a collaboration between Bloomberg’s artificial intelligence (AI) Engineering team and the machine learning (ML) Product and Research group in the company’s chief technology office, where Dredze is a visiting researcher.
Why does finance need its own language model?
While recent advances in AI models have demonstrated exciting new applications in many domains, the complexity and unique terminology of the financial domain warrant a domain-specific model. In that respect, finance resembles other specialized domains, such as medicine, whose vocabulary you rarely see in general-purpose text. A finance-specific model should improve performance on existing financial natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, news classification, and question answering. Beyond improving existing tasks, we also expect domain-specific models to unlock new opportunities.
For example, we envision BloombergGPT transforming natural language queries from financial professionals into valid Bloomberg Query Language, or BQL, an incredibly powerful tool that enables financial professionals to quickly pinpoint and interact with data about different classes of securities. So if the user asks: “Get me the last price and market cap for Apple,” the system will return get(px_last,cur_mkt_cap) for(['AAPL US Equity']). This string of code will enable them to import the resulting data quickly and easily into data science and portfolio management tools.
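To make the input and output of that translation concrete, here is a minimal sketch. It is not how BloombergGPT works: it swaps in a naive keyword lookup purely to show the shape of the mapping from a question to a BQL string. The alias tables and the `to_bql` function are hypothetical; only `px_last`, `cur_mkt_cap`, `'AAPL US Equity'`, and the output format come from the example above.

```python
# Hypothetical alias tables. Only these entries come from the Apple example;
# a real system would need far broader coverage of fields and securities.
FIELD_ALIASES = {
    "last price": "px_last",
    "market cap": "cur_mkt_cap",
}
SECURITY_ALIASES = {
    "apple": "AAPL US Equity",
}


def to_bql(question: str) -> str:
    """Build a BQL-style string by keyword matching (an illustration, not an LLM)."""
    text = question.lower()
    # Collect matched fields, ordered by where they appear in the question.
    fields = sorted(
        (text.index(phrase), code)
        for phrase, code in FIELD_ALIASES.items()
        if phrase in text
    )
    securities = [ticker for name, ticker in SECURITY_ALIASES.items() if name in text]
    if not fields or not securities:
        raise ValueError("no known field or security in the question")
    field_list = ",".join(code for _, code in fields)
    security_list = ",".join(f"'{ticker}'" for ticker in securities)
    return f"get({field_list}) for([{security_list}])"


print(to_bql("Get me the last price and market cap for Apple"))
# → get(px_last,cur_mkt_cap) for(['AAPL US Equity'])
```

A hand-written table like this breaks as soon as a user phrases a request differently or names a security it does not list; a language model trained on financial text is meant to replace exactly this brittle lookup step.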
What did you learn while building the new model?
Building these models isn’t easy, and there are a tremendous number of details you need to get right to make them work. We learned a lot from reading papers by other research groups that built language models. To contribute back to the community, we wrote a paper of more than 70 pages detailing how we built our dataset, the choices that went into the model architecture, how we trained the model, and an extensive evaluation of the resulting model. We also released detailed “training chronicles” that contain a narrative description of the model-training process. Our goal is to be as open as possible about how we built the model to support other research groups who may be seeking to build their own models.