STAC’s study on financial NLP a step towards machine learning benchmarks

The Securities Technology Analysis Center (STAC) designed a study to illustrate how its benchmarks for machine learning (ML) can be constructed and used. The study is intended to help data scientists and data engineers know what to expect from the data science tools and cloud products used in the project, and how to avoid common pitfalls.

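The workload is topic modeling of SEC Form 10-K filings using Latent Dirichlet Allocation (LDA), a natural language processing (NLP) technique that represents each document as a mixture of topics and each topic as a probability distribution over words.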
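STAC's own implementation is not reproduced in this article. The following is a minimal, illustrative sketch of how LDA topic modeling works, using collapsed Gibbs sampling (one of several standard ways to fit LDA) in pure Python. It assumes each filing has already been tokenized into a list of words. The function name `lda_gibbs`, its hyperparameter defaults, and the toy documents are hypothetical choices for illustration, not STAC's code or data. A workload at 10-K scale would more likely use an optimized library implementation, such as the LDA implementations in Spark MLlib or scikit-learn.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper, illustration only).

    docs: list of documents, each a list of token strings.
    Returns (vocab, topic_word, doc_topic), where topic_word[k][v] and
    doc_topic[d][k] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []  # z[d][i] is the topic currently assigned to token i of document d

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates of the topic-word and document-topic distributions.
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, topic_word, doc_topic


# Hypothetical toy "filings", already tokenized; these are not real 10-K data.
docs = [
    "revenue growth margin revenue sales".split(),
    "risk litigation regulatory risk compliance".split(),
    "revenue sales margin growth".split(),
    "regulatory compliance risk litigation".split(),
]
vocab, topic_word, doc_topic = lda_gibbs(docs, num_topics=2, iterations=100)

# Print the three most probable words for each topic.
for k, dist in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print(k, [vocab[v] for v in top])
```

Each Gibbs sweep touches every token in the corpus, so cost grows with total corpus size. That is one reason a workload like this is a natural candidate for comparing a larger single machine against a cluster.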

STAC used this workload to explore scale-up (one larger instance) versus scale-out (a cluster of smaller instances) in a cloud environment, on three systems under test (SUTs):

  • A single Google Cloud Platform (GCP) n1-standard-16 instance (16 vCPUs) with Skylake processors, running RHEL 7.6
  • A single GCP n1-standard-96 instance (96 vCPUs) with Skylake processors, running RHEL 7.6
  • A Google Cloud Dataproc (managed Spark service) cluster of 13 n1-standard-16 Skylake instances (1 master and 12 worker nodes), running Debian Linux 8

The test design is a proposal to elicit feedback from the STAC AI Group on use cases, benchmark design, and research priorities around ML techniques and technologies.

