STAC’s study on financial NLP a step towards machine learning benchmarks

January 18, 2019

The Securities Technology Analysis Center (STAC) designed a study to illustrate how its benchmarks for machine learning (ML) can be constructed and used. The study is intended to help data scientists and data engineers know what to expect from the data science tools and cloud products used in the project, and how to avoid common pitfalls.

The workload is topic modeling of SEC Form 10-K filings using Latent Dirichlet Allocation (LDA), a form of natural language processing (NLP).
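For readers unfamiliar with LDA, the idea can be sketched with a small collapsed Gibbs sampler. This is only an illustration under stated assumptions: the toy documents, the function name `lda_gibbs`, and the hyperparameter values are hypothetical, and nothing here reflects the implementation, libraries, or settings STAC used in the study.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over pre-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: tokens per (doc, topic), per (topic, word), and per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to
                # (topic share in this doc) * (word share in this topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic mixtures; each row sums to 1.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # The three most frequent words assigned to each topic.
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size),
                                  key=lambda i: -topic_word[k][i])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words

# Hypothetical toy "filings", already tokenized; not data from the study.
docs = [
    ["revenue", "growth", "revenue", "margin"],
    ["litigation", "risk", "regulatory", "risk"],
    ["revenue", "margin", "growth", "growth"],
    ["regulatory", "litigation", "risk", "litigation"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` is one document's estimated mix of topics, and `top_words` lists the words most often assigned to each topic. A workload at the scale of a full 10-K corpus would rely on an optimized, parallel implementation rather than a pure-Python loop like this one.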

The STAC team used this workload to explore the question of scale-up versus scale-out in a cloud environment on three SUTs (Systems Under Test):

A single Google Cloud Platform (GCP) n1-standard-16 instance with Skylake and RHEL 7.6
A single GCP n1-standard-96 instance with Skylake and RHEL 7.6
A Google Cloud Dataproc (Spark as a service) cluster containing 13 x n1-standard-16 Skylake instances (1 master and 12 worker nodes) and Debian Linux 8
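To put the three configurations side by side, the short sketch below tallies the vCPUs available to each SUT. It assumes the usual n1-standard-N convention of N vCPUs per instance, and it assumes that only the 12 worker nodes of the Dataproc cluster do the modeling work; the dictionary keys are labels made up for this illustration. It says nothing about the measured results of the study.

```python
# Assumption: an n1-standard-N instance provides N vCPUs.
VCPUS_N1_STANDARD_16 = 16
VCPUS_N1_STANDARD_96 = 96

# Assumption: in the Dataproc cluster only the 12 workers run the workload;
# the master node is counted separately.
suts = {
    "single n1-standard-16": 1 * VCPUS_N1_STANDARD_16,
    "single n1-standard-96 (scale-up)": 1 * VCPUS_N1_STANDARD_96,
    "Dataproc workers, 12 x n1-standard-16 (scale-out)": 12 * VCPUS_N1_STANDARD_16,
}

for name, vcpus in suts.items():
    print(f"{name}: {vcpus} vCPUs")

# Total for the cluster when the master node is included.
cluster_total_with_master = 13 * VCPUS_N1_STANDARD_16
print(f"Dataproc cluster including master: {cluster_total_with_master} vCPUs")
```

Under these assumptions the scale-out cluster has twice the worker vCPUs of the largest single instance, which is part of what makes the scale-up versus scale-out comparison interesting.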

The test design is a proposal to elicit feedback from the STAC AI Group on use cases, benchmark design, and research priorities around ML techniques and technologies.

Read excerpts from the study
