Galileo announced that it has raised $5.1 million in seed funding. Founded by former Apple, Google and Uber AI engineers, the company offers a machine learning (ML) data intelligence platform for unstructured data that lets data scientists inspect, discover and fix critical ML data errors 10x faster across the entire ML lifecycle, from pre-training to post-training to post-production. The platform is currently in private beta with Fortune 500 companies and startups across multiple industries.
It is common for data scientists to use spreadsheets and Python scripts to inspect and fix their unstructured training data. This ‘data detective work’ consumes more than 50% of a data scientist’s time. It is also ad hoc, manual and error-prone, and it leads to poor data transparency across the organization, causing avoidable mispredictions and biases in production models.
Galileo aims to solve that. With a few lines of code added by the data scientist while training a model, Galileo auto-logs the data, applies statistical algorithms the team has created, and then surfaces the model’s failure points along with actions and integrations to fix them immediately, all within one platform. According to the company, this reduces the time taken to find critical errors in ML data across training and production models from weeks to minutes. Galileo also acts as a collaborative system of record for the data scientist’s training runs, bringing transparency into how specific data and model parameter changes affect overall performance, which is key for ML teams that want to be data-driven.
Bradley Shimmin, chief analyst of AI Platforms, Analytics and Data Management, said in a statement: “When it comes to addressing the complex problem of inspecting and fixing the data, especially for unstructured data, many platforms still presume that enterprise practitioners work with data they already know and trust across the ML lifecycle. This couldn’t be further from the truth and is one of the biggest bottlenecks for ML adoption today.”
More than 80% of the world’s data today is unstructured (text, image, speech, etc.) and has historically been largely untapped for ML. Recent advancements have made it easy for any data scientist to plug and play complex models for unstructured data, leading to a surge in the adoption of such models across industries.
“The motivation for Galileo came from our personal experiences at Apple, Google and Uber AI and from conversations with hundreds of ML teams working with unstructured data where we noticed that, while they have a long list of model-focused MLOps tools to choose from, the biggest bottleneck and time sink for high quality ML is always around fixing the data they work with. This is critical, but prohibitively manual, ad-hoc and slow, leading to poor model predictions and avoidable model biases creeping into production for the business,” said Vikram Chatterji, co-founder and CEO of Galileo, in a statement. “With unstructured data across the enterprise being generated at an unprecedented scale and now rapidly leveraged for ML, we are building Galileo with the goal of being the intelligent data bench for data scientists to systematically and quickly inspect, fix and track their ML data in one place.”
Half of the Galileo team comprises researchers from Apple, Google and Stanford AI who focus on data-centric research that is then built into the Galileo platform for any ML team to use. The other half of the team focuses on building systems that can perform extremely low-latency, in-memory computations on millions of data points using minimal system resources. This combination allows Galileo customers to get quick, intelligent data insights throughout the entire ML workflow.
The Factory led the round, with participation from Anthony Goldbloom (co-founder and CEO at Kaggle) and other angel investors. Company advisers include Amy Chang (Disney, P&G board member) and Pete Warden (one of the creators of TensorFlow). The company plans to use the $5.1 million in seed funding to hire across all departments and accelerate research and development. Its stated goal is to meet industry demand for a purpose-built product that finds and fixes ML data blind spots across the workflow when working with unstructured data.