Model Management & Operations Platform for Production Machine Learning
August 25, 2020
Today, I’m thrilled to announce the formal launch of the Verta Model Management and Operations platform, and Verta’s $10M Series A funding led by Intel Capital. With the Verta Platform, we help data science teams to tame the chaos of brittle and fragmented workflows that abound in machine learning, and to build and deploy intelligent products faster.
The roots of Verta go back to the ModelDB project at MIT CSAIL and my Ph.D. thesis. With a team at MIT, we built the first modern, open-source model management system, which is widely used in research labs, Fortune 500 companies, and startups alike.
As we supported a growing number of users with ModelDB, it became clear that companies faced an even bigger problem when using machine learning as part of their core business: operationalizing cutting-edge AI and ML models to incorporate them into products and services. Data science teams were ill-equipped to take models from research to production, and productionizing a single new model routinely took many months, significantly slowing the time to value when using AI techniques.
Solving this central bottleneck of operationalizing AI and ML models has been our charter at Verta.
Why is operationalizing models so challenging? In other words, why all the excitement around #MLOps?
Five years ago, the hurdle in AI and ML adoption was the lack of easy-to-use model-building frameworks. Fast-forward to today, and the widespread availability of pre-trained networks and flexible model development frameworks means that a model can be built with just a few lines of code. The bottleneck in AI adoption has instead moved further downstream to making models operational, i.e., incorporating the products of data science into end-user-facing products and services.
This process typically involves multiple steps:
- Validating model results using industry-standard methods (such as back-testing in finance),
- Developing scaffolding code to transform the model from a simple set of numbers into a runnable format that can be consumed by other software components,
- Optimizing the model and scaffold performance to sustain the expected request load,
- Creating pipelines to send appropriately formatted live data to the model,
- Instrumenting models to observe them in real-time and take preventive actions, and
- Assigning engineering headcount to keep the model running within the relevant Service Level Agreements (SLAs).
Now imagine repeating this process every time a new model or a new version of an existing model is to be rolled out.
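To make the scaffolding step above concrete, here is a minimal Python sketch. It assumes a tiny logistic regression model whose trained parameters are just two weights and a bias; the values and the names `WEIGHTS`, `BIAS`, `predict`, and `handle_request` are hypothetical and chosen only for illustration. This is not Verta's API. It only shows the kind of wrapper that turns a set of numbers into something other software can call.

```python
import json
import math
from typing import List

# Hypothetical trained parameters: the "simple set of numbers" a training run produces.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict(features: List[float]) -> float:
    """Score one example with a logistic regression model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body: str) -> str:
    """Parse a JSON request, run the model, and return a JSON response."""
    try:
        payload = json.loads(body)
        score = predict([float(x) for x in payload["features"]])
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing "features" key, or a wrong feature count
        # all become an error payload instead of crashing the caller.
        return json.dumps({"error": str(exc)})
    return json.dumps({"score": score})


print(handle_request('{"features": [1.0, 2.0]}'))
```

Even this toy wrapper needs input validation and error handling, and it does not yet cover performance tuning, data pipelines, or monitoring from the list above.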
Many of the above challenges are similar to those faced in software development and delivery and are often covered under the banner of DevOps. However, four characteristics of AI and ML make DevOps tools and practices inadequate for operationalizing ML:
Data scientists are not software engineers.
Data scientists have a fantastically unique set of skills -- they work with data, run large numbers of experiments on state-of-the-art models, and use domain knowledge to build the best models. However, once a model has been built, as described above, a wholly different set of steps outside the purview of data science must be implemented. Expecting data scientists to also be experts at software development and DevOps in order to operationalize models is wasteful and takes them away from their core competency of building great models.
Heterogeneity is great for innovation, not for production.
Another challenge with operationalizing ML models comes from the heterogeneity of workflows used to train and deploy models. For example, a model may be developed in one of many languages, including Python, R, and Scala; different libraries such as scikit-learn, TensorFlow, and PyTorch may be used to build the model; training pipelines may be executed in a variety of systems such as Airflow, Apache Spark, and Kubeflow; and depending on the use case, different deployment architectures (batch vs. real-time inference, containerized vs. serverless systems) may be required. Due to the combinatorial explosion of possible ML workflows, it is challenging not only to maintain a high velocity of model deployment but also to do so while complying with existing IT and infrastructure restrictions.
Robust infrastructure and operational expertise are missing.
Given the relative recency of ML being used in production systems, fundamental abstractions for developing and deploying models are missing. For instance, while software development processes are built on reliable versioning systems like Git, data science teams still don’t have a way to version their models or reproduce them -- a problem we set out to solve with ModelDB. In addition, very few teams have had experience maintaining production ML systems. As a result, production ML systems often break in unexpected ways and it can take weeks to make them functional again.
With many production models come challenges of scale.
Organizations with hundreds of models (e.g., banks and insurance companies) face a unique type of challenge. Because of the heterogeneity in ML workflows and the siloed nature of large enterprises, there is currently no way to keep track of all model assets across the organization. For example, it is nearly impossible to answer questions such as what models exist across the organization, where a given model is being used, who is using that model, and whether that model is performing as expected. This lack of visibility not only leads to duplicated effort in data science but also leaves the organization open to liabilities arising from unintended model use.
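As a rough illustration of the visibility questions above, the following Python sketch keeps a small in-memory catalog of models. Every name in it (`ModelRecord`, `ModelRegistry`, the team and service names) is hypothetical, and it is not how Verta or ModelDB is implemented. It only shows what a central record of model assets lets an organization ask.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelRecord:
    """One deployed model version and who depends on it (all fields hypothetical)."""
    name: str
    version: str
    owner: str
    consumers: List[str] = field(default_factory=list)


class ModelRegistry:
    """In-memory catalog keyed by (name, version)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        key = (record.name, record.version)
        if key in self._records:
            raise ValueError(f"{record.name} {record.version} is already registered")
        self._records[key] = record

    def list_models(self) -> List[str]:
        """What models exist across the organization?"""
        return sorted({name for name, _ in self._records})

    def consumers_of(self, name: str) -> List[str]:
        """Where is a given model being used, across all of its versions?"""
        found = set()
        for (model_name, _), record in self._records.items():
            if model_name == name:
                found.update(record.consumers)
        return sorted(found)


registry = ModelRegistry()
registry.register(ModelRecord("churn", "1.0", "growth-team", ["crm-dashboard"]))
registry.register(ModelRecord("churn", "1.1", "growth-team", ["crm-dashboard", "email-service"]))
registry.register(ModelRecord("fraud", "2.0", "risk-team", ["payments-api"]))

print(registry.list_models())
print(registry.consumers_of("churn"))
```

In a siloed enterprise, the hard part is not the data structure but getting every team's models into one such catalog in the first place.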
Introducing the Verta Platform
Given the unique constraints of data science and machine learning workflows described above, we have built the Verta platform around the following key design principles.
Let data scientists be data scientists.
On the Verta platform, data scientists can continue to build the best models using tools of their choice without having to learn the intricacies of Docker, Kubernetes, or monitoring software. Once a data science model is registered with Verta, the platform will package, optimize, and run this model without any additional intervention from the data scientist.
With Verta’s mission to bring order to fragmented ML workflows, Verta enables data science teams to continue leveraging parts of the ML workflow that work well for them while integrating into Verta to fill gaps in their infrastructure. So a data science team can use Kubeflow to train a model, AWS SageMaker to deploy it, and then Verta to monitor it -- it’s that easy!
Build for scale from Day 1.
In contrast to the frequently brittle ML pipelines that result from one-off model deployment efforts, we built Verta to be scalable and production-ready from Day 1: any model operationalized via Verta is robust, ready for high performance, and can be tightly integrated into the rest of the software or IT infrastructure. Verta works for you for Production Model #1 and continues to scale and grow all the way to Production Model #10,000.
Provide a single point of synchronization.
To address the unique challenges that arise when managing large numbers of models, Verta serves as the single, central source of truth for models across the stages of their lifecycle, including development, validation, deployment, and monitoring. No matter how a model was built or where it has been deployed, Verta provides a single point of governance for all model assets across an organization.
Read more about how the Verta platform embodies these design principles in this blog post by Verta CTO Conrado Miranda.
Over the last year, we have been heads-down developing and battle-testing the Verta platform. We have been privileged to work with several top customers, including one of the world’s leading workplace collaboration companies and several other AI-forward enterprises, demonstrating outcomes including a 10X increase in the speed of new model deployment.
None of this would be possible without our fantastic team. With Conrado Miranda, who brings years of experience in building and running production ML systems at Twitter and Nvidia, we’ve been able to bring together an extraordinarily customer-centric team.
In addition, we are privileged to have a forward-thinking set of investors, including Intel Capital, which led our Series A; seed investors General Catalyst, Village Global, and Unusual Ventures; and new partners including Sweat Equity Ventures. The $10M Series A funding will enable us to support even more enterprises in operationalizing AI and ML and to continue to build an exceptional team.
If you’re part of a data science team looking for a way to bring more models into production and organize your ML efforts, sign up for our Platform trial!
Likewise, if solving these problems with production machine learning is interesting to you, check out open positions at Verta.
Join us. We are just getting started!
Manasi Vartak, CEO & Founder, Verta