Featureform: The ML Feature Store

28 August 2021

James Alcorn

Principal

As artificial intelligence moves out of the lab and into production, the infrastructure, tooling, and workflows used to deploy and manage machine learning systems (a category often referred to as MLOps) have become one of the most exciting frontiers of innovation. MLOps exists because the iterative, data-intensive nature of machine learning demands an alternative to the traditional DevOps tools built in the SaaS era. Machine learning practitioners need a modern, deliberate, and industry-hardened workflow.

Many world-class teams have emerged to form this new MLOps category. Some, like Zetta portfolio company Domino Data Lab, are building an end-to-end workbench for data scientists. Others are building tools that address a specific pain point along the machine learning pipeline: DVC in data versioning, Weights & Biases in experiment tracking, Kubeflow in ML workflow orchestration, and BentoML in model serving.

Of all the problems to solve in MLOps, operationalizing machine learning features is one of the most acute. Practically speaking, features are the data streams that feed a predictive model, and a feature store is the software used to manage, transform, and serve them to that model. Data, of course, is the primitive element in AI, and the feature store layer is such a crucial foundation that large technology companies like LinkedIn, Uber, and Airbnb have each devoted meaningful resources to building and maintaining in-house feature stores for their sprawling data science teams. These feature stores are almost always built as part of a larger, end-to-end ML platform, like Uber’s Michelangelo.

Machine learning is rapidly diffusing outside of hyper-scale consumer technology companies, both into other types of technology companies and across global economic sectors like financial services, healthcare, and real estate. Demand for feature stores has grown in lockstep, yet there are few workable solutions for organizations without near-unlimited data science and machine learning resources. Designing a feature store for this large and underserved market is a major challenge in MLOps that has yet to be solved.

Enter Simba Khadder. We first met Simba as the CEO of Triton, an analytics company building recommender systems for subscription businesses, including some of the nation’s largest publishers like The Wall Street Journal. Like other applied AI companies, Simba and his team at Triton found that their machine-learned content recommendations improved dramatically when they incorporated a richer set of features into their models.

At Triton’s peak, Simba and his team processed the behavioral data of over 100 million monthly active users (MAUs). To perform at this scale, and to operationalize an ever-expanding feature set, Triton was forced to build several foundational MLOps technologies that simply weren’t available in the marketplace at the time, including a production-grade feature store.

Simba quickly realized the potential impact his new feature store product could have, and started a company to amplify its reach. We see the same opportunity that he and his team do, and are privileged to support Simba’s new company, Featureform, through its next phase of growth.

Simba’s real-world experience building ML systems for a diverse set of companies led Featureform to the distinctive product it offers today: a virtual feature store. Featureform won’t ask you to rewrite thousands of features already in production or retool your existing data infrastructure. Instead, it sits above the infrastructure layer and gives data scientists a standardized way to define, manage, and share features. Adhering to a set of user-centric design principles has allowed Featureform to begin serving the needs of the fastest-growing enterprise customers in the market.

Simba and his team have established themselves as an important and rising voice within the global MLOps community, and they’ve assembled a small group of seasoned practitioners who built and scaled some of the most important technology companies of the last decade.

We’re thrilled to share that Featureform has launched from stealth and announced the open source release of their first product: the Featureform Embedding Store. We couldn’t be more excited to support Simba, Shab, and the rest of the team on their mission to standardize and accelerate the machine learning process. If you’re a little intrigued, check out the embedding store and give us some feedback. If you’re a lot intrigued, Featureform is hiring.