Chapter 9: Feature Engineering, Metadata Management, and Knowledge Graphs
Synopsis
Feature Store Concepts and Architectures
Introduces online vs. offline feature stores, data freshness requirements, and read/write APIs for model serving.
Example: Feast deployed on Kubernetes, backed by Redis for online features and BigQuery for offline features.
Case Study: A ride-hailing company reduced feature retrieval latency by 70% through a unified feature store across clouds.
What & Why
A feature store is a centralized system to ingest, store, and serve machine-learning features consistently for both training and inference. It eliminates the “training–serving skew” by ensuring the same feature definitions and transformations apply online and offline. Without a feature store, teams reinvent feature pipelines per project, leading to inconsistent data, duplicated effort, and slower time to market.
How & Where
Feature stores consist of two layers:

- Offline Store: A data warehouse or lake (e.g., BigQuery, Snowflake) where historical feature values live for model training.
- Online Store: A low-latency key–value store (e.g., Redis, Cassandra) powering real-time inference.

Data engineers register feature definitions (SQL queries, transformation code) in the feature store's metadata catalog. At ingestion, a batch pipeline computes features into the offline store, while streaming jobs update the online store with point-in-time freshness.
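The point-in-time requirement above can be sketched independently of any particular feature store: for each training row, the offline store must return the latest feature value observed at or before that row's timestamp, never a later one, or future data would leak into training. A minimal sketch in plain Python; the helper name and the sample history are illustrative, not any library's API:

```python
from bisect import bisect_right

def point_in_time_value(history, ts):
    """Return the latest feature value recorded at or before ts.

    history: list of (timestamp, value) pairs sorted by timestamp.
    Returns None if no value existed yet at ts, which prevents
    leaking future data into a training row.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, ts)          # first entry strictly after ts
    return history[i - 1][1] if i else None

# Daily snapshots of a hypothetical "rides_last_7d" feature.
history = [(1, 10), (3, 12), (7, 15)]

print(point_in_time_value(history, 5))   # 12: value as of day 3
print(point_in_time_value(history, 0))   # None: feature not yet computed
```

Production offline stores implement the same rule as a point-in-time join across many entities and features; the binary search above is the per-entity core of that join.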
| Characteristic  | Detail                                        |
| --------------- | --------------------------------------------- |
| Consistency     | Single source of feature definitions          |
| Scalability     | Handles billions of feature reads per second  |
| Low Latency     | Online stores serve in under 10 ms            |
| Discoverability | Central catalog for data scientists           |
Real-Life Example
A ride-sharing app uses Feast as its feature store. Trip history and user behavior features are computed daily in BigQuery (offline). At request time, Redis (online) serves the latest “rides_last_7d” and “avg_fare” features to the pricing microservice within 5 ms.
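The serving path in this example can be sketched with an in-memory dict standing in for Redis. The feature names come from the example above; the entity key, base fare, and pricing rule are illustrative assumptions, not the app's actual logic:

```python
# In-memory stand-in for the Redis online store: entity key -> feature map.
online_store = {
    "user:42": {"rides_last_7d": 9, "avg_fare": 14.5},
}

def get_online_features(entity_key, feature_names):
    """Low-latency point lookup, as the pricing microservice would issue."""
    row = online_store.get(entity_key, {})
    return {name: row.get(name) for name in feature_names}

def quote_price(entity_key, base_fare=10.0):
    """Toy pricing rule: discount frequent riders (illustrative only)."""
    f = get_online_features(entity_key, ["rides_last_7d", "avg_fare"])
    discount = 0.05 if (f["rides_last_7d"] or 0) >= 7 else 0.0
    return round(base_fare * (1 - discount), 2)

print(get_online_features("user:42", ["rides_last_7d", "avg_fare"]))
print(quote_price("user:42"))  # 9.5: 5% frequent-rider discount applied
```

The whole request stays a single key lookup plus arithmetic, which is why an online store backed by Redis can answer within the 5 ms budget cited above.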
Future Scope & Need
As organizations scale AI, feature sprawl threatens maintainability. Future feature stores will integrate AI-driven auto-discovery, suggesting new features from data patterns, and support federated architectures in which domains expose feature APIs across clouds without centralizing raw data. The need is clear: to accelerate MLOps cycles, reduce duplication, and guarantee production-quality features at scale.
