Notes on recommendations, search, relevance, and building AI products.
- HNSW Explained: The Algorithm Powering Fast Vector Search Hierarchical Navigable Small World (HNSW) is the approximate nearest neighbor algorithm that powers fast similarity search in most production vector databases. This post explains how HNSW works, what each parameter controls, where the algorithm breaks down, and how it fits
- Why grep Is Beating Your Vector DB Keyword retrieval keeps winning in production for reasons that have little to do with benchmark leaderboard scores. This post explains when grep beats vectors, and why.
- Modern Ranking Architectures, Part 5: The Feedback Loop Welcome to the final post in our series on the anatomy of modern recommender systems. Over the last four parts, we've deconstructed the online request path, following a user's request from the initial billions of items all the way to a final, ranked page.
- Beyond the Hashing Trick: The Math of Scaling to 100M+ IDs in Production. If you follow machine learning today, you’ve been told that tokenization is a solved problem. In the world of Natural Language Processing (NLP), we have Byte Pair Encoding (BPE) or WordPiece. These algorithms compress the infinite complexity of human language into a neat,
- Building the Relevance Layer for the AI World Why retrieval and relevance are becoming the most important infrastructure in modern AI products, and what we're building at Shaped to solve it.
- Why I Built a Database for Relevance After five years at Meta and three years building Shaped, I think relevance infrastructure should work like a database: declarative, composable, and fast enough for humans and agents.
- Modeling Behavior As Language: The Next Era of Recommendations A major shift is underway in recommender systems, moving from traditional Two-Tower and DLRM models to a new paradigm that treats user behavior as a language. This approach models a user's sequence of interactions, such as clicks and purchases, allowing Transformer-based
- Scaling Laws Beyond LLMs: The Future of Search and Recommendations When people talk about scaling laws in AI, they usually mean one thing: language models. The empirical laws first quantified in Kaplan et al. (2020) showed that loss scales predictably as a power law with model size, dataset size, and compute budget. Train a bigger
- Ranking Infrastructure, Part 1: The Serving Layer Welcome to a new, hands-on series for builders. In our previous series, "Anatomy of Modern Ranking Architectures," we deconstructed the conceptual blueprint of the multi-stage ranking architecture. We followed the logic of a request from retrieval to scoring to the final
- Ranking Infrastructure, Part 2: The Data Layer Welcome back to our series on the infrastructure of modern ranking systems. In Part 1, we designed the online serving layer: a set of decoupled, scalable microservices orchestrated by Kubernetes to handle real-time requests. We built the engine of our ranking system.
- Ranking Infrastructure, Part 3: The MLOps Backbone Welcome to the final post in our series on the infrastructure of modern ranking systems. So far, we've designed our high-performance online services and fueled them with a specialized data layer:
- The Anatomy of Modern Ranking Architectures, Part 1 If you look under the hood of recommendation systems at Netflix, YouTube, or Amazon, you won't find identical models, but you will find a remarkably similar architectural blueprint. This multi-stage ranking system is the industry's shared solution to a fundamental
- Modern Ranking Architectures, Part 2: Retrieval Welcome back to our series on the anatomy of modern recommender systems. In our first post, we established the multi-stage architecture as the industry-standard blueprint for balancing relevance, latency, and cost. We framed it as a system of cascading approximations,
- Modern Ranking Architectures, Part 3: Scoring Welcome back to our series on the anatomy of modern recommender systems. In Part 1, we introduced the multi-stage architecture as a blueprint for balancing relevance, latency, and cost. In Part 2, we explored the Retrieval Stage, where we used an ensemble of strategies to
- Modern Ranking Architectures, Part 4: Ordering Welcome back to our series on the anatomy of modern recommender systems. So far, we've deconstructed the core machine learning pipeline.
- YouTube gets ~5% CTR lift on Shorts by replacing embedding tables with Semantic IDs TL;DR: The shift from massive embedding tables to generative retrieval with Semantic IDs is accelerating. YouTube's new PLUM framework represents the next evolution, using an adapted LLM and enhanced 'SID-v2' to achieve a +4.96% Panel CTR lift for Shorts in live A/B tests.
- Building a HackerNews "For You" Feed TL;DR: The HackerNews top feed felt stale, so I built a personalized For You feed in a weekend using Lovable and Shaped. See it at hn.shaped.ai.
- The Vector Bottleneck in Embedding-Based Retrieval DeepMind’s latest paper formalizes a long-suspected limitation of embedding-based retrieval: single-vector models cannot scale to combinatorial query complexity, no matter how large the dimension. The result reframes hybrid and multi-vector approaches, not as patches, but as
- Dual-Flow Generative Ranking Networks TL;DR: Meta's generative recommender (MetaGR) is powerful but slow. Researchers from Meituan and top universities just dropped DFGR, a dual-stream architecture that's 2x faster at training and 4x faster at inference, while also beating MetaGR and heavily-engineered industrial
- Closing the Research-to-Production Gap in Recommendations If you’ve ever tried to take a promising machine learning experiment from an offline notebook to a live A/B test, you know the pain. Weeks, sometimes months, pass between proving an idea works and actually seeing it in front of users. Internal handoffs, infrastructure gaps,
- RentTheRunway Dataset: Deep Dive into Fashion Fit, Context, and Recommendation Challenges Online fashion retail faces unique challenges, moving beyond simple preference prediction. Accurately recommending clothing requires understanding complex factors like fit, body type, and the context of use. The RentTheRunway (RTR) dataset emerges as a crucial and fascinating
- Where Matters: Location Feature Engineering for Search & Recs Location is more than just coordinates; it’s a powerful signal for making search and recommendation systems more relevant. This post explores how proximity, regional preferences, delivery constraints, and geo-targeting can all be encoded into machine learning models through
- LambdaMART Explained: The Workhorse of Learning-to-Rank LambdaMART is one of the most widely used algorithms in Learning-to-Rank, powering the ranking logic behind search engines, recommendation systems, and e-commerce platforms. By combining gradient boosting trees (MART) with metric-aware optimization from LambdaRank, it
- Average Popularity: Are Your Recommendations Just Chasing Trends? Relevance metrics like NDCG and Precision@K are crucial for evaluating recommendation systems, but they don’t tell the full story. Two systems can perform similarly on these scores while exhibiting drastically different behaviors, one favoring only popular hits, the other
- Decoding Timestamps: Time-Based Feature Engineering for Search & Recs Timestamps hold far more value than just marking when an event occurred; they encode powerful signals like recency, seasonality, user lifecycle, and content freshness that can significantly boost the performance of recommendation and search systems. But unlocking their
- GoodReads Datasets: Powering Book Recommendations and Research The GoodReads datasets are a foundational resource for building and evaluating book recommendation systems. They combine explicit ratings, implicit feedback (like user shelves), rich textual reviews, and detailed metadata, making them ideal for hybrid models that mix
- Peering Inside the Black Box: Leveraging User & Item Embeddings Learn how user and item embeddings power personalized recommendations, similarity search, analytics, churn prediction, and custom ML models.
- DLRM-Style Feature Interactions for Ranking Deep Learning Recommendation Models (DLRMs) like Wide & Deep, DeepFM, DCN, and MaskNet have become essential tools for pointwise ranking in recommendation systems, where the goal is to predict the likelihood of user-item interactions such as clicks or conversions. These
- Categorical Features: The Backbone of Search & Recs Engineering Categorical features like category, brand, and user ID are essential to search and recommendation systems, but transforming them into meaningful signals for machine learning is often more complex than it appears. This post explains how to handle categorical data effectively,
- Gowalla Dataset: Understanding Location Check-ins, Social Ties, and Mobility Patterns The Gowalla dataset, a historical benchmark from the now-defunct location-based social network, offers rich check-in and social graph data that has powered foundational research in Point-of-Interest (POI) recommendations, human mobility modeling, and social influence on
- Catalog Coverage: Are Your Recommendations Exploring Your Whole Inventory? While traditional recommendation metrics focus on individual user experience, Catalog Coverage measures the breadth of a system’s recommendations across its entire inventory, i.e., how much of the catalog gets shown to anyone at all. It’s a valuable diagnostic for spotting
- Content-Based Filtering Explained: Recommending Based on What You Like Content-Based Filtering (CBF) is one of the fundamental approaches to building recommendation systems. Rather than relying on the preferences of similar users, CBF focuses on the characteristics of the items a user has engaged with to suggest others with similar attributes,
- MovieLens Dataset: The Essential Benchmark for Recommender Systems The MovieLens dataset is one of the most widely used benchmarks in recommender systems, offering real-world, explicit feedback data for evaluating collaborative filtering, content-based, and hybrid recommendation models. This article explores why MovieLens remains a gold
- From Zero to Relevant: Solving the Cold Start User Problem New or anonymous users often face irrelevant, generic content, hurting engagement from the very first visit. This article explores the cold start user problem in personalization and search systems, outlining common strategies like global popularity lists, rule-based segments,
- Last.fm Datasets: Unlocking Music Recommendations Through Listening History and Social Connections The article explores the significance of Last.fm datasets in developing music recommendation systems, highlighting their value as benchmarks for modeling implicit feedback, sequential listening behavior, and social influence. It breaks down what’s included in these datasets
- Privacy-First Personalization: The 7-Step Framework for Building Trust and Driving Growth This post introduces a step-by-step framework for building privacy-first personalization systems that earn user trust and support sustainable growth. It covers key strategies like data minimization, user control, edge processing, and satisfaction-based metrics—along with how
- See the Bigger Picture: Image Feature Engineering for Search & Recs In visually rich digital environments, text and tags alone often fall short in powering relevant search and recommendations. This article explores how visual feature engineering, extracting embeddings from images using models like CLIP or ViT, unlocks deeper relevance by
- How YouTube’s Algorithm Works: A Guide to Recommendations YouTube’s recommendation engine combines large-scale data processing, real-time feedback loops, and multi-objective optimization to deliver highly personalized video suggestions that prioritize both engagement and satisfaction. This post breaks down how the system works, from
- MRR: How Quickly Do Users Find the First Relevant Item? Mean Reciprocal Rank (MRR) is a metric that captures how quickly a user finds the first relevant item in a ranked list, making it especially valuable for tasks like known-item search or question answering where just one good result matters. This article introduces the concept
- Mastering Cold Start Challenges: Top Strategies for Personalized AI Experiences Cold start challenges can derail personalization efforts by making it difficult to deliver relevant experiences for new users, items, or markets. This post explores proven strategies and modern system architectures — including modular, AI-native platforms like Shaped — that
- Explainable Personalization: A Practical Guide for Building Trust and Transparency This post examines how to develop explainable personalization systems that build user trust, improve internal visibility, and foster long-term engagement. It covers the key components of explainability, including transparent logic, user feedback, and internal observability,
- Matrix Factorization: The Bedrock of Collaborative Filtering Recommendations Matrix Factorization (MF) has long been a foundational technique in collaborative filtering for recommendation systems. It works by learning latent factors that represent hidden preferences of users and characteristics of items, allowing it to predict unknown interactions.
- Modular AI: Building Composable Personalization Stacks This post explores how modular AI infrastructure enables faster, more flexible, and more scalable personalization systems. It outlines the key components of a composable stack, like data ingestion, candidate generation, ranking, and feedback, and offers design principles to
- NDCG and Graded Relevance in Ranking How do you know if your ranking model is getting the order right, not just retrieving the right items? This post introduces NDCG, a powerful metric that accounts for both how relevant each item is and where it appears in the ranked list, enabling a more nuanced evaluation of
- 10 Best Practices in Data Ingestion: A Scalable Framework for Real-Time, Reliable Pipelines This post outlines 10 best practices for designing robust, scalable data ingestion pipelines that support real-time analytics, personalization, and machine learning. It covers essential topics like choosing the right ingestion pattern, enforcing data contracts, handling
- Unlock Text Data: NLP Feature Engineering for Search & Recs Keyword matching and interaction history aren’t enough for modern relevance. Language data, like product descriptions, search queries, and user reviews, holds rich signals that drive deeper personalization. But turning text into model-ready features requires complex NLP
- Monolithic vs Modular AI Architecture: Key Trade-Offs This blog post explores the differences between monolithic and modular AI-native architectures, helping businesses choose the best approach for their AI personalization systems. It explains the fundamental distinction: monolithic architectures integrate all AI components into
- How to Unify Data Ecosystems for Seamless Personalization This blog post addresses the challenge of fragmented data ecosystems, which hinders companies' ability to provide effective personalization. It presents a 6-step framework for unifying data across systems, enabling seamless, AI-driven customer experiences. The steps include
- AI-Powered Recommendation Engines: A Complete Guide This article explores how AI-powered recommendation systems are transforming digital experiences across e-commerce, music, and marketplaces.
- H&M Dataset: Powering Personalized Fashion Recommendations at Scale The H&M Personalized Fashion Recommendations dataset is a favorite in the ML community for testing large-scale, real-world recommendation systems. With millions of transactions and rich metadata, it offers a challenging benchmark for building personalized fashion experiences.
- Customer Data Platform Essentials: Unlocking Real-Time Personalization with First-Party Data This article explores how effective personalization relies on collecting, unifying, and analyzing first-party data through tools like Customer Data Platforms (CDPs). It highlights the role of data mining, real-time ingestion, and machine learning in transforming raw data—from
- Advancements in Feed Ranking Systems: A Deep Dive into Large-Scale Models This article explores how LinkedIn’s large-scale ranking framework, LiRank, integrates deep learning and large language models (LLMs) to power personalized content across feeds, job recommendations, and ads. It details core innovations such as Residual DCN, isotonic
- Beyond A/B Testing: A Practical Guide to Multi-Armed Bandits This article unpacks how multi-armed bandits offer a smarter alternative to A/B testing for real-time personalization. By dynamically balancing exploration and exploitation, bandit algorithms adapt to user behavior on the fly—delivering more relevant content, faster.
- A Comprehensive Guide to Approximate Nearest Neighbors Algorithms This article explores the role of approximate nearest neighbor (ANN) search in scaling personalization and similarity search across large, high-dimensional datasets. It contrasts ANN with exact search, highlighting its speed-accuracy trade-offs and practical relevance for
- How Does Temu Work? Understanding Its Personalization Strategy This article examines how Temu became one of the fastest-growing e-commerce platforms by using AI to fuel engagement across the user journey. It explores how real-time deep learning models, gamification, and multi-objective optimization drive personalization, session depth,
- Enhance Your AI with Real-Time Data Using RAG This blog post explores the challenges of real-time personalization in AI, such as high computational costs, slow experimentation, and the cold start problem. It introduces retrieval-augmented generation (RAG) as a solution, highlighting how it combines generative AI with
- How Amazon Masterminds Real-Time Product Discovery Beyond Search This article examines how Amazon leads in real-time product discovery by guiding users beyond search through personalized, AI-driven experiences. Using a blend of collaborative filtering, content-based filtering, and reinforcement learning, Amazon tailors recommendations
- Measuring Recommendation Performance: Relevancy, Precision, and Recall This article explains how precision, recall, and relevancy serve as core metrics for evaluating and optimizing recommendation systems. Precision measures how many recommended items are truly relevant, while recall captures how many relevant items are successfully
- Boosting Revenue with AI-Powered Cross-Selling Recommendations This article explores how AI is transforming cross-selling from a static, rules-based tactic into a dynamic personalization engine that adapts in real time. It contrasts traditional methods with AI-driven systems that detect subtle product relationships, adjust suggestions
- A/B Testing Rankings: Metrics That Matter You’ve trained a model, optimized offline metrics, and picked a winner, but how do you know it’ll perform with real users? In this post, we explore why A/B testing is essential for validating personalization and ranking models in production. We cover key online metrics like
- The Power of Deep Learning for Hyper-Personalized Recommendations This blog explores how deep learning is revolutionizing personalized recommendations by enabling real-time, context-aware experiences for users. Traditional recommendation systems, such as collaborative filtering and content-based models, struggle with static data, cold start
- Golden Tests in AI: Ensuring Reliability Without Slowing Innovation This article introduces golden tests as a practical method for detecting regressions in AI systems—especially real-time recommendation engines—by comparing current model outputs against a saved “golden” baseline. Unlike traditional tests, golden tests capture subtle changes
- Bridging Worlds: Training Language Models on User Behavior for Smarter Recommendations Traditional recommendation models face a tradeoff: language models excel at understanding item semantics, while collaborative filtering shines at capturing behavioral patterns. But what if you could combine both? In this post, we explore a new generation of hybrid techniques,
- Evaluation Metrics for Search and Recommendation Systems This article explores key metrics used to evaluate search and recommendation systems, from precision and recall to NDCG and diversity. It explains how offline and online evaluations work together to assess performance, and highlights challenges like data sparsity and feedback
- The Ultimate Guide to Modern Ranking Models This article offers a comprehensive guide to ranking models — algorithms that power personalized search, product recommendations, and content discovery. It breaks down the components of modern ranking systems, including retrieval, scoring, and ordering, and explains key
- Collaborative Filtering Explained This article explores collaborative filtering, a foundational technique behind personalized recommendations on platforms like Netflix and Amazon. It explains how user-based and item-based filtering work, compares memory-based and model-based approaches, and highlights
- Vector Search Explained: How AI Powers Smarter Search and Recommendations This blog post explains how vector search is transforming search and recommendation systems by focusing on the meaning behind data, not just matching keywords.
- Tweedie Regression for Video Watch-Time Prediction (Tubi Case Study) TL;DR: Tubi boosted VOD revenue (+0.4%) and watch time (+0.15%) by ditching weighted LogLoss for CTR and instead using Tweedie Regression to directly predict user watch time. Their paper shows Tweedie loss better models the zero-inflated, skewed nature of watch time data,
- Wayfair & Pinterest: Leveraging Visual Data and User Behavior for Personalized Discovery This blog post explores how leading companies like Wayfair and Pinterest use visual data and user behavior to create personalized discovery experiences. It highlights the growing role of visual data in enhancing personalization, moving beyond traditional text-based methods.
- Netflix Personalization Workshop 2025: Key Insights The Shaped team was thrilled to be at the 2025 Netflix Personalization, Recommendations & Search workshop last week! This event, first held by Netflix in 2016, is one of our highlights on the AI recommendation & search calendar. The day was packed with insightful talks from
- Two-Tower Models for Recommendation Systems The Two-Tower model is a foundational architecture for large-scale recommendation systems, built to efficiently retrieve relevant items from massive catalogs. By learning separate embeddings for users and items, it enables fast candidate generation via approximate nearest
- Criteo Dataset: Tackling Large-Scale Click-Through Rate Prediction Click-through rate (CTR) prediction is central to modern advertising and recommendation systems, and the Criteo dataset has become the de facto benchmark for advancing this task at industrial scale. With hundreds of millions to billions of rows and a blend of dense numerical
- Sequential Models for Recommendations (SASRec, BERT4Rec, and Beyond) In a world where user behavior changes by the minute, traditional recommendation systems fall short. Sequential recommendation models offer a powerful upgrade, capturing evolving intent by analyzing the order of interactions. This article breaks down the evolution of these
- How to Build a Killer 'For You' Feed The “For You” feed has become the gold standard of personalized digital experiences—but behind the magic lies serious technical complexity. From wrangling massive datasets to training cutting-edge ML models and serving results in real time, building a high-quality feed from
- Beyond Retrieval: Optimizing Relevance with Reranking Retrieving a strong list of candidate items is just the first step—the real challenge is ranking them in the most relevant, personalized order for each user and goal. This post explores how reranking transforms basic search results or recommendations into truly optimized
- Precision@K for Ranking Systems Is your recommender system truly hitting the mark? Imagine a user binging blockbusters like Avengers and Top Gun—will they click on Love Actually or John Wick next? This article breaks down Precision@K, the go-to metric for judging how many of your top K recommendations are
- Cross-Encoder Rediscovers a Semantic Variant of BM25 This article explores how cross-encoders, long praised for their performance in neural ranking, may in fact be reimplementing classic information retrieval logic, specifically, a semantic variant of BM25. Through mechanistic interpretability techniques, the authors uncover
- One Embedding to Rule Them All Pinterest’s OmniSearchSage represents a major step forward in unified semantic search. By extending the two-tower model into a multi-task, multi-entity framework, it enables a single query embedding to power retrieval across pins, products, and related queries. The system
- Beyond Relevance: Optimizing for Multiple Objectives in Search and Recommendations Building effective recommendation and search systems means going beyond simply predicting relevance. Modern users expect personalized experiences that cater to a wide range of needs and preferences, and businesses need systems that align with their overarching goals. This
- Beyond Dot Products: Retrieval with Learned Similarities The world of vector databases is exploding. Driven by the rise of large language models and the increasing need for semantic search, efficient retrieval of information from massive datasets has become paramount. Approximate Nearest Neighbor (ANN) search, often using dot
- Is This the ChatGPT Moment for Recommendation Systems? Researchers at Meta recently published a ground-breaking paper that combines the technology behind ChatGPT with Recommender Systems. They show they can scale these models up to 1.5 trillion parameters and demonstrate a 12.4% increase in topline metrics in production A/B
- Evaluating Recommendations: mAP, MMR, and NDCG Imagine you’re shown two ordered feeds of product recommendations from separate algorithms. In the first one (A) you’re shown: Nike sneakers, Adidas shorts, and an Apple Watch. In the second one (B) you’re shown the order: Apple Watch, Adidas shorts, and Nike sneakers.
- Evaluating Recommendations: Precision, Recall, and R-Precision Imagine you’re given three movie recommendations from separate algorithms. In the first one (A) you’re given: The Terminator, James Bond, and Star Wars. In the second (B) you’re given: Cars, Toy Story, and Iron Man
- Day 2 of #RecSys2022: Our favorite 5 papers and talks It’s been another fantastic day at RecSys 2022. Following the Women in RecSys Breakfast, the day started with a keynote from Catherine D’Ignazio and then throughout the day had the following sessions: Fairness & Privacy, Diversity & Novelty, and Models and Learning I. Here are
- Data-Centric AI for Ranking Data quality and volume are what make ranking algorithms at big tech so seamless. How can you create the same experiences with the data you have? Data-centric AI may be the answer!