Tullie Murrell | Writing

Notes on recommendations, search, relevance, and building AI products.

Apr 30, 2026 HNSW Explained: The Algorithm Powering Fast Vector Search Hierarchical Navigable Small World (HNSW) is the approximate nearest neighbor algorithm that powers fast similarity search in most production vector databases. This post explains how HNSW works, what each parameter controls, where the algorithm breaks down, and how it fits
Apr 23, 2026 Why grep Is Beating Your Vector DB Keyword retrieval keeps winning in production for reasons that have little to do with benchmark leaderboard scores. This post explains when grep beats vectors, and why.
Mar 2, 2026 Modern Ranking Architectures, Part 5: The Feedback Loop Welcome to the final post in our series on the anatomy of modern recommender systems. Over the last four parts, we've deconstructed the online request path, following a user's request from the initial billions of items all the way to a final, ranked page.
Feb 5, 2026 Beyond the Hashing Trick: The Math of Scaling to 100M+ IDs in Production. If you follow machine learning today, you’ve been told that tokenization is a solved problem. In the world of Natural Language Processing (NLP), we have Byte Pair Encoding (BPE) or WordPiece. These algorithms compress the infinite complexity of human language into a neat,
Jan 15, 2026 Building the Relevance Layer for the AI World Why retrieval and relevance are becoming the most important infrastructure in modern AI products, and what we're building at Shaped to solve it.
Dec 11, 2025 Why I Built a Database for Relevance After five years at Meta and three years building Shaped, I think relevance infrastructure should work like a database: declarative, composable, and fast enough for humans and agents.
Nov 13, 2025 Modeling Behavior As Language: The Next Era of Recommendations A major shift is underway in recommender systems, moving from traditional Two-Tower and DLRM models to a new paradigm that treats user behavior as a language. This approach models a user's sequence of interactions, such as clicks and purchases, allowing Transformer-based
Nov 13, 2025 Scaling Laws Beyond LLMs: The Future of Search and Recommendations When people talk about scaling laws in AI, they usually mean one thing: language models. The empirical laws first quantified in Kaplan et al. (2020) showed that loss scales predictably as a power law with model size, dataset size, and compute budget. Train a bigger
Nov 3, 2025 Ranking Infrastructure, Part 1: The Serving Layer Welcome to a new, hands-on series for builders. In our previous series, "The Anatomy of Modern Ranking Architectures," we deconstructed the conceptual blueprint of the multi-stage ranking architecture. We followed the logic of a request from retrieval to scoring to the final
Nov 3, 2025 Ranking Infrastructure, Part 2: The Data Layer Welcome back to our series on the infrastructure of modern ranking systems. In Part 1, we designed the online serving layer: a set of decoupled, scalable microservices orchestrated by Kubernetes to handle real-time requests. We built the engine of our ranking system.
Nov 3, 2025 Ranking Infrastructure, Part 3: The MLOps Backbone Welcome to the final post in our series on the infrastructure of modern ranking systems. So far, we've designed our high-performance online services and fueled them with a specialized data layer:
Oct 13, 2025 The Anatomy of Modern Ranking Architectures, Part 1 If you look under the hood of recommendation systems at Netflix, YouTube, or Amazon, you won't find identical models, but you will find a remarkably similar architectural blueprint. This multi-stage ranking system is the industry's shared solution to a fundamental
Oct 13, 2025 Modern Ranking Architectures, Part 2: Retrieval Welcome back to our series on the anatomy of modern recommender systems. In our first post, we established the multi-stage architecture as the industry-standard blueprint for balancing relevance, latency, and cost. We framed it as a system of cascading approximations,
Oct 13, 2025 Modern Ranking Architectures, Part 3: Scoring Welcome back to our series on the anatomy of modern recommender systems. In Part 1, we introduced the multi-stage architecture as a blueprint for balancing relevance, latency, and cost. In Part 2, we explored the Retrieval Stage, where we used an ensemble of strategies to
Oct 13, 2025 Modern Ranking Architectures, Part 4: Ordering Welcome back to our series on the anatomy of modern recommender systems. So far, we've deconstructed the core machine learning pipeline.
Oct 10, 2025 YouTube gets ~5% CTR lift on Shorts by replacing embedding tables with Semantic IDs TL;DR: The shift from massive embedding tables to generative retrieval with Semantic IDs is accelerating. YouTube's new PLUM framework represents the next evolution, using an adapted LLM and enhanced 'SID-v2' to achieve a +4.96% Panel CTR lift for Shorts in live A/B tests.
Sep 23, 2025 Building a HackerNews "For You" Feed TL;DR: The HackerNews top feed felt stale, so I built a personalized For You feed in a weekend using Lovable and Shaped. See it at hn.shaped.ai.
Sep 8, 2025 The Vector Bottleneck in Embedding-Based Retrieval DeepMind’s latest paper formalizes a long-suspected limitation of embedding-based retrieval: single-vector models cannot scale to combinatorial query complexity, no matter how large the dimension. The result reframes hybrid and multi-vector approaches, not as patches, but as
Aug 28, 2025 Dual-Flow Generative Ranking Networks TL;DR: Meta's generative recommender (MetaGR) is powerful but slow. Researchers from Meituan and top universities just dropped DFGR, a dual-stream architecture that's 2x faster at training and 4x faster at inference, while also beating MetaGR and heavily-engineered industrial
Aug 13, 2025 Closing the Research-to-Production Gap in Recommendations If you’ve ever tried to take a promising machine learning experiment from an offline notebook to a live A/B test, you know the pain. Weeks, sometimes months, pass between proving an idea works and actually seeing it in front of users. Internal handoffs, infrastructure gaps,
Aug 5, 2025 RentTheRunway Dataset: Deep Dive into Fashion Fit, Context, and Recommendation Challenges Online fashion retail faces unique challenges, moving beyond simple preference prediction. Accurately recommending clothing requires understanding complex factors like fit, body type, and the context of use. The RentTheRunway (RTR) dataset emerges as a crucial and fascinating
Aug 4, 2025 Where Matters: Location Feature Engineering for Search & Recs Location is more than just coordinates; it’s a powerful signal for making search and recommendation systems more relevant. This post explores how proximity, regional preferences, delivery constraints, and geo-targeting can all be encoded into machine learning models through
Jul 30, 2025 LambdaMART Explained: The Workhorse of Learning-to-Rank LambdaMART is one of the most widely used algorithms in Learning-to-Rank, powering the ranking logic behind search engines, recommendation systems, and e-commerce platforms. By combining gradient boosting trees (MART) with metric-aware optimization from LambdaRank, it
Jul 24, 2025 Average Popularity: Are Your Recommendations Just Chasing Trends? Relevance metrics like NDCG and Precision@K are crucial for evaluating recommendation systems, but they don’t tell the full story. Two systems can perform similarly on these scores while exhibiting drastically different behaviors: one favoring only popular hits, the other
Jul 24, 2025 Decoding Timestamps: Time-Based Feature Engineering for Search & Recs Timestamps hold far more value than just marking when an event occurred; they encode powerful signals like recency, seasonality, user lifecycle, and content freshness that can significantly boost the performance of recommendation and search systems. But unlocking their
Jul 22, 2025 GoodReads Datasets: Powering Book Recommendations and Research The GoodReads datasets are a foundational resource for building and evaluating book recommendation systems. They combine explicit ratings, implicit feedback (like user shelves), rich textual reviews, and detailed metadata, making them ideal for hybrid models that mix
Jul 18, 2025 Peering Inside the Black Box: Leveraging User & Item Embeddings Learn how user and item embeddings power personalized recommendations, similarity search, analytics, churn prediction, and custom ML models.
Jul 17, 2025 DLRM-Style Feature Interactions for Ranking Deep Learning Recommendation Models (DLRMs) like Wide & Deep, DeepFM, DCN, and MaskNet have become essential tools for pointwise ranking in recommendation systems, where the goal is to predict the likelihood of user-item interactions such as clicks or conversions. These
Jul 15, 2025 Categorical Features: The Backbone of Search & Recs Engineering Categorical features like category, brand, and user ID are essential to search and recommendation systems, but transforming them into meaningful signals for machine learning is often more complex than it appears. This post explains how to handle categorical data effectively,
Jul 14, 2025 Gowalla Dataset: Understanding Location Check-ins, Social Ties, and Mobility Patterns The Gowalla dataset, a historical benchmark from the now-defunct location-based social network, offers rich check-in and social graph data that has powered foundational research in Point-of-Interest (POI) recommendations, human mobility modeling, and social influence on
Jul 11, 2025 Catalog Coverage: Are Your Recommendations Exploring Your Whole Inventory? While traditional recommendation metrics focus on individual user experience, Catalog Coverage measures the breadth of a system’s recommendations across its entire inventory, i.e., how much of the catalog gets shown to anyone at all. It’s a valuable diagnostic for spotting
Jul 8, 2025 Content-Based Filtering Explained: Recommending Based on What You Like Content-Based Filtering (CBF) is one of the fundamental approaches to building recommendation systems. Rather than relying on the preferences of similar users, CBF focuses on the characteristics of the items a user has engaged with to suggest others with similar attributes,
Jul 2, 2025 MovieLens Dataset: The Essential Benchmark for Recommender Systems The MovieLens dataset is one of the most widely used benchmarks in recommender systems, offering real-world, explicit feedback data for evaluating collaborative filtering, content-based, and hybrid recommendation models. This article explores why MovieLens remains a gold
Jun 27, 2025 From Zero to Relevant: Solving the Cold Start User Problem New or anonymous users often face irrelevant, generic content, hurting engagement from the very first visit. This article explores the cold start user problem in personalization and search systems, outlining common strategies like global popularity lists, rule-based segments,
Jun 27, 2025 Last.fm Datasets: Unlocking Music Recommendations Through Listening History and Social Connections The article explores the significance of Last.fm datasets in developing music recommendation systems, highlighting their value as benchmarks for modeling implicit feedback, sequential listening behavior, and social influence. It breaks down what’s included in these datasets
Jun 24, 2025 Privacy-First Personalization: The 7-Step Framework for Building Trust and Driving Growth This post introduces a step-by-step framework for building privacy-first personalization systems that earn user trust and support sustainable growth. It covers key strategies like data minimization, user control, edge processing, and satisfaction-based metrics—along with how
Jun 23, 2025 See the Bigger Picture: Image Feature Engineering for Search & Recs In visually rich digital environments, text and tags alone often fall short in powering relevant search and recommendations. This article explores how visual feature engineering, extracting embeddings from images using models like CLIP or ViT, unlocks deeper relevance by
Jun 19, 2025 How YouTube’s Algorithm Works: A Guide to Recommendations YouTube’s recommendation engine combines large-scale data processing, real-time feedback loops, and multi-objective optimization to deliver highly personalized video suggestions that prioritize both engagement and satisfaction. This post breaks down how the system works, from
Jun 19, 2025 MRR: How Quickly Do Users Find the First Relevant Item? Mean Reciprocal Rank (MRR) is a metric that captures how quickly a user finds the first relevant item in a ranked list, making it especially valuable for tasks like known-item search or question answering where just one good result matters. This article introduces the concept
Jun 18, 2025 Mastering Cold Start Challenges: Top Strategies for Personalized AI Experiences Cold start challenges can derail personalization efforts by making it difficult to deliver relevant experiences for new users, items, or markets. This post explores proven strategies and modern system architectures — including modular, AI-native platforms like Shaped — that
Jun 17, 2025 Explainable Personalization: A Practical Guide for Building Trust and Transparency This post examines how to develop explainable personalization systems that build user trust, improve internal visibility, and foster long-term engagement. It covers the key components of explainability, including transparent logic, user feedback, and internal observability,
Jun 17, 2025 Matrix Factorization: The Bedrock of Collaborative Filtering Recommendations Matrix Factorization (MF) has long been a foundational technique in collaborative filtering for recommendation systems. It works by learning latent factors that represent hidden preferences of users and characteristics of items, allowing it to predict unknown interactions.
Jun 16, 2025 Modular AI: Building Composable Personalization Stacks This post explores how modular AI infrastructure enables faster, more flexible, and more scalable personalization systems. It outlines the key components of a composable stack, like data ingestion, candidate generation, ranking, and feedback, and offers design principles to
Jun 12, 2025 NDCG and Graded Relevance in Ranking How do you know if your ranking model is getting the order right, not just retrieving the right items? This post introduces NDCG, a powerful metric that accounts for both how relevant each item is and where it appears in the ranked list, enabling a more nuanced evaluation of
Jun 11, 2025 10 Best Practices in Data Ingestion: A Scalable Framework for Real-Time, Reliable Pipelines This post outlines 10 best practices for designing robust, scalable data ingestion pipelines that support real-time analytics, personalization, and machine learning. It covers essential topics like choosing the right ingestion pattern, enforcing data contracts, handling
Jun 11, 2025 Unlock Text Data: NLP Feature Engineering for Search & Recs Keyword matching and interaction history aren’t enough for modern relevance. Language data, like product descriptions, search queries, and user reviews, holds rich signals that drive deeper personalization. But turning text into model-ready features requires complex NLP
Jun 9, 2025 Monolithic vs Modular AI Architecture: Key Trade-Offs This blog post explores the differences between monolithic and modular AI-native architectures, helping businesses choose the best approach for their AI personalization systems. It explains the fundamental distinction: monolithic architectures integrate all AI components into
Jun 7, 2025 How to Unify Data Ecosystems for Seamless Personalization This blog post addresses the challenge of fragmented data ecosystems, which hinders companies' ability to provide effective personalization. It presents a 6-step framework for unifying data across systems, enabling seamless, AI-driven customer experiences. The steps include
Jun 4, 2025 AI-Powered Recommendation Engines: A Complete Guide This article explores how AI-powered recommendation systems are transforming digital experiences across e-commerce, music, and marketplaces.
Jun 4, 2025 H&M Dataset: Powering Personalized Fashion Recommendations at Scale The H&M Personalized Fashion Recommendations dataset is a favorite in the ML community for testing large-scale, real-world recommendation systems. With millions of transactions and rich metadata, it offers a challenging benchmark for building personalized fashion experiences.
Jun 3, 2025 Customer Data Platform Essentials: Unlocking Real-Time Personalization with First-Party Data This article explores how effective personalization relies on collecting, unifying, and analyzing first-party data through tools like Customer Data Platforms (CDPs). It highlights the role of data mining, real-time ingestion, and machine learning in transforming raw data—from
Jun 3, 2025 Advancements in Feed Ranking Systems: A Deep Dive into Large-Scale Models This article explores how LinkedIn’s large-scale ranking framework, LiRank, integrates deep learning and large language models (LLMs) to power personalized content across feeds, job recommendations, and ads. It details core innovations such as Residual DCN, isotonic
Jun 3, 2025 Beyond A/B Testing: A Practical Guide to Multi-Armed Bandits This article unpacks how multi-armed bandits offer a smarter alternative to A/B testing for real-time personalization. By dynamically balancing exploration and exploitation, bandit algorithms adapt to user behavior on the fly—delivering more relevant content, faster.
Jun 2, 2025 A Comprehensive Guide to Approximate Nearest Neighbors Algorithms This article explores the role of approximate nearest neighbor (ANN) search in scaling personalization and similarity search across large, high-dimensional datasets. It contrasts ANN with exact search, highlighting its speed-accuracy trade-offs and practical relevance for
Jun 2, 2025 How Does Temu Work? Understanding Its Personalization Strategy This article examines how Temu became one of the fastest-growing e-commerce platforms by using AI to fuel engagement across the user journey. It explores how real-time deep learning models, gamification, and multi-objective optimization drive personalization, session depth,
Jun 2, 2025 Enhance Your AI with Real-Time Data Using RAG This blog post explores the challenges of real-time personalization in AI, such as high computational costs, slow experimentation, and the cold start problem. It introduces retrieval-augmented generation (RAG) as a solution, highlighting how it combines generative AI with
May 30, 2025 How Amazon Masterminds Real-Time Product Discovery Beyond Search This article examines how Amazon leads in real-time product discovery by guiding users beyond search through personalized, AI-driven experiences. Using a blend of collaborative filtering, content-based filtering, and reinforcement learning, Amazon tailors recommendations
May 30, 2025 Measuring Recommendation Performance: Relevancy, Precision, and Recall This article explains how precision, recall, and relevancy serve as core metrics for evaluating and optimizing recommendation systems. Precision measures how many recommended items are truly relevant, while recall captures how many relevant items are successfully
May 29, 2025 Boosting Revenue with AI-Powered Cross-Selling Recommendations This article explores how AI is transforming cross-selling from a static, rules-based tactic into a dynamic personalization engine that adapts in real time. It contrasts traditional methods with AI-driven systems that detect subtle product relationships, adjust suggestions
May 28, 2025 A/B Testing Rankings: Metrics That Matter You’ve trained a model, optimized offline metrics, and picked a winner, but how do you know it’ll perform with real users? In this post, we explore why A/B testing is essential for validating personalization and ranking models in production. We cover key online metrics like
May 27, 2025 The Power of Deep Learning for Hyper-Personalized Recommendations This blog explores how deep learning is revolutionizing personalized recommendations by enabling real-time, context-aware experiences for users. Traditional recommendation systems, such as collaborative filtering and content-based models, struggle with static data, cold start
May 26, 2025 Golden Tests in AI: Ensuring Reliability Without Slowing Innovation This article introduces golden tests as a practical method for detecting regressions in AI systems—especially real-time recommendation engines—by comparing current model outputs against a saved “golden” baseline. Unlike traditional tests, golden tests capture subtle changes
May 23, 2025 Bridging Worlds: Training Language Models on User Behavior for Smarter Recommendations Traditional recommendation models face a tradeoff: language models excel at understanding item semantics, while collaborative filtering shines at capturing behavioral patterns. But what if you could combine both? In this post, we explore a new generation of hybrid techniques,
May 22, 2025 Evaluation Metrics for Search and Recommendation Systems This article explores key metrics used to evaluate search and recommendation systems, from precision and recall to NDCG and diversity. It explains how offline and online evaluations work together to assess performance, and highlights challenges like data sparsity and feedback
May 19, 2025 The Ultimate Guide to Modern Ranking Models This article offers a comprehensive guide to ranking models — algorithms that power personalized search, product recommendations, and content discovery. It breaks down the components of modern ranking systems, including retrieval, scoring, and ordering, and explains key
May 18, 2025 Collaborative Filtering Explained This article explores collaborative filtering, a foundational technique behind personalized recommendations on platforms like Netflix and Amazon. It explains how user-based and item-based filtering work, compares memory-based and model-based approaches, and highlights
May 15, 2025 Vector Search Explained: How AI Powers Smarter Search and Recommendations This blog post explains how vector search is transforming search and recommendation systems by focusing on the meaning behind data, not just matching keywords.
May 14, 2025 Tweedie Regression for Video Watch-Time Prediction (Tubi Case Study) TL;DR: Tubi boosted VOD revenue (+0.4%) and watch time (+0.15%) by ditching weighted LogLoss for CTR and instead using Tweedie Regression to directly predict user watch time. Their paper shows Tweedie loss better models the zero-inflated, skewed nature of watch time data,
May 13, 2025 Wayfair & Pinterest: Leveraging Visual Data and User Behavior for Personalized Discovery This blog post explores how leading companies like Wayfair and Pinterest use visual data and user behavior to create personalized discovery experiences. It highlights the growing role of visual data in enhancing personalization, moving beyond traditional text-based methods.
May 13, 2025 Netflix Personalization Workshop 2025: Key Insights The Shaped team was thrilled to be at the 2025 Netflix Personalization, Recommendations & Search workshop last week! This event, first held by Netflix in 2016, is one of our highlights on the AI recommendation & search calendar. The day was packed with insightful talks from
May 9, 2025 Two-Tower Models for Recommendation Systems The Two-Tower model is a foundational architecture for large-scale recommendation systems, built to efficiently retrieve relevant items from massive catalogs. By learning separate embeddings for users and items, it enables fast candidate generation via approximate nearest
May 8, 2025 Criteo Dataset: Tackling Large-Scale Click-Through Rate Prediction Click-through rate (CTR) prediction is central to modern advertising and recommendation systems, and the Criteo dataset has become the de facto benchmark for advancing this task at industrial scale. With hundreds of millions to billions of rows and a blend of dense numerical
May 6, 2025 Sequential Models for Recommendations (SASRec, BERT4Rec, and Beyond) In a world where user behavior changes by the minute, traditional recommendation systems fall short. Sequential recommendation models offer a powerful upgrade, capturing evolving intent by analyzing the order of interactions. This article breaks down the evolution of these
May 2, 2025 How to Build a Killer 'For You' Feed The “For You” feed has become the gold standard of personalized digital experiences—but behind the magic lies serious technical complexity. From wrangling massive datasets to training cutting-edge ML models and serving results in real time, building a high-quality feed from
Apr 28, 2025 Beyond Retrieval: Optimizing Relevance with Reranking Retrieving a strong list of candidate items is just the first step—the real challenge is ranking them in the most relevant, personalized order for each user and goal. This post explores how reranking transforms basic search results or recommendations into truly optimized
Apr 25, 2025 Precision@K for Ranking Systems Is your recommender system truly hitting the mark? Imagine a user binging blockbusters like Avengers and Top Gun—will they click on Love Actually or John Wick next? This article breaks down Precision@K, the go-to metric for judging how many of your top K recommendations are
Apr 24, 2025 Cross-Encoder Rediscovers a Semantic Variant of BM25 This article explores how cross-encoders, long praised for their performance in neural ranking, may in fact be reimplementing classic information retrieval logic, specifically, a semantic variant of BM25. Through mechanistic interpretability techniques, the authors uncover
Apr 22, 2025 One Embedding to Rule Them All Pinterest’s OmniSearchSage represents a major step forward in unified semantic search. By extending the two-tower model into a multi-task, multi-entity framework, it enables a single query embedding to power retrieval across pins, products, and related queries. The system
Mar 5, 2025 Beyond Relevance: Optimizing for Multiple Objectives in Search and Recommendations Building effective recommendation and search systems means going beyond simply predicting relevance. Modern users expect personalized experiences that cater to a wide range of needs and preferences, and businesses need systems that align with their overarching goals. This
Feb 27, 2025 Beyond Dot Products: Retrieval with Learned Similarities The world of vector databases is exploding. Driven by the rise of large language models and the increasing need for semantic search, efficient retrieval of information from massive datasets has become paramount. Approximate Nearest Neighbor (ANN) search, often using dot
Jun 5, 2024 Is This the ChatGPT Moment for Recommendation Systems? Researchers at Meta recently published a ground-breaking paper that combines the technology behind ChatGPT with Recommender Systems. They show they can scale these models up to 1.5 trillion parameters and demonstrate a 12.4% increase in topline metrics in production A/B
Mar 1, 2023 Evaluating Recommendations: mAP, MMR, and NDCG Imagine you’re shown two ordered feeds of product recommendations from separate algorithms. In the first one (A) you’re shown: Nike sneakers, Adidas shorts, and an Apple Watch. In the second one (B) you’re shown the order: Apple Watch, Adidas shorts, and Nike sneakers.
Feb 7, 2023 Evaluating Recommendations: Precision, Recall, and R-Precision Imagine you’re given three movie recommendations from separate algorithms. In the first one (A) you’re given: The Terminator, James Bond, and Star Wars. In the second (B) you’re given: Cars, Toy Story, and Iron Man
Sep 20, 2022 Day 2 of #RecSys2022: Our favorite 5 papers and talks It’s been another fantastic day at RecSys 2022. Following the Women in RecSys Breakfast, the day started with a keynote from Catherine D’Ignazio, followed by these sessions: Fairness & Privacy, Diversity & Novelty, and Models and Learning I. Here are
Jul 12, 2022 Data-Centric AI for Ranking Data quality and volume are what make ranking algorithms at big tech companies so seamless. How can you create the same experiences with the data you have? Data-centric AI may be the answer!

© 2026 Tullie Murrell