Tullie Murrell | Writing

Notes on recommendations, search, relevance, and building AI products.

Apr 30, 2026
HNSW Explained: The Algorithm Powering Fast Vector Search If you've set up a vector database recently, you've seen it. Somewhere in the configuration — maybe when creating an index in Pinecone, or scanning Weaviate's schema options, or reading Qdrant's collection settings — there were the four letters HNSW. You probably selected it because it was the default or the recommended option, made a mental note to understand it properly later, and moved on. This is that article. eng ir
Apr 23, 2026
Why grep Is Beating Your Vector DB Keyword retrieval keeps winning in production for reasons that have little to do with benchmark leaderboard scores. This post explains when grep beats vectors, and why. eng ir
Mar 2, 2026
Modern Ranking Architectures, Part 5: The Feedback Loop Welcome to the final post in our series on the anatomy of modern recommender systems. Over the last four parts, we've deconstructed the online request path, following a user's request from the initial billions of items all the way to a final, ranked page. ir
Feb 5, 2026
Beyond the Hashing Trick: The Math of Scaling to 100M+ IDs in Production. If you follow machine learning today, you've been told that tokenization is a solved problem. In the world of Natural Language Processing (NLP), we have Byte Pair Encoding (BPE) or WordPiece. These algorithms compress the infinite complexity of human language into a neat, static vocabulary of roughly 50,000 to 100,000 tokens. This "dictionary" fits comfortably in a few megabytes of system RAM, and the resulting embedding weights take up a tiny fraction of a modern GPU's VRAM. research ir
Jan 15, 2026
Building the Relevance Layer for the AI World Why retrieval and relevance are becoming the most important infrastructure in modern AI products, and what we're building at Shaped to solve it. shaped ir
Dec 11, 2025
Why I Built a Database for Relevance After five years at Meta and three years building Shaped, I think relevance infrastructure should work like a database: declarative, composable, and fast enough for humans and agents. shaped ir
Nov 13, 2025
Modeling Behavior As Language: The Next Era of Recommendations A major shift is underway in recommender systems, moving from traditional Two-Tower and DLRM models to a new paradigm that treats user behavior as a language. This approach models a user's sequence of interactions, such as clicks and purchases, allowing Transformer-based models to predict the next action with a more nuanced understanding of intent. While this evolution offers powerful capabilities for capturing dynamic user preferences, it also introduces significant new engineering challenges in managing inference costs, adapting feature stores for sequential data, and solving for new user cold-starts. ir
Nov 13, 2025
Scaling Laws Beyond LLMs: The Future of Search and Recommendations When people talk about scaling laws in AI, they usually mean one thing: language models. The empirical laws first quantified in Kaplan et al. (2020) showed that loss scales predictably as a power law with model size, dataset size, and compute budget. Train a bigger transformer on more text, and performance improves, up to the limits of optimization and overfitting. research
Nov 3, 2025
Ranking Infrastructure, Part 1: The Serving Layer Welcome to a new, hands-on series for builders. In our previous series, "The Anatomy of Modern Ranking Architectures," we deconstructed the conceptual blueprint of the multi-stage ranking architecture. We followed the logic of a request from retrieval to scoring to the final ordered page. Now, we shift from the "what" to the "how." eng
Nov 3, 2025
Ranking Infrastructure, Part 2: The Data Layer Welcome back to our series on the infrastructure of modern ranking systems. In Part 1, we designed the online serving layer: a set of decoupled, scalable microservices orchestrated by Kubernetes to handle real-time requests. We built the engine of our ranking system. eng ir
Nov 3, 2025
Ranking Infrastructure, Part 3: The MLOps Backbone Welcome to the final post in our series on ranking infrastructure. We have the serving layer and the data layer in place; now we need the MLOps backbone that connects raw data to production models. eng
Oct 13, 2025
The Anatomy of Modern Ranking Architectures, Part 1 If you look under the hood of recommendation systems at Netflix, YouTube, or Amazon, you won't find identical models, but you will find a remarkably similar architectural blueprint. This multi-stage ranking system is the industry's shared solution to a fundamental engineering problem: how to find the few best needles in an ever-growing haystack, and do it in milliseconds. This is the first in a series of posts where we will deconstruct this blueprint. We'll go beyond high-level funnel diagrams and dive into the practical components, engineering trade-offs, and model architectures required to build a modern recommender system. ir
Oct 13, 2025
Modern Ranking Architectures, Part 2: Retrieval Welcome back to our series on the anatomy of modern recommender systems. In our first post, we established the multi-stage architecture as the industry-standard blueprint for balancing relevance, latency, and cost. We framed it as a system of cascading approximations, designed to efficiently identify the best items from a massive catalog. Today, we're diving deep into the first and arguably most critical part of this blueprint: The Retrieval Stage. ir
Oct 13, 2025
Modern Ranking Architectures, Part 3: Scoring Welcome back to our series on the anatomy of modern recommender systems. In Part 1, we introduced the multi-stage architecture as a blueprint for balancing relevance, latency, and cost. In Part 2, we explored the Retrieval Stage, where we used an ensemble of strategies to generate a high-recall candidate set of about a thousand items. ir
Oct 13, 2025
Modern Ranking Architectures, Part 4: Ordering Welcome back to our series on the anatomy of modern recommender systems. So far, we've deconstructed the core machine learning pipeline. ir
Oct 10, 2025
YouTube gets ~5% CTR lift on Shorts by replacing embedding tables with Semantic IDs TL;DR: The shift from massive embedding tables to generative retrieval with Semantic IDs is accelerating. YouTube's new PLUM framework represents the next evolution, using an adapted LLM and enhanced 'SID-v2' to achieve a +4.96% Panel CTR lift for Shorts in live A/B tests. This deep dive explains how they did it. research
Sep 23, 2025
Building a HackerNews "For You" Feed TL;DR: The HackerNews top feed felt stale, so I built a personalized For You feed in a weekend using Lovable and Shaped. See it at hn.shaped.ai. shaped ir
Sep 8, 2025
The Vector Bottleneck in Embedding-Based Retrieval DeepMind’s latest paper formalizes a long-suspected limitation of embedding-based retrieval: single-vector models cannot scale to combinatorial query complexity, no matter how large the dimension. The result reframes hybrid and multi-vector approaches, not as patches, but as necessary architectures for retrieval at scale. research ir
Aug 28, 2025
Dual-Flow Generative Ranking Networks TL;DR: Meta's generative recommender (MetaGR) is powerful but slow. Researchers from Meituan and top universities just dropped DFGR, a dual-stream architecture that's 2x faster at training and 4x faster at inference, while also beating MetaGR and heavily-engineered industrial models on ranking accuracy. research ir
Aug 13, 2025
Closing the Research-to-Production Gap in Recommendations If you’ve ever tried to take a promising machine learning experiment from an offline notebook to a live A/B test, you know the pain. Weeks, sometimes months, pass between proving an idea works and actually seeing it in front of users. Internal handoffs, infrastructure gaps, and competing priorities all slow you down. And by the time your experiment goes live, the opportunity may have shifted, or worse, been shelved altogether. At Shaped, we’ve been thinking deeply about this “research-to-production gap” and how to eliminate it for recommendation systems. Here’s what we’ve learned. shaped eng
Aug 5, 2025
RentTheRunway Dataset: Deep Dive into Fashion Fit, Context, and Recommendation Challenges Online fashion retail faces unique challenges, moving beyond simple preference prediction. Accurately recommending clothing requires understanding complex factors like fit, body type, and the context of use. The RentTheRunway (RTR) dataset emerges as a crucial and fascinating resource in this domain, offering rich data for researchers and data scientists tackling these fashion recommendation problems. This article provides a comprehensive overview of the RentTheRunway dataset, its unique characteristics, importance, and applications in building better recommendation systems. research
Aug 4, 2025
Where Matters: Location Feature Engineering for Search & Recs Location is more than just coordinates; it’s a powerful signal for making search and recommendation systems more relevant. This post explores how proximity, regional preferences, delivery constraints, and geo-targeting can all be encoded into machine learning models through smart feature engineering. From geohashing and distance calculations to region embeddings and hierarchical modeling, we break down the core techniques and show how platforms like Shaped streamline the entire process, turning raw location data into real-time, personalized ranking power. ir
Jul 30, 2025
LambdaMART Explained: The Workhorse of Learning-to-Rank LambdaMART is one of the most widely used algorithms in Learning-to-Rank, powering the ranking logic behind search engines, recommendation systems, and e-commerce platforms. By combining gradient boosting trees (MART) with metric-aware optimization from LambdaRank, it efficiently learns to rank items in a way that directly improves metrics like NDCG. This post unpacks how LambdaMART works, why it’s effective, and how it fits into modern ranking architectures, especially when integrated with tools like LightGBM or platforms like Shaped. ir
Jul 24, 2025
Average Popularity: Are Your Recommendations Just Chasing Trends? Relevance metrics like NDCG and Precision@K are crucial for evaluating recommendation systems, but they don’t tell the full story. Two systems can perform similarly on these scores while exhibiting drastically different behaviors, one favoring only popular hits, the other surfacing more personalized or niche content. This is where Average Popularity @ K comes in. It quantifies the popularity bias of a model’s recommendations and helps diagnose whether it’s truly personalizing or simply echoing what’s already trending. Used alongside relevance metrics, it offers critical insight into model behavior and helps teams strike the right balance between accuracy and discovery. evals
Jul 24, 2025
Decoding Timestamps: Time-Based Feature Engineering for Search & Recs Timestamps hold far more value than just marking when an event occurred. They encode powerful signals like recency, seasonality, user lifecycle, and content freshness that can significantly boost the performance of recommendation and search systems. But unlocking their potential requires careful feature engineering: time zone normalization, cyclical feature extraction, time-based calculations relative to “now,” and smart handling of missing values. This article breaks down best practices for transforming raw timestamp data into meaningful model features and highlights how platforms like Shaped simplify this process by automating temporal feature engineering, ensuring these critical signals are seamlessly incorporated into your ML models. ir
Jul 22, 2025
GoodReads Datasets: Powering Book Recommendations and Research The GoodReads datasets are a foundational resource for building and evaluating book recommendation systems. They combine explicit ratings, implicit feedback (like user shelves), rich textual reviews, and detailed metadata, making them ideal for hybrid models that mix collaborative filtering with NLP. While the datasets vary in scope and format, they enable research into social influence, genre dynamics, and reader preferences at scale. Despite challenges like sparsity and ethical data handling, GoodReads remains one of the most valuable open datasets for exploring advanced recommendation strategies in the literary domain. research
Jul 18, 2025
Peering Inside the Black Box: Leveraging User & Item Embeddings Embeddings are central to personalized recommendations: dense vector representations of users and items that capture behavioral patterns and semantic relationships. ir
Jul 17, 2025
DLRM-Style Feature Interactions for Ranking Deep Learning Recommendation Models (DLRMs) like Wide & Deep, DeepFM, DCN, and MaskNet have become essential tools for pointwise ranking in recommendation systems, where the goal is to predict the likelihood of user-item interactions such as clicks or conversions. These models excel at capturing complex feature interactions across sparse, high-cardinality data by combining embedding layers with neural networks and specialized interaction mechanisms. This post breaks down how they work, why feature interactions matter, and how platforms like Shaped simplify building and deploying them for high-accuracy personalization. ir
Jul 15, 2025
Categorical Features: The Backbone of Search & Recs Engineering Categorical features like category, brand, and user ID are essential to search and recommendation systems, but transforming them into meaningful signals for machine learning is often more complex than it appears. This post explains how to handle categorical data effectively, from basic encoding strategies like one-hot and label encoding to deep learning approaches like embeddings. It covers challenges like high cardinality, null values, and feature consistency across systems, and shows how platforms like Shaped streamline this process by automating encoding, managing embeddings, and integrating categorical features directly into real-time relevance models. ir
Jul 14, 2025
Gowalla Dataset: Understanding Location Check-ins, Social Ties, and Mobility Patterns The Gowalla dataset, a historical benchmark from the now-defunct location-based social network, offers rich check-in and social graph data that has powered foundational research in Point-of-Interest (POI) recommendations, human mobility modeling, and social influence on real-world behavior. Despite its age, Gowalla remains valuable for studying how time, geography, and social context shape user activity. This post explores its structure, use cases, limitations, and how to leverage it with Shaped to build context-aware recommendation models. research
Jul 11, 2025
Catalog Coverage: Are Your Recommendations Exploring Your Whole Inventory? While traditional recommendation metrics focus on individual user experience, Catalog Coverage measures the breadth of a system’s recommendations across its entire inventory, i.e., how much of the catalog gets shown to anyone at all. It’s a valuable diagnostic for spotting over-reliance on popular items and uncovering long-tail neglect, but it ignores relevance and personalization. At Shaped, we treat coverage as a secondary signal, useful for monitoring systemic diversity, but never at the expense of delivering high-quality, personalized results. evals
Jul 8, 2025
Content-Based Filtering Explained: Recommending Based on What You Like Content-Based Filtering (CBF) is one of the fundamental approaches to building recommendation systems. Rather than relying on the preferences of similar users, CBF focuses on the characteristics of the items a user has engaged with to suggest others with similar attributes, whether textual, visual, structured, or audio-based. This article introduces how CBF works, its evolution from simple keyword matching to the use of modern embedding models, and the challenges involved in implementing it effectively. It also outlines different design patterns supported by Shaped for applying CBF in practice. ir
Jul 2, 2025
MovieLens Dataset: The Essential Benchmark for Recommender Systems The MovieLens dataset is one of the most widely used benchmarks in recommender systems, offering real-world, explicit feedback data for evaluating collaborative filtering, content-based, and hybrid recommendation models. This article explores why MovieLens remains a gold standard, detailing its structure (ratings, metadata, tags), available versions, and common use cases. It also highlights challenges like data sparsity and cold start, and shows how to connect MovieLens to Shaped to quickly prototype and train recommendation models using real interaction data enriched with movie attributes. research
Jun 27, 2025
From Zero to Relevant: Solving the Cold Start User Problem New or anonymous users often face irrelevant, generic content, hurting engagement from the very first visit. This article explores the cold start user problem in personalization and search systems, outlining common strategies like global popularity lists, rule-based segments, onboarding surveys, and contextual inference. It highlights the challenges each approach presents and why effectively using even limited real-time context or early in-session behavior is key to delivering relevance from the start. ir
Jun 27, 2025
Last.fm Datasets: Unlocking Music Recommendations Through Listening History and Social Connections The article explores the significance of Last.fm datasets in developing music recommendation systems, highlighting their value as benchmarks for modeling implicit feedback, sequential listening behavior, and social influence. It breaks down what’s included in these datasets (such as user listening history, social graphs, and tags) and why they matter for music personalization research. It also walks through how teams can bring these datasets into Shaped to build real-time ranking models, covering schema setup, event ingestion, and optional use of tags or social data, demonstrating how Shaped makes it easy to prototype and productionize music recommenders using this rich, real-world data. research
Jun 24, 2025
Privacy-First Personalization: The 7-Step Framework for Building Trust and Driving Growth Personalization has become a standard expectation across digital experiences, from streaming platforms to e-commerce sites. However, as consumers become increasingly aware of how their data is used, and as regulations tighten, businesses face a new challenge: delivering relevance without compromising trust. ir
Jun 23, 2025
See the Bigger Picture: Image Feature Engineering for Search & Recs In visually rich digital environments, text and tags alone often fall short in powering relevant search and recommendations. This article explores how visual feature engineering, extracting embeddings from images using models like CLIP or ViT, unlocks deeper relevance by capturing visual nuance, style, and cross-modal meaning. While traditional computer vision pipelines are complex and resource-intensive, Shaped streamlines the entire process: ingesting image URLs, generating embeddings with advanced models, and integrating them into ranking APIs, all without requiring custom infrastructure. Whether automatically leveraging visuals or specifying your own Hugging Face model, Shaped makes it simple to activate image data for AI-powered personalization. ir
Jun 19, 2025
How YouTube’s Algorithm Works: A Guide to Recommendations YouTube’s recommendation engine combines large-scale data processing, real-time feedback loops, and multi-objective optimization to deliver highly personalized video suggestions that prioritize both engagement and satisfaction. This post breaks down how the system works, from candidate generation to safeguards, and offers actionable lessons for building adaptable, responsible recommendation systems of your own. ir
Jun 19, 2025
MRR: How Quickly Do Users Find the First Relevant Item? Mean Reciprocal Rank (MRR) is a metric that captures how quickly a user finds the first relevant item in a ranked list, making it especially valuable for tasks like known-item search or question answering where just one good result matters. This article introduces the concept of MRR, explains how it's calculated, compares it to other ranking metrics like NDCG and Hit Rate, and explores when it’s most useful (and when it’s not). It also outlines how Shaped incorporates MRR into its broader evaluation suite to balance speed of discovery with overall relevance. evals
Jun 18, 2025
Mastering Cold Start Challenges: Top Strategies for Personalized AI Experiences Cold start challenges can derail personalization efforts by making it difficult to deliver relevant experiences for new users, items, or markets. This post explores proven strategies and modern system architectures — including modular, AI-native platforms like Shaped — that help teams overcome cold start and personalize from day one. ir
Jun 17, 2025
Explainable Personalization: A Practical Guide for Building Trust and Transparency Personalization helps users discover the right content, products, or experiences, but when it happens without explanation, it can feel invasive, confusing, or even manipulative. As algorithms play a larger role in shaping what we see, hear, and buy, users are beginning to ask a simple question: *Why am I seeing this?* That question isn’t just philosophical. It reflects a growing demand for transparency, control, and trust in algorithmic systems. Whether you're building a recommendation engine, a personalized feed, or a product ranking feature, explainability is becoming essential. It reassures users, supports compliance, and helps teams understand and improve their models. ir
Jun 17, 2025
Matrix Factorization: The Bedrock of Collaborative Filtering Recommendations Matrix Factorization (MF) has long been a foundational technique in collaborative filtering for recommendation systems. It works by learning latent factors that represent hidden preferences of users and characteristics of items, allowing it to predict unknown interactions. This article explains how MF decomposes the sparse user-item interaction matrix into two lower-dimensional matrices, and dives into popular optimization methods like Stochastic Gradient Descent (SGD) and Alternating Least Squares (ALS), including how ALS adapts to implicit feedback with confidence weighting. The post covers enhancements like user/item biases, practical challenges like cold-start, and how MF compares to neighborhood and deep learning approaches. Finally, it shows how platforms like Shaped let teams deploy ALS-based recommendations declaratively, without building pipelines from scratch. ir
Jun 16, 2025
Modular AI: Building Composable Personalization Stacks As user expectations rise and product surfaces multiply, personalization systems are under more pressure than ever. But many teams still operate with rigid, monolithic architectures that make every change slow, risky, and expensive. Updating a ranking strategy, testing a new model, or even adding a new content source can require changes across the entire stack. eng
Jun 12, 2025
NDCG and Graded Relevance in Ranking How do you know if your ranking model is getting the order right, not just retrieving the right items? This post introduces NDCG, a powerful metric that accounts for both how relevant each item is and where it appears in the ranked list, enabling a more nuanced evaluation of recommendation and search quality, especially when relevance varies across results. evals
Jun 11, 2025
10 Best Practices in Data Ingestion: A Scalable Framework for Real-Time, Reliable Pipelines Every real-time dashboard, machine learning model, and personalized user experience depends on one foundational layer: data ingestion. It's the first step in any modern data pipeline, responsible for collecting, validating, and delivering data from source systems into downstream platforms where it can be analyzed, modeled, or acted upon. eng
Jun 11, 2025
Unlock Text Data: NLP Feature Engineering for Search & Recs Keyword matching and interaction history aren’t enough for modern relevance. Language data, like product descriptions, search queries, and user reviews, holds rich signals that drive deeper personalization. But turning text into model-ready features requires complex NLP pipelines, model selection, infrastructure, and ongoing maintenance. Shaped automates all of this. With built-in language understanding and Hugging Face model integration, teams can tap into the full power of semantic signals, without building or managing an NLP stack. ir
Jun 9, 2025
Monolithic vs Modular AI Architecture: Key Trade-Offs The distinction between monolithic and modular approaches is a key consideration in AI-native architectures. Monolithic architectures package all AI components into a single system, while modular approaches split functionality into independent services. For AI personalization systems (architectures explicitly designed for machine learning workloads, real-time data processing, and dynamic recommendation engines), this choice shapes everything from development velocity to long-term scalability. eng
Jun 7, 2025
How to Unify Data Ecosystems for Seamless Personalization Companies that excel at personalization generate 40% more revenue than their competitors. Yet, most struggle with a basic problem: fragmented data ecosystems. When customer information sits isolated in different systems, you miss the complete picture needed for truly effective personalization. eng
Jun 4, 2025
AI-Powered Recommendation Engines: A Complete Guide Recommendation engines have become an essential part of the online shopping experience, enabling businesses to deliver personalized suggestions that resonate with customers. By analyzing user data and behavior, these systems offer tailored product recommendations, helping users discover what they’re most likely to enjoy or purchase next. Recommendation systems powered by artificial intelligence (AI) are at the forefront of this shift. As the AI-based recommendation system market is [expected to grow](https://www.thebusinessresearchcompany.com/report/ai-based-recommendation-system-global-market-report) from $2.44 billion in 2025 to $3.62 billion by 2029, it is clear that the adoption of these systems is expanding rapidly across various industries, including e-commerce, healthcare, and digital advertising. ir
Jun 4, 2025
H&M Dataset: Powering Personalized Fashion Recommendations at Scale The H&M Personalized Fashion Recommendations dataset is a favorite in the ML community for testing large-scale, real-world recommendation systems. With millions of transactions and rich metadata, it offers a challenging benchmark for building personalized fashion experiences. In this post, we show how to connect the H&M dataset to Shaped, an AI-native relevance platform, to go beyond basic co-purchase signals. From implicit feedback and cold-start handling to hybrid ranking with item and user features, Shaped helps teams build smarter fashion recommenders, faster. research
Jun 3, 2025
Customer Data Platform Essentials: Unlocking Real-Time Personalization with First-Party Data Effective personalization hinges on the ability to collect, manage, and analyze diverse streams of customer data. However, unlocking the full potential of this data requires sophisticated data mining techniques and robust infrastructure to transform raw data into actionable insights. eng
Jun 3, 2025
Advancements in Feed Ranking Systems: A Deep Dive into Large-Scale Models Recommendation systems are fundamental to modern digital platforms. They curate the vast content users encounter daily on social media, streaming services, and e-commerce sites. These systems engage users by providing personalized experiences that align with their interests and behaviors. However, managing and optimizing these systems for platforms with over 1 billion members globally presents monumental challenges. The sheer scale necessitates sophisticated techniques not only in model architecture but also in efficient training, compression, and deployment to production. ir
Jun 3, 2025
Beyond A/B Testing: A Practical Guide to Multi-Armed Bandits Personalization has become the backbone of engaging user experiences across industries. But delivering smart personalization isn’t easy. Traditional approaches like A/B testing can be slow, rigid, and resource-intensive. They often force teams to pick one option at a time, missing opportunities to learn and adapt as user preferences shift quickly. ir
Jun 2, 2025
A Comprehensive Guide to Approximate Nearest Neighbors Algorithms Finding the most relevant items from vast datasets is a fundamental challenge in modern machine learning applications. Whether you’re recommending movies on a streaming platform, suggesting products in an online store, or searching for similar images, the ability to quickly locate the nearest neighbor, or neighbors, to a given query point in high-dimensional spaces is critical. Traditional nearest neighbor search algorithms can identify the closest points by calculating exact distances, such as Euclidean distance, between vectors representing data points. But as datasets grow larger and more complex, especially with high-dimensional data like images, text embeddings, or user behavior patterns, exact search becomes prohibitively slow. eng ir
Jun 2, 2025
How Does Temu Work? Understanding Its Personalization Strategy In just a few short years, Temu has evolved from a relatively unknown marketplace to one of the world's fastest-growing e-commerce platforms. Its rapid ascent has left industry watchers asking the same question: how did they do it? The answer lies in Temu’s strategic use of artificial intelligence to power engagement at every step of the user journey. ir
Jun 2, 2025
Enhance Your AI with Real-Time Data Using RAG In a world where personalization and relevance are paramount, AI-driven systems often struggle to handle real-time data and maintain up-to-date information. Traditional models, while powerful, are limited by their reliance on static training data and their inability to adapt quickly to new or unstructured data. As organizations face knowledge-intensive tasks like answering complex queries, providing personalized recommendations, or generating content, the cost of maintaining large, constantly updated datasets becomes a significant barrier. eng ir
May 30, 2025
How Amazon Masterminds Real-Time Product Discovery Beyond Search While many retailers focus on making search faster or more accurate, Amazon has mastered the art of guiding users beyond the search bar. The scale of Amazon’s success is a testament to this mastery. According to [Statista](https://www.statista.com/statistics/273963/quarterly-revenue-of-amazoncom/), during the first quarter of 2024, Amazon generated total net sales of over $143 billion, surpassing the $127 billion from the same quarter in 2023. This relentless growth is powered by Amazon’s sophisticated approach to real-time product discovery and personalization. ir
May 30, 2025
Measuring Recommendation Performance: Relevancy, Precision, and Recall It’s not enough to just serve up suggestions. You need to serve the *right* suggestions. That’s where understanding performance metrics like precision and recall comes in. These evaluation metrics help you measure how well your machine learning (ML) model identifies relevant results and balances the tricky trade-off between minimizing false positives and false negatives. Precision and recall shape how users experience your product and influence tangible business outcomes. For example, a high precision score means your system makes fewer false alarms by minimizing irrelevant recommendations, while high recall ensures you’re not missing out on positive cases that matter. evals
May 29, 2025
Boosting Revenue with AI-Powered Cross-Selling Recommendations Cross-selling succeeds because it helps shoppers discover complementary products or add-ons they might not have considered, creating a more complete and satisfying purchase without becoming pushy. Customers benefit by finding exactly what they need, while retailers boost sales and maximize revenue from their existing customer base. ir
May 28, 2025
A/B Testing Rankings: Metrics That Matter You’ve trained a model, optimized offline metrics, and picked a winner, but how do you know it’ll perform with real users? In this post, we explore why A/B testing is essential for validating personalization and ranking models in production. We cover key online metrics like CTR, CVR, and North Star Metrics, how to design statistically rigorous experiments, and how Shaped makes it easy to deploy, bucket, and measure real-world impact. evals
May 27, 2025
The Power of Deep Learning for Hyper-Personalized Recommendations Whether browsing for products, discovering new content, or navigating a website, users now demand a personalized experience tailored to their unique preferences and behaviors. However, traditional recommendation systems, often built on simple rule-based or content-based filtering systems, struggle to deliver the dynamic, context-aware experiences users crave. ir
May 26, 2025
Golden Tests in AI: Ensuring Reliability Without Slowing Innovation For teams building AI-driven experiences, especially those delivering real-time recommendations, speed is everything. Whether you're personalizing a homepage feed or updating product rankings, models must continually evolve to stay relevant. But that velocity comes with risk. Even minor changes to a model or pipeline can have unexpected consequences once deployed. A tweak meant to boost click-through rates might unintentionally bury high-converting items. eng
May 23, 2025
Bridging Worlds: Training Language Models on User Behavior for Smarter Recommendations Traditional recommendation models face a tradeoff: language models excel at understanding item semantics, while collaborative filtering shines at capturing behavioral patterns. But what if you could combine both? In this post, we explore a new generation of hybrid techniques, like the beeFormer framework, that fine-tune pre-trained language models using user interaction data. The result: smarter, cold-start-ready embeddings that understand both meaning and behavior. We break down how this works, why it matters, and how platforms like Shaped make it easy to put these powerful models into production. research ir
May 22, 2025
Evaluation Metrics for Search and Recommendation Systems Search and recommendation systems power everything from e-commerce product discovery to streaming content suggestions. Without clear, effective metrics, it is impossible to measure how well they perform or identify areas for improvement. evals
May 19, 2025
The Ultimate Guide to Modern Ranking Models People today are inundated with choices, whether browsing products, searching for information, or discovering new content. The challenge for businesses is not just to present options, but to ensure the most relevant, engaging, and valuable items rise to the top. This is where ranking models come into play. These sophisticated algorithms power the search results you see, the recommendations you receive, and the content you’re most likely to click, watch, or buy. Ranking models are at the heart of personalization and discovery in industries ranging from e-commerce and media to online marketplaces. We’ll demystify ranking models, explore their key components, and outline best practices for implementing them across various use cases. ir
May 18, 2025
Collaborative Filtering Explained Have you ever wondered how Netflix seems to know what to suggest next or how Amazon always recommends products you'll likely buy? This isn’t by chance. It's powered by recommendation engines. The global recommendation engine market was [valued at $5.43 billion in 2023](https://www.kingsresearch.com/recommendation-engine-market-1945) and is expected to grow rapidly, reaching $74.24 billion by 2031. ir
May 15, 2025
Vector Search Explained: How AI Powers Smarter Search and Recommendations Search is undergoing a quiet transformation. As users expect instant, relevant results, whether shopping online, exploring a streaming platform, or using an AI assistant, traditional keyword search is no longer enough. Leading companies like Netflix, Amazon, and Spotify already use a different approach behind the scenes: vector search. We’ll explore how vector search powers today’s most advanced discovery and recommendation systems, looking at how it works, where it fits into modern AI infrastructure, and why it’s becoming a cornerstone of user experience across industries. eng ir
May 14, 2025
Tweedie Regression for Video Watch-Time Prediction (Tubi Case Study) TL;DR: Tubi boosted VOD revenue (+0.4%) and watch time (+0.15%) by ditching weighted LogLoss for CTR and instead using Tweedie Regression to directly predict user watch time. Their paper shows Tweedie loss better models the zero-inflated, skewed nature of watch time data, leading to better alignment with core business goals, even with a slight dip in a simpler conversion metric. research
May 13, 2025
Netflix Personalization Workshop 2025: Key Insights The Shaped team was thrilled to be at the 2025 Netflix Personalization, Recommendations & Search workshop last week! This event, first held by Netflix in 2016, is one of our highlights on the AI recommendation & search calendar. The day was packed with insightful talks from academic and industry leaders, all tackling the fast-paced evolution of AI-driven user experiences. While Large Foundation Models (LFMs) and Generative AI were, as expected, major topics, the conversations dug deep into real-world applications, innovative architectures, and the changing face of AI product development. Here’s our summary of the keynotes and insights that stood out. research
May 13, 2025
Wayfair & Pinterest: Leveraging Visual Data and User Behavior for Personalized Discovery Personalized discovery has become essential for digital platforms aiming to engage users effectively. Today’s consumers expect experiences that reflect their unique tastes, especially when browsing visually driven categories such as home goods or lifestyle content. ir
May 9, 2025
Two-Tower Models for Recommendation Systems The Two-Tower model is a foundational architecture for large-scale recommendation systems, built to efficiently retrieve relevant items from massive catalogs. By learning separate embeddings for users and items, it enables fast candidate generation via approximate nearest neighbor search—critical for real-time personalization. This article breaks down how the model works, why it scales, and where it fits in modern recsys stacks, highlighting its strengths, limitations, and role alongside ranking and graph-based approaches. ir
May 8, 2025
Criteo Dataset: Tackling Large-Scale Click-Through Rate Prediction Click-through rate (CTR) prediction is central to modern advertising and recommendation systems, and the Criteo dataset has become the de facto benchmark for advancing this task at industrial scale. With hundreds of millions to billions of rows and a blend of dense numerical and sparse categorical features, it poses unique modeling and computational challenges. This article unpacks the dataset’s structure, scale, and role in driving innovations like embedding techniques and hybrid model architectures—offering a clear lens into why Criteo remains a crucial resource for anyone building large-scale machine learning systems. research
May 6, 2025
Sequential Models for Recommendations (SASRec, BERT4Rec, and Beyond) In a world where user behavior changes by the minute, traditional recommendation systems fall short. Sequential recommendation models offer a powerful upgrade, capturing evolving intent by analyzing the order of interactions. This article breaks down the evolution of these models, from simple N-Grams to advanced Transformers and Generative Recommenders like HSTU. It also explores the real-world challenges of deploying them and how platforms like Shaped make cutting-edge sequential modeling accessible, scalable, and production-ready. ir
May 2, 2025
How to Build a Killer 'For You' Feed The “For You” feed has become the gold standard of personalized digital experiences—but behind the magic lies serious technical complexity. From wrangling massive datasets to training cutting-edge ML models and serving results in real time, building a high-quality feed from scratch demands deep expertise and infrastructure. This post breaks down the full journey: what it takes to deliver a truly personalized feed, the common pain points at each stage, and how to think strategically about solving them—whether you're just getting started or scaling an existing system. ir
Apr 28, 2025
Beyond Retrieval: Optimizing Relevance with Reranking Retrieving a strong list of candidate items is just the first step—the real challenge is ranking them in the most relevant, personalized order for each user and goal. This post explores how reranking transforms basic search results or recommendations into truly optimized experiences, the technical hurdles of building high-performance reranking systems, and why mastering reranking is key to delivering better engagement, clicks, and conversions. ir
Apr 25, 2025
Precision@K for Ranking Systems Is your recommender system truly hitting the mark? Imagine a user binging blockbusters like Avengers and Top Gun—will they click on Love Actually or John Wick next? This article breaks down Precision@K, the go-to metric for judging how many of your top K recommendations are actually relevant. With clear, intuitive examples and a sharp look at where the metric excels—and where it doesn’t—you’ll get a fast, practical understanding of how to measure recommendation quality where it matters most: the top of the list. evals
Apr 24, 2025
Cross-Encoder Rediscovers a Semantic Variant of BM25 We stand in awe of modern neural ranking models. Transformers like BERT, fine-tuned as cross-encoders, achieve state-of-the-art results on information retrieval leaderboards. They process a query and document together, capturing incredibly nuanced semantic relationships, far surpassing traditional methods like BM25. But how do they do it? We often treat them as powerful black boxes: feed them data, get amazing results, but shrug when asked about the internal logic. research ir
Apr 22, 2025
One Embedding to Rule Them All Pinterest’s OmniSearchSage represents a major step forward in unified semantic search. By extending the two-tower model into a multi-task, multi-entity framework, it enables a single query embedding to power retrieval across pins, products, and related queries. The system integrates GenAI captions, user-curated board metadata, and behavioral signals to overcome sparse content, while maintaining compatibility with legacy models like PinSage. Deployed at massive scale, OmniSearchSage delivers strong gains in search fulfillment, ad performance, and downstream tasks, showcasing a pragmatic and scalable approach to representation learning in production. research ir
Mar 5, 2025
Beyond Relevance: Optimizing for Multiple Objectives in Search and Recommendations Building effective recommendation and search systems means going beyond simply predicting relevance. Modern users expect personalized experiences that cater to a wide range of needs and preferences, and businesses need systems that align with their overarching goals. This requires optimizing for multiple objectives simultaneously – a complex challenge that demands a nuanced approach. This post explores the concept of value modeling and multi-objective optimization (MOO), explaining how these techniques enable the development of more sophisticated and valuable recommendation and search experiences. ir
Feb 27, 2025
Beyond Dot Products: Retrieval with Learned Similarities This paper by Bailu Ding (Microsoft) and Jiaqi Zhai (Meta), published in the proceedings of WWW '25, proposes Mixture of Logits (MoL), an approach that offers a generalized interface for learned similarity functions. It achieves state-of-the-art results across recommendation systems and question answering and also demonstrates significant latency improvements, potentially reshaping the landscape of vector databases. research ir
Jun 5, 2024
Is This the ChatGPT Moment for Recommendation Systems? Researchers at Meta recently published a ground-breaking paper that combines the technology behind ChatGPT with Recommender Systems. They show they can scale these models up to 1.5 trillion parameters and demonstrate a 12.4% increase in topline metrics in production A/B tests. We dive into the details below. research ir
Mar 1, 2023
Evaluating Recommendations: mAP, MMR, and NDCG Imagine you’re shown two ordered feeds of product recommendations from separate algorithms. In the first one (A) you’re shown: Nike sneakers, Adidas shorts, and an Apple Watch. In the second one (B) you’re shown the reverse order: Apple Watch, Adidas shorts, and Nike sneakers. -- Which feed is more relevant to you? evals
Feb 7, 2023
Evaluating Recommendations: Precision, Recall, and R-Precision Imagine you’re given three movie recommendations from each of two separate algorithms. In the first one (A) you’re given: The Terminator, James Bond, and Star Wars. In the second (B) you’re given: Cars, Toy Story, and Iron Man. -- Which recommendation is more relevant to you? evals
Sep 20, 2022
Day 2 of #RecSys2022: Our favorite 5 papers and talks It’s been another fantastic day at RecSys 2022. Following the Women in RecSys Breakfast, the day started with a keynote from Catherine D’Ignazio and continued with three sessions: Fairness & Privacy, Diversity & Novelty, and Models and Learning I. Here are our favorite 5 papers and talks. research
Jul 12, 2022
Data-Centric AI for Ranking Data quality and volume are what make ranking algorithms at big tech companies so seamless. How can you create the same experiences with the data you have? Data-centric AI may be the answer! ir


© 2026 Tullie Murrell RSS