Notes on recommendations, search, relevance, and building AI products.
Gowalla Dataset: Understanding Location Check-ins, Social Ties, and Mobility Patterns
The Gowalla dataset, a historical benchmark from the now-defunct location-based social network, offers rich check-in and social graph data that has powered foundational research in Point-of-Interest (POI) recommendations, human mobility modeling, and social influence on real-world behavior. Despite its age, Gowalla remains valuable for studying how time, geography, and social context shape user activity. This post explores its structure, use cases, limitations, and how to leverage it with Shaped to build context-aware recommendation models.
Tags: research
MovieLens Dataset: The Essential Benchmark for Recommender Systems
The MovieLens dataset is one of the most widely used benchmarks in recommender systems, offering real-world, explicit feedback data for evaluating collaborative filtering, content-based, and hybrid recommendation models. This article explores why MovieLens remains a gold standard, detailing its structure (ratings, metadata, tags), available versions, and common use cases. It also highlights challenges like data sparsity and cold start, and shows how to connect MovieLens to Shaped to quickly prototype and train recommendation models using real interaction data enriched with movie attributes.
Tags: research
Last.fm Datasets: Unlocking Music Recommendations Through Listening History and Social Connections
This article explores the significance of Last.fm datasets in developing music recommendation systems, highlighting their value as benchmarks for modeling implicit feedback, sequential listening behavior, and social influence. It breaks down what’s included in these datasets (such as user listening history, social graphs, and tags) and why they matter for music personalization research. It also walks through how teams can bring these datasets into Shaped to build real-time ranking models, covering schema setup, event ingestion, and optional use of tags or social data, and shows how Shaped can be used to prototype and productionize music recommenders with this real-world data.
Tags: research
See the Bigger Picture: Image Feature Engineering for Search & Recs
In visually rich digital environments, text and tags alone often fall short in powering relevant search and recommendations. This article explores how visual feature engineering, extracting embeddings from images using models like CLIP or ViT, unlocks deeper relevance by capturing visual nuance, style, and cross-modal meaning. While traditional computer vision pipelines are complex and resource-intensive, Shaped streamlines the entire process: ingesting image URLs, generating embeddings with advanced models, and integrating them into ranking APIs, all without requiring custom infrastructure. Whether automatically leveraging visuals or specifying your own Hugging Face model, Shaped makes it simple to activate image data for AI-powered personalization.
Tags: ir
Explainable Personalization: A Practical Guide for Building Trust and Transparency
Personalization helps users discover the right content, products, or experiences, but when it happens without explanation, it can feel invasive, confusing, or even manipulative. As algorithms play a larger role in shaping what we see, hear, and buy, users are beginning to ask a simple question: *Why am I seeing this?* That question isn’t just philosophical. It reflects a growing demand for transparency, control, and trust in algorithmic systems. Whether you're building a recommendation engine, a personalized feed, or a product ranking feature, explainability is becoming essential. It reassures users, supports compliance, and helps teams understand and improve their models.
Tags: ir
A Comprehensive Guide to Approximate Nearest Neighbors Algorithms
Finding the most relevant items from vast datasets is a fundamental challenge in modern machine learning applications. Whether you’re recommending movies on a streaming platform, suggesting products in an online store, or searching for similar images, the ability to quickly locate the nearest neighbor, or neighbors, to a given query point in high-dimensional spaces is critical. Traditional nearest neighbor search algorithms can identify the closest points by calculating exact distances, such as Euclidean distance, between vectors representing data points. But as datasets grow larger and more complex, especially with high-dimensional data like images, text embeddings, or user behavior patterns, exact search becomes prohibitively slow.
Tags: eng, ir
A/B Testing Rankings: Metrics That Matter
You’ve trained a model, optimized offline metrics, and picked a winner, but how do you know it’ll perform with real users? In this post, we explore why A/B testing is essential for validating personalization and ranking models in production. We cover key online metrics like CTR, CVR, and North Star Metrics, how to design statistically rigorous experiments, and how Shaped makes it easy to deploy, bucket, and measure real-world impact.
Tags: evals
The Ultimate Guide to Modern Ranking Models
People today are inundated with choices, whether browsing products, searching for information, or discovering new content. The challenge for businesses is not just to present options, but to ensure the most relevant, engaging, and valuable items rise to the top. This is where ranking models come into play. These sophisticated algorithms power the search results you see, the recommendations you receive, and the content you’re most likely to click, watch, or buy. Ranking models are at the heart of personalization and discovery in industries ranging from e-commerce and media to online marketplaces. We’ll demystify ranking models, explore their key components, and outline best practices for implementing them across various use cases.
Tags: ir
Vector Search Explained: How AI Powers Smarter Search and Recommendations
Search is undergoing a quiet transformation. As users expect instant, relevant results, whether shopping online, exploring a streaming platform, or using an AI assistant, traditional keyword search is no longer enough. Leading companies like Netflix, Amazon, and Spotify already use a different approach behind the scenes: vector search. We’ll explore how vector search powers today’s most advanced discovery and recommendation systems, looking at how it works, where it fits into modern AI infrastructure, and why it’s becoming a cornerstone of user experience across industries.
Tags: eng, ir
One Embedding to Rule Them All
Pinterest’s OmniSearchSage represents a major step forward in unified semantic search. By extending the two-tower model into a multi-task, multi-entity framework, it enables a single query embedding to power retrieval across pins, products, and related queries. The system integrates GenAI captions, user-curated board metadata, and behavioral signals to overcome sparse content, while maintaining compatibility with legacy models like PinSage. Deployed at massive scale, OmniSearchSage delivers strong gains in search fulfillment, ad performance, and downstream tasks, showcasing a pragmatic and scalable approach to representation learning in production.
Tags: research, ir