We will explore how to build a product search system that leverages sentence embeddings and similartiy scoring to improve search relevance.
For this projekt, we need a lightweight model from “sentence-tansformers” library. Wyh: Because we need per Product Vector Space, that must be fast and stabil.
I Founded this “all-MiniLM-L6-v2” model, is small, efficient and maps sentences to 384-dimensional dense vector space, making it suitable for tasks like semantic search.
Let’s Start, Step 1: Setting Up the Envrioment:
First, install the necessary library:
pip install sentence-transformers
Then, import the required modules and load the model:
from sentence_transformers import SentenceTransformer, util import torch # Load the sentence transformer model model = SentenceTransformer('all-MiniLM-L6-v2') model.max_seq_length = 256 # Optional: Adjust the maximum sequence length if needed
Step 2: Generation Embeddings:
We will generate embeddings for our products descriptions. The “model.encode” method converts sentences into embeddings, which are numerical vectors.
# List of product descriptions products = [ 'A comfortable red chair for your living room.', 'A spacious green sofa for your living room.', 'A sturdy blue table for your dining area.', 'A bright yellow lamp for your study.', 'A sleek black desk for your office.', # Add more product descriptions as needed ] # Generate embeddings for the product descriptions product_embeddings = model.encode(products, normalize_embeddings=True)
Step 3: Handling Search Queries:
When a user submits a search query, we will generate an embedings for the query and compare it to the embeddings of the product descriptions.
# Example search query query = "I need a comfortable chair" # Generate embedding for the search query query_embedding = model.encode(query, normalize_embeddings=True)
Step 4: Calculation Similarity Score
With the embeddings ready, we can calculate the similarity between the query embedding and each product embedding. Important thing. We will use the “dot product to mesaure similartiy since the embeddings are normalized”.
# Calculate dot product similarity scores dot_scores = util.dot_score(query_embedding, product_embeddings)[0] # Get the top 5 most similar products top_results = torch.topk(dot_scores, k=5)
Final: Display the Results
for score, idx in zip(top_results[0], top_results[1]): print(products[idx], "(Score: {:.4f})".format(score))
What is the Dot Product?
The dot product is a way to measure how similar two vectors (list of numbers) are. If two vecetors point in the same direction, their dot product is hight, if they point in opposite directions, the dot product is low.
To visually understand the dot product search, let’s break it down step-by-step using PHP to represent the mathematical formula and the code.
Given two vectors A und B:
A = [a1, a2, a3]B = [b1, b2, b3]
The dot product is calculated as:
A · B = a1 · b1 + a2 · b2 + a3 · b3Example Calculation:
Imagine you have a search query and product descriptions represented as vectors:
Query = [0.1, 0.3, 0.4]Product 1 = [0.1, 0.3, 0.4]
Product 2 = [0.2, 0.1, 0.5]
Now, let’s see how this work in code (I’am PHP developer, i try to explain in PHP 🙂 )
<?php function dotProduct($vecA, $vecB) { $dotProduct = 0.0; for ($i = 0; $i < count($vecA); $i++) { $dotProduct += $vecA[$i] * $vecB[$i]; } return $dotProduct; } // Example embeddings for products $products = [ 'A comfortable red chair for your living room.' => [0.1, 0.3, 0.4], 'A sturdy blue table for your dining area.' => [0.2, 0.1, 0.5], 'A bright yellow lamp for your study.' => [0.4, 0.4, 0.2], ]; // Example embedding for the search query $query = "I need a comfortable chair"; $queryEmbedding = [0.1, 0.3, 0.4]; // Normally generated by a model // Calculate similarity scores $similarities = []; foreach ($products as $product => $embedding) { $similarity = dotProduct($queryEmbedding, $embedding); $similarities[$product] = $similarity; } // Sort products by similarity score arsort($similarities); // Display top results foreach ($similarities as $product => $score) { echo "Product: $product (Score: $score)\n"; }
Views: 144