Building a Product Search System with Sentence Embeddings and Similarity Scoring

We will explore how to build a product search system that uses sentence embeddings and similarity scoring to improve search relevance.

For this project, we need a lightweight model from the “sentence-transformers” library. Why? Because we need a vector for every product, and producing those vectors must be fast and stable.

I found the “all-MiniLM-L6-v2” model. It is small and efficient, and it maps sentences to a 384-dimensional dense vector space, which makes it suitable for tasks like semantic search.

Let’s start. Step 1: Setting Up the Environment

First, install the necessary library:

pip install sentence-transformers

Then, import the required modules and load the model:

from sentence_transformers import SentenceTransformer, util
import torch

# Load the sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')
model.max_seq_length = 256  # Optional: longer inputs are truncated to this many tokens (256 is already this model's default)

Step 2: Generating Embeddings

We will generate embeddings for our product descriptions. The “model.encode” method converts sentences into embeddings, which are numerical vectors; passing normalize_embeddings=True scales each vector to unit length.

# List of product descriptions
products = [
    'A comfortable red chair for your living room.',
    'A spacious green sofa for your living room.',
    'A sturdy blue table for your dining area.',
    'A bright yellow lamp for your study.',
    'A sleek black desk for your office.',
    # Add more product descriptions as needed
]

# Generate embeddings for the product descriptions
product_embeddings = model.encode(products, normalize_embeddings=True)
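Conceptually, normalize_embeddings=True rescales every embedding so that its Euclidean (L2) length is 1. The plain-Python sketch below illustrates that operation on a toy 3-dimensional vector; l2_normalize is a hypothetical helper written for this illustration, not the library's internal code, and real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math


def l2_normalize(vector):
    """Hypothetical helper: scale a vector so its Euclidean (L2) length is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


raw = [3.0, 4.0, 0.0]       # toy vector with length 5
unit = l2_normalize(raw)    # each component divided by 5
length = math.sqrt(sum(x * x for x in unit))
print(unit)                 # [0.6, 0.8, 0.0]
print(length)               # 1.0, up to floating-point rounding
```

The direction of the vector is unchanged; only its length is fixed, which is what makes the dot product comparison in Step 4 meaningful.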

Step 3: Handling Search Queries

When a user submits a search query, we generate an embedding for the query with the same model and the same normalization, and compare it to the embeddings of the product descriptions.

# Example search query
query = "I need a comfortable chair"

# Generate embedding for the search query
query_embedding = model.encode(query, normalize_embeddings=True)

Step 4: Calculating Similarity Scores

With the embeddings ready, we can calculate the similarity between the query embedding and each product embedding. One important point: because the embeddings are normalized to unit length, the dot product gives the same result as cosine similarity, so we can use it directly to measure similarity.
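Why this works: cosine similarity divides the dot product by the lengths of both vectors, and for unit-length vectors those lengths are 1. The minimal plain-Python check below uses made-up toy vectors and hypothetical helper names (dot, norm, cosine_similarity); it is an illustration, not the sentence-transformers util functions.

```python
import math


def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (norm(a) * norm(b))


query = [0.2, 0.9, 0.1]     # toy, un-normalized vectors
product = [0.4, 0.7, 0.3]

# Normalize both vectors to unit length, as normalize_embeddings=True does
query_unit = [x / norm(query) for x in query]
product_unit = [x / norm(product) for x in product]

cos = cosine_similarity(query, product)
dot_unit = dot(query_unit, product_unit)
print(cos, dot_unit)  # equal up to floating-point rounding
```

Normalizing once at encoding time means each search only needs the cheaper dot product, with no per-query division by vector lengths.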

# Calculate dot product similarity scores
dot_scores = util.dot_score(query_embedding, product_embeddings)[0]

# Get the top 5 most similar products (fewer if the catalogue is smaller)
top_results = torch.topk(dot_scores, k=min(5, len(products)))
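torch.topk returns the k largest scores together with their indices, and it raises an error if k is larger than the number of scores, so with a small catalogue k should be capped at len(products). The same selection can be sketched in plain Python with heapq.nlargest; the scores below are made-up placeholders, not real model output.

```python
import heapq

products = [
    'A comfortable red chair for your living room.',
    'A spacious green sofa for your living room.',
    'A sturdy blue table for your dining area.',
]
scores = [0.48, 0.71, 0.22]  # placeholder similarity scores, one per product

k = min(5, len(products))  # never ask for more results than there are products
# Indices of the k highest scores, best first
top = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])

for idx in top:
    print(f"{products[idx]} (Score: {scores[idx]:.4f})")
```

For the catalogue sizes in this article either approach is instant; torch.topk simply keeps the whole computation on tensors.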

Final Step: Displaying the Results

for score, idx in zip(top_results.values, top_results.indices):
    print(f"{products[int(idx)]} (Score: {score.item():.4f})")

What is the Dot Product?

The dot product is a way to measure how similar two vectors (lists of numbers) are. If two vectors point in the same direction, their dot product is high; if they point in opposite directions, it is low (negative).
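To make the "same direction / opposite direction" intuition concrete, here is a small plain-Python sketch with hand-picked 2-dimensional unit vectors (toy values, not embeddings): identical vectors score 1, perpendicular vectors score 0, and opposite vectors score -1.

```python
def dot(a, b):
    """Dot product: multiply matching components and add them up."""
    return sum(x * y for x, y in zip(a, b))


east = [1.0, 0.0]
north = [0.0, 1.0]
west = [-1.0, 0.0]

same = dot(east, east)            # same direction
perpendicular = dot(east, north)  # unrelated direction
opposite = dot(east, west)        # opposite direction
print(same, perpendicular, opposite)  # 1.0 0.0 -1.0
```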

To get an intuitive understanding of dot product search, let’s break it down step by step: first the mathematical formula, then the code, written in PHP.

Given two vectors A and B:

$$A = [a_1, a_2, a_3], \qquad B = [b_1, b_2, b_3]$$

The dot product is calculated as:

$$A \cdot B = a_1 b_1 + a_2 b_2 + a_3 b_3$$
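The same formula extends to any number of dimensions n (384 for all-MiniLM-L6-v2), and it relates to the angle θ between the two vectors through their lengths. When both vectors are normalized to length 1, the dot product is exactly the cosine of that angle, which is why Step 4 can use it directly:

$$A \cdot B = \sum_{i=1}^{n} a_i b_i = \lVert A \rVert \, \lVert B \rVert \cos\theta, \qquad \lVert A \rVert = \lVert B \rVert = 1 \;\Rightarrow\; A \cdot B = \cos\theta$$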

Example Calculation:

Imagine you have a search query and product descriptions represented as vectors:

Query = [0.1, 0.3, 0.4]
Product 1 = [0.1, 0.3, 0.4]
Product 2 = [0.2, 0.1, 0.5]
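Working the formula through for these example vectors:

$$\text{Query} \cdot \text{Product 1} = 0.1 \cdot 0.1 + 0.3 \cdot 0.3 + 0.4 \cdot 0.4 = 0.01 + 0.09 + 0.16 = 0.26$$

$$\text{Query} \cdot \text{Product 2} = 0.1 \cdot 0.2 + 0.3 \cdot 0.1 + 0.4 \cdot 0.5 = 0.02 + 0.03 + 0.20 = 0.25$$

Product 1 scores slightly higher, so it ranks first. These toy vectors are not normalized, so the raw values are dot products rather than cosine similarities.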

Now, let’s see how this works in code (I’m a PHP developer, so I’ll explain it in PHP 🙂).

<?php

function dotProduct($vecA, $vecB) {
    $dotProduct = 0.0;
    for ($i = 0; $i < count($vecA); $i++) {
        $dotProduct += $vecA[$i] * $vecB[$i];
    }

    return $dotProduct;
}
// Example embeddings for products
$products = [
    'A comfortable red chair for your living room.' => [0.1, 0.3, 0.4],
    'A sturdy blue table for your dining area.' => [0.2, 0.1, 0.5],
    'A bright yellow lamp for your study.' => [0.4, 0.4, 0.2],
];

// Example embedding for the search query
$query = "I need a comfortable chair";
$queryEmbedding = [0.1, 0.3, 0.4]; // Normally generated by a model

// Calculate similarity scores
$similarities = [];
foreach ($products as $product => $embedding) {
    $similarity = dotProduct($queryEmbedding, $embedding);
    $similarities[$product] = $similarity;
}

// Sort products by similarity score
arsort($similarities);

// Display top results
foreach ($similarities as $product => $score) {
    echo "Product: $product (Score: $score)\n";
}
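The toy vectors in the PHP example are not unit length, so their raw dot products (0.26, 0.25 and 0.24) sit very close together. As a sketch, here is the same ranking in Python with every vector L2-normalized first, mirroring what normalize_embeddings=True does in the Python pipeline above; the normalize helper is hypothetical and the vectors are the same made-up values.

```python
import math


def dot_product(vec_a, vec_b):
    """Sum of element-wise products (same formula as the PHP dotProduct)."""
    return sum(a * b for a, b in zip(vec_a, vec_b))


def normalize(vector):
    """Hypothetical helper: rescale a vector to unit (L2) length."""
    length = math.sqrt(dot_product(vector, vector))
    return [x / length for x in vector]


# Same toy embeddings as the PHP example (made-up values, not model output)
products = {
    'A comfortable red chair for your living room.': [0.1, 0.3, 0.4],
    'A sturdy blue table for your dining area.': [0.2, 0.1, 0.5],
    'A bright yellow lamp for your study.': [0.4, 0.4, 0.2],
}
query_embedding = normalize([0.1, 0.3, 0.4])

# Score every product, then sort from most to least similar (like arsort)
scores = {
    name: dot_product(query_embedding, normalize(vec))
    for name, vec in products.items()
}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"Product: {name} (Score: {score:.4f})")
```

After normalization the exact match scores 1.0 and the gaps between products widen (roughly 0.90 for the table and 0.78 for the lamp), while the ranking order stays the same for these values.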
