Building a Product Search System with Sentence Embeddings and Similarity Scoring

We will explore how to build a product search system that uses sentence embeddings and similarity scoring to improve search relevance.

For this project, we need a lightweight model from the “sentence-transformers” library. Why? Because we need one vector per product, so encoding must be fast and stable.

I found the “all-MiniLM-L6-v2” model: it is small and efficient, and it maps sentences to a 384-dimensional dense vector space, which makes it suitable for tasks like semantic search.

Let’s start. Step 1: Setting Up the Environment:

First, install the necessary library:

pip install sentence-transformers

Then, import the required modules and load the model:

from sentence_transformers import SentenceTransformer, util
import torch

# Load the sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')
model.max_seq_length = 256  # Optional: Adjust the maximum sequence length if needed

Step 2: Generating Embeddings:

We will generate embeddings for our product descriptions. The “model.encode” method converts sentences into embeddings, which are numerical vectors. Passing normalize_embeddings=True scales each vector to length 1.

# List of product descriptions
products = [
    'A comfortable red chair for your living room.',
    'A spacious green sofa for your living room.',
    'A sturdy blue table for your dining area.',
    'A bright yellow lamp for your study.',
    'A sleek black desk for your office.',
    # Add more product descriptions as needed
]

# Generate embeddings for the product descriptions
product_embeddings = model.encode(products, normalize_embeddings=True)
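
As a rough illustration of what normalize_embeddings=True does, each vector is scaled so that its length becomes 1. The sketch below is plain Python and uses a hypothetical helper called l2_normalize (my own name, not a library function) on a toy two-dimensional vector instead of a real 384-dimensional embedding.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so return it unchanged.
        return list(vector)
    return [x / norm for x in vector]

raw = [3.0, 4.0]            # toy vector with length 5
unit = l2_normalize(raw)    # same direction, length 1
length = math.sqrt(sum(x * x for x in unit))

print(unit)
print(round(length, 6))
```

The direction of the vector is kept and only its length changes, which is what lets us compare embeddings by direction alone later on.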

Step 3: Handling Search Queries:

When a user submits a search query, we generate an embedding for the query and compare it to the embeddings of the product descriptions.

# Example search query
query = "I need a comfortable chair"

# Generate embedding for the search query
query_embedding = model.encode(query, normalize_embeddings=True)

Step 4: Calculating Similarity Scores

With the embeddings ready, we can calculate the similarity between the query embedding and each product embedding. One important point: we use the dot product to measure similarity because the embeddings are normalized, and for normalized vectors the dot product equals the cosine similarity.

# Calculate dot product similarity scores
dot_scores = util.dot_score(query_embedding, product_embeddings)[0]

# Get the top 5 most similar products (or fewer if the catalog is smaller)
top_results = torch.topk(dot_scores, k=min(5, len(products)))
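
To see why the dot product is enough once the vectors are normalized, here is a small plain-Python check. The helper names (dot, cosine_similarity, l2_normalize) and the toy vectors are my own assumptions for illustration and are not part of sentence-transformers.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a = [1.0, 2.0, 2.0]   # toy vectors, not real embeddings
b = [2.0, 0.0, 1.0]

cos_raw = cosine_similarity(a, b)                  # cosine on the raw vectors
dot_unit = dot(l2_normalize(a), l2_normalize(b))   # plain dot product on unit vectors

print(abs(cos_raw - dot_unit) < 1e-9)  # prints True
```

Both numbers agree up to floating-point rounding, so normalizing once at encoding time lets the search step skip the division.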

Final Step: Displaying the Results

for score, idx in zip(top_results[0], top_results[1]):
    print(products[idx], "(Score: {:.4f})".format(score))
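
For readers who want to see the ranking step without torch, here is a minimal sketch of a top-k selection in plain Python. The function name top_k and the scores are made up for illustration. Like torch.topk, it returns the values and their indices in descending order. Unlike torch.topk, it quietly clamps k to the number of scores.

```python
def top_k(scores, k):
    """Return (values, indices) of the k highest scores, best first."""
    k = min(k, len(scores))  # never ask for more results than exist
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in ranked], ranked

# Made-up similarity scores, one per product
scores = [0.32, 0.71, 0.05, 0.18, 0.11]

values, indices = top_k(scores, 3)
for score, idx in zip(values, indices):
    print("Product #{} (Score: {:.4f})".format(idx, score))
```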

What is the Dot Product?

The dot product is a way to measure how similar two vectors (lists of numbers) are. If two vectors point in the same direction, their dot product is high; if they point in opposite directions, it is low (negative).

To understand dot product search, let’s break it down step by step, first with the mathematical formula and then with PHP code.

Given two vectors A and B:

A = [a1, a2, a3]
B = [b1, b2, b3]

The dot product is calculated as:

A · B = a1 · b1 + a2 · b2 + a3 · b3
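
The formula can be sketched in a few lines of Python. The function name dot_product and the two-dimensional test vectors are my own choices for illustration. They show the three typical cases: same direction, perpendicular, and opposite direction.

```python
def dot_product(a, b):
    """Multiply matching components and add up the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

same = dot_product([1.0, 0.0], [1.0, 0.0])         # same direction
orthogonal = dot_product([1.0, 0.0], [0.0, 1.0])   # perpendicular
opposite = dot_product([1.0, 0.0], [-1.0, 0.0])    # opposite direction

print(same, orthogonal, opposite)  # prints 1.0 0.0 -1.0
```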

Example Calculation:

Imagine you have a search query and product descriptions represented as vectors:

Query = [0.1, 0.3, 0.4]
Product 1 = [0.1, 0.3, 0.4]
Product 2 = [0.2, 0.1, 0.5]
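
Worked out term by term, the query scores 0.26 against Product 1 and 0.25 against Product 2. The short Python sketch below spells out the individual products before summing them. Note that these example vectors are not normalized, so the scores are raw dot products rather than cosine similarities.

```python
query = [0.1, 0.3, 0.4]
product_1 = [0.1, 0.3, 0.4]
product_2 = [0.2, 0.1, 0.5]

# Element-wise products, i.e. the individual terms of the formula
terms_1 = [q * p for q, p in zip(query, product_1)]   # 0.01, 0.09, 0.16
terms_2 = [q * p for q, p in zip(query, product_2)]   # 0.02, 0.03, 0.20

score_1 = sum(terms_1)
score_2 = sum(terms_2)

print("Product 1: {:.2f}".format(score_1))  # prints Product 1: 0.26
print("Product 2: {:.2f}".format(score_2))  # prints Product 2: 0.25
```

Product 1 ranks first because its vector is identical to the query vector.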

Now, let’s see how this works in code (I’m a PHP developer, so I’ll try to explain it in PHP 🙂)

<?php

function dotProduct($vecA, $vecB) {
    $dotProduct = 0.0;
    for ($i = 0; $i < count($vecA); $i++) {
        $dotProduct += $vecA[$i] * $vecB[$i];
    }

    return $dotProduct;
}

// Example embeddings for products
$products = [
    'A comfortable red chair for your living room.' => [0.1, 0.3, 0.4],
    'A sturdy blue table for your dining area.' => [0.2, 0.1, 0.5],
    'A bright yellow lamp for your study.' => [0.4, 0.4, 0.2],
];

// Example embedding for the search query
$query = "I need a comfortable chair";
$queryEmbedding = [0.1, 0.3, 0.4]; // Normally generated by a model

// Calculate similarity scores
$similarities = [];
foreach ($products as $product => $embedding) {
    $similarity = dotProduct($queryEmbedding, $embedding);
    $similarities[$product] = $similarity;
}

// Sort products by similarity score
arsort($similarities);

// Display top results
foreach ($similarities as $product => $score) {
    echo "Product: $product (Score: $score)\n";
}
