Building an E-commerce Search Engine with Tantivy (Rust)

Published 2026-03-30 15:40:16

Introduction

Search is the backbone of any e-commerce platform. Users expect fast, relevant, and typo-tolerant results when searching for products. In the Rust ecosystem, Tantivy is a powerful full-text search engine library inspired by Apache Lucene.

In this article, we’ll build a production-grade search system for e-commerce data using Rust and Tantivy.


What is Tantivy?

Tantivy is a high-performance, full-text search engine library written in Rust.

🔑 Features

  • Full-text search with ranking (BM25)

  • Schema-based indexing

  • Fast and memory-efficient

  • Tokenization and text analysis

  • Faceting and filtering support


🛒 Use Case: E-commerce Product Search

We want to support:

  • Search by product name

  • Filter by category

  • Sort by price or relevance

  • Handle typos (“iphnoe” → “iphone”)


📦 Step 1: Setup Project

cargo new ecommerce-search
cd ecommerce-search

Add dependencies:

[dependencies]
tantivy = "0.21"
serde = { version = "1", features = ["derive"] }

🧱 Step 2: Define Schema

Tantivy requires a schema to define searchable fields.

use tantivy::schema::*;

let mut schema_builder = Schema::builder();

let id = schema_builder.add_u64_field("id", STORED);
let name = schema_builder.add_text_field("name", TEXT | STORED);
let description = schema_builder.add_text_field("description", TEXT);
let category = schema_builder.add_text_field("category", STRING | STORED);
// FAST adds a columnar copy of the value, which sorting by price requires.
let price = schema_builder.add_f64_field("price", STORED | FAST);

let schema = schema_builder.build();

🧠 Field Types Explained

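  • TEXT → tokenized and indexed for full-text search

  • STRING → indexed as a single untokenized value (exact, case-sensitive match; good for filtering)

  • STORED → the original value is kept so it can be returned in results

  • FAST → columnar per-document values, needed for sorting by a field (Step 10) and for aggregations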


📥 Step 3: Index Product Data

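use tantivy::{doc, Index};

// create_in_dir expects an existing directory that does not already contain an index.
std::fs::create_dir_all("./index")?;
let index = Index::create_in_dir("./index", schema.clone())?;
// 50 MB memory budget, shared by the indexing threads.
let mut writer = index.writer(50_000_000)?;

writer.add_document(doc!(
    id => 1u64, // `id` is a u64 field, so the literal needs an explicit type
    name => "iPhone 14",
    description => "Latest Apple smartphone",
    category => "electronics",
    price => 999.0
))?; // add_document returns a Result in recent Tantivy versions

// Documents become searchable only after a commit.
writer.commit()?;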

🔍 Step 4: Searching Products

use tantivy::query::QueryParser;

let reader = index.reader()?;
let searcher = reader.searcher();

let query_parser = QueryParser::for_index(&index, vec![name, description]);

let query = query_parser.parse_query("iphone")?;

let top_docs = searcher.search(&query, &tantivy::collector::TopDocs::with_limit(10))?;

📊 Step 5: Retrieve Results

for (_score, doc_address) in top_docs {
    // Only STORED fields (id, name, category, price) appear in the output.
    let retrieved = searcher.doc(doc_address)?;
    println!("{}", schema.to_json(&retrieved));
}

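⚡ Step 6: Add Filters (Category)

Because category is indexed as a STRING field, the query parser can match its exact (case-sensitive) value alongside the full-text terms:

let query = query_parser.parse_query("iphone AND category:electronics")?;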

🧠 Step 7: Ranking (BM25)

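Tantivy scores full-text matches with BM25 by default. A document ranks higher when:

  • it contains the query terms more often (with diminishing returns for repeats)

  • the matched terms are rare across the index (inverse document frequency)

  • the matching field is short relative to the average length (length normalization)

👉 No extra configuration is needed; sensible relevance ranking works out of the box.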
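To make those three factors concrete, here is a minimal sketch of the classic BM25 score for a single query term. It assumes the commonly used defaults k1 = 1.2 and b = 0.75 (to our understanding also Tantivy's built-in values) and the Lucene-style IDF; the function names are illustrative, not Tantivy's API, and Tantivy itself works with lossily encoded field lengths.

```rust
/// Inverse document frequency: terms found in fewer documents weigh more.
fn idf(num_docs: f64, docs_with_term: f64) -> f64 {
    (1.0 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}

/// BM25 contribution of one term to one document's score.
/// `tf`: occurrences of the term in the field;
/// `doc_len` / `avg_doc_len`: field length in tokens vs. the index average.
fn bm25_term(tf: f64, doc_len: f64, avg_doc_len: f64, num_docs: f64, docs_with_term: f64) -> f64 {
    const K1: f64 = 1.2; // controls how quickly repeated terms stop adding score
    const B: f64 = 0.75; // controls how strongly long fields are penalized
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf(num_docs, docs_with_term) * (tf * (K1 + 1.0)) / (tf + norm)
}

fn main() {
    // 1,000 products; "iphone" appears in 10 of them, "phone" in 300.
    let rare = bm25_term(1.0, 4.0, 4.0, 1000.0, 10.0);
    let common = bm25_term(1.0, 4.0, 4.0, 1000.0, 300.0);
    println!("iphone: {rare:.3}, phone: {common:.3}");
}
```

The query's total score is the sum of these per-term contributions over the matched terms.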


🔤 Step 8: Tokenization & Text Analysis

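Tantivy lets you build a custom text analyzer: a tokenizer followed by a chain of token filters. Since Tantivy 0.20 (so also in the 0.21 used here), analyzers are assembled with a builder. Tantivy also ships a ready-made English stemming analyzer registered as "en_stem".

use tantivy::tokenizer::*;

let en_stem = TextAnalyzer::builder(SimpleTokenizer::default())
    .filter(LowerCaser)
    .filter(Stemmer::new(Language::English))
    .build();

// A text field uses it when its TextFieldIndexing calls set_tokenizer("custom_en_stem").
index.tokenizers().register("custom_en_stem", en_stem);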

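👉 Because the field's analyzer is applied to both indexed text and query text, word variants reduce to a common stem and match each other: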

  • “running” → “run”

  • “phones” → “phone”


🧪 Step 9: Typo Tolerance (Fuzzy Search)

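use tantivy::{query::FuzzyTermQuery, Term};
let query = FuzzyTermQuery::new(Term::from_field_text(name, "iphnoe"), 1, true);

👉 A maximum edit distance of 1 (third argument), with a swap of adjacent characters counted as one edit (fourth argument), enables: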

  • “iphnoe” → “iphone”
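FuzzyTermQuery matches indexed terms within a bounded edit distance of the query term, and its `transposition_cost_one` flag decides whether swapping two adjacent characters counts as one edit or two. The std-only sketch below computes the optimal-string-alignment distance to show why that flag matters for "iphnoe". `edit_distance` is a hypothetical helper for illustration; Tantivy itself does not compare strings pairwise like this but walks its term dictionary with a Levenshtein automaton.

```rust
/// Optimal-string-alignment distance: Levenshtein distance (insert, delete,
/// substitute = 1 edit each), plus, when `transpositions` is true, swapping two
/// adjacent characters also costs 1 edit.
fn edit_distance(a: &str, b: &str, transpositions: bool) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // d[i][j] = distance between the first i chars of `a` and the first j of `b`.
    let mut d = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for (i, row) in d.iter_mut().enumerate() {
        row[0] = i;
    }
    for (j, cell) in d[0].iter_mut().enumerate() {
        *cell = j;
    }
    for i in 1..=a.len() {
        for j in 1..=b.len() {
            let cost = usize::from(a[i - 1] != b[j - 1]);
            let mut best = (d[i - 1][j] + 1) // delete from `a`
                .min(d[i][j - 1] + 1) // insert into `a`
                .min(d[i - 1][j - 1] + cost); // substitute (or keep a match)
            if transpositions && i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1); // swap adjacent characters
            }
            d[i][j] = best;
        }
    }
    d[a.len()][b.len()]
}

fn main() {
    println!("with transpositions:    {}", edit_distance("iphnoe", "iphone", true));
    println!("without transpositions: {}", edit_distance("iphnoe", "iphone", false));
}
```

So a distance-1 fuzzy query only reaches "iphone" when transpositions cost one edit; with that flag off you would need distance 2, which also admits many more unrelated terms.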


📈 Step 10: Sorting by Price

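use tantivy::collector::TopDocs;
use tantivy::{DocAddress, Order};

// Requires `price` to be declared FAST in the schema; each hit comes back with its price.
let top_docs: Vec<(f64, DocAddress)> = searcher.search(
    &query,
    &TopDocs::with_limit(10).order_by_fast_field("price", Order::Asc),
)?;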

🏗️ Production Architecture

1. Indexing Pipeline

  • Ingest product data (DB → Tantivy)

  • Batch indexing

  • Periodic commits


2. Search API Layer

Use a web framework (like Axum):

  • /search?q=iphone

  • /search?q=phone&category=electronics
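The handler behind these endpoints mostly has to turn URL parameters into something the QueryParser understands. A std-only sketch, assuming the parameters have already been extracted (for example with Axum's `Query` extractor) and reusing the `category:` syntax from Step 6; `build_query_string` is a hypothetical helper. The category value is quoted so that values containing spaces stay a single term on the STRING field.

```rust
/// Hypothetical helper: turns the `q` and optional `category` parameters of
/// `/search` into a query string for Tantivy's QueryParser.
fn build_query_string(q: &str, category: Option<&str>) -> String {
    let text = q.trim();
    // Drop empty categories; strip quotes so the value cannot break the syntax.
    let filter = category
        .map(str::trim)
        .filter(|c| !c.is_empty())
        .map(|c| format!("category:\"{}\"", c.replace('"', "")));
    match (text.is_empty(), filter) {
        (false, Some(f)) => format!("({text}) AND {f}"),
        (true, Some(f)) => f,
        (_, None) => text.to_string(),
    }
}

fn main() {
    println!("{}", build_query_string("iphone", None));
    println!("{}", build_query_string("phone", Some("electronics")));
}
```

The parser can still reject malformed user input, so the handler should map a parse error to a client error response rather than a crash.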


3. Caching Layer

  • Cache popular queries

  • Use Redis or in-memory cache
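For the in-memory option, a least-recently-used (LRU) cache keyed by the normalized query string is a common choice. Below is a std-only sketch; `QueryCache` is a hypothetical type that stores result ids, with an O(n) recency update that is fine for small capacities. It is not thread-safe (a web server would wrap it in a `Mutex`), and it should be cleared after each index commit so cached results do not go stale.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache: key = normalized query, value = matching product ids.
struct QueryCache {
    capacity: usize,
    map: HashMap<String, Vec<u64>>,
    order: VecDeque<String>, // front = least recently used
}

impl QueryCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<&Vec<u64>> {
        if self.map.contains_key(key) {
            self.touch(key); // a hit makes the entry most recently used
        }
        self.map.get(key)
    }

    fn put(&mut self, key: String, value: Vec<u64>) {
        if self.map.insert(key.clone(), value).is_some() {
            self.touch(&key); // existing key: value replaced, refresh recency
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
    }

    /// Drop everything, e.g. right after the index writer commits.
    fn clear(&mut self) {
        self.map.clear();
        self.order.clear();
    }

    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}

fn main() {
    let mut cache = QueryCache::new(2);
    cache.put("iphone".to_string(), vec![1]);
    println!("{:?}", cache.get("iphone"));
    cache.clear();
    println!("{:?}", cache.get("iphone"));
}
```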


4. Re-ranking Layer (Advanced)

Combine:

  • Text relevance (BM25)

  • Business signals (sales, ratings)
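One simple way to combine them is a weighted sum applied to the top hits that Tantivy returns by BM25. The sketch below is std-only; the `Hit` fields and the weights are hypothetical and would need tuning (for example with A/B tests). Sales go through `ln(1 + sales)` so that best-sellers cannot swamp text relevance.

```rust
/// Hypothetical hit: BM25 score from Tantivy plus business signals.
struct Hit {
    id: u64,
    bm25: f32,
    sales_last_30d: u32,
    rating: f32, // e.g. 0.0..=5.0
}

/// Illustrative blend of text relevance and business signals.
fn final_score(h: &Hit) -> f32 {
    const W_TEXT: f32 = 1.0;
    const W_SALES: f32 = 0.3;
    const W_RATING: f32 = 0.2;
    W_TEXT * h.bm25 + W_SALES * (1.0 + h.sales_last_30d as f32).ln() + W_RATING * h.rating
}

/// Re-orders hits by the blended score, highest first, and returns their ids.
fn rerank(mut hits: Vec<Hit>) -> Vec<u64> {
    hits.sort_by(|a, b| final_score(b).total_cmp(&final_score(a)));
    hits.into_iter().map(|h| h.id).collect()
}

fn main() {
    let hits = vec![
        Hit { id: 1, bm25: 5.0, sales_last_30d: 0, rating: 4.0 },
        Hit { id: 2, bm25: 5.0, sales_last_30d: 500, rating: 4.0 },
    ];
    println!("{:?}", rerank(hits));
}
```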


🔑 Advanced Features

Faceted Search

  • Filter by category, brand, price range
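Faceted navigation shows, next to the results, how many matches fall into each category or price band. The std-only sketch below computes those counts for a list of matched categories; the price-band boundaries are hypothetical. In Tantivy you would typically let a collector produce such counts over the whole result set (for example its `FacetCollector`, which works on a dedicated facet field).

```rust
use std::collections::BTreeMap;

/// Counts how many matching products fall into each category.
fn facet_counts<'a>(hit_categories: impl IntoIterator<Item = &'a str>) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for c in hit_categories {
        *counts.entry(c).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical price bands used as a facet.
fn price_bucket(price: f64) -> &'static str {
    match price {
        p if p < 100.0 => "under $100",
        p if p < 500.0 => "$100 to $499.99",
        _ => "$500+",
    }
}

fn main() {
    println!("{:?}", facet_counts(vec!["electronics", "accessories", "electronics"]));
    println!("{}", price_bucket(999.0));
}
```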

Autocomplete

  • Prefix queries
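Prefix completion is cheap on a sorted term dictionary, because all terms sharing a prefix are contiguous. The std-only sketch below shows that range scan over a sorted slice; `complete` is a hypothetical helper. In Tantivy, one option for the same effect is a RegexQuery with a pattern such as `iph.*` on the indexed field.

```rust
/// Returns up to `limit` terms from a sorted, deduplicated list that start
/// with `prefix`: binary-search the first term >= prefix, then scan forward.
fn complete<'a>(sorted_terms: &'a [&'a str], prefix: &str, limit: usize) -> Vec<&'a str> {
    let start = sorted_terms.partition_point(|t| *t < prefix);
    sorted_terms[start..]
        .iter()
        .take_while(|t| t.starts_with(prefix))
        .take(limit)
        .copied()
        .collect()
}

fn main() {
    let terms = ["ipad", "iphone", "ipod", "laptop"]; // must be sorted
    println!("{:?}", complete(&terms, "ip", 10));
}
```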

Synonyms

  • “mobile” = “phone”
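A simple way to support synonyms without re-indexing is query-time expansion: each query word with known synonyms becomes an OR group before the string reaches the QueryParser. The std-only sketch below does exactly that; `expand` and the synonym map are hypothetical, and a two-way synonym needs entries in both directions.

```rust
use std::collections::HashMap;

/// Rewrites e.g. "mobile case" into "(mobile OR phone) case".
fn expand(query: &str, synonyms: &HashMap<&str, Vec<&str>>) -> String {
    query
        .split_whitespace()
        .map(|w| {
            let lower = w.to_lowercase();
            match synonyms.get(lower.as_str()) {
                Some(alts) => {
                    let mut group = vec![lower.clone()];
                    group.extend(alts.iter().map(|a| a.to_string()));
                    format!("({})", group.join(" OR "))
                }
                None => lower,
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let synonyms = HashMap::from([("mobile", vec!["phone"])]);
    println!("{}", expand("Mobile case", &synonyms));
}
```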


⚠️ Common Pitfalls

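  • Not storing fields → documents are still found, but those fields' values cannot be returned in results

  • Indexing large text fields you never search → a bigger index and more memory and disk usage

  • Committing after every document → each commit flushes and persists segments, which slows indexing; commit in batches instead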


🎯 When to Use Tantivy

Choose Tantivy if:

  • You need embedded search (no external service)

  • You want high performance in Rust

  • You want full control over indexing


🧠 Comparison

| Feature     | Tantivy   | Elasticsearch |
|-------------|-----------|---------------|
| Language    | Rust      | Java          |
| Deployment  | Embedded  | Distributed   |
| Performance | Very high | High          |
| Complexity  | Low       | High          |

Conclusion

Tantivy is a powerful and efficient choice for building search engines in Rust. With features like BM25 ranking, tokenization, and filtering, it can power real-world e-commerce search systems with excellent performance.

If you want full control over indexing and Rust-native performance without running a separate search service, Tantivy is well worth choosing.