Vector Search vs Keyword Search in Study Tools
Keyword search finds exact words; vector search finds nearby meaning. Strong study systems usually need both.
Site connection
Lykke uses document embeddings, hybrid search, and reranking to retrieve course evidence before generating study artifacts.
Visual model
Two signals, one ranked evidence set
The ranked evidence demo shows why exact keyword matches and vector similarity should both influence retrieval.
Two Kinds of Relevance
Keyword search is lexical. It cares whether the same tokens appear. Vector search is semantic. It cares whether two chunks mean similar things under an embedding model.
Why Exact Matching Still Matters
A vector model may understand that 'midterm review' and 'exam prep' are related, but it can blur exact names. If a student asks for 'CS 111 Project 2 rubric,' lexical matching should strongly preserve those terms.
Full-text indexes such as SQLite FTS5 are built to find documents containing specific terms efficiently. For course codes, assignment names, and other exact identifiers, that capability remains essential.
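To make the exact-match case concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an FTS5 virtual table. It assumes the local SQLite build has FTS5 enabled (most do, but not all). The table name `chunks`, the file paths, and the sample text are invented for illustration and are not taken from any real course.

```python
import sqlite3

# In-memory database; assumes the bundled SQLite was compiled with FTS5.
conn = sqlite3.connect(":memory:")

# `source` is stored but not indexed, so it acts as a label that travels with each hit.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source UNINDEXED, body)")
conn.executemany(
    "INSERT INTO chunks (source, body) VALUES (?, ?)",
    [
        ("cs111/project2_rubric.md", "CS 111 Project 2 rubric: correctness 60 points, style 40 points."),
        ("cs111/project1_rubric.md", "CS 111 Project 1 rubric: correctness 70 points, style 30 points."),
        ("cs111/midterm_review.md", "Midterm review covering recursion and sorting."),
    ],
)


def keyword_search(conn, match_query, limit=5):
    """Return source labels for chunks matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT source FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (match_query, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Quoted phrases require the tokens to appear adjacent and in order.
hits = keyword_search(conn, '"CS 111" AND "Project 2" AND rubric')
print(hits)
```

The phrase `"Project 2"` excludes the Project 1 rubric even though the two documents share almost every other word, which is the kind of distinction a purely semantic ranker can blur.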
Why Semantic Search Matters
Students rarely ask in the same words used by a slide deck. A lecture may say 'gradient-based optimization' while the student asks 'how does the model learn from error?' Embeddings can connect those meanings.
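The comparison behind that connection is usually a similarity score between embedding vectors, most often cosine similarity. The sketch below uses tiny hand-written vectors as stand-ins for real embeddings. The numbers are made up purely to show the ranking step and do not come from any actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; a real model produces hundreds of dimensions.
query_vec = [0.9, 0.1, 0.2]  # "how does the model learn from error?"
chunk_vecs = {
    "lecture07: gradient-based optimization": [0.8, 0.2, 0.1],
    "syllabus: office hours": [0.1, 0.9, 0.3],
}

# Rank chunks by similarity to the query, highest first.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked[0])
```

The query and the lecture chunk share no words, yet their vectors point in nearly the same direction, so the lecture chunk ranks first.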
A strong pipeline retrieves both lexical and semantic candidates, merges and deduplicates them, reranks the merged set, and keeps source labels attached to every chunk.
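One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below. This is an illustrative choice, not a description of how any particular product merges results. The constant `k=60` is a conventional default, and the chunk identifiers are hypothetical source labels.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of chunk ids into one list, best first.

    Each list contributes 1 / (k + rank) per chunk; a chunk found by several
    retrievers accumulates score and appears once in the output.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, chunk_id in enumerate(ranking, start=1):
            if chunk_id in seen:  # ignore repeats within a single list
                continue
            seen.add(chunk_id)
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for the query "CS 111 Project 2 rubric".
keyword_hits = ["proj2_rubric", "proj2_spec", "syllabus"]
vector_hits = ["grading_policy", "proj2_rubric", "proj1_rubric"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because the identifiers double as source labels, the fused list can be passed to a reranker without losing track of where each chunk came from. The chunk that both retrievers found rises to the top and appears only once.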
| Query type | Best first signal |
|---|---|
| Exact assignment name | Keyword |
| Vague conceptual question | Vector |
| Formula with notation | Keyword plus metadata |
| Study-plan request | Vector plus calendar metadata |
| Acronym or abbreviation | Keyword |
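A few rows of the table can be approximated with a simple heuristic that decides which signal to weight first. The regular expressions below are rough assumptions about what course codes, assignment names, and acronyms look like. They do not cover the formula or study-plan rows, and a hybrid pipeline would still run both retrievers.

```python
import re

# Hypothetical patterns; real course catalogs will need their own rules.
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\s?\d{3}\b")  # e.g. "CS 111"
ASSIGNMENT = re.compile(
    r"\b(project|homework|hw|lab|quiz|assignment)\s*\d+\b", re.IGNORECASE
)  # e.g. "Project 2"
ACRONYM = re.compile(r"\b[A-Z]{2,6}\b")  # e.g. "SGD"


def first_signal(query):
    """Return 'keyword' when the query contains exact identifiers, else 'vector'."""
    if COURSE_CODE.search(query) or ASSIGNMENT.search(query) or ACRONYM.search(query):
        return "keyword"
    return "vector"


print(first_signal("CS 111 Project 2 rubric"))
print(first_signal("how does the model learn from error?"))
```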
Common Pitfalls
- Using only vector search and missing exact course identifiers.
- Using only keyword search and missing paraphrases.
- Merging rankings without deduplication.
- Letting semantically similar but stale course content outrank the current assignment.
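The last pitfall can be reduced with a metadata check applied after retrieval. The sketch below assumes each candidate carries a `term` field and a relevance `score`. Both field names, the sample paths, and the `penalty` value are hypothetical and would need tuning against real course data.

```python
def prefer_current_term(candidates, current_term, penalty=0.5):
    """Down-weight candidates from other terms, then sort by adjusted score."""
    rescored = []
    for cand in candidates:
        factor = 1.0 if cand["term"] == current_term else penalty
        rescored.append({**cand, "score": cand["score"] * factor})
    return sorted(rescored, key=lambda c: c["score"], reverse=True)


# Hypothetical candidates: last year's rubric scores slightly higher on similarity alone.
candidates = [
    {"source": "2023-fall/project2_rubric.pdf", "term": "2023-fall", "score": 0.92},
    {"source": "2024-spring/project2_rubric.pdf", "term": "2024-spring", "score": 0.88},
]

reranked = prefer_current_term(candidates, current_term="2024-spring")
print(reranked[0]["source"])
```

A penalty keeps older material available as a fallback. A hard filter on `term` is the stricter alternative when stale content should never surface.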
Quick check
Why is hybrid search useful in course tools?
- It combines exact terms with semantic similarity
- It avoids all indexing
- It removes source metadata
- It only works for images
Course questions often contain both exact identifiers and fuzzy intent.