BM25 is the workhorse of search; vectors are its visionary cousin

nicolay-ai · on Nov 20, 2024

Vector search is more precise and effective for semantic similarity, but its operational cost and memory requirements make it prohibitive for massive datasets such as GitHub’s corpus of more than 100 billion documents.

BM25’s scaling challenges (e.g., reliance on disk IOPS) are manageable compared to the memory-bound nature of vector indexes such as HNSW and IVF.
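
To put a rough number on the memory-bound claim, here is a back-of-envelope sketch. Only the 100 billion document count comes from the post; everything else is an assumption chosen for illustration: one 768-dimensional float32 embedding per document, and an HNSW base layer that keeps up to 2*M neighbour links per vector with M = 16 and 4-byte link IDs. The helper names `raw_vector_bytes` and `hnsw_link_bytes` are hypothetical, and upper HNSW layers, replication, and metadata are ignored.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the uncompressed embeddings (float32 by default)."""
    return num_vectors * dims * bytes_per_value


def hnsw_link_bytes(num_vectors: int, m: int = 16, bytes_per_link: int = 4) -> int:
    """Rough base-layer graph overhead, assuming up to 2*M links per vector.

    Upper layers are ignored, so this is a simplified estimate of link storage,
    not an exact figure for any particular library.
    """
    return num_vectors * 2 * m * bytes_per_link


NUM_DOCS = 100_000_000_000  # 100 billion documents, the figure cited in the post
DIMS = 768                  # assumed embedding size, not stated in the post

vectors_tb = raw_vector_bytes(NUM_DOCS, DIMS) / 1e12
links_tb = hnsw_link_bytes(NUM_DOCS) / 1e12

print(f"raw vectors: {vectors_tb:.1f} TB")  # raw vectors: 307.2 TB
print(f"HNSW links:  {links_tb:.1f} TB")    # HNSW links:  12.8 TB
```

Under these assumptions the embeddings alone come to roughly 307 TB before any graph structure is added, and an HNSW index generally wants that data resident in RAM to keep query latency low. Quantization or a smaller embedding would shrink the total, but a BM25 inverted index can stay on disk and be paged in on demand.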