2. We don't use any vector datastores (yet). You can do a lot in memory: it's faster, and it does exact matching (no KNN/approximate nearest-neighbour search).
Feel free to ask if you were looking for something more specific.
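To illustrate what "in memory, exact matches" can look like, here's a minimal sketch of brute-force retrieval over per-chunk embeddings with NumPy. All names (`chunk_vectors`, `exact_top_k`, the dimensions) are illustrative assumptions, not from the project itself:

```python
import numpy as np

# Hypothetical setup: 1000 chunks, 384-dim embeddings, unit-normalised.
rng = np.random.default_rng(0)
chunk_vectors = rng.normal(size=(1000, 384)).astype(np.float32)
chunk_vectors /= np.linalg.norm(chunk_vectors, axis=1, keepdims=True)

def exact_top_k(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force cosine similarity over all chunks -- exact, no ANN index."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = chunk_vectors @ q           # cosine similarity (vectors are unit-norm)
    return np.argsort(scores)[::-1][:k]  # indices of the k best-matching chunks

query = rng.normal(size=384).astype(np.float32)
top = exact_top_k(query, k=3)
```

At this scale a single matrix-vector product is fast enough that an approximate index buys you nothing; a dedicated vector store starts to pay off mainly at much larger corpus sizes.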
1. content/question vector mismatch
2. what types of embeddings you experimented with storing per chunk (text only? hypothetical questions? metadata?)
3. choice of embedding model (e.g. OpenAI vs. InstructorEmbeddings, or an alternative from the MTEB leaderboard)
It’s a great project, going to have a deeper dig today.