
1. One of the reasons we created Khoj was to enable natural language search with embeddings generated offline using open-source models!

2. We don't use any vector datastores (yet). You can do a lot in memory; it's faster, and it gives exact matches (no approximate nearest-neighbor search).

Feel free to ask if you were looking for something more specific?
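The in-memory, exact-match approach described above can be sketched roughly like this (a minimal illustration, not Khoj's actual code; it assumes chunk embeddings were already computed offline with some open-source model and loaded as a NumPy array):

```python
import numpy as np

def top_k_exact(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int = 3):
    """Exact cosine-similarity search: score every chunk, no ANN index."""
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity of each chunk vs. query
    idx = np.argsort(-scores)[:k]      # best k chunks, exact (no approximation)
    return idx, scores[idx]

# Toy usage with random stand-in "embeddings"
rng = np.random.default_rng(0)
chunks = rng.normal(size=(1000, 384))             # 1000 chunks, 384-dim vectors
query = chunks[42] + 0.01 * rng.normal(size=384)  # query close to chunk 42
idx, scores = top_k_exact(query, chunks, k=3)
```

Because every chunk is scored, the results are exact; the trade-off is linear scan time, which stays cheap for corpora that fit in memory.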



Thank you! I’d love to hear more about your experiences with:

1. content / question vector mismatch

2. what types of embeddings you experimented with storing per chunk (text only? hypothetical questions? metadata?)

3. choice of embedding model (e.g. OpenAI vs. InstructorEmbeddings, or an alternative from the MTEB leaderboard)

It’s a great project; I’m going to take a deeper dig today.



