I wonder, if something like this were trained on Wikipedia, could it become a reliable local Wikipedia search engine, basically?


I don't think so. Training on documents is not a great way of building a search engine for the information in those documents, because the training process mixes all of that information together in ways that detach the individual words from the source documents they came from.

As usual, if you want an LLM to be able to help search a corpus of text the best way to achieve that is to teach it how to use a search tool against that text.


> the best way to achieve that is to teach it how to use a search tool against that text.

Any examples of this?


I've seen this called "agentic RAG" by some people. The easiest way to get a local demo is with Claude Code or Codex CLI. They know how to use grep, so you can set them loose on a folder full of text files and tell them to use grep to answer questions - it can work really well.
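A minimal sketch of what the search-tool side looks like, assuming a hypothetical `grep_notes` helper that you'd register as a tool with whatever agent framework you're using (the tool-registration and LLM-loop parts vary by framework and are omitted here):

```python
import re
from pathlib import Path

def grep_notes(pattern: str, folder: str, max_hits: int = 20) -> list[str]:
    """Case-insensitive grep over a folder of .txt files.

    Returns matches formatted like `grep -rin`: "filename:lineno: line",
    capped at max_hits so the tool output stays small enough to feed
    back into the model's context window.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits: list[str] = []
    for path in sorted(Path(folder).rglob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if rx.search(line):
                hits.append(f"{path.name}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The model then calls the tool with query terms it chooses, reads the matching lines, and decides whether to refine the search or answer - that loop is what makes it "agentic" rather than classic single-shot RAG.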

I just tried this in "claude --dangerously-skip-permissions":

> Use Python and AppleScript to find Apple Notes that mention UPS

... and fell down a rabbit hole of optimizations because my Notes collection is HUGE, but it got there in the end!



