Isn't r/AskDocs in the corpus ChatGPT was trained on in the first place?

PrimeMcFly · on April 29, 2023

It probably used that more to figure out how to word sentences, but I'd assume it relied more on wiki or academic articles to diagnose.

sorokod · on April 29, 2023

Why would it be more probable and why would you assume "wiki" and "academic articles"?

PrimeMcFly · on April 29, 2023

Not sure why you put those things in quotes, that's kind of strange.

That aside, the training isn't blind, it's guided, and it's likely they use verified correct sources of info to train for some things, like medical diagnoses.

sorokod · on April 29, 2023

I can help with "verified correct sources": have a look at section 2.2 of "Language Models are Few-Shot Learners" [1].

You may also be interested in Appendix A of the same document: "Details of Common Crawl Filtering".

[1] https://arxiv.org/pdf/2005.14165.pdf