
I keep reading that GPT is just a "smarter", more complex Markov chain: in the end, a function spitting out the next word with some probability.

But from my experience that cannot be true - it has to learn somehow. Here is an easy test: tell it something that happened today and contradicts the past (I used to test this with the Qatar World Cup), then ask questions that are affected by that event, and it answers them correctly. How is that possible? How can a single sentence (the information I provide) shift the probabilities for the next token that far?



There are two kinds of knowledge at play here.

1. The knowledge baked into the model's parameters during training

2. The context of the conversation

The 'learning' you are experiencing here is due to the conversation context retaining the new facts. Historically, context windows were very short, and as the conversation continued the model would quickly forget the new facts.

More recently context windows have grown to rather massive lengths.
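A rough way to see #2 in action yourself (just a sketch, assuming a small local model such as GPT-2 via the Hugging Face transformers library; the model, prompts, and candidate words are placeholders): compare the next-token probabilities for the same question with and without the new fact prepended. The parameters are identical in both calls; only the input changes.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def next_token_probs(prompt, candidates):
        # One forward pass; read the probability of each candidate as the
        # next token after the prompt. Multi-token words are approximated
        # by their first sub-token. No weights are updated anywhere.
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        return {c: probs[tok.encode(" " + c)[0]].item() for c in candidates}

    question = "The 2022 World Cup was won by"
    fact = "New information: Argentina won the 2022 World Cup, held in Qatar. "

    print(next_token_probs(question, ["Argentina", "France", "Brazil"]))
    print(next_token_probs(fact + question, ["Argentina", "France", "Brazil"]))

Whatever the absolute numbers come out to, the second distribution is conditioned on the extra sentence; that conditioning is all the 'learning' that happens inside a conversation.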


Mostly #2 of what the other response says. The entire input to the NN is what it's "learning" from - the weights are static; you are changing the output probabilities based on which context is provided and its length.
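You can also convince yourself the weights really are frozen during generation (again only a sketch using GPT-2 and transformers; any causal LM would do):

    import hashlib
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def weight_hash(m):
        # Hash every parameter tensor; any in-place change would alter the digest.
        h = hashlib.sha256()
        for p in m.parameters():
            h.update(p.detach().cpu().numpy().tobytes())
        return h.hexdigest()

    before = weight_hash(model)
    ids = tok("Argentina won the Qatar World Cup. Who won it?", return_tensors="pt").input_ids
    model.generate(ids, max_new_tokens=20)
    after = weight_hash(model)

    print(before == after)  # True: generating text never touches the parameters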


Markov chains also depend on the initial state. In GPTs, the initial state is your conversation history.
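To make the analogy concrete: in a plain Markov chain the next-word distribution is a fixed lookup keyed on the current state, and picking a different starting state gives a different trajectory even though the table never changes. A toy sketch with a made-up transition table (purely illustrative); a GPT does the same thing, except the "state" is the whole conversation so far and the "table" is computed by the network:

    import random

    # Toy word-level Markov chain: the next word depends only on the
    # current state (the previous word). Probabilities are invented.
    transitions = {
        "cup": {"was": 0.7, "final": 0.3},
        "was": {"won": 0.6, "held": 0.4},
        "won": {"by": 1.0},
        "by":  {"Argentina": 0.5, "France": 0.5},
    }

    def generate(state, n=4):
        out = [state]
        for _ in range(n):
            if out[-1] not in transitions:
                break
            words, probs = zip(*transitions[out[-1]].items())
            out.append(random.choices(words, probs)[0])
        return " ".join(out)

    # Same fixed table, different initial states -> different continuations.
    print(generate("cup"))
    print(generate("won"))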



