It only takes a few thousand lines (easily less than 10k even with zero dependencies and no standard library) to implement QUIC.
Kernel management of transport protocols has zero actual benefit for latency or throughput given proper network stack design. Neither does hardware offload except for crypto offload. Claimed differences are just due to poor network stack design and poor protocol implementation.
Not fully standards compliant since I skipped some irrelevant details like bidirectional streams when I can just make a pair of unidirectional streams, but handles all of the core connection setup and transport logic. It is not actually that complicated. And just to get ahead of it, performance is perfectly comparable.
FWIW, quic-go, a fully-featured implementation in Go used by the Caddy web server, is 36k lines in total (28k SLoC), excluding tests. Not quite 10k, but closer to that than to your figure.