I wrote something similar at my last job, where we had to parse and query data from huge JSON files (50+ GB? I remember they didn't even fit on my laptop) that were stored in an S3 bucket.

We used a streaming parser to build a local index of the file, {JSON key: (byte offset, byte size)}, and then simply used HTTP range requests to fetch only the data we needed.
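
To make the idea concrete, here is a minimal Python sketch. It is not the json-buffet code, just an illustration under a few assumptions: the file is one valid top-level JSON object, only top-level keys are indexed, and the names `build_index`, `range_header`, and `read_value` are made up for this example. An in-memory byte string stands in for the S3 object.

```python
import io
import json

WHITESPACE = b" \t\r\n"


def build_index(stream, chunk_size=64 * 1024):
    """Scan a top-level JSON object once, chunk by chunk, and return
    {key: (byte offset, byte size)} for each top-level value.
    Assumes the input is valid JSON; this sketch does little validation."""
    index = {}
    depth = 0                      # nesting depth of {} and []
    state = "key"                  # "key" -> "colon" -> "value" at depth 1
    in_string = escaped = reading_key = False
    key_buf = bytearray()
    key = start = end = None
    pos = 0
    while chunk := stream.read(chunk_size):
        for b in chunk:
            offset, pos = pos, pos + 1
            if in_string:
                if reading_key:
                    key_buf.append(b)        # collect the raw key, quotes included
                else:
                    end = offset + 1         # string bytes belong to the value
                if escaped:
                    escaped = False
                elif b == 0x5C:              # backslash escapes the next byte
                    escaped = True
                elif b == 0x22:              # closing quote
                    in_string = False
                    if reading_key:
                        key = json.loads(bytes(key_buf))
                        reading_key, state = False, "colon"
                continue
            if b in WHITESPACE:
                continue
            if depth == 0:
                if b != 0x7B:
                    raise ValueError("expected a top-level JSON object")
                depth, state = 1, "key"
            elif state == "key":
                if b == 0x22:                # opening quote of a key
                    in_string = reading_key = True
                    key_buf = bytearray(b'"')
                elif b == 0x7D:              # "}" closes the top-level object
                    depth = 0
                else:
                    raise ValueError(f"unexpected byte at offset {offset}")
            elif state == "colon":
                if b != 0x3A:
                    raise ValueError(f"expected ':' at offset {offset}")
                state, start = "value", None
            else:                            # state == "value"
                if depth == 1 and b in b",}":
                    index[key] = (start, end - start)
                    if b == 0x2C:            # "," means another key follows
                        state = "key"
                    else:
                        depth = 0
                    continue
                if start is None:
                    start = offset
                end = offset + 1
                if b in b"{[":
                    depth += 1
                elif b in b"}]":
                    depth -= 1
                elif b == 0x22:
                    in_string = True
    return index


def range_header(offset, size):
    """Build the HTTP Range header for one value; byte ranges are inclusive."""
    return {"Range": f"bytes={offset}-{offset + size - 1}"}


def read_value(blob, index, key):
    """Stand-in for the remote fetch: slice the same bytes a range request would return."""
    offset, size = index[key]
    return json.loads(blob[offset:offset + size])


blob = b'{"users": [{"id": 1}, {"id": 2}], "meta": {"note": "a, b}"}, "count": 2}'
index = build_index(io.BytesIO(blob))
print(index)
print(range_header(*index["users"]), read_value(blob, index, "users"))
```

Against a real bucket, the slice in `read_value` would be replaced by a GET on the object with the header from `range_header`, so each lookup transfers only the bytes of the value you asked for. The index itself is small enough to keep locally even when the file is not.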

Here is the full write-up about it:

https://dinesh.cloud/2022/streaming-json-for-fun-and-profit/

And here is the open-sourced code:

https://github.com/multiversal-ventures/json-buffet