John Carreyrou's book "Bad Blood" is extremely good. Full of suspense, amazing revelations. I highly recommend it. It explores a lot of Elizabeth Holmes' and Sunny Balwani's insanity.
I was looking to set up Proxmox for my homelab soon, but this comment got me interested in Incus, mostly because I'd never heard of any Proxmox alternatives before this. You can try out Incus in your browser here: https://linuxcontainers.org/incus/try-it/
The demo does take ~10m to get into a working instance.
Are apps run through WinBoat limited to 60 Hz like regular Windows VMs? I’ve gotten too used to higher refresh rates, and 1 window being a lower rate drives me nuts!
I'm very curious as to how this works in the backend. I realize it uses Bluesky's firehose to get the posts, but I'm more curious about how it's checking whether a post contains any of the available words. Any guesses?
Hey! this is my site - it's not all that complex, i'm just using a sqlite db with two tables - one for stats, the other for all the words that's just word | count | first use | last use | post.
using this, a combo of "covered enough" for the bit and easy to use
also, since i'm tracking every word (technically a better name for this project would be The Bluesky Corpus) all inflected forms are different words, which aligns with my thinking
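For anyone curious, the two-table layout described above might look roughly like this. It's a guess, not the site's actual code: the column names, the contents of the stats table, and the idea that `post` holds the first post that used the word are all assumptions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the two tables (hypothetical names and columns)."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS words (
            word      TEXT PRIMARY KEY,
            count     INTEGER NOT NULL DEFAULT 0,
            first_use TEXT,   -- timestamp of the first sighting
            last_use  TEXT,   -- timestamp of the latest sighting
            post      TEXT    -- assumed: the first post that used the word
        );
        CREATE TABLE IF NOT EXISTS stats (
            name  TEXT PRIMARY KEY,
            value INTEGER NOT NULL
        );
        """
    )
    return db


def record_word(db, word, seen_at, post_uri):
    """Insert the word on first sighting, otherwise bump count and last_use."""
    db.execute(
        """
        INSERT INTO words (word, count, first_use, last_use, post)
        VALUES (?, 1, ?, ?, ?)
        ON CONFLICT(word) DO UPDATE SET
            count = count + 1,
            last_use = excluded.last_use
        """,
        (word, seen_at, seen_at, post_uri),
    )
```

Since every token gets its own row, inflected forms ("run", "runs", "running") naturally end up as separate entries with no extra logic.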
You can probably fit all words under 10-15MB of memory, but memory optimisations are not even needed for 250k words...
Trie data structures are memory-efficient for storing such dictionaries (2-4x better than hashmaps), although they are not as fast as hashmaps for retrieving items. You can hash the top 1k most common words and check the rest using a trie.
The most CPU-intensive task here is text tokenizing, but there are a ton of optimized options developed by orgs that work on LLMs.
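A minimal sketch of the "hash the common words, trie for the rest" idea, assuming you already have a dictionary list and a list of the most frequent words (`build_lookup` and `is_known` are made-up names). Note that a dict-of-dicts trie like this one will not show the memory savings mentioned above; those need a compact implementation. This only illustrates the lookup path.

```python
class Trie:
    """Plain character trie built from nested dicts."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker; "" can never be a character key

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return "" in node


def build_lookup(dictionary, hot_words):
    """Return a function that checks the hot set first, then the trie."""
    hot = set(hot_words)
    trie = Trie()
    for word in dictionary:
        if word not in hot:
            trie.add(word)

    def is_known(word):
        return word in hot or word in trie

    return is_known
```

Usage would be something like `is_known = build_lookup(all_words, top_1k)` and then `is_known(token)` for each token in a post.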
I very much hope that the backend uses one of the Bluesky Jetstream endpoints.
Last time I checked, subscribing only to new posts gave a stream of around 20 Mbit/s, while the firehose was ~200 Mbit/s.
Probably just a big hashtable mapping word -> the number of times it's been seen, and another hashset of all the words it hasn't seen. When a post comes in you hash all the words in it and look them up in the hashtable, increment it, and if the old value was 0 remove it from the hash set.
250k words at a generous 100 bytes per word is only 25MB of memory...
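Something like this, as a rough sketch of the hashtable-plus-hashset guess above. The tokenizer regex and the `WordTracker` name are my own assumptions, not anything from the site.

```python
import re

# Assumed tokenizer: lowercase ASCII words, allowing inner apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


class WordTracker:
    def __init__(self, dictionary):
        # word -> number of times seen, starting at 0 for every dictionary word
        self.counts = {word: 0 for word in dictionary}
        # words that have not appeared in any post yet
        self.unseen = set(self.counts)

    def ingest(self, text):
        for word in WORD_RE.findall(text.lower()):
            old = self.counts.get(word)
            if old is None:
                continue  # not a dictionary word, ignore it
            self.counts[word] = old + 1
            if old == 0:
                # first sighting: drop it from the unseen set
                self.unseen.discard(word)
```

Each incoming post is one `tracker.ingest(post_text)` call, and `len(tracker.unseen)` is how many words are still waiting to be used.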
Maybe I'm being naive, but with only ~275k words to check against, this doesn't seem like a particularly hard problem. Ingest post, split by words, check each word via some db, hashmap, etc... and update metadata.