The speech recognition tool is impressive! I've tried many browser-local speech-to-text projects, from Ermine.ai to Google's TensorFlow.js speech toolkit (recognition only), and this is the best I've seen. And it works on a standard GPU.
You should have some text explaining what is going on and how the models are running locally on the GPU - otherwise everyone sharing the link will need to write an explanation of why they're sharing it. Otherwise, fine work.
The announcement seems somewhat disingenuous. The PR[1] found from their release notes[2] seems to contain only boilerplate and no real support for Mistral models or their weights.
LLM stands for large language model. So tiny is kind of the opposite of large. Large usually means >=7B parameters (2023). If it's less, we would just call it a language model, not a large language model.
And language model has a very specific meaning: it models text. Usually we mean an auto-regressive language model, i.e. the input is partial text, and the output is a prediction of the following text. Although there are also other kinds of language models. Language model always means text-only.
E.g., a model for speech recognition is a speech recognition model, not a language model. You might use a language model in addition here (shallow fusion etc.), but it's not necessary (end-to-end models). None of the models I see here are language models.
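To make the "auto-regressive language model" definition concrete, here is a toy sketch (my own illustration, not related to any model in the linked demo): a bigram model that takes partial text and predicts the most likely next token.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # count, for each token, how often each next token follows it
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    # auto-regressive step: given the last token of the partial text,
    # return the most frequent continuation seen in training
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

A real LLM does the same thing at scale: condition on the whole prefix (not just one token) with a neural network instead of count tables.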
It would also be nice to add a bit more detail on what kind of models we see here, how large they are, etc. E.g. for speech recognition, I see that this uses a port of Whisper to the web (https://github.com/xenova/whisper-web) based on the Transformers.js library (https://github.com/xenova/transformers.js). That uses ONNX, and the standard conversion is via Hugging Face Optimum (https://github.com/huggingface/optimum), which usually applies some dynamic quantization, i.e. compression of the model. So maybe the "tiny large" is referring to that. But I did not really find out which Whisper model this is based on, i.e. whether it is Whisper-large (which is still not too large, at 1.5B parameters).
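For intuition on what that dynamic quantization buys you, here is a toy sketch (my own illustration, not the actual Optimum/ONNX Runtime code path) of symmetric per-tensor int8 weight quantization: float32 weights are replaced by int8 values plus a single float scale, giving roughly 4x compression at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127]
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover approximate float weights for computation
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(np.abs(w - w_hat).max())   # rounding error, at most ~scale/2
print(w.nbytes // q.nbytes)      # 4x smaller (ignoring the one scale float)
```

"Dynamic" in the ONNX Runtime sense additionally means activations are quantized on the fly at inference time, with only the weights stored in int8 like above.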
I've never heard that before. I agree that the "language model" part has an accepted definition. I'd call e.g. GPT-2 an LLM and don't think anyone would bat an eye.
BERT from a year prior also makes the list at https://en.wikipedia.org/wiki/Large_language_model#List, but I think that's what the (2023) is supposed to represent: outside the few initial models from years ago, >=7B parameters is the typical expectation for the term (it actually lines up with that table extremely well).
At the same time, if you're off by less than an order of magnitude (where GPT-2 would fall if released today), I don't think anyone will be harping on 7B. Gotta leave a bit of fuzzy interpretation for the real world, as no single number is going to please everyone in all cases, but some number in the ballpark is still useful to discuss.
Ah, good question. I think I have read that statement from some other people. But the limit is kind of arbitrary. And of course, this limit will keep getting higher over time, which is why I put the year.
I think the limit should also not be much lower. We already have language models three orders of magnitude larger (>1T params), and we also call them "large", so in this context, all those single-digit-billion-parameter models feel quite small.
Similarly, when is a network "deep"? It used to mean more than 2 or 3 layers. And then there was a definition for "very deep", starting at more than 10 layers (I think Schmidhuber introduced that definition many years ago, https://arxiv.org/abs/1404.7828). Obviously, that's totally outdated now. Networks are often very deep, e.g. those large language models often have 96 layers.
I cannot make this work in Chrome on Linux. I even tried with --enable-dawn-features=allow_unsafe_apis --enable-unsafe-webgpu