Show HN: Tiny LLMs – Browser-based private AI models for a wide array of tasks (tinyllms.vercel.app)
142 points by bilater on Nov 16, 2023 | hide | past | favorite | 23 comments


Here is an actual LLM that runs in the browser: https://webllm.mlc.ai/#chat-demo

I can't make this work in Chrome on Linux, even with --enable-dawn-features=allow_unsafe_apis --enable-unsafe-webgpu


The speech recognition tool is impressive! I've tried many attempts at browser-local speech-to-text, from Ermine.ai to Google's TensorFlow.js speech toolkit (recognition only), and this is the best I've seen. And it works on a standard GPU.


Which one of these is an LLM? The YouTube summary one, maybe? It doesn't seem to run in the browser.


The other tools section is just extra helpful links to other stuff I've done.


None of these are LLMs. LLM means Large Language Model (for text generation, like ChatGPT).


Looks like we've hit full Kleenex on terms for models, maybe.

Even if it were a tiny LLM (and there are none on that site), it would technically be an SLM, for small language model.



You should have some text explaining what is going on and how the models run locally on the GPU; otherwise everyone sharing the link will need to write their own explanation of why they're sharing it. That aside, fine work.


Tiny Models would have been a better name. LLMs are a specific type of model that generates text. Or Tiny GenAI, to capitalize on the marketing names?


Nice! What's the text-to-video model? Also, you could try to go for a 1B LLM for the browser; it would fit.


Good idea! Mistral 1B is supported by transformers.js now, so I can add it!

The text-to-video model is a pipeline: text-to-speech plus FFmpeg magic to stitch together a video.
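A stitching step like that can be sketched as building an ffmpeg command that loops a still image over the generated speech track. This is a minimal illustration under assumed file names and flags, not the site's actual pipeline:

```python
# Sketch of one text-to-video stitching step: pair a still image with a
# generated speech track via an ffmpeg command. File names and the exact
# flag set are illustrative assumptions.

def build_stitch_command(image_path: str, audio_path: str, out_path: str) -> list[str]:
    """Return an ffmpeg argv that loops a single image over an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single image for the clip's duration
        "-i", image_path,
        "-i", audio_path,
        "-shortest",             # stop when the shorter input (the audio) ends
        "-c:v", "libx264",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",   # widest player compatibility
        out_path,
    ]

cmd = build_stitch_command("slide.png", "speech.wav", "out.mp4")
```

Running that argv with subprocess (and concatenating per-slide clips afterwards) would give the "FFmpeg magic" step described above.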


What’s Mistral 1B? I only know of Mistral 7B.


I think it might be a <1B fine-tuned model. Read about it in this release:

https://twitter.com/xenovacom/status/1722661501180256311


The announcement seems somewhat disingenuous. The PR[1] linked from their release notes[2] seems to contain only boilerplate and no real support for Mistral models or their weights.

[1]: https://github.com/xenova/transformers.js/pull/379
[2]: https://github.com/xenova/transformers.js/releases/tag/2.8.0


Love that the speech-to-text even works in Firefox on Mac, and rather well, I'd say.


This is cool but named pretty poorly. It doesn't seem to have anything to do with LLMs. TinyAIs would've been perfect.


Perfect fit. But some jokers are trying to sell tinyai.com for $120,000.


These aren't LLMs...


The name "Tiny LLMs" does not make sense.

LLM stands for large language model, so "tiny" is kind of the opposite of "large". Large usually means >=7B parameters (2023). If a model is smaller, we would just call it a language model, not a large language model.

And "language model" has a very specific meaning: it models text. Usually we mean an autoregressive language model, i.e. the input is partial text, and the output is a prediction of the following text. There are also other kinds of language models, but a language model always means text-only.

E.g., a model for speech recognition is a speech recognition model, not a language model. You might use a language model in addition here (shallow fusion, etc.), but it's not necessary (end-to-end models). None of the models I see here are language models.
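The autoregressive setup described above, partial text in, a prediction of the following text out, can be illustrated with a toy bigram model. This is a deliberately minimal sketch of the definition (a count table, not a neural network):

```python
from collections import Counter, defaultdict

# Toy autoregressive language model: a bigram count table.
# Input is partial text; output is the most likely next word.
# A minimal illustration of the definition, not a practical model.

def train_bigram(corpus: str) -> dict[str, Counter]:
    table: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def predict_next(table: dict[str, Counter], prefix: str) -> str:
    last = prefix.split()[-1]
    # Pick the highest-count continuation of the last word.
    return table[last].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

A real LLM replaces the count table with a neural network over token distributions, but the interface is the same: text prefix in, next-token prediction out.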

So, for a title here, you could maybe use "neural network models in the browser" or something similar. Or maybe "natural language models". Note that "natural language processing (NLP)" has a different meaning (https://en.wikipedia.org/wiki/Natural_language_processing) than "language model" (https://en.wikipedia.org/wiki/Language_model) and includes speech recognition. Sometimes this was also referred to as "human language technology (HLT)" (https://en.wikipedia.org/wiki/Language_technology), which would also cover all of these models.

It would also be nice to add a bit more detail on what kind of models these are, how large they are, etc. E.g. for speech recognition, I see that this uses a port of Whisper to the web (https://github.com/xenova/whisper-web) based on the Transformers.js library (https://github.com/xenova/transformers.js). That uses ONNX, and the standard conversion is via Hugging Face Optimum (https://github.com/huggingface/optimum), which usually applies some dynamic quantization, i.e. compression of the model. So maybe the "tiny" in "tiny large" is referring to that. But I could not really find out which Whisper model this is based on, i.e. whether it is Whisper-large (which is still not too large, at 1.5B parameters).


> Large usually means >=7B parameters (2023).

I've never heard that before. I agree that the "language model" part has an accepted definition. I'd call e.g. GPT-2 an LLM, and I don't think anyone would bat an eye.


BERT from a year prior also makes the list at https://en.wikipedia.org/wiki/Large_language_model#List, but I think that's what the "(2023)" is supposed to represent: outside the few initial models from years ago, >=7B parameters is the typical expectation for the term (it actually lines up with that table extremely well).

At the same time, if you're off by less than an order of magnitude (where GPT-2 would fall if released today), I don't think anyone will be harping on the 7B figure. Gotta leave a bit of fuzzy interpretation for the real world; no single number is going to please everyone in all cases, but some number in the ballpark is still useful to discuss.


Ah, good question. I think I have read that statement from some other people, but the limit is kind of arbitrary. And of course, this limit will get higher and higher over time; that's why I put the year.

I think the limit should also not be much lower. We already have language models three orders of magnitude larger (>1T params), and we also call them "large", so in this context, all those single-digit-billion-parameter models feel quite small.

Similarly, when is a network "deep"? It used to mean more than 2 or 3 layers. And then there was a definition of "very deep", starting at more than 10 layers (I think Schmidhuber introduced that definition many years ago, https://arxiv.org/abs/1404.7828). Obviously, that's totally outdated now. Networks are often very deep; those large language models often have 96 layers.


I could see "TinyLLM" meaning a method of reducing/compressing the effective runtime size of an LLM, i.e. quantization and the like.

No idea what this actually is, though.



