
For me the test is: when will a Siri-LLM be able to run locally on my iPhone at GPT-4 levels or better? 2030? Farther out? Never, because governments forbid it? To what extent will improvements be driven by the last gasps of Moore's Law vs by improving model architectures to be more efficient?


Given that phones are a few years behind PCs on RAM, likely whenever the average PC can do it, plus a few years. There are phones out there with 24GB of RAM already, it looks like.

Of course, battery life would be a concern there, so I think LLM inference for phones will remain in the cloud.

Haven't studied phone RAM capacity growth rates in detail, though.


That's for LLMs, but at the same time there are other types of models coming out.

Wouldn't be surprised if, within the next couple of years, we get small models that can run locally on a phone and just retrieve data from the network as needed (without sending your data out).


Wonder if someone is thinking of LLM-specific RAM: slower but much denser. Bonus points for not having to reload the model after power cycling.

Maybe call this fantastic technology something idiotic like 3D XPoint?


> slower but much denser. Bonus points for not having to reload the model after power cycling.

This is called a solid state drive.


Goes to show how badly Intel executed that one.


What? You can do this right now. Put your >100GB model on an SSD in a computer with <100GB of RAM and use mmap. It's not fast, but it runs.
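
A minimal sketch of that approach in C (the file name is made up for illustration): map the weights file read-only and let the OS page it in from the SSD as it's touched, so the whole model never has to fit in RAM at once.

    /* Sketch only: mmap a large weights file and read from it on demand.
       "model.gguf" is a hypothetical file name. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("model.gguf", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* Map the whole file read-only; pages are faulted in lazily from disk,
           so this works even when the file is larger than physical RAM. */
        unsigned char *weights = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (weights == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching any offset triggers a page fault that reads from the SSD. */
        printf("first byte: 0x%02x, size: %lld bytes\n",
               weights[0], (long long)st.st_size);

        munmap(weights, st.st_size);
        close(fd);
        return 0;
    }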


My point is Intel had the perfect tech for this and killed it.

https://en.wikipedia.org/wiki/3D_XPoint


They didn't, really. What this wants is gobs of memory bandwidth. The fastest NVMe SSDs can essentially saturate the PCIe bus, and using a dozen or more of them in parallel might even have reasonable performance for this. (Most desktops don't have that many PCIe lanes, but HEDT and servers do.) And they're a lot cheaper than Optane was.

To do better than that would have required the version of Optane that used DIMM slots, which was something like a quarter of the performance of actual DRAM for half the price.

So you had something that cost more than ordinary SSDs if your priority was cost, and was slower than DRAM if your priority was performance. A middle ground like that is often still valuable, but since cache hierarchies are a thing, having a bit of fast DRAM and a lot of cheap SSD serves that part of the market well too.

And in the meantime ordinary SSDs got faster and cheaper and DRAM got faster and cheaper. Now you can get older systems with previous generation DRAM that are faster than Optane for less money. They stopped making it because people stopped buying it.
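
Rough numbers for the parallel-NVMe idea (all figures below are ballpark assumptions, not benchmarks):

    /* Back-of-envelope: aggregate read bandwidth of N NVMe drives vs. DRAM.
       Per-drive and DRAM figures are rough assumptions. */
    #include <stdio.h>

    int main(void) {
        int drives = 12;
        double per_drive_gbs = 7.0;    /* ~PCIe 4.0 x4 NVMe sequential read (assumed) */
        int lanes_per_drive = 4;
        double dram_gbs = 85.0;        /* ~dual-channel DDR5 (assumed) */

        printf("aggregate NVMe: ~%.0f GB/s, needing %d PCIe lanes\n",
               drives * per_drive_gbs, drives * lanes_per_drive);
        printf("dual-channel DDR5: ~%.0f GB/s\n", dram_gbs);
        return 0;
    }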


The problem with that is that LLM speed is mostly bottlenecked by memory bandwidth. Slower RAM means worse performance.
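
To put rough numbers on that: decoding one token streams essentially all the active weights through memory once, so bandwidth divided by model size gives an upper bound on tokens per second. The sizes and bandwidths below are assumptions, not measurements.

    /* Rough upper bound: tokens/sec ≈ memory bandwidth / bytes read per token.
       Model size and bandwidth figures are illustrative assumptions. */
    #include <stdio.h>

    int main(void) {
        double model_gb = 4.0;  /* e.g. a ~7B-parameter model at 4-bit quantization (assumed) */
        struct { const char *name; double gbs; } mem[] = {
            { "fast NVMe SSD",  8.0 },   /* assumed */
            { "phone LPDDR5",  50.0 },   /* assumed */
            { "desktop DDR5",  85.0 },   /* assumed */
        };
        for (int i = 0; i < 3; i++)
            printf("%-15s ~%.0f tokens/sec upper bound\n",
                   mem[i].name, mem[i].gbs / model_gb);
        return 0;
    }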


Apple is already training their own LLM to rival GPT-4, so I doubt it will take that long.


> vs by improving model architectures to be more efficient?

Or data quality: you get more from small models if you use high-quality data.



