ARM’s Cortex A710: Winning by Default (chipsandcheese.com)
97 points by ingve on Aug 11, 2023 | 95 comments


I wish someone, anyone, would attempt to compete with Apple on single threaded performance in phone SoCs. Nobody is even trying. They all have business strategies that prioritize other things.


Huge, GPU heavy SoCs are not economical without the huge volume, software ecosystem and margins of Apple.

Smaller, higher clocked SoCs make sense for Android.

And separate CPUs+dGPUs are what users want on PCs. Intel and AMD tried to sell GPU heavy designs (eDRAM Broadwell, Vega M, Van Gogh (the Steam Deck SoC)) and PC OEMs unequivocally rejected them. And they are trying to make better cores, but again they have to balance die area and target servers and cheap consumer PCs with the same cores.


>And separate CPUs+dGPUs are what users want on PCs. Intel and AMD tried to sell GPU heavy designs (eDRAM Broadwell, Vega M, Van Gogh (the Steam Deck SoC)) and PC OEMs unequivocally rejected them.

That's because the perceived or real performance of integrated GPUs is crap compared to discrete GPUs, especially when the latter were much cheaper several years ago.

These days though, discrete GPUs have gotten very expensive on the heels of cryptomining and now "AI", and there are signs integrated GPUs might see significant improvements in performance.

If integrated GPUs can provide sufficient performance for the prices demanded, PC users will switch. They couldn't care less whether their GPU is integrated or discrete; they care whether they can run Fortnite at 500 frames per second.


I think it will always be hard to justify. The number of different configurations proliferates very quickly, so you end up spending a lot of die area etc. on parts that much of the market doesn't need.

Intel and AMD seem to have settled on sticking a relatively small (but still significant, die-wise) GPU on the chip, sufficient for 95% of consumers. They can't justify spending the die area needed for a large, powerful GPU that a relatively small segment of the market wants (and even then there are 2-3 cores from low end to high end, just for the GPU).

To achieve the same market coverage today you're looking at 2x CPU dies and 3x GPU dies, or at least 6 different combinations of CPU and GPU, and you still have loads of people who don't want the GPU at all. It just doesn't make sense to fully integrate them unless die area is so cheap that power consumption is the main limiting factor (the big justification for lots of fixed-function hardware - but that's all way smaller than a GPU).


>Intel and AMD seem to have settled on sticking a relatively small (but still significant die-wise) GPU

Intel yes, but AMD only began integrating a GPU into most of their CPU lineup with the latest Zen 4 generation.

>a large, powerful [integrated] GPU that a relatively small segment of the market wants

I think you're underselling the demand there, particularly if the M* line of CPUs with iGPUs from Apple are anything to go by.

Like I said before, people want powerful graphics processing at an affordable (FSVO affordable with regards to Apple) price.

>and you still have loads of people who don't want the GPU at all.

Intel's been doing iGPUs for well over a decade now because "loads of people" do want iGPUs.

Enthusiasts and professionals alike want them because they are a Lowest Common Denominator fallback for when shit hits the fan. People on a budget (whether monetary or electric) want them regardless of their poor performance because they need a display output, especially in recent years.

And even if you have a discrete GPU anyway, an integrated GPU is still useful to have for offloading certain tasks.

>It just doesn't make sense to fully integrate them unless die area is so cheap that power consumption is the main limiting factor

Five years ago I would have completely agreed with you, but with how bloody expensive discrete GPUs are now I feel there's a vacated market segment where integrated GPUs with sufficient performance and reasonable pricing can come in to conquer and dominate; not unlike how discrete sound cards became extinct when integrated sound cards became Good Enough(tm).


What's the disadvantage of running regular applications and processes on a GPU instead of a CPU?


They are different beasts for different tasks. Basically, GPUs are optimized for calculating lots of small things in parallel (polygons), while CPUs calculate big things serially (until the advent of multi-core, obviously, but GPUs have far more parallel pipelines).
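
A toy illustration of that split (just a Python/NumPy sketch, not anything from the comment above): work made of many independent element-wise operations maps well onto a GPU's wide parallel pipelines, while a chain where each step depends on the previous one is limited by serial, CPU-style latency.

    import numpy as np

    x = np.random.rand(1_000_000)

    # Data-parallel: every element can be processed independently, so this
    # kind of work spreads across thousands of GPU lanes (here NumPy just
    # vectorizes it on the CPU, but the shape of the work is the same).
    parallel_friendly = np.sqrt(x) * 2.0 + 1.0

    # Inherently serial: step i needs the result of step i-1, so extra
    # parallel hardware doesn't help; single-thread speed dominates.
    acc = 0.0
    for v in x[:1000]:
        acc = 0.999 * acc + v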


It's impossible to do so, due to drastic architectural differences.

It's like asking why you can't wash dishes in a washing machine intended for clothes.


For me it's more that it decouples the upgrade cycle a ton and lets me amortize keeping my PC up to date over the years. My current PC setup is worth like $8-11k depending on how you price it. But there are some parts in there still (drives) from 2008. I'd probably still have monitors from 2012, but there was a hella sale in 2019 which let me upgrade them. My 2nd GPU, which drives 4 of my 5 monitors, is still a 2080 Ti from 2017/2018ish.


> and there are signs integrated GPUs might see significant improvements in performance.

What exactly are you referring to here?


Intel is looking to integrate their Arc GPUs into their CPUs from 14th-gen Meteor Lake onwards. If those are anywhere near as powerful as their discrete counterparts, things could become interesting.


Correct me if I'm wrong, but that's basically what AMD did with their Vega lineup. I quite like AMD's mobile GPU lineup, but their desktop Vega cards got trashed by AMD's last-gen cards and weren't even visible in the Nvidia benchmarks at the time. Those cards were discontinued faster than a bat outta hell.

Intel has also done a fairly good job pushing their mobile GPUs forward, but their most powerful card is currently comparable to last-gen mobile GPUs from Nvidia. Unless Intel intends to be a TSMC customer, I'd be surprised if they could outperform AMD's attempt at that.


Why? The problem with iGPUs has nothing to do with die sharing. Consoles are based on AMD's APUs and they are competitive with discrete GPUs because the main memory is GDDR RAM.


Kind of, unless you missed all the FPS drama versus PC gaming.


The GPU in my 5700u is enough to run Valorant on high.


I think the main benefit of joining two dies together via some advanced interconnect is the latency. Memory lookups are much quicker on Apple Silicon which, in turn, avoids many wasted clock ticks waiting for data.


Are you talking about the M1 Extreme? This is extremely expensive, and I suspect the dual die GPU only works because of Metal.

TBH I prefer AMD's 7900 series approach of pushing the memory controller + cache out onto little memory controller dies. This allows for a monolithic GPU with a really wide bus, without all that wasted space on the controller/cache/pins. And those dies don't care about being split up since the memory access is interleaved anyway.


Isn’t it Pro, Max, Ultra?


thonk I can't remember, you are probably right.

There are the little, medium, and big dies, and the 2x big die.


The biggest single die is the M2 Max.

The dual-die chip is the Ultra.


"M1 Extreme? This is extremely expensive" hence the name :D

That said I think it's pro and ultra, and I can't recall which (if not both) use the magic interconnect


>And separate CPUs+dGPUs are what users want on PCs

In the age of unified memory and large AI models, that may not be the case for very long.


What are you talking about? How does a unified memory architecture change the bottlenecks of AI inferencing? The fastest AI servers in existence today only exist in heterogeneous form.


If you want, say, 70 GB of VRAM your options are either unified memory or pay $35K.


Or you can swap to DRAM and stream over PCIe. I do that all the time and it works fine, especially since you rarely need to load all 70 GB into memory at once.

If you're a serious (read: enterprise) customer, you can buy InfiniBand-enabled cards and get duplex bandwidth faster than the M2 Max's entire memory bus. 'Unified memory' isn't even a bullet point on their spec sheet; it means nothing to their customers when they have CUDA primitives that do the same thing faster at a larger scale.
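
For what it's worth, a minimal PyTorch sketch of that DRAM-plus-PCIe streaming approach (hypothetical layer sizes and count, not the commenter's actual setup): the weights live in host RAM and each layer is copied to the GPU just before it runs, so VRAM never holds the whole model.

    import torch
    import torch.nn as nn

    device = torch.device("cuda")

    # Hypothetical stand-in for a large model: a deep stack of big linear
    # layers that lives in ordinary host RAM (DRAM) instead of VRAM.
    layers = nn.ModuleList(nn.Linear(8192, 8192) for _ in range(32))

    @torch.no_grad()
    def stream_forward(x: torch.Tensor) -> torch.Tensor:
        x = x.to(device)
        for layer in layers:
            layer.to(device)   # stream this layer's weights over PCIe
            x = layer(x)
            layer.to("cpu")    # evict it so VRAM only holds one layer at a time
        return x

    out = stream_forward(torch.randn(16, 8192))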


> and get duplex bandwidth faster than the M2 Max's entire memory bus.

Doesn't seem like it. Looking at the wikipedia article for the M2 Max, that says:

    ... with up to 400 GB/s memory bandwidth.
https://en.wikipedia.org/wiki/Apple_silicon#Apple_M2_Max

Looking at the current "Nvidia networking" product page, it lists ConnectX-7 adapters with 400Gb/s total bandwidth:

https://www.nvidia.com/en-us/networking/infiniband-adapters/

So, accounting for the bit -> byte unit difference in those figures it seems like having eight ConnectX-7 adapters would roughly match up.

However, the data sheet for those adapters seems to say those 400Gb/s adapters are for PCIe Gen5 x32 slots (not the standard x16 ones).

    Host interface: PCIe Gen5, up to x32 lanes
Eight slots of x32 lanes doesn't quite seem possible, even with very latest generation AMD EPYC (9004 series) processors:

https://www.amd.com/system/files/documents/epyc-9004-series-...

Those have 128 PCIe Gen5 lanes per cpu. Dual socket systems seem to expand that out a bit, allowing up to 160x usable PCIe lanes in a server.

https://www.servethehome.com/pcie-lanes-and-bandwidth-increa...

So, at least on paper it doesn't seem possible for a single server with a bunch of PCIe Infiniband links to actually match the bandwidth of the M2 Max memory bus. Maybe 3/4 of it though, which isn't terrible. :)
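
A back-of-the-envelope restatement of that arithmetic in Python (nominal peak figures from the linked pages, assuming full-width x32 slots; a sanity check, not a real-world throughput comparison):

    # Apple M2 Max memory bandwidth vs. stacking ConnectX-7 adapters.
    m2_max_membw_GBps = 400               # per the Wikipedia figure above
    adapter_bw_Gbps = 400                 # ConnectX-7 total bandwidth per adapter
    adapter_bw_GBps = adapter_bw_Gbps / 8     # bits -> bytes: 50 GB/s per adapter

    adapters_needed = m2_max_membw_GBps / adapter_bw_GBps    # 8 adapters

    lanes_per_adapter = 32                # PCIe Gen5 x32 host interface
    lanes_needed = adapters_needed * lanes_per_adapter       # 256 lanes
    lanes_available = 160                 # usable Gen5 lanes, dual-socket EPYC 9004

    print(adapters_needed, lanes_needed, lanes_available)
    # -> 8.0 adapters and 256 lanes needed, vs. 160 lanes available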


InfiniBand is intended to be run in parallel. You could handily exceed the M2 Ultra bandwidth if you ran 4 channels of IB, but people rarely need more bandwidth than what PCIe offers. At least for AI.


> You could handily exceed the M2 Ultra bandwidth if you ran 4 channels of IB ...

Hmmm, that doesn't seem to be the case?

Those adapters are 400Gb/s "total bandwidth" each. Not 400Gb/s "each way". And you can't get 8 of those adapters into a server (with x32 links anyway).

Where's my calculation going wrong? :)


> And you can't get 8 of those adapters into a server (with x32 links anyway).

Nvidia's DGX-100 system promises over 3.2Tb/s across 10x Mellanox ConnectX-6 cards. That's their old Epyc system too, with the Grace Superchip you can get 900GB/s of card interconnect and 1TB/s of memory bandwidth. Either one could exceed the bandwidth of the M2 Max's memory.

The "each way" shtick is worth contesting, and ultimately comes down to how you use Nvidia's CUDA primitives. Agree to disagree on that - however, I think my point still stands. "Unified memory architecture" is an anachronism at that scale, literally rendered obsolete by a unified address space and a fast enough interconnect.


Cool. Yeah that's an interesting point about the DGX-100 system. I'd forgotten about those. :)

Looking at the data sheet, it seems to have a maximum of 8 single port Nvidia ConnectX-7 adapters:

https://resources.nvidia.com/en-us-dgx-systems/ai-enterprise...

Bearing in mind the unit conversions (Tb/s vs GB/s), 3.2 Tb/s matches up with the M2 Max's 400 GB/s memory bandwidth.

---

Their newly announced Grace and/or Grace Hopper "Superchip" does seem interesting. Haven't (yet) seen how it's supposed to connect to other infrastructure though.

Their whitepaper talks about "OEM-defined I/O" but doesn't (in my skimming thus far) indicate what the upper bounds are.

May look further later on, but we're pretty far into the weeds already. ;)

---

Further along in the whitepaper, it says the "NVLink Switch System" in them communicates with the network at 900GB/s "total bandwidth". If that's indeed the case, then yep, they're beating the M2 Max's memory bandwidth (400GB/s).

That even beats the M2 Ultra's memory bandwidth (800GB/s):

https://en.wikipedia.org/wiki/Apple_silicon#Apple_M2_Ultra
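
Same unit-conversion sanity check for the figures in this sub-thread (again just nominal peak numbers from the linked pages):

    dgx_fabric_Tbps = 3.2
    print(dgx_fabric_Tbps * 1000 / 8)   # 400.0 GB/s, on par with the M2 Max

    nvlink_switch_GBps = 900            # "total bandwidth" per the whitepaper
    m2_ultra_membw_GBps = 800
    print(nvlink_switch_GBps > m2_ultra_membw_GBps)   # True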


Cool story, but nobody wants 70GB of VRAM except the tiniest portion of ultra enthusiasts who already have $35k to blow anyway. Top-end GPUs are already extremely rare; Steam's hardware survey puts the RTX 4090 at 0.72%.

Running LLMs and various models locally will not require 70GB of RAM. It'll go down to manageable amounts, or no one except a tiny portion of ultra enthusiasts will run them.


> Cool story, but nobody wants 70GB of VRAM except the tiniest portion of ultra enthusiasts

I already said that may not be the case for very long. Read the entire comment next time.

Nobody but the "tiniest portion of ultra enthusiasts" ever needed more than 640K RAM but here we are.

>Running LLMs and various models locally will not require 70GB of RAM.

It will not require 70GB, it will require far more. Jevons paradox: increased efficiency opens more doors. Almost all AI models are single-modal and as small as they will ever be; we have not even trained them on more than text yet, and text is microscopic. CommonCrawl? How about training on the entire YouTube media repository. Training on every pixel and audio sample owned by Disney. Training on the 6-axis motion data of every accelerometer with a network MAC.


I'm gonna drop a big [Citation Needed] next to your claim that "it will require far more". If you're a consumer, you are not going to be running a 70GB model on your computer for regular purposes. It is too large, loading from disk takes too long, and inferencing would be impossible without a massive GPU to accelerate it. Multimodality be damned, that is just too large for any end-user, let alone Joe Shmoe on his iPhone.

If resource consumption increases in AI, that will just further bolster Nvidia's control, and the presence of the cloud for AI compute. I have no idea how you can twist this into an epic win for the M2 Ultra owners who are stuck inferencing with... checks clipboard Metal compute shaders.


When a dark raincloud is rolling in we do not need a citation to say we will be very wet soon. If a person never developed the fundamental ability to anticipate future events based on present observations, I cannot help them with that.


That's an intentionally vague answer, to the point that I need you to qualify it if you want me to take you seriously. Presently, I've been observing LLMs that retain 90% of their problem-solving ability with just 10% of the model size.

What are you seeing that makes you assume otherwise?


This kind of justification is exactly the same one blockchain fanatics have been using. Needless to say, that doesn't make your argument look very strong.


The only trait blockchain and crypto culture has in common with transformer models is the use of massive compute. Do not lump me in with cryptobros, I want absolutely nothing to do with that hive of thickheaded villainy.


>Nobody is even trying. They all have business strategies that prioritize other things.

That is not true. Not even close. This is exactly what everyone said in 2020-2021, suggesting no one would ever catch up to Apple because they are so far ahead. It came from prominent figures in the software industry / programming industry / web framework world - yes, all the software guys, who apparently have ZERO understanding of hardware. And then people blindly follow them as a source of truth.

If you actually follow the ARM Cortex-X design line and use Geekbench as a performance indicator (please don't argue about using Geekbench as a reference), you'll see performance increases with every generation, edging towards Apple's single-threaded performance. You also have to remember the lead time from design to market is roughly 3-4 years, so a lot of these designs were started before the insane internet uproar.

What was at one point a 2-2.5x difference in single-threaded performance has narrowed: the Cortex-X4, on an N4 node and coming later this year in the Snapdragon 8 Gen 3, should be within 8% of Apple's A16 on N5 at equal clock speed.

I can't remember the iso-node die size difference between the X4 and A16, but it used to be the case that ARM was only using ~60% of Apple's die size in the X2 era. I would imagine the X3 and X4 have gotten larger. But generally speaking, ARM's designs are much stricter on die size, simply because that is their potential customers' margin.

Of course, not every SoC vendor will use the maximum X4 configuration due to all sorts of constraints, especially on Android, which tends to prioritise multi-threaded scenarios. That is not a CPU design issue, but a vendor issue.

So unless the Apple A17 has another 10-15% performance jump without increasing clock speed, the ARM Cortex-X design is surprisingly close in terms of perf per watt, and yet no one is talking about it.

And the good thing about the Cortex-X designs is that they are commercially available to license for anyone. Sony, LG, Samsung, Microsoft, Google, Amazon, Facebook, Oracle, or anyone interested in fabbing their own SoC could buy the design from ARM. Heck, even Intel or AMD.


> all the software guys all apparently have ZERO understanding

Leave the ad hominem on reddit please.

> not every SoC vendor will use the maximum X4 configuration due to all sorts of constraints

> ARM's designs are much stricter on die size, simply because

... these are the exact "business strategies that prioritize other things" I was referring to. You are proving my point for me.


You are obviously arguing about products, when the post and submitted subject are, by my understanding, about CPU design. ARM doesn't make Cortex CPUs to be sold; they sell blueprints and IP only.

In any case, I proved that the current designs and products are already competing with Apple on single-core performance. If you insist that is not the case, that would be like arguing AMD and Intel haven't been competing on CPU design for the past 3 decades.


> This is exactly what everyone said in 2020-2021 and suggest no one will ever catch up to Apple because they are so far ahead.

Except Nuvia etc. happened, i.e. Apple lost a lot of the core talent that made that possible. Apple has stagnated for generations since that loss. It takes a lot of time to build a good team.


> From software industry / programming industry / web framework prominent figures, yes all the software guys all apparently have ZERO understanding about hardware.

Which websites give me a better understanding?

Anandtech, the website mentioned above, and what else?


I actually have a half finished blog post on the subject ( Specifically intended for HN ) but here are some links.

- Anandtech.

- Servethehome

- SemiWiki

- WikiChip

- Asianometry

- Patrick Moorhead on Twitter and Forbes

- Realworldtech

- Beyond3D

- Semiengineering

- Wikipedia ( Yeah )

- TechTechPotato from Dr. Ian Cutress, who used to work at Anandtech. Arguably the guy who picked up from Anand.

- And recently, Chips and Cheese

That is about 100 hours of reading. You don't need to drill into every single topic; just continue to read and find answers within these sites. The single common theme across these links is that they are all business / industry related, i.e. they will provide the "why" behind companies doing certain things. And in real-world engineering it is all about trade-offs.

They also have their own sets of biases. SemiEngineering likes to bump up the cost of certain tools and the cost of making chips. Patrick Moorhead tends to have an American bias and may not be the best source for the pure-play foundry business. Asianometry has somewhat of a China bias. SemiWiki is too optimistic on certain things (which is why I won the bet that Samsung Foundry wouldn't compete with TSMC by 2020 or 2023). RealWorldTech is somewhat pro-ARM according to many who hate ARM (you be your own judge). ServeTheHome is server focused. Beyond3D is no longer updated but is a very old gold mine for GPU-related topics.

But once you gain enough knowledge within a year or two, you should find 99.999999% of the information and so-called rumours on the Internet are just pure BS.

And yes, for nearly every single link above, the authors I follow all visit HN regularly. They just don't comment much. So if you know where on HN to look, the 1% of HN comments is still absolute gold.


As a newbie I find the YouTube channel Asianometry very good, but don't trust me too much.


> Nobody is even trying.

Qualcomm has a $1.5B wager that they can, with the same engineers from Apple who helped them get where they are now.

> Qualcomm, and then Samsung decided licensing ARM’s cores would be easier than trying to outdo them.

...

> ARM has a firm grip on the Android market. Samsung, Qualcomm, and MediaTek may develop their own SoCs, but all use CPU core designs from ARM (the company).

But they've now reversed that decision, so we'll see whether they can change it.


The problem that Qualcomm and all the other ARM manufacturers have: they're impossible for tinkerers to get ahold of, outside of Raspberry Pi and a truckload of shitty, barely supported clones. And even then, the RPi still doesn't have the basics of PCIe working - in 2023 [1]. What. The. Fuck. Yes, Apple theoretically offers ARM devices, but they are not cheap, not extendable at all beyond USB-C and a ton of stuff doesn't work on Linux.

In contrast, say I want to develop something on Intel? No problem: I head to Amazon, buy a CPU and a motherboard, and check whether that old ATX power supply is still working. I plug in whatever card I need and it Just Works.

Steve Ballmer was right on track with "developers, developers, developers" - because if the ecosystem is crap or impossible to use for creative people on a low budget, guess what, they won't use it and will go for the alternative. Linux started out on x86 for a reason, and Android blasted Windows Mobile (and everyone else but Apple) to pieces despite the latter being a solidly established player. A large part was due to neglected developers: outdated APIs, expensive and half-broken dev tools (developing for WinCE was a real pain in the proverbial arse), and a complete inability to even try to match Apple's innovation. Apple had a monopoly on capacitive touchscreens for years!

If the ARM ecosystem players actually want to throw punches towards the unholy duopoly of Intel/AMD, they need to standardize on a common core of UEFI and get modular, working components out on the market.

[1] https://www.jeffgeerling.com/blog/2022/external-graphics-car...


> The problem that Qualcomm and all the other ARM manufacturers have: they're impossible for tinkerers to get ahold of

This sounds more like a problem tinkerers have rather than a problem that Qualcomm or ARM manufacturers have.

Qualcomm doesn't care about the tinkerers so it's not a problem to them.

Embedded ARM systems like my router or smart thermostat seem fine.

Google controls Android and it also seems fine for 99% of their users. There are thousands of Android developers.

There are ARM boot standards like

https://developer.arm.com/documentation/den0029/latest/

https://en.wikipedia.org/wiki/Server_Base_System_Architectur...

But in general embedded device makers don't care about having a common boot system. They want to make their device as cheaply as possible. If the common standard helps them do that then they would adopt it. Otherwise it is unnecessary overhead.

So what do you actually want? A cheap desktop level ARM system? I doubt that is going to happen. The number of people who want such a thing is extremely low even though places like Hacker News make it seem like it would be popular.


What I want is anything ARM that's actually general purpose (i.e. no SoC whose BSP is only usable on Android or whose IO peripherals barely meet standards), with full Linux upstream support for all components, and performance at least comparable to your average Intel i5.

The status quo is that the most powerful general-purpose ARM system you can get is an RPi or its clones (which both fail the performance test, and outside of the RPi a lot fail the Linux test), as the only ones actually able to buy high-performance CPUs are phone manufacturers and large hardware ODMs such as QNAP (who're using Annapurna CPUs IIRC).

Apple is the "compromise" - extremely high performance, to a tune no one else matches, but sparsely documented/supported by anything that isn't macOS (as Asahi Linux shows), and a price tag to match.


Nothing stops you from buying this machine https://www.ipi.wiki/products/ampere-altra-developer-platfor... (except the poor performance vs. cheaper x86 processors).


Yeah... $2,666 and a lead time of 3 weeks. That's exactly the kind of bullshit I'm talking about - the target class for this is embedded developers, not tinkerers and ordinary developers, the latter of whom a platform needs the most.


> AADP is a prototyping system targeting general embedded applications based on the COM-HPC Ampere Altra module and a reference carrier board, the COM-HPC Server Base.

They do use the term embedded on that site which I thought was strange.

Ampere is an ARM server startup. Their big customer so far is Oracle who sells Ampere cloud instances.

https://www.oracle.com/cloud/compute/arm/

I've worked at multiple companies with ARM architectural licenses, including ARM itself. No one there cares about making a non-embedded, non-server chip in a system at a price equivalent to a desktop or laptop x86 system that you can buy from Newegg / Best Buy.

You may think it is cool but no one is going to make it for you.

The trend for ARM is to continue down the embedded way and at the hyperscale data center area like Amazon Graviton and their competitors with their own custom ARM server chips. Since they are custom you won't be able to buy one to put in your house, only rent it in their cloud.

Ampere is a startup so their goal is to get bought by a big company. I don't know how much longer you will even be able to buy that system.

What is the actual price and performance point that you would pay for, as in the exact amount of dollars? Go talk to the marketing departments of these companies. They have already looked at this. They think the market is tiny and know the investment would be hundreds of millions and is not worth it.

Why do you want an ARM system so bad? What is wrong with x86 for the non embedded / non server / middle desktop range? Literally billions of people are fine with x86 in this range.


> Why do you want an ARM system so bad?

I want more actually viable options in the ARM space for consumer general-purpose computing. Why? Because Windows laptop battery life sucks ass because both Intel and AMD can't be arsed to do what it takes to even get to half the runtime Apple gets out of their systems. I'm happy with my 2022 MBA, but Apple's pricing surcharge on memory and storage is only possible because there is no competition.

At the moment, the only option consumers have is Apple, and that only runs macOS, not Windows (and Linux, but with some serious drawbacks). People didn't buy Microsoft's first dabbles in ARM that used Qualcomm SoCs because the SoCs were underpowered garbage (no wonder, they were designed for phones) and there were no apps; app developers didn't develop for ARM because there were no actually usable development machines that didn't cost an arm and a leg or have ridiculous lead times; and there was no Rosetta equivalent because the CPUs were garbage and underpowered... a self-reinforcing circle.


Highly doubtful that you will see any in the near future.

Qualcomm's previous custom ARM core team and server team got let go. Most went to Microsoft. Microsoft was trying to copy Apple and integrate hardware and software more tightly together. Microsoft laid off their custom ARM core team about 4 months ago. Most of them have gone to ARM. Anything Microsoft does in the future with ARM would be a licensed design from ARM.

Qualcomm bought Nuvia a couple of years ago. Nuvia was a startup that had a custom ARM core design. Most people think Qualcomm will use the team for either the mobile or server markets again. This article does mention desktop, but with ARM's lawsuits against them, who knows when or if it will happen.

https://www.datacenterdynamics.com/en/news/qualcomm-pushes-f...

The other companies like Amazon will want to push you to their Graviton ARM cloud. Ampere wants to get bought. The people there that I know joined because it was a startup. They want to get rich more than they want to push some kind of ARM general purpose computing concept.


Volterra devices are quite affordable.

Windows will always have this issue, because Microsoft isn't Apple: their hardware exists only to inspire OEMs, and they don't have the means to drag everyone, screaming, onto new hardware no matter what.

Windows NT supported several CPU architectures from day one; they died because consumers couldn't care less that PowerPC, Alpha, Itanium... were supported.


> Qualcomm doesn't care about the tinkerers so it's not a problem to them.

Qualcomm is selling performance. Performance is a combination of the hardware and the tuning that developers do to make their products perform well on the hardware.

On x86 or M1/M2, developers develop on those chips. Graviton and Qualcomm trail significantly in this regard.


Listening to the Qualcomm management they seemed more interested in selling licenses to technology than actual physical products.

But Qualcomm is selling more than just performance: cost, quality, schedule, and delivery of products. Qualcomm has shown they can make an SoC good enough to satisfy a lot of phone manufacturers, and there are thousands of Android developers and applications.

Would the Qualcomm / Google / Android ecosystem be better if they opened up more of the hardware specs to tinkerers? Maybe but I don't think they care or they don't see the business case. Apple's ecosystem is even more closed and their performance seems to be better in general.

I sat in a lot of all hands meetings when I was there listening to revenue and profit breakdown. The QTL licensing division was far more profitable than the QCT hardware division. These numbers are publicly available.


Anyone can easily develop on Graviton.

The solution is to do as we did back when everyone at the company shared UNIX development servers. There wasn't an AIX tower under my desk.


To be fair, the PCI Express bus on the current Pi was kind of an afterthought, only meant to work with a very limited set of devices, so I'm pretty sure nobody at Broadcom, and few in the design stages at Raspberry Pi, had ever tested more complex devices with it.

It works fine in _most_ cases for simple devices (USB controllers, SATA, NVMe, WiFi, and the like), but really falls apart for more advanced devices (hardware RAID, GPU, TPU, etc.).

And all Arm processors have to deal with cache coherence issues (which aren't a problem on X86), meaning some drivers (notably, AMD still) need to program for the different architecture (some patches exist but they're not perfect yet, and not in mainline Linux).


> To be fair, the PCI Express bus on the current Pi was kind of an afterthought, only meant to work with a very limited set of devices, so I'm pretty sure nobody at Broadcom, and few in the design stages at Raspberry Pi, had ever tested more complex devices with it.

The first Raspberry Pi was sold over a decade ago, and I 'member people actually using them as embedded boards for whatever stuff they were working on eight years ago (especially once the GPU performance became powerful enough to run digital signage). Sorry, but the fact that a company like Broadcom can't be arsed to develop a standards-compliant PCIe interface is a joke, and with this kind of attitude the ARM world complains that no one buys their chips?!

(Edit: Oh, just noticed whom I replied to - the person who wrote the article I referred to. HN is a small world indeed, and I guess we share at least some of our frustrations)

> meaning some drivers (notably, AMD still) need to program for the different architecture (some patches exist but they're not perfect yet, and not in mainline Linux).

Drivers... oh don't get me started on that front. Everyone in the x86 space seems to have learned over the last two decades that it is a good idea to submit drivers to the Linux kernel early. Intel and AMD both do that for CPUs and also for a lot of their other stuff. In contrast, the entire embedded world still locks away drivers behind years-old kernel forks, ridiculous NDAs, absurdly expensive dev boards, completely whack u-boot forks and even more whack BSPs.


> And all Arm processors have to deal with cache coherence issues (which aren't a problem on X86), meaning some drivers

Why are they a problem on Arm and not x86?


x86 is two well organized behemoths designing chips (including the base designs for most motherboards).

On ARM, some SoC designers slap together IP blocks without fully understanding the implications. Not all SoC designers. But there are a lot out there, and not all have standards as high as Intel's and AMD's.

IIRC, the PCIe implementation on the SolidRun Honeycomb is quite good and can be used with GPUs.


> ...they need to standardize on a common core of UEFI and get modular, working components out on the market.

Except that this DOES already exist. Right now.

The standard is called Arm SystemReady SR [1]. I'll be the first to admit it's far from ideal: there are only a couple of high-end options on the market, and lead times are measured in weeks.

But this development is new as of this year. And I think it's a terribly exciting move in the right direction (finally).

[1]: https://www.arm.com/architecture/system-architectures/system...


I admit it's been two years since I last dabbled in embedded; I'd never heard of that one. I hope this initiative finds some success!


Qualcomm doesn't have a storefront, that's true, but they have a partner search that will show you all the partner-made development kits you can buy. They aren't going to be nearly as inexpensive as an RPi, but they don't make it too difficult to find products you can actually get, and it isn't just nabbing some random board from AliExpress.

https://www.qualcomm.com/support/qan/member-directory?active...


I’ve been wondering if ANYONE has done something with the more recent Apple TV 4K hardware and ported, say, Asahi or the other Linux versions being worked on for Apple Silicon. Or perhaps got containers running on the hardware.


> Qualcomm has a $1.5B wager that they can, with the same engineers from Apple who helped them get where they are now.

Maybe or maybe not. Qualcomm has a history of price fixing, bullying, and shutting out competitors rather than trying to do better. Perhaps they're just shutting down Nuvia here too.

Back in the day, Nvidia's Tegra tried to compete until Qualcomm went around complaining...

The pest here is Qualcomm more than anything.


The question is whether customers care about that. Crazy engineering is cool for sure, but it must be profitable too. In my environment, people stopped buying the newest phones years ago anyway.


People around me didn't buy the newest phones for benchmarks then either. They just wanted to have the latest one, maybe a better screen or camera, more storage or radios, but the CPU was way down the list.


I would legit take the same SoC on an improved process _purely_ for the battery improvement, plus improved camera hardware, at this point (but I don't play games on my phone, so ???).

Even then the year to year improvements in camera hardware aren't as extreme as they were 5+ years ago.


I think that is the crux of it. New features are either not that big of a leap any more or are too nebulous to be a big attraction.

Things like iPhone 14 satellite SOS calls. Neat tech but the audience for that is not huge.


Removed comment, wrong parent


Who cares? I genuinely haven't needed my phone to be faster for at least 5 years. The only reason my phone becomes slower is because of operating system bloat that they keep introducing on phones.

Just give me an open OS, an old phone, and a web browser and I'm good. If I need serious computation I'll use a 400-core ephemeral cloud computer with 12 TB of RAM, or a modern GPU, depending on workload.


Pretty much. My phone is an Oppo... something. If I benchmark it, it is considered very slow, but in general use I never hit the ceiling. Part of all this starts to look like people debating supercars they will never own. Purely philosophical.

Yes, there will be many users, especially on here, who have their use cases for this extra grunt. But more often than not, the average person will never see or need it.


I have a mid-range Android from work (Samsung A53) and the UI feels annoyingly sluggish compared to my iPhone 12.


I have a super low-end Xiaomi that is 4 years old and feels just as snappy as when I bought it. I am very sensitive to lag of any sort; I was already advocating for 120 Hz screens (on computers and TVs) back in 2000.

What, exactly, is your point? Phones should get faster hardware because the UI is slow? Clearly not a hardware problem. Hasn't been one for at least a decade.


I was responding to comments about people not needing faster phones and there being no need to compete with Apple on single-core performance.

Apple phones are always fast in the UI. Some of it is fast hardware, some is software. But if Samsung is not able to strip down Android to run decently on a midrange phone, maybe they actually do need the faster single-core performance.


I've used both the A53 and the A52s 5G, and I feel that performance is better on the A52s 5G, which I'm using right now to type this comment. I use the A52s 5G a lot more.

I think the latter is better due to its headphone jack, speed, and the option to record calls.


> Nobody is even trying. They all have business strategies that prioritize other things.

That's not true. Everyone tried and gave up. The bully here isn't Apple. It's Qualcomm. They hold the modem patents and the modem market. A phone needs a modem. So when Qualcomm offers you a phone SoC that includes a modem do you take it? It's cheaper than you doing your own SoC + external Qualcomm modem and if you try to do your own modem they'll sue you.

Back in the day, Nvidia tried with Tegra but was ultimately forced out by Qualcomm and their shady deals and price fixing. Even Apple tried / is trying with the whole Intel modem fiasco. Samsung tried with Exynos. Ultimately you're still stuck with Qualcomm and its modem, so people give up.


Or memory bandwidth (800GB/sec).

Or iGPU perf, which of course needs the above mentioned bandwidth.

Or ML inference performance (5 tokens/sec with llama 65B).


The only thing I care about is better battery life; I don't need a more powerful CPU. Phones are insanely good already.


They are trying, but ARM is suing and demanding that the cores that are faster than ARM's Cortex ones be destroyed.


I didn't know Qualcomm gave up its own CPU core design. And this site covers that as well. Good work folks!

https://chipsandcheese.com/2023/07/12/kryo-qualcomms-last-in...


Interesting that Geekerwan found the exact opposite in their testing of the A710: https://youtube.com/watch?v=s0ukXDnWlTY&feature=share8

ARM efficiencies at 13:10. Granted these are across multiple chips. The whole video is a good watch.


There are so many other factors at play in that test. Total platform power of the different devices and all the different implementation details of the SoCs apart from the core choice. Different nodes at different fabs, for example.

Silly to think it invalidates anything said in TFA.


Found the exact opposite of what?


Performance per watt is worse on the A710 than on the older A78, according to the tests done in the linked segment.


Not even true if you watch the video. The 8+ Gen 1 is clearly the most efficient SoC according to their tests.


Timestamp 14:43. Both are MediaTek chips; the A710 is worse despite being on a better node.

At 15:45, you can see they even managed to fuck up the little cores, in that the A510 has worse efficiency than the A55.


I wish I could buy an A710 in an SBC. I'd love to cluster up as many ARM cores as I can.


Don't SBCs have a lot of peripherals that aren't particularly useful for a generic compute cluster but balloon up the cost?

Anyway you can find A78s these days if you look hard:

https://www.ipi.wiki/pages/i-pi-smarc-1200

And Rockchip is pushing A76s in their RK3588. Bit of a joke, but it seems like that is the best we've got.


Yes and no. Consumer-oriented SBCs are a niche enough product that the price of the extra silicon is probably more than made up for by avoiding the cost of making a custom chip.


I mean, sure. Less than all the extras that a phone would have. I already have a few RK3588 boards, and they're pretty good.


Jesus christ, that page is annoying: I can't scroll fast enough because the text takes its time to slide in, constantly something sliding in and out of the page, constantly something moving... I mean, the only way they could make it worse is to make it animate at 7 fps and warm my phone.


No you don't. You would be better off with hardware that is designed for your use case.


Ok, so where do I get it?



