Hacker News | jokowueu's comments

Oh, it's written by Nadim Kobeissi. I'm such a huge fan of his work; I didn't expect to see him here.

In the README:

> Approximately 95% of the engineering work was done by Lyapsus. Lyapsus improved an incomplete kernel driver, wrote new kernel codecs and side-codecs, and contributed much more. I want to emphasize his incredible kindness and dedication to solving this issue. He is the primary force behind this fix, and without him, it would never have been possible.

> I (Nadim Kobeissi) conducted the initial investigation that identified the missing components needed for audio to work on the 16IAX10H on Linux. Building on what I learned from Lyapsus's work, I helped debug and clean up his kernel code, tested it, and made minor improvements. I also contributed the solution to the volume control issue documented in Step 8, and wrote this guide.


For those wondering:

> Sincere thanks to everyone who pledged a reward for solving this problem. The reward goes to Lyapsus.


I didn't mean that he wrote the fix, just the README. Looking back at my comment, people might have assumed that Nadim made the fix.

It's not a closed loop though; many use evaporative cooling towers (wet towers).

But that water remains in the water cycle. With agriculture, the water goes into the crops and is then shipped off to other places, exiting the water cycle of its origin.

That's backwards. When data centers evaporate water for cooling, it becomes vapor that blows away and falls as rain somewhere else, so it's gone from the local area, or it's discharged as waste water. Farm water mostly stays put: plants release it back into the local air, excess irrigation soaks into local groundwater, and only a fraction leaves in the harvested crops.

Farmers can reuse the same local water year after year. Data centers need fresh water constantly because their evaporated water doesn't come back.


“But the water cycle” is the Dunning-Krugerest counterargument of them all. It assumes the reader doesn’t remember 4th-grade science class, while misapplying that same basic knowledge.

There’s a fundamental difference between water ending up in a tomato that is shipped across the world and leaves permanently, and water that evaporates and rains down later. Regardless of whatever names you call me, that is true.

Metabolic dysfunction is the root of many diseases, addiction being one of them.


Where were you two weeks ago! Gonna try it


Rather, where were they three years ago?


Iraq flashbacks. They sure were very happy to greet their liberators. It's amazing to see propaganda's effects in action.


How much more efficient are NPUs than GPUs? What are the limitations? It seems it will have support for DeepSeek R1 soon.


A decent chunk of AI computation comes down to doing matrix multiplication fast. Part of that is reducing the amount of data transferred to and from the matrix-multiplication hardware on the NPU and GPU; memory bandwidth is a significant bottleneck. The article is highlighting 4-bit format use.
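
As a rough, illustrative sketch of why the storage format matters (the 7B parameter count is just an assumed example, not a figure from the article):

    # Bytes moved per full pass over the weights, at different storage widths.
    # Token generation reads essentially every weight once per token, so this
    # traffic is roughly the bandwidth bill paid per generated token.
    params = 7e9  # assumed 7B-parameter model, purely for illustration

    for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        gib = params * bits / 8 / 2**30
        print(f"{name}: ~{gib:.1f} GiB per pass over the weights")

Halving the bits halves the traffic, which on a bandwidth-bound chip roughly doubles tokens per second.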

GPUs are an evolving target. New GPUs have tensor cores and support all kinds of interesting numeric formats; older GPUs don't support any of the formats that AI workloads are using today (e.g. BF16, int4, all the various smaller FP types).

An NPU will be more efficient because it is much less general than a GPU and doesn't have any gates for graphics. However, it is also fairly restricted. Cloud hardware is orders of magnitude faster (due to much higher compute resources and I/O bandwidth), e.g. https://cloud.google.com/tpu/docs/v6e.


The NPU also has no more memory bandwidth than the CPU, but then the GPU on these machines doesn't either.


Agree on NPU vs CPU memory bandwidth, but not sure about characterizing the GPU that way. GDDR is usually faster than DDR of the same generation, and on higher-end graphics cards has a wider bus. A few GPUs have HBM, as do pretty much all datacenter ML accelerators (NVIDIA B200 / H100 / A100, Google TPU, etc.). The PCIe bus between host memory and GPU memory is a bottleneck for intensive workloads.

To perform a multiplication on a CPU, even with SIMD, the values have to be fetched and converted to a form the CPU has multipliers for. This means smaller numeric types are penalised. For a 128-bit memory bus, an NPU can fetch 32 4-bit values per transfer; the best case for a CPU is 16 8-bit values.
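
A minimal sketch of that conversion, assuming unsigned int4 weights packed two per byte (numpy is just for illustration; real kernels do this in SIMD registers, and signed int4 would additionally need sign extension):

    import numpy as np

    # Two unsigned 4-bit values packed per byte: 0x21 -> (1, 2), 0x43 -> (3, 4).
    packed = np.frombuffer(bytes([0x21, 0x43]), dtype=np.uint8)

    # The CPU has no 4-bit multipliers, so it must widen to int8 first:
    lo = (packed & 0x0F).astype(np.int8)           # low nibbles:  [1, 3]
    hi = (packed >> 4).astype(np.int8)             # high nibbles: [2, 4]
    unpacked = np.stack([lo, hi], axis=1).ravel()  # interleaved:  [1, 2, 3, 4]
    print(unpacked)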

Details are scant on Microsoft's NPU, but it probably has many parallel multipliers, either in the form of tensor cores or a systolic array. The effective number of matmuls per second (or per memory operation) is higher.
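
For intuition only, here is a toy output-stationary matmul shaped like a grid of multiply-accumulate (MAC) units; this is a generic sketch of the technique, not Microsoft's actual design. In hardware, the two inner loops become n*m parallel units all firing every cycle:

    # Toy output-stationary "systolic" matmul: each output element has its own
    # MAC unit that accumulates one product per step.
    def toy_matmul(A, B):
        n, k, m = len(A), len(A[0]), len(B[0])
        C = [[0] * m for _ in range(n)]
        for step in range(k):        # one product flows into every MAC per step
            for i in range(n):       # in hardware, these two loops are n*m
                for j in range(m):   # parallel units, not serial iterations
                    C[i][j] += A[i][step] * B[step][j]
        return C

    print(toy_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]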


Yeah, standalone GPUs do indeed have more bandwidth, but most of these Copilot PCs that have NPUs just have shared memory for everything, I think.

Fetching 16 8-bit values vs 32 4-bit values is the same; that is the form they are stored in in memory. Doing some unpacking into more registers and back is more or less free anyway if you are memory-bandwidth bound. On these lower-end machines, everything is largely memory bound, not compute bound, although in some systems (e.g. the Macs) the CPUs often can't use the full memory bandwidth but the GPU can.
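
A back-of-the-envelope check of the memory-bound claim; both hardware figures below are made up for illustration:

    # Roofline-style check for token generation (a matrix-vector multiply):
    # if the ops/byte a workload supplies is below the chip's balance point,
    # the workload is memory bound.
    bandwidth = 120e9  # assumed shared-memory bandwidth, bytes/s
    compute = 40e12    # assumed peak low-precision throughput, ops/s

    balance = compute / bandwidth  # ops per byte needed to saturate compute
    gemv_int4 = 2 / 0.5            # GEMV: ~2 ops per weight, 0.5 bytes per weight

    print(f"need ~{balance:.0f} ops/byte, int4 GEMV supplies {gemv_int4:.0f}")
    print("memory bound" if gemv_int4 < balance else "compute bound")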


Yes, agree. Probably the main thing is that the NPU is just a dedicated unit without the generality/complexity of a CPU, and so it is able to crunch matmuls more efficiently.


Fantastic, thanks.


What browser are you using? It's a laggy mess on mine.


Firefox with NoScript. JavaScript is slow indeed.


Much less of an issue with XR, and after a few months you won't feel any discomfort.

The most annoying thing for me was the metallic taste in my mouth for a few months.

Had to take it for my reactive hypoglycemia


Standardizing ink cartridges would be amazing


Unlike smartwatches, where integrating the watch and phone probably still leaves lots of opportunity for innovation, I agree there isn't much value in proprietary printer inks, and standardization probably has more consumer benefit.

