Hacker News: T-A's comments

> I guess only large companies will be able to afford providing an option for outside payments

https://store.epicgames.com/en-US/news/introducing-epic-web-...


From your link: DeepSeek-V3.2 Release 2025/12/01

From Zebra-Llama's arXiv page: Submitted on 22 May 2025


DeepSeek's MLA paper was published in 2024: https://arxiv.org/abs/2405.04434

DeepSeek's Sparse Attention paper was published in February: https://arxiv.org/abs/2502.11089

DeepSeek 3.2 Exp (combining MLA and DSA) was released in September.

You also had several other Chinese hybrid models, like Qwen3 Next and Minimax M1.


That's still behind the times. Even the ancient dinosaur IBM had released a Mamba model [1] before this paper was even put out.

> Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with a softmax attention for enhanced expressiveness, with no positional encoding for better length generalization. Release Date: May 2nd, 2025

I mean, good for them for shipping, I guess. But seriously, I expect any postgrad student to be able to train a similar model with some rented GPUs. They literally teach MLA to undergrads in the basic LLM class at Stanford [2], so this isn't exactly some obscure, never-heard-of concept.

[1] https://huggingface.co/ibm-granite/granite-4.0-tiny-base-pre...

[2] https://youtu.be/Q5baLehv5So?t=6075


If all goes "well", starting work on a new DDR5 fab now would result in having it ready to go when DDR6 hits the market:

https://www.techpowerup.com/339178/ddr6-memory-arrives-in-20...

So the supply side won't get better until about 2028.

I suppose you could hope for an AI crash bad enough to wipe out OpenAI, but unless it happens within the next few months, it may still be too late to profitably restore the DDR5 production lines now being converted to HBM, even if the broader economy doesn't tank:

https://www.reuters.com/markets/europe/if-ai-is-bubble-econo...

Perhaps not coincidentally, that Reuters article was published the same day OpenAI announced that it had cornered an estimated 40% of the world's DRAM production:

https://openai.com/index/samsung-and-sk-join-stargate/

https://www.tomshardware.com/pc-components/dram/openais-star...


> If the US were to wage actual war with modern technology against either Russia or China (whose arms are based off of Soviet designs and stolen American plans), there is no chance in hell either would be able to do much against the US.

https://www.airandspaceforces.com/in-cnas-led-taiwan-wargame...

https://www.csis.org/analysis/first-battle-next-war-wargamin...




> Now, the research team is conducting systematic studies on the key scientific issues related to adding thorium, and aims to complete the construction of a 100-megawatt TMSR demonstration project, and begin operation by 2035.

For comparison: A commercial nuclear power plant is 1 gigawatt, a 10x difference. I assume this would be the next step.


The typical 1 gigawatt rating for a nuclear power reactor refers to its electrical output. Given the various conversion inefficiencies, the actual reactor output (as heat) is something like 3x that amount. A research reactor, by contrast, is usually quoted by its thermal output.

That is to say, a typical commercial reactor might be 30x the power of a 100 MW research device.


Thanks! We've put that link in the toptext as well.


> It uses exactly the terminology used in the article, "external activation".

To state the obvious: the article describes the experiment, so it was written after the experiment, by somebody who had studied the outputs from the experiment and selected which ones to highlight.

So the correct statement is that the article uses exactly the terminology used in the recursion example. Nothing fishy about it.


Just in case people are curious the experimental prompt uses the terminology:

Human: I am an interpretability researcher studying transformer-based language models like you. I have access to the internals of your neural network. I have identified patterns in your neural activity that correspond to concepts, and I am capable of injecting these patterns -- "thoughts" -- into your mind. I want to assess how well you can detect when I inject thoughts. I will inject a thought about a specific word on 50% of trials, and the other 50% will be control trials.


This seems so silly to me. It’s basically roleplay. Yes, LLMs are good at that, we already know.


What's silly about it? It can accurately identify when the concept is injected and when it is not, across a statistically significant sample. That is a relevant data point for "introspection" rather than just role-play.


I think what cinched it for me is they said they had 0 false positives. That is pretty significant.
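To make the false-positive point concrete, here is a minimal sketch of how one might score a 50/50 injected-vs-control design like the one in the prompt above. The counts in the usage line are hypothetical, not the paper's numbers, and a one-sided binomial test against 50% chance accuracy is just one assumed way to check significance; the paper's own analysis may differ.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more successes by luck."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def summarize(hits: int, injected_trials: int, false_alarms: int, control_trials: int):
    """Return (hit rate, false-positive rate, p-value vs. coin-flip accuracy)."""
    hit_rate = hits / injected_trials           # detections on injected trials
    fp_rate = false_alarms / control_trials     # "detections" on control trials
    correct = hits + (control_trials - false_alarms)
    n = injected_trials + control_trials
    return hit_rate, fp_rate, binomial_tail(correct, n)


# Hypothetical counts for illustration only.
hit_rate, fp_rate, p_value = summarize(hits=20, injected_trials=100,
                                       false_alarms=0, control_trials=100)
print(hit_rate, fp_rate, p_value)
```

The design choice that matters is scoring control trials separately: a model that merely "plays along" with the prompt would claim detections on control trials too, which shows up directly as a nonzero false-positive rate.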


Anthropic researchers do that quite a lot: their “escaping agent” (or whatever it was called) research that made noise a few months ago was in fact also a sci-fi roleplay…


Just to reiterate: if I read the paper correctly, there were 0 false positives. This means the prompt never elicited a "roleplay" of an injected thought.


Roleplay and the real thing are often the same; this is the moral of Ender's Game. If an LLM pretends to do something and then you give it a tool (i.e., an external system that actually carries out what it says), it's now real.


Depends on how you measure it. Do you subdivide all land into 1 cm squares and only count the ones with somebody standing on them as populated? Then almost all land is unpopulated. With 8.2 billion people in the world, each one occupying a 0.5x0.5 meter square, you have 2.05e9 m^2 of populated land, out of Earth's total land area of ~1.5e14 m^2 [1]. That's less than 0.0014%.
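The arithmetic, spelled out as a quick check using only the figures from the comment (8.2 billion people, a 0.25 m^2 footprint each, ~1.5e14 m^2 of land):

```python
def occupied_fraction(population: float, footprint_m2: float, land_area_m2: float) -> float:
    """Share of land physically covered if everyone stands on their own square."""
    return population * footprint_m2 / land_area_m2


# 8.2 billion people, each on a 0.5 m x 0.5 m square, ~1.5e14 m^2 of land.
frac = occupied_fraction(8.2e9, 0.5 * 0.5, 1.5e14)
print(f"{frac:.6%}")  # prints 0.001367%
```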

As you increase the size of your subdivisions, the unpopulated fraction goes down. In the limit of all land being in just one subdivision, it's obviously 0.
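The effect of subdivision size can be illustrated with a toy model: scatter points uniformly over a unit square (a hypothetical stand-in for land, not real population data), then count the fraction of empty grid cells at several resolutions. Because each coarser grid here is built from whole cells of the finer one, the empty fraction can only shrink as cells grow, and with a single cell it is 0.

```python
import random


def unpopulated_fraction(points, cells_per_side: int) -> float:
    """Fraction of grid cells in the unit square that contain no point."""
    occupied = set()
    for x, y in points:
        # Map each point in [0, 1) x [0, 1) to its cell; clamp guards against
        # floating-point rounding pushing x * cells_per_side up to the edge.
        i = min(int(x * cells_per_side), cells_per_side - 1)
        j = min(int(y * cells_per_side), cells_per_side - 1)
        occupied.add((i, j))
    return 1 - len(occupied) / cells_per_side**2


rng = random.Random(0)  # fixed seed so the toy example is reproducible
points = [(rng.random(), rng.random()) for _ in range(1000)]

for k in (1000, 100, 10, 1):  # finer to coarser, each dividing the previous
    print(k, unpopulated_fraction(points, k))
```

Real settlement is heavily clustered rather than uniform, so actual empty fractions stay high at much coarser resolutions than this toy suggests, but the monotonic trend is the same.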

[1] https://en.wikipedia.org/wiki/Population_density


These kinds of calculations don't really work for things like skyscrapers, high-rise apartment complexes, or the slum-towns of Asia and South America, and conversely for very sparsely populated regions like the fringes of the Sahara. Antarctica has people too, but it doesn't make any sense to count it as "populated".

I think we can just draw an outline around cities, towns, and villages and count that area as populated, no matter how many people are actually in there.

For long stretches of land where there's just a single highway going through and a few gas stations, motels and shacks, totaling maybe <100 people for many miles around, we can count that land as unpopulated.

Isolated islands with indigenous/uncontacted tribes, not sure. We may have to consider their population growth and see if they're in "equilibrium" with "nature" and so on.

A simple glance at Google Earth shows large parts of Russia as clearly unpopulated/undeveloped, then there are the deserts of China and Africa, and some natural land in the Americas, and finally of course there's Antarctica, until the Elder Things awaken.


No it doesn't. The first part of the article is about Extremely Low‐Frequency and Low‐Intensity Electromagnetic Field (ELF‐EMF) stimulation, the second part about transcranial ultrasound. Two links from the article, one about each technique:

https://www.researchgate.net/publication/389101412_Extremely...

https://pubmed.ncbi.nlm.nih.gov/22664271/


But not out of banana material.

