
The Scaling ML textbook also has an excellent section on TPUs. https://jax-ml.github.io/scaling-book/tpus/


I also enjoyed https://henryhmko.github.io/posts/tpu/tpu.html https://news.ycombinator.com/item?id=44342977 .

The work that XLA & schedulers are doing here is wildly impressive.

This feels drastically harder to work with than Itanium must have been: ~400-bit VLIW, across extremely diverse execution units. The workload is different, it's not general purpose, but it's still awe-inspiring to know not just that they built the chip but that the software folks can actually use such a wildly weird beast.

I wish we saw more industry uptake of XLA. Uptake's not bad, per se: there's a bunch of different hardware it can target! But what amazing secret sauce, it's open source, and it doesn't feel like there's the industry rally behind it that it deserves. It feels like Nvidia is only barely beginning to catch up, to dig a new moat, with the just-announced Nvidia Tiles. Such huge overlap. Afaik, please correct me if wrong, but XLA isn't at present particularly useful at scheduling across machines, is it? https://github.com/openxla/xla


I do think it's a lot simpler than the problem Itanium was trying to solve. Neural nets are just way more regular in nature, even with block sparsity, compared to generic consumer pointer-hopping code. I wouldn't call it "easy", but we've found that writing performant NN kernels for a VLIW architecture chip is in practice a lot more straightforward than other architectures.

JAX/XLA does offer some really nice tools for doing automated sharding of models across devices, but for really large performance-optimized models we often handle the comms stuff manually, similar in spirit to MPI.
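A toy sketch of what "manual comms, similar in spirit to MPI" can look like, using pure NumPy in a single process. The three-"device" setup and the function name are made up for illustration; real code runs one process per device and overlaps communication with compute:

```python
import numpy as np

# Three hypothetical "devices", each holding a local gradient buffer.
local_grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]

def allreduce_sum(buffers):
    """Toy all-reduce: afterwards every participant holds the elementwise
    sum, i.e. the contract of MPI_Allreduce with MPI_SUM. Real large-model
    code pipelines this as ring/tree collectives over interconnect links,
    but the semantics are the same."""
    total = np.sum(buffers, axis=0)
    return [total.copy() for _ in buffers]

reduced = allreduce_sum(local_grads)
# every "device" now holds array([9., 12.])
```

The point of doing this by hand rather than letting the compiler shard everything is control: you decide exactly which collective happens where, and when.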


I agree with regards to the actual work being done by the systolic arrays, which are sort of VLIW-ish and have a predictable, plannable workflow. Not easy, but there's a very direct path to actually executing these NN kernels. The article does an excellent job setting up how great a win it is that the systolic MXUs can do the work: they don't need anything but local registers and local communication across cells, and don't need much control.

But if you make it 2900 words through this 9000-word document, to the "Sample VLIW Instructions" and "Simplified TPU Instruction Overlay" diagrams, trying to map the VLIW slots ("They contain slots for 2 scalar, 4 vector, 2 matrix, 1 miscellaneous, and 6 immediate instructions") to useful work seems incredibly challenging, given the vast disparity in functionality and style of the attached units that it governs, and given the extreme complexity of keeping that MXU constantly fed, with very tight timing so that it is constantly well utilized.

> Subsystems operate with different latencies: scalar arithmetic might take single digit cycles, vector arithmetic 10s, and matrix multiplies 100s. DMAs, VMEM loads/stores, FIFO buffer fill/drain, etc. all must be coordinated with precise timing.

Whereas Itanium's compilers needed to pack parallel work into a single instruction, there's maybe less need for that here. But that quote feels like the heart-of-the-machine challenge: writing instruction bundles that feed a variety of systems all at once, when those systems have such drastically different performance profiles / pipeline depths. Truly an awesome system, IMO.
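To make the latency-disparity problem concrete, here's a toy Python model of what a static scheduler is up against. All names and latency numbers are hypothetical (loosely based on the "single digit / 10s / 100s of cycles" quote above); the issue policy is deliberately naive, with no hardware interlocks:

```python
LATENCIES = {"scalar": 2, "vector": 10, "matrix": 100}  # hypothetical cycle counts

def schedule(ops):
    """Greedy in-order issue for a toy VLIW machine. ops is a list of
    (name, unit, deps). A result is only usable after the producing
    unit's full latency; there are no interlocks, so the 'compiler'
    must delay dependent issues itself. Returns issue cycle per op."""
    ready = {}   # op name -> cycle its result becomes available
    issue = {}
    cycle = 0
    for name, unit, deps in ops:
        earliest = max((ready[d] for d in deps), default=0)
        cycle = max(cycle, earliest)
        issue[name] = cycle
        ready[name] = cycle + LATENCIES[unit]
        cycle += 1
    return issue

prog = [
    ("load_a", "scalar", []),           # feed the vector unit
    ("vmul",   "vector", ["load_a"]),   # waits for load_a's result
    ("mxu",    "matrix", ["vmul"]),     # waits ~10 cycles behind vmul
    ("next_a", "scalar", []),           # independent work that in-order
]                                       # issue wastefully parks at the end

issue_cycles = schedule(prog)
# next_a doesn't issue until cycle 13: the bubbles before it are exactly
# the slots a real static scheduler has to fill with independent work
```

The bubbles this naive schedule leaves are the whole game: the compiler has to find enough independent work ahead of time to hide 100-cycle MXU latencies, per bundle, statically.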

Still though, yes: Itanium's software teams did have an incredibly hard challenge finding enough work at compile time to pack into instructions. Maybe it was a harder task. What a marvel modern cores are, having almost a dozen execution units that CPU control can juggle and keep utilized, analyzing incoming instructions on the fly with deep out-of-order dependency-tracking insight. Trying to figure it all out ahead of time and pack it into the instructions a priori was a wildly hard task.


Thanks for sharing this. I agree w.r.t. XLA. I've been moving to JAX after many years of using torch and XLA is kind of magic. I think torch.compile has quite a lot of catching up to do.

> XLA isn't at present particularly useful at scheduling across machines,

I'm not sure if you mean compiler-based distributed optimizations, but JAX does this with XLA: https://docs.jax.dev/en/latest/notebooks/Distributed_arrays_...
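A minimal sketch of that compiler-based sharding, assuming a recent JAX. On a single-device CPU the mesh is trivial, but the same code scales out: you place the input with a sharding, and XLA propagates it through the jitted computation:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over whatever devices are available
# (typically a single CPU device when run locally).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

x = jnp.arange(16.0).reshape(8, 2)
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))  # shard rows

@jax.jit
def f(x):
    # XLA propagates the input sharding through this computation;
    # on a multi-device mesh it inserts the needed collectives itself.
    return (x * 2).sum(axis=1)

y = f(x)
```

This is the "automated" end of the spectrum; the manual-comms style mentioned upthread trades this convenience for explicit control over each collective.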


In Itanium's heyday, the compilers and libraries were pretty good at handling HPC workloads, which is really the closest anyone was running then to modern NN training/inference. The problem with Itanium and its compilers was that people obviously wanted to run workloads that looked nothing like HPC (databases, web servers, etc.) and the architecture and compilers weren't very good at that. There have always been very successful VLIW-style architectures in more specialized domains (graphics, HPC, DSP, now NPUs); it just hasn't worked out well for general-purpose processors.


Side note: just ran into this article that mentions how Amazon is planning to add XLA / JAX support in the future for their Trainiums. https://newsletter.semianalysis.com/p/aws-trainium3-deep-div...


Aside: this guy regularly posts on the Discord server for an open-source post-training framework I maintain, demanding repayment for bugs in nightly builds and generally abusing the maintainers.


I assume you gave him the choice of buying a support contract or getting banned. Otherwise, why is he still allowed to do that?


This is an exceptional salary for the UK.


Plus, jobs like this are for people who have a connection with the organization. I can't see myself doing anything for Arsenal, however good the pay might be. I'd be very glad to do it for Chelsea, though.


Oh, that's a fascinating reaction I'd never thought of. I mean, people might refuse to work for any gambling company or any arms maker, but I can't imagine someone offered a job in banking refusing, say, Goldman but taking JP Morgan, simply because their family have been Morgan fans for generations and would never accept any other bank…


Not sure if you're being intentionally obtuse or simply don't get it. There's no reason to be a fan of an organization whose main goal is making money, but there are plenty of reasons to be a fan of an organization whose main goal is to be better at a sport than other similar organizations.


I don't think the distinction is about whether the organizations make money; the distinction is about entertainment. Professional sports are primarily a form of entertainment. Sports fans are a bit like music fans in that regard. The rivalries in sports can get toxic sometimes, but there's weird snobbery in music and other arts as well.


There's no reason to be a fan of an organization whose main goal is to make money, yet GP said he's a Chelsea fan.


You don't think professional soccer teams are organizations whose main goal is to make money?


If it is, and it probably isn't, they are really not very good at it.


So they are looking for people in the Venn diagram intersection of AI researchers and Arsenal FC supporters, of which I imagine there are very few.


You underestimate how much being a soccer fan means, especially in countries like the UK.


I met some Americans who would never work for the New England Patriots.


Having done a bit of consulting for Chelsea FC, I wouldn't recommend working there in an office job. Poor pay (I doubt anyone below "Head of"/CxO even touches 6 figures) and very average working conditions.


I thought at first that you might be bantering, but after reading your last sentence I'm not sure anymore. I don't think being associated with Arsenal through this posting is ever going to be a blemish on anyone's CV. They are higher than Chelsea in the PL standings at this moment.


Why bantering?

150k a year puts you in the top 1% of earners in the UK, plenty to live comfortably on.

As you mentioned, having such a job on one's CV can only help.

But I've been a Chelsea fan for most of my life; if I take such a job at Arsenal and do it properly, I'll be actively working against the organization that brought me so many emotions over the past 30-ish years.


Aren’t they both owned by pompous rich dudes and private equity, like most companies? Tasty is the boot we enjoy licking.


Arsenal is owned by Stan Kroenke, Chelsea by Todd Boehly, the LA Dodgers guy.


Everything is owned by rich people, is your solution to not enjoy anything?


Actually, in countries like Portugal or Germany, most clubs are owned by their members (or at the very least, 51% owned by their members), who can e.g. vote for the club's president.


You're not totally wrong about the 51% thing (although it's really 50% + 1 vote), but it's not like that's a panacea that keeps out corporate interests. Leverkusen and Wolfsburg are owned by Bayer and Volkswagen respectively (although I understand there are historical reasons for that), and Leipzig is 99% owned by Red Bull, which takes only 50% - 1 of the voting rights to comply.


Yes, I think that's just in the top 1% of salaries.

But I’m pretty sure Brad got Jonah Hill for much less.


Surely you mean Scorsese got Jonah??


I think he means in Moneyball


I'd recommend checking out the CUDA mode Discord server! They also have a channel for Metal https://discord.gg/ZqckTYcv


torchtune (https://github.com/pytorch/torchtune) - a PyTorch library for fine-tuning LLMs, particularly for memory-constrained setups. Try it out and fine-tune Llama3.1 8B on a single RTX 4090!


Hi Stefano. I'm an ML Engineer/Researcher looking for a new role, and very interested in learning more about Epistemic. Am I correctly visualizing something similar to https://www.connectedpapers.com/ as part of your product? I can see incredible potential using ML/NLP to build knowledge graphs from a diverse set of sources of biomedical knowledge, from derisking future research to exposing connections and ideas people haven't even considered! Thanks, Salman


Hi Salman, happy to chat, shoot me an email!


Nick Bostrom's "Superintelligence" is a sober perspective on this issue and a very worthwhile read.


Yup, that's a good recommendation. I've read it, along with some of the AI safety work that a small portion of the AI community is doing. At the moment there seems to be no reason to believe that we can solve this.


Hi David, wonder if you'd be open to remote within the UK? I'd be interested in the ML Engineer role primarily, but would also happily be considered for other roles.


We're open to remote depending on the role. Naturally some roles can't be remote due to the nature of our work. Please reach out to me at david@optimal.ag.


Some truly impressive results. I'll make my usual point here for when a fancy new (generative) model comes out, and I'm sure some of the other commenters have alluded to this. The examples shown are likely from a set of well-defined (read: lots of data, high bias) input classes for the model. What would be really interesting is how the model generalizes to /object concepts/ that have yet to be seen, and which have abstract relationships to the examples it has seen. Another commenter here mentioned "red square on green square" working, but "large cube on small cube" not working. Humans are able to infer and understand such abstract concepts with very few examples, and this is something AI isn't as close to as it might seem.


It seems unlikely the model has seen "baby daikon radishes in tutus walking dogs," or cubes made out of porcupine textures, or any number of the other examples the post gives.


It might not have seen that specific combination, but finding an anthropomorphized radish sure is easier than I thought: type "大根アニメ" ("daikon anime") into your search engine and you'll find plenty of results.


Image search for “大根 擬人化” ("daikon personification") does return results similar to the AI-generated pictures, e.g. 3rd from top[0] in my environment, but sparse. “大根アニメ” ("daikon anime") in text search actually gives me results about an old hobbyist anime production group[1], and some TV anime[2] with the word in the title... hmm.

Then I found these[3][4] in Videos tab. Apparently there’s a 10-20 year old manga/merch/anime franchise of walking and talking daikon radish characters.

So the daikon part is already figured in the dataset. The AI picked up the prior art and combined it with the dog part, which is still tremendous but maybe not “figuring out the daikon walking part on its own” tremendous.

(btw anyone knows how best to refer to anime art style in Japanese? It’s a bit of mystery to me)

0: https://images.app.goo.gl/LPwveUJPWHr6oK8Y8

1: https://ja.wikipedia.org/wiki/DAICON_FILM

2: https://ja.wikipedia.org/wiki/%E7%B7%B4%E9%A6%AC%E5%A4%A7%E6...

3: https://youtube.com/watch?v=J1vvut5DvSY

4: https://youtu.be/1Gzu2lJuVDQ?t=42


> anyone knows how best to refer to anime art style in Japanese?

The term mangachikku (漫画チック, マンガチック, "manga-tic") is sometimes used to refer to the art style typical of manga and anime; it can also refer to exaggerated, caricatured depictions in general. Perhaps anime fū irasuto (アニメ風イラスト, anime-style illustration), while a less colorful expression, would be closer to what you're looking for.


At least for certain types of art, sites such as pixiv and danbooru are useful for training ML models: all the images on them are tagged and classified already.


If you type in different plants and animals into GIS, you don’t even get the right species half the time. If GPT-3 has solved this problem, that would be substantially more impressive than drawing the images.


What is GIS? I only know Geographical Information System.


probably Google Image Search


Yeah, with these kinds of generative examples, they should always include the closest matches from the training set, to see how much was just "copied".


It's very hard to define closest...


This is a spot-on point. My prediction is that it wouldn't be able to. Given its difficulty generating correct counts of glasses, it seems it still struggles with systematic generalization and compositionality. As a point of reference, cherry-picking aside, it could model the obscure but probably well-represented "baby daikon radish in a tutu walking a dog," but couldn't model red on green on blue cubes. Maybe more sequential perception, action, and video data, or a system-2-like paradigm, will help, but it remains to be seen.


Yes, I don't really see impressive language (i.e. GPT3) results here? It seems to morph the images of the nouns in the prompt in an aesthetically-pleasing and almost artifact-free way (very cool!).

But it does not seem to 'understand' anything, as some other commenters have said. Try '4 glasses on a table' and you will rarely see 4 glasses, even though that is a very well-defined input. I would be more impressed by the language model if it had a working prompt like: "A teapot that does not look like the image prompt."

I think some of these examples trigger some kind of bias, where we think: "Oh wow, that armchair does look like an avocado!" - But morphing an armchair and an avocado will almost always look like both, because they have similar shapes. And it does not 'understand' what you called 'object concepts', otherwise it should not produce armchairs you clearly cannot sit in due to the avocado stone (or stem, in the flower-related 'armchairs').


> I would be slightly more impressed about the language model if it had a working prompt like: "A teapot that does not look like the image prompt."

Slightly? Jesus, you guys are hard to please.


Right, that was unnecessary and I edited it out.

What I meant is that 'not' is in principle an easy keyword to implement 'conservatively'. But yes, having this in a language model has proven to be very hard.

Edit: Can I ask, what do you find impressive about the language model?


Perhaps the rest of the world is less blasé - rightly or wrongly. I do get reminded of this: https://www.youtube.com/watch?v=oTcAWN5R5-I when I read some comments. I mean... we are telling the computer "draw me a picture of XXX" and it's actually doing it. To me that's utterly incredible.


> "draw me a picture of XXX" and it's actually doing it. To me that's utterly incredible.

Sure, would be, but this is not happening here.

And yes, rest assured, the rest of the world is probably less 'blasé' than I am :) Very evident by the hype around GPT3.


I'm in the OpenAI beta for GPT-3, and I don't see how to play with DALL-E. Did you actually try "4 glasses on a table"? If so, how? Is there a separate beta? Do you work for OpenAI?


In the demonstrations click on the underlined keywords and you can select alternates from dropdown menu.


Sounds like the perfect case for a new captcha system. Generate a random phrase to search an image for, show the user those results, ask them to select all images matching that description.


Thanks for this. I had the same thought about this being a lesson they'll need to learn. The other two engineers put up very little resistance and seem to be perfectly happy with their TC.

