
This weekend I cracked into nanoGPT (https://github.com/karpathy/nanoGPT), an older but fabulous learning exercise where you build and train a crappy Shakespeare GPT with ~0.8M parameters on a CPU. The results are about what you'd expect from that (they suck), but you can start to feel the magic, especially if you're not a deep learning professional and just want to poke around and hack on it.
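For anyone who wants to try it, the whole thing is a handful of commands; these are roughly the scaled-down CPU settings from the nanoGPT README:

    # tokenize the tiny Shakespeare corpus at the character level
    python data/shakespeare_char/prepare.py

    # train a ~0.8M-parameter model on CPU (torch.compile disabled)
    python train.py config/train_shakespeare_char.py \
        --device=cpu --compile=False --eval_iters=20 --log_interval=1 \
        --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 \
        --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0

    # generate some (bad) Shakespeare from the checkpoint
    python sample.py --out_dir=out-shakespeare-char --device=cpu

Training takes a few minutes on a recent laptop, and the samples come out looking vaguely Elizabethan.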

I started writing up a blog post about my weekend with nanoGPT, but it's not done yet... it would have been great to link it here, lol. Oh well.



It's a useful exercise. A lot of good ML work is first validated at small scale.

And this new example goes even further: it adds SFT for instruction following and tool use, as well as RLVR. That makes for a more useful baseline.


Absolutely, it's wildly fun to read the outputs of even a tiny 0.8M model trained on a CPU. After playing around with it for a day, I have a much better understanding of the transformer architecture. No doubt this repo will prompt some folks to try out their own ideas, and some of them will turn into new researchers in the field.


The Shakespeare code, tuned a little with different training data, does a good job of generating Magic: The Gathering Commander decks.


Somewhat related: I wrote an MTG card generator based on nanoGPT a while ago that I think produces pretty good results for a 1M-parameter model.

The really neat thing about this is that WotC makes a few thousand new cards each year, so my training data set just grows over time and the model gets better with no effort on my part.

https://github.com/jlwitthuhn/TCGGPT
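If anyone wants to reproduce this with their own corpus, the data prep in nanoGPT is small enough to rewrite in a few minutes. Here's a minimal sketch of a character-level prepare script in the style of nanoGPT's data/shakespeare_char/prepare.py; it's not the actual TCGGPT code, and cards.txt (one card's text per line) plus the 90/10 split are my own placeholders:

    import pickle
    import numpy as np

    # read the raw corpus (placeholder file: one card's text per line)
    with open('cards.txt', 'r', encoding='utf-8') as f:
        data = f.read()

    # build a character-level vocabulary over everything in the corpus
    chars = sorted(set(data))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}

    def encode(s):
        return [stoi[c] for c in s]

    # 90/10 train/val split, stored as uint16 token ids
    n = len(data)
    train_ids = np.array(encode(data[:int(n * 0.9)]), dtype=np.uint16)
    val_ids = np.array(encode(data[int(n * 0.9):]), dtype=np.uint16)
    train_ids.tofile('train.bin')
    val_ids.tofile('val.bin')

    # meta.pkl is what lets sample.py decode token ids back into text
    with open('meta.pkl', 'wb') as f:
        pickle.dump({'vocab_size': len(chars), 'itos': itos, 'stoi': stoi}, f)

Drop the three output files into a data/<your_dataset>/ directory, point the training config's dataset name at it, and the rest of the pipeline is unchanged.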


It would be interesting to come up with a use case that actually requires a freshly trained model and isn't just something generic models can already do, especially with a 1M-token context window.


I'd love more details on this. This is exactly the type of project I'd like to dabble in to get more up to speed.


People have been doing this for a while.

https://x.com/roborosewater

https://bsky.app/profile/roborosewaterm.bsky.social

You can see the arrival of RLHF/ChatGPT in the timeline, because the text generation suddenly becomes much more coherent and also much less interesting. You have to go back to older tech for surrealism, because nobody will let you see the good stuff (the base models).


I guess I was much more interested in being able to work with an LLM to create good, synergistic Commander decks and less interested in generating custom Magic cards.

I'm sure I can dig up info on how to do this and piece it together, but I thought OP might have a guide specifically for it.


FWIW, there was a pretty popular post on HN about generating MTG cards using AI a couple of years back, but I believe their approach was to fine-tune an existing LLM.

https://news.ycombinator.com/item?id=37427854


I like the idea of special-purpose toy models. How did you tune the code, and what dataset did you use?



