In MLIR, there are two representations of memory, `tensor` and `memref`, which enable you to do some high-level things[0] in SSA form before "bufferizing" to memrefs, which are eventually lowered to LLVM pointers.
Have you worked with TypeScript? I'm working with both every day and I'm always frustrated by the limits of the 'type' system in Python. Sure, it's better than nothing, but it's so basic compared to what you can do in TypeScript. Advanced generics are very easy to use in TypeScript but hell (sometimes outright impossible) to express in Python.
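For a concrete (if minimal) sketch of where the ceiling is, with illustrative names:

    from typing import Generic, TypeVar

    T = TypeVar("T")

    # Simple generics work fine in Python...
    class Box(Generic[T]):
        def __init__(self, value: T) -> None:
            self.value = value

    def unwrap(box: Box[T]) -> T:
        return box.value

    print(unwrap(Box(42)))  # a checker infers int here

    # ...but there is no equivalent of a TypeScript mapped type like
    # { [K in keyof T]: Box<T[K]> } ("the same shape as T, with every
    # field wrapped in Box"). You end up writing the wrapped type out
    # by hand, or falling back to Any.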
Yep, although never in a project of a similar size. One advantage of the Python setup is that the types are ignored at runtime, so there's no overhead at startup/compilation time. Although it's also a disadvantage in terms of what you can do in the system, of course.
I agree it is pretty nice (with uv, and as long as you REALLY don't care about performance). But even if you are one of the enlightened few to use that setup, you still have to deal with dependencies that don't have type annotations, or only have basic ones like `dict`.
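A common workaround, sketched here with a made-up stand-in for the untyped dependency, is to assert the shape yourself at the boundary:

    from typing import TypedDict, cast

    class UserRecord(TypedDict):
        id: int
        name: str

    def untyped_get_user(user_id: int) -> dict:
        # Stand-in for a dependency that is only annotated as returning dict.
        return {"id": user_id, "name": "Ada"}

    def fetch_user(user_id: int) -> UserRecord:
        # The dependency only promises dict, so we assert the shape ourselves.
        # cast() does nothing at runtime; it only informs the checker.
        return cast(UserRecord, untyped_get_user(user_id))

    print(fetch_user(1)["name"])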
TypeScript (via Deno) is still a better option IMO.
Can someone familiar with the performance of LLMs please tell me how important this is to overall perf? I'm interested in looking into optimizing tokenizers, and have not yet run the measurements. I would have assumed that the cost is generally dominated by matmuls, but I'm encouraged by the reception of this post in the comments.
Tokenizing text is a ridiculously small part of the overall computation that goes into serving a request. With that said, if you're doing this on petabytes of data, it never hurts to have something faster.
To echo the other replies, the tokenizer is definitely not the bottleneck. It just happens to be the first step in inference, so it's what I did first.
Tokenization performance is complicated, but your guidepost is that the institutions with the resources and talent to do so choose to write extremely fast tokenizers: sentencepiece and tiktoken both pay dearly in complexity for it (particularly deployment complexity, because now you've got another axis of architecture-specific build/bundle/dylib concerns to manage on top of whatever your accelerator burden already was: it's now aarch64 cross x86_64 cross CUDA capability...)
Sometimes it can overlap with accelerator issues, but pros look at flame graphs: a CPU core running the AVX lanes hard while failing to keep the bus fed; a million things like that. People pre-tokenize big runs all the time.
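If you just want a first-order number before reaching for flame graphs, a rough throughput measurement is cheap. A sketch, assuming the tiktoken package as a stand-in for whatever tokenizer you actually ship:

    import time

    import tiktoken  # third-party; pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    docs = ["some representative document text " * 200] * 1_000

    start = time.perf_counter()
    total_tokens = sum(len(enc.encode(doc)) for doc in docs)
    elapsed = time.perf_counter() - start

    # Compare this rate against your model's tokens/sec of generation to
    # see whether tokenization is anywhere near the critical path for you.
    print(f"{total_tokens / elapsed:,.0f} tokens/sec")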
I don't know why this thread is full of "nothing to see here": this obliterates the SOTA set by the money-is-no-object status quo. I'd like to think better of the community than the obvious explanation, which is that C++ is threatening a modest mindshare comeback against a Rust narrative that's already under pressure from the explosion of interest in Zig. Maybe there's a better reason.
I really want to switch to Zed from Cursor but the battery usage for my Python project with Pyright is unjustifiable. There are issues for this on GitHub and I'm just sad that the team isn't prioritising this more.
It's funny you mention this, because I have an issue with Cursor where, if I leave it open for more than a few hours, my M3 Max starts to melt: the fans spin up, and it turns out to be using most of my CPU when it should be idling.
Zed on the other hand works perfectly fine for me. Goes to show how a slightly different stack can completely change one’s experiences with a tool.
I love that language and frequently show it to people. I'm sad to see that my local install doesn't work any more. I actually used it to solve a puzzle in Evoland 2 that I'm relatively sure was added as a joke, and is not solvable in a reasonable time without a solver. I'm actually doing a PhD in compilers right now, and would love to chat about Sentient if you have the time. My email is sasha@lopoukhine.com.
You might be interested in looking at MiniZinc (https://minizinc.org/), which is an open-source modelling language for combinatorial problems. The system comes from a constraint programming background, but the language is solver-agnostic and can be used to compile to many different types of solvers.
I once used Sentient[0] to solve a similar puzzle in a different game, one that I think was designed to require a constraint solver as a joke on the player. I'm a bit saddened to see that the repo hasn't been updated in 6 years, and that a Node update broke the binary installed on my computer. I find it a much more ergonomic environment than Prolog; hopefully someone else will pick up the mantle.
Zig’s comptime feature gives a lot of the power of C++ templates. This makes it easier to implement a lot of libraries which are just a pain to do in plain C. Even seemingly simple things like a generic vector are annoying to do in C, unless you abandon type safety and just start passing pointers to void.
I believe Zig even has some libraries for doing more exotic things easily, such as converting between an array of structs and a struct of arrays (and back).
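The AoS/SoA idea itself is language-agnostic; as a rough illustration of what such a conversion does (sketched with numpy rather than Zig, so the names and layout details are only indicative):

    import numpy as np

    # Array of structs: fields interleaved in memory, one record per element.
    aos = np.array([(1, 2.0), (3, 4.0)], dtype=[("id", "i4"), ("score", "f8")])

    # Struct of arrays: one dense, contiguous array per field.
    soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}

    # Field-wise loops now touch a single dense array, which is the point
    # of the SoA layout for vectorized code.
    print(soa["score"].mean())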
>Zig’s comptime feature gives a lot of the power of C++ templates. This makes it easier to implement a lot of libraries which are just a pain to do in plain C. Even seemingly simple things like a generic vector are annoying to do in C, unless you abandon type safety and just start passing pointers to void.
I get it, thanks. In C you have to cast everything to (void *) to do things like a generic vector.
>I believe Zig even has some libraries for doing more exotic things easily, such as converting between an array of structs and a struct of arrays (and back).
Yes, IIRC there was a Zig thread about this on HN recently.
The prejudice and preconceived notions of some of these HN frigtards would be amazing if they were not already known in advance, for a long time now, by me and many others around here who have more maturity and sense, and who don't have insecurity chips on their shoulders.
I mean, downvoting a perfectly innocuous comment. Maybe the use of the words nerd or noob, even though I applied them to myself, triggers their atavistic, dumbfuck response?
It's good that more sensible people like chongli, pjmlp (an old standby here, and usually interesting to read on proglang topics) and infamouscow are around (referring to this subthread); otherwise this site would not be worth visiting at all. I, and probably many others, only stick around for the relatively rare better threads that are worth checking out.
As Amy Hoy of Stacking the Bricks famously said, fuck HN. It's only optional, not part of my daily tech/biz fix (any more), and I am actively looking for other, better places to hang out online, to read comments from, and chat with, more sensible people.
Possibly your GP comment came across as a snarky attack because of the first sentence. It's clear that you didn't mean it that way, but there isn't enough information in "explanation needed, bro" to disambiguate your intent (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...).
I know of a few projects looking in that direction, each optimising for different things, and none getting near the capability of LLVM, which is going to take some time. I spoke with some of the core MLIR developers about this, and they're generally open to the notion, but it's going to take a lot of volunteer effort to get there, and it's not clear who the sherpa will be, especially given that the major sponsors of the LLVM project aren't in a particular hurry. If you're interested in this, feel free to look our paper up in a week or two; we've had a bit of trouble uploading it to arXiv, but it should be ready soon.
The typing version is still useful when you want to communicate that the result conforms to a certain interface but not the exact type; in the case of Set, that interface doesn't include mutability.
Edit: I see that he imported the typing Set, which is deprecated, rather than collections.abc.Set, which is still useful, so your comment is correct.
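A minimal sketch of the distinction, with illustrative names:

    from collections.abc import Set

    def tags_for(post_id: int) -> Set[str]:
        # Promise only the read-only set interface: membership tests,
        # iteration, set algebra. Callers can't count on .add() existing.
        return frozenset({"python", "typing"})

    tags = tags_for(1)
    assert "python" in tags
    # tags.add("rust")  # a type checker rejects this: Set has no add()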
[0]: https://mlir.llvm.org/docs/Dialects/TensorOps/