Not common in Silicon Valley, but much more common in the rest of the country.
There’s an archetype for bootstrapped tech businesses:
- highly vertical-specific
- a couple hundred million in TAM
- founder started the business in their 30s and is now in their 40s
It’s a tensor stored in GPU memory to improve inference throughput. Check out the PagedAttention paper (the one that introduces vLLM) for how most systems implement it nowadays.
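If it helps to see it concretely, here's a toy single-head sketch of KV caching in numpy (shapes are illustrative, and I'm assuming the "it" above is the KV cache): each decode step appends the new token's keys/values and attends over everything cached, so past tokens are never recomputed. That per-token growth is exactly the memory PagedAttention manages in non-contiguous blocks.

```python
import numpy as np

d = 64                      # head dimension (illustrative)
K_cache = np.zeros((0, d))  # keys for every past token
V_cache = np.zeros((0, d))  # values for every past token

def decode_step(q, k, v):
    """One autoregressive step: cache this token's K/V, attend over all of it."""
    global K_cache, V_cache
    K_cache = np.vstack([K_cache, k[None, :]])  # cache grows one row per token
    V_cache = np.vstack([V_cache, v[None, :]])
    scores = K_cache @ q / np.sqrt(d)           # (seq_len,)
    weights = np.exp(scores - scores.max())     # stable softmax
    weights /= weights.sum()
    return weights @ V_cache                    # attention output for this token
```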
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek V3 seems to be attributable to MoE and FP8 training. DeepSeek R1’s improvements are from GRPO-based RL.
Interesting to note - we have no idea how much R1 cost to train.
To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g. Llama 3.1 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement; FP8 should be about a ~2x improvement.
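Back-of-envelope, using the headline numbers from the V3 report (~37B activated of 671B total params, ~14.8T tokens); the dense comparison point and the standard ~6ND FLOP approximation are my assumptions, not theirs:

```python
# Training FLOPs are roughly 6 * active_params * tokens (the usual 6ND rule).
def train_flops(active_params, tokens):
    return 6 * active_params * tokens

tokens = 14.8e12                      # DeepSeek-V3's reported token count
dense = train_flops(405e9, tokens)    # hypothetical dense model at Llama-405B scale
moe = train_flops(37e9, tokens)       # MoE only activates ~37B params per token

print(f"raw FLOP reduction: ~{dense / moe:.0f}x")  # ~11x
# Realized wall-clock savings are smaller (routing overhead, comms, load
# balancing), which is roughly consistent with the ~5x figure above.
# FP8 then roughly doubles per-GPU throughput vs BF16, hence the ~2x.
```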
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-R1, and we don’t have o1 numbers to compare against.
There’s definitely lots of misinformation spreading: the $5.5M number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by having to work around H800 networking limitations, and it's impressive what they've done.
It's interesting that only having access to less powerful hardware motivated/necessitated more efficient training--like how tariffs can backfire if left in place too long.
LLMs are inherently bad at this due to tokenization, scaling, and lack of training on the task. Anthropic specifically trained Claude to count pixels for its computer use feature:
> Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands. [1]
For a VLM trained on identifying bounding boxes, check out PaliGemma [2].
You may also be able to get the computer use API to draw bounding boxes if the costs make sense.
That said, I think the correct solution is likely to use a non-VLM to draw bounding boxes. Depends on the dataset and problem.
PaliGemma on computer use data is absolutely not good. The difference between a FT YOLO model and a FT PaliGemma model is huge if generic bboxes are what you need. Microsoft's OmniParser also winds up using a YOLO backbone [1]. All of the browser use tools (like our friends at browser-use [2]) wind up trying to get a generic set of bboxes using the DOM and then applying generative models.
PaliGemma seems to fit into a completely different niche right now (VQA and Segmentation) that I don't really see having practical applications for computer use.
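For anyone who wants to try the FT-YOLO route, a minimal sketch with the ultralytics package; the dataset config (ui_elements.yaml) and file paths are placeholders you'd fill in with your own annotated screenshots:

```python
from ultralytics import YOLO  # pip install ultralytics

# Start from generic pretrained weights and fine-tune on UI screenshots.
# "ui_elements.yaml" is a hypothetical dataset config in YOLO format.
model = YOLO("yolov8n.pt")
model.train(data="ui_elements.yaml", epochs=50, imgsz=1280)

# Inference: generic bboxes straight from pixels, no DOM required.
results = model("screenshot.png")
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"class={int(box.cls)} conf={float(box.conf):.2f} "
          f"bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```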
Conditionally yes. There are many libraries that cannot be tree-shaken for various reasons. Libraries typically need to stick to a subset of full JS to ensure the code can be statically analyzed.
GraphQL is very powerful when combined with Relay. It’s useless extra bloat if you just use it like REST.
The difference between the two technologies is that LangChain was developed and funded before anyone knew what to do with LLMs, while GraphQL was internal tooling built to solve a real problem at Meta.
In a lot of ways, LangChain is a poor abstraction because the layer it’s abstracting was (and still is) in its infancy.
While it may not happen for you, “too lazy to look it up” is the vast majority of CS requests.
My understanding from talking to a couple of CS execs is that these have been a slam dunk in terms of ROI because CS agents don’t need to handle type C requests. I expect we’ll only see more as time goes on.
I've analyzed support ticket requests before, and that doesn't seem to be the case. At least for the two times I've done this: 1) IT support tickets for a local school, and 2) tickets for a B2B SaaS app. In both cases the majority of tickets were for things that seemed obvious to me, things the user would have figured out if they'd just bothered to spend 10 seconds looking. But they didn't. Some training helped on the IT side, and some UX improvements helped in the SaaS app, but the bar is _sooo_ much lower than many expect.
This should be a lot more obvious to the tech crowd than it is. I suppose it's the familiarity effect (see https://xkcd.com/2501/)--what's obvious to us isn't necessarily obvious to most people, and we heavily undercount the degree to which confusion-of-basic-things exists because it's second nature to us.
I wonder that too. If you only measure one part of the funnel (e.g. CS costs) and not the total funnel (e.g. losses due to poor CS quality, like a customer dropping the project), then it's easy to conclude that making CS more painful is a win.
It depends on the business, but the kind of metrics you are talking about are measured and taken seriously. People have absolutely gotten fired for CS quality KPI drops.
I don't doubt you, but if that's the case why not make it easy to get to a human? I'm fine explaining my problem to a robot, but if (when) they don't understand what I'm saying, hand me off to a human! For example, it's maddening to call the pharmacy and go through something like this:
Pharmacy Robot: Hello, thanks for calling <pharmacy>. What can I do for you? You can say anything like, "Check pharmacy hours" or "order a refill".
Me: Hi, I have a refill for <specific medication with rules around it> that is due next week but I'll be traveling out of the country to <other country> for a couple of weeks. I need to know what my options are.
Pharmacy Robot: Ok, you want a refill. Please enter the prescription number now.
Me: No, if we try to refill it, the automated system will just reject it. I need to talk to a h...<cut off by robot>
Pharmacy Robot: Sorry, I didn't get that number. Using your phone's keypad, enter the number of your prescription refill.
Me: Jesus Christ, do I have to hang up and go through this whole thing ag... <cut off by robot>
Pharmacy Robot: Sorry, I didn't get that number. Using...<cut off by human hanging up>
That's just the most recent one I had. There are often better examples of madness...
Because unless the chatbot is both better than a human in every way, and everyone knows that, the first thing people will do is push the button to reach the human. Why wouldn't they? They're calling in the first place because they don't want to make an effort to use the available tools to answer their question. They want a human.
> They're calling in the first place because they don't want to make an effort to use the available tools to answer their question.
That's not correct. I NEVER call without first exhausting every available source, because I despise the phone system and its inefficiencies. Most companies may think they have resources available, but they really don't. And no, just throwing up a Zendesk or equivalent "knowledge base" isn't the same as providing tools and manuals/guides/etc.
That said, there is definitely a subset of people for whom calling is step 1 (before even googling). They tend to be older and/or on the tech illiterate side. But if you design and build for the worst-case scenario, you're really screwing over your more self-help customers and even driving them away.
To be fair, LLM-based chatbots are much better about this because you don't need to discover the magic incantation to talk to a human. It's a trade-off because that same property introduces the possibility of hallucination.
You can get out of the automated useless system. They don't make it easy.
But I once managed to get through to an actual agent with this question:
1. I want to buy a kindle version of this book [amazon link, for the paper version of the book].
2. On the page for the book, there is a link for the kindle edition: [link].
3. That link goes to a page for what appears to be an entirely different book. (Under the same name; this was an edition of the Arabian Nights.)
4. However, I have independently found this page: [link], which appears to be for the kindle version of the book I'm interested in.
5. Given that I want to buy the kindle version of the book linked up in step (1), which one should I purchase?
The agent directed me to buy the book that purported to be the book I wanted, instead of the book that Amazon believed was the book I wanted but which claimed to be something different. I would have assumed that anyway. But a couple days later I checked on the book and the "kindle version" link for the paper version had been corrected.
Unfortunately, while they did correct the issue on the one book that I took the time to point out to them, it's still rampant all over their website.
Actually this does sometimes work because some of the systems now have sentiment analysis baked in and can tell if the user is getting pissed off. I've used this a few times to get through as well.
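No idea what any given vendor actually runs, but the "sentiment analysis baked in" part can be as simple as this sketch (model choice, window size, and threshold are all assumptions on my part):

```python
from transformers import pipeline  # Hugging Face transformers

sentiment = pipeline("sentiment-analysis")  # default English sentiment model

def should_escalate(turns, threshold=0.9):
    """Escalate to a human if the caller's last few turns are persistently angry."""
    results = sentiment(turns[-3:])  # score the most recent turns
    angry = [r for r in results
             if r["label"] == "NEGATIVE" and r["score"] > threshold]
    return len(angry) >= 2  # persistent frustration, not a one-off

turns = ["I need a refill", "No, that's not what I said", "Let me talk to a person"]
if should_escalate(turns):
    print("routing to a human agent")
```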
Be careful, your voice could be used to train the next chat bots, and they could start yelling angrily at customers... actually, if the new chat bot is genuinely helpful, a screaming conversation would be kind of cathartic.
I just went through that recently: the chatbot responded instantly to my email with the same reply as the FAQ help, then a human responded after an hour asking for screenshots showing I had actually tried it, and then after a day an engineer fixed it.