Not common in Silicon Valley, but much more common in the rest of the country.
There’s an archetype for bootstrapped tech businesses:
- highly vertical-specific
- a couple hundred million in TAM
- founder started the business in their 30s and is now in their 40s
It’s a tensor stored in GPU memory to improve inference throughput. Check out the PagedAttention paper (the one that introduces vLLM) for how most systems implement it nowadays.
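If it helps to see it concretely, here's a toy single-head sketch of KV caching in numpy (shapes are illustrative, and I'm assuming the "it" above is the KV cache): each decode step appends the new token's keys/values and attends over everything cached, so past tokens are never recomputed. That per-token growth is exactly the memory PagedAttention manages in non-contiguous blocks.

```python
import numpy as np

d = 64                      # head dimension (illustrative)
K_cache = np.zeros((0, d))  # keys for every past token
V_cache = np.zeros((0, d))  # values for every past token

def decode_step(q, k, v):
    """One autoregressive step: cache this token's K/V, attend over all of it."""
    global K_cache, V_cache
    K_cache = np.vstack([K_cache, k[None, :]])  # cache grows one row per token
    V_cache = np.vstack([V_cache, v[None, :]])
    scores = K_cache @ q / np.sqrt(d)           # (seq_len,)
    weights = np.exp(scores - scores.max())     # stable softmax
    weights /= weights.sum()
    return weights @ V_cache                    # attention output for this token
```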
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek V3 seems to be attributable to MoE and FP8 training. DeepSeek R1’s improvements are from GRPO-based RL.
Interesting to note - we have no idea how much R1 cost to train.
To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g. Llama 3.1 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement; FP8 should be about a ~2x improvement.
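Back-of-envelope, using the headline numbers from the V3 report (~37B activated of 671B total params, ~14.8T tokens); the dense comparison point and the standard ~6ND FLOP approximation are my assumptions, not theirs:

```python
# Training FLOPs are roughly 6 * active_params * tokens (the usual 6ND rule).
def train_flops(active_params, tokens):
    return 6 * active_params * tokens

tokens = 14.8e12                      # DeepSeek-V3's reported token count
dense = train_flops(405e9, tokens)    # hypothetical dense model at Llama-405B scale
moe = train_flops(37e9, tokens)       # MoE only activates ~37B params per token

print(f"raw FLOP reduction: ~{dense / moe:.0f}x")  # ~11x
# Realized wall-clock savings are smaller (routing overhead, comms, load
# balancing), which is roughly consistent with the ~5x figure above.
# FP8 then roughly doubles per-GPU throughput vs BF16, hence the ~2x.
```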
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-R1, and we don’t have o1 numbers to compare against.
There’s definitely lots of misinformation spreading: the $5.5M number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by having to work around H800 networking limitations, and it's impressive what they've done.
It's interesting that only having access to less powerful hardware motivated/necessitated more efficient training--like how tariffs can backfire if left in place too long.
LLMs are inherently bad at this due to tokenization, scaling, and lack of training on the task. Anthropic specifically trained Claude to count pixels for its computer use feature:
> Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands. [1]
For a VLM trained on identifying bounding boxes, check out PaliGemma [2].
You may also be able to get the computer use API to draw bounding boxes if the costs make sense.
That said, I think the correct solution is likely to use a non-VLM to draw bounding boxes. Depends on the dataset and problem.
PaliGemma on computer use data is absolutely not good. The difference between a FT YOLO model and a FT PaliGemma model is huge if generic bboxes are what you need. Microsoft's OmniParser also winds up using a YOLO backbone [1]. All of the browser use tools (like our friends at browser-use [2]) wind up trying to get a generic set of bboxes using the DOM and then applying generative models.
PaliGemma seems to fit into a completely different niche right now (VQA and Segmentation) that I don't really see having practical applications for computer use.
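For anyone who wants to try the FT-YOLO route, a minimal sketch with the ultralytics package; the dataset config (ui_elements.yaml) and file paths are placeholders you'd fill in with your own annotated screenshots:

```python
from ultralytics import YOLO  # pip install ultralytics

# Start from generic pretrained weights and fine-tune on UI screenshots.
# "ui_elements.yaml" is a hypothetical dataset config in YOLO format.
model = YOLO("yolov8n.pt")
model.train(data="ui_elements.yaml", epochs=50, imgsz=1280)

# Inference: generic bboxes straight from pixels, no DOM required.
results = model("screenshot.png")
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"class={int(box.cls)} conf={float(box.conf):.2f} "
          f"bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```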
Conditionally yes. There are many libraries that cannot be tree-shaken for various reasons. Libraries typically need to stick to a subset of full JS to ensure the code can be statically analyzed.
GraphQL is very powerful when combined with Relay. It’s useless extra bloat if you just use it like REST.
The difference between the two technologies is that LangChain was developed and funded before anyone knew what to do with LLMs, while GraphQL was internal tooling built to solve a real problem at Meta.
In a lot of ways, LangChain is a poor abstraction because the layer it’s abstracting was (and still is) in its infancy.
While it may not happen for you, “too lazy to look it up” is the vast majority of CS requests.
My understanding from talking to a couple of CS execs is that these have been a slam dunk in terms of ROI because CS agents don’t need to handle type C requests. I expect we’ll only see more as time goes on.
I've analyzed support ticket requests before, and that doesn't seem to be the case. At least for the two times I've done this: 1) IT support tickets for a local school, and 2) tickets for a B2B SaaS app. In both cases the majority of tickets were for things that seemed obvious to me, things the user would have figured out if they'd just bothered to spend 10 seconds looking. But they didn't. Some training helped on the IT side, and some UX improvements helped in the SaaS app, but the bar is _sooo_ much lower than many expect.
This should be a lot more obvious to the tech crowd than it is. I suppose it's the familiarity effect (see https://xkcd.com/2501/)--what's obvious to us isn't necessarily obvious to most people, and we heavily undercount the degree to which confusion-of-basic-things exists because it's second nature to us.
I wonder that too. If you only measure one part of the funnel (e.g. CS costs) and not the total funnel (e.g. losses due to poor CS quality, like a customer dropping the project), then it's easy to conclude that making CS more painful is a win.
It depends on the business, but the kind of metrics you are talking about are measured and taken seriously. People have absolutely gotten fired for CS quality KPI drops.
I don't doubt you, but if that's the case why not make it easy to get to a human? I'm fine explaining my problem to a robot, but if (when) they don't understand what I'm saying, hand me off to a human! For example, it's maddening to call the pharmacy and go through something like this:
Pharmacy Robot: Hello, thanks for calling <pharmacy>. What can I do for you? You can say anything like, "Check pharmacy hours" or "order a refill".
Me: Hi, I have a refill for <specific medication with rules around it> that is due next week but I'll be traveling out of the country to <other country> for a couple of weeks. I need to know what my options are.
Pharmacy Robot: Ok, you want a refill. Please enter the prescription number now.
Me: No, if we try to refill it, the automated system will just reject it. I need to talk to a h...<cut off by robot>
Pharmacy Robot: Sorry, I didn't get that number. Using your phone's keypad, enter the number of your prescription refill.
Me: Jesus Christ, do I have to hang up and go through this whole thing ag... <cut off by robot>
Pharmacy Robot: Sorry, I didn't get that number. Using...<cut off by human hanging up>
That's just the most recent one I had. There are often better examples of madness...
Because unless the chatbot is both better than a human in every way, and everyone knows that, the first thing people will do is push the button to reach the human. Why wouldn't they? They're calling in the first place because they don't want to make an effort to use the available tools to answer their question. They want a human.
> They're calling in the first place because they don't want to make an effort to use the available tools to answer their question.
That's not correct. I NEVER call without first exhausting every available source, because I despise the phone system and its inefficiencies. Most companies may think they have resources available, but they really don't. And no, just throwing up a Zendesk or equivalent "knowledge base" isn't the same as providing tools and manuals/guides/etc.
That said, there is definitely a subset of people for whom calling is step 1 (before even googling). They tend to be older and/or on the tech illiterate side. But if you design and build for the worst-case scenario, you're really screwing over your more self-help customers and even driving them away.
To be fair, LLM-based chatbots are much better about this because you don't need to discover the magic incantation to talk to a human. It's a trade-off because that same property introduces the possibility of hallucination.
You can get out of the automated useless system. They don't make it easy.
But I once managed to get through to an actual agent with this question:
1. I want to buy a kindle version of this book [amazon link, for the paper version of the book].
2. On the page for the book, there is a link for the kindle edition: [link].
3. That link goes to a page for what appears to be an entirely different book. (Under the same name; this was an edition of the Arabian Nights.)
4. However, I have independently found this page: [link], which appears to be for the kindle version of the book I'm interested in.
5. Given that I want to buy the kindle version of the book linked up in step (1), which one should I purchase?
The agent directed me to buy the book that purported to be the book I wanted, instead of the book that Amazon believed was the book I wanted but which claimed to be something different. I would have assumed that anyway. But a couple days later I checked on the book and the "kindle version" link for the paper version had been corrected.
Unfortunately, while they did correct the issue on the one book that I took the time to point out to them, it's still rampant all over their website.
Actually this does sometimes work because some of the systems now have sentiment analysis baked in and can tell if the user is getting pissed off. I've used this a few times to get through as well.
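No idea what any given vendor actually runs, but the "sentiment analysis baked in" part can be as simple as this sketch (model choice, window size, and threshold are all assumptions on my part):

```python
from transformers import pipeline  # Hugging Face transformers

sentiment = pipeline("sentiment-analysis")  # default English sentiment model

def should_escalate(turns, threshold=0.9):
    """Escalate to a human if the caller's last few turns are persistently angry."""
    results = sentiment(turns[-3:])  # score the most recent turns
    angry = [r for r in results
             if r["label"] == "NEGATIVE" and r["score"] > threshold]
    return len(angry) >= 2  # persistent frustration, not a one-off

turns = ["I need a refill", "No, that's not what I said", "Let me talk to a person"]
if should_escalate(turns):
    print("routing to a human agent")
```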
Be careful, your voice could be used to train the next chat bots, and they could start yelling angrily at customers... actually, if the new chat bot is genuinely helpful, a screaming conversation would be kind of cathartic.
I just went through that recently: the chatbot responded instantly to my email with the same reply as the FAQ help, then a human responded after an hour asking for screenshots showing I had actually tried it, and then after a day an engineer fixed it.