Hacker News | AnthOlei's comments

Does anyone else think this method of discourse is incredibly rude? This poster wrote a quasi-researched comment that looks credible on the surface. An HN user took the time to respond to each point - only for the reply to that response to be "whoops, sorry, wasted your time!"

Feels like a breach of the social contract. I used to see well-written help requests as a signal of effort - as in "I'll put in the effort to help since they did."


I also asked Claude to roast it for fun. This one made me lol:

> The use case is Chef's Kiss levels of overengineering. They want to avoid Git commits... so they built a custom S3 server... that runs in a container... that gets rebuilt on every NixOS rebuild... to serve static files... to FluxCD. At some point, just make the damn Git commits.


The Nix file is beside the point - it gives you a totally hermetic build environment. Not OP, but it's the only way I know how to get gcc to use a static glibc. All you need to pay attention to is that it's using a static glibc.

$out is a magic variable in Nix that refers to the output of the derivation - the directory that Nix moves to its final destination in the store.
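For illustration, here's a minimal derivation sketch (not OP's actual file; the names are made up) showing how $out is used:

```nix
# Hypothetical minimal derivation illustrating $out.
{ pkgs ? import <nixpkgs> {} }:
pkgs.stdenv.mkDerivation {
  name = "hello-static";
  src = ./.;
  # Nix sets $out to the store path this derivation will produce,
  # e.g. /nix/store/<hash>-hello-static. Whatever you put there
  # becomes the build output.
  installPhase = ''
    mkdir -p $out/bin
    cp hello $out/bin/
  '';
}
```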


> Not OP, but it’s the only way I know how to get gcc to use a static glibc.

    /tmp$ gcc -O3 test.c -o test
    /tmp$ ldd test
     linux-vdso.so.1 (0x00007f3d9fbfe000)
     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d9f9e8000)
     /lib64/ld-linux-x86-64.so.2 (0x00007f3d9fc00000)
    /tmp$ gcc -static -O3 test.c -o test
    /tmp$ ldd test
     not a dynamic executable


>> Not OP, but it’s the only way I know how to get gcc to use a static glibc.

    /tmp$ gcc -static -O3 test.c -o test
    /tmp$ ldd test
     not a dynamic executable

yes, that last line above means it's a statically linked executable.

yes, I had a doubt about what the GP said - that their Nix way was the only way to create a statically linked executable.

but I didn't remember all the details, because it's been a while since I worked with C in depth (I moved on to Java, Ruby, Python, etc., though I did a lot of C earlier, even in pre-Linux years), so I didn't say anything else. Thanks, Josh Triplett, for clarifying.

but one thing I do remember is that static linking was the only option in the beginning, at least on Unix; dynamic linking came only some time later.

when I started working on UNIX and C, there was no dynamic linking at all, IIRC.

https://en.m.wikipedia.org/wiki/Static_library

("dynamic linking" topic in above page links to the below page in Wikipedia: )

https://en.m.wikipedia.org/wiki/Dynamic_linker


I thought glibc had some hacks in it to prevent it from working fully when statically linked? Is this just a myth or outdated or only affects C/C++ or what?


The issue is that some features of glibc want to dlopen additional libraries, most notably NSS. If you call `gethostbyname`, even a static glibc will try to dlopen NSS libraries based on /etc/nsswitch.conf, and if the dynamic NSS libraries are incompatible with your statically linked glibc, you'll have problems.

musl, by contrast, doesn't support NSS at all, only /etc/hosts and DNS servers listed in /etc/resolv.conf, so whether you statically or dynamically link musl, you just won't have any support for (for instance) mDNS, or dynamic users, or local containers, or various other bits of name resolution users may expect to Just Work.


thanks.


They’ve now removed your second example from the testing set - I bet they won’t regenerate their benchmarks without this test.

Good sleuthing - it seems someone from OpenAI read your comment and found it embarrassing as well!


For future reference, permalink to the original commit with the RADICAL BUG comment: https://github.com/openai/SWELancer-Benchmark/blob/a8fa46d2b...

The new version (as of now) still has a comment making it obvious that there's an intentionally introduced bug, but it's not as on the nose: https://github.com/openai/SWELancer-Benchmark/blob/2a77e3572...


It was just two examples of widespread problems with the introduced bugs and the tests.

How about this - https://github.com/openai/SWELancer-Benchmark/blob/08b5d3dff... (Intentionally use raw character count instead of HTML-converted length)

Or this one - https://github.com/openai/SWELancer-Benchmark/blob/08b5d3dff... (user is complaining of flickering, so the reintroduced bug adds flickering code :) )

Or the one that they list in A.10 of the paper as O1 successfully fixing - https://github.com/openai/SWELancer-Benchmark/blob/main/issu...

O1 doesn't actually seem to fix anything (besides arbitrarily dumping all over the code); the reintroduced bug is messing with the state, not with the back-button navigation.

Anyways, I went through a sample of 20-30 last night and gave up. No one needs to take my word for it - force pushing aside, anyone can pull the repo and check for themselves.

Most of the 'bugs' are trivialized to a massive degree, which a) makes them very easy to solve and b) doesn't reflect their previous monetary value, which in effect makes the whole premise of 'let's measure how SWE agents can provide real monetary value' invalid.

If they wanted to create a real one, they should've found the commits reflecting the state of the app at the moment of the bug and set up the benchmarks around that.


So it's much worse than I assumed from the paper and repo overview?

For further clarification: 1. See the issue example #14268 https://github.com/openai/SWELancer-Benchmark/tree/08b5d3dff.... It has a patch that is supposed to "reintroduce" the bug into the codebase (note the comments):

  +    // Intentionally use raw character count instead of HTML-converted length
  +    const validateCommentLength = (text: string) => {
  +        // This will only check raw character count, not HTML-converted length
  +        return text.length <= CONST.MAX_COMMENT_LENGTH;
  +    };
Also, the patch is supposedly applied over commit da2e6688c3f16e8db76d2bcf4b098be5990e8968 - much later than the original fix, but still a year ago; not sure why, might be something to do with cutoff dates.

2. Proceed to https://github.com/Expensify/App/issues/14268 to see the actual original issue thread.

3. Here is the actual merged solution at the time: https://github.com/Expensify/App/pull/15501/files#diff-63222... - as you can see, the diff is quite different... Not only that, but the point to which the "bug" was reapplied is so far in the future that the repo had even migrated to TypeScript.

---

And they still had to add a whole other level of bullshit with "management" tasks on top of that - guess why =)

Prior "bench" analysis for reference: https://arxiv.org/html/2410.06992v1

(edit: code formatting)


Oh my this is awful - could you post a source?


I’ve recently spent time truly learning git, and I’m realizing how much better it is to take care of your commits in a PR even if you squash to main.

My reviewers love when I write "review commit by commit" in the PR description. Then each individual commit has its own reasoning, and you can mentally switch into reviewing whether that commit does the one thing it's supposed to do correctly. I will accept the argument that each commit should be its own PR, though :)


This is so puzzling to me. It’s like people[1] get so stuck in the GitHub PR framework/mindset that they then, after realizing that GitHub PRs suck for reviewing individual commits, discard commits as the reviewable unit and then shoehorn one-commit-per-PR as a replacement.

The PR is just the best GitHub had to offer. There are other approaches to code review.

[1] Here we are generalizing.


I treat PRs like the email workflow. You send a diff my way for a particular change; I either accept it or reject it. Or I suggest modifications. Recursively. It's the whole patch I'm interested in, not the implementation history (I'm not your tutor). Once approved, I make a commit on main for this diff.


The classic email workflow is either one patch or a patch series. Where each patch becomes a commit. And each patch can be reviewed in isolation (like a commit).

It is not anemic like the squash workflow.


There’s nothing stopping anyone from creating a PR series. My reasoning for the squash workflow is described here[0]. I just equate a PR to a patch. And it becomes a commit in the main branch. I don’t really care about the commits in the PR, just like no one cares about the commits that produced the patch in the email workflow.

[0]: https://news.ycombinator.com/item?id=41839282


> I don’t really care about the commits in the PR, just like no one cares about the commits that produced the patch in the email workflow.

They do care. They go through the trouble of reviewing it so that the resulting commit[1] that lands in the upstream repository is good.

[1] Presumably you don’t mean “they don’t care about the commit that produced the patch”… since the patch is just a transport format for the commit.


> Presumably you don’t mean “they don’t care about the commit that produced the patch”… since the patch is just a transport format for the commit.

Commits. Not everyone will care to clean up their local history just to produce a single patch. You can git diff it out.

EDIT:

I was using "patch" to mean diff, so scratch the above comment. Even then, when using a forge, I'd rather squash unless everyone cleans up their commit history when producing a PR.


This must mean that you don’t use PRs like the email workflow since nothing of what you’ve described looks like any email workflow that I’ve seen.


Exactly! Stacked diffs/changes are the way, with one commit per PR. https://www.stacking.dev/


What’s the source of this? Where can I read more?



I was under the impression that you were referring to a specific event that happened over the weekend, not a general vibe you were feeling


Ha, I think this site is styled with a single stylesheet called Tufte.css


I don't think it is. In Tufte CSS, sidenotes are implemented using float: right [1], while here CSS Grid is used instead.

[1]: https://github.com/edwardtufte/tufte-css/blob/957e9c6dc3646a...


Oh wow. I went through some of the Perl book you linked and noticed the examples were really familiar; I then realized you're the same guy who wrote the awk book I've been going through.

Your work is excellent! Thank you, I’ll buy a copy soon.


This is amazing! I guess I was searching the wrong keywords. Thanks, I'll evaluate these.

