Hacker News

my experience with just copying and pasting things from ghidra into LLMs and asking it to figure it out wasn't so successful. it'd be cool to have benchmarks for this stuff though.


I actually have only tried this once but had the opposite experience. Gave it 5 or so related functions from a ps2 game and it correctly inferred they were related to graphics code, properly typing and naming the parameters. I’m sure this sort of thing is extremely hit or miss though


Had the same experience. Took the janky decompilation from ghidra, and it was able to name parameters and functions. Even figured out the game based on a single name in a string. Based on my read of the labeled decompilation, it seemed largely correct. And definitely a lot faster than me.

Even if I weren’t to rely on it 100% it was definitely a great draft pass over the functions.


Most likely there was just a mangled symbol somewhere that it recognised from its training data.


Where is that coming from? The chances that some random PS2 game's code symbols are in the training data are infinitesimal. It's much more likely that it can understand code and rewrite it. Basically what LLMs have been capable of for years now.


Parent is speculating w/o any experience. LLMs can read hex, bytecode, base64, rot13, etc. I use LLMs to decompile bytecode all the time.


I've been thinking on how to build a benchmark for this stuff for a while, and don't have a good idea other than LLM-as-judge (which quickly gets messy). I guess there's a reason why current neural decompilation attempts are all evaluated on "seemingly meaningless" benchmarks like "can it recompile without syntax error" or "functional equivalence of recompilation" etc.


Hmm, specifically when it comes to reverse engineering, you have the best benchmark ever - you can check the original code, no?


that requires LLM as judge


no it doesn't, you just diff against the real source code. probably something more fuzzy/continuous than actual diff, but still
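For what it's worth, a "fuzzy diff" score is easy to sketch. This is a toy illustration in Python, not a serious metric; it just tokenizes on whitespace and uses difflib's ratio as a 0..1 similarity:

```python
import difflib

def fuzzy_source_similarity(decompiled: str, original: str) -> float:
    # Tokenize by whitespace so pure formatting differences matter less,
    # then use SequenceMatcher's ratio as a 0..1 similarity score.
    a = decompiled.split()
    b = original.split()
    return difflib.SequenceMatcher(None, a, b).ratio()
```

A real benchmark would probably want to compare at the AST level (or after normalizing identifiers), since a text-level ratio still punishes harmless renames.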


Besides functional equivalence, a significant part of the value in neural decompilation is the symbols (function names, variable names, struct definitions including member names) it recovers. So, if the LLM predicted "FindFirstFitContainer" for a function originally called "find_pool", is this correct? Wrong? 26.333% correct?
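One crude way to score that is token overlap after splitting camelCase/snake_case. A toy sketch under that assumption (real evaluations tend to use embedding similarity or human judgment instead):

```python
import re

def name_tokens(name: str) -> set:
    # Split an identifier on underscores and camelCase boundaries, lowercased:
    # "FindFirstFitContainer" -> {"find", "first", "fit", "container"}
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)
    return {p.lower() for p in parts}

def name_similarity(predicted: str, original: str) -> float:
    # Jaccard overlap of the token sets: 0.0 (disjoint) .. 1.0 (same tokens).
    a, b = name_tokens(predicted), name_tokens(original)
    return len(a & b) / len(a | b) if a | b else 1.0
```

Under this metric "FindFirstFitContainer" vs "find_pool" shares only "find", so it scores low even though the predicted name might be semantically reasonable, which is exactly the judgment problem.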


Proving that two pieces of code are equivalent sounds very hard (undecidable in general)




