
> the repo is too far off the data distribution

ah, this explains why these models have been useless to me this whole time. everything i do is just too far off the data distribution!


$ find web -type f \( -name '*.go' -o -name '*.tsx' \) | tar -cf code.tar -T -; cat code.tar | pbcopy

Then I paste it in and say "can you spot any bugs in the API usage? Write out a list of tasks for a senior engineer to get the codebase in basically perfect shape," or something along those lines.

Alternately: "write a go module to support X feature, and implement the react typescript UI side as well. Use the existing styles in the tsx files you find; follow these coding guidelines, etc. etc."


100% agreed. I've been developing deep neural networks for over 10 years and this is just surreal.

On the bright side, one source of "sanity" that I'm finding is to review a collection of daily "hot" publications in AI/ML curated here: https://huggingface.co/papers


"Query expansion"[0] has been an information retrieval technique for a while, but using LLMs to help with query expansion is fairly new and promising, e.g. "Query Expansion by Prompting Large Language Models"[1], and "Query2doc: Query Expansion with Large Language Models"[2].

[0] https://en.wikipedia.org/wiki/Query_expansion

[1] https://arxiv.org/abs/2305.03653

[2] https://arxiv.org/abs/2303.07678
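For a rough sketch of what this looks like in the spirit of Query2doc [2] (the generate() call, prompt wording and example query below are placeholders, not the papers' actual prompts or models): have the LLM write a short pseudo-document for the query, then append it to the original query before handing it to an ordinary BM25/keyword retriever.

    def generate(prompt):
        """Placeholder for an LLM completion call; canned output here."""
        return ("Conical and flat burr grinders differ mainly in grind "
                "consistency, retention and noise; both beat blade grinders.")

    def expand_query(query, repeats=3):
        pseudo_doc = generate(f"Write a short passage answering the query: {query}")
        # Repeat the original query so it isn't drowned out by the much
        # longer generated passage, then append the passage itself.
        return " ".join([query] * repeats + [pseudo_doc])

    print(expand_query("best burr coffee grinder"))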


Optimizer person here, who has implemented each of these at least 20 times ;-)

The tradeoffs and engineering are quite complex.

The short answer is that the algorithms themselves are not the hard part. Compilers do not get better by magic algorithms most of the time, they get better by careful and hard tuning and testing (where are optimizations missed, what performance is lost or gained somewhere, etc).

They get better 0.01% at a time over 20 years. There are no magic bullets, only really hard work.

Those that try to implement these without LLVM discover it themselves, and either put in the work, or give up and realize it's better to reuse the work.

I mean, don't get me wrong - we spend plenty of time reducing complexity of algorithms, etc. The difference between a textbook algorithm and one implemented in a production compiler is often the difference between a Fisher-Price cell phone and an iPhone.

That isn't always true mind you (sparse constant prop is pretty simple in both cases), but it's often true.

But this is applied engineering.

For example, the SSA construction algorithm in LLVM is based on Sreedhar and Gao's linear-time algorithm. The paper describes a mechanism that requires construction of separate data structures, is somewhat complex, etc. If you were to implement it straight out of the paper, it would be pretty slow - much slower than other mechanisms.

LLVM's version is simple, 200 lines of code, and faster than just about any other algorithm you will find on both small and large functions. Oh, it also handles liveness pruning and works to compute both forward and reverse iterated dominance frontiers.

See https://llvm.org/doxygen/GenericIteratedDominanceFrontier_8h...
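For a sense of what that is improving on, here is roughly what the textbook baseline looks like: compute dominance frontiers from the dominator tree, then iterate them with a worklist. This is a sketch of the classic Cooper/Harvey/Kennedy-style approach, not LLVM's pruned Sreedhar-Gao calculator; the toy CFG, names and inputs are made up.

    # Textbook iterated dominance frontiers, on a toy CFG.
    # The dominator tree (idom) is assumed to be computed already.

    def dominance_frontiers(preds, idom):
        """preds: block -> list of predecessors; idom: block -> immediate dominator."""
        df = {b: set() for b in preds}
        for b, ps in preds.items():
            if len(ps) < 2:
                continue  # only join points can appear in a dominance frontier
            for p in ps:
                runner = p
                while runner != idom[b]:   # walk up the dominator tree
                    df[runner].add(b)
                    runner = idom[runner]
        return df

    def iterated_df(df, blocks):
        """Iterate DF over `blocks` (e.g. the blocks defining a variable)."""
        result, work = set(), list(blocks)
        while work:
            b = work.pop()
            for d in df[b]:
                if d not in result:
                    result.add(d)
                    work.append(d)   # frontiers of frontiers count too
        return result

    # entry -> {a, b} -> join -> exit
    preds = {'entry': [], 'a': ['entry'], 'b': ['entry'], 'join': ['a', 'b'], 'exit': ['join']}
    idom  = {'entry': 'entry', 'a': 'entry', 'b': 'entry', 'join': 'entry', 'exit': 'join'}
    print(iterated_df(dominance_frontiers(preds, idom), {'a'}))  # {'join'}: where a phi would go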

Could language authors spend their time understanding the theory well enough to do this, reducing complexity, and engineering something that works as well? Sure, it's just software.

Is it a good use of their time? Probably not.

This did not come out of thin air like this either. It's based on 10+ years of people improving pieces of it, reducing complexity, reusing it elsewhere/etc. It's easy to look at it as having come this way fully formed, but it didn't ;) (in this particular case, even the in-llvm code history does not do it justice).

Someday, I hope LLVM is not really necessary, whether it's because we can run the sorts of complex/combined algorithms and not worry about it, or because AI is good enough at approximating optimizing pipelines or whatever.

But right now? If you want to compete on performance for real, you'd be hard pressed to do it.


Hi folks. Nice to see our new free (and ad-free) course here on HN! This course is for folks who are already comfortable training neural nets and understand the basic ideas of SGD, cross-entropy, embeddings, etc. It will help you both understand these foundations more deeply, since you'll be creating everything from scratch (i.e. from only Python and its standard library), and understand modern generative modeling approaches.
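(A toy taste of what "from scratch, i.e. only Python and its standard library" can look like - this is not course code, just a made-up example of fitting a single weight with hand-rolled SGD.)

    import random

    # fit y = 3x from noisy samples with hand-rolled SGD: no numpy, no framework
    data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in range(20)]

    w, lr = 0.0, 0.001
    for epoch in range(100):
        random.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad

    print(w)  # ends up close to 3.0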

We study and implement a lot of papers, including many that came out during the course, which is a great way to get practice and get comfortable with reading deep learning literature.

If you need to get up to speed on the foundations first, you should start with part 1 of the course, which is here: https://course.fast.ai

If you've got any questions about the course, generative modeling, or deep learning in general, feel free to ask!


> A semester long class is about 45 hours of learning.

Assuming your semester is twelve weeks long (as is the case at my university), that's less than 4h/week. I'm guessing you're only counting lectures as learning. If you only go to lectures for learning and don't do any kind of work on your own, no tutorials, no office hours, no revising for exams, no practice exercises, no homework, no discussions with your peers, nothing... yeah, you'll probably feel like your first two weeks on the job are a crash course. But I'd say you failed to take advantage of learning at a university to its fullest.


Another UChicago student here (former math major). I think the UChicago math department took education seriously in a unique way that I wish they got more credit for; the Inquiry-Based Learning classes, the summer classes for Chicago public school teachers, and the Research Experience for Undergraduates were all pretty special, I think.

Also, the math classes all had a shared policy of "you can work with anyone you like on a problem set, as long as everyone's name is listed on there when you hand it in." I loved that, and I did all my problem sets in groups while I was there, and I think I learned more because we were always explaining things to each other and arguing about solutions. I heard a rumor at one point that this policy was the direct product of Peter May's bad experience at Princeton, which required students to work alone when he was there.

TFA does seem like a pretty stinging indictment of Princeton's math department.


The first time I started thinking about it that way was when I bought a book featuring a selection of Banksy's works [0]. In the opening pages it had a quote from him, making pretty much that point:

"The public space is ours. It's not there for corporations. It's to be enjoyed by the people and if you want to come in with your hands full of money to make something else of it you can fuck right off."

He's a true artist.

[0] https://www.amazon.com/Wall-Piece-Banksy/dp/1844137872


There are various people speculating on the economic significance of this solution, which, to me, is rather missing the point. It's like measuring the significance of the excavation of Tutankhamen's† tomb by the tourist revenues of museums. The point of economics is that it keeps us alive so we can do math and also think in other ways; the point of consciousness is not to make money.

I don't think there is any economic significance. They found a closed-form formula, not a manufacturing process. They verified that it produces numerically correct results to 12 significant figures, but typical lens grinding is only accurate to about 100 nm; if your lens is 10 mm thick, that's an error in the 5th significant figure of any coordinate. Calculating a numerical solution to the Wasserman–Wolf problem to 5 significant figures is straightforward, and you could probably do it by hand if you didn't have a computer (although that would involve significant economic cost). In fact, it's not that hard to calculate it to 14 significant figures. The achievement is finding a closed-form solution rather than an iterative numerical approximation.

† Or Tutankhaten, as we used to call him.


It's remarkable how much this still rings true today.

The idea that we should try and understand life as a battle against the second law of thermodynamics has had a significant influence on many of today's great thinkers - such as Friston [1] and Dennett [2], as well as countless others.

[1] https://royalsocietypublishing.org/doi/full/10.1098/rsif.201...

[2] https://www.youtube.com/watch?v=iJ1YxR8qNpY


It seems many comments missed the point. The article is not about how bloated modern software is, how many useless features and programs are wasting CPU cycles on pointless jobs, etc. (Yes, modern software is bloated; for this reason I'm using the MATE desktop on a minimal Gentoo installation, but this is not what the article is about.)

It is describing how a web browser, a piece of software with extremely high inherent complexity, interacting with the memory allocator of the operating system, another piece of software with high inherent complexity, combined with a rarely used feature of Gmail, can trigger complex interactions and cause major problems due to hidden bugs in various places. This type of apparently "simple" lockup requires "the most qualified people to diagnose".

These problematic interactions cannot be avoided by running fewer "gadgets" on the desktop environment; they can be triggered and cause lockups even on a system that otherwise performs well. Installing a Linux desktop doesn't solve this type of problem (even though this specific bug doesn't exist there).

The questions worth discussing are: why/how does it happen? How can we make these problems easier to diagnose? What kind of programming language design can help? What kind of operating system/browser architecture can help? How can we manage complexity and the problems that come with it, and what are the implications for software engineering and parallel programming? And so on.

From another perspective, bloated software is also an on-topic question worth talking about. But instead of the talking points of "useless programs wasting CPU cycles" or "install minimal Debian", we can ask questions like "do _ALL_ modern software/browsers/OSes have to be as complex as this?", "what road has led us to this complexity?", "what encouraged people to make such decisions?", "can we return to simpler software designs, sometimes?" (e.g. a vending machine near my home, trivially implementable with BusyBox or even a microcontroller, now runs a full Windows 7 or Ubuntu desktop! Even the advertising screens use Windows 8, and sometimes BSoD, even though all they need to do is show a picture. Same goes for modern personal computers.), or even "is Web 2.0 a mistake?" (which is why we are here on Hacker News, one of the fastest websites in the world!). These topics are also interesting to talk about.


Also, we should praise companies that actually encourage and support repairing behavior.

Case in point: Baratza coffee grinders. They have established repairability as one of the company's top priorities and part of their mission.[0]

They sell almost every part necessary for fixing their grinders.[1] They deliberately make them easy to disassemble and reassemble, and provide lots of instructions on how to fix most problems, both in print and on video.[2] They also have a program of buying back used grinders and reselling them refurbished. Whenever a model is upgraded they also sell the upgrade kit to owners of the old model.

[0] https://www.baratza.com/social-responsibility/

[1] https://www.baratza.com/product-category/parts/

[2] https://www.baratza.com/troubleshooting/


This question is not just worth understanding in itself; it is essential to ask at many levels, as science education is becoming key to the survival of humanity. As a Hungarian who left the country 20 years ago I have been fascinated by this surprising success of the country, partly because I did not personally get a good science education there. Actually, it was very unbalanced, with most of it barely mediocre, while some of it was absolutely brilliant. The successful people came from 2-3 high schools (or "gymnasiums", for ages 12-18) in Budapest at the beginning of the 20th century, and whatever success Hungary had in the natural sciences was mostly a result of the work and heritage of that generation. There is a very good description of this in Norman MacRae's book on John von Neumann, worth reading.

A quick summary of the "reasons":

1. Boom: at that time (from 1867 till WWI) Budapest was booming, more attractive to immigrants than New York. Actually, most of the city you can see today was built in those times.

2. Culture: as a result of the boom and the wave of immigrants it became a very liberal and open minded place (though this did not apply to the feudal class at all - their kids typically became soldiers or playboys)

3. Motivation: in a feudal society, studying and intellectual eminence were the way to go unless you were born an aristocrat. Parents, students and teachers were willing to put in the time and money necessary to make their kids excel. This may not sound unusual with all those helicopter parents you see nowadays, but actually this was huge. Imagine growing up in a family where you knew - and your whole family knew - that your only chance of making it is to be the best at math your abilities allow.

4. Education: I don't know where to start, so I can only give examples. Imagine you are a 12 year old child and your teacher lends you his favourite papers on quantum physics and asks your opinion on them. Then you give a smart comment and your teacher contacts the relevant professor at the university to have tea with you at your house next Sunday.

5. Language: most of these kids learned Latin and Greek, and before their teenage years also spoke at least German fluently.

6. The "marble table": Stanislaw Ulam (in his autobiography, another amazing read) and also MacRae tells about the most important ingredient, the marble table at the café (the easy-to-erase whiteboard of the time). In Central Europe mathematicians met at cafeterias, discussed all day (often meaning 12 hour days at a cafe!), challenged each other, and did rarely work in isolation, not worrying too much about "who thought of it first". They happily took young kids in, 15 year olds sipping juice and 50 year old Banach drinking something much stronger (may not have been Banach, but you'll read the book!). It was such a well-known "way of doing math" that the IAS in Princeton was officially established to re-create this culture and pull the typical American professors out of their ivory towers.

It is an elitist system, I know, and it does not solve mass education challenges. But this small elite circle had an impact on almost everyone in the country's education system. Even if you weren't a Wigner, Teller or Neumann, you spent 6 years in this environment and possibly became a great teacher yourself, teaching in the same fashion as the man who taught half of these people, the great László Rátz [1].

Also, a similarly great science education happened in Japan at some point (the 50's, I think), but I only read this as a side note in the book on Neumann. Anyway, it is possible to do this again, and with two small kids I'm very interested to know how.

[1] https://en.wikipedia.org/wiki/L%C3%A1szl%C3%B3_R%C3%A1tz


> What would be your ideal non-introsive ads mechanism?

I am not the OP, but the answer to this question is very close to my heart.

The pinnacle of non-intrusive online ads has been the original Google search ads. They were out of the way, clearly marked as ads - and hence could be visually filtered out. They were pure text, so they could be neatly included as elements on the rendered page. And they were always targeting an INTEREST. Not an individual.

So I would take that as the minimum acceptable advertising behaviour. Not implying that it's perfect, but it's a clear set of ground rules. With that in mind, _my_ ideal, non-intrusive ads mechanism builds on the following rules:

* Ads must never be inline to page content

* Even when clearly out of the way, ads must not be allowed to mimic page content; they must be clearly marked as ads

* Text only.

* I might accept an image within the ad, provided it was always served from the content provider's system.

* As an extension to the previous point: if the served image size would exceed a notable fraction of the page size, it must not be included in the output.

* No user tracking of any kind.

* No third-party javascript. Ever.

* At most 15% of display real estate allowed to be used by ads. Including the padding in the UI. (Counts as space denied from content.)

* Not allowed to affect page content load times. Ad material must be included at the end of the page code. If your service pushes ads from an internal, separate system, hard timeouts must be imposed: if the internal system cannot serve an ad within the allotted time, the frontend must never be forced to wait. You just missed an ad impression. Tough. (See the sketch after these rules.)

* If clicking an ad takes the user through a bounce page, all identifiable information about the user must be stripped. The bounce page or redirect must not impose any further page loading delay.

* No beacons

Breaking even one of the rules automatically disqualifies you.
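To make the load-time rule above concrete, here is a minimal sketch of the hard-timeout idea (the 50 ms budget, the fetch_ad stand-in and the HTML snippets are all made up): the ad fetch gets a fixed deadline, and on a miss the page simply ships without the ad.

    import time
    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    AD_DEADLINE = 0.05  # seconds; a made-up hard budget

    def fetch_ad(slot):
        """Stand-in for the internal ad system, having a slow day."""
        time.sleep(0.5)
        return '<aside class="ad">clearly-marked text ad</aside>'

    def ad_or_nothing(pool, slot):
        future = pool.submit(fetch_ad, slot)
        try:
            return future.result(timeout=AD_DEADLINE)
        except TimeoutError:
            return ""  # missed impression, tough: the page never waits

    with ThreadPoolExecutor(max_workers=2) as pool:
        page = "<main>the actual content, rendered immediately</main>"
        print(page + ad_or_nothing(pool, "sidebar"))  # here: just the content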

If you, as an advertiser, find these rules unacceptable - well, then we are in mutual disagreement. I find your ads equally unacceptable and treat them as a form of cancer.

However, as a genuine service to the user... please allow the user to search for ads that have been displayed to them. Preferably by display context. I would be glad to return to a subject at a later date and search for something I remember seeing earlier.

The above is still not ideal, but everything that behaved according to it would at least be palatable.

