
I wish we had a constitutional amendment that open-sourced all commercial AI models and required documentation of, and links to, all training data and base prompts.

They are trained on public data at our expense so We The People should *own* them.

Someday, probably sooner than we might think, we'll easily run mega-huge models on our laptops, desktops, and phones. AI should be free. Right now it's overhyped and overpriced. I would love this setup for privacy and security.

Anyways, this is only tangentially related... (why worry about leaks like this and the hidden base prompts? They *should all be 100% OSS*; it is the only way to ensure privacy and security).

Also, long-time lurker, first time posting!

I just had to get this off my mind! Cheers.



What you are describing is more-or-less a planned economy, the polar opposite of America's market economy. The government has the power to appropriate things for the common good because it's perceived that private enterprise isn't a necessary force. Sometimes it works, sometimes it doesn't; only certain countries can "moneyball" their way through economics like that, though. America has long since passed the point of even trying.

Your heart is in the right place here (I agree about FOSS), but there is a snowball's chance in hell that any of this ever happens in the USA. We'll be lucky if AI doesn't resemble cable TV by 2030.


There's nothing new about being able to copyright something that's a transformation of another work. And they definitely aren't exclusively trained on public data.


> There's nothing new about being able to copyright something that's a transformation of another work

There is something novel here.

Google Books created a huge online index of books, OCRing, compressing, and transforming them. That was copyright infringement.

Just because I download a bunch of copyrighted files and run `tar c | gzip` over them does not mean I have new copyright.
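The `tar c | gzip` point can be made concrete: a compressed blob is unrecognizable as bytes, yet it encodes exactly the same content as the input, with nothing new added (a trivial illustration, not anything from the thread):

```python
import gzip

# Stand-in for some copyrighted input (hypothetical placeholder text).
original = b"Some copyrighted text, verbatim." * 100

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# The compressed bytes look nothing like the input, but they are a
# pure re-encoding of it: decompression reproduces it bit-for-bit.
assert restored == original
print(len(original), len(compressed))
```

The output blob being smaller and visually unrelated to the input is exactly why "it doesn't contain the original" is not, by itself, an argument that something new was created.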

Just because I download an image and convert it from png to jpg at 50% quality, throwing away about half the data, does not mean I have created new copyright.

AI models are giant lossy compression algorithms. They take text, tokenize it, and turn it into weights, and then inference is a weird form of decompression. See https://bellard.org/ts_zip/ for a logical extension to this.

I think this is why the claim that LLMs are unencumbered by copyright is novel. Until now, a human had to perform some creative transformation of a work; it could not simply be a computer algorithm that changed the format or compressed the input.


Google Books is not transformative. It shows you all the same data for the same purpose as they were published for.

A better example is Google Image Search. Thumbnails are transformative because they have a different purpose and aren't the same data. An LLM is much more transformative than a thumbnail.

It's lossier than even ordinary lossy compression because of the regularization term; I'm pretty sure you can train one that's guaranteed not to retain any of the pretraining text. Of course, then it can't answer things like "what's the second line of The Star-Spangled Banner".
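One way to see the regularization point: an L2 penalty (weight decay) makes the training objective explicitly trade goodness-of-fit against small weights, which pushes the model away from rote memorization of the training data. A toy sketch using plain NumPy ridge regression (my own illustration, not anything specific to how LLMs are trained):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=20)

def ridge_fit(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2 (closed-form solution).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_plain = ridge_fit(X, y, lam=0.0)   # fits training data as closely as possible
w_reg = ridge_fit(X, y, lam=10.0)    # regularized: penalized for large weights

# The regularized weights are uniformly smaller: the penalty term
# discourages fitting (memorizing) the training data exactly.
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```

The same trade-off appears, at vastly larger scale, in neural-network training with weight decay: the objective actively fights verbatim retention.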


Google Books is transformative. It's a decided case. And it's the same as Google Image Search, i.e., for search.

https://news.ycombinator.com/item?id=45489807


Well yeah now it is, otherwise it wouldn't exist. I don't think showing the entire book would be though.


Thumbnails are not transformative, they are fair use. They would be copyright infringement, except that a court case ruled them as fair use: https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com... .

The fact that compression is incredibly lossy does not change the fact that it's copyright infringement.

I have a lossy compression algorithm which simply outputs '0' or '1' depending on the parity of the bits of the input.

If I run that against a camcording of a Disney film, the result is a 0 copyrighted by Disney; in fact, posting that 0 in this comment would make this comment also illegal, so I must disclaim that I did not actually produce it from a camcorded Disney film.

If I run it against the book 'Dracula', the result is a 0 in the public domain.

The law does not understand bits, it does not understand compression or lossiness, it understands "humans can creatively transform things, algorithms cannot unless a human imbues creativity into it". It does not matter if your compressed output does not contain the original.
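The parity thought experiment above can be written down as an actual one-bit "lossy compressor" (a hypothetical sketch of the commenter's example):

```python
def parity_compress(data: bytes) -> str:
    """A maximally lossy 'compression algorithm': reduce any input
    to a single character giving the parity of its set bits."""
    ones = sum(bin(byte).count("1") for byte in data)
    return "1" if ones % 2 else "0"

# Completely different inputs can compress to the same output.
# Nothing of either original survives, yet each run is still just
# an algorithmic transformation of its input, with no human
# creativity added anywhere in the pipeline.
print(parity_compress(b"a camcorded film"))
print(parity_compress(b"the text of Dracula"))
```

The point of the example is that "how much of the original survives" and "was creativity involved" are independent questions, and copyright law cares about the latter.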


> The court held that framing and hyperlinking of original images for use in an image search engine constituted a fair use of Perfect 10's images because the use was highly transformative

?


You're missing something: whether or not it's copyright infringement depends on a) how much money you have and hence bribes you can give and b) whether you can say what you're doing is "to beat China".


Who exactly are you imagining is being bribed here?


> Google Books created a huge online index of books, OCRing, compressing them, and transforming them. That was copyright infringement.

No. It's a decided case: it's transformative and fair use. My understanding of why it's transformative is that Google Books mainly offers a search interface for books, and it also has measures to make sure only snippets of books are shown.


Unfortunately this is very unlikely in our foreseeable future, with the U.S. having a "U.S. against the world" mentality in the AI race. I would love to see this, but it would get shot down immediately.


> I wish we had a constitutional amendment that open-sourced all commercial AI models and required documentation of, and links to, all training data and base prompts.

> They are trained on public data at our expense so We The People should own them.

The people whose work appears to have been trained on for the interesting parts of the blog post are mostly, like me, not American.

> AI should be free. Overhyped and Overpriced. I would love this setup for privacy and security.

Also, this entire blog post only exists because they're curious about a specific free open-weights model.

The "source" being, roughly, "the internet", to which we have as much access as most of the model makers (i.e., where you don't, they've got explicit licensing rights anyway), and possibly also some explicitly* pirated content (I've not been keeping track of which model makers have or have not done that).

* as in: not just incidentally


> They are trained on public data

this is questionable, but okay...

> at our expense

?

> so We The People should own them.

in addition to training data, it is my understanding that a model's architecture also largely determines its efficacy. Why should we own the architecture?


Why would it require a constitutional amendment?


The takings clause of the fifth amendment allows seizure of private property for public use so long as it provides just compensation. So the necessary amendment already exists if they're willing to pay for it. Otherwise they'd need an amendment to circumvent the fifth amendment, to the extent the document is honored.


Are models necessarily IP?

If generative AI models' output can't be copyrighted and turned into private IP, who is to say the output of gradient descent and back-propagation can be copyrighted? Neither is the creative output of a human being; both are the products of automated, computed statistical processes.

Similarly, if AI companies want to come at dataset compilation and model training from a fair use angle, would it not be fair use to use the same models for similar purposes if models were obtained through eminent domain? Or through, like in Anthropic's training case, explicit piracy?


It doesn't make sense to me that whether the result of intellectual effort is property or not depends on the legal status of its output, whether its production involved automation, or if it involved statistical computation. These look like vague justifications to take something made by someone else because it has value to you, without compensation.


I'm looking at this through the lens of US copyright, where the Copyright Office determined that AI output isn't protected by copyright, and thus isn't private IP, as it isn't the creative output of a human being.

If the results of inference and generation can't be protected under copyright, as they aren't the creative output of a human being, why wouldn't the results of back-propagation and gradient descent follow the same logic?

This isn't about how we feel about it, it's a legal question.


But things like logarithmic table books existed in a world where the results of the calculations were not protectable as IP, no matter how much effort went into creating them.


I'd settle with them being held in a public trust for public benefit


Wouldn’t the same argument then be applied to all scraped data?



