Thanks for the pointer, dang. Is it just me, or is it disturbing that the original article's referenced page is gone, and now we have to go to the Wayback Machine to get a copy?
A bit off topic, but are there any BitTorrent/IPFS efforts to archive archive.org?
> At this point, do we need to use a JS-disabled browser to really get privacy on the web?
My thoughts are that we need a distinction between web pages (no JS), which are minimally interactive documents that are safe to view, and web apps (sites as they exist now), which require considerable trust to allow on your device. Of course, looking at the average person's installed app list indicates that we have a long way to go culturally with regards to establishing a good sense of digital hygiene, even for native software.
It doesn't help that web browsers aren't even trying to help users make the distinction. They have an ever-growing list of features and permissions that sites can take advantage of, with no attempt to coalesce anything into a manageable user interface. Instead, it takes a hundred clicks to fully trust or distrust a site/app.
More UI/UX distinction is needed, just like the green lock for security! The browser should indicate the level of privacy of the page. If the page uses no JS or any GPU-compromising features (CSS, I'm looking at you), then it gets a green icon. For every privacy/security-compromising feature you add, the icon turns yellow. Once it starts to ask for WebUSB or MIDI, it should be in some kind of Native Mode. It's largely a UI/UX issue for the major browser makers!
The problem is that there is a lot of grey area between pure document-style pages and full-on apps (take online shops for example) and even for the former category of pages a lot of UI niceties are only possible with scripting.
Any other tracking methods are way more obvious, and way harder for the advertising industry to implement. We shouldn't think in black and white here: the more difficult it is to track a user, the less likely it is to be implemented. It is okay if 30% of tracking sites disappear because the cost/value ratio doesn't work for them. We don't have to sit in silence and do nothing just because we can't have 100% privacy.
I do think there is a point here: any technical means to block tracking is going to be overrun by technical means to overcome the anti-tracking tech. There are simply too many dollars at stake for anything else to happen. If anti-tracking stops some players, that just means the industry will consolidate into a few large and well-resourced players.
While I am all in favor of continuing the technical battle against tracking, it’s time to recognize that the war will only be won with legislation.
It’s an interesting question: is it possible for JavaScript to be Turing-complete, able to read/write the DOM, and somehow prevent fingerprinting / tracking?
My gut says no, not possible.
Maybe we need a much lighter way to express logic for UI interactions. Declarative is nice, so maybe CSS grows?
But I don’t see how executing server-controlled JS could ever protect privacy.
I've always thought there should be a way to use the browser like a condom. It should obfuscate all the things that make a user uniquely identifiable. Mouse movement/clicks/typing cadence should be randomized and sanitized a bit. And no website should have any authority whatsoever to identify your extensions or other tabs, or even whether or not your tab is open. And it certainly shouldn't allow a website to overrule your right click functionality, or zoom, or other accessibility features.
I don't know what it is called, but if you try to open a window from a setTimeout it won't work. The user has to click on something; the click event then grants the permission.
You could make something similar where fingerprint-worthy information can't be posted or used to build a URL. For example, you read the screen size, then add it to an array. The array is "poisoned" and can't be posted anymore. If you use the screen size for anything, those things and everything affected may stay readable but are poisoned too. New fingerprinting methods can be added as they are found. Complex calculations and downloads might temporarily make time into a sensitive value too.
In the old days, something similar to what you're calling "poisoned" was called "tainted" [0].
In those scenarios, tainted variables were ones which were read from untrusted sources, so could cause unexpected behaviour if made part of SQL strings, shell commands, or used to assemble html pages for users. Taint checking was a way of preventing potentially dangerous variables being sent to vulnerable places.
In your scenario, poisoned variables function similarly, but with "untrusted" and "vulnerable" being replaced with "secret" and "public" respectively. Variables read from privacy-compromising sources (e.g. screen size) become poisoned, and poisoned values can't be written to public locations like urls.
There's still some potential to leak information without using the poisoned variables directly, based on conditional behaviour - some variation on
if poisoned_screenwidth < poisoned_screenheight then load(mobile_css) else load(desktop_css)
is sufficient to leak some info about poisoned variables, without specifically building URLs with the information included.
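The taint-propagation idea above can be sketched in userland JavaScript. This is purely a hypothetical illustration: the class and function names are invented, and a real implementation would have to be enforced inside the JS engine so scripts couldn't simply unwrap the values.

```javascript
// Hypothetical sketch of taint tracking; a real version would live in the engine.
class Tainted {
  constructor(value) { this.value = value; }
  // Any computation on a tainted value yields another tainted value.
  map(fn) { return new Tainted(fn(this.value)); }
}

// Privacy-sensitive reads return tainted wrappers
// (a constant stands in for window.screen.width here).
function readScreenWidth() { return new Tainted(1920); }

// Network sinks refuse tainted data instead of exfiltrating it.
function buildUrl(base, param) {
  if (param instanceof Tainted) throw new Error("tainted value used in a URL");
  return `${base}?v=${param}`;
}

const w = readScreenWidth().map(px => px / 2); // derived value: still tainted
let blocked = false;
try { buildUrl("https://tracker.example", w); } catch (e) { blocked = true; }
console.log(blocked); // true: the derived value never reaches a URL

// Caveat from the thread: branching on a tainted comparison can still leak
// one bit per branch (e.g. choosing mobile vs desktop CSS).
```

Direct flows get blocked, but as the comment above notes, conditional behaviour remains a side channel that pure value-tainting can't close.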
Just create a _strict_ content security profile which doesn't allow any external requests (fetch) and only allows loading resources (CSS, images, whatever) from a predefined manifest.
An app cannot exfiltrate any data in that case.
You may add permission mechanisms of course (local disk, some cloud under user control, etc.).
That's a big challenge in standards, and I'm not sure anyone is working on such a strongly restricted profile for web/JS.
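Today's CSP can already approximate part of this (the directive values below are an illustration of the idea, not the proposed manifest-based profile, which goes beyond what CSP currently expresses):

```
Content-Security-Policy: default-src 'none';
                         script-src 'self';
                         style-src 'self';
                         img-src 'self';
                         connect-src 'none';
                         form-action 'none'
```

With `connect-src 'none'` and `form-action 'none'`, scripts can run but have no fetch/XHR/WebSocket or form-submission channel to phone home, and `img-src 'self'` cuts off image-URL beacons. What CSP can't yet say is "only resources listed in a signed manifest", which is the stronger guarantee the comment proposes.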
> It’s an interesting question: is it possible for JavaScript to be Turing-complete, able to read/write the DOM, and somehow prevent fingerprinting / tracking?
Yes, of course: restrict its network access. If JS can't phone home, it can't track you. This obviously lets you continue to write apps that play in a DOM sandbox (such as games) without network access.
You could also have an API whereby users can allow the JS application to connect to a server of the user's choosing. If that API works similarly to an open/save dialog (controlled entirely by the browser) then the app developer has no control over which servers the user connects to, thus cannot track the user unless they deliberately choose to connect to the developer's server.
This is of course how desktop apps worked back in the day. An FTP client couldn't track you. You could connect to whatever FTP server you wanted to. Only the server you chose to connect to has any ability to log your activity.
There's no point. If you disable JS, they can track you in other ways: fingerprint your DNS packets via timestamp clock skew and other things. With IPv6 they can assign you a unique IP address for a DNS lookup that can function like a cookie.
Don't want to be tracked? Don't go on the internet.
Websites can't fingerprint my dns packets by their clock skew, nor can they assign me a unique IP address for a dns lookup (what?). "Don't go on the internet" isn't a great starting point to improve things.
I used to fingerprint TCP packets when I built a large neobank. You could easily tell if someone was behind a proxy, falsifying their user agent, and more via SYN numbers. We used it to detect bots, but it could easily be used to fingerprint individual users. The DNS trick is already used for DNS-based CDNs; you can just keep refining it down to more specificity, a CDN edge for each individual user.
Why does it have to be a technological solution? That's what the media industry tried to do with DRM and it failed. The solution is legislation. We need the equivalent of DMCA for our privacy. Make it illegal to fingerprint.
I’m completely unsold on legislation. Another headline that recently hit the top of HN is about how Apple flagrantly ignored a court order. The judge has recommended the case for criminal contempt prosecution [1].
The comments on the story are completely unconvinced that anyone at Apple will ever be convicted. Any fines for the company are almost guaranteed to be a slap on the wrist since they stand to lose more money by complying with the law.
I think the same could be said about anti-cookie/anti-tracking legislation. This is an industry with trillions of dollars at stake. Who is going to levy the trillions of dollars in fines to rein it in? No one.
With a technological solution at least users stand a chance. A 3rd party browser like Ladybird could implement it. Or even a browser extension with the right APIs. Technology empowers users. Legislation is the tool of those already in power.
> The solution is legislation. We need the equivalent of DMCA for our privacy
And how does one know their privacy has been invaded? How would the user know to invoke a DMCA-equivalent privacy law?
I think the solution has to be technological. Just like encryption, we need some sort of standard to ensure all browsers are identical and unidentifiable (unless the user _chooses_ to be identified - like logging in). Tor-browser is on the right track.
Just tried this with Brave and it didn't seem to work, assuming the site working means that it can remember me in an incognito browser. I gave the site a name, and then opened it in incognito (still using brave), and it acts as if I visited the site for the first time.
For me it had the opposite effect of what was intended:
I opened the website in a non-anonymous Safari session: it asked my name. Then I opened another new non-anonymous window in the same browser: it showed my name as expected. I then opened the same browser in incognito mode: it asked my name again. I then opened Chrome (non-anonymous) and again it asked my name.
Exactly what I expected to see; everything seems to be working as intended. Anonymization online seems to be working perfectly fine.
They can track you just fine via CSS and countless other ways. They'll even fingerprint the subtle intricacies of your network stack.
What we need to do is turn the hoarding of personal information into a literal crime. They should be scrambling to forget all about us the second our business with them is concluded, not compiling dossiers on us as though they were clandestine intelligence agencies.
They run arbitrary code from sketchy servers called "websites" on people's hardware with way too many privileges. Meanwhile, free and open source standalone applications exist that use only minimal JS to access the same web resources, with a much better user experience: no trackers, no ads, no third parties.
I want a browser to be able to run arbitrary code. That's the whole point. I want to play a game or use a complex application in the browser without having to install anything.
I don’t mean to sound glib. But people derive a ton of utility from the web as it stands today. If they were asked if they supported the removal of web browsers they would absolutely say no. The privacy costs are worth the gains. If you want change you have to tackle that perception.
I've tried this recently and found it very difficult: Cloudflare bot protection is everywhere, plus other anti-scrape protections, many 'document' sites using JS to render with no fallback, basic forms requiring JS, authentication requiring JS, payments requiring JS, etc.
Not intending to sound snarky but do you just not use the web much? Or if you're adding allows all the time, what's the net gain?
I use the web fairly constantly, and yeah, if I'm visiting a new site and want to see the content there's a 50/50 chance I have to press a button in NoScript (like 2-3 clicks). But once you set up your initial allow list (usually takes me about a week), you'd be surprised how few net-new properties you allow in a week: maybe 100 or fewer?
I also set temporary permissions for any site I don't think I will be spending a lot of time on, because they might change what's running and I don't have any trust in, or insight into, their process. So I might authorize a site 3-4x a year before I say it can stay.
Unmodified request headers contain enough information for tracking even if JS is disabled. If you're keen to modify HTTP headers while browsing, then you could also modify any JS run on your system that snoops system information (or strip the info from any request sent to the server) and continue with JS enabled.
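The header-side half of that could look something like the sketch below, e.g. inside a filtering proxy or a browser extension. The function name and the list of risky headers are illustrative, not exhaustive.

```javascript
// Sketch of header minimisation: drop or normalise headers that feed a
// fingerprint before a request leaves the machine.
function stripIdentifyingHeaders(headers) {
  const risky = ["user-agent", "accept-language", "dnt", "referer"];
  const clean = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!risky.includes(name.toLowerCase())) clean[name] = value;
  }
  // Replace rather than remove the User-Agent: a missing header is itself
  // an unusual, fingerprintable signal.
  clean["user-agent"] = "Mozilla/5.0 (generic)";
  return clean;
}

console.log(stripIdentifyingHeaders({
  Host: "example.com",
  "User-Agent": "MyBrowser/1.2 (Linux x86_64)",
  "Accept-Language": "de-DE,de;q=0.9",
}));
```

The catch, as noted in the thread: making everyone's headers identical only helps if many people do it, otherwise the sanitised profile is itself distinctive.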
IMO this service should straight up be made illegal. I love the tagline they have of supposedly "stopping fraud" or "bots", when it's obvious it's just privacy invasive BS that straight up shouldn't exist, least of all as an actual company with customers.
I have almost no hope that this is a matter that has a technical solution.
The GDPR shows that law - even if not global, and even if not widely enforced - is pretty good at getting people to act. And most importantly, it will make the largest players the most afraid as they have the most to lose. And if just a handful of the largest players online are looking after peoples privacy then that is a huge win for privacy.
Doing what this demo shows is clearly a violation of the GDPR, if it works the way I assume it does (via fingerprints stored server-side).
>- Instead, product was built the old-fashioned way - by talking to customers; quite often, customers would reach out to us! "Please build time-saving feature x", "support new medical procedure y", "help us publish more research by analyzing z".
I think that might be a plus. PM/PO culture has ruined the industry. This way at least one has a direct connection to the customers, which is something I can't say about large companies.
Hmm, I think we ought to judge on a case-by-case basis. However, for megacorps and especially banks that have access to capital at an almost 0-1% cost, vs. the rest of us at 20-30% (credit cards, loan sharks), there should be a different license. There should be a GPL-type license adjusted to the cost of capital.
There should not be any difference between small and large entities in how you deal with them as an open-source maintainer. Just because someone has more money (or less) should not automatically mean you treat them with more or less leniency.
You set up your standard, and stick to it whoever comes.
Companies are never just money. There is a monumental difference between:
1. A small company which is barely profitable but is building something which aligns with your values and you see as a positive to the world.
2. A massive mega corporation whose only purpose is profit, mistreats employees, and you view as highly unethical.
You shouldn’t treat those the same way. It’s perfectly ethical to offer your work for free to the first one (helping them succeed in creating a better world) and to charge up the wazoo for (or better yet, refuse to engage in any way with) the second one.
A company is not a person, and can literally have its entire staff changed in short order. Or be bought.
Companies have no morals. Sometimes people in companies do, but again, that person can vanish instantly.
You should treat a company as a person which may receive a brain transplant at any time. Most especially, when writing contracts or having any expectation of what that company will do.
A business that is privately owned, is run by its founders, and represents the lion's share of its officers' income and net worth can be dealt with like any other small business.
Some guy who makes bespoke firmware for industrial microcontrollers or very niche audio encoding software isn't Microsoft. You won't be able to do business with him in a useful way if you treat him like Microsoft.
There exist companies which have taken VC money, and others which haven’t. We’ve carved out one exception, but this doesn’t indicate that small personally-run companies can’t exist, right?
The key is contract. Casual chat with a corporate representative who isn’t selling you something about something you own requires some sort of contractual relationship and consideration.
If you want to be extreme, don't distribute it to them in the first place. Licenses do not come into effect until after distribution. So you could have a pay-to-download model that comes with a 100% discount if you're a lone developer or an organization under X amount of revenue. You wouldn't be able to stop someone redistributing it after the fact, but at least you're not engaging.
Unfortunately now that everything is based on automated pipelines, something that doesn't integrate well is not so good.
Although at work we have a provider of proprietary software that has an APT repository where the URL includes a secret token, so they can track from where it's being accessed.
Interacting with faceless entities with the power to buy multiple countries the same way you'd interact with some interested independent young person wanting to learn.
Interesting moral proposition, I doubt you'd get many followers. I think it's perfectly reasonable to treat people differently from corporations, and random small and medium corporations differently than huge megacorps without losing any sleep.
Especially in business, charging more to those who can pay more is a very common approach.
No, it's also because some consumers can't pay the "original" price. Steam in "developing" countries is a classic example — you as a game developer can ask a guy from my country $60 for a game (and some companies do try that), but he will simply go back to torrent trackers because $60 is a week's worth of living expenses.
gaben figured that out and successfully expanded into many markets that were considered basket cases for software licensing.
That's a really silly precommitment. If you were sensible, your actual commitment should be "help the next person who requires help, provided that help can be provided in the form of one dollar".
That's why the premise in the grandparent post is ridiculous.
But the license of a piece of software is not ridiculous: if you chose a very permissive license, you cannot then go and choose who should or shouldn't be profiting off your software. The license was a pre-commitment.
Yet lots of people make this pre-commitment and then make a moral/ethical judgement post facto when someone rich seems able to extract more value out of the software than what "they deserve", and complain about it.
"Permissive" licenses, in fields where abusive corporations are known to operate, are a really silly precommitment. Copyleft exists for a reason. But, even if you (foolishly) made that precommitment, that doesn't then mean you have to do free labour for the abusive corporations, out of some misguided ideological consistency. (Such consistency is the hobgoblin of little minds.)
I mean, the MIT license might be a “more permissive” license but it says very explicit things that Microsoft is explicitly ignoring. Your license choice doesn’t matter when they ignore the license anyway.
If a guy comes begging for money out of a Rolls-Royce, I guess they're either pretty bad at begging or have a pretty bad sense of humor. I wouldn't give money to them; it doesn't seem like it would help them regardless.
> You set up your standard, and stick to it whoever comes.
Why? Most businesses don't entertain standard rates, either. It's case-by-case negotiations ("call us", "request quote"). Why should I, as a private person putting stuff out there for free, set up "my standard" and stick to it?
Clearly you have yet to experience some of the less savoury behaviours of megacorp sharks. You're looking at people trying to make a name for themselves internally, and if that means being economical with attributions, it is the least they would do for their place in the California sun.
So what is the real comparison against DeepSeek R1? Would be good to know which is actually more cost-efficient and open (reproducible build) to run locally.
>I'm also interested in their applications for journalism, specifically for dealing with extremely sensitive data like leaked information from confidential sources.
I think it is NOT just you. Most companies with decent management also would not want their data going anywhere outside the physical servers they control. But yeah, most people just use an app and a hosted server. This is HN, though; there are people here hosting their own email servers, so it shouldn't be too hard to run an LLM locally.
Yeah, this has been confusing me a bit. I'm not complaining by ANY means, but why does it suddenly feel like everyone cares about data privacy in LLM contexts, way more than previous attitudes to allowing data to sit on a bunch of random SaaS products?
I assume it's because people assume the AI companies will train on your data, causing it to leak? But I thought all these services had enterprise tiers where they promise not to do that?
Again, I'm not complaining, it's good to see people caring about where their data goes. Just interesting that they care now, but not before. (In some ways LLMs should be one of the safer services, since they don't even really need to store any data, they can delete it after the query or conversation is over.)
Laundering of data through training makes it a more complicated case than a simple data theft or copyright infringement.
Leaks could be accidental, e.g. due to an employee logging in to their free-as-in-labor personal account instead of a no-training Enterprise account.
It's safer to have a complete ban on providers that may collect data for training.
Their entire business model is based on taking other people's stuff. I can't imagine someone would willingly drown with the sinking ship when the entire cargo hold is filled with lifeboats, just because they promised they would.
Being caught doing that would be wildly harmful to their business - billions of dollars harmful, especially given the contracts they sign with their customers. The brand damage would be unimaginably expensive too.
There is no world in which training on customer data without permission would be worth it for AWS.
One single random document, maybe, but as an aggregate? I understood some parties were trying to scrape indiscriminately, the "big data" way. And if some of that input is sensitive and is stored somewhere in the NN, it may come out in an output, in theory...
Actually, I never researched the details of the potential phenomenon (that anything personal may be stored, not just George III but Random Randy), but it seems possible.
There's a pretty common misconception that training LLMs is about loading in as much data as possible no matter the source.
That might have been true a few years ago but today the top AI labs are all focusing on quality: they're trying to find the best possible sources of high quality tokens, not randomly dumping in anything they can obtain.
> Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it's not even clear how prior LLMs learn anything at all.
Obviously the training data should preferably be high quality, but there you have a (pseudo-)problem with "copyright" (pseudo-, as I insisted elsewhere, citing the right to have read whatever is in any public library).
If there exists some advantage in quantity though, then achieving high quality raises questions about tradeoffs and workflows: sources where authors are "free participants" could have odd data seep in.
And whether such data may be reflected in outputs remains an open question (probably tackled by some work I have not read... Ars longa, vita brevis).
In Scandinavia, financial-related servers must be in the country! That always sounded like a sane approach. The whole putting-your-data-on-SaaS-or-AWS thing just seems like the same "let's shift the responsibility to a big player".
Any important data should NOT be on devices that are NOT physically within our jurisdiction.
Or GitHub. I’m always amused when people don’t want to send fractions of their code to an LLM but happily host it on GitHub. All big LLM providers offer no-training-on-your-data business plans.
It's unlikely they think Microsoft or GitHub wants to steal it.
With LLMs, they're thinking of examples that regurgitated proprietary code, and contrary to everyday general observation, valuable proprietary code does exist.
But with GitHub, the thinking is generally the opposite: the worry is that the code is terrible, and seeing it would be like giant blinkenlights indicating the way in.
That's why AWS Bedrock, Google Vertex AI, and Azure AI model inference exist: they're all hosted LLM services that offer the same compliance guarantees you get from regular AWS-style hosting agreements.
AWS has a strong track record, a clear business model that isn’t predicated on gathering as much data as possible, and an awful lot to lose if they break their promises.
Lots of AI companies have some of these, but not to the same extent.
> "Most company with decent management also would not want their data going to anything outside the physical server they have in control of."
Most companies' physical and digital security controls are so much worse than anything from AWS or Google. Note I don't include Azure... but "a physical server they have control of" is a phrase that screams vulnerability.