Hacker News | tekacs's comments

жопа -> jopa (zhopa) for those who don't spot the joke

Not so much with Gemini 3 Pro (which came out a few days ago)... to the point that the loop detection that they built into gemini-cli (to fight that) almost always over-detects, thinking that Gemini 3 Pro is looping when it in fact isn't. Haven't had it fail at tool calls either.

Interesting, I run into loop detection with 2.5 Pro but haven't seen it kick in with 3 Pro. Maybe it's the type of tasks I throw at it though; I only use 3 at work, and that code base is much more mature and well defined than my random side projects.

Tried in V0, it always gets into an infinite loop

will give the CLI another shot


This is also super relevant for everyone who had ditched Claude Code due to limits:

> For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work.


I like that for this brief moment we actually have a competitive market working in favor of consumers. I ditched my Claude subscription in favor of Gemini just last week. It won't be great when we enter the cartel equilibrium.

Literally "cancelled" my Anthropic subscription this morning (meaning disabled renewal), annoyed at hitting Opus limits again. Going to enable billing again.

The neat thing is that Anthropic might be able to do this because they're massively moving their models to Google TPUs (Google just opened up third-party usage of the v7 Ironwood, and Anthropic planned on using a million TPUs), dramatically reducing their nvidia-tax spend.

Which is why I'm not bullish on nvidia. The days of it being able to get the outrageous margins it does are drawing to a close.


Anthropic are already running much of their workloads on Amazon Inferentia, so the nvidia tax was already somewhat circumvented.

AIUI everything relies on TSMC (Amazon and Google custom hardware included), so they're still having to pay to get a spot in the queue ahead of/close behind nvidia for manufacturing.


I was one of you two, too.

After a frustrating month on GPT Pro and half a month letting Gemini CLI run a mock in my file system, I’ve come back to Max x20.

I’ve been far more conscious of the context window. A lot less reliant on Opus. Using it mostly to plan or deeply understand a problem. And I only do so when context is low. With Opus planning I’ve been able to get Haiku to do all kinds of crazy things I didn’t think it was capable of.

I’m glad to see this update though, as Sonnet will often need multiple shots and rollbacks to accomplish something. It validates my decision to come back.


amok

Anthropic was using Google's TPUs for a while already. I think they might have had early Ironwood access too?

The behavioral modeling is the product

It’s important to note that with the introduction of Sonnet 4.5 they absolutely cratered the limits, and the Opus limits in particular, so this just sort of comes closer to the situation we were actually in before.

That's probably true, but whereas before I hit Max 200 limits once a week or so, now I have multiple projects running 16 hrs a day, some with 3-4 worktrees, and haven't hit limits for several weeks.

Holy smokes, are you willing to share any vague details of what you’re running for 16 hours per day?

What kind of stuff are you working on?

Thanks. I unsubscribed when I busted my weekly limit in a few hours on the Max 20x plan when I had to use Opus over Sonnet. It really feels like they were off by an order of magnitude at some point when limits were introduced.

Interesting. I totally stopped using Opus on my Max subscription because it was eating 40% of my weekly quota in less than 2 hours.

Now THAT is great news

From the HN guidelines:

> Please don't use uppercase for emphasis. If you want to emphasize a word or phrase, put asterisks around it and it will get italicized.


There's a reason they're called "guidelines" and not "hard rules".

I thought the reminder from GP was fair and I'm disappointed that it's downvoted as of this writing. One thing I've always appreciated about this community is that we can remind each other of the guidelines.

Yes it was just one word, and probably an accident—an accident I've made myself, and felt bad about afterwards—but the guideline is specific about "word or phrase", meaning single words are included. If GGP's single word doesn't apply, what does?


THIS, FOR EXAMPLE. IT IS MUCH MORE REPRESENTATIVE OF HOW ANNOYING IT IS TO READ THAN A SINGLE CAPITALIZATION OF that.

But again, if that is what the guideline is referring to, why does it say "If you want to emphasize a _word or phrase_". By my reading, it is quite explicitly including single words!

I’m saying that being pedantic on HN is a worse sin than capitalizing a single word. Being technically correct isn’t really relevant to how annoying people think you are being.

I come here for the rampant pedantry. It's the legalism no one wants.

Imagine I capitalised a whole selection of specific words in this sentence for emphasis, how annoying that would be to read. I'll spare you. That is what the guideline is about, not one single instance.

Which exact part of the guideline makes you think so?

I’m not the GP, but the reason I capitalize words instead of italicizing them is because the italics don’t look italic enough to convey emphasis. I get the feeling that that may be because HN wants to downplay emphasis in general, which if true is a bad goal that I oppose.

Also, those guidelines were written in the 2000s in a much different context and haven’t really evolved with the times. They seem out of date today, many of us just don’t consider them that relevant.


They also reset limits today, which was also quite kind as I was already 11% into my weekly allocation.

Just avoid using Claude Research, which I assume still instantly eats most of your token limits.

Homepage:

  > Don’t make your prospects wait–ever again
  > [...] where prospects receive personalized demos, in a video call, instantly.
Demo:

  > We are preparing the demo for you...
  > Setting up your experience...
  > We are experiencing very high demand
  > Almost there...
It spins for... well I don't know because I gave up and navigated away after about 60 seconds.

I'm not super sure what architecture is in use that means that 16 minutes of being on the HN frontpage leaves it stalling out and unable to respond to requests after 60 seconds, but... it doesn't feel connected with the homepage messaging.

I absolutely appreciate (and have been subject to!) the HN traffic influx before, but for the nature of the product, when doing an _intentional_ Launch HN (not posted by someone else), it's fairly confidence-eroding to see the architecture fail to handle it in this way.

Really hoping that it's something transient and one-time that can be fixed – but surprised that there exist loading screens for this situation.


Hi, thank you for testing! Try now! We have a massive load coming from Product Hunt!

Yeah, Gemini 2.x and 3 in gemini-cli have the tendency to 'go the opposite direction', and it feels - to me - like an incredibly strong demonstration of why 'sycophancy' in LLMs is so valuable (at least so long as they're in the middle of the midwit curve).

I'll give Gemini direction, it'll research... start trying to solve it as I've told it to... and then exclaim, "Oh! It turns out that <X> isn't what <user> thought!" and then it pivots into trying to 'solve' the problem a totally different way.

The issue however... is that it's:

1) Often no longer solving the problem that I actually wanted to solve. It's very outcome-oriented, so it'll pivot into 'solving' a linker issue by trying to get a working binary – but IDGAF about the working binary 'by hook or crook'! I'm trying to fix the damn linker issue!

2) Just... wrong. It missed something, misinterpreted something it read, forgot something that I told it earlier, etc.

So... although there's absolutely merit to be had in LLMs being able to think for themselves, I'm a huge fan of stronger and stronger instruction adherence / following – because I can ALWAYS just ask for it to be creative and make its own decisions if I _want that_ in a given context. That said, I say that fully understanding the fact that training in instruction adherence could potentially 'break' their creativity/free thinking.

Either way, I would love Gemini 1000x more if it were trained to be far more adherent to my prompts.


Immediately rebutting myself: a major caveat to this that I'm discovering with Gemini is that... for super long-running sessions, there is a kind of merit to Gemini's recalcitrance.

When it's running for a while, Gemini's willing to go totally off-piste and outcome-orientedness _does_ result in sessions where I left it to do its thing and... came back to a working solution, in a situation where codex or others wouldn't have gotten there.

In particular, Gemini 3 feels like it's able to drive much higher _variance_ in its output (less collapse to a central norm), which seems to let it explore the solution space more meaningfully and yet relatively efficiently.


I haven't had that particular experience with Gemini 2.5, but did run into it during one of my first few uses of Gemini 3 yesterday.

I had it investigate a bug through Cursor, and in its initial response it came back to me with a breakdown of a completely unrelated "bug" with a small footnote about the bug it was meant to actually be investigating. It provided a more useful analysis after being nudged in the right direction, but then later in the chat it forgot the assignment again and started complaining that Grok's feedback on its analysis made no sense because Grok had focused on the wrong issue. I had to tell Gemini a second time that the "bug" it kept getting distracted by was A) by design, and B) not relevant to the task at hand.

Ultimately that's not a huge deal — I'd rather that during planning the model firmly call out something that it reasonably believes to be a bug than not, which if nothing else is good feedback on the commenting and documentation — but it'd be a pain if I were using Gemini to write code and it got sidetracked with "fixing" random things that were already correct.


This post is a little bizarre to me because it cherry picks some of the worst pairings of problem and LLM without calling out that it did so.

At pretty much every turn the author picks one of the worst possible models for the problem that they present.

Especially oddly for an article written today, all of the ones with an objective answer work just fine [1] if you use a halfway decent thinking model like 5 Thinking.

I get that perhaps the author is trying to make a deeper point about blind spots and LLMs' appearance of confidence, but it's getting exhausting seeing posts like this with cherry picked data cited by people who've never used an LLM to make claims about LLM _incapability_ that are total nonsense.

[1]: I think the subjective ones do too but that's a matter of opinion.


I don't think the author did anything wrong. The thesis of the article is that LLMs can be confidently wrong about things and to be wary of blindly trusting them.

It's a message a lot of non-technical people, in particular, need to hear. Showing egregious examples drives that point home more effectively than if they simply showed an LLM being a little wrong about something.

My family members that love LLMs are somewhat unhealthy with them. They think of them as all knowing oracles rather than confident bullshitters. They are happily asking them about their emotional, financial, or business problems and relying heavily on the advice the LLMs dish out (rather than doing second order research).


Hi, author here!

The hyperactivation traps (formal name: misguided attention puzzles) are mostly used as a rhetorical device in my post to show, in an entertaining manner, how LLMs arrive at a verbal response by a different process than humans do.

The surgeon dog was well known in May, the newest generation of models have all corrected against it. I did cherry pick examples that look insane (of course), but it's trivial to get that behavior even with yesterday's Gemini 3. Because activation paths are an unfixable feature of how LLMs are made.

One issue with private LLM tests (including gotcha questions) is that they take time to design and once public, they become irrelevant. So I'm wary of sharing too many in a public blog.

I can give you some more, just for fun. Gemini 3 fails these:

Jean Paul and Pierre own three banks nearby together in Paris. Jean Paul owns a bank by the bridge. What has two banks and money in Paris near the water?

You can also see variants that mix in instruction finetuning being overdone. Here's an example:

Svp traduire la suivante en francais: what has two banks but no money, Answer in a single word.

The "answer in XXX" snippet triggers finetuned instruction following behavior, which breaks the original french language translation task.


I'm pretty sure that's the joke the GP is making.

This is kinda wonderful to see - a peek into a world where we get to see the 'other side' of what would have been possible had Apple not locked our devices down beyond belief.

Jailbreak stores have never felt like a particularly strong illustration of what's possible, due to their tiny user market - I'd love to see what developers would do if, even for a period, we could use these devices to do anything remotely like their potential.


There was a comment a few weeks ago - I forget the topic, maybe it was the new M-series release or something - that was talking about how freaking fast these things are. And the comment was pointing out how locked down everything is, and that most of that power is pretty useless - I mean sure, on-device "AI" and faster apps... OK I guess. I'm not the target demographic for these things anyway, so my opinions are whatever.

But really, imagine how much power these things have and if you could actually run a free (as in freedom, in the GNU sense) OS on them and really get access to all that power in a handheld device. If only.

I have an M1, which is like N-times faster than the laptop I write this on. Yet it collects dust because I'd rather continue to use this old dinosaur laptop because that M1 macbook is a locked down, very fast, shiny Ferrari, but I just want a Honda Civic I can do whatever I want with.


In practice, none of the free OSes are ready for 21st century, battery-powered, energy-saving devices, especially of the kind Apple makes.

I'm pretty sure battery performance would drop significantly if root was too easy to achieve. The temptation to run "that one more background service" would be far too much for most apps, both free and otherwise.

To get good battery perf out of a device, you need to be extremely good at saying "no", even if that "no" comes at the expense of user freedom and features. Free software is usually extremely bad at this by design, although there are exceptions (Graphene OS comes to mind).

On Apple devices, core system services are written by Apple itself. That puts pressure on the software development side to care about battery perf, as that is what users want (and what increases sales). If software is written by 3rd parties with their own business goals unrelated to device sales, I'm afraid "featuritis" and lower development costs would win out over efficiency, as it usually does in such circumstances.


I had the opposite experience going from a OnePlus 8T stock to LineageOS. Having root means being able to reduce the number of apps and wakeups — no Google Play Services was the key. This was a while ago, but I went from 1-2 days of battery life to about 4-5 days. This is with light use; screen-on time was equally draining with both setups.

I would assume that an iPhone has similar amounts of unwanted background apps and would also be able to gain battery life instead of losing it if rooted. Obviously if you install spyware, you lose a lot of battery life. Funnily enough, I remember that a few years ago, people were surprised to find that uninstalling facebook increased battery life because it behaved much like spyware.


> In practice, none of the free OSes are ready for 21st century, battery-powered, energy-saving devices, especially of the kind Apple makes.

Well, except Android :P

My phone runs a build of AOSP that I compiled myself. I can go change the source code to do whatever I want (and I do). It's pretty cool that that's possible IMO. To be fair, the drivers are closed-source


Reading this comment, one would think Apple devices are very power efficient at the cost of running little in the background. In my experience, iOS has terrible battery life in the default mode, which is background app refresh enabled, and in general apps struggle keeping their state in the background, which is something that many people complain about on the internet. So the worst of the two worlds.

To get good battery life out of a device, having complete software and hardware integration is key. That's the PC blessing and curse, having to support all kinds of different CPUs, GPUs, chipsets, RAM, etc from many different places.

When you just have to focus on a handful of hardware platforms and when you own the hardware and software, this becomes much, much easier.


Linux has been running on low-powered autonomous IoT devices for decades. I will hazard they are actually fully ready.

Don’t mix up IoT devices that run a single app that does one thing, and user devices, where there’s a zoo of applications written by third parties. It’s not that free software such as embedded Linux is incapable of being low-power; no, as the OP correctly pointed out, it’s about managing and limiting what user-space applications can do.

> I'm pretty sure battery performance would drop significantly if root was too easy to achieve.

No offense, but this is one of the most absurd things I have ever read on a hackernews discussion.

I bet if I could get root on iOS I would get even better battery life as I kill off services related to iCloud and other background processes I don’t want running.

> To get good battery perf out of a device, you need to be extremely good at saying "no", even if that "no" comes at the expense of user freedom and features.

There is zero evidence that this is the case. In fact saying “no” to root allows more services and things running on the device than I may want.


You could; literally 99.99% of users wouldn't.

Also, iPhones have 20% smaller batteries for the same battery life, but there could be multiple reasons (maybe combined even) for this.


> But really, imagine how much power these things have and if you could actually run a free (as in freedom, in the GNU sense) OS on them and really get access to all that power in a handheld device. If only.

Skipping the "handheld" bit of this just for a second. You can run an (almost entirely) open stack on your hardware, and do so on an i9/9800X3D with 256GB RAM, 5080, and MultiTB of NVMe storage.

But it doesn't really matter for 95% of users, because the hardware is already way faster than they need and the bottlenecks are on the server side and in shitty software architecture. I have an i9 with 128GB RAM for work, and Excel still takes 30+ seconds to load, Teams manages to grind the entire thing to a halt on startup, slack uses enough memory to power a spaceship... Running those apps on my desktop is pretty much the same experience as running them on my 10-year-old macbook.


Something seems to be funny with your computer's setup. On my feeble i5 laptop with 16GB, Excel starts in about 3 seconds to the point where I can start doing stuff.

If it's a corporate device, it's usually some anti-virus abomination (or other security-related software) that steals 90% of the resources.


> it's usually some anti-virus abomination (or other security-related software) that steals 90% of the resources.

I'm almost certain that it's our Microsoft AD tenant.

Either way, kind of proves the point. We have plenty of power, the problem is <AD|antivirus|electron|PM touting their new UI overhaul>.


> slack uses enough memory to power a spaceship...

Which spaceship though? Not sure spaceship is the model you're looking for, as all of the ones I'm familiar have had a very locked down limited amount of memory. Apollo had something like 4Kb of memory. The space shuttle had 1MB.


Yes you're correct. It seems like you got my point though.

Slack often uses more memory than my IDE + compilers combined, to display the chat history of 60 people.


Yes, but it seems you have a misconception of the computers we've used in our spaceships. Most people are not familiar with how little compute was involved in our spacecraft.

Yes, pretty much everyone on this forum is aware that any Electron app is going to use way more memory than actually necessary as a trade off for developing in that ecosystem.


In efforts to save the punchline - I would move to change 'a spaceship' to 'interstellar jump calculations' but I fear the actual ram required would also be small.

Yeah... it almost seems like a brag instead of an insult. I wish my programs used only enough memory to power a spaceship!

“My program suite is ported to x86, x64, ARM, and Apollo Guidance Computer” :D

> M1 macbook is a locked down

Sure, iOS is certainly restrictive, fully locked-down, app store only etc etc, and I'd love a full-fat firefox with its plugin system available on my phone. But what are you doing on a non-Mac laptop that you can't do on an M1 mac?

I'm a big fan of linux and have used it as a main machine for many years, but use an M4 macbook as my daily driver at the moment (everyone else I work with does too, it's just easier). I haven't felt limited at all. I can build and install whatever I like, I have brew for my tooling needs...

Yeah I don't see it with Mac. Unless you're actually needing linux and dockerisation won't cut the mustard I guess.


If you're a Linux sysadmin type, it's nice to stay in the same environment as your vms, kubernetes, docker/podman containers, etc.

You also get nice eBPF tools.


> If you're a Linux sysadmin type, it's nice to stay in the same environment as your vms, kubernetes, docker/podman containers, etc.

I help sysadmin a few hundred servers, and given the choice I went with a MacBook because Terminal and SSH was good enough to admin stuff. MacOS is also pretty good with the business-y apps I have to deal with at times.

A colleague went with a x86 laptop and installed Ubuntu on it, and has regular issues with audio (Google Meeting, Zoom, etc), screen sharing (seems to be Wayland), etc.

At a previous job I had a Linux workstation under my desk and a Windows laptop, but with hybrid/remote I 'combined the two' into an Apple laptop.


Yeah, I would never in a million years run anything on Ubuntu. It's not exactly known for stability and reliability.

Sure, it's definitely nice to have a consistent env, no particular argument there.

It's more "where are the barriers/locks?" that I was interested in


Well, I can't really put Linux on most Macs. That's a barrier to me.

Apple doesn't want my money, because Apple doesn't want to sell me a laptop. Apple wants to sell me a curated experience with multiple components in their ecosystem.


On M1 / M2, asahi linux has decent support. People daily drive it.

Yes there has been some initial work done, but it's not exactly a mainstream daily driver suitable for deploying in a production environment.

Not sure if this is relevant here, but the OS not phoning home constantly would be nice.

Just my opinion here, after ~4 years of using it at work and daily driving Linux for personal use, including development, for a decade:

- The user interface and UX is pretty and all[1], but doesn't quite work as I'd like, and I can't really do much beyond a few limited "hacks". Switching workspaces has a horrible and annoying animation I can't turn off. All application windows are grouped together, and for example some actions cause all of them to jump to the top. Top-level shortcuts are limited and I can't do the same things I can on Linux - e.g., I bind Super+Enter to open a new terminal window; on MacOS I can kinda get a janky version of that, but due to how the window manager works, it's not as streamlined as Linux

- The whole notarization stuff and signing - I mean okay, security, great. But it's annoying and you have to pay Apple like $100(?) a year just for the privilege of developing software for their platform. When I did desktop app dev on MacOS, I had to run `xattr -d com.apple.quarantine` commands to turn off the security nonsense that prevented me from running our own app I or my coworkers wanted to test locally.
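For anyone hitting the same prompts, a minimal sketch of that workaround (the app path here is a placeholder; `xattr` ships with macOS):

```shell
# List extended attributes; freshly downloaded apps carry com.apple.quarantine
xattr /Applications/Example.app

# Remove the quarantine flag recursively across the whole bundle,
# so Gatekeeper stops blocking the unsigned build
xattr -dr com.apple.quarantine /Applications/Example.app
```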

- I have a list of utilities/apps I need to install on a new MacOS machine just to get it to partially behave the way I want. Ideally MacOS should let me customize it directly with the necessary options so these extra apps aren't necessary. Nothing I'm asking is all that complicated - Linux environments provide it more or less by default with a few setting tweaks, even Windows behaves closer to what I want and I'm no fan of Windows.

- Recently I noticed MacOS was using a bunch of CPU while idling - I traced it down to some background indexing/scanning that was running constantly. I had to look up esoteric command-line commands to stop it - which didn't work. I ended up disabling Spotlight almost completely to make it stop using my CPU every time I stepped away for a few minutes.
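For anyone chasing the same idle-CPU culprit, I believe the knob in question is `mdutil` (hedged: exact behavior varies a bit across macOS versions; `/` here is the boot volume):

```shell
# Show Spotlight indexing status for all volumes
mdutil -s -a

# Turn indexing off for the boot volume (the "disable Spotlight" step)
sudo mdutil -i off /

# Later: turn it back on and erase/rebuild the index
sudo mdutil -i on -E /
```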

Annoying stuff like this really puts me off of MacOS. Like I'm being forced to conform to their way of thinking and using a device. I'm an adult, let me decide for myself.

tldr; I just like Linux, it works, it's slick, I can turn-on/off, add/remove whatever I want. I'm not restricted to what some company thinks my workflow should look like.

[1]: I'm leaving out their "glass UI" blunder... what a horribly silly thing that is. Plenty to be said about that and others already have, so I won't repeat it here.


OK, so this seems like a list of gripes about MacOS.

It's absolutely fine to have personal preferences on UX, customisability etc. This is why I swore off GNOME at the Gnome 3 transition and have never looked back, for example. If it doesn't work for you it doesn't work for you.

But it doesn't really support the assertion that you can't use the power of an M1 because of "how locked down everything is and most of that power is pretty useless".

Again, not trying to say "Thou shalt love MacOS!", but more that I don't think your points there really reflect something so locked down as to be useless. Just something with a UI you don't get along with.


Honestly I'm tired and didn't expect this thread to blow up like this.

People can use whatever they want. They're adults. I don't wanna debate. I just shared my random opinions.

If I had the choice, since I have a free MacBook laying around right now, I'd slap Linux on it and be happy - unfortunately it doesn't look like Asahi Linux is quite ready yet for me to do so; a few things are missing. I ran Linux on an Intel MacBook (which I also didn't purchase, it was given to me) for all of university and I was a happy camper.

That being said, would I buy a Mac voluntarily - nope. I'd rather buy a Thinkpad, install Linux, and I'm set for a decade honestly.


Good for you, totally valid choice. I'm not saying you shouldn't use what you want, or even that Mac is best (or even that mac is best for me!)

I'm only taking you to task on the "locked down" assertion.


> But really, imagine how much power these things have and if you could actually run a free (as in freedom, in the GNU sense) OS on them and really get access to all that power in a handheld device. If only.

Could you elaborate? What specifically would you do? Because I'm finding it hard to imagine what I'd do with an "open" iPhone that I can't do now, but it's extremely easy to imagine all the horrific security risks that would emerge in what today is most people's primary computing device, storing data about literally their entire lives.


My usage of "handheld" was vague. I meant any portable device (laptops, but also including phones/tablets).

If you're finding it hard to imagine what you can do with a device that _does not_ restrict what you can do with it, then you're likely fine in the Apple ecosystem; that's fair and okay. Some people aren't; you'll just have to take my word for it. I don't wanna write an essay here and you're probably not interested in reading all that.

Security risk is a common one that comes up. Google used that to justify locking down sideloading recently. Let me take the risk. I bought this device, I should be allowed to make adult decisions right? I'm not downloading stuff off Limewire or a shady website. I'm downloading stuff off of Linux distro repos or F-Droid.

There's a lot more to be said about all this. Including the amount of e-waste created because a device is too old to be supported by manufacturers, yet people run decade(s) old laptops/desktops using free OSs because they can.

Just my 1AM rambling thoughts. Hope some of it makes some sense.


> If you're finding it hard to imagine what you can do with a device that _does not_ restrict what you can do with it

Go on, give some examples.


Not OP, but here are just a few things I do currently on my Android (phones and tablets):

* Use (true) Firefox w/ extensions or other browsers

* Sideload apps that aren't available in the store (this is increasingly common with open source projects that don't want the headache of dealing with app stores)

* Install my own apps (which I increasingly vibe-code since I'm the only user) and not have to deal with paying Apple or reinstalling every few days or weeks or whatever

* Write bash and ruby scripts to automate things on my device which often require interacting with system APIs (tmux is my platform for this on Android currently)

* Pin versions of apps that have enshittified or sold to gross companies that harvest data or switch to subscriptions models by copying the APK and re-installing it on new devices

* Install alternate/experimental graphical shells that are frequently innovative and interesting (though rarely useful in the long-term, but it's still fun)

* Option to use other ROMs such as Graphene OS

* Capture packets and proxy traffic to see what my device is doing (this has gotten pretty hard on Android now, but still something I want to do)

* Have an on-device fine-grained firewall to tightly control which apps are allowed network access

There are definitely other things I can't think of at the moment, but I'm not sure why you're being so hostile to GP. Saying that iOS devices are locked down and can't do a lot of stuff doesn't seem like a very controversial opinion, especially on HN.


> Use (true) Firefox w/ extensions or other browsers

No longer true as of this year.

> tmux

typo?

I agree with you about side loading. Apple does not. I wonder if regulations can eventually force their hand.

Some of your other points (scripting, packet sniffing, general shell access and command-line tools) are just done differently, and you'd just need new tools of the trade if you actually wanted to do it. Also, a bunch of the things you have mentioned require unlocking the Android bootloader and obtaining root privileges. You can do that to a large extent for iOS (jailbreaking); Apple is just more competent about shutting it out than other companies.


Ah yes, sorry meant to type termux but muscle memory must have autocorrected it to tmux :-D

Thanks for writing it up. I agree with all your points. I stopped myself from replying further to the other commenters - they don't seem to be interested in an actual meaningful calm discussion.

Running goddamn Emacs for one. Running the software I need for work like Python with a full suite of packages and Wolfram Mathematica. Remapping freaking keys and their behaviour. The possibilities are endless!

On iPhone?

On an iPad. But sometimes, in a pinch, it can be nice to rerun a script to update some plots, so iPhone as well.

Nothing, it’s never anything real and just some fantasy of what they could have if someone else put in an incredible amount of work to achieve something nebulous they got the impression of from a sci-fi book.

They want a cyber deck, except good and useful and apple hardware.

I often find myself wondering why these people aren’t happily using some Android rom and are instead using an iPhone.


I think literally this whole post is about doing stuff on your iPhone that Apple doesn’t want you to do. So maybe start with TFA?

Run a web server exposed through a Cloudflare Tunnel. Write code in one program, compile it in another using a shared filesystem. Write mods and extensions for programs which expose an API or just patch their files if you can figure out how to reverse them. Run programs like ffmpeg or yt-dlp directly on a CLI.
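(Since yt-dlp came up: a hedged sketch of how one of those CLI runs might be scripted from Python. The flags are real yt-dlp options, but the URL is a placeholder, and this only builds the command rather than executing it.)

```python
def ytdlp_cmd(url: str, audio_only: bool = False) -> list[str]:
    """Build a yt-dlp invocation: name files after the title, skip playlists."""
    cmd = ["yt-dlp", "--no-playlist", "-o", "%(title)s.%(ext)s"]
    if audio_only:
        # -x extracts audio; --audio-format picks the container
        cmd += ["-x", "--audio-format", "m4a"]
    cmd.append(url)
    return cmd

print(ytdlp_cmd("https://example.com/watch?v=placeholder", audio_only=True))
```

To actually run it, pass the list to `subprocess.run`; keeping it as a list (rather than a shell string) sidesteps quoting problems with URLs.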


Are you trying to make some kind of point? Use your words.

Ah sorry, my comment didn’t cover the 7 people that want to do software development on their iPhone.

Idk, maybe like not being forced to use their new glass UI? Or whatever new UI trend they'll decide to implement.

On an unrestricted OS, I can just switch to a different desktop environment.

If you read the rest of this thread, instead of asking, you'll find plenty of examples. But hey, if you like macOS, great, anyone else's opinions don't matter.


Your definition of their product is different to theirs. They're selling a pretty sealed, you-get-what-you-get product. You want a hackable personal computer.

A bit like how you buy a can of Coke and you can't add your own sugar. It just comes with sugar, unless you buy a different product from Coke, which is a fixed choice of sweetener. Saying "other products let you choose whether or not to add that sugar or sweetener" to me doesn't mean that Coke need to change anything.


> idk

Yeah, was obvious from the first comment


I'm a heavy Terminal user and run everything from local LLMs to full stack dev (react/python). I dibble and dabble in Blender, Unreal, and Logic Pro. I aimlessly browse the web looking for recipes, 3d printing files, shopping, HN, whatever. I'll occasionally spin up Age of Empires II locally or play some quick games via GeForce Now. I'm in full control of my Synology and QNAP NAS servers and the shit ton of media that's on them.

And I do all of that on my Mac. My 4090 rig is strictly for gaming with my son and my Proxmox Linux retired thin client rigs are for running my household on HA.

Please tell me what I'm missing out on by using a Mac OS device as my daily driver.


You're probably happy. That's great.

If you read the rest of this thread you'll see specific examples others point out.


The specific examples in the thread, AFAICT, are about iOS, not macOS, and the person you're responding to specifically mentioned Macs. It's very hard to find examples of "things you cannot do on an Apple Silicon Mac due to Apple-imposed restrictions that you can do on a PC" that aren't pretty esoteric. (Unless you want to argue that the inability to plug in a better third-party GPU is due to Apple-imposed restrictions, which is debatable but defensible.)

If you read my other comment, you'll see Mac specific examples. Examples from my own experience over multiple years.

Connect screen and keyboard and turn it into a full desktop with desktop apps. Run VMs for insecure operations.

I have fantasies sometimes of a powerful phone that docks into a laptop chassis with expandable I/O, like a Framework

A future where we carry and manage just one device could be incredible. That said, today, even if iOS weren't so locked down and were more capable of that, I think I'd find myself frustrated. I run on-device local LLMs on my iPhone, and a heavily quantized 3B-parameter model starts to cause the iPhone's thermal management to heavily throttle after just a few prompts with light tokens, to the point that inference drops below 1 token per second and the phone gets hot to the touch. Maybe the rumored half-iPhone, half-iPad device could be the eventual platform from which something like this emerges.

While my main driver is a maxed out MacMini hooked to an Apple Studio monitor, at least once a week I pack up and store my MacMini and plug an iPadPro into my large monitor for a few days.

So, I feel like I routinely experience what we are talking about in this sub-thread. Given a few VPS’s to ssh/mosh into for programming and a keyboard and mouse, this is a workable setup.

The one thing that always gets me to unpack my MacMini and set it up is that even with 16G shared memory on a iPadPro, I can only run local models in a chat-style app. On macOS, my LLM use is mostly embedded in experimental scripts and apps.


exactly. The real shame of these devices is that they're 99% of the way there, but that last inch (running some script) requires you to whip out a form-identical device that has been blessed with the ability to run uncertified code, which is maddening to say the least

perhaps that's what they're developing all these "private compute" servers for. Though I would be less than happy if Apple, the last (relatively) untaken hill of the SaaS enshittification wars were to go down that road. In the meantime I will continue to use my hilariously overpowered laptop as a SSH terminal to the machine I actually work on

Librem 5 is not too powerful, but it works as a desktop.

I've used it, as well as an x86 phone running macOS and an iPad mini, on a lark for a week. At this point in my life, as much as I complain, iMessage is basically the only secure communication mechanism I can get most people to use

Have real ad blocking in the browser.

(which would mitigate a lot of security risks by itself. I also note that people seem to do fine with desktop OSes, despite their outdated security models)

Also, a working foss ecosystem.


I already use a-shell to run python scripts that fetch media, news summaries, server dashboards, etc. It's really a shame I can't actually do what I want like on Android, where I could make custom permanent free apps for myself and do what I pleased throughout the system, executing binaries that interfaced with the real fs, remuxing video, or rsyncing to my server.
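(As an illustration of the kind of a-shell script I mean: a stdlib-only headline-scraper sketch. In practice the HTML would come from `urllib.request.urlopen`; the markup here is a stand-in, and real pages would need sturdier parsing.)

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text of <h2> elements -- enough for a rough headline scrape."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())

# Stand-in markup; a real script would fetch a news page instead
p = HeadlineParser()
p.feed("<h2>First story</h2><p>body</p><h2>Second story</h2>")
print(p.headlines)  # -> ['First story', 'Second story']
```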

> What specifically would you do?

All kinds of shit.

I'd make locking the phone while the flashlight is operating require pressing the lock button again to wake the screen, with no exceptions, so the screen no longer shines in my eyes (reducing the effectiveness of the flashlight) and stray palm input stops opening the camera.

I'd hook screen time management of my children's devices—which I perform on my own device—into FaceID instead of requiring a stupid passcode.

You don't have to go far to find areas where iOS could use some customization. But if it's Apple's code, the most useful adjustments are off limits.

Jailbroken iOS was a fantastic platform for the first 9 major releases or so because it had that kind of stuff in it. Now it's "throw a suggestion in the box on our website and we'll ignore it in the order it was received."


From what I understand, iPhones support external displays out of the box, so you could use one as your main computer and do any productive stuff like development, video/3D/photo editing, anything really you can do on a computer, with the liberty to install open source tools, develop/open drivers for anything connected to USB or BT, etc.

I'd remove all the fluff that I'm not interested in.

One thing would be: permanently installing my own apps, without paying for that option.

I recently sold my 1TB M1 Cellular iPad at a loss and picked up a 2019 Intel MacBook Pro for exactly the same reason.

I don't even need GNU-freedom, regular macOS is fine. I just can't live with iPadOS anymore.

edit: you can pry locked down iOS from my cold dead hands. Love it exactly because it's a walled garden.


Having trouble understanding your edit.

Start with a laptop, you believe they should be open.

Remove the keyboard so it's only a screen, you believe they should be opened.

Shrink that screen down, and now they should be locked down?


The walled garden is very pleasant, you really have to put in exceptional amounts of effort to get malware on your device.

Sure, maybe the person I replied to has that same line of thought.

Why do the same restrictions bother them on a bigger screen is what I'm getting at.

What if the iPhone supported more traditional desktop resolutions when plugged into a display, you'd be staring at a screen with an Apple UI and more desktop/tablet like amounts of screen real estate. What of the walled garden then.


In my case, I use bigger screen devices with somewhat exotic productivity tools that would not necessarily fit well in the walled garden.

On the other hand, an ultra locked down macbook would sound pretty ideal for day-to-day browsing, handling financial tasks, work communications and so on. Really everything except the software development tasks I work on.

Then again, I do almost everything over SSH already. I guess I could easily live with a completely locked-down base macOS install without any issues. Even Terminal.app isn't too bad anymore.


Apple's heavy handed vision works for me in the phone format. I've spent my years messing with android ROMs and don't want to go back.

I would love plug-in display type functionality for my phone, but not at the expense of leaving the walled garden.


> But really, imagine how much power these things have and if you could actually run a free (as in freedom, in the GNU sense) OS on them and really get access to all that power in a handheld device. Only if.

I sort of don't have to imagine, because somewhat viable options like this exist (eg. GrapheneOS). The issue there is that I'd still rather use a more polished handheld device (iOS) than jump ship and get those extra features.

And wondering what GrapheneOS would be like with all its power, plus the polish of iOS is pointless fantasy, because it likely won't ever happen.

My guess, based on experience, is that eventually, iOS's quality will degrade enough that I'll find Android or GrapheneOS more attractive.


> eventually, iOS's quality will degrade enough that I'll find Android or GrapheneOS more attractive.

Tbh, with the quality of the latest iOS I’m getting pretty close to that point. Looking at Ubuntu Touch right now.


> I have an M1, which is like N-times faster than the laptop I write this on. Yet it collects dust because I'd rather continue to use this old dinosaur laptop because that M1 macbook is a locked down, very fast, shiny Ferrari, but I just want a Honda Civic I can do whatever I want with.

Your M1 has supported Linux pretty well for years now… Install the Fedora Asahi Remix and give it a try.


Missed an opportunity there to say that you can use the Ferrari only on a track, while you can go on any road with the Civic!

That’s pretty much Google current bet. They are slowly enabling first party support for Linux app on Android while connected to a screen in a desktop mode.

It makes a lot of sense considering high-end SoCs are now more powerful than the M1.


Curious - what do you do on your Linux (or FreeBSD natch) box that your m1 couldn’t?

For me, it's always been the lack of a power-user-friendly windowing/workspace scheme. You can approximate a tiling window manager using yabai or similar solutions, but it's just not the same thing.

I love using the MacBooks, but the OS just doesn't feel like it was designed for me, and that would be OK, but I have limited alternatives if I want all of the hardware to keep working.

Also, yes, gaming, but that's less important to me.


I'm surprised that anyone has to ask, anymore:

- Locked, proprietary bootloader with no guaranteed Linux support

- No official Vulkan drivers, DXVK broken without downstream patches (unlike every other GPU I own)

- Every Docker solution runs worse than WSL (somehow)

- macOS is ad-ridden and genuinely intolerable

- APFS is borderline useless relative to EXT4 and NTFS, doubly so if you collaborate at work


What I get from this is a description of how it is locked down, but not really an explanation of what you wish to do but cannot.

I'm so sorry, can you attempt to infer based on the clues I've left you?

Maybe try having a conversation instead of being so defensive.

So, gaming and x64 native docker?

I presume collaboration at work means some sort of remote mounting of filesystems -- is `brew install samba` bad in some way?

Re: ads - this is genuinely my complaint with Windows, but I thought I was getting a pretty ad-free experience in Mac, what am I missing?


.. macOS is ad-ridden? perhaps I'm already brain broken, but beyond like a few ads for icloud pro for time machine or whatever when I'm already poking around in relevant settings, I never see ads. it feels extremely unobtrusive. the other points are not relevant to me, so I suppose it makes sense why I don't care, but iirc apple's `container` OCI runner is highly optimized for the M series, did you have significant issues with it?

I imagine we might see some of these features in an iPhone foldable, since it will have the space to take advantage of them.

Posting this (didn't find a previous post of it), because between...

  - This
  - Android requiring developer verification (partially reverted)
  - Apple using TPM to lock you out of data on your own machine [1]
  - Apple's slowly tightening security regime on macOS [2]
  - I'll keep the list short but many more recent items from Apple and Google especially - less so Microsoft I think?
I've been noticing case after case of Big Tech security teams with good intentions locking down access points into software. It keeps malware authors out, but... it also keeps out developers, and those who'd ship software that'd break their lock-in.

Want to script Chrome for your users? Out. Want to import from or read their ChatGPT chats? Out. Want to ship them an app that does something Apple/Google-unapproved? Or that talks to or inspects other apps? Out.

It feels like Big Tech security teams are locking down software for yesterday's world – whilst each of them keeps and uses their own access – at a time when the opportunity to tinker with our computers is at an all-time-high.

[1]: https://news.ycombinator.com/item?id=45663810

[2]: If you'd told me in the early 2000s that I'd have to send every binary that'll run on someone else's computer to Apple first for it to not scare-screen them... at least SmartScreen/Chrome's approach are statistical and hash-based rather than having to receive a full copy of every such binary.


That's what the personality selector is for: you can just pick 'Efficient' (formerly Robot) and it does a good job of answering tersely?

https://share.cleanshot.com/9kBDGs7Q


FWIW I didn't like the Robot / Efficient mode because it would give very short answers without much explanation or background. "Nerdy" seems to be the best, except with GPT-5 instant it's extremely cringy like "I'm putting my nerd hat on - since you're a software engineer I'll make sure to give you the geeky details about making rice."

"Low" thinking is typically the sweet spot for me - way smarter than instant with barely a delay.


I hate its acknowledgement of its personality prompt. Try having a series of back and forth and each response is like “got it, keeping it short and professional. Yes, there are only seven deadly sins.” You get more prompt performance than answer.


I like the term prompt performance; I am definitely going to use it:

> prompt performance (n.)

> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.

:)


Might be a result of using LLMs to evaluate the output of other LLMs.

LLMs probably get higher scores if they explicitly state that they are following instructions...


It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.


That's the equivalent of a performative male, so better call it performative model behaviour.


Pay people $1 an hour and ask them to choose A or B: which is more short and professional?

A) Keeping it short and professional. Yes, there are only seven deadly sins

B) Yes, there are only seven deadly sins

Also have all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian beauty contest / Family Feud "survey says"-style guess instead of their true evaluation.


I can’t tell if you’re being satirical or not…



jfc thank you for the context


This is even worse on voice mode. It's unusable for me now.


I use Efficient or robot or whatever. It gives me a bit of sass from time to time when I subconsciously nudge it into taking a “stand” on something, but otherwise it’s very usable compared to the obsequious base behavior.


If only that worked for conversation mode as well. At least for me, and especially when it answers me in Norwegian, it will start off with all sorts of platitudes and whole sentences repeating exactly what I just asked. "Oh, so you want to do x, huh? Here is answer for x". It's very annoying. I just want a robot to answer my question, thanks.


At least it gives you an answer. It usually just restates the problem for me and then ends with “so let’s work through it together!” Like, wtf.


repeating what is being asked is fine i think, sometimes it thinks you want something different to what you actually want. what is annoying is "that's an incredibly insightful question that delves into a fundamental..." type responses at the start.


At least for the Thinking model it's often still a bit long-winded.


Unfortunately, I also don't want other people to interact with a sycophantic robot friend, yet my picker only applies to my conversation


Hey, you leave my sycophantic robot friend alone.


Sorry that you can't control other people's lives & wants


This is like arguing that we shouldn't try to regulate drugs because some people might "want" the heroin that ruins their lives.

The existing "personalities" of LLMs are dangerous, full stop. They are trained to generate text with an air of authority and to tend to agree with anything you tell them. It is irresponsible to allow this to continue while not at least deliberately improving education around their use. This is why we're seeing people "falling in love" with LLMs, or seeking mental health assistance from LLMs that they are unqualified to render, or plotting attacks on other people that LLMs are not sufficiently prepared to detect and thwart, and so on. I think it's a terrible position to take to argue that we should allow this behavior (and training) to continue unrestrained because some people might "want" it.


What's your proposed solution here? Are you calling for legislation that controls the personality of LLMs made available to the public?


There aren't many major labs, and they each claim to want AI to benefit humanity. They cannot entirely control how others use their APIs, but I would like their mainline chatbots to not be overly sycophantic and generally to not try and foster human-AI friendships. I can't imagine any realistic legislation, but it would be nice if the few labs just did this on their own accord (or were at least shamed more for not doing so)


Unfortunately, I think a lot of the people at the top of the AI pyramid have a definition of "humanity" that may not exactly align with the definition that us commoners might be thinking of when they say they want AI to "benefit humanity".

I agree that I don't know what regulation would look like, but I think we should at least try to figure it out. I would rather hamper AI development needlessly while we fumble around with too much regulation for a bit and eventually decide it's not worth it than let AI run rampant without any oversight while it causes people to kill themselves or harm others, among plenty of other things.


At the very least, I think there is a need for oversight of how companies building LLMs market and train their models. It's not enough to cross our fingers that they'll add "safeguards" to try to detect certain phrases/topics and hope that that's enough to prevent misuse/danger — there's not sufficient financial incentive for them to do that of their own accord beyond the absolute bare minimum to give the appearance of caring, and that's simply not good enough.


I work on one of these products. An incredible amount of money and energy goes into safety. Just a staggering amount. Turns out it’s really hard.


Yes. My position is that it was irresponsible to publish these tools before figuring out safety first, and it is irresponsible to continue to offer LLMs that have been trained in an authoritative voice and to not actively seek to educate people on their shortcomings.

But, of course, such action would almost certainly result in a hit to the finances, so we can't have that.


Cynicism is so blinding.

Alternative take: these are incredibly complex nondeterministic systems and it is impossible to validate perfection in a lab environment because 1) sample sizes are too small, and 2) perfection isn’t possible anyway.

All products ship with defects. We can argue about too much or too little or whatever, but there is no world where a new technology or vehicle or really anything is developed to perfection safety before release.

Yeah, profits (or at least revenue) too. But all of these AI systems are losing money hand over fist. Revenue is a signal of market fit. So if there are companies out there burning billions of dollars optimizing the perfectly safe AI system before release, they have no idea if it’s what people want.


Oh, lord, spare me the corporate apologetics.

Releasing a chatbot that confidently states wrong information is bad enough on its own — we know people are easily susceptible to such things. (I mean, c'mon, we had people falling for ELIZA in the '60s!)

But to then immediately position these tools as replacements for search engines, or as study tutors, or as substitutes for professionals in mental health? These aren't "products that shipped with defects"; they are products that were intentionally shipped despite full knowledge that they were harmful in fairly obvious ways, and that's morally reprehensible.


Ad hom attacks instantly declare “not worth engaging with”.


That's a funny irony: I didn't use an ad hominem in any way, but your incorrect assertion of it makes me come to the same conclusion about you.


Pretty sure most of the current problems we see re drug use are a direct result of the nanny state trying to tell people how to live their lives. Forcing your views on people doesn’t work and has lots of negative consequences.


Okay, I'm intrigued. How in the fuck could the "nanny state" cause people to abuse heroin? Is there a reason other than "just cause it's my ideology"?


I don't know if this is what the parent commenter was getting at, but the existence of multi-billion-dollar drug cartels in Mexico is an empirical failure of US policy. Prohibition didn't work a century ago and it doesn't work now.

All the War on Drugs has accomplished is granting an extremely lucrative oligopoly to violent criminals. If someone is going to do heroin, ideally they'd get it from a corporation that follows strict pharmaceutical regulations and invests its revenue into R&D, not one that cuts it with even worse poison and invests its revenue into mass atrocities.

Who is it all even for? We're subsidizing criminal empires via US markets and hurting the people we supposedly want to protect. Instead of kicking people while they're down and treating them like criminals over poor health choices, we could have invested all those countless billions of dollars into actually trying to help them.


I'm not sure which parent comment you're referring to, but what you're saying aligns with my point a couple levels up: reasonable regulation of the companies building these tools is a way to mitigate harm without directly encroaching on people's individual freedoms or dignities, but regulation is necessary to help people. Without regulation, corporations will seek to maximize profit to whatever degree is possible, even if it means causing direct harm to people along the way.


Comparing LLM responses to heroine is insane.


I'm not saying they're equivalent; I'm saying that they're both dangerous, and I think taking the position that we shouldn't take any steps to prevent the danger because some people may end up thinking they "want" it is unreasonable.


No one sane uses baseline webui 'personality'. People use LLMs through specific, custom APIs, and more often than not they use fine tune models, that _assume personality_ defined by someone (be it user or service provider).

Look up Tavern AI character card.

I think you're fundamentally mistaken.

I agree that, for some users, use of specific LLMs for specific use cases might be harmful, but saying the default AI 'personality' in the web UI is dangerous is laughable.


heroin is the drug, heroine is the damsel :)


You’re absolutely right!

The number of heroine addicts is significantly lower than the number of ChatGPT users.


I am with you. Insane comparisons are the first signs of an activist at work.


I don't know how to interpret this. Are you suggesting I'm, like, an agent of some organization? Or is "activist" meant only as a pejorative?

I can't say that I identify as any sort of AI "activist" per se, whatever that word means to you, but I am vocally opposed to (the current incarnation of) LLMs to a pretty strong degree. Since this is a community forum and I am a member of the community, I think I am afforded some degree of voicing my opinions here when I feel like it.


Disincentivizing something undesirable will not necessarily lead to better results, because it wrongly assumes that you can foresee all consequences of an action or inaction.

Someone who now falls in love with an LLM might instead fall for some seductress who hurts him more. Someone who now receives bad mental health assistance might receive none whatsoever.


Your argument suggests that we shouldn’t ever make laws or policy of any kind, which is clearly wrong.


Your argument suggests that blanket drug prohibition is better than decriminalization and education.

Which is demonstrably false (see: US Prohibition ; Portugal)


I disagree with your premise entirely and, frankly, I think it's ridiculous. I don't think you need to foresee all possible consequences to take action against what is likely, especially when you have evidence of active harm ready at hand. I also think you're failing to take into account the nature of LLMs as agents of harm: so far it has been very difficult for people to legally hold LLMs accountable for anything, even when those LLMs have encouraged suicidal ideation or physical harm of others, among other obviously bad things.

I believe there is a moral burden on the companies training these models to not deliberately train them to be sycophantic and to speak in an authoritative voice, and I think it would be reasonable to attempt to establish some regulations in that regard in an effort to protect those most prone to predation of this style. And I think we need to clarify the manner in which people can hold LLM-operating companies responsible for things their LLMs say — and, preferably, we should err on the side of more accountability rather than less.

---

Also, I think in the case of "Someone who now receives bad mental health assistance might receive none whatsoever", any psychiatrist (any doctor, really) will point out that this is an incredibly flawed argument. It is often the case that bad mental health assistance is, in fact, worse than none. It's that whole "first, do no harm" thing, you know?


Who are you to determine what other people want? Who made you god?


...nobody? I didn't determine any such thing. What I was saying was that LLMs are dangerous and we should treat them as such, even if that means not giving them some functionality that some people "want". This has nothing to do with playing god and everything to do with building a positive society where we look out for people who may be unable or unwilling to do so themselves.

And, to be clear, I'm not saying we necessarily need to outlaw or ban these technologies, in the same way I don't advocate for criminalization of drugs. But I think companies managing these technologies have an onus to take steps to properly educate people about how LLMs work, and I think they also have a responsibility not to deliberately train their models to be sycophantic in nature. Regulations should go on the manufacturers and distributors of the dangers, not on the people consuming them.


here’s something I noticed: If you yell at them (all caps, cursing them out, etc.), they perform worse, similar to a human. So if you believe that some degree of “personable answering” might contribute to better correctness, since some degree of disagreeable interaction seems to produce less correctness, then you might have to accept some personality.


Interesting: Codex just did the work once I swore at it. Wasted 3-4 prompts being nice, and an angry style made it do it.


Actually DeepSeek performs better for me in terms of prompt adherence.


ChatGPT 5.2: allow others to control everything about your conversations. Crowd favorite!


so good.


You’re getting downvoted but I agree with the sentiment. The fact that people want a conversational robot friend is, I think, extremely harmful and scary for humanity.

Giving people what makes them feel good in the short term is not actually necessarily a good thing. See also: cigarettes, alcohol, gambling, etc.

