Yes, it's a mess, and there will be a lot of churn, you're not wrong, but there are foundational concepts underneath it all that you can learn and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features, and roll your own tools. Some people here do that with a lot of success.)
The foundational mental model to get the hang of is really just:
* An LLM
* ...called in a loop
* ...maintaining a history of stuff it's done in the session (the "context")
* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.
Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
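The bullets above can be sketched in a few dozen lines. This is a toy version, with the model call stubbed out (`fake_llm`) so it runs offline; in practice you would call a real chat API there and parse its tool-call output. All names here are illustrative, not any vendor's actual API.

```python
import subprocess

def run_bash(command):
    """The one tool this toy agent has: run a shell command."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def fake_llm(messages):
    """Stand-in for a real model: issues one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "bash", "args": {"command": "echo hello from the tool"}}
    return {"answer": "Done: the command printed a greeting."}

def agent(user_prompt, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]  # the "context"
    for _ in range(max_steps):                             # the loop
        reply = fake_llm(messages)
        if "tool" in reply:                                # model requested a tool
            output = run_bash(reply["args"]["command"])
            messages.append({"role": "tool", "content": output})
        else:                                              # model is done
            return reply["answer"]
    return "step limit reached"

print(agent("Say hello via the shell."))
```

Swap `fake_llm` for a real API call and add a couple more tools (read file, write file) and you have the whole pattern.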
Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:
1) Skills are just a bunch of files with instructions for the LLM in them.
2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.
3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.
4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.
And that's more or less it, glossing over a lot of things that are important but not foundational, like granular tool permissions.
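The four steps above can be sketched as a discovery pass over a skills directory. The on-disk layout assumed here (`skills/<name>/SKILL.md` with a `description:` line) is an illustration of the idea, not Claude's exact format.

```python
import pathlib
import tempfile

def discover_skills(skills_dir):
    """Step 2: scan for skill files and collect their short descriptions."""
    skills = {}
    for skill_file in pathlib.Path(skills_dir).glob("*/SKILL.md"):
        for line in skill_file.read_text().splitlines():
            if line.startswith("description:"):
                skills[skill_file.parent.name] = line.split(":", 1)[1].strip()
    return skills

def build_system_prompt(skills):
    """Steps 2 and 3: put descriptions in the context and say how to use one."""
    lines = ["You have these skills. To use one, read its SKILL.md "
             "with the bash tool and follow it:"]
    for name, desc in sorted(skills.items()):
        lines.append(f"- {name}: {desc}")
    return "\n".join(lines)

# Demo with a hypothetical skill in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    skill_dir = pathlib.Path(d) / "pdf-filler"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text("description: Fill out PDF forms.\n\nSteps: ...")
    prompt = build_system_prompt(discover_skills(d))

print(prompt)
```

Step 4 then happens for free: the model sees the prompt, runs something like `cat skills/pdf-filler/SKILL.md` through its bash tool, and follows the instructions it reads.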
This works really well already. You can fire up something like Wispr Flow and dump what you're saying directly into Claude Code or similar, it will ignore the ums and usually figure out what you mean.
I use ChatGPT voice mode in their iPhone app for this. I walk the dog for an hour and have a loose conversation with ChatGPT through my AirPods, then at the end I tell it to turn everything we discussed into a spec I can paste into Claude Code.
Hi Alan!
I've got some assumptions regarding the upcoming big paradigm shift (and I believe it will happen sooner rather than later):
1. focus on data processing rather than imperative way of thinking (esp. functional programming)
2. abstraction over parallelism and distributed systems
3. interactive collaboration between developers
4. development accessible to a much broader audience, especially to domain experts, without sacrificing power users
In fact, the startup I work at aims in exactly this direction. We have created Luna ( http://www.luna-lang.org ), a purely functional visual<->textual language.
By visual<->textual I mean that you can always switch between the code and the graph representations, in either direction.
When the user (as opposed to you, the author of the code) makes changes to the UI, those changes arrive as events. In the respective event handlers, just update a big state object with the identity of the node and how it changed. When the page loads, simply read from that state object to recreate each node exactly as it was when the user modified it. That's it. Nothing more.
That's how I wrote a full OS GUI for the browser. It does far more than an SPA framework, does it substantially faster, and requires only a tiny fraction of the code.
Yes, yes, here come all the excuses, like: I can't code, or it won't work in a team because other people can't code. Stop with the foolishness. Provide some governance, preferably fully automated, around event handling and you have completely obsoleted your giant nightmare framework.
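The pattern described above (handlers write per-node changes into one state object; load replays it) can be shown framework-free. This sketch is in Python rather than browser JavaScript, and every name in it is a hypothetical illustration of the idea, not the commenter's actual code.

```python
state = {}  # node identity -> the properties the user changed

def on_user_event(node_id, prop, value):
    """An event handler: record what changed, and on which node."""
    state.setdefault(node_id, {})[prop] = value

def restore(create_node):
    """On page load: recreate every node exactly as the user left it."""
    return {node_id: create_node(node_id, props)
            for node_id, props in state.items()}

# Simulate the user dragging a window and collapsing a panel.
on_user_event("window-1", "position", (120, 80))
on_user_event("panel-3", "collapsed", True)

# "Reload": rebuild the nodes purely from the recorded state.
nodes = restore(lambda node_id, props: {"id": node_id, **props})
print(nodes["window-1"])  # {'id': 'window-1', 'position': (120, 80)}
```

The design point is that the state object is the single source of truth; the UI is always a pure function of it, so no framework-level diffing is needed.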
Interests must align. People must know a common jargon that conveys high-density information faster. I would like to have a positive ROI on the conversations that I have.
In computing and engineering, almost all "new" developments are actually variations or recombinations of things that have already been done.
BUT it's not just computing! That is also true of all the other arts, crafts, literature, music, ....
There is nothing wrong with this! It can be fun and rewarding to experiment this way, and some of the variations and recombinations are sufficiently novel and/or valuable that they are taken up by others. You just have to try the experiment and see where it goes.
An interesting form of variation is to remove something, or revert to something that was almost abandoned years ago. There have been a number of interesting projects recently that do this - the uxn virtual machine, the Gemini internet protocol, even the SerenityOS operating system. All three of these projects have attracted many developers besides their original authors.
Another form of variation is scale. Pick something insanely ambitious -- that seems to require a huge corporation -- and scale it down so you can do it all by yourself. The marginalia search engine is created and operated by one guy and runs on a computer in his living room.
This highlights an incentive issue and a paradigm issue in the blockchain space. There is a strong incentive to get new protocols up and running, and even though users are technically responsible because they can audit the code, most users, of course, are not qualified to do so, and stand to make more from exploiting any vulnerabilities they find than from pointing them out.
The paradigm issue is how poorly suited the EVM is as a smart contract target. It is too hard to manage the complexity of bytecode with memory managed on chain and in contract code. When small errors have huge consequences and there are no second chances, the EVM is, and deserves to be pointed out as, one of the worst standards Ethereum has brought to crypto.
When it comes to high-stakes programming, especially in a space where new programs are written quickly and often, it's almost objectively obvious that functional styles with more guard rails are the better choice. Nothing running on chain needs an infinite loop or many of the other fancy features one typically expects from a programming language. Restrictions and readability in smart contract code matter more to computational chains than seat belts do to automobiles, but the network effect is slowing the space down.
Edit: Some people have pointed out this was an issue with Solana contracts, which run on Rust, so I was wrong to use this as an example for half of my point, but I still believe the point stands. Even Rust, IMO, is not tied down enough for contract code, but the fact that this can happen in Rust, which is loads better than the EVM for bug catching, proves the point a bit more if anything.
Scenario a) No authentication-y bits in the url. User goes to the url, site checks if the user is already logged in via a cookie. If so, does the POST request.
Typically in this case the URLs are easily guessable, so that's an easy CSRF. In principle they could be made per-user (some sort of HMAC over user+timestamp). In practice, I think it's fairly common for websites not to do that in this sort of situation.
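The per-user scheme mentioned above (an HMAC over user+timestamp) takes only a few lines with the standard library. The URL layout and parameter names below are made up for illustration; only the signing/verification pattern is the point.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret-key"  # in practice: load from config, never hardcode

def sign_action_url(user_id, action, expires):
    """Make the action URL unguessable by signing user+action+expiry."""
    msg = f"{user_id}:{action}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"/do/{action}?user={user_id}&expires={expires}&sig={sig}"

def verify(user_id, action, expires, sig, now=None):
    """Recompute the signature server-side and check it hasn't expired."""
    now = int(time.time()) if now is None else now
    msg = f"{user_id}:{action}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and now < expires

expires = int(time.time()) + 3600
print(sign_action_url("alice", "unsubscribe", expires))
```

An attacker who doesn't hold the server secret can't forge a valid `sig` for another user, which closes the guessable-URL CSRF (the cookie-based POST in scenario a still needs an ordinary CSRF token).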
Scenario b) The URL contains some sort of nonce, or a signed assertion, that automatically logs the user in. I think this is fairly common in email URLs, because web developers want people to be able to take an action just by clicking the link in the email, even if the device they read email on is not the same device they normally use to interact with the application. I also think this is the scenario that applies to this discussion, since it started around the issues caused by Google auto-following links, and in scenario a), Google auto-following links would not be an issue.
Of course, in principle, it's possible that the authentication bits in the URL authenticate only that action and don't generally log the user in. In practice I think it's really common to just log the user in generally, since most sites want people to stay on the site once they do anything, not immediately exit after the email action is completed.
These URLs are typically not guessable. A small percentage of users do tend to scatter them across the internet (e.g. https://urlscan.io/), but ignoring that, this is a login CSRF. That is, an attacker can generate their own such URL and force their victim to log into the attacker's account.
The impact of a login CSRF tends to be very application-specific. Sometimes it's kind of minor, but I have definitely seen cases on major websites where a login CSRF can lead to a full takeover of the victim's account.
> Actually if the page can open a popup I think it could also execute JavaScript within the context of the page and perform the user interaction right there.
This is only true if the pop-up has the same origin as the site that opened it. Otherwise there is just a very limited API (basically postMessage(); also, both sides can change the current URL of the other side, which is a bit nuts). There is also now a new HTTP header, Cross-Origin-Opener-Policy, that affects this.