You gain experience, and get your interactions with other agencies optimised, by dealing with them yourself. If the AI you rely on fails, you are dead in the water. And I'm speaking as a fairly resilient 50-year-old with plenty of hands-on experience, but I'm concerned for the next generation. I know generational concern has existed since the invention of writing, and the world hasn't fallen apart, so what do I know? :)
The Jupiter Ace was unreal, but only from a computer science perspective. You had to know a lot to program in Forth, the fundamental language of that white but Spectrum-looking dish of a PC, in spite of a manual that read like HGTTG. Critically, it didn't reward you at the start of your programming journey the way Logo or BASIC did, and it didn't have the games of the ZX Spectrum. I knew a person who tried to import and sell them in Australia. When I was young, he gave me one for free after the business failed. RIP IM, and thanks for the unit!
A CorelDraw version from the 1990s I used had an honest progress bar. Sometimes it went backwards, but by the time it got to the end, it was truly finished.
A question:
Does anyone know how well AI does at generating performant SQL against years-old production databases? In terms of speed of execution, locking, accuracy, etc.?
It's very hit or miss. Claude does OK-ish, others less so. You have to explicitly state the DB and version, otherwise it will assume you have access to functions / features that may not exist. Even then, they'll often make subtle mistakes that you're not going to catch unless you already have good knowledge of your RDBMS. For example, at my work we're currently doing query review, and devs have created an AI recommendation script to aid in this. It recommended that we create a composite index on something like `(user_id, id)` for a query. We have MySQL. If you don't know (the AI didn't, clearly), MySQL implicitly has a copy of the PK in every secondary index, so while it would quite happily make that index for you, it would end up being `(user_id, id, id)` and would thus be 3x the size it needed to be.
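To make the parent's point concrete, here is a hedged sketch of the situation described. The table and index names are hypothetical (not from the original comment), and the exact storage behaviour can vary by MySQL version:

```sql
-- Hypothetical InnoDB table. In InnoDB, every secondary index
-- implicitly carries a copy of the primary key.
CREATE TABLE orders (
    id      BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL
) ENGINE=InnoDB;

-- What the AI recommended: `id` is the PK and gets appended anyway,
-- so entries effectively behave like (user_id, id, id) -- redundant.
CREATE INDEX idx_user_redundant ON orders (user_id, id);

-- What was actually needed: the implicit PK suffix already makes
-- this index sort by (user_id, id).
CREATE INDEX idx_user ON orders (user_id);
```

MySQL will happily create either index without a warning, which is exactly why this class of mistake slips past anyone who doesn't already know their RDBMS internals.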
Can someone please answer these questions because I still think AI stinks of a false promise of determinable accuracy:
Do you need an expert to verify if the answer from AI is correct?
How is it time saved refining prompts instead of SQL? Is it typing time?
How can you know the results are correct if you aren't able to do it yourself?
Why should a junior (sorcerer's apprentice) be trusted in charge of using AI?
No matter the domain, from art to code to business rules, you still need an expert to verify the results.
Would they (and their company) be in a better place designing a solution to the problem themselves, knowing their own assumptions? Or just checking off a list of happy-path results without FULL knowledge of the underlying design?
This is not just a change from hand-crafting to line-production, it's a change from deterministic problem-solving to near-enough-is-good-enough, sold as the new truth in problem-solving. It smells wrong.
We recently did the first speed run where Louie.ai beat teams of professional cybersecurity analysts in an open competition, Splunk's annual Boss of the SOC. Think writing queries, wrangling Python, and scanning through 100+ log sources to answer frustratingly sloppy database questions:
- We got 100% correct on the basic stuff in the first half, where most people take 5-15 minutes per question, and 50% correct in the second half, where most people take 15-45+ minutes per question and most teams time out.
- ... Louie does a median 2-3 min per question irrespective of the expected difficulty, so about 10X faster than a team of 5 (wall clock), and 30X less work (person-hours). Louie isn't burnt out at the end ;-)
- This doesn't happen out-of-the-box with frontier models, including fancy reasoning ones. Likewise, letting the typical tool here burn tokens until it finds an answer would cost more than a new hire, which is why we measure it as a speed run vs a deceptively uncapped auto-solve count.
- The frontier models DO have good intuition, understand many errors, and, for popular languages, DO generate good text2query. We are generally happy with OpenAI, for example, so it's more about how Louie and the operator use it.
- We found we had to add in key context and strategies. You see a bit of this in Claude Code and Cursor, but those are quite generic, so they would have failed here as well. Intuitively, in coding you want to use types/lint/tests; database work has similar but different issues. And there is a lot more that varies by domain, in my experience, so expecting tools to just work is unlikely to pan out. Baking in domain-relevant patterns that you can extend is key, and so are learning loops.
This is our first attempt at the speed run. I expect Louie to improve: my answers represent the current floor, not the ceiling of where things are (dizzyingly) going. Happy to answer any other q's where data might help!
Splunk Boss of the SOC is a realistic test; it is one of the best cyber ranges. Think effectively 30+ hours of tricky querying across 100+ real log source types (tables) with a variety of recorded cyber incidents - OS logs, AWS logs, alerting systems, etc. As I mentioned, the AI has to seriously look at the data too, typically several queries deep for the right answer, and down a lot of rabbit holes before then - answers can't just skate by on schema. I recommend folks look at the questions and decide for themselves what this signifies. I personally gained a lot of respect for the team that created the competition.
The speed run formulation for all those same questions helps measure real-world quality vs cost trade-offs. I don't find uncapped solve rates to be relevant to most scenarios. If we allowed infinite time, yes we would have scored even higher... But if our users also ran it that way, it would bankrupt them.
If anyone is in the industry, there are surprisingly few open tests here. That is another part of why we did BOTS. IMO sunlight here brings progress, and I would love to chat with others on doing more open benchmarks!
> Do you need an expert to verify if the answer from AI is correct?
If the underlying data has a quality issue that is not obvious to a human, the AI will miss it too. Otherwise, the AI will correct it for you.
But I would argue that it's highly probable that your expert would have missed it too...
So, no, it's not a silver bullet yet, and the AI model often lacks enough context that humans have, and the capacity to take a step back.
> How is it time saved refining prompts instead of SQL?
I wouldn't call that "prompting". It's just a chat. I'm at least ~10x faster (for reasonably complex & interesting queries).
There isn't one perfect solution to SQL queries against complex systems.
A sudoku has one solution.
A reasonably well-optimised SQL solution is what good use of SQL tries to achieve. And it can be the difference between a total lock-up and a fast-running script that keeps the rest of a complex system from falling over.
The number of solutions doesn't matter though. You can easily design a sudoku game that has multiple solutions, but it's still easier to verify a given solution than to solve it from scratch.
It's not even about whether or not the number of solutions is limited. A math problem can have unlimited amount of proofs (if we allow arbitrarily long proofs), but it's still easier to verify one than to come up with one.
Of course writing SQL isn't necessarily comparable to sudoku. But the difference, in the context of verifiability, is definitely not "SQL has no single solution."
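The verify-vs-solve asymmetry is easy to see in code. A minimal sketch (the grid and checking logic below are standard sudoku fare, not from this thread): checking a proposed solution is a handful of set comparisons, while producing one from scratch needs search.

```python
def is_valid_solution(grid):
    """Check a completed 9x9 sudoku: every row, column, and 3x3 box
    must contain the digits 1-9 exactly once."""
    digits = set(range(1, 10))
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(set(unit) == digits for unit in grid + cols + boxes)

# A known-good grid passes; swapping any two cells breaks a column or box.
solved = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 1, 7, 9],
]
print(is_valid_solution(solved))  # → True
```

Verification runs in linear time over the grid; solving is the hard part. The same shape of argument is what the parent applies to SQL review.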
I've recently started asking the free version of chat-gpt questions on how I might do various things, and it's working great for me - but also my questions come from a POV of having existing "domain knowledge".
So for example, I was mucking around with ffmpeg and mkv files, and instead of searching for the answer to my thought-bubble (which I doubt would have been "quick" or "productive" on google), I straight up asked it what I wanted to know;
> are there any features for mkv files like what ffmpeg does when making mp4 files with the option `--movflags faststart`?
And it gave me a great answer!
(...the answer happened to be based upon our prior conversation of av1 encoding, and so it told me about increasing the I-frame frequency).
Another example from today - I was trying to build mp4v2 but ran into drama because I don't want to take the easy road and install all the programs needed to "build" (I've taken to doing my hobby-coding as if I'm on a corporate PC without admin rights (Windows)). I also don't know about "cmake" and stuff, but I went and downloaded the portable zip and moved the exe to my `%user-path%/tools/` folder, but it gave an error. I did a quick search, but the Google results were grim, so I went to chat-gpt. I said;
> I'm trying to build this project off github, but I don't have cmake installed because I can't, so I'm using a portable version. It's giving me this error though: [*error*]
And the aforementioned error was pretty generic, but chat-gpt still gave a fantastic response along the lines of;
> Ok, first off, you must not have all the files that cmake.exe needs in the same folder, so to fix do ..[stuff, including explicit powershell commands to set PATH variables, as I had told it I was using powershell before].
> And once cmake is fixed, you still need [this and that].
> For [this], and because you want portable, here's how to setup Ninja [...]
> For [that], and even though you said you dont want to install things, you might consider ..[MSVC instructions].
> If not, you can ..[mingw-w64 instructions].
[Going to give myself a self-reply here, but what-ev's. This is how I talk to chat-gpt, FYI]... So I happened to be shopping for a cheap used car recently, and we have these ~15 year old Ford SUV's in Aus that are comfortable, but heavy and thirsty. Also, they come in AWD and RWD versions. So I had a thought bubble about using an AWD "gearbox" in a RWD vehicle whilst connecting an electric motor to the AWD front "output", so that it could work as an assist. Here was my first question to chat-gpt about it;
> I'm wondering if it would be beneficial to add an electric-assist motor to an existing petrol vehicle. There are some 2010-era SUVs that have relatively uneconomical petrol engines, which may be good candidates. That is because some of them are RWD, whilst some are AWD. The AWD gearbox and transfer case could be fitted to the RWD, leaving the transfer's front "output" unconnected. Could an electric motor then be connected to this shaft, hence making it an input?
It gave a decent answer, but it was focused on the "front diff" and "front driveshaft" and stuff like that. It hadn't quite grasped what I was implying, although it knew what it was talking about! It brought up various things that I knew were relevant (the "domain knowledge" aspect), so I brought some of those things in my reply (like about the viscous coupling and torque split);
> I mentioned the AWD gearbox+transfer into a RWD-only vehicle, thus keeping it RWD only. Thus both petrol+electric would be "driving" at the same time, but I imagine the electric would reduce the effort required from the petrol. The transfer case is a simple "differential" type, without any control or viscous couplings or anything - just simple gear ratio differences that normally torque-split 35% to the front and 65% to the rear. So I imagine the open differential would let the 2 different input speeds "combine" into 1 output?
That was enough to "fix" its answer (see below). And IMO, it was a good answer!
I'm posting this because I read a thread on here yesterday/2-days-ago about people struggling with their AI's context/conversation getting "poisoned" (their word). So whilst I don't use AI that much, I also haven't had issues with it, and maybe that's because of the way I converse with it?
Not sure if anyone here saw the movie Clueless, but a great quote was, "That guy is such a Monet. From a distance he looks great, but up close he's a real mess."
I've seen the assertion made that we can statistically measure how many people died due to (for example) a heat wave, but we can't say for sure which ones.
I'd imagine something similar applies here. You'd have some number of deaths specifically attributable to loss of power, plus countless other deaths caused or prevented in non-obvious ways. This might be visible at a high level as a statistical outlier in the total number of deaths during the time period of an outage.
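That "visible at a high level" estimate is usually done as excess mortality: observed deaths in the period minus a baseline expectation. A minimal sketch, where every number is invented purely for illustration:

```python
# Hedged sketch of the excess-mortality idea; all figures are made up,
# not taken from any real outage or heat wave.
baseline_weeks = [980, 1010, 995, 1005, 990]  # deaths in comparable past weeks
observed = 1180                               # deaths during the outage week

expected = sum(baseline_weeks) / len(baseline_weeks)
excess = observed - expected
print(f"expected ~{expected:.0f}, statistical excess ~{excess:.0f} deaths")
```

The excess figure attributes a count to the event at the population level without ever identifying which individual deaths it caused, which is exactly the asymmetry the parent describes.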
If you remove yourself from a group, how will they change their minds without a dissenting opinion? I had to do it myself eventually, for my own sanity, but I believe this is still a real problem I am no longer addressing among my loved ones.
In my case, my goal isn't to change anyone's mind. It's to preserve sanity -- I can't in good faith "pretend" to get along and have normal conversations when people are actively engaging in behavior that directly harms myself and others.
People proudly voting for parties and policies that demonise trans people, of which I know many. I cannot be your friend in good conscience if you're willing to destroy the lives of my other friends.
No it isn’t. When people see the anti trans party winning elections they see that as permission to bully trans people. The vote directly leads to abuse.
Voting for Trump tells everyone in the country that you don't mind if trans people are abused. This creates a culture that is uninviting even if no one acts poorly beyond the voting. You don't have to physically abuse someone for your actions to have direct consequences.
Being told that you have to follow the same rules as everyone else for e.g. spaces designated to be used solely by the opposite sex, doesn't seem so bad.
I don't believe you're asking this question in good faith, but there are many, many attempts at erasing them from public existence: https://translegislation.com/
I don't think they're arguing in a good faith with you.
“Never believe that anti-Semites are completely unaware of the absurdity of their replies. They know that their remarks are frivolous, open to challenge. But they are amusing themselves, for it is their adversary who is obliged to use words responsibly, since he believes in words. The anti-Semites have the right to play. They even like to play with discourse for, by giving ridiculous reasons, they discredit the seriousness of their interlocutors. They delight in acting in bad faith, since they seek not to persuade by sound argument but to intimidate and disconcert. If you press them too closely, they will abruptly fall silent, loftily indicating by some phrase that the time for argument is past.”
Doesn't this change make it more historically accurate? In 1969, the year of the Stonewall uprising, the "TQ+" hadn't been invented yet as a cultural concept. The Stonewall Inn was a gay bar and was being targeted for that reason.
Interesting to see the difference and I agree that's an inaccurate edit. For historical accuracy it should describe Zazu Nova as a gay man who was also a transvestite or drag queen.
You do realize that "gay man", "transvestite", "drag queen", and "trans woman" are all different things right?
None of them implies the others. And using any term besides trans woman would be disingenuous, as trans people existed before 1969, with that exact nomenclature already in use. Just because the letters might not have been attached to an "LGBT" title, neither T nor Q is new. Only their increased acceptance and visibility is.
And deleting references to those is, as you can see, seen as an obvious attempt to walk back on that public perception and acceptance.
While (as is often the case) a very summarized version of the history can be found on the Wikipedia page https://en.wikipedia.org/wiki/Transgender_history , the sources should lead you to more detailed info, if you do care about learning about the historical accuracy.
The Stonewall Inn was a gay bar so we know that Zazu Nova must have been there due to being a gay man. As the previous iteration of the website describes Nova as "queen", "she" and "transgender woman", this means that in 1960s terminology Nova would almost certainly have been understood to be a transvestite, possibly a drag queen.
Sources on the web refer to Nova having involvement with the Street Transvestites Action Revolutionaries group, which fits with that description also.
This is an oversimplified strawman argument. Biological sex is a complex subject. The cultural understanding of sex is complex. If I as a man take my 2-year-old daughter to the men's room, is that a bad thing? (For the record, I don't have any children.)
I don't think anyone is arguing that you should be barred from taking your hypothetical two-year-old daughter into the men's bathroom if the need arises. That's really not the issue.
but I thought "Being told that you have to follow the same rules as everyone else for e.g. spaces designated to be used solely by the opposite sex, doesn't seem so bad."?