I have a slight fascination with sweeteners. About five years ago I imported a kilo of "Neotame" sweetener from a chem factory in Shanghai. It was claimed to be 10,000-12,000 times sweeter than sugar. It's a white powder and came in a metal can with a crimped lid and typically plain chemical labeling. Supposedly it is FDA-approved and a distant derivative of aspartame.
US customs held it for two weeks before sending it on to Colorado with no explanation. When it arrived, the box was covered in "inspected" tape and they had put the canister in a clear plastic bag. The crimped lid looked like a rottweiler had chewed it open and white powder was all over the inside of the bag. I unwisely opened this in my kitchen without the respirator the MSDS advised, having read the MSDS only after the fact (I am not a smart man).
Despite careful handling of the bag, the powder is so fine that a small cloud of it erupted in front of me and a hazy layer of the stuff settled over the kitchen. Eyes burning and mildly choking from inhaling the cloud, I instantly marveled at how unbelievably sweet the air tasted, and it was delicious. For several hours I could still taste it on my lips. The poor customs inspector will have had a lasting memory of that container, I'm pretty sure.
Even after a thorough wipe-down, to this day I encounter items in my kitchen with visually imperceptible amounts of residue. After touching it and getting even microscopic quantities of the stuff on a utensil or cup, bowl, plate, whatever, it adds an intense element of sweetness to the food being prepared, sometimes to our delight. I still have more than 900g even after giving away multiple baggies to friends and family (with proper safety precautions).
We have been hooked on it since that first encounter. I keep a 100mL bottle of solution in the fridge which is used to fill smaller dropper bottles. I've prepared that 100mL bottle three times over five years, and that works out to about 12g of personal (somewhat heavy) usage for two people in that time. Probably nowhere near the LD50.
I carry a tiny 30mL dropper bottle of the solution for sweetening the nasty office coffee and anything else as appropriate. Four drops to a normal cup of coffee. We sweeten home-carbonated beverages, oatmeal, baked goods (it is heat stable), use it in marinades, and countless other applications.
I don't know if it's safe. The actual quantity used is so incredibly tiny that it seems irrelevant. I'd sweeten my coffee with polonium-210 if it could be done in Neotame-like quantities. Between this, a salt shaker loaded with MSG and a Darwin fish on my car, I'm doomed anyway.
Lots of people make the mistake of thinking there are only two directions you can go to improve performance: high or wide.
High - throw hardware at the problem, on a single machine
Wide - add more machines
There's a third direction you can go, I call it "going deep". Today's programs run on software stacks so high and so abstract that we're just now getting around to redeveloping (again for like the 3rd or 4th time) software that performs about as well as software we had around in the 1990s and early 2000s.
Going deep means stripping away this nonsense and getting down closer to the metal: using smart algorithms, planning and working through the problem, and seeing if you can size the solution to run on one machine as-is. Modern CPUs, memory and disk (especially SSDs) are unbelievably fast compared to what we had at the turn of the millennium, yet we treat them like spare capacity to soak up ever lazier abstractions. We keep thinking that completing the task means successfully scaling out a complex network of compute nodes, but completing the task actually means processing the data and getting meaningful results in a reasonable amount of time.
This isn't really hard to do (though it can be tedious), and it doesn't mean writing system-level C or ASM code. Just see what you can do on a single medium-specc'd consumer machine first, then scale up or out if you really need to. It turns out a great many problems don't need scalable compute clusters at all. And in fact, the time you'd spend setting one up and building the coordination code (which introduces yet more layers that soak up performance) would probably be better spent just solving the problem on a single machine.
Bonus: if your problem does get too big for a single machine (it happens), there may be trivial parallelism in it you can exploit, and going wide at that point will probably outperform your original design anyway, with coordination code that is much simpler and less performance-degrading. Or you can go high and throw a bigger machine at it, getting more gains with zero planning or effort beyond copying your code and data to the new machine and plugging it in.
Oh yeah, many of us, especially experienced people or those with lots of school time, are taught to overgeneralize our approaches. It turns out many big compute problems are just big one-off problems and don't need a generalized approach. Survey your data, plan around it, and then write your solution as a specialized approach just for the problem you have. It'll likely run much faster this way.
Some anecdotes:
- I wrote an NLP tool that, on a single spare desktop with no exotic hardware, was 30x faster than a distributed system of 6 high-end compute nodes doing a comparable task. That group eventually took my solution and went high with it: it now runs on a big multi-core system with the fastest memory and SSD they could procure, and it's about 5 times faster than my original code. My code was in Perl; the distributed system it competed against was C++. The difference was the algorithm I was using, and not overgeneralizing the problem. Because my code could complete their task in 12 hours instead of 2 weeks, they could iterate every day. A 14:1 iteration opportunity made a huge difference in their workflow, and within weeks they were further ahead than they had been after 2 years of sustained work. Later they ported my code to C++ and realized even further gains. They've never had to even think about distributed systems. As hardware gets faster, they simply copy the code and data over, realize the gains, and it runs faster than they can analyze the results.
Every vendor that's come in since has been forced to demonstrate that their distributed solution is faster than the one already running in house. Nobody's been able to demonstrate a faster system to date. It has saved them literally tens of millions of dollars in hardware, facility and staffing costs over the last half-decade.
- Another group had a large graph, about 4 petabytes in size, that they needed to run a specific kind of analysis on. They had a massive distributed system handling the graph. The analysis they wanted was O(n^2): each node potentially needed to be compared against every other node. So they naively set up some code to do the task, with all kinds of exotic data stores and specialized indexes behind it. Huge amounts of data were flying around their network trying to run this task, but it was slower than expected.
An analysis of the problem showed that if you segmented the data in some fairly simple ways, you could skip all the drama and do each slice of the task without much fuss on a single desktop. O(n^2) isn't terrible if n is small, and paying a little segmentation overhead on top of it isn't much worse when the slices are independent and you can spread them out easily.
I had a 4-year-old consumer-level Dell desktop to use, so I wrote the code and ran the task. Using not much more than Perl and SQLite I was able to compute a large-ish slice of a few GB in a couple of hours. Some analysis of my code showed I could actually perform the analysis on insert into the DB, and that each slice was small enough to fit into memory, so I pointed SQLite at :memory: and finished in 30 minutes or so (a rough sketch of the pattern follows this anecdote). That problem solved, the rest was pretty embarrassingly parallel, and in short order we had a dozen of these spare desktops running the same code on different data slices, finishing the task two orders of magnitude faster than their previous approach. Some more coordinating code and the system was fully automated. A single budget machine was now theoretically capable of doing the entire task in 2 months of sustained compute time. A dozen budget machines finished it all in a week and a half. Their original estimate for the old distributed approach was 6-8 months with a warehouse full of machines, most of which would have been computing things that amounted to a bunch of nothing.
To my knowledge they still use a version of the original Perl code with SQLite running in memory, without complaint. They could speed things up more with a better in-memory system and a quick code port, but why bother? It completes the task faster than they can feed it data, and the data set is only growing a few GB a day. Easily enough for a single machine to handle.
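For the curious, here's roughly what that pattern looks like in Perl. This is a hypothetical sketch, not the original code: the table layout, the tab-separated input and the "matching feature" self-join are all invented for illustration. The idea is just to load one slice into an in-memory SQLite database with DBI and let SQL do the pairwise work.

    #!/usr/bin/perl
    # Hypothetical sketch: load one data slice into an in-memory SQLite DB,
    # then do the O(n^2) comparison as a self-join. The schema and the idea
    # of a comparable "feature" column are illustrative assumptions.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                           { RaiseError => 1, AutoCommit => 0 });

    $dbh->do("CREATE TABLE nodes (id INTEGER PRIMARY KEY, feature TEXT)");

    my $ins = $dbh->prepare("INSERT INTO nodes (id, feature) VALUES (?, ?)");
    while (my $line = <STDIN>) {    # one slice of the data on stdin
        chomp $line;
        my ($id, $feature) = split /\t/, $line;
        $ins->execute($id, $feature);
    }
    $dbh->commit;

    # n is small per slice, so a quadratic self-join is perfectly fine here.
    my $pairs = $dbh->selectall_arrayref(
        "SELECT a.id, b.id
           FROM nodes a JOIN nodes b ON a.id < b.id
          WHERE a.feature = b.feature");
    print scalar(@$pairs), " matching pairs in this slice\n";

Run one of these per slice on a pile of spare desktops and the "coordination code" is little more than deciding which slice goes to which box.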
- Another group was struggling with handling a large semantic graph and performing a specific kind of query on the graph while walking it. It was ~100 million entities, but they needed interactive-speed query returns. They had built some kind of distributed Titan cluster (obviously a premature optimization).
The solution: convert the graph to an adjacency matrix, stuff it into a PostgreSQL table, build some indexes, and rework the problem as a clever, dynamically generated SQL query (again, in Perl). Now they were seeing 0.01-second returns, fast enough for interactivity (a rough sketch of the idea follows). Bonus: the dataset at 100M rows was tiny, only about 5GB; with PostgreSQL's maximum table size of 32TB and disk space cheap, they were set for the conceivable future. Administration was now easy, performance could be improved trivially with an SSD and some RAM, and they could scale to a point where dealing with Titan was far in their future.
Plus, there's a chance PostgreSQL will start supporting proper scalability soon, pushing that day even further off.
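To give a sense of the shape of that solution, here's a rough, hypothetical sketch. The edge-table schema, index names and the k-hop walk query are my assumptions for illustration, not the actual system; the point is that a dynamically generated chain of self-joins over an indexed edge table gets you graph walks at interactive speed.

    #!/usr/bin/perl
    # Hypothetical sketch: store the graph as one row per edge in PostgreSQL
    # and generate a k-hop walk query on the fly. Adjust the DSN/credentials
    # for a real database; table and column names here are invented.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:Pg:dbname=graph", "", "", { RaiseError => 1 });

    # One row per edge; btree indexes on both ends keep hop lookups cheap.
    $dbh->do("CREATE TABLE IF NOT EXISTS edges (src BIGINT NOT NULL, dst BIGINT NOT NULL)");
    $dbh->do("CREATE INDEX IF NOT EXISTS edges_src ON edges (src)");
    $dbh->do("CREATE INDEX IF NOT EXISTS edges_dst ON edges (dst)");

    # Build a k-hop walk as a chain of self-joins, generated dynamically.
    sub k_hop_sql {
        my ($k) = @_;
        my $select = "SELECT e1.src";
        my $from   = " FROM edges e1";
        for my $i (1 .. $k) {
            $select .= ", e$i.dst AS hop$i";
            $from   .= " JOIN edges e$i ON e$i.src = e" . ($i - 1) . ".dst" if $i > 1;
        }
        return "$select$from WHERE e1.src = ?";
    }

    my $rows = $dbh->selectall_arrayref(k_hop_sql(3), undef, 42);
    print scalar(@$rows), " three-hop paths from node 42\n";

With indexes on both src and dst, each hop is just an index lookup, which is how a ~5GB table can answer this kind of query in hundredths of a second.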
- Finally, an e-commerce company I worked with was building a dashboard reporting system that ran every night, took all of their sales data, and generated various kinds of reports: by SKU, by a certain number of days in the past, and so on. It was taking 10 hours to run on a 4-machine cluster.
A dive into the code showed that they were storing the data in a deeply nested data structure for computation, and that building and destroying that structure as the computation progressed was taking all the time. Furthermore, some metrics on the reports showed that the most expensive-to-compute reports were simply not being used, or were viewed only once a quarter or once a year around the fiscal year end. And of the cheap-to-compute reports, millions of which were being pre-computed, only a small percentage were actually being viewed.
The data structure was built on dictionaries pointing to other dictionaries and so on. A quick swap to arrays pointing to arrays (plus some dictionary<->index conversion functions so we didn't blow up the internal logic) transformed the entire thing. Instead of 10 hours, it ran in about 30 minutes, on a single machine. Where memory had been running out and crashing the system, memory now never went above 20% utilization. It turns out allocating and deallocating RAM actually takes time, and switching to a smaller, simpler data structure makes things faster.
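To make the swap concrete, here's a tiny hypothetical sketch (in Perl, since that's what I reach for; the SKUs, dates and function name are invented). The shape of the change is the whole point: a hash-of-hashes becomes a preallocated array-of-arrays, with small key-to-index lookup tables so the surrounding logic barely changes.

    #!/usr/bin/perl
    # Hypothetical sketch of the swap: a hash-of-hashes replaced by flat,
    # preallocated arrays, with key<->index lookup tables so calling code
    # barely changes. SKUs and dates are invented for illustration.
    use strict;
    use warnings;

    my @skus = ("SKU-1001", "SKU-1002", "SKU-1003");
    my @days = ("2015-01-01", "2015-01-02");

    # Build key -> index maps once, up front.
    my %sku_idx; @sku_idx{@skus} = 0 .. $#skus;
    my %day_idx; @day_idx{@days} = 0 .. $#days;

    # Before (conceptually): $sales{$sku}{$day} += $amount; which allocates
    # and tears down a pile of tiny hashes. After: one flat array per SKU.
    my @sales;
    push @sales, [ (0) x @days ] for @skus;

    sub add_sale {
        my ($sku, $day, $amount) = @_;
        $sales[ $sku_idx{$sku} ][ $day_idx{$day} ] += $amount;
    }

    add_sale("SKU-1002", "2015-01-02", 19.99);
    printf "%s on %s: %.2f\n", "SKU-1002", "2015-01-02",
           $sales[ $sku_idx{"SKU-1002"} ][ $day_idx{"2015-01-02"} ];

The allocation churn disappears because the arrays are built once and reused, which is exactly where the 10 hours were going.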
We changed some of the cheap-to-compute reports from being pre-computed to being computed on demand, which further reduced what needed to run at night. And the infrequent reports were put on a quarterly and yearly schedule so they only ran right before they were needed instead of every night. This improved performance even further and, as far as I know, 10 years later, even with huge increases in data volume, they've never had to touch the code or change the ancient hardware it runs on.
Seeing these problems in retrospect, it seems insane that racks in a data center, or entire data centers, were ever seriously considered necessary to make them solvable. A single machine's worth of hardware today is almost embarrassingly powerful. Here's a machine that for $1k can break 11 TFLOPS [1]. That's insane.
It also turns out that most of our problems aren't compute speed; throwing more CPUs at a problem doesn't really improve things, because disk and memory are the real bottleneck. Why anybody would think shuttling data over a network to other nodes, where we then exacerbate every I/O problem, would improve things is beyond me. Getting data across a network and into a CPU that's sitting idle 99% of the time is not going to improve your performance.
Analyze your problem, walk through it, figure out where the bottlenecks are and fix those. It's likely you won't have to scale to many machines for most problems.
I'm almost thinking of coining a statement - Bane's rule: you don't understand a distributed computing problem until you can get it to fit on a single machine first.
I don't normally comment online as I find it a lot of work (it's taken me about 5 hours to put together this pretty badly written response), but while there's a GTK dev here on topic I don't want to miss the opportunity.
> There are way more types of impairments than visually impaired.
Thankfully I have perfect eyesight, however I am very dyslexic. I have problems with spelling, but Google's search spell checker and voice recognition have made huge differences to me in that respect over the last year or two.
But relevant to GTK - I have a terrible short-term memory. A goldfish-like short-term memory. I can barely remember a sentence when switching between windows, 5 numbers is a challenge (no distractions please), and remembering how another programmer abbreviated a variable's name is a huge distraction.
I can't organise IRL or digitally. When I'm programming, terminal windows just keep piling up, because I can't remember which terminal is doing what (is that one sitting at a prompt, or is it watching a folder with inotify?). I could click through the open terminals and disrupt my chain of thought and forget what I'm doing, or I could just open another terminal, keep working, and do a closing session every so often.
So onto how GTK is making my life harder: by making widgets more accessible to people with impaired vision/touch screens they've made them bigger. I can now fit less on screen and I have to remember more. I have to move between windows more and I have to scroll more. This really is making my life much harder, I'm honestly getting lost on my desktop because I can't remember shit and I'm constantly having to switch windows - because there just isn't room any more for two side by side windows on 720p.
And it really never used to be this way. Compare Gnome Terminal with xterm. Yes, Gnome Terminal is better: for instance you can turn off bold fonts (I and many other dyslexics have problems with bold and especially italics) and there's a nice GUI for changing the settings. But it's visually huge. I'm pretty sure (unfortunately not checked at all, I use XFCE - because it's smaller on screen) that with the default font settings you've got just about enough space to tile 120 columns of Gedit and an 80-column Gnome Terminal - but not something you'd actually write code with; by the time you've got Sublime's side bar open you're going to have to shrink the fonts. And find a window manager theme with smaller borders. And find a reduced-size widget theme (these often don't shrink out much whitespace). And after that you'll somehow still feel cramped, maybe because everything has been unnaturally shrunk.
And, really ranting now, the web is the same, I'm starting to quite regularly zoom out on websites to make them more navigable, I'm getting lost scrolling between the massive headings and huge white spaces. I can't find the damn menu buttons and by the time I've found them I've forgotten why I wanted them and have to mentally backtrack (What am I doing? Why? So why was I reading this? What was the bit that sparked something? Oh yeah! Right, back to the menu! Where did that menu go again?...).
Going back to the terminal example: if I want to tile my XFCE Terminal and a Digital Ocean tutorial I have to zoom Digital Ocean to 90%. And Digital Ocean are efficient with screen space.
I'm currently using a 720p 13" laptop display and I think my next laptop is going to have to have a 1080p screen, just so I can fit whatever I'm working on. I'll then have to override the DPI settings of course and generally have a broken display setup. I'm looking to buy hardware and run a nonstandard setup to solve the accessibility problems that GTK has introduced for me when solving accessibility problems.
As an analogy to the situation (on bad days) I feel it's like GTK/Gnome is running around shouting "EVERYTHING MUST HAVE WHEEL CHAIR RAMPS! NO STAIRS! ACCESSIBILITY HAS SPOKEN!" ignoring that sloped surfaces can be difficult/painful for those with hypermobility syndrome. And when it's brought up "I'm able bodied but my knees are getting sore" (AKA "your widgets are too big I'm clicking around too much") the response is that it's accessibility and wanting something different is denying wheel chair users. But this response is missing that an able bodied person is having problems (they're being annoyed by it because it's causing them difficulty) and if an able bodied person is having problems then there is almost certainly someone less able who is having even greater problems.
As in, able-bodied people are complaining your widgets are too big because they're finding they have a higher cognitive load from having to switch windows more often. The reduction in general accessibility (higher cognitive load) is traded away to enable specific use cases, and that's considered ok. In testing, your able-bodied users say "this is barely harder" and your specific use cases say "this is great". But what's missing is that there are other people, with other disabilities, you never tested - we don't speak out because there aren't many of us and frankly it's not very easy, both to speak out and to make a meaningful contribution to something as complex as UI ergonomics.
Basically what I'm saying is, yes it is good to have options for disabled users. But forcing the default is not a good idea and you might find you're causing problems. The default route into buildings with access ramps is still normally a set of stairs.
For example, I could request that bold and italic fonts not be used anywhere in a UI. For me this would be great, but for able-bodied users it would be a step backwards and reduce accessibility (the UI couldn't highlight important text, for instance). Able users would complain, and it would be a mild "oh, I liked it when the top command put the total memory in bold, it made it pop out a bit", to which GTK would correctly say "but dyslexics find this approach much easier". But that conversation misses that there is another category of people, with poor eyesight, who find the bold emphasis really helps them spot the important details in the UI. And because GTK has made removing bold and italic the default (in the name of accessibility), the user with poor eyesight has no option to turn it back on, and the community is already galvanised against allowing bold, because they did a UI study and it helped dyslexics.
> Count yourself lucky that you don't have an impairment (right now).
Worst of all is if the "but dyslexics find this much easier" statement has accusatory undertones of ableism (despite my quote, I don't feel audidude has done this). That might lead the poor-eyesight user to agonizingly craft a response over 5 hours that tries not to offend anyone in the galvanised community, because they don't have dyslexia and don't understand the difficulties faced - maybe they really are being ableist. Alternatively, and much more likely, they'll just keep quiet and GTK will carry on oblivious, making life worse for them.
Now, I have got to thank GTK for being really good about having highlightable text in dialog boxes so I can copy and paste into Google. Also, after spending a lot of my childhood in Learning Support classes, I have to thank GTK for all the hard work they're putting into accessibility; it isn't sexy and it annoys people, but it's damn important (even when I think it's going wrong and sometimes makes me wish I worked in construction).