For source code, I prefer the Software Heritage archive over the Internet Archive, because it archives the git history rather than the HTML UI. This particular repo was saved there[0] and was most recently visited on 24 Oct 2022; that snapshot includes two more terms updates.
This paper is 50+ years old and yet programmers are still making the same mistake of defining modules based on a flowchart, instead of modules that hide design decisions. I think that shows the youth (and perhaps lack of foundational training) of software engineering as a field.
But also the progress of the last 50 years: the KWIC index went from “a small system [that] could be produced by a good programmer within a week or two” to a tiny system that can be written in a couple lines using the standard library of Java (as I once saw a demo of).
Also: I recently wrote a blog post related to this paper and all of the initial test readers told me they had no idea what a KWIC index was and it needed more explanation. Instant full-text search really changes things.
This was an interesting quote from the blog post: "There is one silly technique I discovered to allow a LLM to improve my writing without having it do my writing: feed it the text of my mostly-complete blog post, and ask the LLM to pretend to be a cynical Hacker News commenter and write five distinct comments based on the blog post."
The easiest way to get a derailleur limit setup is to de-cable the derailleur, and back both screws all the way out.
Then the derailleur spring will bring the chain all the way to the highest gear (smallest cassette cog), and since you backed the limit screw out, the chain will want to rub against the frame stays.
Use your fingers to push the derailleur up into a lower gear, then turn the outside (high) limit screw in one full turn. Repeat this until, without your hand guiding the derailleur, the chain rests in the highest gear on its own without needing any force, but also doesn't rub against the frame stay.
Then use your hand to push the derailleur all the way to the lowest gear (largest cassette cog) and try to get the chain to fall between the cassette and the wheel spokes. If it can fall over the inside of the largest cog and into the spokes, turn the inside (low) limit screw in one full turn. Repeat until the chain can reach the lowest gear without using excessive force, but cannot fall off the cog toward the spokes.
Add some threadlocker (Loctite) to both limit screws so that you don't have to adjust them again, and you are done!
Now just reattach the rear derailleur cable and adjust the tension if you need to.
The bill rate for serious end-to-end web application development in a modern language, where the customer provides a loose spec and the outcome is a working, usable application, is somewhere in the range of $100-200/hr on the very conservative (low) end.
So, take your outside estimate of how much time the project is going to take (you think 40 hours; maybe make it 48), and pick a bill rate in the $100-200/hr range. Work that into a daily rate.
Provide a proposal under the following terms:
* You offer to build the project on a time & materials basis, 6 days at a daily rate of ((100..200)x8).
* Prior to starting the engagement, you will complete a proposal with detailed acceptance criteria (app MUST do this, app MUST NOT do that, app MAY do this, &c). You'll interview your client (gratis, because you're a professional) to work out that acceptance criteria in advance.
* Should any of the requirements or acceptance criteria change once contracts have been signed, you'll accommodate by adding additional billable days to the project (subject to whatever scheduling constraints you may have at the time).
* Should they want to lock in additional billable days in anticipation of changing requirements out from under you, they can buy those billable days on a retainer (use-it-or-lose-it) basis at a small (~10%) discount to your daily rate. This (or some clause like it) allows them to pay you extra (ie, for days they buy but you don't work) to guarantee that if they change their mind about some goofy feature, you'll be available immediately to accommodate, and they don't have to wait 6 months.
* Your contract will specify a class of critical bugs (security, things that potentially lose previously-stored data) and a class of major bugs (things that make the system unusable). For a period of N months (maybe 12) after delivery, you'll commit to fixing critical bugs within N days, for free if they take less than N hours to fix, and at your daily rate otherwise; repeat (but with more favorable terms for you) for major bugs.
* For an annual fee of 20% of the price of the contract, you'll provide maintenance, which (a) continues the critical/major bug fix window past the N months specified above and (b) provides an annual bucket of K billable days with which you will fix non-major bugs; this is provided on a retainer (use-it-or-lose-it) basis.
The idea is simple: you want to:
(a) Give the client something that looks as close as possible to a fixed-price project cost.
(b) Not actually commit to a fixed-price project cost more than you have to.
(c) Turn the downside of long-term bugfix support into an upside of recurring revenue.
This is just a sketch, you'd want to tune these terms. You'll also want to pay a lawyer ~$100-$200 to sanity check the contract. (Your contract will look like a boilerplate consulting contract, ending in a "Statement of Work" that is a series of "Exhibits" [contract-ese for appendix] that spells out most of the details I listed above). Pay special attention to the acceptance criteria.
Remember also that you're liable for self-employment tax, which is due quarterly, not on April 15. You might also consider registering a Delaware LLC (~$100, online) and getting a tax ID, because liability gets sticky with software you deliver to make someone else's business work. You probably do not need to consult a lawyer about LLC formation; most of the trickiness of company formation is with partnership terms and equity, which isn't your problem.
You may find this paper helpful [1]. As noted there, syntax-rules really consists of two different parts: (1) the hygiene-preserving macro expander and (2) the pattern-matching facility. Once you wrap your head around the two pieces, it's actually pretty straightforward to implement them both. It's very common to write some "low-level" macro facility as a stepping-stone to syntax-rules (such as explicit-renaming macros or syntactic closures): the implementation of syntax-rules then becomes a composition of the pattern-matcher and the low-level facility. A tutorial implementation of an appropriate pattern-matcher can be found at [2], [3].
There's lots of good reading to be found at the ReadScheme Library [4]. Most if not all of the references in [1] can be found there.
Tangentially: yeah, syntax-rules kinda flies in the face of the oft-mentioned minimalism of Scheme, but there's a reason for it: at the time syntax-rules was standardized, there was no consensus on which (if any) low-level facility should be standardized (and, really, there still isn't any such consensus). The reason syntax-rules operates as a pattern-template rewriting system rather than as a procedural system is that such a system is able to guarantee that the macros it produces are hygienic in a (comparably) simple and intuitive way. The idea was to standardize on a high-level system so that Scheme could have a standard macro facility, whilst leaving the low-level systems open for further exploration and experimentation. The two main contenders are still, after all these years, syntax-case and syntactic closures: the latter being easier to implement and arguably easier to grok, the former being potentially more powerful (in that an implementation of syntactic closures within syntax-case is known, but not vice-versa (last I checked)).
Solution: Hire a librarian. I'm not kidding in any way. They are massively underemployed and are very good at exactly this task. Back at PBwiki we hired a librarian who not only organized all the things but ended up running and building our support organization.
Do not tell them what tool to use; let them own your knowledge base, and make sure their requests for information are understood to be P1 priority.
They also didn't get why FM synthesis was so appealing, in spite of it being difficult for the average musician to understand.
It's in the tone.
Acoustic instruments respond in a complex way to variation in the strength of input: when you strike a piano key faster, pluck a string harder, or blow air into the saxophone more forcefully, you don't merely get a louder sound: the harmonic content, the timbre of the sound, changes as well.
Analogue synthesis struggled to accomplish this. The classic analog synth would have an envelope generator ("ADSR") controlling the loudness of the tone, and most commonly another controlling the filter (the thing that makes the synth do a wowowow sound on the same note), but responsive fading and evolution of the harmonics wasn't readily available.
On the Yamaha DX7, it was built into the core idea of FM synthesis.
You don't know it when you hear it, you know it when you play it: the way the keyboard responded to the touch was alive, magical.
You didn't need to rely on the modulation wheels and joysticks and knobs to vary the timbre as you play. You could simply play the keyboard.
On my Yamaha Reface DX (which overcomes the drawbacks of the FM user interface), I can easily make a tone whose character (not loudness! - or not just loudness) changes when I simply play harder. It's like having several instruments at once at your disposal, blending between them on the fly.
It's that playability that makes FM make sense — and it was what other digital synthesis technologies went for, too. Roland's "linear arithmetic", vector synthesis, and M1's multisampling all explored that area — but they came after DX7.
What makes FM synthesis unique is the heavily non-linear response of the tone to the dynamics. At worst, it's unpredictable, but once you figure out where the sweet spots are in the parameter space, you get a tone like nothing else. A bell that's also a string orchestra. A guitar with the soul of a saxophone, but not mistaken for either; an identity all of its own.
The Yamaha DX7 leaned heavily into this aspect of the instrument's design by providing additional parameters that control how sensitive the operators are to velocity depending on where on the keyboard you are, so that the lower tones have a different character from the higher ones.
The "diminished brilliance" the author writes about was likely that — i.e., the author not figuring out how FM sound design works, which they openly admitted. It was a matter of taste on the part of whoever made the presets; without programming those curves in, the higher notes can easily sound screechy.
The point, again, was that the instrument wasn't merely responsive in a way that analogue synths couldn't dream of, but that the way in which it was responsive, tone-wise, was programmable, and varied not just from patch to patch, but across the scale and velocity range.
Again, think about how plucking different strings on a guitar harder produces a different variation in tonal response. Each string has its own character.
This is the soul of the mathematical idea of FM synthesis: that the tone evolution should not merely be controlled by time passing (as it is on most analogue synths, via envelope generators and LFO's), and not by knob twiddling (modulation wheels, knobs, sliders, joysticks,...) — but by playing the instrument itself.
And on a keyboard, what you really play with is where on the keyboard you strike a key, and how fast.
Yamaha DX7 allowed the player to vary the timbre by playing the instrument, with both hands, by having all tone generators depend on these two variables in a programmable, non-linear, interesting way.
FM synthesis on the Yamaha DX7 therefore can't be separated from the physical keyboard it shipped with. The way the tones felt as you played them was determined by response curves which simply don't map in the same way onto a different keyboard.
The fact that the DX7 was a digital synth obscured the fact that it was a very analog instrument in that way; that to get a truly good FM preset, you need to tune it to the keyboard response (i.e. velocity curves), and that involves the analog components.
It's also for this reason that the DX7 has only membrane buttons, and no knobs or sliders. It didn't need them. The 61 keys were your knobs and sliders, the means to control the tone.
That's why the ePiano on the DX7 was on 60% of new releases. It didn't merely emulate the Rhodes (which, by all means, wasn't a rare instrument).
What it did was give keyboard players a way to play with the tone of their instrument while playing the instrument — something the Rhodes had a more limited range for, since its variation in tone response was constrained by how similar the metallic forks that made the sound were to each other, and how similar the hammers were across the octaves. The digital DX7 didn't have that limitation.
It also gave the people used to playing the synth with one hand (to be able to tweak the sound with the other) the freedom to play truly polyphonically, and use the keyboard itself to control the tone dynamics.
Playing it was a liberating experience, and it still is, because while intricate multi-sampling can also give you that effect (at no less difficulty, mind you, even if you have the samples!), FM does it differently.
The musicians didn't need to be mindful of all that; the absolute majority (Brian Eno excepted) were outright oblivious to why and what made the DX7 the instrument that you had to have.
You just felt it.
And yes, new FM synthesizers keep coming. Because emulating acoustic instruments is not just easy with sampling these days, it also isn't enough. You can just hire someone to play the real instrument, after all.
You need a bit more than that to craft a distinctive sound — especially a new one.
Liven XFM, Korg Opsix, Arturia Minifreak all go boldly where manmade sound didn't go before, and these are just three novel FM synthesizers from this decade.
The Reface DX came out less than 10 years ago, and its FM engine is different from the DX7's (as is the UX — you can finally change the tone with live controls while playing).
And for all the talk of how FM is old, I've yet to see someone not be captivated by the stock ePiano patch on the Reface DX when I let them play it — I often bring the instrument along with me on trips.
Current developments in the controllers (like what ROLI is doing) will allow all the existing sound generation techniques to shine in new ways, including FM.
But I think the physical package of the keyboard, the algorithm, and the presets tuned to the combination of the two is what made the DX7 such a success.
A new FM instrument could easily be a hit with these factors, particularly if they don't skimp on including built-in speakers and making the presets sound great on them. FM truly shines when all the pieces are aligned in a performer's instrument.
The Reface DX comes close to that point, but the presets it ships with are more of an engine demo than sounds to make music with, the speakers are not loud, and the mini-keys (which I love!) were a turn-off for many people — because in the Internet age, people judge a machine without actually playing it, and playing it is the only way to understand what's so damn special about FM synthesis.
This list may paint Rust as a weird language that can't do anything (not even a linked list!?), but it is good advice.
Rust is easy when you do things "the Rust way" (with more overly-cautious thread-safety, immutability, and tree-shaped data than you may like). You can protest and insist on doing things your way, or you can learn the language. You first have to learn the language as it is, to understand what it can and can't do, so you don't get stuck trying to force a solution that it just won't allow.
As for linked lists and graph data — Rust can do it. The problem is that people try to use references for them, and that can't work. Rust references are not general-purpose pointers. I find the naming very unfortunate, because it sounds too close to holding objects "by reference" in other languages, but Rust references can't store data! Rust references are semantically closer to mutexes evaluated at compile time. Your first attempts at building a linked list out of circularly dependent mutexes are going to be total nonsense.
I used to edit a newsstand leisure magazine here in the UK. It was founded in the 1970s. We sold about 18,000 copies a month at our peak, making us the market leader.
I'm not the editor any more (I went off to do something else) but the magazine is still going. It won't surprise you to learn that it sells much less than it used to.
But that's not because the magazine has got worse. It hasn't. The writing is still as good as ever, the news reporting still pretty sharp. It's not because the market has changed. It's not because you can get the same information online for free. Much to my amazement, in 20+ years, no one has really catered for this particular market online - there's a lot of chuntering on forums and Facebook groups, but no one really doing compelling content. We were turning over £1m+ a year. I don't think anyone is even turning over £50k writing about this subject online.
So what changed? I think it's ultimately about attention. When I edited the magazine (c. 2010), people still chose to spend part of their leisure time reading about one of their hobbies. We were a fun way to do that. Today, people don't need to spend £5 to happily while away a few hours: they can just scroll through their phones. The magazine habit has gone.
Crucially, it's not that the information has gone online. It really hasn't. I read all the various forums and groups, and still when the magazine plops onto my doormat every month, I read it and find a load of stuff I didn't know. It's just that the time that was once filled with reading magazines is now filled with something else.
If you're interested in an open-source, free equivalent, check out VSCodium (open-source version of VSCode), and FOAM (VSCode plugin - https://foambubble.github.io/foam/). In a new project, create a `docs/` folder, and start with `docs/notes.md`. When you want to branch out to other files & links, you can type [[MyTopic]] and FOAM will automatically create MyTopic.md, and will allow you to click on the link and navigate to it. Later, if you want to publish your notes as an HTML site, you can run `mkdocs` on the `docs/` folder, and it'll create a website from your notes. This MkDocs plugin enables the crosslinks in HTML: https://github.com/Jackiexiao/mkdocs-roamlinks-plugin. Good luck!
Strongly seconding this. For anyone still hesitant, I further recommend the following experiments:
----
Sample a few activities your team has completed. Check how long the 90% smallest activities are on average, and compare it to the average of the biggest 10%. Or the median compared to the maximum, or whatever. You'll probably find the difference is about an order of magnitude or less. In the grand scheme of things, every activity is the same size. You can estimate it as exp(mean(log(size))) and be within an order of magnitude almost every time.
Once your team has accepted that something is "an" activity and not a set of activities, don't bother estimating. For all practical intents, size is effectively constant at that point. What matters is flow, not size.
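A minimal sketch of that experiment in plain Python (the durations here are made up; swap in your team's actual history):

```python
import math

# Hypothetical completed-activity durations, in calendar days.
durations = [0.5, 1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 21]

# exp(mean(log(size))): a "typical" size that lands within an order of
# magnitude of almost every activity.
typical = math.exp(sum(math.log(d) for d in durations) / len(durations))

# Average of the smallest 90% vs. the biggest 10%.
durations.sort()
cut = int(len(durations) * 0.9)
small_avg = sum(durations[:cut]) / cut
big_avg = sum(durations[cut:]) / (len(durations) - cut)

print(f"typical size ~ {typical:.1f} days")
print(f"smallest 90% average: {small_avg:.1f}, biggest 10% average: {big_avg:.1f}")
```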
----
For the above sample, also study how much time passed between the "go" decision on the task and when it was actually released to customers. In a stable team, this number will be eerily close to the theoretical value based on Little's law referenced in the parent comment.
Oh, and you shouldn't focus on man-hours. Work with calendar days. Not only does that simplify mental arithmetic for everyone, it's also the only thing that matters in the end. Your customer couldn't care less that you finished their functionality in "only" 6 man-hours if it took you six weeks to get it through your internal processes.
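If you want to sanity-check that go-to-release time against Little's law, the arithmetic is tiny (the numbers below are hypothetical):

```python
# Little's law: average lead time = average work in progress / average throughput.
avg_tasks_in_progress = 12      # started but not yet released, on average
avg_completed_per_week = 4      # throughput

lead_time_weeks = avg_tasks_in_progress / avg_completed_per_week
print(f"expected 'go' to release time ~ {lead_time_weeks:.1f} weeks")
```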
----
Fun follow-up to the size experiment: now ask someone intimately familiar with your customers to estimate the dollar value of each activity. You might find that while all activities are practically the same size, they'll have very different dollar values. That's what you ought to be estimating.
I’m a big fan of the Donald Reinertsen approach: measure queue length.
Simply track the time to complete each task in the team queue on average, then multiply that by the number of tasks remaining in the queue.
Each team will habitually slice things into sizes they feel are appropriate. Rather than investing time to try and fail at accurately estimating each one, simply update your average every time a task is complete.
The bonus with this approach is that the sheer number of tasks in the queue will give you a leading indicator, rather than trailing indicators like velocity or cycle time.
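A sketch of that forecast, with made-up numbers (it assumes the queue is worked roughly one task at a time, as described above):

```python
# Reinertsen-style queue forecast: average completion time per task,
# multiplied by the number of tasks still in the queue.
completed_task_days = [3, 5, 2, 8, 4, 6, 3, 5]   # hypothetical history, calendar days
avg_days_per_task = sum(completed_task_days) / len(completed_task_days)

tasks_in_queue = 14
print(f"~ {avg_days_per_task * tasks_in_queue:.0f} calendar days of work queued up")
```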
Q: Is it true that the return value of any Haskell function depends only on its arguments?
A: Yes, barring tricks like unsafePerformIO. But you can write perfectly good programs that do IO without these tricks anyway.
Q: Okay, genius. What about functions like getLine, which reads a line from the console?
A: getLine isn't a function.
Q: What?
A: All functions in Haskell have types like "a -> b" for some a and b. But you'll notice that the type of getLine doesn't have an arrow in it, it's just "IO String".
Q: What do you mean, it's not a function? I can call it right here, by doing "do name <- getLine"!
A: That's not the syntax for calling functions in Haskell, that's monadic do notation. The syntax for calling functions is "foo x y".
Q: This is outrageous. So you're saying that getLine is just an inert value?
A: Yes.
Q: But what is that value? What's inside the type "IO String"? I'd always assumed that it was a kind of wrapper around a String, with some type system nastiness so people don't misuse it.
A: No, I'm afraid it can't be anything as simple as that. And it certainly can't contain a String, because then getLine couldn't be the same value all the time.
Q: Wait, you're saying that it doesn't even contain a String? Then what is it, really?
A: The implementation of IO types is kind of private to the runtime. But you can imagine that for any Haskell type X, "IO X" is a syntax tree of a side-effecting C program that produces an X.
Q: I see. So getLine really is the same value all the time. But how do you "run" the syntax tree that's hidden inside getLine, to actually get a String?
A: You generally don't. Instead you combine them together into one large tree and call it "main", and the runtime takes it from there.
Q: That seems like a lot of work. At the very least, why doesn't the language add some special syntax for combining syntax trees, so people don't have to deal with plumbing?
A: Yeah, that's what monadic do notation was invented for. Unfortunately, it was a bit too successful and some people started confusing it with actual imperative code :-)
A couple of quick notes, from someone who has actually put this to practice — and in a non-manufacturing context, to boot!
(From a brief reading of this thread, it seems like kqr, jacques_chester, and I are the only ones who have put this to practice in non-manufacturing contexts — though correct me if I'm wrong.)
The bulk of the debate in this HN thread seems to be centred around what is or isn't a 'stable process'. I think this is partially a terminology issue, which Donald Wheeler called out in the appendix of Understanding Variation. He recommends not using words like 'stable' or 'in-control', or even 'special cause variation', as the words are confusing ... and in his experience lead people to unfruitful discussions.
Instead, he suggests:
- Instead of calling this 'Statistical Process Control', call this 'Methods of Continual Improvement'
- Use the terms 'routine variation' and 'exceptional variation' whenever possible. In practice, I tend to use 'special variation' in discussion, not 'exceptional variation', simply because it's easier to say.
- Use the term 'process behaviour chart' instead of 'process control chart' — we use these charts to characterise the behaviour of a process, not merely to 'control' it.
- Use 'predictable process' and 'unpredictable process' (instead of 'stable'/'in-control' vs 'unstable'/'out-of-control' processes) because these are more reflective of the process behaviours. (e.g. a predictable process should reliably show us data between two limit lines).
Using this terminology, the right question to ask is: are there processes in software development that display routine variation? And the answer is yes, absolutely. kqr has given a list in this comment: https://news.ycombinator.com/item?id=39638491
In my experience, people who haven't actually tried to apply SPC techniques outside of manufacturing do not typically have a good sense for what kinds of processes display routine variation. I would urge you to see for yourself: collect data, and then plot it on an XmR chart. It usually takes you only a couple of seconds to see if it does or does not apply — at which point you may discard the chart if you do not find it useful. But you should discover that a surprisingly large chunk of processes do display some form of routine variation. (Source: I've taught this to a handful of folk by now — in various marketing/sales and software engineering roles — and they typically find some way to use XmR charts relatively quickly within their work domains).
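If you want to try it, the arithmetic behind an XmR chart fits in a few lines. A minimal sketch with made-up weekly values (2.66 is the standard constant for individuals charts):

```python
# Minimal XmR (individuals / moving range) sketch with hypothetical weekly data.
values = [42, 45, 39, 44, 47, 41, 43, 75, 44, 42]

mean = sum(values) / len(values)
moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Natural process limits: mean +/- 2.66 * average moving range.
upper = mean + 2.66 * avg_mr
lower = mean - 2.66 * avg_mr

for week, v in enumerate(values, start=1):
    flag = "  <-- exceptional variation?" if v > upper or v < lower else ""
    print(f"week {week}: {v}{flag}")
print(f"limits: [{lower:.1f}, {upper:.1f}]")
```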
[Note: this 'XmR charts are surprisingly useful' is actually one of the major themes in Wheeler's Making Sense of Data — which was written specifically for usage in non-manufacturing contexts; the subtitle of the book is 'SPC for the Service Sector'. You should buy that book if you are serious about application!]
I realise that a bigger challenge with getting SPC adopted is as follows: why should I even use these techniques? What benefits might there be for me? If you don't think SPC is a powerful toolkit, you won't be bothered to look past the janky terminology or the weird statistics.
So here's my pitch: every Wednesday morning, Amazon's leaders get together to go through 400-500 metrics within one hour. This is the Amazon-style Weekly Business Review, or WBR. The WBR draws directly from SPC (early Amazon exec Colin Bryar told me that the WBR is but a 'process control tool' ... and the truth is that it stems from the same style of thinking that gives you the process behaviour chart). What is it good for? Well, the WBR helps Amazon's leaders build a shared causal model of their business, at which point they may loop on that model to turn the screws on their competition and to drive them out of business.
But in order to understand and implement the WBR, you must first understand some of the ideas of SPC.
If that whets your interest, here is a 9000 word essay I wrote to do exactly that, which stems from 1.5 years of personal research, and then practice, and then bad attempts at teaching it to other startup operator friends: https://commoncog.com/becoming-data-driven-first-principles/
I don't get into it too much, but the essay calls out various other applications of these ideas, amongst them the Toyota Production System (which was bootstrapped off a combination of ideas taught by W Edwards Deming — including the SPC theory of variation), Koch Industries's rise to powerful conglomerate, Iams pet foods, etc etc.
My productivity app is just todo.txt in OneDrive, using specifically notepad.exe.
You put .LOG at the top of the new file, followed by a return. Save and close the file.
Every time you reopen the file, a timestamp is appended to it. Add your notes, save, exit Notepad. Open it again when you need to update, rinse and repeat.
Nothing I’ve ever tried has been more effective than just keeping this endless file.
1. Deliberate practice only works for skills with a history of good pedagogical development. If no such pedagogical development exists, you can’t do DP. Source: read Peak, or any of Ericsson’s original papers. Don’t read third party or popsci accounts of DP.
2. Once you realise this, then the next question you should ask is how can you learn effectively in a skill domain where no good pedagogical development exists? Well, it turns out a) the US military wanted answers to exactly this question, and b) a good subsection of the expertise research community wondered exactly the same thing.
3. The trick is this: use cognitive task analysis to extract tacit knowledge from the heads of existing experts. These experts built their expertise through trial and error and luck, not DP. But you can extract their knowledge as a shortcut. After this, you use the extracted tacit knowledge to create a case library of simulations. Sort the simulations according to difficulty to use as training programs. Don’t bother with DP — the pedagogical development necessary for DP to be successful simply takes too long.
Broadly speaking, DP and tacit knowledge extraction represent two different takes on expertise acquisition. For an overview of this, read the Oxford Handbook of Expertise and compare against the Cambridge Handbook of Expertise. The former represents the tacit knowledge extraction approach; the latter represents the DP approach. Both are legitimate approaches, but one is more tractable when you find yourself in a domain with underdeveloped training methods (like most of the skill domains necessary for success in one’s career).
The additional restrictions in the SSPL apply also to unmodified versions. This is very different from the AGPL, which does not impose any additional burdens over the GPL if you use an unmodified version of the software, even if you use it to offer network services. This provides a very clear path to AGPL compliance for most users, certainly those who get their software from a GNU/Linux distribution. (I missed that aspect of the AGPL for quite some time, admittedly.)
The SSPL, in contrast, applies to unmodified versions of the software. This means that even if you get SSPL software from someone who publishes the sources, you have to take additional steps for license compliance if you want to use the software. Given that the SSPL requires publishing things like the source code for NIC and network switch firmware (both are obviously required for offering your network service …), I just don't see how this license is useful besides being deceptive. The BSL with its typical field-of-use restrictions achieves the same thing in a much more direct manner.
I would also add that one fundamental aspect of linear algebra (that no one ever taught me in a class) is that non-linear problems are almost never analytically solvable (e.g. e^x = y is easily solved through logarithms, but even solving xe^x = y requires Lambert's W function, IIRC). Almost all interesting real-world problems are non-linear to some extent; therefore, linear algebra is really the only tool we have to make progress on many difficult problems (e.g. through linear approximation and then applying techniques of linear algebra to solve the linear problem).
Any set of n linearly independent vectors B_a={\vec{a}_i}_{i=1..n} in a vector space can be used as a coordinate system, but the "computational cost" of finding the coordinates w.r.t. the basis B_a will be annoying. Each time you want to find the coordinates of a vector you have to solve a system of linear equations.
A basis consisting of orthogonal vectors {\vec{e}_i}_{i=1..n} is way cooler because you can calculate the coefficients of any vector using the formula $v_i = (\vec{v} · \vec{e}_i)/||\vec{e}_i||²$.
Of course the best thing is to have an orthonormal basis B_s={\hat{e}_i}_{i=1..n}, so that the coefficients of a vector w.r.t. B_s can be calculated simply as v_i = \vec{v} · \hat{e}_i.
Each term v_i \hat{e}_i is then just the projection of \vec{v} onto the \hat{e}_i subspace.
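A quick numpy illustration of the difference, with made-up 2-D vectors:

```python
import numpy as np

v = np.array([3.0, 5.0])

# General (non-orthogonal) basis: finding coordinates means solving A c = v,
# where the columns of A are the basis vectors.
a1, a2 = np.array([1.0, 1.0]), np.array([2.0, 0.0])
coords_general = np.linalg.solve(np.column_stack([a1, a2]), v)

# Orthonormal basis (standard basis rotated 45 degrees): coefficients are dot products.
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([1.0, -1.0]) / np.sqrt(2)
coords_orthonormal = np.array([v @ e1, v @ e2])

# Either description rebuilds the same vector.
print(np.allclose(coords_general[0] * a1 + coords_general[1] * a2, v))
print(np.allclose(coords_orthonormal[0] * e1 + coords_orthonormal[1] * e2, v))
```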
> Why would we ever want a basis besides the standard basis in real Euclidean space?
Hmm... I'm thinking eigenbases? The operation of a matrix A on vectors expressed in terms of its eigenbasis is just a scaling, i.e. {B_e}_[A]_{B_e} = Q^{-1} {B_s}_[A]_{B_s} Q = diagonal matrix = Λ.
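In numpy that looks like this (made-up symmetric matrix, so a full eigenbasis is guaranteed to exist):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eig(A)        # columns of Q are the eigenvectors
Lam = np.linalg.inv(Q) @ A @ Q       # Q^{-1} A Q: diagonal, up to rounding
print(np.round(Lam, 10))             # diag(eigvals)
print(eigvals)
```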
> Is Euclidean space the only vector space out there?
> Do vectors have to be lists of numbers?
Good point.
If I bump this to 5-6 pages, I will try to include something about generalized vector spaces.
Orthogonal polynomials could be a good one to cover.
That was an explanation from the perspective of someone acquainted with modern physics. As such, it will make sense to physicists, but little sense to almost everyone else, including mathematicians who don't know modern physics.
For example, in the beginning, the author describes tensors as things behaving according to the tensor transformation formula. This is already very much a physicist's kind of thinking: it assumes that there is some object out there, and we're trying to understand what it is in terms of how it behaves. It also uses the summation notation, which is rather foreign to non-physicist mathematicians. Then, when it finally reaches the point where it is all related to tensors in the TensorFlow sense, we find that there is no reference made to the transformation formula, purportedly so crucial to understanding tensors. How come?
The solution here is quite simple: what the author (and physicists) call tensors is not what TensorFlow (and mathematicians) call tensors. Instead, the author describes what mathematicians call a tensor field (a section of a tensor bundle), which is a correspondence that assigns to each point of space a unique tensor. That's where the transformation rule comes from: if we describe this mapping in terms of some coordinate system (as physicists universally do), the transformation rule tells you how this description changes when the coordinates change. This setup, of course, has little to do with TensorFlow, because there is no space that its tensors are attached to; they are just standalone entities.
So what are the mathematician's (and TensorFlow's) tensors? They're actually basically what the author says, after the very confusing and irrelevant introduction talking about changes of coordinates of the underlying space — irrelevant, because TensorFlow tensors are not attached as a bundle to some space (manifold) as they are in physics, so no change of space coordinates ever happens. Roughly, tensors are a sort of universal object representing multilinear maps: bilinear maps V x W -> R correspond canonically one-to-one to regular linear maps V (x) W -> R, where V (x) W is a vector space called the tensor product of V and W, and tensors are simply vectors in this tensor product space.
Basically, the idea is to replace weird multilinear objects with normal linear objects (vectors), which we know how to deal with using matrix multiplication and such. That's all there is to it.
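A tiny numpy sketch of that correspondence (dimensions and numbers made up): the bilinear map B(v, w) = vᵀMw and the tensor M carry the same information.

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)   # coefficients of a bilinear map on R^2 x R^3
v = np.array([1.0, 2.0])
w = np.array([0.5, -1.0, 3.0])

bilinear_value = v @ M @ w                        # B(v, w)
tensor_value = np.tensordot(np.outer(v, w), M)    # pairing of v (x) w with M
print(bilinear_value, tensor_value)               # same number
```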
I have taught a course on quantum computing a few times, mostly to CS students who have no background in quantum mechanics. The way I proceed is to
* First introduce classical reversible computation. I model it using linear algebra, meaning classical n-bit states are binary vectors of length 2^n, and the gates are 2^n x 2^n binary matrices acting on these states. Exponential, yes, but a faithful model. The critical feature here is that you already need the tensor product structure, rather than it being some unique feature of quantum mechanics.
* Introduce probabilistic classical computation. Now the states/vectors have real entries in [0,1] and obey the L1 norm (the critical feature). Similarly for the gate matrices.
* Now, argue that quantum computing just requires the same linear algebraic structure, but (1) we work over the complex number field, and (2) the norm is L2.
The reason I like this development is that it takes at least some of the mystery out of quantum mechanics. It is not a strange model of computation, completely divorced from classical. Just a variant of it, that happens to be the one the universe runs on.
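Here's a small numpy sketch of that classical-reversible warm-up (my own toy example, not taken from the course): two bits live in a length-4 vector, gates are permutation matrices, and multi-bit gates come from the Kronecker (tensor) product.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # classical NOT as a 2x2 permutation matrix

# NOT on the second of two bits: I (x) X, a 4x4 permutation matrix.
# Convention: state index is b1*2 + b2, with b1 the first (most significant) bit.
gate = np.kron(I, X)

state_10 = np.zeros(4)
state_10[2] = 1.0                   # basis vector for the bit string "10"
print(gate @ state_10)              # -> basis vector for "11"

# Probabilistic computation: entries in [0,1], columns sum to 1 (L1 norm).
# Quantum computation: complex entries, unitary gates (L2 norm).
```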
Peter Shor does discuss classical computation in two lectures, but from just the notes it seems detached from the rest of the course.
Making fast JS isn't some super-secret thing. Sure, there are weird cases like `x = y > z ? y : z` being 100x faster than `x = Math.max(y, z)`, but most performance improvements come from three simple rules:
1. Create an object with a fixed number of keys and NEVER add keys, remove keys, or change the data type of a key's value.
2. Make arrays of a set length and only put ONE data type inside. If that data type is an object or array, all the objects/arrays must have the same type.
3. Functions must be monomorphic (always called with the same parameters, in the same order, of the same types).
Do this and your code will be very fast. Do something else and it will get progressively slower.
Running the profiler in Chrome or Firefox is very easy and it will show you which functions are using up most of your processing time. Focusing on applying these rules to just those functions will usually get you most of the way there.
Axler's Linear Algebra Done Right has always been my favorite linear algebra text (I say this 10+ years after reading it and finishing my math PhD, if that matters).
For those who want a free alternative, behold: Treil's Linear Algebra Done Wrong[2]
The book is downloadable as a free PDF. The name is an answer to Axler's book (dry mathematician's humor), and it offers the opposite approach (getting to determinants first).
While I agree with Axler and disagree with Treil, LADW offers way more examples and applications, and together LADR and LADW make complete, excellent course material.
As for the linked text: not a bad text, but I wouldn't pick it over LADR + LADW.
Here's why:
1. size: it's larger than LADR+LADW taken together. It's hard to see the forest for the trees.
2. exposition: it follows the structure of many other texts that I don't like because they terribly confuse students (whom I'd then have to re-teach): starting with solving systems of linear equations, then jumping into vector spaces, for example.
3. I don't like how key concepts (matrix product, determinant) are introduced. If you already know the material, it will be hard to see what's wrong with the approach of throwing a definition at the reader, and only then talking about why that definition was made. But it should be the other way around.
After teaching Linear Algebra, here's my litmus test for a good book. At a glance, it should make the following clear first and foremost:
1. A matrix of a linear map F is simply writing down the images of the standard basis: F(e_1), F(e_2), ..., F(e_n). These vectors are the columns of the matrix. If you know them, you can compute F(v) for any v by linearity. That's called "multiplying a vector by a matrix"; we write Mv = F(v). (A short numpy sketch of points 1 and 2 is at the end of this comment.)
2. The product of matrices is simply the matrix of the composition of the linear maps they represent. The student can figure out what that matrix should be (or should be able to); here's how. If M is the matrix of F, and N is the matrix of G (where F and G are linear maps), then the first column of MN is F(G(e_1)) = M x (first column of N). Same for the other columns. Ta-dah.
3. The determinant of v_1, ..., v_n is simply the volume of the lopsided box formed by these vectors (mathematicians call the box a "parallelepiped"). In particular, in a plane, the area of the triangle formed by vectors A and B is half the determinant. This area can have a minus sign; switching any pair of vectors flips the sign.
4. Eigenvectors and eigenvalues are fancy words that allow us to describe linear maps like this: "stretch this picture along these directions by this much". The directions are the eigenvectors; by how much, the eigenvalues.
Bonus:
5. Rotation and scaling are linear maps. That's all any linear map does: rotates and stretches. Writing a map down in this way is called singular value decomposition.
6. Shears are linear maps that don't change the volume. Any box can be made rectangular by applying a bunch of shears to it. That's called Gaussian elimination or row reduction when you look at what happens to matrices (and apply scaling as the last step). This is also an explanation of why the determinant gives volume (if you define it as an alternating n-linear form).
That's the beginning of a solid understanding of the subject.
From my experience, LADR+LADW leave the student with an understanding of 1-4, and other texts, due to being organized badly, don't (even when they contain all the information in some order).
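Here's the numpy sketch of points 1 and 2 promised above (toy maps of my own choosing):

```python
import numpy as np

def F(v):                       # rotation by 90 degrees
    x, y = v
    return np.array([-y, x])

def G(v):                       # stretch the x-coordinate by 2
    x, y = v
    return np.array([2 * x, y])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

M = np.column_stack([F(e1), F(e2)])           # matrix of F: images of the basis
N = np.column_stack([G(e1), G(e2)])           # matrix of G
MN = np.column_stack([F(G(e1)), F(G(e2))])    # matrix of F o G, column by column

print(np.allclose(MN, M @ N))                 # True: composition = matrix product
```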
Step 1: put 'xrandr --dpi <your actual DPI>' in .xinitrc
Step 2: Use Qt applications (Plasma is a fantastic Qt desktop)
Step 3: Enjoy your reasonably sized everything.
"Scaling" is a broken concept to work around applications assuming 96 DPI (which is considered scale=1). You don't need it if you use programs that actually respect your real DPI. Unfortunately X11 doesn't properly compute DPI settings, even though EDID information generally contains the screen size - I imagine, for fear of breaking stuff.
(You can correct GTK3/GDK applications by setting GDK_DPI_SCALE=<actual dpi / 96>, but in my view it's a sin that you need to do that)
I've given a few textbook suggestions for almost all of the topics you requested, in a preferred order for learning them. But before you look at that list, consider the following:
I would strongly, strongly advise against trying to learn proof-based mathematics from a textbook (almost all of the math here will be proof-based). The absolute best way to learn mathematics is to have an experienced and competent instructor tailor their pedagogy to you. Failing that, an experienced instructor who is "just okay" but who can e.g. review and critique your work is better than a textbook.
Learning math is very unlike learning programming. It's a counterintuitive idea, but the information density of math textbooks (whether they're well or poorly written) is generally so high that you can't absorb the material unless you read only a few pages per day. Not only that, but it's usually not the case that a single textbook has the ideal level of exposition for your needs - for example, you don't have linear algebra on here despite it being a prerequisite for basically everything else. Some textbooks treat this subject in a highly theoretical manner, while others treat it at a very applied/computational level. Which suits your needs more? Have you studied it at all?
If you're actually serious about this, you need to proceed at a slow pace (2 - 5 pages per day) and complete as many exercises as possible. If the exercises are computationally focused you can do fewer, but you should aim to solve as many of the proof-based problems as possible.
If you go at a rate which will actually allow you to absorb the material, doing this "properly" will take you years. With dedication and not much talent I'd expect it to take as long as an undergraduate degree. With dedication and a lot of talent I could see this being accomplished in two, maybe three years. Once again, I strongly, strongly suggest finding a mentor or instructor.
In any case, here is a list of the textbooks most mathematicians will consider to be very good:
1. Calculus
Calculus, by Spivak
This gives you a rigorous treatment of calculus, which hopefully you have some familiarity with. After this you can move on to real analysis.
2. Real Analysis
Principles of Mathematical Analysis, by Rudin
You might be ready for this after Spivak's Calculus, but it can be rough. If you can't reproduce a proof of irrationality after reading through the first few pages, work through Tao's Analysis I first.
3. Topology
Topology, by Munkres is the absolute gold standard. You should be comfortable with calculus (and hopefully analysis) before tackling this.
4. Linear Algebra
Linear Algebra Done Right, by Axler
This is a thorough introduction to the subject at a theoretical level, with a focus on finite-dimensional vector spaces over fields R and C.
You should also work through either Linear Algebra by Friedberg, Insel, Spence or Linear Algebra by Hoffman & Kunze for the treatment of more advanced/specialized material and, in particular, determinants (which are notably de-emphasized by Axler).
5. Abstract Algebra
Abstract Algebra by Dummit & Foote is the usual reference text for a first course. It's pretty good. If it's too advanced for you, try Pinter's A Book of Abstract Algebra. For a very challenging (but comprehensive) approach to the subject, try Lang's Algebra.
6. Category Theory
Once you have abstract algebra under your belt, a good introduction to category theory is given by Aluffi's Algebra: Chapter 0. I would suggest not trying to dive into this prior to at least encountering fields, groups and rings because it's good to have both the traditional and modern (read: categorical) contexts.
Also try Category Theory in Context, by Riehl.
7. Complex Analysis
Complex Analysis, by Ahlfors. This is an excellent and concise text. You can theoretically approach this before real analysis, but I wouldn't recommend that. Also try Complex Variables, by Churchill & Brown.
8. Differential Geometry
Calculus on Manifolds by Spivak. You will want to have a thorough understanding of analysis and linear algebra before approaching this material.
9. Measure Theory
This is very advanced material in an analysis sequence; don't jump to this unless you've thoroughly worked through analysis first.
I would recommend Stein & Shakarchi's Real Analysis: Measure Theory, Integration and Hilbert Spaces.
10. Probability Theory
A really rigorous treatment of probability is measure theoretic, but even if you haven't worked with measures before you'll need (real) analysis and linear algebra. Tackle those first.
Feller's Introduction to Probability Theory is usually a good first course. If you don't like that, try Ross. For truly advanced probability theory, work through Shiryaev or Kallenberg.
The other things you've asked for are a little under-specified or outside my wheelhouse (in particular, I don't think chaos theory is still emphasized as a field distinct from dynamical systems). You should probably add ordinary and partial differential equations to your list before some of these more specialized topics.
1. Numerical Analysis
Numerical Linear Algebra, by Trefethen & Bau. This is the best all-around introduction. Once you've worked through this, try moving on to Matrix Computations by Golub & van Loan. The latter is much more of a reference text.
2. Cryptography
You haven't specified what you're looking for here, but given the mathematical bent of your question I'd recommend Goldreich's Foundations of Cryptography (two volumes). Be forewarned: cryptography is a subfield of complexity theory. You should have a strong understanding of complexity theory before embarking on Goldreich's Foundations.
On the other hand, if you're looking for a more implementation-focused text on cryptography, try Menezes' Handbook of Applied Cryptography.
3. Optimization
This is extremely broad. There's linear programming, mixed integer programming, nonlinear optimization, stochastic optimization...I can't recommend textbooks targeted at everything here.
For a good start to the subject of optimization and constraints in general, work through Boyd & Vandenberghe's Convex Optimization. There are additional exercises available from the authors here: https://web.stanford.edu/%7Eboyd/cvxbook/bv_cvxbook_extra_ex...
2 books I absolutely love and have read cover to cover several times, solving most of the 1000+ problems.
1. Inference - Rohatgi
2. Inference - Stapleton
Why do I recommend them?
The real answer is super long. But the short version is: there are thinkers & there are doers. Basically, the mathematical statistics world has these theory-building Bourbaki-type guys who write a LOT, say a LOT, but never get to the fucking point (imho). The opposite view is "math is a bunch of tricks. It's like chess - the more middlegames & endgames you know, the higher your chance of winning. No real point in learning who originally came up with this particular middlegame variation, or why this opening works, etc. Just learn the trick & play the game." So that's the eastern (Indian/Chinese) school of thought, which is what I subscribe to.
The 2 inference books listed above are essentially grab-bags of tricks. Do this - it works - now try it on these problems - ok next trick...on & on. So I solved the 1000+ problems & now I know lots of these methods that just work.
eg. recently i was asked - some vc's are evaluating a startup. their valuations are $1M, $4M, $10M, $20M, $50M. what's your evaluation & why?
so i'm thinking - hey isn't this just the rohatgi taxicab problem? so i quickly said - sum is 85, times 1/5 is 17. whereas the largest observed is 50, times 6/5 is 60, so half is 30. since 50 was the max observed, another estimator is half that, ie. 25. if you want the doctor's estimate, get rid of 1 and 50, then the sum is 34, so times 1/3 is 11.3
so then we have 4 estimators - the sample mean is 17 million, it's the method of moments estimator, clearly unbiased but high mean square error because the variance is high. the maximum likelihood estimator is 25 mil, and has the smallest variance, but its mse will not be the lowest since it is not unbiased, so the squared bias will add. the 30 mil estimate is also unbiased, but has low variance, so it has the lowest mse of the lot. the doctor estimator, 11 million, is unbiased but high variance, and its mse is in between. now if you want the absolute lowest mse, i can cook up a 5th estimator which has nonzero bias but whose mse will be the minimum...
at this point the interviewer interrupts me - you've never seen this problem because we came up with it in our last meeting at our firm. Yet you gave me 4 very good estimators in under 2 minutes & want to cook up a 5th one that's even better. And you don't even have a phd. meanwhile i just spoke to an actual phd and asked him this same question, he went on and on for 20 minutes without giving me a single concrete estimator!
so that's the thing. rohatgi, stapleton, these are about real world, down & dirty, how to do stuff. how to solve actual problems.
whereas the gelman bda, the shao, the schervish, the lehman, the bickel & doksum - these were my prescribed textbooks. imho they are absolute garbage, worse than dirt. after the exam i threw them away. such bullcrap. they go on & on without getting anywhere & have practically zero good worked examples.
so that's my 2 cents. i still have the rohatgi & stapleton on my desk. sometimes i tear up when i look at them. they have taught me so, so much!
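for the curious, here's that taxicab arithmetic as a quick python sketch (same 4 numbers as above, treating the valuations as uniform draws on [0, θ] and estimating the midpoint):

```python
vals = [1, 4, 10, 20, 50]                        # valuations in $ millions
n = len(vals)

sample_mean = sum(vals) / n                      # 17: method of moments
unbiased_mid = max(vals) * (n + 1) / n / 2       # 30: unbiased estimate of the max, halved
mle_mid = max(vals) / 2                          # 25: MLE of the max is the max itself, halved
doctor = sum(sorted(vals)[1:-1]) / (n - 2)       # ~11.3: drop the min and the max

print(sample_mean, unbiased_mid, mle_mid, doctor)
```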
MIT recorded a set of Calculus video courses back in the 1970s that they have since made publicly available. They are taught by a lecturer named Herbert Gross. His style of lecturing is clear; he states why things are defined the way they are and derives everything from first principles. There is an unusual mix of rigor and focus on building understanding — where everything comes from. It also taught me that math is about reasoning logically and rigorously, and that we shouldn't always rely on intuition (at least while doing math). Deriving almost all the basic calculus results that were drilled into me from the basic concepts of a limit, deltas, and epsilons was really refreshing.
Compared to more recent OCW calculus videos, I found this to be better in terms of respecting the learner's intellect, presenting the whole proof rigorously and teaching the student to think a certain way.
I have created a mirror of this more up-to-date version at https://github.com/thaliaarchi/unity-termsofservice.
Here's how to “cook”[1] an archive from the vault, if you want to do it yourself:
[0]: https://archive.softwareheritage.org/browse/origin/directory...
[1]: https://archive.softwareheritage.org/api/1/vault/git-bare/do...