Books like these should be read by all mathematics/statistics students, not to get better at mathematics, but to see how the concepts they study in theory are relevant in practice.
It's good but it's old-school, focusing on traditional null hypothesis significance testing and related ideas. These ideas have been under attack for decades, most recently as being one cause of the scientific replicability problem.
It is old school in the sense that it is quite pragmatic. It is aimed at mainstays of engineering like 'did I improve performance on this line?', 'are these systems giving the same outputs?', or even 'how do I design an experiment in a really complex system?'. And these methods generally work well.
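For that last question, here is a tiny sketch (the factor names and levels are made up) of the kind of full-factorial design the handbook's DOE material walks through:

```python
# Enumerate a full 2^3 factorial design: every combination of low/high levels.
from itertools import product

factors = {"temperature": (180, 220), "pressure": (30, 50), "additive": (0, 1)}
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
for i, run in enumerate(runs, 1):
    print(i, run)
```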
Just do work using appropriate tools. There are reasons for all sorts of tools, use the right ones.
The reproducibility crisis is mostly about marginal results, publication bias, and human behavior. Bayesians can (and do!) do the same things. It isn't the tool, it's the people.
I've consulted in many situations where people make horrible business decisions based on "statistically significant." People mistake p-values for p(hypothesis|data), mistake p-values for effect size ("it's highly significant!"); moreover, people using NHST don't understand multiplicity or "topping-off" (optional stopping) problems.
Of course one could argue that they just need to review Intro Stats, but that skips over the impenetrable conceptual nature of p-values and NHST. Given that regular people must use applied stats, and that NHST is fundamentally arcane and confusing, the stage is set for endless drama.
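To make the p-value vs. p(hypothesis|data) confusion concrete, here is a minimal simulation; all of the rates, effect sizes, and sample sizes are invented. The point is just that when real effects are rare, a large share of "significant" results are false alarms.

```python
# Simulate many small experiments, most of which test a true null, and see
# what fraction of p < 0.05 results come from experiments with no real effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 5_000
prior_true = 0.10          # hypothetical: only 10% of tested hypotheses are real effects
effect, n = 0.5, 20        # modest true effect, 20 samples per group

hits = false_hits = 0
for _ in range(n_experiments):
    real = rng.random() < prior_true
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect if real else 0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        hits += 1
        false_hits += not real

print(f"share of 'significant' results with no real effect: {false_hits / hits:.0%}")
```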
Now if we were sampling manufactured products from batches and looking for a quantitative upper-bound on the failure rates, run over many batches, then we have a winner!
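For that use case the standard tool is a one-sided binomial confidence bound. A quick sketch with hypothetical counts (Clopper-Pearson exact upper bound):

```python
# One-sided 95% upper confidence bound on a batch failure rate.
from scipy import stats

n, failures, alpha = 500, 3, 0.05   # hypothetical: 3 failures in a sample of 500
upper = stats.beta.ppf(1 - alpha, failures + 1, n - failures)  # Clopper-Pearson upper bound
print(f"95% upper bound on failure rate: {upper:.2%}")
```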
It seems that the problems you cited are still very much people-related.
There's a lot of pressure (especially in regulated environments) to be "data-driven". It's tempting for folks to pick up a tool like Minitab and churn some stats, cargo-cult-style, to prove whatever and look smart while doing it.
This is tricky stuff. Nobody likes to admit they're confused or that the path isn't clear. I agree it sets the stage for drama and failure.
There are whole books dedicated to it (I can recommend Bernoulli's Fallacy by Clayton), but the gist is: NHST answers the wrong question (it should answer: what is the probability of the hypothesis given the data, but instead answers: what is the probability of the data under the hypothesis), and the conclusion is (thus) very sensitive to the prior probability of the hypothesis. To quote the old example: when you wake up with a headache, you shouldn't conclude you've got a brain tumor; headaches are very likely given a tumor, but tumors are rare to begin with.
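Plugging made-up numbers into Bayes' theorem shows how the prior dominates here:

```python
# P(tumor | headache) = P(headache | tumor) * P(tumor) / P(headache)
p_tumor = 1e-4                      # hypothetical prior: brain tumors are rare
p_headache_given_tumor = 0.95       # the data are very likely under the hypothesis
p_headache_given_no_tumor = 0.10    # but headaches are common anyway

p_headache = (p_headache_given_tumor * p_tumor
              + p_headache_given_no_tumor * (1 - p_tumor))
p_tumor_given_headache = p_headache_given_tumor * p_tumor / p_headache
print(f"P(tumor | headache) ≈ {p_tumor_given_headache:.4%}")   # well under 1%
```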
> is (thus) very sensitive to the prior probability of the hypothesis
Standard practice is to test sensitivity of the posterior against various priors. They're often non-informative, e.g., `effectSize ~ N(0, wide)`, and the results are quite robust.
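A minimal sketch of that kind of sensitivity check, under a conjugate normal-normal model with invented numbers: same data, several widths of the `effectSize ~ N(0, wide)` prior, and the posterior barely moves once the prior is reasonably wide.

```python
# Posterior for an effect size under normal priors of increasing width.
import numpy as np

xbar, se = 1.2, 0.4                      # hypothetical effect estimate and its standard error
for tau in (1.0, 5.0, 25.0):             # progressively less informative priors
    prec = 1 / tau**2 + 1 / se**2        # posterior precision
    post_mean = (xbar / se**2) / prec
    post_sd = np.sqrt(1 / prec)
    print(f"prior sd {tau:>5.1f}: posterior {post_mean:.2f} ± {post_sd:.2f}")
```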
Bayesian analysis overcomes this particular problem, but of course doesn't have such an easy and commonly accepted "THIS IS SIGNIFICANT AND BY GOLLY I AM RIGHT!" signal. It's conceptually harder than NHST, doesn't lead to easy conclusions (unless your effect is really strong), and many researchers have never even heard about it. And they are the reviewers.
I got into a bit of a kerfuffle about a statistical test once. A critical review insisted I do an ANOVA, even though it was inadequate, and I had used a weaker test. Such is the status quo.
Work has been done to make Bayesian analysis as straightforward as NHST, but if it gets accepted with a threshold (comparable to the current p<0.05), it won't help.
Some articles now publish both analyses, which is nice. And there are tools to help you, e.g. https://jasp-stats.org/
> doesn't have such an easy and commonly accepted "THIS IS SIGNIFICANT AND BY GOLLY I AM RIGHT!" signal.
Part of the ongoing stats revolution is abandoning the binary significance test, because it's often misaligned with the fundamental business question at hand. If, instead, you have a probability distribution over hypotheses, you can make optimal business decisions based on standard utility theory, like this https://philosophy.hku.hk/think/stat/naode1.php.
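A toy version of that idea (the posterior draws and utilities below are invented): score each action by its expected utility under the posterior, rather than by a significance cutoff.

```python
# Expected-utility decision: average a utility function over posterior draws.
import numpy as np

rng = np.random.default_rng(0)
effect = rng.normal(0.8, 0.6, 100_000)    # hypothetical posterior draws for the effect of a change

def utility(effect, adopt):
    # adopt=True: gain proportional to the effect, minus a fixed rollout cost
    return adopt * (10.0 * effect - 5.0)

for adopt in (True, False):
    print(f"adopt={adopt}: expected utility = {utility(effect, adopt).mean():.2f}")
```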
> It's conceptually harder than NHST
From a different angle: Experimenters fundamentally want to know p(hypothesis | data): Does adding 20g sugar to 1kg cement prevent 98% of the units from crushing under 1000kg load? Below are two answers. Which would people rather tell their boss?
1) Likely yes, given our experimental data. The totality of evidence shows that there's a 70% probability that failure rates are below the limit, but we're not yet ready to commit changes to our manufacturing. We can run more experiments to nail this down and make a concrete decision (heh).
2) Sorry boss, I have no clue. The p-value was 0.15, so we can't say anything. Unfortunately old-school stats can't easily add new data to an existing experiment, so we have to run new experiments, and then combine everything with a fancy new meta-analysis. Or we could call the first experiment a sneak peek but then we have to do multiplicity corrections. But even after we do all that, I can't really tell you the probability that sugar strengthens cement. NHST doesn't work that way.
From the stakeholder's perspective, #1 is conceptually lots easier.
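For what it's worth, here is roughly where a "70% probability the failure rate is below the limit" style of answer can come from; the prior and counts below are hypothetical.

```python
# Posterior P(failure rate < 2%) under a Beta-Binomial model for crush-test data.
from scipy import stats

tested, failed = 180, 2                   # hypothetical experimental results
prior_a, prior_b = 1, 1                   # flat Beta(1, 1) prior on the failure rate
posterior = stats.beta(prior_a + failed, prior_b + tested - failed)
print(f"P(failure rate < 2% | data) = {posterior.cdf(0.02):.0%}")
```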
The book has eight voluminous chapters and only one is about hypothesis testing. Much of it is about design of experiments and statistical process control, think something like optimizing the workings of a factory. Hypothesis testing has been under attack in psychology/economics/etc., as part of, I think, a broader problem those disciplines have with drawing reliable conclusions in general, since it is difficult to control all the variables. This book is about engineering and industrial applications, which are closer to physics.
I read the book for a few minutes and loved it, but I confess that the comments here have left me a bit confused.
I have a degree in engineering and took two statistics courses that basically used high-school-level math.
Nowadays I work with data analysis, using SQL and Python. I would like to know which statistical approaches you guys think are most suitable for the real world, like how to test hypotheses, etc.?
I think the examples in the handbook are excellent; as an engineer I started with the pipeline example [1]. In terms of testing hypotheses, the design of experiments section is useful [2].