I mean, part of the appeal of the Louvre at least isn't just that you can see the art in the physical world, but that you're practically bathed in it. That sense of being overwhelmed, and the serendipity of discovering new works, constituted a lot of the appeal when I used to visit. You could go there once every weekend for a year and never have the same experience twice.
Just as a bit of a nitpick, the Connection Machine:
a) Wasn't manufactured by Cray. It was made by Thinking Machines Corporation in the greater Boston area.
b) Didn't have anything to do with neural nets, as it was developed during the period when GOFAI / symbolic AI was still in vogue (although by the late '80s the Japanese had revived connectionism / neural nets), and thus had far more in common with a LISP machine.
c) Was mostly about developing a decent SIMD architecture.
I mean, a lot of the brain is devoted to sensation, so if you don't care about simulating how the brain interprets certain aspects of sensation (motion, depth, vision more generally), you could probably simulate other functions. For memory at least, however, there's a lot of evidence that you'd need to simulate sensory systems to be able to accurately simulate recall [0].
It's a bit different in the U.S. in the sense that, whereas in Britain and France most mid-sized cities have decent public transit and municipal services, in the U.S. only a handful of large urban centers have these kinds of things.
For example, in Clermont-Ferrand, which is roughly to Paris what Albany is to NYC, there is a well-established tram system that can get you to many of the places you'd want to go without a car. This is especially essential if you're low income. Albany, in contrast, just has buses (the CDTA), and pretty crap buses at that. The only other city in New York State with anything even remotely approaching the utility of the subway is Buffalo, whose light rail is mostly useless since it comprises only one marginally useful line.
Believe me, as someone not involved in finance in NYC, if I could get the same services I get here in a mid-sized city like Pittsburgh, Cincinnati or Buffalo, I'd strongly consider relocation.
> in Britain [...] most mid-sized cities have decent public transit
Strongly disagree with this. It might be better than the US, but to call it "decent" I think is a bit of a stretch.
Most journeys, in most cities, will be 3x faster by car or more. The exception is usually rush hour commuting into the city centre.
In my hometown for example the buses used to run on a 20 minute timetable in the middle of the day. I could cycle or drive into the city before the bus even arrived.
Inner city London is the exception. In the suburbs public transport is mostly useful for going in to Central London.
Is public transport really a driving factor for your decision on where to live? If it is, I just can't see it being a common factor for others. Most Americans I know are happy to drive if they can, and begrudgingly take public transport only when they have to (or when the cost and/or time savings is overwhelmingly in support of public transport, as it is in much of NYC). I would have to agree with the post above yours, that there is a drive among young people to live in large urban centers, if only to be nearer to culture, despite only being consumers of that culture. I can understand that drive, but it is far from the most financially responsible choice of residence for a large majority of people.
Grad students / postdocs / human lab rats aren't scum, the incentives just aren't in place to promote good behavior (such as calling other researchers out on their bullshit). If you're trying to acquire a vaunted tenure track job, you can't afford to piss off $senior_tenured_researcher_at_prestigious_institution, since $senior could blacklist you so that you wouldn't get hired at any of the incredibly small set of universities out there. Sometimes things work out despite pissing off major powers (Carl Sagan technically had to "settle" for Cornell due to being denied tenure at Harvard, in no small part because of a bad recommendation letter from Harold Urey [0]), but not often.
Even if you do manage to get a tenure track job, you pretty much have to keep your head down for 7 years in order to secure your position.
And once you have tenure, you still get attacked vociferously. Look at what happened when Andrew Gelman rightly pointed out that Susan Fiske (and other social psychologists) had been abusing statistics for years. Rather than a hearty "congratulations", he was called a "methodological terrorist" and a great hubbub ensued [1].
When framed against these circumstances, it should be evident that there is literally nothing to gain and everything to lose from sending out a short e-mail pointing out that someone's model doesn't work.
I'm a researcher myself, and I guess this is one of those "does the end justify the means?" scenarios... Out bad research and its perpetrators, and science loses a scientist who actually wants to do good work (because calling it out ends your career). Or don't, and then watch yourself rationalize worse decisions later on for the sake of your research, slowly becoming as corrupt as they were and realizing that a lot of the work you cite could potentially be as bad as (or worse than) the work you helped get published.
I really believe we need a better way. Privately funded / bootstrapped OPEN research comes to mind as a potential solution to bring some healthy competition to this potentially corrupt system. Mathematicians are starting to do this, I think computational researchers have the potential to be next.
> Grad students / postdocs / human lab rats aren't scum, the incentives just aren't in place to promote good behavior
The question is, would additional incentives promote good behavior, or just lead to more measurement dysfunction? Some people think that providing the "right" incentives is all that's needed, but actual research shows otherwise.
I haven't read through that very long text, but claiming that incentives don't influence human behavior is a wildly exotic claim.
There is near infinite evidence to the contrary. That said, constructing a system with "the right incentives" can of course be devilishly hard or even impossible.
The claim is that it does change behavior, but only temporarily, and it doesn't change the culture in a positive way / doesn't motivate people. It ends up feeling like manipulation. That being said, according to this article, the entire incentive system would need to be dismantled. Simply adding more incentives wouldn't necessarily produce higher quality, at least not in the long run. So essentially the process of incentivizing amazing new research for funding is the primary issue, and adding incentives for pointing out issues would just be a band-aid.
This sounds like a good critique of naive incentive schemes.
I don't think there is any doubt that humans follow incentives.
But working out what the core incentive problems are, and actually changing them might be both (1) intellectually difficult, and (2) challenge some sacred beliefs and strong power structures, thus making it practically impossible.
The HBR article's discussion of incentives is not really quite what I was thinking of when I wrote my comment. Specifically, the article you cite refers to the well-known phenomenon of how introducing extrinsic rewards via positive reinforcement is counterproductive in the long run. I've often noticed this form of "incentive" / reward being offered in the gamification of open science, such as via the Mozilla Open Science Badges [0], which in my opinion are a waste of time, effort, and money that do little to address systemic problems with scientific publishing.
With regard to the issue of grad students being unwilling to come forward and report mistakes, incentives wouldn't be added, but rather positive punishment [1] would be removed, which would then allow rewards for intrinsically motivated [2] actions.
> And yes, I'm definitely interested in doing video
As someone familiar with the libraries space, I'd actually be very interested in seeing a machine learning model that could deal with "cleaning up" old film (I've actually brought this up w/ several of my ML friends occasionally). One of the biggest challenges in the world of media preservation is migrating analogue content to digital media before physical deterioration kicks in. Oftentimes, libraries aren't able to migrate content quickly enough, and you end up with frames that have been partially eaten away by mold.
As a heads-up, these are some of the problems you might encounter on the film front (which you might not otherwise find with photos due to differences in materials used, etc):
I believe that Peter Jackson's recent endeavour in cleaning up WW1 footage employs significant ML for de-noising, frame interpolation, and colorising. I haven't seen the final film, but some of the clips are staggeringly good: https://www.bbc.com/news/av/entertainment-arts-45884501/pete...
I'm actually not sure much ML was involved here - depends where you draw the line I guess, but denoising and interpolation for restoration typically use more traditional wavelet and optical flow algorithms. The work for this was done by Park Road Post and StereoD, which are established post-production facilities using fairly off-the-shelf image processing software. The colorisation likely leant heavily on manual rotoscoping, in the same way that post-conversion to stereo 3D does.
I'd love to hear otherwise, but I'm not aware of any commercial "machine learning" for post-production aside from the Nvidia OptiX denoiser and one early beta of an image segmentation plugin.
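For anyone curious what the "more traditional" optical flow approach looks like in practice, here's a minimal sketch of flow-based frame interpolation using OpenCV. To be clear, this is just an illustration of the general technique, not a claim about what Park Road Post or StereoD actually used, and the function name and parameters are my own:

    import cv2
    import numpy as np

    def interpolate_midframe(frame_a, frame_b):
        """Very naive synthesis of a frame halfway between frame_a and frame_b."""
        gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
        gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
        # Dense Farneback flow: per-pixel displacement from frame_a to frame_b.
        flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = gray_a.shape
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        # Backward-warp frame_a halfway along the flow field (a crude approximation;
        # real restoration pipelines handle occlusions and holes far more carefully).
        map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
        map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
        return cv2.remap(frame_a, map_x, map_y, cv2.INTER_LINEAR)

Stepping old footage up to 24 fps amounts to synthesizing in-between frames like this between original pairs, plus a great deal of manual cleanup.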
Huh, I recall seeing an article at one point (can't find the link) where it said or suggested that ML was involved. Of course this could have just been a journalist failing to make the distinction; I've seen everything from linear regression on up naively lumped into the ML bucket.
In any case the results are damned impressive -- can't say I've seen anything like it before.
One of the side effects of the corporatization of modern universities is that pretty much every scientific finding is accompanied by a PR fluff piece and scientific journalists usually just recycle aforementioned fluff piece. :/
> Those fields don't even know what their best practices are
"Best practices" are a chimera. The issue at hand isn't about what is "best", but whether or not a software engineer's "good enough" practices are more likely to achieve science's goals than a graduate student's "good enough" practices.
It's also disingenuous to claim that classical music composition doesn't have "best practices" when the field of music theory exists as an explicit manifestation of "best practices" in music. Having gone to a school with a conservatory, I also believe I know several individuals who would disagree with your mindset that the creative process can't be managed. Indeed, if creativity, as it relates to musical composition, couldn't be managed, most orchestras would be brimming with anger at the number of commissions that weren't finished on time for the concert, and most Hollywood studios and Broadway shows would screech to a halt.
> When I see stuff around notebooks for "reproducibility", I'm a bit confused in that notebooks often don't specify any guidance on installation and dependencies, let alone things like arguments and options that a regular old script would.
At the core of this, as some others may have already alluded to, is that many academic scientists have not been socialized to make a distinction between development and production environments. Jupyter notebooks are clearly beneficial for sandboxing and trying out analyses creatively (with many wrong turns) before running "production" analyses, which ideally should be the ones that are reproducible. For many scientific papers, the analysis stops at "I was messing around in SPSS and MATLAB at 3 AM and got this result" without much consideration for reformulating what the researcher did and rewriting the code/scripts so that they can be re-run consistently.
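To make that concrete, here's a rough sketch of what "promoting" a 3 AM notebook exploration into a re-runnable script might look like. The script arguments, file names, and column names are all hypothetical, and the one-liner analysis stands in for whatever the notebook actually produced:

    import argparse
    import pandas as pd

    def run_analysis(input_csv, group_col, value_col):
        """Placeholder for the analysis that was originally done interactively."""
        df = pd.read_csv(input_csv)
        return df.groupby(group_col)[value_col].mean()

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Re-runnable version of the notebook analysis")
        parser.add_argument("input_csv", help="Path to the raw data file")
        parser.add_argument("--group-col", default="condition")
        parser.add_argument("--value-col", default="response")
        args = parser.parse_args()
        print(run_analysis(args.input_csv, args.group_col, args.value_col))

The point isn't the specific tooling; it's that inputs, options, and outputs become explicit instead of living in kernel state.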
> many academic scientists have not been socialized to make a distinction between development and production environments
Geologist here - definitely true in my field. Nonetheless, while I don't develop in notebooks at all, I do use them for "reproducibility" in a sense -- by putting a bit of dependency info in a github repo along with a .ipynb file, I can do things like this: https://mybinder.org/v2/gh/brenhinkeller/Chron.jl/master?fil...
Which ends up being useful, since a lot of folks in my field don't do any computational work at all, and being able to just click on a link and have something work in the browser is a big help.
This is kind of a broad observation, but scientists tend to borrow tools from a huge variety of fields, and use them in ways that seem un-disciplined to the practitioners of those fields. For instance, an engineer would be horrified to see me working in the machine shop without a fully dimensioned and toleranced drawing. A project manager would be disturbed to learn that I don't have a pre-written plan for my next task. How do I even know what I'm going to do? If we adopted the most disciplined processes from every field, we'd grind to a halt.
In fact, there might be something about what attracts people to be scientists rather than engineers, that makes us bristle at doing what engineers consider to be "good" engineering.
I agree that science can't be bound by the rigid structures of most applied disciplines, and that the freedom to combine technologies in novel ways is a pre-requisite to novel findings.
What I find objectionable is the inability of scientists to explicitly delegate tasks to domain specialists in their everyday work when it makes sense. I think it's unrealistic of you to believe that engineers always work with "a fully dimensioned and toleranced drawing" before starting work on a project, and that your work would "grind to a halt". Indeed, there's a reason for the qualifier "rapid" in the term "rapid prototyping". If you can give an engineer general specifications for what you want and then leave him/her alone, he/she should be able to produce something that mostly fits your needs while avoiding all of the pitfalls that wouldn't have occurred to you. It would also be incorrect to assume that engineering does not involve creativity and is purely bound by rigid processes - if your requirements were strange enough, something fresh would inevitably be built.
This sort of delegation, of course, is actually more efficient, since you can work on other tasks in parallel with the engineer (such as writing your next grant proposal or article, or *gasp* teaching). Most scientists also already do this implicitly by choosing to purchase instrumentation from manufacturers like Olympus, Philips, or Siemens rather than building it themselves.
Part of the reason I have such strong opinions about this is that I've actually witnessed scientists waste inordinate amounts of time messing around in fields where they were clearly out of their depth. As an example, there was a thread on a listserv in my (former) field that lasted for literally months and was devoted solely to the appearance of a website. Everyone wanted to turn the website design into an academic debate, when the website's creation (which had little to do with the substance of the scholarship itself) could have been turned over to a seasoned web developer and finished in a week or two.
But the dev/prod distinction has nothing to do with fitting some over-constrained engineering principle; it's about fitting actual science: if you cannot reproduce something, you don't have a result, you have a fluke.
I think the GP here is an insightful comment. Reproducing things is indeed important, but re-running code is much too narrow a definition, and possibly a distractingly narrow one.
Maybe your awful notebook gets the same answer you got the day before on the blackboard. Or the same answer your collaborator got independently, perhaps with different tools. Those might be great checks that you understand what you're doing. Spending time on them might be more valuable for finding errors than spending time on making one approach run without human intervention.
Not to say that there aren't some scientists who would benefit from better engineering. But it's too strong to say that fixing everything that looks wrong to an engineer's eyes is automatically a good idea.
I find that with Jupyter, re-running code does serve one useful purpose, which is to make sure that your result isn't affected by out-of-order execution or a global that you declared and forgot about. That is a real pitfall of Jupyter that has to be explained to beginners.
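For anyone who hasn't hit it yet, the pitfall looks something like this (a contrived illustration; the names are made up) - the notebook "works" only because of leftover kernel state:

    # Cell 1: run once early on, then later edited or deleted.
    threshold = 0.8

    # Cell 3: re-run many times while iterating.
    def count_hits(scores):
        # Silently depends on the global `threshold` from another cell.
        return sum(s > threshold for s in scores)

    print(count_hits([0.5, 0.9, 0.95]))  # prints 2 in the live kernel

If Cell 1 is later deleted or changed, Cell 3 keeps returning stale answers in the running kernel but fails (or silently differs) after Restart & Run All, which is exactly why the re-run check is worth doing.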
For my work, reproducing a result may involve collecting more data, because a notebook might be a piece of a bigger puzzle that includes hardware and physical data. This is where scripting is a two-edged sword. On the one hand, it's easy to get sloppy in all of the ways that horrify real programmers. On the other hand, scripting an experiment so it runs with little manual intervention means that you can run it several times.
Huge fan of just including an environment.yml for a conda virtual env in the repo you store your notebooks in, but the challenge there is that it's OS-specific reproducibility. I've had no luck creating a single .yml that works across operating systems, and the overhead of maintaining similar .ymls for (say) Mac and Windows is a lot unless you plan on sharing your notebook widely.
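One thing that sometimes helps (no guarantees, and the package list below is purely illustrative) is keeping the .yml trimmed to top-level dependencies with no build strings, either written by hand or via `conda env export --from-history`, so the solver is free to pick platform-appropriate builds on each OS:

    name: notebook-env
    channels:
      - conda-forge
    dependencies:
      - python=3.9
      - jupyterlab
      - numpy
      - pandas
      - matplotlib

It still won't save you from packages that simply aren't available on one platform, but it avoids the exact-build pins that make a full `conda env export` unshareable.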