It seems like a plain anonymization problem, much like voting with a secret ballot. If I encountered this problem in real life, I would just use slips of paper and a box with a slit in it. One way to strengthen it would be to remove the handwriting, for instance by giving each digit a multiple-choice 0-9 row to circle instead of writing it.
The average company a few years from now will probably be as profitable as the average company last year. But that's survivorship bias: some of the companies from last year might not be around anymore, and it's hard to predict which. I think that fear might be driving lower valuations.
The market reacts to expected news, not just current news. The fact that bad news is coming might already be priced in. It's hard to say how bad a piece of news would have to be to make expectations worse.
The market hates uncertainty most of all, and nothing about this pandemic or its economic impact is certain. And there is definitely no good news coming anytime soon that could drive a serious uptrend.
Does it still make sense to talk about what people expect when discussing the markets? Isn't the vast majority of the action these days coming from algorithms that try to predict the very, very near future?
Furthermore, the parent is assuming investors are rational. In an environment like the current one, when people see their portfolios drop 10% in a single day and worry about their lives, they aren't making rational pros-vs-cons decisions. They aren't calculating future value.
They are panicking and doing whatever they can to make their lizard brains feel better.
Important to note that the datasets used are sparse, and that the key to this algorithm is better exploitation of sparsity. The GPU-over-CPU advantage is a lot smaller if you need sparse operations, even with conventional algorithms.
"It should
be noted that these datasets are very sparse, e.g., Delicious
dataset has only 75 non-zeros on an average for input fea-
tures, and hence the advantage of GPU over CPU is not
always noticeable."
In other words, they got a good speedup on their problem, but it might not apply to your problem.
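To put numbers on that sparsity (with made-up dimensions, since I don't remember the exact Delicious feature count), here is roughly what ~75 non-zeros per example means for storage:

    import numpy as np
    from scipy import sparse

    n_features = 500_000          # hypothetical feature dimension
    nnz_per_row = 75              # ~75 non-zeros per example, as the paper notes
    n_rows = 1_000

    # Random sparse batch with ~75 non-zeros per row.
    density = nnz_per_row / n_features
    X = sparse.random(n_rows, n_features, density=density, format="csr", dtype=np.float32)

    dense_bytes = n_rows * n_features * 4
    sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
    print(dense_bytes / sparse_bytes)   # the dense layout is a few thousand times larger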
WaveNet, if I remember correctly, uses a 1-of-256 encoding for its input features, and a 1-of-256 encoding for its output features.
It is extremely sparse.
If you look at language modeling, things there are even sparser - a typical neural language model uses a 1-of-several-hundred-thousand encoding for the full vocabulary (for Russian, for example, it is in the range of 700K-1.2M words, and it is much worse for Finnish and German) and 1-of-a-couple-of-tens-of-thousands for a byte-pair-encoded vocabulary (most languages have an encoding that reduces the token count to about 16K distinct tokens; see [1] for such an example).
The image classification task also has sparsity at the output and, if you implement it as an RNN, sparsity at the input (1-of-256 encoding of intensities).
Heck, you can engineer your features to be sparse if you want to.
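To make the vocabulary numbers above concrete, here's a toy sketch (using the made-up 700K case): the "one-hot" input is really just an integer index, and only materializing it as a dense vector blows up the memory:

    import numpy as np

    vocab_size = 700_000            # roughly the full-vocabulary case mentioned above
    token_ids = np.array([17, 42_000, 699_999])   # hypothetical token indices

    # Sparse view: three int64 indices, 24 bytes total.
    print(token_ids.nbytes)

    # Dense one-hot view of the same information: 3 x 700_000 float32s, ~8.4 MB.
    one_hot = np.zeros((len(token_ids), vocab_size), dtype=np.float32)
    one_hot[np.arange(len(token_ids)), token_ids] = 1.0
    print(one_hot.nbytes)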
I also think that this paper is an example of "if you do not compute it, you do not have to pay for it", just like in the GNU grep case [2].
Embedding tables aren't hard on the GPU (being only a lookup table), and the output softmax still requires you to do the full matrix multiply. The label may be sparse, but the computation is far from sparse.
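A minimal NumPy sketch of that point (all shapes made up): the input side is a cheap table lookup, but the output projection is a full dense matmul no matter how sparse the label is:

    import numpy as np

    vocab, hidden, batch = 50_000, 512, 32    # hypothetical sizes

    embedding = np.random.randn(vocab, hidden).astype(np.float32)
    out_proj  = np.random.randn(hidden, vocab).astype(np.float32)

    token_ids = np.random.randint(0, vocab, size=batch)

    h = embedding[token_ids]                  # sparse input: just a gather from the table
    logits = h @ out_proj                     # dense compute: (batch x hidden) @ (hidden x vocab)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True) # full softmax over all 50K outputs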
> The reverse is true, embeddings are both the performance and memory-footprint bottleneck of modern NN models.
They may be a bottleneck, but the alternative is worse -- you can't fit complex models with large vocabularies into GPU memory using sparse one-hot encodings.
Technically, the sparse one-hot encoding is the most efficient in terms of memory footprint. You simply store the non-zero coordinates.
The problem in practice for GPUs is that sparse vector/matrix operations are too inefficient.
The whole point of something like this paper is to skip the entire 'densification' step and deal with the sparse matrix input directly as a sparse matrix. The LSH used in this paper improves on directly using SpMSpV, since that is also inefficient on CPUs, although to a lesser extent than on GPUs.
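For what it's worth, this is what "stay sparse all the way" looks like with plain scipy SpMSpV - not the paper's LSH trick, and the sizes are invented:

    import numpy as np
    from scipy import sparse

    n_in, n_out = 100_000, 1_000                # hypothetical layer sizes

    # Sparse weight matrix and a sparse input vector stored as (index, value) pairs.
    W = sparse.random(n_out, n_in, density=0.01, format="csr", dtype=np.float32)
    idx = np.array([3, 1_017, 99_998])
    val = np.array([0.5, 1.0, 2.0], dtype=np.float32)
    x = sparse.csc_matrix((val, (idx, np.zeros_like(idx))), shape=(n_in, 1))

    y = W @ x                                   # sparse-times-sparse, no densification step
    print(y.shape, y.nnz)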
> If you look at language modeling, things there are even sparser - a typical neural language model uses a 1-of-several-hundred-thousand encoding for the full vocabulary
Most real-world models don't use one-hot encodings of words -- they use embeddings instead, which are very dense vector representations of words. Outside of the fact that embeddings don't blow out GPU memory, they're also semantically encoded, so similar words cluster together.
First, you need to compute these embeddings at least once, and that computation starts from the sparse representation - so there is your sparsity. Second, these embeddings may differ between tasks, and the accuracy you get from them may differ too.
For example, the embeddings produced by CBOW and skipgram word2vec models are strikingly different in the cosine-similarity sense: which classes of words come out as similar differs between CBOW and skipgram.
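A sketch of the kind of comparison I mean - the two embedding matrices here are random stand-ins, not real CBOW/skipgram outputs:

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    rng = np.random.default_rng(0)
    vocab, dim = 10_000, 100                    # made-up sizes
    emb_cbow     = rng.standard_normal((vocab, dim))
    emb_skipgram = rng.standard_normal((vocab, dim))

    word_a, word_b = 12, 345                    # hypothetical word indices
    print(cosine(emb_cbow[word_a], emb_cbow[word_b]))
    print(cosine(emb_skipgram[word_a], emb_skipgram[word_b]))
    # The same word pair can land at very different similarities under the two models.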
So you agree that the problem is fundamentally sparse? Embeddings are used to make sparse (e.g. categorical) data possible on GPUs, and real-world models are limited by how large they can make the embeddings while still fitting in GPU memory. Embedding lookups are also a compute bottleneck.
Why aren't GPUs better at sparse matrix math? Generally, sparse operations are memory-bandwidth limited, but GPUs/TPUs still have much faster memory than CPUs and more memory bandwidth in general (roughly a factor of 4 or so between the latest CPUs and GPUs).
Sparse matrix math basically boils down to indirect array references: A[B[i]]. GPUs generally trade memory latency for bandwidth - they accept higher latency and rely on having a lot of other work available to hide it. But because there's no work between the first and second load, you can no longer hide the memory latency of the second load with extra work.
CPUs, by contrast, have a deep caching hierarchy geared toward minimizing memory latency, so the second load doesn't take as long as it does on a GPU.
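The access pattern in question, written out in NumPy (array sizes are arbitrary):

    import numpy as np

    A = np.random.rand(10_000_000)
    B = np.random.randint(0, len(A), size=1_000_000)

    # Two dependent loads: B[i] has to come back from memory before A[B[i]] can even start,
    # and the A accesses land at effectively random addresses.
    gathered = A[B]

    # Compare with a contiguous read, which streams through memory at full bandwidth.
    contiguous = A[:1_000_000].copy()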
Yeah, on the GPU you ideally need each thread to load consecutive memory locations to utilize the memory bandwidth properly. Random indexing blows this out of the water. I guess you could pre-process on the CPU, though, to pack the sparse stuff for better GPU efficiency.
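A rough sketch of that CPU-side packing idea - just sorting the lookups so the gather walks memory mostly in order; whether it actually pays off depends on the data:

    import numpy as np

    A = np.random.rand(10_000_000)
    B = np.random.randint(0, len(A), size=1_000_000)

    order = np.argsort(B)              # pre-process: sort the lookups by address
    sorted_gather = A[B[order]]        # now the reads are mostly sequential

    result = np.empty_like(sorted_gather)
    result[order] = sorted_gather      # scatter back into the original order
    assert np.array_equal(result, A[B])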
Another thing to note is that sparsity is being leveraged even to build more efficient hardware. A good example of this is the Cerebras wafer-scale chip that was announced recently. I'm assuming the author was unaware of developments on the hardware side of things.
I think you're comparing bicycles to an idealized version of horses. Real horses get tired quickly, get hurt easily, and are quite expensive. Assuming decent roads, a bicycle would often be more convenient, and almost always cheaper.
From Wikipedia: "A stagecoach traveled at an average speed of about 5 miles per hour".
> Assuming decent roads, a bicycle would often be more convenient, and almost always cheaper.
I think you're assuming an idealised version of roads. In all seriousness, good roads were usually cobblestones and bad roads were rutted tracks until bicycle pioneers started lobbying for roads paved smoothly enough to make their then-expensive hobby more practical for transportation than horses and horse-drawn vehicles. Asphalt and Portland cement roads were new technologies in the nineteenth century too.
"Big" is relative. If a company starts falling apart as soon as the founder can't personally manage everyone, it has a disease of bigness, even though it's still small on some absolute scale.
No, because computer chips don't operate at 100°C (let alone 300°C like a nuclear reactor). Low-temperature heat flows contain little usable work. Typically the best option is to dump the spare heat into the nearest body of water, or into the air.
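Back-of-the-envelope Carnot limit, assuming a ~70 °C chip hot spot and ~25 °C ambient (both numbers picked purely for illustration):

    # Carnot efficiency: the hard upper bound on how much of a heat flow
    # can be turned back into work, given hot/cold temperatures in kelvin.
    t_hot = 70 + 273.15       # ~70 C chip, an assumed figure
    t_cold = 25 + 273.15      # ~25 C ambient
    eta_max = 1 - t_cold / t_hot
    print(f"{eta_max:.0%}")   # ~13%, and real heat-recovery hardware gets nowhere near the limit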
This is a problem of science journalism. We don't want all scientists' predictions to agree - there should be a range that reflects the real uncertainty. But combine this with the demand for big headlines, and you have a broken way of conveying predictions to the public.
I think so too. But why don't we ever hear the scientific community come out and stand up for themselves? I imagine that would be considered news in and of itself.
I reckon it's because there is a social and possibly economic price to pay for saying anything that detracts from the sense of emergency around the climate. The culture has become toxic, similar to that of the 'woke' wing of the left or the pro-gun wing of the right.
If anyone wants the general public to take this issue as seriously as it deserves, FUD is not the way to go. I think it hurts more than it helps.
If scientists knew how to fix science journalism, we would. It's not just climate. Cancer research and diet research have the same problem, and if battery capacity had doubled for every headline saying it had, we would be charging our phones once a year. But the science journalists argue that they're not lying, just picking the most interesting stories, and that otherwise no one would read them at all. If scientists came out with a statement that science journalism is a fraud, it might be sensational enough to make the news, but it wouldn't be true. If they said science journalism needs reform - well, I'm sure you can find plenty of blog posts arguing this, but not a lot of media coverage.
> If scientists came out with a statement that science journalism is a fraud, it might be sensational enough to make the news, but it wouldn't be true.
You're arguing the extremes here. Scientific organizations could easily publicize their displeasure with how their work is communicated to the public without calling the media fraudsters. Give me a break.
If you see smart people not doing something they could easily do, maybe it's not so easy? Example: the American Council on Science and Health put together a ranking of science journalism sources, putting Nature at the top, and Nature responded by telling them they weren't helping.
I admit I'm speculating a bit when I try to explain why reforming science journalism is difficult, but I'm pretty sure empirically that it is difficult.
The climate is a non-linear dynamic system. Even if you had a perfect computer model of the earth's climate system, if your input parameters were off by even the slightest bit you'd be way off base in 5, 10, or 50 years. That's why weather forecasts are useless more than about 10 days out, why we can't predict hurricane movement more than a couple of days out, and why our uncertainty increases as a function of time. This stuff is way more nuanced and complex than anyone lets on. You can't even begin to explain it in a one-part article... and meanwhile, fearful headlines remain top click targets.
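The standard toy demonstration of that sensitivity is the logistic map - nothing to do with real climate models, just a picture of how fast a tiny input error blows up:

    # Two runs of the logistic map x -> r*x*(1-x), differing only in the 10th decimal place.
    r = 3.9
    x, y = 0.5, 0.5 + 1e-10

    for step in range(60):
        x = r * x * (1 - x)
        y = r * y * (1 - y)

    print(abs(x - y))   # after ~60 steps the two trajectories have completely decorrelated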
Temperatures of 37 C and 36.6 C are barely distinguishable with an ordinary thermometer, and both are within the normal range of human body temperatures (roughly 36.1-37.2 C).