It seems like a plain anonymization problem, much like voting with a secret ballot. If I encountered this problem in real life, I would just use slips of paper and a box with a slit in it. One way to strengthen it would be to remove the handwriting, for instance by giving each digit a multiple-choice 0-9 row to circle instead of writing it.
The average company a few years from now will probably be as profitable as the average company last year. But that's survivorship bias: some of the companies from last year might not be around anymore, and it's hard to predict which. I think that fear might be driving lower valuations.
The market reacts to expected news, not just current news. The fact that bad news is coming might already be priced in. It's hard to say how bad a piece of news would have to be to make expectations worse.
The market hates uncertainty most of all, and nothing about this pandemic or its economic impact is certain. And there is definitely no good news coming anytime soon that could drive a serious uptrend.
Does it still make sense to talk about what people expect when discussing the markets? Isn't the vast majority of the action these days coming from algorithms that try to predict the very, very near future?
Furthermore, the parent is assuming investors are rational. In an environment like the current one, when people see their portfolios drop 10% in a single day and worry about their lives, they aren't making rational pros-vs-cons decisions. They aren't calculating future value.
They are panicking and doing whatever they can to make their lizard brains feel better.
Important to note that the datasets used are sparse, and that the key to this algorithm is better exploitation of sparsity. The GPU-over-CPU advantage is a lot smaller if you need sparse operations, even with conventional algorithms.
"It should
be noted that these datasets are very sparse, e.g., Delicious
dataset has only 75 non-zeros on an average for input fea-
tures, and hence the advantage of GPU over CPU is not
always noticeable."
In other words, they got a good speedup on their problem, but it might not apply to your problem.
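To put numbers on that sparsity (with made-up dimensions, since I don't remember the exact Delicious feature count), here is roughly what ~75 non-zeros per example means for storage:

    import numpy as np
    from scipy import sparse

    n_features = 500_000          # hypothetical feature dimension
    nnz_per_row = 75              # ~75 non-zeros per example, as the paper notes
    n_rows = 1_000

    # Random sparse batch with ~75 non-zeros per row.
    density = nnz_per_row / n_features
    X = sparse.random(n_rows, n_features, density=density, format="csr", dtype=np.float32)

    dense_bytes = n_rows * n_features * 4
    sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
    print(dense_bytes / sparse_bytes)   # the dense layout is a few thousand times larger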
WaveNet, if I remember correctly, uses a 1-of-256 encoding for its input features, and a 1-of-256 encoding for its output features.
It is extremely sparse.
If you look at language modeling, things there are even sparser - a typical neural language model uses a 1-of-several-hundred-thousand encoding for the full vocabulary (for Russian, for example, it is in the range of 700K-1.2M words, and it is much worse for Finnish and German) and 1-of-a-couple-of-tens-of-thousands for a byte-pair-encoded vocabulary (most languages have an encoding that reduces the token count to about 16K distinct tokens; see [1] for such an example).
The image classification task also has sparsity at the output and, if you implement it as an RNN, sparsity at the input (1-of-256 encoding of intensities).
Heck, you can engineer your features to be sparse if you want to.
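To make the vocabulary numbers above concrete, here's a toy sketch (using the made-up 700K case): the "one-hot" input is really just an integer index, and only materializing it as a dense vector blows up the memory:

    import numpy as np

    vocab_size = 700_000            # roughly the full-vocabulary case mentioned above
    token_ids = np.array([17, 42_000, 699_999])   # hypothetical token indices

    # Sparse view: three int64 indices, 24 bytes total.
    print(token_ids.nbytes)

    # Dense one-hot view of the same information: 3 x 700_000 float32s, ~8.4 MB.
    one_hot = np.zeros((len(token_ids), vocab_size), dtype=np.float32)
    one_hot[np.arange(len(token_ids)), token_ids] = 1.0
    print(one_hot.nbytes)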
I also think that this paper is an example of "if you do not compute it, you do not have to pay for it", just like in the GNU grep case [2].
Embedding tables aren't hard on the GPU (being only a lookup table), and the output softmax still requires you to do the full matrix multiply. The label may be sparse, but the computation is far from sparse.
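A minimal NumPy sketch of that point (all shapes made up): the input side is a cheap table lookup, but the output projection is a full dense matmul no matter how sparse the label is:

    import numpy as np

    vocab, hidden, batch = 50_000, 512, 32    # hypothetical sizes

    embedding = np.random.randn(vocab, hidden).astype(np.float32)
    out_proj  = np.random.randn(hidden, vocab).astype(np.float32)

    token_ids = np.random.randint(0, vocab, size=batch)

    h = embedding[token_ids]                  # sparse input: just a gather from the table
    logits = h @ out_proj                     # dense compute: (batch x hidden) @ (hidden x vocab)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True) # full softmax over all 50K outputs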
> The reverse is true, embeddings are both the performance and memory-footprint bottleneck of modern NN models.
They may be a bottleneck, but the alternative is worse -- you can't fit complex models with large vocabularies into GPU memory using sparse one-hot encodings.
Technically, the sparse one-hot encoding is the most efficient in terms of memory footprint. You simply store the non-zero coordinates.
The problem in practice for GPUs is that sparse vector/matrix operations are too inefficient.
The whole point of something like this paper is to skip the entire 'densification' step and deal with the sparse matrix input directly as a sparse matrix. The LSH used in this paper improves on directly using SpMSpV, since that is also inefficient on CPUs, although to a lesser extent than on GPUs.
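For what it's worth, this is what "stay sparse all the way" looks like with plain scipy SpMSpV - not the paper's LSH trick, and the sizes are invented:

    import numpy as np
    from scipy import sparse

    n_in, n_out = 100_000, 1_000                # hypothetical layer sizes

    # Sparse weight matrix and a sparse input vector stored as (index, value) pairs.
    W = sparse.random(n_out, n_in, density=0.01, format="csr", dtype=np.float32)
    idx = np.array([3, 1_017, 99_998])
    val = np.array([0.5, 1.0, 2.0], dtype=np.float32)
    x = sparse.csc_matrix((val, (idx, np.zeros_like(idx))), shape=(n_in, 1))

    y = W @ x                                   # sparse-times-sparse, no densification step
    print(y.shape, y.nnz)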
> If you look at language modeling, things there are even sparser - a typical neural language model uses a 1-of-several-hundred-thousand encoding for the full vocabulary
Most real-world models don't use one-hot encodings of words -- they use embeddings instead, which are very dense vector representations of words. Outside of the fact that embeddings don't blow out GPU memory, they're also semantically encoded, so similar words cluster together.
First, you need to compute these embeddings at least once, and that computation starts from the sparse representation - so there is your sparsity. Second, these embeddings may differ between tasks, and the accuracy you get from them may differ too.
For example, the embeddings produced by CBOW and skipgram word2vec models are strikingly different in the cosine-similarity sense: which classes of words come out as similar differs between CBOW and skipgram.
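A sketch of the kind of comparison I mean - the two embedding matrices here are random stand-ins, not real CBOW/skipgram outputs:

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    rng = np.random.default_rng(0)
    vocab, dim = 10_000, 100                    # made-up sizes
    emb_cbow     = rng.standard_normal((vocab, dim))
    emb_skipgram = rng.standard_normal((vocab, dim))

    word_a, word_b = 12, 345                    # hypothetical word indices
    print(cosine(emb_cbow[word_a], emb_cbow[word_b]))
    print(cosine(emb_skipgram[word_a], emb_skipgram[word_b]))
    # The same word pair can land at very different similarities under the two models.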
So you agree that the problem is fundamentally sparse? Embeddings are used to make sparse (e.g. categorical) data possible on GPUs, and real-world models are limited by how large they can make the embeddings while still fitting in GPU memory. Embedding lookups are also a compute bottleneck.
Why aren't GPUs better at sparse matrix math? Generally, sparse operations are memory-bandwidth limited, but GPUs/TPUs still have much faster memory than CPUs and more memory bandwidth in general (roughly a factor of 4 or so between the latest CPUs and GPUs).
Sparse matrix math basically boils down to indirect array references: A[B[i]]. GPUs generally trade memory latency for bandwidth - they accept higher latency and rely on having a lot of other work available to hide it. But because there's no work between the first and second load, you can no longer hide the memory latency of the second load with extra work.
CPUs, by contrast, have a deep caching hierarchy geared toward minimizing memory latency, so the second load doesn't take as long as it does on a GPU.
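The access pattern in question, written out in NumPy (array sizes are arbitrary):

    import numpy as np

    A = np.random.rand(10_000_000)
    B = np.random.randint(0, len(A), size=1_000_000)

    # Two dependent loads: B[i] has to come back from memory before A[B[i]] can even start,
    # and the A accesses land at effectively random addresses.
    gathered = A[B]

    # Compare with a contiguous read, which streams through memory at full bandwidth.
    contiguous = A[:1_000_000].copy()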
Yeah, on the GPU you ideally need each thread to load consecutive memory locations to utilize the memory bandwidth properly. Random indexing blows this out of the water. I guess you could pre-process on the CPU, though, to pack the sparse stuff for better GPU efficiency.
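A rough sketch of that CPU-side packing idea - just sorting the lookups so the gather walks memory mostly in order; whether it actually pays off depends on the data:

    import numpy as np

    A = np.random.rand(10_000_000)
    B = np.random.randint(0, len(A), size=1_000_000)

    order = np.argsort(B)              # pre-process: sort the lookups by address
    sorted_gather = A[B[order]]        # now the reads are mostly sequential

    result = np.empty_like(sorted_gather)
    result[order] = sorted_gather      # scatter back into the original order
    assert np.array_equal(result, A[B])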
Another thing to note is that sparsity is being leveraged even to build more efficient hardware. A good example of this is the Cerebras wafer-scale chip that was announced recently. I'm assuming the author was unaware of developments on the hardware side of things.
I think you're comparing bicycles to an idealized version of horses. Real horses get tired quickly, get hurt easily, and are quite expensive. Assuming decent roads, a bicycle would often be more convenient, and almost always cheaper.
From Wikipedia: "A stagecoach traveled at an average speed of about 5 miles per hour".
> Assuming decent roads, a bicycle would often be more convenient, and almost always cheaper.
I think you're assuming an idealised version of roads. In all seriousness, good roads were usually cobblestones and bad roads were rutted tracks until bicycle pioneers started lobbying for roads paved smoothly enough to make their then-expensive hobby more practical for transportation than horses and horse-drawn vehicles. Asphalt and Portland cement roads were new technologies in the nineteenth century too.
"Big" is relative. If a company starts falling apart as soon as the founder can't personally manage everyone, it has a disease of bigness, even though it's still small on some absolute scale.
No, because computer chips don't operate at 100°C (let alone 300°C like a nuclear reactor). Low-temperature heat flows contain little usable work. Typically the best option is to dump the spare heat into the nearest body of water, or into the air.
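Back-of-the-envelope Carnot limit, assuming a ~70 °C chip hot spot and ~25 °C ambient (both numbers picked purely for illustration):

    # Carnot efficiency: the hard upper bound on how much of a heat flow
    # can be turned back into work, given hot/cold temperatures in kelvin.
    t_hot = 70 + 273.15       # ~70 C chip, an assumed figure
    t_cold = 25 + 273.15      # ~25 C ambient
    eta_max = 1 - t_cold / t_hot
    print(f"{eta_max:.0%}")   # ~13%, and real heat-recovery hardware gets nowhere near the limit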
This is a problem of science journalism. We don't want all scientists' predictions to agree - there should be a range that reflects the real uncertainty. But combine this with the demand for big headlines, and you have a broken way of conveying predictions to the public.
I think so too. But why don't we ever hear the scientific community come out and stand up for themselves? I imagine that would be considered news in and of itself.
I reckon it's because there is a social and possibly economic price to pay for saying anything that detracts from the sense of emergency around the climate. The culture has become toxic, similar to that of the 'woke' wing of the left or the pro-gun wing of the right.
If anyone wants the general public to take this issue as seriously as it deserves, FUD is not the way to go. I think it hurts more than it helps.
If scientists knew how to fix science journalism, we would. It's not just climate. Cancer research and diet research have the same problem, and if battery capacity had doubled for every headline saying it had, we would be charging our phones once a year. But the science journalists argue that they're not lying, just picking the most interesting stories, and that otherwise no one would read them at all. If scientists came out with a statement that science journalism is a fraud, it might be sensational enough to make the news, but it wouldn't be true. If they said science journalism needs reform - well, I'm sure you can find plenty of blog posts arguing this, but not a lot of media coverage.
> If scientists came out with a statement that science journalism is a fraud, it might be sensational enough to make the news, but it wouldn't be true.
You're arguing the extremes here. Scientific organizations could easily publicize their displeasure with how their work is communicated to the public without calling the media fraudsters. Give me a break.
If you see smart people not doing something they could easily do, maybe it's not so easy? Example: the American Council on Science and Health put together a ranking of science journalism sources, putting Nature at the top, and Nature responded by telling them they weren't helping.
I admit I'm speculating a bit when I try to explain why reforming science journalism is difficult, but I'm pretty sure empirically that it is difficult.
The climate is a non-linear dynamic system. Even if you had a perfect computer model of the earth's climate system, if your input parameters were off by even the slightest bit you'd be way off base in 5, 10, or 50 years. That's why weather forecasts are useless more than about 10 days out, why we can't predict hurricane movement more than a couple of days out, and why our uncertainty increases as a function of time. This stuff is way more nuanced and complex than anyone lets on. You can't even begin to explain it in a one-part article... and meanwhile, fearful headlines remain top click targets.
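The standard toy demonstration of that sensitivity is the logistic map - nothing to do with real climate models, just a picture of how fast a tiny input error blows up:

    # Two runs of the logistic map x -> r*x*(1-x), differing only in the 10th decimal place.
    r = 3.9
    x, y = 0.5, 0.5 + 1e-10

    for step in range(60):
        x = r * x * (1 - x)
        y = r * y * (1 - y)

    print(abs(x - y))   # after ~60 steps the two trajectories have completely decorrelated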
Temperatures of 37 C and 36.6 C are barely distinguishable with an ordinary thermometer, and both are within the normal range of human body temperatures (roughly 36.1-37.2 C).