
2D projections of complex multidimensional data are unreliable in the extreme as to what adjacency means. Most apparent adjacencies are an artifact of the chosen projection method.


This comment got me thinking: in some applications, Euclidean distance between feature vectors is a good proxy for adjacency/similarity. For such applications, an isometry from R^n to R^2 or R^3 should in principle preserve the meaning of adjacency. A quick Google yields techniques for quasi-isometric and isometric dimensionality reduction [0, 1]. These should mitigate artefacts of adjacency (or non-adjacency, as it were). In other words, you might actually be able to pull off good 2D projections of high-dimensional data and still see meaningful relationships. (Quick sketch after the links.)

[0] https://en.wikipedia.org/wiki/Isomap

[1] https://www.aaai.org/Papers/AAAI/2007/AAAI07-083.pdf
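A rough sketch of the idea with scikit-learn's Isomap (my toy example, not taken from those links): it approximates geodesic distances along the manifold via a k-nearest-neighbour graph, then embeds them (quasi-)isometrically with classical MDS.

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    # Toy manifold: a 2D sheet rolled up in 3D.
    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    # Geodesic distances over a k-NN graph, then classical MDS, so
    # distances along the manifold are approximately preserved in 2D.
    emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)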


Sammon mapping is another famous example; see [1] for instance for a nice visualization. (Rough sketch of the stress function below the link.)

[1] http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV09...
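If anyone wants to play with it numerically, here's a minimal sketch of Sammon's stress, minimized with a generic optimizer rather than Sammon's original iterative gradient scheme (only the definition of E comes from the method; everything else here is my own shorthand):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.distance import pdist

    def sammon_stress(y_flat, D, n, eps=1e-12):
        # E = (1 / sum_ij D_ij) * sum_ij (D_ij - d_ij)^2 / D_ij, where D are
        # pairwise distances in the original space and d in the 2D embedding.
        d = pdist(y_flat.reshape(n, 2))
        return np.sum((D - d) ** 2 / (D + eps)) / np.sum(D)

    def sammon(X, seed=0):
        n = X.shape[0]
        D = pdist(X)
        y0 = np.random.default_rng(seed).normal(size=n * 2)
        res = minimize(sammon_stress, y0, args=(D, n), method="L-BFGS-B")
        return res.x.reshape(n, 2)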


>> Provides us with a measure of the quality of any given transformed dataset. However, we still need to determine the optimal such dataset, in terms of minimising E. Strictly speaking, this is an implementation detail and the Sammon mapping itself is simply defined as the optimal transformation;

Somehow it's technically challenging to verify the content of this article.


I was referencing it mostly for the visualization of the "flower" that fails with PCA/linear mapping.

Sammon's original paper is here [1]. That said, from what I know Isomap is the more widespread tool, but I never found such a good visualization for it.

[1] http://theoval.cmp.uea.ac.uk/~gcc/matlab/sammon/sammon.pdf


For small distances, yes. If you plot a 2D projection of a dataset that doesn't have much structure, you're going to be reading patterns into white noise (though this data has some pretty clear clusters, which are probably real). If I were doing something other than writing a fun blog post, I would have done cluster analysis with something like DBSCAN (rough sketch below).
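Something like this (a sketch; eps and min_samples would need tuning for the actual feature matrix, and the data here is just a placeholder):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    # Placeholder for the real high-dimensional feature matrix.
    X = np.random.default_rng(0).normal(size=(500, 50))

    # DBSCAN clusters by density in the original space, so the verdict
    # doesn't depend on any 2D projection; label -1 marks noise points.
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(
        StandardScaler().fit_transform(X))
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)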


Also, this is t-SNE: https://en.wikipedia.org/wiki/T-distributed_stochastic_neigh...

The S is for "stochastic" -- i.e. you get a different 2D projection every time you run it on the same inputs. Take it with a grain of salt.
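Easy to see for yourself with scikit-learn (a sketch; with a random init, different seeds generally give visibly different layouts, and pinning random_state makes a run reproducible):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X = load_digits().data

    # Same data, different seeds: the two embeddings will generally
    # place and orient the clusters differently.
    emb_a = TSNE(n_components=2, init="random", random_state=0).fit_transform(X)
    emb_b = TSNE(n_components=2, init="random", random_state=1).fit_transform(X)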


>The S is for "stochastic" -- i.e. you get a different 2D projection every time you run it on the same inputs.

That's not the part that's "stochastic"; sensitivity to initial conditions is just nonconvex optimization in action. You get the same thing with most other local embeddings.

The stochastic bit is that the model is based on optimizing "the asymmetric probability, p_ij, that i would pick j as its neighbor"[0]. Those probabilities and the associated positions in 2D space are not estimated stochastically (e.g. with Monte Carlo sampling) or anything, though.

[0] https://www.cs.nyu.edu/~roweis/papers/sne_final.pdf
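For concreteness, that p_{j|i} is just a softmax over Gaussian affinities and is computed deterministically (a sketch; in practice each sigma_i is set by a binary search to hit a target perplexity, which I skip here):

    import numpy as np

    def sne_p(X, sigma):
        # p_{j|i}: probability that i picks j as its neighbour, from a
        # Gaussian centred on x_i; asymmetric since sigma_i varies per point.
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        logits = -sq / (2.0 * sigma[:, None] ** 2)
        np.fill_diagonal(logits, -np.inf)  # a point never picks itself
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        return w / w.sum(axis=1, keepdims=True)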



