A quick introduction to data parallelism in Julia (juliafolds.github.io)
160 points by amkkma on Oct 6, 2020 | 15 comments


Hi, the author here.

One of the hidden messages of the introduction is: watch Guy Steele's talks [1][2][3] if you are interested in data parallelism! These talks are not Julia-specific, and the ideas are applicable and very useful in any language. My libraries are heavily inspired by these talks.

Of course, if you haven't used Julia yet, it'd be great if (data) parallelism gives you an excuse to try it out! It has a fantastic foundation for composable multi-threaded parallelism [4].

[1] How to Think about Parallel Programming: Not! https://www.infoq.com/presentations/Thinking-Parallel-Progra...

[2] Four Solutions to a Trivial Problem https://www.youtube.com/watch?v=ftcIcn8AmSY

[3] Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful https://vimeo.com/6624203

[4] Announcing composable multi-threaded parallelism in Julia https://julialang.org/blog/2019/07/multithreading/


I love Julia, and in many areas both the language and standard libraries are quite polished.

However, simple parallelism based on threads (e.g. a parallel map) surprisingly requires a third party library. ThreadsX, the one you posted about, is also my favorite option.

Besides, the standard pipe operator (|>) doesn't support partial application. This has led to several third-party reimplementations via macros, transducers, etc. Since this is a basic feature, the fragmentation is a bit worrying.
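
To make the limitation concrete, here is a minimal sketch that uses only Base. The input values are made up for illustration, and `Base.Fix1` is shown as one built-in workaround, not as a substitute for the macro packages mentioned above.

```julia
xs = [1, 4, 9, 16]

# `|>` applies a one-argument function to the value on its left.
total = xs |> sum

# There is no placeholder syntax such as `map(sqrt, _)`, so partially applying
# a two-argument function takes an explicit anonymous function...
roots = xs |> (v -> map(sqrt, v))

# ...or Base.Fix1, which fixes the first argument:
# Base.Fix1(map, sqrt)(v) is the same call as map(sqrt, v).
root_total = xs |> Base.Fix1(map, sqrt) |> sum
```

The parentheses around the anonymous function matter once the pipeline continues, because `->` otherwise captures everything to its right.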


Yeah, I get the point that it's nice to have basic things in the core language and stdlib. I'm rather on the "minimal language core" side in this discussion, and I think it'd be better not to start adding more things to Julia Base and stdlib. At least not right now, to avoid accumulating "dead batteries." There are some non-trivial things to figure out for a composable data-parallel interface. For example, what should the data collection interface be to support folds over collections? I'm using a minimalistic interface defined in SplittablesBase.jl [1] for this, but it's not widely used outside my packages. So I'm not super confident that it's enough yet.

[1] https://github.com/JuliaFolds/SplittablesBase.jl
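
As a rough illustration of why a "split in half" operation can be enough to support folds, here is a sketch using only Base. The names `halve_sketch`, `treefold`, and `basesize` are hypothetical and written for this example; this is not the SplittablesBase.jl implementation, and it assumes `op` is associative and the collection is non-empty.

```julia
# Hypothetical stand-in for a splittable-collection interface:
# return the left and right halves of a vector without copying.
function halve_sketch(xs::AbstractVector)
    mid = (firstindex(xs) + lastindex(xs)) ÷ 2
    return view(xs, firstindex(xs):mid), view(xs, mid+1:lastindex(xs))
end

# Divide-and-conquer fold: needs only `length` and `halve_sketch` from the
# collection. Small chunks fall back to a sequential mapreduce.
function treefold(op, f, xs; basesize = 4)
    basesize >= 1 || throw(ArgumentError("basesize must be at least 1"))
    length(xs) <= basesize && return mapreduce(f, op, xs)
    left, right = halve_sketch(xs)
    # The two recursive calls are independent, so a parallel version could
    # run them as separate tasks; here they run sequentially.
    return op(treefold(op, f, left; basesize = basesize),
              treefold(op, f, right; basesize = basesize))
end

sum_of_squares = treefold(+, x -> x^2, 1:10)
joined = treefold(*, string, 1:10; basesize = 2)  # order is preserved
```

Because the left result is always combined before the right one, the sketch also works for associative operations that are not commutative, such as string concatenation.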


Just to echo Tkf here, I think it'd be great to have something like ThreadsX.map in base someday, but when the multi-threading features in Julia first came out, it was unclear what the best interface to provide would be.

Since Julia is committed to semantic versioning, this is a kinda scary prospect because it means that any high level, exported interface we decide on, we'll be stuck with until at least 2.0 and probably forever.

So with things like this, it's important to be conservative about the interfaces we expose; instead, we've had people explore different approaches out in the package ecosystem. One day, if someone can make a strong enough argument, I'm sure we'll see one of these package solutions end up in Base.


Yes, I agree getting design choices right is important in order not to end up with a lot of legacy code in future versions of Julia.

I think the pipe operator is probably the area that needs most urgent attention. Lots of libraries have their own slightly different macros for improved pipes with partial application, and that introduces a bit of fragmentation and hurts composability.

There's an open issue for this with some background: https://github.com/JuliaLang/julia/issues/5571


Given your experience, what primitives would you advocate for JS for data parallelism?


For composable nested parallelism, I think spawn and sync like Julia has would be great. If there were a variant of spawn that asserts that no concurrency is required for the child task, I'd imagine the runtime could do more optimizations (Julia doesn't have it, but there was an experimental PR [1]). But concurrency sounds like a very basic thing in JavaScript, so I'm not sure it fits well in the language. I don't have experience in serious JavaScript programming, though, so I don't know what fits best there. I'd imagine you can already build some data-parallel libraries with the Web Workers API.

[1] https://github.com/JuliaLang/julia/pull/31086
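
For readers who haven't seen Julia's spawn and sync, a minimal sketch of nested use follows. `Threads.@spawn`, `fetch`, and `@sync` are standard Julia (1.3 or later); the function `psum` and its `basesize` parameter are hypothetical names chosen for this example.

```julia
# Recursive parallel sum: each level spawns a task for the left half and
# computes the right half in the current task.
function psum(xs::AbstractVector, basesize::Int = 1_000)
    basesize >= 1 || throw(ArgumentError("basesize must be at least 1"))
    n = length(xs)
    n <= basesize && return sum(xs)
    mid = firstindex(xs) + n ÷ 2 - 1
    left = Threads.@spawn psum(view(xs, firstindex(xs):mid), basesize)
    right = psum(view(xs, mid+1:lastindex(xs)), basesize)
    return fetch(left) + right
end

# Nesting: the outer loop spawns tasks that themselves spawn tasks inside
# `psum`. `@sync` waits for every task spawned in its body.
results = zeros(Int, 4)
@sync for i in 1:4
    Threads.@spawn results[i] = psum(1:i*1000)
end
```

The code runs correctly with a single thread as well; starting Julia with more threads (for example `julia -t 4`) lets the scheduler distribute the tasks.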


Takafumi has really built up some amazing infrastructure in the package ecosystem. His Transducers.jl [1] is really interesting and powerful and lately he's done a lot of work with things like FLoops.jl [2] and ThreadsX.jl [3] to try and bring the benefits of transducers to more 'regular' familiar representations so more people can enjoy the benefits. The basic idea behind all of it is that he has an efficient and modular way of describing various 'looping' constructs that can be stuck together, optimized and parallelized automatically.

It'd be quite interesting to see this stuff extended to GPUs.

[1] https://github.com/JuliaFolds/Transducers.jl

[2] https://github.com/JuliaFolds/FLoops.jl

[3] https://github.com/tkf/ThreadsX.jl


I really like Baselet.jl[1] which provides Tuple-specialised implementations for a bunch of Base APIs that the compiler is very good at unrolling :)

[1]: https://github.com/tkf/Baselet.jl


Tkf has so many cool packages! Less to do with transducers or parallelism, but in terms of just cool and useful stuff, Maybe [1] and BangBang [2] definitely come to mind as well.

[1] https://github.com/tkf/Maybe.jl

[2] https://github.com/JuliaFolds/BangBang.jl


One that I really thought was cool recently was UnderscoreOh.jl [1]. He has a real knack for finding crazy fun things you can do with Julia syntax.

[1] https://github.com/tkf/UnderscoreOh.jl


Another one that's forward looking: https://github.com/tkf/Mutabilities.jl


There's a current MIT course called Introduction to Computational Thinking that's using Julia. I've watched a handful of videos so far. Good introduction to Julia.

https://computationalthinking.mit.edu/Fall20/


One of the lecturers working on this is Grant Sanderson, of 3Blue1Brown fame. Great video series so far!


Data parallelism is one of Python's most embarrassing shortcomings which is unlikely to be solved anytime soon.

This could be a big reason to switch!



