Advanced Join Patterns for the Actor Model Based on CEP Techniques (programming-journal.org)
84 points by mpweiher on Nov 7, 2020 | 19 comments


I wish we used the actor model more. It seems like a much better alternative to green threads, semaphores, or locks and mutexes for parallelism, and with Moore's law struggling to keep going, we can no longer afford to write our code as synchronous code first instead of async or parallel by default, in my humble opinion.

My understanding is that goroutines are pretty popular for parallelism in Go, and at some point I'm going to try the equivalent feature of the language I'm currently using (F#'s MailboxProcessor) to put my money where my mouth is.
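For anyone unfamiliar, the core of the mailbox-processor idea is tiny: a queue plus a single consumer loop, so the handler's state is only ever touched sequentially. A minimal sketch (in Scala rather than F#; `Mailbox` and the counter are hypothetical names, not any library's API):

    import java.util.concurrent.LinkedBlockingQueue

    final class Mailbox[A](handle: A => Unit) {
      private val queue = new LinkedBlockingQueue[A]()

      // Senders enqueue and move on; they never block on processing.
      def post(msg: A): Unit = queue.put(msg)

      // One dedicated thread drains the mailbox sequentially.
      private val worker = new Thread(() => while (true) handle(queue.take()))
      worker.setDaemon(true)
      worker.start()
    }

    // Usage: a counter "actor" whose state is confined to the worker loop.
    var count = 0
    val counter = new Mailbox[Int](n => count += n)
    counter.post(1)
    counter.post(2)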

But it continues to baffle me why we don't make fuller use of the multiprocessing capabilities of the processors we have today and the fortunes of RAM we have now, even taking into account that we are using higher-level languages and not constantly profiling what we write for performance.

Surely there must be a way to make writing parallel-first code as natural to humans as our imperative code is today, right? Whether that be channels, actors, or something else?

Bonus: On a purely emotive level, anything with the words "complex" and "technique" in its name will have a rough time gaining mindshare among new users. If the goal is adoption of languages that sport the features from this article, they may want to pick a less scary-sounding name, simply for pragmatic reasons.


Nitpick: Actors or goroutines or async are mostly used to deal with concurrency. CUDA & friends are used for parallelism.

The paper: The motivation is a problem suitable to be solved via rule systems. It's a bit unclear why the rule system needs to be grafted on top of an actor model.

Actors: Programming with async messages is hard. Message ordering matters, there's no clear error reporting, no stack traces, and state is distributed. This is hardware hard, as in bring-in-formal-verification-methods hard. To be avoided in favor of a centralized coordinator solution if at all possible.

I wonder if, alternatively, one could use Rx processing over the stream of events.

Motivating example from the paper:

1. Turn on the lights in a room if someone enters and the ambient light is less than 40 lux.

2. Turn off the lights in a room after two minutes without detecting any movement.

3. Send a notification when a window has been open for over an hour.

4. Send a notification if someone presses the doorbell, but only if no notification was already sent in the past 30 seconds.

5. Detect home arrival or leaving based on a particular sequence of messages, and activate the corresponding scene.

6. Send a notification if the combined electricity consumption of the past three weeks is greater than 200 kWh.

7. Send a notification if the boiler fires three FloorHeatingFailure messages and one InternalFailure within the past hour, but only if no notification was sent in the past hour.


> The motivation is a problem suitable to be solved via rule systems

> I wonder if, alternatively, one could use Rx processing over the stream of events.

Seems like the right solution, and additionally doesn't need to be centralized, strictly speaking - you can ship the code for Rx operations anywhere.
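For concreteness, here's roughly what two of the paper's rules could look like as Rx-style pipelines. This uses Monix as one possible stream library; the sensor feeds and the effects are hypothetical stand-ins:

    import scala.concurrent.duration._
    import monix.eval.Task
    import monix.reactive.Observable

    // Hypothetical sensor feeds; in practice these wrap device callbacks.
    def motionEvents: Observable[Unit]   = Observable.never
    def doorbellEvents: Observable[Unit] = Observable.never

    // Rule 2: fire once two minutes pass without any movement event.
    val lightsOff: Observable[Unit] =
      motionEvents.debounce(2.minutes)
        .mapEval(_ => Task(println("lights off")))

    // Rule 4: notify on the doorbell, at most once every 30 seconds.
    val doorbellNotify: Observable[Unit] =
      doorbellEvents.throttleFirst(30.seconds)
        .mapEval(_ => Task(println("doorbell notification")))

The windowing and sequencing rules (5-7) are where plain Rx operators start to get awkward, which is presumably the gap the paper's join patterns target.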

> Message ordering matters, no clear error reporting, no stack traces, distributed state

It's funny. For all the useless Java "factory" patterns, the one thing there ISN'T a factory for - Futures - is the one thing you could use to easily solve your list of issues and really improve the quality of actor / async programming.

This is fundamentally what NewRelic and other instrumenting libraries do, though; there's also stuff like JetBrains' `Schedule` and `Execute`. It's a bit of a puzzle to me why this isn't just standardized.


> It's funny. For all the useless Java "factory" patterns, the one thing there ISN'T a factory for - Futures [...]

> It's a bit of a puzzle to me why this isn't just standardized.

It kind of is. A `FutureFactory` is pretty much `IO` (as in, Haskell's IO).

It just so happens that a lot of implementations don't want you to call `unsafeRun` directly (so that `unsafeRun` is only called at the edge of the world), and some implementations discourage it, but nothing stops you from having an `unsafeRunToFuture` that returns a `Future`. This is actually pretty common in Scala.

Some examples:

- Cats-effect: https://typelevel.org/cats-effect/datatypes/io.html#unsafeto...

- ZIO: https://javadoc.io/doc/dev.zio/zio_2.12/latest/zio/Runtime.h...

- Monix: https://monix.io/api/current/monix/eval/Task.html#runToFutur...

I guess that this is not very standardized in OOP languages because they lack some of the ergonomics to use such factories (namely, do-notation).
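To make that concrete, a small sketch with cats-effect 2 (matching the docs linked above; in cats-effect 3 the unsafe calls need an implicit IORuntime instead):

    import scala.concurrent.Future
    import cats.effect.IO

    // A description of a computation; nothing runs at this point.
    val program: IO[Int] = IO(println("running")).map(_ => 42)

    // The IO value acts as a Future factory: each unsafe call at the
    // edge of the world runs the effect and manufactures a fresh Future.
    val f1: Future[Int] = program.unsafeToFuture()
    val f2: Future[Int] = program.unsafeToFuture() // runs the effect again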


"better alternative to Green threads"

Do you have opinions about Java's Project Loom?

I'm a simple bear. I have always struggled with the architecture and organization of my async code. Stuff like exception handling, percolating errors up, rational back pressure and retry.

My goal is to rewrite some of my server code using Project Loom, to see how it helps.

"why we don't make fuller use of ... the fortunes of RAM we have now"

I know you mention this as an aside, but I'd appreciate examples.

I've been personally frustrated by teammates using external key/value stores for hot data sets which would trivially fit within RAM. But haven't had ready-made solutions.

One recurring objection, for which I've had no good response, is how to quickly hoist (prime, seed, better term needed here) the RAM of new instances.

Last time this came up, I helped come up with a compromise kludge using Redis. We added a local Redis instance to each web server, had the "master" create regular snapshots, copied that snapshot onto new EC2 instances during boot, used the snapshot to prime its local cache.

It worked pretty well. Though I would have preferred in-process, using Redis allowed better inspection and debugging, which was pretty cool. But devops-wise, this Rube Goldberg kludge broke most teammates' brains.

Thanks for listening. I keep looking to see if anyone's got a more turnkey solution.


"I've been personally frustrated by teammates using external key/value stores for hot data sets which would trivially fit within RAM. But haven't had ready-made solutions."

You don't need a ready-made solution unless you are trying to keep the data consistent, too. If you are, better to use something off the shelf. If you aren't... just deserialize it into whatever local data format makes sense, the same as you'd do reading from an external cache. You basically just have some push/poll mechanism to get data from your data store when updated/at intervals, deserialize it, and now it lives in RAM.
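As a sketch of that push/poll shape (in Scala; `fetchSnapshot` is a hypothetical stand-in for however you read the authoritative store):

    import java.util.concurrent.{Executors, TimeUnit}
    import java.util.concurrent.atomic.AtomicReference

    object HotCache {
      // The hot data set lives in process memory as an immutable map;
      // a read is a volatile load plus a hash lookup.
      private val current = new AtomicReference[Map[String, String]](Map.empty)

      def get(key: String): Option[String] = current.get.get(key)

      // Hypothetical: however you read the authoritative store.
      private def fetchSnapshot(): Map[String, String] = Map.empty

      // Poll at intervals and atomically swap in the fresh snapshot.
      private val scheduler = Executors.newSingleThreadScheduledExecutor()
      def start(): Unit = {
        scheduler.scheduleAtFixedRate(
          () => current.set(fetchSnapshot()), 0, 30, TimeUnit.SECONDS)
        ()
      }
    }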

The one downside of that is that if autoscaling is thrashing instances, it can thrash your data store (though that's true with Redis etc. too, depending on what is loading data into it; having a caching layer in between DOES decouple that), and also what happens if the data store is unreachable (I've done things like query the rest of the cluster for the data, just via a REST API, because the total amount needed was all of a few megs).


> Do you have opinions about Java's Project Loom?

My experience with puniverse/quasar, Loom's predecessor, has been positive overall. There's a whole actors framework built on top of it. The real problem is how clumsy it is to work with modern Futures code; you have to wrap everything.


We used the actor model through the Thespian Python library[0] for a project: a Raspberry Pi connecting through Bluetooth Low Energy (BLE) to a "fitness tracker" and streaming data to our backend through 4G dongles.

It had to be plug and play [non-technology-savvy users]. The Raspberry Pi was unattended and had to re-connect to the internet and to the device automatically, always, check its data, and pull in code updates automatically.

They were distributed geographically, in different time zones, with unstable internet.

Timezones are not easy to work with. I've had nightmares about them.

- [0]: https://github.com/kquick/Thespian


Having written Erlang/Elixir for several years now, I find its actor-like concurrency quite frankly stupid simple and a joy to work with.


> But it continues to baffle me why we don't make fuller use of the multiprocessing capabilities of the processors we have today and the fortunes of RAM we have now, even taking into account that we are using higher-level languages and not constantly profiling what we write for performance.

When I write parallel-first code in a fancy language, I find that I can get better absolute performance by just writing plain old single-threaded C. And when my data gets big, it is relatively easy to turn my C-algorithm-using-arrays into a C-algorithm-using-arrays-backed-by-mmap. Mmap is harder to use in every other language I've tried - or comes with additional worries.
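For comparison, the closest JVM analog I know is a memory-mapped file via MappedByteBuffer, which illustrates part of the ergonomics gap: every access is a method call with explicit byte offsets rather than plain pointer arithmetic (sketch in Scala, assuming data.bin already exists and is large enough):

    import java.nio.channels.FileChannel
    import java.nio.file.{Paths, StandardOpenOption}

    // Map an existing file into memory, read/write.
    val channel = FileChannel.open(Paths.get("data.bin"),
      StandardOpenOption.READ, StandardOpenOption.WRITE)
    val buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size())

    // Treat the mapping as an array of 64-bit slots.
    val x = buf.getLong(8 * 100)   // read slot 100
    buf.putLong(8 * 100, x + 1)    // increment it in place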

And often a single-threaded algorithm is still the way to go, even in a multi-user system, because you can usually let the 'next layer up' multiplex the requests for you, whether it's your web framework or nginx or whatever.

I think that might be the way to make 'fuller use of the multiprocessing capabilities of the processors we have today and the fortunes of RAM'.


Fwiw, first-class parallelism was one of the premises of Clojure, but in practice those features aren’t used very heavily. I’m not sure why that is.


I've never used Clojure's syntax, but my suspicion is that until first-class parallelism can be used with what feels like just an annotation or another type in imperative code paths, without having to actually change one's way of thinking much from the C abstract-machine concept, we won't get to the goal of parallel by default very quickly.
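Scala's parallel collections are about the closest thing I know of to that "just an annotation" feel: .par is the only change from the sequential pipeline (a sketch; since Scala 2.13 they live in the separate scala-parallel-collections module):

    // Needed on Scala 2.13+; on 2.12 .par is built in.
    import scala.collection.parallel.CollectionConverters._

    val xs = (1 to 1000000).toVector

    val s1 = xs.map(x => x.toLong * x).sum      // sequential
    val s2 = xs.par.map(x => x.toLong * x).sum  // parallel: one extra token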


Which actor models don't rely on green threads? I thought they all did.


I don't dispute that; I mean more that the actor model abstracts that away rather than doing manual green threads.


Not really in Erlang, since you manually build the actors with a recursive tail call. There's nothing called an "actor" in Erlang, but it's definitely an actor-based language.


Footnote from page 3:

> Elixir can be regarded as a modern Erlang (e.g., with macros) that runs atop BEAM; i.e., the Erlang virtual machine.

I've heard quite a few descriptions of Elixir relative to Erlang, but I'm not quite sure about this one. Erlang has macros[0], though the syntax and general format do differ from Elixir's[1]. And though Erlang has been around since at least 1986, it has changed a lot in that time and is what I'd consider a modern programming language today, used at places such as WhatsApp, Grindr, Pinterest, etc.

[0]: https://erlang.org/doc/reference_manual/macros.html#defining...

[1]: https://elixir-lang.org/getting-started/meta/macros.html#our...


I think a lot of people use "modern" to mean "doesn't look dated or unfamiliar to me." In programming language commentary, it generally means "looks like C or ALGOL."


Has anyone had experience with actor-based models in Julia, with libraries like Signal, Observables, or Rocket?


I find their approach to comparing implementation complexity for the developer across languages interesting. I quote:

State management, "code that is used to save temporary data required by the ongoing coordination process"

Windowing management, "code needed to discard messages that do not satisfy the pattern’s timing constraints"

Sequencing control, "code to enforce a particular message order"

Pattern definition, "code used to express the type of messages to be synchronized and their content-based conditions"

In their solution they free the programmer from most of these lines of code.



