You can do it using a runtime with scheduling and fibers.
The issue is that you want to be able to save the entire thread stack cheaply so that you can switch to another task at an async yield point, and then go back to your previous stack frame. You want to do this without spawning system threads.
Say you have an f -> g -> h call stack that blocks on IO. If all the functions are in fact async state machines (or some other kind of callbacks) your thread stack will look like this:
Executor.loop() -> Executor.run(h) -> h.await()
That is, your functions are really objects that are queued up and taken as needed by the executor. If h has a yield point, you can put it back on the queue, go back to the loop and pick up some other work. Later you can do the above again and await h to again up to the next yield point.
Now consider the case where the middle function g is NOT async. In that case the call from f to g will be a normal function call that gets put on your call stack. In a language where this is allowed, you'll still have to wait or execute h and you'll get a call stack that looks like,
(where our awaitUntilCompletion is minimally just calling h.await in a loop until h is finished). At this point, you are stuck: you can't yield from g() because it is a normal function with a normal stack frame, so your thread has to wait till g() finishes before it can be used for anything else. You are basically blocking a thread on h. At this point you might spawn a new thread to keep your thread pool count up (which is expensive and memory hungry) or just accept the forced synchronous blocking (which reduces throughput).
If this happens over and over again, you either end up running out of memory or end up running code in a synchronous fashion but with worse performance (because of all the state machines). This is why functions are usually coloured, so that you don't get yourself in this accursed state by accident. AFAIK the alternative is to allocate all of your call stacks in the heap so that you can switch them in and out of kernel threads, which is what fibers are. This requires a complicated runtime with a scheduler such as in Loom,
The issue is that you want to be able to save the entire thread stack cheaply so that you can switch to another task at an async yield point, and then go back to your previous stack frame. You want to do this without spawning system threads.
Say you have an f -> g -> h call stack that blocks on IO. If all the functions are in fact async state machines (or some other kind of callbacks) your thread stack will look like this:
Executor.loop() -> Executor.run(h) -> h.await()
That is, your functions are really objects that are queued up and taken as needed by the executor. If h has a yield point, you can put it back on the queue, go back to the loop and pick up some other work. Later you can do the above again and await h to again up to the next yield point.
Now consider the case where the middle function g is NOT async. In that case the call from f to g will be a normal function call that gets put on your call stack. In a language where this is allowed, you'll still have to wait or execute h and you'll get a call stack that looks like,
Executor.loop() -> Executor.run(f) -> f.await() -> g() -> h.awaitUntilCompletion()
(where our awaitUntilCompletion is minimally just calling h.await in a loop until h is finished). At this point, you are stuck: you can't yield from g() because it is a normal function with a normal stack frame, so your thread has to wait till g() finishes before it can be used for anything else. You are basically blocking a thread on h. At this point you might spawn a new thread to keep your thread pool count up (which is expensive and memory hungry) or just accept the forced synchronous blocking (which reduces throughput).
If this happens over and over again, you either end up running out of memory or end up running code in a synchronous fashion but with worse performance (because of all the state machines). This is why functions are usually coloured, so that you don't get yourself in this accursed state by accident. AFAIK the alternative is to allocate all of your call stacks in the heap so that you can switch them in and out of kernel threads, which is what fibers are. This requires a complicated runtime with a scheduler such as in Loom,
https://cr.openjdk.org/~rpressler/loom/Loom-Proposal.html