Looking for ideas on what direction to take. I am working on a PR to fix the @sync/@async disappearing exception problem. Right now, if an async task throws an exception while another async task is waiting on it, and that second task comes earlier in the task manager queue, the system silently hangs. This is because the parent sync task is never rescheduled after the exception is thrown, and it is looking in the wrong place for the error anyway.
I have a structure that solves this: each Task’s thunk is wrapped in a way that guarantees the generating Task is scheduled on completion, whether or not an exception was thrown. The sync task can then look for exceptions on all running tasks in parallel. This passes all relevant Channel, Task, Thread and Distributed tests.
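To make the idea concrete, here is a minimal sketch of that kind of wrapper. It is not the code from the PR: it reports completion through a Channel instead of rescheduling the generating Task directly, and the names `notifying_task` and `done` are made up for illustration.

```julia
# Hypothetical helper (not from the PR): wrap a thunk so that the waiting
# side is always notified, whether the thunk returns or throws.
function notifying_task(thunk, done::Channel)
    Task() do
        try
            put!(done, (:ok, thunk()))
        catch err
            # The exception is handed back instead of being lost with the task.
            put!(done, (:error, err))
        end
    end
end

# Buffered so that the child tasks never block on put!.
done = Channel{Tuple{Symbol,Any}}(2)

t1 = notifying_task(() -> error("boom"), done)
t2 = notifying_task(() -> 42, done)
schedule(t1)
schedule(t2)

# The waiting side collects one report per child and can inspect all of
# them for errors, regardless of the order in which the children finish.
results = [take!(done) for _ in 1:2]
failures = filter(r -> r[1] === :error, results)
```

The point of the try/catch wrapper is that the waiter gets woken up on every exit path, so a throwing child cannot leave it parked forever.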
So far, so good. But while running through the corners of the test suite, I ran into a serialization issue. The Task serializer tries to serialize the thunk. In doing so, it captures the wrapper closure described above, which includes a reference to the generating task. So it tries to descend and serialize that task as well, but that task could still be running (as in the test case), in which case an error is raised for trying to serialize a running task.
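A small sketch of how I understand the failure, assuming the Serialization stdlib rejects running tasks as described above. The names `captured`, `thunk` and `outcome` are only for illustration, and the exact behavior may vary between Julia versions.

```julia
using Serialization

# The currently running task stands in for the "generating" task.
captured = current_task()

# A closure that holds a reference to that task, much like the wrapper
# closure would.
thunk = let parent = captured
    () -> parent
end

# This task has not been started, so on its own it is a candidate for
# serialization.
t = Task(thunk)

io = IOBuffer()
outcome = try
    # Serializing t descends into its thunk, and from there into the
    # captured task, which is still running.
    serialize(io, t)
    :serialized
catch err
    @info "serializer rejected the task" err
    :rejected
end
```

If the serializer behaves as described, `outcome` should end up as `:rejected`, even though `t` itself was never started.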
What does one use the serialization of Tasks for? As written, the serializer disallows any Task whose closure contains other (possibly running) Tasks. Normally I’d consider modifying the serializer, but I am hoping to keep this fix transparent to the user, and that sounds to me like it is crossing a line. Any suggestions on which way to go here are appreciated.