How to avoid data dependencies between Actors

Thoric · 12-23-2015

I'm late to the AF party so this might be viewed as a trivial question:

"What's the recommended approach for avoiding deadlocks when requesting data between actors?"

I know AF is about asynchronous processes, but there are times when one actor needs to know something, and needs to know it quickly, in order to continue its duties. That's a synchronous dependency between actors. Sending a message request for some data and sitting waiting for the reply can create a deadlock. In other circumstances I would create a shared resource, perhaps, but I believe that's a frowned-upon circumvention of the AF task tree principle.

I like examples, they help me picture a problem, so let's imagine an application that includes a user interface, and the application has configuration information. The application has a nested View actor responsible for presenting information and dealing with user interaction, and a nested ConfigData actor responsible for the configuration information, both launched by a parent actor.

At launch, the nested ConfigData actor will initialise by loading the configuration information from file, and will be solely responsible for maintaining the information. The information includes a historical list of projects opened by the application. The nested View actor needs to provide a customised run-time menu that includes the list of most recently opened projects (much like that in LabVIEW with the File > Recent Projects item). When the user selects the File menu item, it needs to quickly populate the menu with the latest list of Projects, which is information stored in the ConfigData actor.

Should the View actor send a synchronous message to the ConfigData actor asking for the up-to-date list of most recent projects and wait for the response, perhaps using "Send Message and Wait for Response"? This creates a potential deadlock condition, a code smell, which we really should avoid. But how can we populate the menu items with the latest data if we don't ask for it from the ConfigData actor and wait for the reply? It's good practice to have controlled data managed in one singular place, like the ConfigData actor, but is that part of the problem? If we implement a shared resource in the ConfigData actor that can be read by the View actor, such as an FGV, does this circumvent the original intent of the AF by creating another communication layer?

I think this example is fairly trivial, but it highlights something I don't clearly understand yet: how to properly implement data communication between Actors in the AF when a reply is required before processing can continue. Looking forward to some insightful replies!

Thoric (CLA, CLED, CTD and LabVIEW Champion)

drjdpowell · 12-27-2015

The most AF-approved answer would (I think) be to "push" rather than "pull". Have the ConfigData actor keep others updated on the current list of Recent Projects. The View actor would then already have the latest list stored when the User opens the menu.

An alternate answer I might give is that there is no possibility of deadlock if your actors have a clear hierarchy where the lower levels provide services to the upper levels, without making any demands on those above them. ConfigData is providing a service to View, so it must be at a lower level in the "service" hierarchy. View can be dependent on ConfigData, including requiring ConfigData to reply in a reasonable time, but ConfigData shouldn't be dependent on View, or any other actor above it. In particular, it should never be waiting for any actor above it. There can be no deadlock if synchronous waiting can only go in one direction.
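
LabVIEW block diagrams cannot be shown as text, so here is a minimal Python sketch of the one-directional synchronous request described above. The class `ConfigData`, the helper `request_recent`, the message strings, and the use of `queue.Queue` as a stand-in for an actor's enqueuer are all assumptions made for illustration; this is not AF code. The point it tries to show is that the service loop only ever replies and never blocks on a caller, so a wait in the upward-to-downward direction cannot form a cycle.

```python
import queue
import threading


class ConfigData:
    """Lower-level service actor: answers requests, never waits on a caller."""

    def __init__(self, recent_projects):
        self._recent = list(recent_projects)
        self.inbox = queue.Queue()  # stands in for the actor's enqueuer
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply_to = self.inbox.get()
            if msg == "stop":
                break
            if msg == "get_recent":
                reply_to.put(list(self._recent))  # reply, then carry on

    def stop(self):
        self.inbox.put(("stop", None))
        self._thread.join()


def request_recent(config, timeout=1.0):
    """Synchronous request made by a higher-level actor such as View."""
    reply = queue.Queue(maxsize=1)
    config.inbox.put(("get_recent", reply))
    # Raises queue.Empty if the service does not answer in time.
    return reply.get(timeout=timeout)


config = ConfigData(["a.lvproj", "b.lvproj"])
recent = request_recent(config)
config.stop()
```

The timeout is a design choice in this sketch: even with a strict hierarchy, a caller probably wants a bounded wait so that a stalled service shows up as an error rather than a hang.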

dsavir · 12-30-2015

I don't know if this is good for your example, but I like the subscribe/publish method: View would register with ConfigData for changes to the recent project list, so it would receive the current list at launch and receive updates whenever it changes. This method is excellent for hardware data, but I'm not sure it would be as good for GUI actions like the one in your example.
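
A rough Python sketch of that registration idea, assuming a hypothetical `subscribe` method that immediately sends the current list and a `project_opened` method that pushes every change. The class names follow the thread, but the methods, the topic string, and the use of `queue.Queue` for the View's inbox are invented for illustration and do not correspond to AF messages.

```python
import queue


class ConfigData:
    """Owns the recent-projects list and pushes every change to subscribers."""

    def __init__(self, recent):
        self._recent = list(recent)
        self._subscribers = []

    def subscribe(self, inbox):
        self._subscribers.append(inbox)
        # A new subscriber gets the current value straight away.
        inbox.put(("recent_projects", tuple(self._recent)))

    def project_opened(self, path):
        if path in self._recent:
            self._recent.remove(path)
        self._recent.insert(0, path)
        for inbox in self._subscribers:
            inbox.put(("recent_projects", tuple(self._recent)))


class View:
    """Keeps a local copy, so building the menu never has to ask anyone."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.recent = ()

    def handle_pending(self):
        while True:
            try:
                topic, payload = self.inbox.get_nowait()
            except queue.Empty:
                return
            if topic == "recent_projects":
                self.recent = payload

    def build_file_menu(self):
        return ["Open...", *self.recent]


config = ConfigData(["old.lvproj"])
view = View()
config.subscribe(view.inbox)
config.project_opened("new.lvproj")
view.handle_pending()
menu = view.build_file_menu()
```

With this arrangement the menu is built from the View's cached copy, so opening the File menu involves no request and no wait.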

Actually, in your example I would have kept a list of recent projects in the View actor or alternatively in the top actor, or a file (where it is clear that View only reads the file and ConfigData only writes to it).

Good luck!

Danielle

"Wisdom comes from experience. Experience is often a result of lack of wisdom."
― Terry Pratchett

Thoric · 01-03-2016

Thank you for your answers. Perhaps my example was too trivial. Let's scale it up, in several dimensions. Now we have several actors, and each has a need to receive some information from each of the others in order to fulfill its own duties. This information must be up to date when processed in each actor. This is a highly cohesive scenario.

Options:

"Push" - this approach declares that any update to any data that's required by another actor be immediately shared through a message. Upsides: All actors that need to know are immediately made aware of the new data, no pulling (request messages), no deadlocks, no enforced synchronicity. Downsides: Masses of unnecessary messaging announcing new data (nagging?). If a dependent actor needs particular information once in a while, but the actor responsible for the data is updating it much more often, then a large proportion of the messages are unnecessary. The idea of sending potentially hundreds of messages per second without justification leaves me uncomfortable as we might be affecting the system performance, especially as the solution grows.

"Enforced Dependency Hierarchy" - by ensuring there is a defined dependency hierarchy to the actor tree we can prevent deadlocks by creating an associated mono-directional synchronicity. If an actor has information that is required by another, a hierarchy must be defined that elevates the requesting actor above the data source actor. There can be no deadlock if the synchronous waiting can only go in one direction. Upsides: No deadlocks, no nagging (see "Push"). Downsides: Truly asynchronous actors oughtn't be restricted by an artificial data dependency. The above example requires data dependencies between all actors, so no hierarchy can be imposed (all actors are peers). Early decisions to create a relative hierarchy structure between actors could limit the versatility and usefulness of a solution, reducing extensibility. I wouldn't want an architecture that promises great flexibility to be straitjacketed like this.

"Publish/Subscribe" - Actors register with each other for updates on specific data. I guess this depends on the implementation, but presumably to abide by the actor framework task tree philosophy one would continue to use the defined messaging structure to pass data between actors. Therefore the proposal is very similar to the "Push" option above, with the same Upsides and Downsides?

"Shared Resource" - Bypass the task tree messaging principle and create a dedicated shared resource for the data. Any actor can choose to read from the shared resource, but only the actor responsible for the data can write. The shared resource could be a CVT, NSV, Global, FGV, data file etc. Upsides: No messaging (certainly no nagging), processor-efficient (depending on approach). Downsides: Circumvents the actor framework task tree, can create coupling between actors, might be complex when scaled up for large solutions.

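As a sketch of the "Shared Resource" option above, here is one way the single-writer, many-readers rule could be enforced in Python. `RecentProjectsStore` and its read-only handle are hypothetical names; in LabVIEW the equivalent might be an FGV or DVR wrapped so that only the owning actor has access to the write method.

```python
import threading


class RecentProjectsStore:
    """Shared resource: the owning actor writes, everyone else only reads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = ()

    def write(self, items):
        # Only the owning actor (e.g. ConfigData) should hold the store itself.
        with self._lock:
            self._items = tuple(items)

    def _read(self):
        with self._lock:
            return self._items

    def reader(self):
        """Hand this out to other actors instead of the store."""
        return ReadOnlyHandle(self)


class ReadOnlyHandle:
    """Exposes read() only, so readers cannot modify the data."""

    def __init__(self, store):
        self._store = store

    def read(self):
        return self._store._read()


store = RecentProjectsStore()       # held by ConfigData
handle = store.reader()             # given to View
store.write(["a.lvproj", "b.lvproj"])
snapshot = handle.read()
```

Storing an immutable tuple means a reader always sees a complete snapshot, but the coupling concern raised above still applies: every reader now depends on the shape of this one store.
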
I appreciate there's no silver bullet, but when considering frameworks I like to know what the right approach is to particular challenges. I think I already know that the answer is specific to each challenge, and as I explore AF more I'll realise that there is no singular right answer, but in each case some options are more right than others.

Thanks for the discussion chaps 🙂

Thoric (CLA, CLED, CTD and LabVIEW Champion)

drjdpowell · 01-03-2016

Thoric wrote:
Perhaps my example was too trivial. Let's scale it up, in several dimensions. Now we have several actors, and each has a need to receive some information from each of the others in order to fulfill its own duties. This information must be up to date when processed in each actor. This is a highly cohesive scenario.

Can you give such an example? I'm going to be Devil's Advocate and state that for any example you give there is a quite natural hierarchy of who should "know" and depend on who.

Added later: More to the point, I see "actors" as internally cohesive but loosely coupled to other actors. A "highly cohesive" constellation of interacting actors is the opposite of what we want. How do you test multiple actors that are all highly dependent on each other? How do you reuse anything? Why make them multiple actors (instead of less-restrictive "helper loops" in a single actor) if they aren't meaningfully separable in some way?

Thoric · 01-03-2016

drjdpowell wrote:

Can you give such an example? I'm going to be Devil's Advocate and state that for any example you give there is a quite natural hierarchy of who should "know" and depend on who.

Interesting hypothesis. Another example? Oooh, I like examples. Let me see.

OK, take the original example with a View actor and a ConfigData actor in an n actor system. Information that's pertinent to both actors might be, as an example, the View's window bounds.

Circumstance 1: At launch, ConfigData actor loads from file the last known window position and size (bounds). When View fires up it requires this information in order to draw its UI at the last known location and therefore requests it from ConfigData. The response is synchronously tied to the process, i.e. it cannot show its window until the data is known.

Circumstance 2: At shutdown, ConfigData needs to save to file the current window position and size, which it must request from the View actor. Again, the response is required before ConfigData can save and close its data to file.

Both circumstances require a response to an information request, therefore each is equally dependent on the other for data, preventing you from declaring one as hierarchically superior. This "hierarchical equality" is what I was getting at earlier.

Perhaps one weakness in the example here is that the opposing requests are unlikely to happen together, but it's not impossible for the system to begin a shutdown whilst launching the View window, so a possible deadlock hasn't been avoided.
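
The deadlock being described can be reproduced in a few lines of Python. This is only an illustration under assumed names: the `actor` function and the `queue.Queue` inboxes are hypothetical stand-ins for two AF actors that each send a request and then block for the reply before servicing their own inbox. A timeout is added purely so that the demonstration terminates; without it both threads would wait forever.

```python
import queue
import threading


def actor(name, peer_inbox, results):
    """Send a request to the peer, then block for the reply.

    Neither actor services its own inbox while it is waiting, so
    neither request is ever answered.
    """
    reply = queue.Queue(maxsize=1)
    peer_inbox.put(reply)  # the request carries the reply queue
    try:
        results[name] = reply.get(timeout=0.2)
    except queue.Empty:
        results[name] = "timed out"


view_inbox = queue.Queue()
config_inbox = queue.Queue()
results = {}

threads = [
    # View asks ConfigData for the window bounds (launch).
    threading.Thread(target=actor, args=("View", config_inbox, results)),
    # ConfigData asks View for the window bounds (shutdown).
    threading.Thread(target=actor, args=("ConfigData", view_inbox, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both entries in `results` end up as `"timed out"`, because each request sits unread in the other actor's inbox while that actor is itself blocked.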

Thoric (CLA, CLED, CTD and LabVIEW Champion)

drjdpowell · 01-03-2016

Thoric wrote:

When View fires up it requires this information in order to draw its UI at the last known location and therefore requests it from ConfigData. The response is synchronously tied to the process, i.e. it cannot show its window until the data is known.
At shutdown, ConfigData needs to save to file the current window position and size, which it must request from the View actor. Again, the response is required before ConfigData can save and close its data to file.

I would say that you have taken application-level responsibilities and spread them across multiple actors. A higher-level actor should be in sole charge of startup and shutdown. View should stand ready to receive a configuration, and to make its configuration available when asked. ConfigData should save the data it's told to save and serve up the data as requested. Beyond that you are reducing the cohesiveness of those actors and adding unnecessary coupling between them. You're also making it harder to modify startup and shutdown, since such application features aren't centralized in one place. And your actor system is made unnecessarily complicated.

Added later: a rewrite of the above quote, introducing a higher-level actor called "App":

When App fires up View, it needs to configure it before showing it, so it makes a synchronous request to ConfigData, and sends the result to View.
At shutdown, App needs to save the configuration, so it synchronously requests this from View and sends it to ConfigData, then shuts down both actors and exits.

So App can make synchronous requests to the other two actors, but neither View nor ConfigData need ever make any requests on any other actor. App is the natural home of application-level things and can deal with startup and shutdown in a clean and simple synchronous way.
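
A small Python sketch of that arrangement, with plain method calls standing in for synchronous messages and a dict standing in for the configuration file. All method names (`load`, `save`, `apply_config`, `get_config`, `startup`, `shutdown`) and the JSON encoding are assumptions chosen for illustration. Note that `App` only passes an opaque string between the other two and never looks inside it.

```python
import json


class ConfigData:
    """Stores opaque blobs by key; a dict stands in for the config file."""

    def __init__(self):
        self._store = {}

    def load(self, key):
        return self._store.get(key)

    def save(self, key, blob):
        self._store[key] = blob


class View:
    """Accepts a configuration and reports its own when asked."""

    def __init__(self):
        self.bounds = {"x": 0, "y": 0, "w": 800, "h": 600}
        self.visible = False

    def apply_config(self, blob):
        if blob is not None:
            self.bounds = json.loads(blob)

    def get_config(self):
        return json.dumps(self.bounds)

    def show(self):
        self.visible = True


class App:
    """Sole owner of startup and shutdown; never inspects the blob."""

    def __init__(self, config, view):
        self.config = config
        self.view = view

    def startup(self):
        self.view.apply_config(self.config.load("view"))
        self.view.show()

    def shutdown(self):
        self.config.save("view", self.view.get_config())


config = ConfigData()
first = View()
app = App(config, first)
app.startup()
first.bounds["x"] = 120        # the user drags the window
app.shutdown()

second = View()                # next launch of the application
App(config, second).startup()
```

Neither `View` nor `ConfigData` refers to the other here, so each could be tested or replaced on its own.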

Thoric · 01-03-2016

drjdpowell wrote:
I would say that you have taken application-level responsibilities and spread them across multiple actors. A higher-level actor should be in sole charge of startup and shutdown.

Absolutely, and if there's a higher-level actor dealing with requests for data then the deadlock can be avoided. But that adds complexity. There are now three parties involved in this simple transaction. Scale this up to n actors, three or more tiers deep, and we'll find that App Actor suddenly becomes overloaded with an awareness of everything. One of the main attractions of AF is that the actors are all asynchronous, and with that comes an assumed autonomy. I've never liked the idea that for every individual tiny procedure the root App must be involved in some way - it takes away the componentised nature of actors that I like so much. Take an analogy:

I, Thoric, am working at my desk. Think of me as an actor. There are n colleagues (actors) in my office, and we all have different line managers (parent actors), who are governed by a boss (App actor). If I need, for example, to borrow a pencil sharpener for an hour (lock a resource in hardware), I would simply ask my colleague directly to borrow it. I don't ask my line manager to ask the big boss to ask their line manager to ask my colleague for the pencil sharpener, for them to say yes and give it to their line manager who passes it onto the big boss who decides that's good and forwards it back down to me through my line manager. Although I can see how the logical order gives clarity and control, I'm overwhelmed by the wastage and potential for lag. It requires big boss (App) to know everything about everything that's going on. If he starts to become unresponsive from the sheer volume of requests and replies, the system falls apart.

I'd much prefer to simply lean over and ask my colleague for the pencil sharpener. However, that means I need to know about my colleague's resources (high coupling), and have a direct channel of communication with them (avoiding the task tree messaging principle). So we can't just lean over and ask, we need to go through the layers of carefully designed abstraction that mean our components (actors) can operate independently. If that means massively increased communication volume then I guess it has to be tolerated.

But I'm still concerned that App actor (big boss) will be massive. It needs to share the typedefs of View and ConfigData, and all the other actors sharing data through messages. It may need to massage (not message, massage) the data, which requires processor time - stack up hundreds of these a second and we could hit a thread bottleneck. A good worker (actor) is able to be resourceful and independent, so I always imagined the ideal solution would be that the root actor (App) launches View and ConfigData and leaves them to autonomously get on with their roles. If they need data from one another then they talk to one another without concerning App. Wouldn't an extensible system (think plugin based) need to be able to add functionality without adding to root App? Consider the View actor one of many various user interfaces that could be launched, each with their own specialities. If each required the App actor to be aware of their needs, an extensible plugin based approach would be impossible.

So ideally I wouldn't like the main App to be so heavily involved in the detail. It adds bloat, restricts extensibility and potentially takes bigger steps towards system performance limits.

Thoric (CLA, CLED, CTD and LabVIEW Champion)

drjdpowell · 01-04-2016

Thoric wrote:

Scale this up to n actors, three or more tiers deep, and we'll find that App Actor suddenly becomes overloaded with an awareness of everything.
...
But I'm still concerned that App actor (big boss) will be massive. It needs to share the typedefs of View and ConfigData, and all the other actors sharing data through messages.

Eh what?!? Awareness? Sharing typedefs!?! Why would you be sharing typedefs? You need abstractions. Simplifications.

To take your example. Actors are much simpler than actual humans. As a "worker" actor, who occasionally needs to sharpen a pencil, your world is this:

You are started.

You are sometimes sent a configuration object (in a format you understand) that you use to configure yourself.

You are sent something that allows you to request resources you need (such as a pencil sharpener).

You are given tasks to complete.

You publish, by some mechanism, important information (though who is subscribed to this you know not).

You are sometimes asked for a configuration object describing your current settings.

At some point you will be told to shut down.

There are no "colleagues" in this world, and no "manager" beyond the implied thing that started you and is sending you messages. There is only the pencil sharpener, the other things you need for your task, and the thing that allows you to borrow these things.

You know what a pencil sharpener is, as does any other worker that uses the pencil sharpener, as well as any "Pencil-sharpener Actor" that might actually be running the real-world hardware, but nobody else does. The "Resource Pool" actor which allows you to check-out shared hardware doesn't know what a pencil sharpener is; it just knows it is a shared resource that can only be used by one worker at a time. Your manager certainly doesn't know what a pencil sharpener is. Nor does the ConfigData actor that stores the pencil-sharpener settings (it's just dealing in name-value pairs, or arrays of variants, or JSON or something).

Now, your manager, call it "ProjectManager", is not a human either, and he's not a manager, he's a worker too. He has a (higher-level) task to complete. He knows you, but only as a tool to help complete his task; he doesn't know any details of how you function. He doesn't explicitly know what a pencil sharpener is (or a pencil, either); he just knows you need "shared tools" to function. And he knows he needs to get and pass you a configuration object. He doesn't know the form of that object, but he knows how to get it.

Note the lack of overlap in these tightly-focused actors. Only you, the worker, need to know the form of your config information; only the manager needs to know where to store/retrieve the config. Your mental image of actors as human workers is too big and overlapping and lacking in abstractions. Multiple people know specific details and must "collaborate" in complex ways. "Actors" should be abstract and simple, and their relationships should be plug-in like. Minimize "awareness" and detail sharing.
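
To make the "Resource Pool" idea concrete, here is a minimal Python sketch under assumed names (`ResourcePool`, `check_out`, `check_in`, `PencilSharpener`, `Worker`). The pool treats every resource as an opaque object keyed by name; only the worker and the sharpener class know what sharpening means.

```python
import threading


class ResourcePool:
    """Lends out opaque resources by name; one borrower at a time."""

    def __init__(self, resources):
        self._free = dict(resources)
        self._lock = threading.Lock()

    def check_out(self, name):
        # Returns None when the resource is missing or already on loan.
        with self._lock:
            return self._free.pop(name, None)

    def check_in(self, name, resource):
        with self._lock:
            self._free[name] = resource


class PencilSharpener:
    def sharpen(self, pencil):
        return pencil + " (sharp)"


class Worker:
    """Knows what a pencil sharpener is; knows nothing about other workers."""

    def __init__(self, pool):
        self._pool = pool

    def do_task(self, pencil):
        sharpener = self._pool.check_out("pencil_sharpener")
        if sharpener is None:
            return None  # busy elsewhere; the caller can retry later
        try:
            return sharpener.sharpen(pencil)
        finally:
            self._pool.check_in("pencil_sharpener", sharpener)


pool = ResourcePool({"pencil_sharpener": PencilSharpener()})
worker = Worker(pool)
result = worker.do_task("HB")
```

Adding a new kind of shared tool in this sketch means adding a new entry to the pool and a worker that understands it; the pool and whatever launched the worker stay unchanged.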

-- James

Thoric · 01-04-2016

drjdpowell wrote:

When App fires up View, it needs to configure it before showing it, so it makes a synchronous request to ConfigData, and sends the result to View.

Firstly, you specifically declared App actor as being responsible for configuring View actor, ergo it must understand the configuration data required by View in order to provide it. This is reasonably a typedef cluster owned by View, or a class, or a list of named pairs or whatever. In any case, App needs to know stuff about View in order to provide the correct configuration information.

Secondly, messages that need to go up the tree to come down a different branch to reach their destination have to pass through the root actor, which will therefore 'see' messages that pass between branches (such as View retrieving the list of last opened projects from ConfigData actor - as per my original post). Having to deal with all messages in this way makes the root actor 'aware' of everything that's happening in all other actors. I appreciate the message content isn't parsed/verified, only repackaged and passed on down the link, but it is overhead that bloats root actor (assuming my understanding of message forwarding through several AF actors is correct).

I appreciate your points on tightly-focused actors, with distinct purposes that make them abstract and simple, but I'm not sold on the fact that they can 'all work together in harmony' without creating dependencies between them.

Stop me if I've misunderstood this, and yes this is another example (contrived to exacerbate the issue I'm puzzled by but perhaps realistic nonetheless):

For Actor C to learn of the most up to date list of recently opened projects, information known only to D, we need to do the following:

1. C sends a message to B's enqueuer asking for the list of recently opened projects.

2. B interprets the message and creates a new message from itself to A asking for the list of recently opened projects.

3. A interprets B's message and creates a new message from itself to D asking for the list of recently opened projects.

4. D interprets A's message and creates a new message from itself to A with the required information packaged inside.

5. A interprets D's message and creates a new message from itself to B with the same information repackaged inside.

6. B interprets A's message and creates a new message from itself to C with the same information repackaged inside.

7. C receives and interprets B's message and parses the contained data.

That's 6 messages, each unique to the purpose of requesting and sharing this very specific data. Because C cannot know about the existence of D it cannot simply ask for a message to be sent to D, it needs to blindly ask for "recently opened projects" and hope that the recipient knows what to do with that request. B, the recipient, needs to pass that up to A as it also can't be aware of D. Only A knows how to get that information, making A 'aware'. For each message, code has to be created to act upon it, making all actors (in this example) coupled to the existence of Recently Opened Projects data.
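
The message count can be checked with a short Python sketch. The tree layout used here (A at the root, B and D as children of A, C as a child of B) is an assumption inferred from the seven steps above, and the function names are hypothetical. The sketch routes a request strictly along parent/child links and counts one message per link in each direction.

```python
# Assumed tree, inferred from the steps above: A is the root,
# B and D are children of A, and C is a child of B.
parent = {"A": None, "B": "A", "D": "A", "C": "B"}


def path_to_root(node):
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def route(src, dst):
    """Actors visited when a message may only travel along tree links."""
    up = path_to_root(src)
    down = path_to_root(dst)
    common = next(n for n in up if n in down)  # nearest shared ancestor
    climb = up[: up.index(common) + 1]
    descend = list(reversed(down[: down.index(common)]))
    return climb + descend


def round_trip_messages(src, dst):
    hops = len(route(src, dst)) - 1  # one message per link crossed
    return 2 * hops                  # request out, reply back


request_path = route("C", "D")
total = round_trip_messages("C", "D")
```

For this layout `request_path` is `['C', 'B', 'A', 'D']` and `total` is 6, matching the count above; the cost grows with the tree distance between the two actors.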

Thoric (CLA, CLED, CTD and LabVIEW Champion)
