Actor Framework Discussions

Memory-bounding the message queues in AF: is it possible?

I've read as much as I can find on the AF, including the debates about when, where, and who should be responsible for controlling and handling message queue size. While I understand there is no easy answer (it always depends on the implemented design and how the actors handle the data), I would like a generic solution for memory-bounding my messages, so that memory overloads cannot occur if developers abuse the messaging speeds in my system. With many developers working in separate areas, this is possible, although unlikely, and I'd prefer the framework have an inherent bound on queue sizes for messages.

Is there any way to configure or set this with the AF today, so that I can, say, limit an actor's queued messages to a maximum of 1000 or some other fixed number?

Looking at the code, it seems the only place the queues are obtained is in "Obtain priority queue.vi" in the Priority Queue class.

Does anyone who has built with or used the AF have experience using bounded queues to prevent memory overloads? I realize bounding introduces blocking at the enqueue, which will have other negative impacts, since the AF currently enqueues with infinite waits.

Thanks for any recommendations.

Message 1 of 6

No, there is no such mechanism in the AF today. That was intentional on my part. As you noted, applying such a bound undermines the deadlock prevention the AF is designed to provide. If I send to you when your queue is full and you send to me at the same time when my queue is full, we'll both deadlock waiting for space and never get around to emptying our own queues. If you try to relax the deadlock by adding a timeout to the enqueue, you make it so that every Send call needs to account for "it's currently full, so I enqueue this message back to myself to try sending later, and now I need to push my own execution state onto a stack somehow until this message gets sent". The resulting code is HUGE and non-trivial in the extreme. It's educational to try to write that code -- I sketched it out on paper once and said, "I hope I never have to actually implement this nonsense... please let buffered queues be a viable solution to this AF problem..." Turned out it was. 😉
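
To make the failure mode concrete, here is a minimal sketch -- in Python rather than LabVIEW, with hypothetical names -- of two actors whose bounded mailboxes are both full. Each tries to send before receiving, each put blocks on the peer's full mailbox, and neither ever drains its own:

```python
import queue
import threading

MAILBOX_SIZE = 2  # arbitrary bound, for illustration only

a_mailbox = queue.Queue(maxsize=MAILBOX_SIZE)
b_mailbox = queue.Queue(maxsize=MAILBOX_SIZE)

# Pre-fill both mailboxes so each is already at capacity.
for i in range(MAILBOX_SIZE):
    a_mailbox.put(f"backlog-{i}")
    b_mailbox.put(f"backlog-{i}")

def actor(name, own_mailbox, peer_mailbox):
    # Send first, receive second: put() blocks because the peer's
    # mailbox is full, so the get() that would free space never runs.
    peer_mailbox.put(f"hello from {name}")  # blocks forever
    own_mailbox.get()                       # never reached

t_a = threading.Thread(target=actor, args=("A", a_mailbox, b_mailbox), daemon=True)
t_b = threading.Thread(target=actor, args=("B", b_mailbox, a_mailbox), daemon=True)
t_a.start(); t_b.start()
t_a.join(timeout=2); t_b.join(timeout=2)
print("deadlocked:", t_a.is_alive() and t_b.is_alive())  # prints True
```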

In my opinion, based on observation of these systems, rather than trying to solve this with arbitrary bounds on the queues, it is far better to simply allow the buffers to grow to the natural size they need between any pair of actors, and then adjust your code as needed if you see one that blows up. It happens so rarely, even with code spread across multiple developers, that it just isn't worth adding the risk factor.

You could potentially add it by editing the Priority Queue class. Be very careful... the work I did to build the priority queue class was non-trivial, and you'll notice a whole lot of code that is very specifically placed so that mutex locks are held exactly as long as necessary and no longer, which is how the priority queue avoids deadlocks. It is essentially impossible (extremely impractical) to test code for deadlocks/race conditions; you have to prove it out on paper with manual analysis. You'll want multiple people to look over any modifications you make to add a bounding counter.

Message 2 of 6

Thanks for the reply. I understand from this the potential difficulties. We have a mandate to do better at memory bounding and memory protection, so I'm going to handle this at the actor layer instead of convoluting or infecting the AF directly, since it isn't an exposed option. The moment I started digging under the hood I started seeing more and more blocking-type issues that would have to be solved, so I'm sure you have done more thinking on this than I have and are more capable of implementing the best solution. The AF as it is will do...

Mike

Message 3 of 6

AristosQueue wrote:

...you make it so that every Send call needs to account for "it's currently full, so I enqueue this message back to myself to try sending later, and now I need to push my own execution state onto a stack somehow until this message gets sent". The resulting code is HUGE and non-trivial in the extreme. It's educational to try to write that code -- I sketched it out on paper once and said, "I hope I never have to actually implement this nonsense...

I had to do this once, when making a buffered (streaming) network communication toolset. One requirement was that the receiving app had to ack the receipt of each packet, and I would detect a disconnected network only long after the packet had been dequeued on the sending side, so I had to put any un-ack'ed data back at the front of the send buffer. I don't think it's too complicated to do; it just takes a careful write-up of a flowchart before implementation. I would think you encountered this as well when making the network actors? I haven't looked at that code, so I don't know how solid it is with regard to network disruption...
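
For illustration, here's a minimal sketch of that re-queue-on-disconnect bookkeeping, in Python with hypothetical names (not the actual toolset code): packets move from a send buffer to an in-flight table when transmitted, and anything still un-ack'ed goes back to the front of the buffer when the link drops.

```python
from collections import deque

send_buffer = deque()  # (seq, packet) tuples not yet transmitted
in_flight = {}         # seq -> packet, transmitted but not yet ack'ed

def transmit(link_send):
    # Drain the send buffer, remembering each packet until it is ack'ed.
    while send_buffer:
        seq, packet = send_buffer.popleft()
        link_send(seq, packet)
        in_flight[seq] = packet

def on_ack(seq):
    in_flight.pop(seq, None)  # receiver confirmed it; forget the packet

def on_disconnect():
    # Put un-ack'ed packets back at the FRONT of the send buffer,
    # oldest first, so the original transmission order is preserved.
    for seq in sorted(in_flight, reverse=True):
        send_buffer.appendleft((seq, in_flight.pop(seq)))
```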

A truly Real-Time-compatible AF might benefit from using RT FIFOs instead of queues, and those are bounded only. Whether that would ever be desirable to implement, I can't judge.

/Steen

CLA, CTA, CLED & LabVIEW Champion
Message 4 of 6

There is something you could do fairly easily -- you could make Enqueue.vi return the number of items already in the queue. Don't try to add any blocking; just return the integer. Then you would know to add throttling delays to your sender when that number gets too high, and to reduce those delays when the queue drains back down.
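
As a rough sketch of that idea -- in Python with hypothetical names, not the actual Enqueue.vi change -- the enqueue stays non-blocking but reports the depth it observed, and the sender pauses in proportion to the backlog once a high-water mark is crossed:

```python
import queue
import time

HIGH_WATER = 1000        # depth at which the sender starts throttling
mailbox = queue.Queue()  # unbounded, as in the AF today

def enqueue(msg):
    """Enqueue without blocking; return the depth observed beforehand."""
    depth = mailbox.qsize()  # advisory only; other threads may race us
    mailbox.put(msg)
    return depth

def send_with_throttle(msg):
    depth = enqueue(msg)
    if depth > HIGH_WATER:
        # Crude backpressure: sleep in proportion to the backlog so the
        # receiver can drain. Real code must not starve the sender's own
        # message handling while it sleeps -- see the caveat below.
        time.sleep(0.001 * (depth - HIGH_WATER))
```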

The tricky part is those throttling delays... you never want to starve your caller of the ability to check for its own incoming messages -- if you do, you just pass the buffer bloat up to a higher-level caller. You actually have to pass a message upstream until you find the caller that is actually responsible for there being more work in the system, and then tell that actor to do less work or to start dropping messages rather than sending them. This is essentially the entire TCP/IP quality-of-service algorithm implemented across your actors. It's a total pain in the neck, and getting it right often means knowing not only how many items are in the queue but also which enqueuer put them there, and that is information you have to hardcode on an actor-by-actor basis. (TCP has it easy... every message involved in its quality of service is essentially the same command, "move this packet"; you have to worry about different types of messages.)
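
A sketch of that upstream propagation, again in Python with hypothetical names (there is no such message in the AF): a Throttle message travels up the caller chain until it reaches the actor that actually generates the work, and only that actor slows down.

```python
from dataclasses import dataclass

@dataclass
class Throttle:
    depth: int  # how backed up the complaining actor's queue is

class Actor:
    def __init__(self, caller=None, produces_work=False):
        self.caller = caller               # next actor upstream
        self.produces_work = produces_work
        self.max_rate = None               # None means unthrottled

    def handle(self, msg):
        if isinstance(msg, Throttle):
            if self.produces_work:
                # This actor is the source of the load: slow down here
                # (or start dropping work) instead of letting downstream
                # mailboxes absorb the backlog.
                self.max_rate = max(1, 1000 // msg.depth)
            elif self.caller is not None:
                # Not responsible for the load: forward the complaint.
                self.caller.handle(msg)
```

Deciding which messages may be slowed or dropped versus which must become errors is the per-application part that resists a generic solution.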

As soon as you start *programmatically* worrying about buffer sizes, you're into building all the layers of the network stack. It can be done, but I haven't done anything to start creating that infrastructure for any Actor Framework system. As long as you leave this as something you detect during testing/deployment and fix on an ad hoc basis by changing the code of a given actor, your work is substantially less.

Message 5 of 6

SteenSchmidt wrote:

I don't think it's too complicated to do; it just takes a careful write-up of a flowchart before implementation. I would think you encountered this as well when making the network actors?

No, it doesn't come up in network actors for the most part. Network actors are no different from actors on the same machine -- I let them buffer as their application needs, and I let the TCP layer handle the transmission buffering. If network traffic becomes the bottleneck for your messages, then, yes, it might have to be elevated in the API, but I propose no mechanism for doing that at this time. Indeed, I'm not at all convinced that a general-purpose solution to this problem even exists. Solving quality-of-service issues at the app layer appears to be a custom solution for every app, and applying it to an actor model in any sort of generic way would be -- for me -- an open area of research. It would seem to require intimate knowledge of the whole system to know what is generating the network traffic, what is safe to slow down, and what needs to become an error because it cannot slow down.

Message 6 of 6