Actors appear to shut down, but still lock their libraries. Code included.

Ben_Phillips · ‎09-20-2014

WOW, thank you Mike_Le for taking the time to post this.

No joke, I've been trying to figure this one out for over a year, always thought it was something I'd done wrong. I went way out of my way to get things so that I could use monitored actor inheritance for all actors, having been repeatedly told that I must have been leaving an actor running. When that seemed to not be the case after more or less proving that with the monitored actor, I'd given up.

I've closed my project and reopened because of this probably 2000+ times.

That does leave a fairly awkward fix in the meantime.

AF folks, do you think perhaps we could get a VIP of the LV 2013 AF with this fix in place? At least that way I can stick the VIP into SVN.

Mike_Le · ‎09-20-2014

Glad it helped you out!

Yeah, I think this or some flavor of this has been the hidden problem behind a couple other bizarre scenarios I've encountered.

The earliest is way back in April last year, and I ended up patching in some workaround that shouldn't have had any impact given my understanding of Actors. Now I'm thinking it just added a little bit of a delay to beat whatever race condition is happening.

http://linkd.in/mikele

drjdpowell · ‎09-20-2014

Mike_Le wrote:
In the original Launch Actor, there's a step where a reference is deliberately leaked for efficiency. I now explicitly close this reference. I also removed the 2-iteration FOR loop that tries to launch the Actor a second time if the first instance fails.

That's not just for efficiency; see this conversation. It's a clone pool and it also avoids the problem of root-loop blocking. If you are going to change it, make sure you also stop using clone pools (option 0x40) as it's a pointlessly high overhead to create multiple clones and only use one of them.

Question: what happens to your initial VI that launches the first Actor? Do you let it stop or do you have it wait for the Actors to finish? I ask because that VI owns the initial Actor.vi reference, and it will invalidate that ref when it goes idle. There is a race condition of when this happens relative to the other Actors that the initial Actor launches. Some of the described symptoms are suggestive of a race condition being involved.

Mike_Le · ‎09-20-2014

drjdpowell wrote:
I ask because that VI owns the initial Actor.vi reference, and it will invalidate that ref when it goes idle. There is a race condition of when this happens relative to the other Actors that the initial Actor launches. Some of the described symptoms are suggestive of a race condition being involved.

Yes, my problems DO feel very much like a race condition!

The VI that launches the Controller goes idle as soon as the Controller launches. I'll try changing that and see if it fixes the problem.

However, the places where inserting a delay helped weren't in the Controller launch... but when the Controller launches Nested Actors, after it's started up. Any idea why that would be the case?


drjdpowell · ‎09-21-2014

Mike_Le wrote:
However, the places where inserting a delay helped weren't in the Controller launch... but when the Controller launches Nested Actors, after it's started up. Any idea why that would be the case?

When your launcher goes idle it invalidates the clone-pool reference. That happens after the Controller has launched, but before some of the nested actors have. The stale reference should just throw an error, and a new clone pool should then be recreated, but I suspect there is a LabVIEW bug that is triggered by this. Keeping the Launcher alive means you will keep the original clone-pool reference alive for the life of the application.
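To make the ownership argument concrete, here is a rough Python model of it. This is not LabVIEW code and not the Actor Framework implementation; the class names (`TopLevelVI`, `ClonePoolRef`, `LaunchActor`) and the `run` scenario are hypothetical stand-ins. It only shows when the cached reference goes stale and has to be reopened.

```python
class ClonePoolRef:
    """Hypothetical stand-in for a VI reference opened with the clone-pool option."""
    def __init__(self, owner):
        self.owner = owner
        self.valid = True


class TopLevelVI:
    """Hypothetical stand-in for a top-level VI that owns the references it opens."""
    def __init__(self, name):
        self.name = name
        self.owned = []

    def open_ref(self):
        ref = ClonePoolRef(self)
        self.owned.append(ref)
        return ref

    def go_idle(self):
        # Going idle invalidates every reference this VI owns.
        for ref in self.owned:
            ref.valid = False
        self.owned.clear()


class LaunchActor:
    """Caches one reference and reopens it only if it has gone stale."""
    def __init__(self):
        self.cached = None
        self.reopen_count = 0

    def launch(self, caller):
        if self.cached is None or not self.cached.valid:
            if self.cached is not None:
                self.reopen_count += 1  # a second reference had to be opened
            self.cached = caller.open_ref()
        return self.cached


def run(keep_launcher_alive):
    launcher = TopLevelVI("Launcher")
    controller = TopLevelVI("Controller")
    launch_actor = LaunchActor()
    launch_actor.launch(launcher)        # launcher starts the Controller
    if not keep_launcher_alive:
        launcher.go_idle()               # launcher finishes right away
    for _ in range(3):
        launch_actor.launch(controller)  # Controller starts nested actors
    return launch_actor.reopen_count


print(run(keep_launcher_alive=True))   # 0
print(run(keep_launcher_alive=False))  # 1
```

In this model the reopen happens on the first nested launch after the launcher goes idle, which matches where the delay made a difference.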

Oli_Wachno · ‎09-22-2014

Hi Thierry,

sorry for answering so late...

Due to some trouble with my LabVIEW installation I have reinstalled SP1 without patches.

I'm going to recheck ASAP.

Thanks a lot!

Oli

AristosQueue (NI) · ‎09-22-2014

I can now say conclusively that there is a bug in Async Call By Reference. I hunted the thing to its lair and killed it on Friday night. Its corpse will be shipped to you in LV 2014 SP1. It is being considered for a patch sooner than that (but that's a pretty slim chance). I've also submitted it for consideration to patch LV 2012 and 2013 (an even slimmer chance).

Now ... let's talk about a workaround for right now today.

I know that the bug is related to our count getting off in how many open calls there are to a given ACBR VI. Once we identified that, it was easy to identify where the gap was. What is much harder to tell is what stresses that gap such that it bites some applications but not others. There is other code that causes the count to get back on track, but sometimes the system hangs before it gets corrected. Can we get you all onto a stable basis in the short run so that you're not bitten by this gap?

I *think* the following is the key: You have a top-level VI that launches your top actor. That launcher VI quits, leaving the top-actor running, and then the top actor goes off and does its thing, including spawning additional actors.

I believe that the entire problem goes away if you can somehow leave your top-level VI running. As long as the launcher VI stays running, the VI refnum allocated inside Launch Actor stays valid and we do not have to open a second reference to Actor.vi. Avoiding that second reference seems to be critical.

Alternatively, if there is enough of a time gap between the launcher VI quitting and the first call to Launch Nested Actor.vi, that seems to help. I cannot guarantee that, but it seems to be the case looking at the C++ code. I have not actually tried in G to empirically test this theory.

If neither of those works, then you can go back to the 2012 version of Launch Actor.vi and see how the block diagram worked back then. The "close the reference on every call to Launch Actor" is less efficient and subject to root loop pause, as noted earlier in this discussion, but it completely dodges this bug (because it basically forces there to be no overlap of the refnums).
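For anyone who wants to see the "close the reference on every call" pattern in text form, below is a small Python sketch. It is a conceptual model only, not the 2012 block diagram; `RefTracker`, `actor_ref` and `launch_2012_style` are made-up names. It shows the trade-off: one open per launch (more overhead), but never more than one reference open at a time.

```python
from contextlib import contextmanager


class RefTracker:
    """Counts how many (hypothetical) VI references are open at once."""
    def __init__(self):
        self.open_now = 0
        self.peak = 0
        self.total_opens = 0

    def open(self):
        self.open_now += 1
        self.total_opens += 1
        self.peak = max(self.peak, self.open_now)

    def close(self):
        self.open_now -= 1


@contextmanager
def actor_ref(tracker):
    """Open a reference, hand control to the caller, and always close it."""
    tracker.open()
    try:
        yield
    finally:
        tracker.close()


def launch_2012_style(tracker, n_actors):
    """Open and close a fresh reference for every launch."""
    for _ in range(n_actors):
        with actor_ref(tracker):
            pass  # the asynchronous call would be started here
    return tracker


t = launch_2012_style(RefTracker(), 5)
print(t.total_opens, t.peak, t.open_now)  # 5 1 0
```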

I have spent as much time on this bug as I can afford at the moment. I have a true fix for the next update of LabVIEW and I have three workarounds for current users, which means I really need to move on to the next priority. If none of those works, post here. I cannot guarantee that I'll have more time to work on it, but I might.

justACS · ‎09-22-2014

This information is being summarized in a Knowledge Base entry, which will include alternate versions of the Launch Actor VIs for 2013 and 2014. Basically, these alternate VIs launch actors in the same way we did in 2012.

I will post a link to that article when it is available.

Jed394 · ‎09-23-2014

So maybe I'm not keeping up here, but if you are saying that the bug is inherent to the ACBR function, then under what circumstances should we avoid using it outside of the Actor Framework? I currently use the ACBR node to launch most of my programs, as well as launching non-Actor-Framework async "actors". I'm currently using the same method as Launch Actor because of the initial advice outlined in this forum.

Should the ACBR be avoided in all cases unless patched? Only in cases using reentrant clones (0xC0 flag)? Obviously it appears that without leaking the reference you run into root loop issues, which seems to affect the very nature of the reliability of launching async daemons.
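For reference, the C0 value is just two Open VI Reference option bits combined. A quick Python illustration of the arithmetic follows; the constant names are my own labels, and the values are the option values as I understand them from the Open VI Reference documentation (0x40 is the clone-pool option mentioned earlier in the thread).

```python
# Labels are hypothetical; values are assumed from the Open VI Reference docs.
REENTRANT_CLONE_POOL = 0x40   # prepare for reentrant run (clone pool)
CALL_AND_FORGET = 0x80        # prepare to call and forget
CALL_AND_COLLECT = 0x100      # prepare to call and collect

# Clone pool + call-and-forget, the combination asked about above.
launch_options = REENTRANT_CLONE_POOL | CALL_AND_FORGET
print(hex(launch_options))  # 0xc0

# Dropping the clone-pool bit leaves plain call-and-forget.
without_pool = launch_options & ~REENTRANT_CLONE_POOL
print(hex(without_pool))  # 0x80
```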

Does this bug have any effect on executables or is it only within the development environment?

I'm trying to figure out if I should be concerned about deployed software and the software that I'm currently developing. Would you consider this a major bug in the ACBR node, given that it affects how it tracks currently open VIs? If so, why would this not be patched in older versions of LabVIEW?

ThiCop · ‎09-23-2014

Hello Mike, AQ and niACS,

Thanks for all of your efforts in troubleshooting this!

PS: Sorry for replying so late via this thread, but I was awaiting confirmation via the ongoing internal escalation about what information could be posted concerning the future creation of a KB. I see now that niACS has already done that, so that's great!

Kind Regards,
Thierry C - CLA, CTA - Senior R&D Engineer (Former Support Engineer) - National Instruments
If someone helped you, let them know. Mark as solved and/or give a kudo. 😉
