bad target restart (firmware unknown)

steph0ff · 02-28-2015

Hello everybody,

I'm working with LV2014 and cRIO-9068 on a multitasking application and I've encountered the following issue.

When I shut down my RT application, it stops without any error, but the target disconnects.

If I try to reconnect, the operation fails (the cRIO is still visible from MAX). In this condition, if I perform a reboot from MAX, the target becomes unreachable after the reboot (both in MAX and to a ping request) and has apparently crashed.

If I then restart the target by unplugging the power supply and look in MAX, the firmware revision has become "unknown" (before it was 2.2.0f0) and the status has become "connected in safe mode".

From this condition it is not possible to update the firmware, and the only way to reset the target is a fresh image deployment.

Has anybody encountered similar issues on a Linux RT target?

thanks

Steph

Valbo10 · 02-28-2015

Hi,

I had a similar problem on a RIO SOM (same firmware and Linux RT).

How is the status LED blinking when the system is on?

Look at the manual "CompactRIO cRIO-9068 Operating Instructions and Specifications", section "STATUS LED", page 17.

There is a procedure to recover the system if the status LED is continuously flashing or solid.

But only National Instruments knows how to do it; it is different for each target.

Manuel

steph0ff · 03-02-2015

Hi,

Following your hint, I took a look at the status LED.

After the first reboot from MAX the status LED blinks 4 times, which according to the manual should indicate an out-of-memory problem.

Then the system stops responding, so I have to disconnect the power and reconnect it. After this operation the status LED becomes solid.

I can't understand why the cRIO runs out of memory when I shut down. Maybe a problem in closing some reference?

I start a few tasks with asynchronous VI calls, and when I close those tasks they seem to exit with no error, but maybe something is retained in memory.

Is there a specific NI log? I looked in /var/log/messages.1.gz but did not find anything that could help.

Thanks

Steph

ScotSalmon · 03-02-2015

4 blinks is not only for out of memory. As the manual says, it means the software has crashed twice without rebooting or power cycling between crashes. The manual suggests that this is usually caused by out-of-memory, but that's not the only cause (to be honest I think the manual's use of the word "usually" there is incorrect, that's one common reason but I think it's overstating it to say "usually"). In this case I think it's probably some other problem causing the crash.

When you say you "shutdown your RT application" what do you do, exactly?

steph0ff · 03-02-2015

Ok,

When I perform a shutdown I send a stop notification to all the asynchronously called tasks, and when each one has successfully stopped it sends a notification back to the main VI. So I'm pretty sure that all tasks are successfully terminated.

Then I can perform a reinitialization of application (in this case I restart all the tasks) or exit from the application. In every case I get the same target disconnection message.

Some tasks make calls into .so libraries (i.e. the OPC library).

Steph

tduffy · 03-02-2015

Steph,

The abrupt disconnect message is what I saw when doing development on a .so shared library for a project on the cRIO-9068. I believe if there is a segmentation fault (core dump) in the .so, then the RT process (or at least the process talking back to the PC) also experiences a seg fault.

Try removing the .so calls and see if the same thing happens. Is it possible to do more/better error checking on the inputs to the Call Library Function Node, to ensure you aren't passing in null and that no operation results in null or a divide by zero?

-TD

BradM · 03-02-2015

steph0ff wrote:
...
Then I can perform a reinitialization of application (in this case I restart all the tasks) or exit from the application. In every case I get the same target disconnection message.
Some tasks make calls into .so libraries (i.e. the OPC library).
Steph

Can you clarify a bit more what you mean when you say that you "perform a reinitialization of application [...] or exit from the application"? What are you doing in your LabVIEW RT application?

BradM · 03-02-2015

tduffy wrote:
...
The abrupt disconnect message is what I saw when doing development on a .so shared library for a project on the cRIO-9068. I believe if there is a segmentation fault (core dump) in the .so, then the RT process (or at least the process talking back to the PC) also experiences a seg fault.
...
-TD

tduffy,

For future reference, if you want a quick peek into what's going on, LabVIEW keeps some logs that include details of what happens in an application. You can either use the Error Log Retrieval utility (from MAX or the LabVIEW project): http://zone.ni.com/reference/en-XX/help/370622J-01/lvrtdialog/db_rt_error_logging/ or read them directly from the filesystem if you already have a shell open on the target (they are located at /var/local/natinst/log).

If you need a bit more oomph (or you're debugging the .so), you can attach to LabVIEW RT on the target and debug the library calls. gdbserver is provided on the targets, so you can attach gdbserver to the LabVIEW RT process and then connect from your desktop computer to the gdbserver instance running on the target, allowing you to see exactly where the issue is occurring. http://digital.ni.com/public.nsf/allkb/7B050ED0DBE6B00A86257BBB004D9DF6
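
As a rough illustration of reading the logs straight from the filesystem, the sketch below walks the log directory named above and prints every line containing one of a few marker strings. It assumes Python is available wherever the logs are read (on the target, or on a desktop copy of the directory). The marker strings and the function name `find_marked_lines` are illustrative choices, not anything defined by NI.

```python
from pathlib import Path

# Directory named in the post above. The marker strings are illustrative
# assumptions; adjust them to whatever you are hunting for.
LOG_DIR = Path("/var/local/natinst/log")
MARKERS = ("VI_BROKEN", "Exception", "SIGSEGV")


def find_marked_lines(log_dir, markers=MARKERS):
    """Return (file name, line number, text) for every line containing a marker."""
    hits = []
    if not log_dir.is_dir():
        return hits
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for name, number, text in find_marked_lines(LOG_DIR):
        print(f"{name}:{number}: {text}")
```

If the directory does not exist (for example when run on a desktop without a copy of the logs), the function simply returns an empty list.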

tduffy · 03-02-2015

Brad,

Both those links are really great references, thanks! I wish one of the dozen or so times I was on the phone with NI support they could have mentioned those ..

I'll definitely look into both of those options on the next project.

Thanks again.

steph0ff · 03-03-2015

Hi Brad,

thanks for the very useful links!

Looking at the error report as you suggested, I found the following, which seems to show a "VI broken" error for some VIs:

InitExecSystem() call to GetCurrProcessNumProcessors() reports: 2 processors

InitExecSystem() call to GetNumProcessors() reports: 2 processors

InitExecSystem() will use: 2 processors

starting LV_ESys1248001a_Thr0 , capacity: 1 at [3508170233.55855608, (19:43:53.558556000 2015:03:02)]

starting LV_ESys2_Thr0 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr1 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr2 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr3 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr4 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr5 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr6 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

starting LV_ESys2_Thr7 , capacity: 24 at [3508170236.44077110, (19:43:56.440771000 2015:03:02)]

VI_BROKEN (0): [VI "NI OPC UA Client.lvlib:Read Int64 Array.vi" (0x00726aa8)]

VirtualInstrument::SetOrClearBadVILibrary - now VI is bad on [VI "NI OPC UA Client.lvlib:Read Int64 Array.vi" (0x00726aa8)]

this->flags=33563136, compilerError=6

VI_BROKEN (0): [VI "NI OPC UA Server.lvlib:Write Bool.vi" (0x004fb6c8)]

VirtualInstrument::SetOrClearBadVILibrary - now VI is bad on [VI "NI OPC UA Server.lvlib:Write Bool.vi" (0x004fb6c8)]

this->flags=33563136, compilerError=6

VI_BROKEN (0): [VI "Current Value Table.lvlib:Form Single Group.vi" (0x00a863c8)]

VirtualInstrument::SetOrClearBadVILibrary - now VI is bad on [VI "Current Value Table.lvlib:Form Single Group.vi" (0x00a863c8)]

this->flags=33563136, compilerError=6

etc..
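
To get a quick overview of which VIs the log flags, a small parser along these lines could group the VI_BROKEN entries by library. This is only a sketch: the line format is inferred from the excerpt above, and the name `broken_vis_by_library` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   VI_BROKEN (0): [VI "NI OPC UA Client.lvlib:Read Int64 Array.vi" (0x00726aa8)]
# The pattern is an assumption based only on the excerpt in this post.
BROKEN_RE = re.compile(r'^VI_BROKEN \(\d+\): \[VI "(?P<name>[^"]+)"')


def broken_vis_by_library(log_text):
    """Group the VIs reported as VI_BROKEN by their owning library."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = BROKEN_RE.match(line.strip())
        if not match:
            continue
        # Qualified names look like "Library.lvlib:Some VI.vi"; a VI without
        # a library prefix is filed under "(no library)".
        library, sep, vi = match.group("name").rpartition(":")
        if not sep:
            library = "(no library)"
        if vi not in grouped[library]:
            grouped[library].append(vi)
    return dict(grouped)


if __name__ == "__main__":
    sample = (
        'VI_BROKEN (0): [VI "NI OPC UA Client.lvlib:Read Int64 Array.vi" (0x00726aa8)]\n'
        "this->flags=33563136, compilerError=6\n"
        'VI_BROKEN (0): [VI "NI OPC UA Server.lvlib:Write Bool.vi" (0x004fb6c8)]\n'
    )
    for library, vis in broken_vis_by_library(sample).items():
        print(library, "->", ", ".join(vis))
```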

It seems that there is an error in the calls to the OPC functions; maybe I made some mistakes in the code. I'll check now and give you feedback.

But in any case I think that I can't debug the NI-OPCUA .so shared libraries with gdb, right? (They are not shared libraries developed by me, so I don't have the source code; maybe I have to check whether the library is compiled with -g.)

In any case, it's strange that the target goes into an unknown firmware state because of a wrong library call, isn't it?

many thanks for the support

Steph
