TDMS: How to handle gaps in signals (i.e., non-continuous signals)

josborne · ‎02-06-2024

Hello TDMS experts,

I like the TDMS file format. It works well for storing continuous time-series signals. But I am running into a challenge in my application when the signal is not continuous:

Sometimes my signals have gaps. For example, let's say one of the sensors generating the data goes offline. Currently, my code detects this and continues to write NaNs to the TDMS file for that specific signal. This is convenient when reading and writing the file because structurally the TDMS file ends up with one channel per signal, and each channel is just one long continuous stream of data. Gaps are clearly identifiable and it's very easy to read the data out of the TDMS file. But this is also a poor way to store data when it comes to disk footprint. I could potentially write tons of NaNs, which wastes disk space (especially when I am logging at 1 kHz or faster).

Is there a system already in place in the TDMS format (and the TDM Streaming VIs) for dealing with this? And if not, is there some kind of best practice for dealing with this in my code? I could certainly come up with a means of working around this, such as this:

My potential solution: Whenever a sensor goes offline, stop writing that channel to TDMS. Then when signal resumes, start writing to TDMS again. Then write every "signal restart time" as a property attached to that channel. This is easy to do when writing to the TDMS. But the downside to this is that I will need to add lots of code to how I read the data from my TDMS. When reading the TDMS at a later date, I need to do a lot of processing to deal with the (potentially many) gaps in the signal and rebuild a signal that can be displayed and analyzed.
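A minimal sketch of the read-side rebuild this approach would need, written in Python rather than LabVIEW for brevity. It assumes each stored segment can be paired with its "signal restart time" and that the sample rate is fixed; `rebuild_signal`, its parameters, and the use of float seconds are hypothetical and not part of any TDMS API.

```python
import math

def rebuild_signal(segments, sample_rate_hz, start_time_s):
    """Rebuild one continuous, NaN-padded list from gap-free segments.

    segments: list of (restart_time_s, samples) tuples, sorted by time.
    The restart times stand in for the per-channel properties described above.
    """
    out = []
    for restart_time_s, samples in segments:
        # Index that this segment's first sample occupies in the rebuilt signal.
        first_index = round((restart_time_s - start_time_s) * sample_rate_hz)
        if first_index < len(out):
            raise ValueError("segments overlap or are out of order")
        # Fill the gap since the previous segment with NaN.
        out.extend([math.nan] * (first_index - len(out)))
        out.extend(samples)
    return out

# Hypothetical data: a 10 Hz signal that drops out after 3 samples
# and resumes 0.5 s after the start of the recording.
segments = [(0.0, [1.0, 2.0, 3.0]), (0.5, [4.0, 5.0])]
signal = rebuild_signal(segments, sample_rate_hz=10.0, start_time_s=0.0)
```

The rebuilt list has the same shape as the NaN-padded channel, so display and analysis code would not need to change; only the reader carries the extra step.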

The TDMS format and associated VIs have a lot of built-in functionality, so I am curious: does anything already exist for handling this? Any ideas?

http://www.medicollector.com

Henrik_Volkers · ‎02-06-2024

HDF5 comes with effective lossless data compression.

TDMS? This package mentions zip:

https://www.vipm.io/package/hooovahh_tremendous_tdms/

Greetings from Germany
Henrik

JoshuaP · ‎02-06-2024

Apache Parquet is also an excellent file format for storing non-continuous data. From what I have seen it may also compress better than HDF5 and be faster to read and write. I'm just not sure if anyone has created a LabVIEW API for it yet.

That being said, I think TDMS is still the fastest to write, which may not be an issue unless you are streaming a ton of data to disk.

jyoung8711 · ‎02-07-2024

Generally, TDMS files in and of themselves don't care about time gaps. They're just storage containers for data. I'd probably need a little more info about how you're writing to the TDMS file, and how you're working with the data downstream, to give you the "best" suggestions. But here are a few ideas:

- If you're writing the data as "waveform data" (likely using the waveform data type wired to the TDMS Write VI), then the data is expected to have the same time spacing between data points. If it doesn't, you get issues when you go to read the data. If you're writing as a waveform data type, then you have a couple of options:

  - Do what you're doing now: add NaN values to the waveform data.
  - Make "cuts" and write data after the time gap to a different "group" in the TDMS file (or even start a new TDMS file). The effectiveness of this strategy would depend on other details about the complexity of the file you're writing (are there already other groups?) and the downstream data analysis process.

- TDMS Write VIs can also accept straight numeric arrays as input (e.g. array of doubles), with no timing "waveform" information. This format is actually a little more flexible and supports "time gaps" a little better. The strategy here is to add an explicit "time" channel alongside the data channel. You then add data to these two channels as it comes in, logging the data points to the "data" (double) channel and the timing information to the "time" (timestamp) channel.

There's a little more to keep track of in this second case, but it's a bit more flexible solution.
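To illustrate what "a little more to keep track of" might look like on the reading side, here is a small Python sketch (not LabVIEW) that treats the data and time channels as two parallel arrays and finds gaps from the time channel alone. The function name `find_gaps`, the `tolerance` factor, and the use of float seconds instead of TDMS timestamps are all assumptions made for the example.

```python
def find_gaps(time_s, expected_dt_s, tolerance=1.5):
    """Return (index, gap_duration_s) for each gap in a time channel.

    index is the position of the first sample after the gap. A step counts
    as a gap when it exceeds expected_dt_s * tolerance.
    """
    gaps = []
    for i in range(1, len(time_s)):
        dt = time_s[i] - time_s[i - 1]
        if dt > expected_dt_s * tolerance:
            gaps.append((i, dt))
    return gaps

# Two parallel "channels": only real samples are stored, no NaN padding.
# Hypothetical 1 kHz data with a dropout between t = 0.002 s and t = 0.5 s.
time_channel = [0.0, 0.001, 0.002, 0.5, 0.501]
data_channel = [1.0, 1.1, 1.2, 2.0, 2.1]

gaps = find_gaps(time_channel, expected_dt_s=0.001)
```

Because every sample carries its own time, nothing is written during the dropout, and the reader can either plot data against time directly or split the data at the reported indices.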

A couple of formats I've seen (or used) to account for this:

- Alternating X/Y channels (each data channel has a corresponding time channel) with a file structure that looks something like this:

GroupA
- Data1_Time
- Data1
- Data2_Time
- Data2

- Single time channel for "groups" of channels:

GroupA
- GroupA_Time
- Data1
- Data2
- Data3
GroupB
- GroupB_Time
- Data4
- Data5
- Data6

If you're using DIAdem or SystemLink to work with the data files downstream, there are channel/group properties you can add that tell these readers the connection between time and data channels exists, so they display the data with it in mind. You can also use these properties when you're reading the data yourself to detect this information, but the native LabVIEW TDMS VIs are not "aware" of them, so it's really more of a reading-application strategy.

If you're interested in one or more of these strategies, or have additional questions, I'm happy to provide additional info, fill in gaps, or provide an example code snippet or two.

josborne · ‎02-08-2024

Wow thank you everyone. This was super helpful. I see some suggestions for alternative file formats, and some suggestions on how to use the existing TDMS format to handle my use case.

For now, I think I am going to continue to use TDMS but find a way to handle the gaps internally (which is what jyoung8711 suggested). And down the road, I may switch to HDF5.

Thanks!

jyoung8711 · ‎02-08-2024

Glad you found this helpful!

Feel free to reach out either on this thread, or you can message me directly if you have additional questions. Happy to help.

Hooovahh · ‎02-08-2024

@jyoung8711 wrote:

There's a little more to keep track of in this second case, but it's a bit more flexible solution.

These are great solutions and they are what I was going to suggest. I would also do a quick test to see what kind of disk wastefulness you are seeing with writing NaNs. It might be less than you think, especially if you zip up the TDMS file when you are done. TDMS files generally compress very well because normal compression works by finding patterns in the data. If you have a ton of NaNs, they probably won't take up much space. Depending on your data, it might also make sense to hold a value for some amount of time.
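One rough way to run that quick test without LabVIEW is sketched below in Python. It is only a stand-in: it compresses a raw buffer of float64 NaNs with `zlib` rather than zipping a real TDMS file, so it ignores TDMS headers and any interleaved real data, and the sample count is arbitrary.

```python
import math
import struct
import zlib

n_samples = 100_000

# Raw little-endian float64 payload, similar in spirit to an
# uncompressed channel that contains nothing but NaN.
raw = struct.pack(f"<{n_samples}d", *([math.nan] * n_samples))

# Deflate is the same family of compression that zip uses.
compressed = zlib.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.4f}")
```

A long run of identical NaN bit patterns is close to the best case for this kind of compressor, so a real file with measured data mixed in will compress less; measuring an actual recording is still the better test.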

CAN, for instance, has a bunch of data come in at what should be predefined rates. But it is pretty normal to need a retransmit, so it might not be as periodic as you expect, and some data channels will get data less frequently than others. The solution here, so the data looks normal, is to hold the last known value for some amount of time and then go to NaN if it isn't updated. I requested this to be a native feature of XNet, but until then I posted some example code.
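The hold-then-NaN idea can be sketched in a few lines of Python. This is not the example code mentioned above and not an XNet feature; `hold_last_value`, the `hold_s` timeout, and the sample data are hypothetical.

```python
import math

def hold_last_value(updates, sample_times_s, hold_s):
    """Resample sparse (time_s, value) updates onto a regular time grid.

    Each grid point takes the most recent update, unless that update is
    older than hold_s, in which case the point becomes NaN.
    updates must be sorted by time.
    """
    out = []
    i = 0
    last_time = None
    last_value = math.nan
    for t in sample_times_s:
        # Consume every update that arrived at or before this sample time.
        while i < len(updates) and updates[i][0] <= t:
            last_time, last_value = updates[i]
            i += 1
        if last_time is not None and t - last_time <= hold_s:
            out.append(last_value)
        else:
            out.append(math.nan)
    return out

# Hypothetical signal: updates stop after 0.1 s and resume at 0.5 s.
updates = [(0.0, 10.0), (0.1, 11.0), (0.5, 12.0)]
grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
held = hold_last_value(updates, grid, hold_s=0.15)
```

With these inputs the value 11.0 is held for one extra grid point, then the output goes to NaN until the update at 0.5 s arrives. A short late frame is bridged, while a real dropout still shows up as a gap.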

The time channel is a great solution to having data that is separated by some time. If you haven't found Scout yet I highly recommend it for viewing TDMS files. If there is a channel called "Time" the Scout viewer will automatically pull it in as the X axis on a graph.
