LabVIEW


Simultaneous file read access in parallel for loops

What's the recommended or most efficient way to read from the same file at different positions in a parallelized for loop? It seems like a simple question but it's not so clear to me how parallel access to a file refnum is handled internally. Here's a basic example:

file1.png

I want to read a large file (too large to keep fully in memory) in chunks of known size and do some processing on the chunks. Since the processing is deterministic, it should benefit from parallelization, both from multi-threaded reading of different file portions (probably only beneficial for small chunks) and from parallel processing after each chunk is read (the actual processing of the chunk data is much more complex in reality than shown in this example). I assume the example above will lead to race conditions because all parallel threads share the same file reference, so the "Set File Position" function would be called simultaneously on the same file reference, leading to unreliable data being read.
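LabVIEW aside, the same hazard exists in any language where parallel workers share one handle and do seek-then-read as two separate steps. For comparison, here is a minimal Python sketch (all names and sizes invented for illustration) that sidesteps the race with an atomic positioned read instead of a shared file position:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Python analogy of the example above: several workers read fixed-size
# chunks of one file in parallel. os.pread() reads at an absolute
# offset without touching the shared file position, so there is no
# seek-then-read race on the single descriptor (POSIX only; LabVIEW's
# separate Set File Position + Read has no such atomic combination).

CHUNK = 4  # bytes per chunk (tiny, for demonstration only)

def read_chunk(fd, index):
    # atomic positioned read at offset index * CHUNK
    return os.pread(fd, CHUNK, index * CHUNK)

# build a small demo file: 8 chunks of 4 bytes each
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(bytes([i]) * CHUNK for i in range(8)))
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = list(pool.map(lambda i: read_chunk(fd, i), range(8)))
finally:
    os.close(fd)
    os.unlink(path)
```

Whether the OS actually services such positioned reads in parallel depends on the drive and the file cache, but at least the reads can no longer corrupt each other.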

 

An easy fix for the race condition is using semaphores:

file2.png
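In text form, the semaphore pattern might look like this Python sketch (a `threading.Lock` standing in for the semaphore, a byte sum standing in for the real processing):

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor

# Semaphore pattern sketched in Python: one shared file object, one
# lock serializing the seek+read pair so reads cannot interleave.
# The processing runs outside the lock, so only the IO part is
# forced to be single-threaded.

CHUNK = 4
data = b"".join(bytes([i]) * CHUNK for i in range(8))
shared = io.BytesIO(data)          # stands in for the shared file refnum
lock = threading.Lock()            # stands in for the semaphore

def process(index):
    with lock:                     # acquire semaphore
        shared.seek(index * CHUNK) # set file position
        chunk = shared.read(CHUNK) # read chunk
    return sum(chunk)              # runs in parallel, outside the lock

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, range(8)))
```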

However, this forces the file read access to be strictly single-threaded, which is probably fine for most use cases but leaves performance on the table. So what's the best way to make use of the OS's multi-threaded file access optimization? LabVIEW seems to have no problem opening several references to the same file, so a naive approach would be something like this:

file3.png
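As a textual analogue of this third approach, a Python sketch (names invented) in which every chunk gets its own freshly opened handle:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Third approach sketched in Python: every iteration opens its own
# handle to the same file, seeks, reads, and closes it again. No
# shared state, hence no race, at the cost of one open/close per
# chunk.

CHUNK = 4

def read_chunk(path, index):
    with open(path, "rb") as f:    # fresh reference per iteration
        f.seek(index * CHUNK)
        return f.read(CHUNK)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(bytes([i]) * CHUNK for i in range(8)))
    path = tmp.name

with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(lambda i: read_chunk(path, i), range(8)))
os.unlink(path)
```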

This shouldn't cause race conditions because each file refnum created at every iteration is indeed different. However, I'm not sure whether this is the most performant or elegant way because it's creating many more file references than necessary (one for every chunk instead of one for every thread). The most elegant way would be to create a separate file reference for every thread, but I don't see a way to do it because it's not possible to access the thread/instance number for the current iteration (right?)...

Message 1 of 27

Do you have a bit more context about the processing and how it compares to the file IO overhead?

How many cores do you have?

 

For example, if you have 8 cores, you could read 8 chunks (~8MB) at once, then process the ~1MB segments in an inner parallel FOR loop. One file IO, 8 parallel processing tasks.
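That suggestion can be sketched in Python like this (a byte sum stands in for the per-segment processing; sizes are toy values):

```python
import io
from concurrent.futures import ThreadPoolExecutor

# Sketch of the suggestion above: one sequential read fetches a group
# of WORKERS chunks, then the group is handed to WORKERS parallel
# tasks. The file IO stays single-threaded; only the CPU-bound part
# runs in parallel.

CHUNK, WORKERS = 4, 4
data = b"".join(bytes([i]) * CHUNK for i in range(16))
f = io.BytesIO(data)               # stands in for the file refnum

results = []
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    while True:
        block = f.read(CHUNK * WORKERS)  # one IO per WORKERS chunks
        if not block:
            break
        segments = [block[i:i + CHUNK]
                    for i in range(0, len(block), CHUNK)]
        results.extend(pool.map(sum, segments))  # inner parallel loop
```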

 

Have you done any benchmarking? My gut feeling is that enabling parallelism would actually slow things down.

Message 2 of 27

I haven't done benchmarks yet, and the question is meant in general: what's the best approach for random simultaneous file access? In practice, the semaphore method should be fine in most cases except for very small chunks.

 

And you're right, in this case the data structure of the file is a simple linear list of data, so reading N chunks (instead of one) and then processing them in parallel with N threads is a good approach. Though you'll still be switching between reading and processing: the single-threaded sequential reading will probably be limited by the drive speed, and the CPU will be idling during the read time.

 

Probably the most efficient solution would be some kind of producer-consumer queue/buffer structure with one thread doing the linear file reading into a fixed-size FIFO buffer and a second thread starting parallel processing tasks for the buffer elements. But that's a bit overkill to implement - it would be so much simpler if we could access the thread/instance ID on every FOR loop iteration and just assign an individual file reference to each thread...
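For what it's worth, that producer/consumer structure is compact in a textual language. A Python sketch (all names invented, a byte sum standing in for the processing):

```python
import io
import queue
import threading

# Producer/consumer sketch: one reader thread streams chunks into a
# bounded FIFO, and several worker threads pull chunks and process
# them in parallel. The bounded queue caps memory use, so the file
# never has to fit in memory.

CHUNK, DEPTH, WORKERS = 4, 3, 4
data = b"".join(bytes([i]) * CHUNK for i in range(16))
fifo = queue.Queue(maxsize=DEPTH)  # the fixed-size buffer
results, res_lock = [], threading.Lock()

def producer(f):
    while chunk := f.read(CHUNK):  # strictly sequential reads
        fifo.put(chunk)            # blocks while the buffer is full
    for _ in range(WORKERS):
        fifo.put(None)             # one shutdown sentinel per worker

def consumer():
    while (chunk := fifo.get()) is not None:
        value = sum(chunk)         # stand-in for the real processing
        with res_lock:
            results.append(value)

workers = [threading.Thread(target=consumer) for _ in range(WORKERS)]
for t in workers:
    t.start()
producer(io.BytesIO(data))
for t in workers:
    t.join()
```

Note the results arrive in whatever order the workers finish, so they must carry an index (or be re-sorted) if chunk order matters downstream.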

Message 3 of 27

@Novgorod wrote:

I haven't done benchmarks yet and the question is meant in general as what's the best approach for random simultaneous file access. In practice, the semaphore method should be fine in most cases except for very small chunks.

 


From a performance point of view, disk access is usually the bottleneck, so parallel reading will (maybe) not improve much but will leave the "load balancing" to the OS, meaning that while one thread reads its chunk, the other threads will wait anyway (or will be significantly slowed). If I were to attack this problem, I would probably perform a sequential read in one thread, then do the processing in multiple parallel threads (but that is a little more programming work). But theory can differ from practice; only benchmarks will give a feeling for which approach is better.

 

But if you would like to have parallel reads and "assume the example above will lead to race conditions because all parallel threads share the same file reference", then why not open multiple references prior to processing, something like this:

Screenshot 2024-05-16 06.35.32.png

Then each thread will have its own ref, and this will not break parallelism, will it?

Message 4 of 27

Isn't that exactly the same behavior as in my third example? It opens a new file reference for each loop iteration - in my example it's just all in the same loop because there's no reason to split it up. There is no race condition in my second and third example (only in the first), it just feels "wrong" to open a file reference for each iteration (potentially thousands or more) instead of for each parallel instance (not more than the number of CPU threads). I don't know how much overhead it creates, so maybe it's only relevant for very small chunks.

Message 5 of 27

@Novgorod wrote:

Isn't that exactly the same behavior as in my third example? It opens a new file reference for each loop iteration - in my example it's just all in the same loop because there's no reason to split it up. There is no race condition in my second and third example (only in the first), it just feels "wrong" to open a file reference for each iteration (potentially thousands or more) instead of for each parallel instance (not more than the number of CPU threads). I don't know how much overhead it creates, so maybe it's only relevant for very small chunks.


Yes, I think so, and I don't expect much performance difference between both solutions (for large chunks especially).

Message 6 of 27

I guess we can improve it a little by using only a small number of references to that file.

 

PinguX_0-1715865782615.png

 

 

Message 7 of 27

That's what I had in mind - only open as many references as parallel threads. However, in your example you assume that iterations are split sequentially between threads, but that's not how the thread partitioning works (iirc), at least not by default. Maybe you can force a deterministic distribution of iterations over the threads by setting the chunk size to 1 in the "configure parallelism" dialog? But I don't know if that's reliable, this whole topic is very poorly documented. If we had access to the "current thread number" in the FOR loop, it would eliminate the guesswork...
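In languages that do expose thread identity, the "one reference per thread" idea works without knowing the iteration-to-thread mapping at all. A Python sketch using thread-local storage (names invented; LabVIEW has no direct equivalent of this mechanism):

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# "One reference per thread" via a thread-local slot: each worker
# thread opens the file at most once, no matter how many chunks it
# ends up processing, so we never need to know which iteration runs
# on which thread. (Handles are reclaimed when the pool's threads
# exit; a real implementation would close them explicitly.)

CHUNK = 4
tls = threading.local()

def thread_handle(path):
    if not hasattr(tls, "f"):      # first chunk seen by this thread
        tls.f = open(path, "rb")
    return tls.f

def read_chunk(path, index):
    f = thread_handle(path)        # private handle: seek+read is safe
    f.seek(index * CHUNK)
    return f.read(CHUNK)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(bytes([i]) * CHUNK for i in range(12)))
    path = tmp.name

with ThreadPoolExecutor(max_workers=3) as pool:
    chunks = list(pool.map(lambda i: read_chunk(path, i), range(12)))
os.unlink(path)
```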

Message 8 of 27

I still don't understand all that song and dance about doing file IO in a parallel loop. Disk access is a sequential process, so you might get a lot of contention between parallel instances. In your example, every single parallel instance will try to get its data slice for processing as a first step, all at once, stepping on each other's toes!

 

I am not sure if windows disk caching can work around that.

Message 9 of 27

@altenbach wrote:

I still don't understand all that song and dance about doing file IO in a parallel loop. Disk access is a sequential process, so you might get a lot of contention between parallel instances. In your example, every single parallel instance will try to get its data slice for processing as a first step, all at once, stepping on each other's toes!

 

I am not sure if windows disk caching can work around that.


The only advantage I see here is simpler processing and less code. For sequential read and parallel processing, a producer/consumer will probably be needed. Not a very big deal, but it needs to be programmed. Here everything is "in place" because the disk acts as memory. I don't think that parallel reading will have performance benefits. Moreover, a sequential read could be faster, if Windows is intelligent enough to do some kind of read-ahead buffering. But this needs to be tested, so a kind of feasibility study is needed.

Message 10 of 27