Actor Framework Discussions


AF not suitable for working with big data sets

So far I am familiar with the 'new' Channeled Message Handler and have already set up some modules using it.

I am still looking for a nice way to share my acquired data between my modules (post-processing, export, acquisition).

So far I have used FGVs (functional global variables), which hurt reusability. Although the modules run asynchronously, the actual steps happen synchronously:

import -> configure/manipulate all devices -> acquisition -> post-processing -> export

 

Now I want to learn AF and set up a system where I can compose different devices/configs into a set of studies more dynamically.

 

I read that every Actor should hold its own copy of data because of its async philosophy. Is this a show stopper for AF when working on big datasets? How should I communicate datasets of several GB between different actors without copying them? I thought about GOOP4.

 

Message 1 of 7

AF is fine for this sort of thing. If you truly need single copies of the data to hang around, send DVRs or something. "Every Actor has its own independent copy of the data" isn't a hard requirement, more like a (very) strong suggestion. If you need multiple copies of the data, then you need multiple copies of the data. Think about something like a database connected to an Actor-based program: you don't duplicate the entire thing for each Actor.

 

Your problem will be in making sure one Actor isn't modifying the data while another one reads it in. One potential solution here is just regular ol' Objects. You can use GOOP to make singleton classes or you can roll your own, and send by-ref Objects to each Actor that needs it.
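
To make that concrete, here is a rough Python sketch of the idea (Python only stands in for what a DVR or a by-ref GOOP object gives you in LabVIEW; SharedDataset and its methods are invented for illustration). One copy of the data lives in a holder, access is guarded by a lock, and every actor is handed the same reference.

import threading
import numpy as np

class SharedDataset:
    """One in-memory copy of the acquired data, guarded by a lock.
    Rough text-language stand-in for a DVR / by-ref GOOP object."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()

    def read(self, start, stop):
        # Copy out only the slice a consumer actually needs.
        with self._lock:
            return self._data[start:stop].copy()

    def overwrite(self, start, chunk):
        # In practice only the owning actor should call this.
        with self._lock:
            self._data[start:start + len(chunk)] = chunk

# Every actor receives the same SharedDataset reference in a message,
# so the multi-GB array itself is never duplicated.
dataset = SharedDataset(np.zeros(1_000_000))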

 

One thing to consider, though: does each Actor actually need to store the data? What exactly are you doing with the data?

 

For example, I had a project taking data from two multifunction DAQ cards continuously for about a month, monitoring very small but very sudden changes in a voltage measurement. I was reading two cards at 1 MS/s, both doubles, so that's 16 MB/sec of data coming in. I didn't need to log all of this data, but I did need to do some processing on it. I just used regular messages to pass data acquired from my DAQ cards up the stream. Each Actor that received a chunk of data operated on it, then sent out messages with the new information that Actor created. IIRC, that one needed to measure peak to peak voltage, an average value, and some time information. It all worked fine and only used like 500 MB of RAM to do so.
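
For illustration, here is a minimal Python sketch of that kind of chunked pipeline (queue.Queue stands in for the Actor Framework message queues, and all names are invented): each chunk arrives as a message, is reduced to a few summary numbers, and only those small results travel on.

import queue
import threading
import numpy as np

data_q = queue.Queue()     # stands in for the analysis actor's message queue
result_q = queue.Queue()   # stands in for messages sent further upstream

def analysis_actor():
    """Receives one chunk per message, computes summary values,
    forwards only the small results, and lets the raw chunk go."""
    while True:
        chunk = data_q.get()
        if chunk is None:          # shutdown message
            break
        result_q.put({
            "peak_to_peak": float(chunk.max() - chunk.min()),
            "mean": float(chunk.mean()),
            "n_samples": len(chunk),
        })

worker = threading.Thread(target=analysis_actor, daemon=True)
worker.start()

# 2 channels * 1 MS/s * 8 bytes per double = 16 MB/s, sent as e.g. 100 ms chunks
for _ in range(10):
    data_q.put(np.random.randn(100_000))
data_q.put(None)
worker.join()

while not result_q.empty():
    print(result_q.get())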

 

Now, if you have several GB that you need to process, that's different from a stream of incoming data, but my point is that regular messages do just fine with a lot of data coming through them, and that it's not against the rules to share references to big hunks of data.

 

By combining a Singleton object with multiple asynchronous Actors, you can let the Singleton handle the access (by making your function calls non-reentrant, for one) and each Actor can therefore get access to it immediately when the resource is available. Thus, each Actor only blocks while waiting for a resource that it can't continue without. I hope that makes sense.
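
Continuing the hypothetical SharedDataset sketch above, here is a short usage example of that blocking behaviour: each consumer thread (standing in for an actor) blocks only while another one holds the lock.

import threading

def exporter(shared, start, stop):
    # Blocks only while another actor holds the lock.
    block = shared.read(start, stop)
    print(f"exporting {len(block)} samples")

def post_processor(shared, start, stop):
    block = shared.read(start, stop)
    print(f"mean of slice: {block.mean():.3f}")

threads = [
    threading.Thread(target=exporter, args=(dataset, 0, 500_000)),
    threading.Thread(target=post_processor, args=(dataset, 500_000, 1_000_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()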

Message 2 of 7
I am not sure if I have to go with the singleton. If I put a GOOP4 object into a message, it is shared by reference out of the box, right? A singleton feels like it would limit flexibility and scalability.
Message 3 of 7

Why is each actor keeping a copy of everything?

 

If you're talking about a lot of processing-pipeline-type stuff, I don't see why each one needs to keep data around. Class private data would hold processing config/params/state; the actor acts on data that comes in on messages and either passes it along to its next destination or stores some result. Bus stops along the route don't necessarily mean copies. Storing extra copies of things sounds like a design issue that could be overcome.
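
A tiny Python sketch of that idea, with invented names: the actor's private data is only configuration, and the dataset merely passes through on its way to the next stop.

class ScalingStage:
    """Pipeline actor: private data holds only configuration,
    never the dataset itself."""

    def __init__(self, gain, offset, send_next):
        self.gain = gain          # config lives in the actor...
        self.offset = offset
        self.send_next = send_next

    def handle_data_msg(self, chunk):
        # ...while the data just passes through to the next destination.
        self.send_next(chunk * self.gain + self.offset)

stage = ScalingStage(gain=2.0, offset=0.5, send_next=print)
stage.handle_data_msg(3.0)   # prints 6.5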

~ The wizard formerly known as DerrickB ~
Gradatim Ferociter
Message 4 of 7

@Quiztus2 wrote:
I am not sure if I have to go with the singleton. If I put a GOOP4 object into a message, it is shared by reference out of the box, right? A singleton feels like it would limit flexibility and scalability.

You're correct, I flubbed my terms a bit on Friday. I suppose I was thinking "singleton" in terms of "one place to access the data" but of course you could do that to multiple data sets, which I wasn't thinking about.

 

I don't use GOOP4 but as long as it handles by-ref stuff for you then yeah, you're fine.

Message 5 of 7

When talking about big data sets, there is no other way than to keep only one copy of them.

There are only a few more or less dirty strategies, differing in their point of view on the data:

 

1) PUBLIC DATA

You can share the data with all actors who need it by sending a reference.
=> Sharing internal data goes directly against the actor concept (who is responsible for data consistency?),
BUT for me this is acceptable as long as only the owner can write (which is also technically enforceable).

 

2) MSG DATA

You can send the only copy from actor to actor if the processing is a simple chain (as you mentioned it)

=> this is great if you need to "add calculated columns" because it allows pipelining

BUT you have to be very careful with references created in a different actor, because the memory allocation is lost as soon as the actor that created them stops.

 

3) INTERNAL DATA

Only the data-owning actor can access and manipulate the data (DB actor concept already mentioned)

=> this could be seen as a clean OOP concept

BUT the more complicated the requests you have to implement, the more complicated (and slower) the DB actor becomes, OR the more data it copies out, OR the more functionality it implements that really belongs to the requester. Another disadvantage is that this actor can become a bottleneck (a sketch of this data-owner pattern follows after 3b).

 

3b) If you pack the data into a special class, you can separate the implementation of the requestors' functionality by interfaces.
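
As mentioned under 3), here is a minimal Python sketch of the data-owner (DB actor) pattern; queue.Queue stands in for actor message queues and all names are invented. Only the owner touches the dataset, and requesters get back just the small results they asked for.

import queue
import threading
import numpy as np

class DataOwnerActor:
    """Only this actor touches the dataset; other actors send
    requests and receive small results instead of the whole array."""

    def __init__(self, data):
        self._data = data                 # the single copy lives here
        self.inbox = queue.Queue()

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:               # shutdown message
                break
            kind, args, reply_q = msg
            if kind == "mean_of_slice":
                start, stop = args
                reply_q.put(float(self._data[start:stop].mean()))
            elif kind == "slice_copy":
                start, stop = args
                reply_q.put(self._data[start:stop].copy())

owner = DataOwnerActor(np.arange(1_000_000, dtype=float))
threading.Thread(target=owner.run, daemon=True).start()

reply = queue.Queue()
owner.inbox.put(("mean_of_slice", (0, 1000), reply))
print(reply.get())                        # requester gets a scalar, not GBs
owner.inbox.put(None)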

 

Message 6 of 7

Note: if you are working with data in the "several GB" range, you should consider combining LabVIEW with a technology designed for storing and querying large data, such as SQLite, MySQL, HDF5, etc.
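
For example, here is a minimal sketch using Python's built-in sqlite3 module (the same idea applies when calling SQLite from LabVIEW through a toolkit; the file name and table layout are invented): chunks go to disk as they are acquired, and any consumer queries back only the piece it needs.

import sqlite3
import numpy as np

# Store acquired chunks on disk and query back only what is needed,
# instead of holding several GB in every actor's memory.
conn = sqlite3.connect("acquisition.db")
conn.execute("""CREATE TABLE IF NOT EXISTS samples (
    channel INTEGER, t0 REAL, dt REAL, data BLOB)""")

chunk = np.random.randn(100_000)
conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
             (0, 0.0, 1e-6, chunk.tobytes()))
conn.commit()

# Later, any actor can pull back just the chunk it needs.
row = conn.execute("SELECT data FROM samples WHERE channel = 0").fetchone()
restored = np.frombuffer(row[0])
print(restored.shape)
conn.close()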

Message 7 of 7