By reference vs. By value

Ajay_MV · ‎05-18-2016

I have used Symbio GOOP on a few projects and am fairly familiar with it. Based on that experience, I'm going to give a presentation (at the CLA Summit India) on GOOP and how it leverages LVOOP. Though I have used it widely, I don't come from a computer science background and had no good grasp of OOP itself. I now have a fair idea of OOP from going through some books. Looking back at GOOP now, I have a lot of questions.

Why should one use by-reference over by-value? What are the advantages of by-reference? What can't be done with by-value that the by-reference model makes possible?
- I got to know that it gives control over construction or destruction of the objects. From googling, I understand that other widely used languages (like C#, C++, etc.) use the by-reference model, so I assumed it is a common design pattern or programming paradigm that must be advantageous. However, I have no clear answer for this.

Thanks,

Ajay.

--
Ajay MV

AristosQueue (NI) · ‎05-20-2016

Have you read this?

http://www.ni.com/white-paper/3574/en/

(The Great By-Value vs. By-Reference Debate)

> Why should one use by-reference over by-value?

Parallel code in general and dataflow code in particular work best with by-value types. Reference types are referred to in functional language academic literature as "unhygienic" because they are so risky. There are very successful languages that leave references out entirely -- Haskell is the most notable example in this category, given its wide success in industry.

References have value when you need to refer to a system resource. They can be used to save on memory in some cases (though Haskell programmers will tell you that's the domain of the compiler). They can be used to express communication paths between subsystems.

> What are the advantages of by-reference?

In single-threaded applications, there are lots of advantages to by reference programming. Most of those advantages evaporate in a parallel environment (LabVIEW), but there are a few remaining in specialized cases.

If the shared reference is read-only to all processes except one, then it allows you to single-source a piece of data (i.e. not make a copy of it). That is sometimes an advantage and sometimes a disadvantage -- it comes with the cost of thread friction. When that shared reference object changes, everyone sharing the reference can see the change instantly. Again, this is both a good and a bad thing. It can turn your program into a polling system (check when that shared thing changes) unless you also have a shared reference for the change event.

If lots of people can write to the reference, you tend to develop deadlocks and/or read-modify-write race conditions. These are the twin bugs that destabilize the program and ultimately are the key reasons that references are shunned in parallel programming.
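To make the read-modify-write race concrete, here is a minimal sketch in Python (LabVIEW diagrams cannot be pasted as text, so this is only an analogy). The `Counter` class and both function names are hypothetical. The first function replays the bad interleaving by hand so the lost update is visible every time; the second runs real threads with the read-modify-write guarded by a lock.

```python
import threading


class Counter:
    """Hypothetical shared object: every holder of the reference sees the same state."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()


def lost_update_replay():
    """Replay, by hand, the interleaving that loses an update."""
    shared = Counter()
    a_seen = shared.count        # writer A reads 0
    b_seen = shared.count        # writer B reads 0 before A has written back
    shared.count = a_seen + 1    # A writes 1
    shared.count = b_seen + 1    # B also writes 1, so A's increment is lost
    return shared.count


def locked_increments(workers, per_worker):
    """Same increments from real threads, with the read-modify-write made atomic."""
    shared = Counter()

    def work():
        for _ in range(per_worker):
            with shared.lock:    # read, add, and write back as one step
                shared.count += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.count
```

Two increments in the replay leave the counter at 1 instead of 2. The lock repairs this particular case, but once code needs two such locks at the same time and takes them in different orders, the other twin (deadlock) becomes possible.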

> What can't be done with by-value that the by-reference model makes possible?

In theory, nothing. In practice, some.

With sufficient skill in expressing side-effects of computation (monads), all computation can be done with a by value model. In practice, it's really hard to think in that mode all the time, and some things, especially system resources (think "DAQ channel" or "file handle") really feel better with a by reference model. Circular graphs and database structures are especially tricky to express in by value forms... these often end up being programmed as by reference models. And module communication, especially dynamic communication, is generally expressed in references because languages rarely have a static syntax for expressing such things.
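As a rough illustration of what a by-value circular graph can look like, here is a small Python sketch. It assumes nodes are named by plain keys and edges are stored as key sets, so a cycle is just two entries naming each other rather than two objects pointing at each other. The `add_edge` helper is hypothetical and returns a new graph value instead of mutating its input.

```python
def add_edge(graph, src, dst):
    """Return a new graph value with the edge src -> dst added; the input is untouched."""
    new_graph = dict(graph)                              # copy the mapping, not the caller's value
    new_graph[src] = graph.get(src, frozenset()) | {dst}
    new_graph.setdefault(dst, frozenset())               # make sure the target node exists
    return new_graph


empty = {}
one_way = add_edge(empty, "a", "b")
cycle = add_edge(one_way, "b", "a")                      # a -> b -> a, with no object references
```

Each step yields an independent value, so `one_way` still has no edge out of `"b"` after `cycle` is built. The price is the extra indirection through keys, which is part of why such structures often end up by reference in practice.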

> I got to know that it gives control over construction or destruction of the objects.

Construction and destruction support and enforcement are provided through the Data Value Reference functions. The New Data Value Reference and the Delete Data Value Reference have custom behavior when their contained data is directly a LabVIEW class (not an array of class or a cluster containing a class). The restrictions allow you to enforce construction and destruction rules for a class, and the DVR itself supports upcasting and downcasting. You are responsible for defining the atomicity of the operations on the reference... the difficulty of defining that atomicity is one of the big reasons that race conditions develop in by-reference heavy code.
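The point about atomicity can be sketched in Python with a lock-guarded box. This is only an analogy, not the LabVIEW DVR API: `Box`, `in_place`, and `withdraw` are hypothetical names. The caller decides how much work sits inside one locked block, and a check-then-act operation is only correct if the check and the update share the same block.

```python
import threading
from contextlib import contextmanager


class Box:
    """Hypothetical reference to one piece of data, guarded by a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def in_place(self):
        """Hold the lock for the caller's whole read-modify-write block."""
        with self._lock:
            cell = [self._value]     # one-slot list the caller may read and overwrite
            yield cell
            self._value = cell[0]    # write back only if the block finished normally

    def read(self):
        with self._lock:
            return self._value


def withdraw(box, amount):
    """Check and update inside ONE block so no other writer can slip in between."""
    with box.in_place() as cell:
        if cell[0] >= amount:
            cell[0] -= amount
            return True
        return False
```

If `withdraw` instead called `read()` to check the balance and then opened `in_place()` separately to subtract, each call would be atomic on its own but the operation as a whole would not, which is exactly the kind of gap that is easy to miss in review.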

AristosQueue (NI) · ‎05-20-2016

I'll go flag a couple of by-ref advocates so you get a more balanced reply to this thread. 🙂

Ajay_MV · ‎05-21-2016

AristosQueue,

Thanks for the link and detailed answer.

That sheds some light on my question.

So, in general, was GOOP born to support better handling of dedicated resources (file systems, DAQ, circular graphs, etc.) with a by-reference model instead of by-value (where by-value is cumbersome for such resources)? Or is it because there are a lot of C# and Java developers who are used to the by-reference model and may find by-value difficult to follow?

--
Ajay MV

MikaelH · ‎05-22-2016

When to use by reference classes

As soon as you need to access the same object in parallel, that’s when you need to go by reference.

One problem is that in some instances you don’t know from the start if you need to do things in parallel. You might think that a hardware resource should only be accessed from one VI, so you could go with a by value object, but maybe the requirement changes later. E.g. you add an emergency stop process that needs to access some hardware objects to shut them down.

By value might be safer to use, since you can’t introduce race conditions and deadlocks, but if you know what you are doing a by reference class is as safe as a by value class.

For me I always go with by-value when:

- I want to send messages to/from tasks (instead of just sending a cluster).

- I have a cluster that I want to have better control over. (A glorified cluster)

And the rest of my classes are by reference:

- All hardware drivers

- Singleton classes

Cheers,

Mike

Ajay_MV · ‎05-22-2016

Mike,

Thanks for your reply, that's a pretty balanced answer! I had been waiting a long time for your turn.

I understand now that using by-reference for hardware is better. So is it always better to use by-reference for dedicated resources like hardware, file systems, circular buffers, etc.? For such dedicated resources, we mostly opt for singleton classes, don't we?

Best,
Ajay.

--
Ajay MV

AristosQueue (NI) · ‎05-23-2016

> but if you know what you are doing a by reference class is as safe as a by value class.

I can give you many research papers that study code correctness from experienced programmers to demonstrate that this just ain't so. Heck, the entire Ptolemy project at UC Berkeley is dedicated to changing the *entire execution system* to eliminate the threading model entirely because their research found parallel access to objects was so bad. Here's the 700+ page book.

No one ever needs parallel access to an object... they need parallel access to the data within that object -- which can be accomplished in a whole lot of ways without references. A programmer only "needs" a reference when he/she cannot think of any other way to accomplish the task.

Here are two anecdotes:

a) Steve Rogers has been LabVIEW's chief architect for I don't know how long... he was one of those who worked on 1.0. He has more experience with the subtleties of parallel access than anyone I know... he still didn't spot the race condition in the Data Value References after multiple code inspections and test cases until Jack found a reproducing case recently... it was just fixed in the LV 2015 f2 patch.

b) In 2013, I asked 9 LV architect-level users, five external and four internal to NI, to take the shipping producer/consumer template where when you stop the producer it instantly stops the consumer and adjust it so that when you stop the producer, the consumer keeps going until the queue is empty. I got 9 results back and, of those, 7 had a subtle race condition. And that was just with a queue refnum.
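For readers who want to see the "consumer keeps going until the queue is empty" requirement in text form, here is one common way to express it, sketched in Python with the standard `queue` module. It is not the shipping LabVIEW template and not any of the nine submissions; the sentinel approach and all names (`_STOP`, `producer`, `consumer`, `run`) are assumptions made for illustration. The producer's last act is to enqueue a stop marker, so the consumer stops because of a message it dequeued, not because of a flag it happened to notice.

```python
import queue
import threading

_STOP = object()   # hypothetical sentinel: "nothing more is coming"


def producer(q, items):
    for item in items:
        q.put(item)
    q.put(_STOP)   # enqueued after all data, so it is dequeued after all data


def consumer(q, out):
    while True:
        item = q.get()          # blocks until something arrives
        if item is _STOP:
            break               # everything before the sentinel was already handled
        out.append(item)


def run(items):
    q = queue.Queue()
    out = []
    c = threading.Thread(target=consumer, args=(q, out))
    p = threading.Thread(target=producer, args=(q, items))
    c.start()
    p.start()
    p.join()
    c.join()
    return out
```

The design choice here is that ordering inside the queue carries the shutdown signal. Variants that stop the consumer from a separate shared flag, or by tearing down the queue while items are still in it, are where the subtle races tend to appear.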

You can find many things in programming that are "considered harmful" by various and sundry folks, but the criticism of parallel accessed resources dates from the dawn of computer science. I stand by this judgement:

References are categorically bad in parallel code. References need to be removed from every level of your software stack *aggressively*. The only references I believe you should have to have in LabVIEW are

a) to system resources (files, DAQ channels, control refnums, etc), but only when properly managed in code that is essentially serial.

b) the references for the communications paths between your modules that use asynch communications, and those should be used in a manner well defined by your programming language or with very strict design rules defined by your tech leads.

c) those that are forced upon you by some third-party API whose construction you do not control

d) a scattering of limited use cases, evaluated on a case-by-case basis, with the bias toward "no", mostly driven by business pressures to cut corners and hack something together to get it out the door.

AristosQueue (NI) · ‎05-23-2016

Ajayvignesh_MV wrote:
> I understand now that using by-reference for hardware is better.

Yes, but only if you limit access to a single module and do not share that reference with parallel modules. I suggest that you treat the mutual exclusion protection offered by references as a last-resort shield in case you make a mistake somewhere, not a feature that you depend upon for correct running of your code. One module manages the hardware... other modules send it messages to do things and get replies in an asynch fashion.
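A minimal sketch of the "one module manages the hardware, others send it messages" arrangement, again in Python as an analogy. `FakeDevice`, `device_owner`, and `request` are hypothetical names, and the three commands are invented for the example. Only the owner loop ever touches the device; everyone else holds nothing but the inbox queue.

```python
import queue
import threading


class FakeDevice:
    """Hypothetical stand-in for a hardware handle."""

    def __init__(self):
        self.output = 0.0


def device_owner(inbox):
    """The single module that owns the device; all access is serialized by the inbox."""
    device = FakeDevice()
    while True:
        command, arg, reply = inbox.get()
        if command == "set":
            device.output = arg
            reply.put("ok")
        elif command == "get":
            reply.put(device.output)
        elif command == "stop":
            device.output = 0.0      # put the device in a safe state before exiting
            reply.put("stopped")
            break
        else:
            reply.put("unknown command")


def request(inbox, command, arg=None):
    """Send one message and wait for its reply on a private reply queue."""
    reply = queue.Queue(maxsize=1)
    inbox.put((command, arg, reply))
    return reply.get()
```

`request` blocks for the reply to keep the example short. A caller that wants the asynchronous style can put the message itself, keep the reply queue, and collect the answer later. An emergency-stop process added afterwards only needs the inbox to send `"stop"`; it never needs the device reference.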

MinhPham · ‎05-23-2016

Great discussion on ByVal and ByRef.

Either way has its pros and cons; personally I don't think one is better than the other.

It is best practice to use ByVal to minimize the risk of race conditions, but so far implementing it for large-scale projects in LabVIEW has not been simple for me, both structure-wise and memory-usage-wise. I find ByRef more efficient and easier to use and implement with the GDS tool. It is very straightforward in dealing with classes and their references (base classes and child classes through dynamic dispatch), and in terms of modifying class data, the implementation of DVR is very smart.

If NI can provide a similar tool or built-in function for ByVal to do parallel tasks, then please let me know (I guess I didn't look into it far enough), but not the Actor Framework, as I found it very messy to use in projects (maybe it's just me?).

So far ByRef has my vote for all of my applications

James_McN · ‎05-24-2016

Interesting to see the variations in answers.

My 2 cents:

I basically always start from a position of a by value class for all the benefits AQ mentions - it simply works better and simpler with data flow and is the reason parallel programming works well in LabVIEW.

I will then refactor to a DVR based class if the design calls for parallel access to a common resource in some way. I tend to rely on the DVR access rather than the parallel module model as I find it easier to follow (especially if request/reply is required).

The case I am torn over is if the important elements contained are reference based anyway (say queue references). Then I tend to decide based on how the calling code expects it to work. Is the fact this code actually enqueues messages elsewhere important to it? If yes I will make it by reference but if I am trying to encapsulate the more complex parallel behaviour I would leave it as by value.

James Mc
========
CLA and cRIO Fanatic
My writings on LabVIEW Development are at devs.wiresmithtech.com
