BreakPoint

cancel
Showing results for 
Search instead for 
Did you mean: 

Mini Coding Challenge - Binary Search (sort of)

I have a large cluster array. Many millions of elements. There is a field in the cluster that is a number. That number could be 0, 1, 2, 3, or 4. This value is guaranteed to be in ascending order throughout the entire cluster array. So here are some possible values for that field for the entire array:

0, 0, 0, ... 0

4, 4, 4, ... 4

1, 1, ... 1, 2, 2, 4, 4, ... 4

0, 1, 2, 3, 4 ... 4

0, 0, 0, ... 1, 1, 1... 2, 2, 2... 3, 3, 3... 4, 4, 4

1, 1, 1, ... 2, 2, 2, ... 3, 3, 3, ... 4, 4, 4

 

I am trying to look for the scenario that is bolded above...in other words, I want to know if the values for that numeric field in that array start at 1, at some point transition to 2, at some point transition to 3, then at some point transition to 4. The number of times each value occurs between those transitions does not matter. Some restrictions I have:

 

  • No brute forcing the array...I need to iterate as few times as possible.
  • No parallel For Loop.

 

It's easy to check if the first cluster element is '1' and the last one is '4', but once I get that far, I'm getting wrapped around the axle with a binary search that goes in two directions looking for two values. I've got something that sorta works but I bet y'all can come up with something better. Best solution wins a high five from me and a shout-out in whatever NIWeek presentation(s) I give in 2019.

0 Kudos
Message 1 of 28
(7,494 Views)

Would this be too slow ?

 

Example.png

0 Kudos
Message 2 of 28
(7,476 Views)

It would help if you could define the problem in LabVIEW terms, e.g. by attaching a simplified VI that contains the correct datatype.

 

If I understand this correctly, the array element of a 1D array is a cluster of many completely uninteresting elements of any datatype, but one cluster element is a numeric (U8? I32?) that is guaranteed to only contain one of five values (0..4) AND is guaranteed to be in nondescending order and that's the only one we are looking at. (You say "guaranteed to be ascending" but your examples claim otherwise)

 

The result for any given input array is a boolean, either PASS (containing only 1, 2, 3, 4) of FAIL (otherwise). You already suggested an algorithm that checks the first and last for the correct values (1, 3, resp) and then only needs to prove that 2 and 3 also exist. I think you can do a single binary search for "2" OR "3" and once you find one (stop now and return FAIL if none are found, of course!), do a second binary search for the other (= not yet found) value on the remaining range above or below it, depending on what you found (i.e. if you found a 2, search above for 3, if you found a 3, search below for 2, re-using and readjusting the bounds already established). If the second value is not found, return FAIL, else it's a PASS as soon as you find one of them.

 

(Just thinking out loud, need to go to sleep now :D)

 

Message 3 of 28
(7,469 Views)

According to Altenbach's post, I didn't fullly understand the task. I thought that it is needed to know the indexes of the transitions. Furthemore the code is really not optimized and does nothing more than this :

 

Example2.png

 

Result is OK if all values (1 to 4) exist in the array. Is this correct ?

If so, because the values are ordered :

 - first element must be 1 and last 4

 - check if 2 and 3 exist by the use of Search 1D Array

Would this be an acceptable solution ? This is so simple that I'm afraid to miss something obvious.

0 Kudos
Message 4 of 28
(7,466 Views)

@altenbach wrote:

It would help if you could define the problem in LabVIEW terms... 


I thought I did. 😉 Big array of clusters, one field is a number, values are in ascending order, tell me if those values transition between 1, 2, 3, and 4.

 


@altenbach wrote:

(You say "guaranteed to be ascending" but your examples claim otherwise) 


Which example is an exception? All of them have values in ascending order. Some only have one value. Some have values ascending but skip something.

 


@altenbach wrote:

 

 

The result for any given input array is a boolean, either PASS (containing only 1, 2, 3, 4) of FAIL (otherwise). You already suggested an algorithm that checks the first and last for the correct values (1, 3, resp) and then only needs to prove that 2 and 3 also exist. I think you can do a single binary search for "2" OR "3" and once you find one (stop now and return FAIL if none are found, of course!), do a second binary search for the other (= not yet found) value on the remaining range above or below it, depending on what you found (i.e. if you found a 2, search above for 3, if you found a 3, search below for 2, re-using and readjusting the bounds already established). If the second value is not found, return FAIL, else it's a PASS as soon as you find one of them. 


Yup, that's what I need to do. And that's pretty much what I have right now. But this approach requires two passes. I was hoping somebody more clever than me could figure out a way to do it in one pass.

0 Kudos
Message 5 of 28
(7,442 Views)

@Darren wrote:


Which example is an exception? All of them have values in ascending order. Some only have one value. Some have values ascending but skip something.


In all your example the value sometimes stays the same between adjacent elements, i.e. is not ascending but still nondescending. 😄 The "ascending" condition is not met. (see also). Your sentence probably would have worked better after including one word as in "... these values are guaranteed to be sorted in ascending order ...".

 


@Darren wrote:

Yup, that's what I need to do. And that's pretty much what I have right now. But this approach requires two passes. I was hoping somebody more clever than me could figure out a way to do it in one pass.

It really is only one pass, because the ranges are not overlapping. The second search is entirely in a region that has not been looked at before. We just need  to slightly switch logic as a function of what has been found so far. The quickly shrinking search boundaries continue to shrink, only the logic is swapped. At most, we need to inspect log2(N) elements, .i.e. about 24 for an array with 16M elements. That's peanuts. 😄

 

0 Kudos
Message 6 of 28
(7,438 Views)

Example3.png

 

Doesn't this meet your requirements ? Is it not efficient enough ?

I didn't make a benchmark to check if it would be faster to start the search of 3 at the index of 2.

 

0 Kudos
Message 7 of 28
(7,426 Views)

On my quite old machine, check if sorted disabled : 5ms for an array with 10M values and 47ms with 100M.

0 Kudos
Message 8 of 28
(7,421 Views)

@JB I like it.  You could use the delete from array to get the last element, at one point that was more efficient.  Also I think an improvement could be to start the 2 Search at index 1, and start the 3 search at the index 2 was found +1, but only if 1 and 2 were found properly.  If they aren't then we don't need to search for 3.

 

Example_VI_BD.png_

0 Kudos
Message 9 of 28
(7,417 Views)

@JB wrote:

 

 

Doesn't this meet your requirements ?


I'm dealing with an array of clusters, not an array of numbers. So the array primitives (including Search) won't help.

0 Kudos
Message 10 of 28
(7,414 Views)