performance hit when using Match Regular Expression

jcarmody · ‎09-17-2010

Some of you know that I like using the Match Regular Expression function; I can use it to accomplish a lot in a little piece of my Block Diagram. The obvious downside to this function is that its syntax is less clear to the casual observer, so my code is a little less readable. A not-so-well-known drawback is that Match Regular Expression is slower than Match Pattern. How much slower? Considerably slower. Here's an example:

Wow. I knew it was slower, but I would never have guessed that it would be this bad.

This all came about because I suggested to a (CLA) colleague that his code would be cleaner if he'd use the Match Regular Expression instead of a pair of Match Patterns to find a number between brackets. I suggested the third method above; he suggested running a benchmark. My time wasn't entirely wasted, though. He learned how to use submatches in Regular Expressions and I learned to be more humble.
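For readers who don't have LabVIEW open, here is a rough text-language analogue of the two approaches being compared. It is only a sketch in Python, not the original block diagram: the input format, the function names, and the ASCII-digits-only rule are all assumptions made for illustration.

```python
import re

# Assumed input: free text containing one number in square brackets.
BRACKETED_NUMBER = re.compile(r"\[([0-9]+)\]")


def number_via_regex(text):
    """One regular expression; the number comes back as a submatch."""
    match = BRACKETED_NUMBER.search(text)
    return match.group(1) if match else None


def number_via_two_searches(text):
    """Two plain searches: split at '[', then split the remainder at ']'."""
    _, opener, rest = text.partition("[")
    if not opener:
        return None  # no opening bracket at all
    inner, closer, _ = rest.partition("]")
    if not closer or not (inner.isascii() and inner.isdigit()):
        return None  # no closing bracket, or not a plain number
    return inner
```

The two functions agree on simple inputs such as `"channel [42] ok"`, but they are not equivalent everywhere: the regex keeps searching past a non-numeric bracket pair, while the two-search version only ever looks at the first `[`.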

Jim
You're entirely bonkers. But I'll tell you a secret. All the best people are. ~ Alice
For he does not know what will happen; So who can tell him when it will occur? Eccl. 8:7

Darin.K · ‎09-17-2010

True expertise comes not when you know how to do something, but when you also understand when not to do something. I think NI tries to warn us that Match Regular Expression is a bit slower, but who really reads all the fine print? (Not me.) Two issues bite us here. I think the biggest hit results from the fact that Match RegEx relies on calls to an external library, so there is some overhead there. Second, the flexibility of Match RegEx comes with a price: backtracking can be time consuming.

Now two points about your benchmarking. First, given the compiler optimizations and the parallel nature of your snippet, I don't really trust the numbers. My quick testing with individual functions, controls inside the loops, etc. seems to indicate that Match RegEx is about 10 times slower than Match Pattern. My second point is perspective: we are talking microseconds here. Are we really doing matches millions of times in tight loops all that often?

Let's say you are a RegEx ninja and can drop and wire that function in half the time it takes to drop and wire two Match Patterns. You have saved a few seconds. The Match Pattern code will have to run about a million times to break even.
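The break-even estimate is just a division: edit time saved over extra run time per call. A small sketch in Python, where the 3 s and 3 µs figures are assumed purely for illustration (they are not measurements from this thread):

```python
def break_even_calls(seconds_saved_wiring, extra_microseconds_per_call):
    """Number of executions after which the slower function has cost as
    much run time as was saved while wiring the diagram."""
    return seconds_saved_wiring * 1_000_000 / extra_microseconds_per_call


# Assumed figures: 3 s saved at edit time, 3 microseconds extra per call.
print(break_even_calls(3, 3))  # 1000000.0
```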

If you are comfortable with RegExes, I would not worry at all about the performance hit in about 99% of the cases. I'll restate my "What the f*&*" rule. If you do something unnatural to you for the sake of performance, consider that when you see the code in a few years you will stop and say "What the ...?". The extra time it takes to figure it out again later should also factor into your choice. (I know almost everyone says WTF when they see a RegEx, so I am not really advocating effectively for it.)

Now get back up on that high horse...

TCPlomp · ‎09-18-2010

I think your benchmark is off.

You should run the for-loops in sequence (connect the error wires), because I suspect the second and third for-loops use the same resources.

Ton

Free Code Capture Tool! Version 2.1.3 with comments, web-upload, back-save and snippets!
Nederlandse LabVIEW user groep www.lvug.nl
My LabVIEW Ideas

LabVIEW, programming like it should be!

TCPlomp · ‎09-18-2010

My guess was partly right. Then I thought about how to remove the 'constant folding' of the for-loop, so I created a string control inside each for-loop.

Here are the results,

From left to right:

String inside for-loop, with sequence error
Sequence error
Original

(same top-to-bottom order as the original)

Some things to think about:

Separating the for-loops in time gains about 400 ms per method, possibly because of some thread switching with the RegEx library
The replace string is optimized with constant folding, but it's still way faster than regex
RegEx's aren't constant folded (probably since a DLL is used under the hood)
A single RegEx is faster than two in sequence
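The two fixes applied here (run the candidates one after another, and feed them input the compiler cannot treat as a constant) carry over to other languages. Below is a minimal sketch of the same benchmarking discipline in Python; it is an analogue, not a reproduction of the LabVIEW test, and the function names and sample strings are made up for illustration.

```python
import re
import timeit

PATTERN = re.compile(r"\[([0-9]+)\]")


def via_regex(text):
    match = PATTERN.search(text)
    return match.group(1) if match else None


def via_partition(text):
    _, _, rest = text.partition("[")
    inner, _, _ = rest.partition("]")
    return inner or None


def benchmark(inputs, repeats=5, loops=1000):
    """Time each candidate on its own, strictly one after the other."""
    results = {}
    for name, func in (("regex", via_regex), ("partition", via_partition)):
        # The inputs arrive at run time, so nothing can be precomputed.
        timer = timeit.Timer(lambda: [func(s) for s in inputs])
        # Taking the best of several repeats reduces noise from other processes.
        best = min(timer.repeat(repeat=repeats, number=loops))
        results[name] = best / (loops * len(inputs))  # seconds per call
    return results


if __name__ == "__main__":
    samples = [f"reading [{n}] volts" for n in range(100)]
    for name, seconds in benchmark(samples).items():
        print(f"{name}: {seconds * 1e6:.3f} us per call")
```

The absolute numbers will differ from machine to machine and say nothing about LabVIEW itself; the point is only the structure of the measurement.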

Ton


TCPlomp · ‎09-18-2010

Darin.K wrote:
Let's say you are a RegEx ninja and can drop and wire that function in half the time it takes to drop and wire two Match Patterns. You have saved a few seconds. The Match Pattern code will have to run about a million times to break even.

If you are comfortable with RegExes, I would not worry at all about the performance hit in about 99% of the cases. I'll restate my "What the f*&*" rule. If you do something unnatural to you for the sake of performance, consider that when you see the code in a few years you will stop and say "What the ...?". The extra time it takes to figure it out again later should also factor into your choice. (I know almost everyone says WTF when they see a RegEx, so I am not really advocating effectively for it.)

The real benefit of regexes is that you can get them out of your code.

You could expose them as a control or as a configuration token. That way you can alter your program and add new features in a flash.

For instance, I had an application for railways. Every train/locomotive has a serial number, and we wanted to derive the train type from that serial number, so that during testing the serial number could be entered and specific details about that train type could be shown (weight, maximum speed, number of axles, etc.).

There is a very good link between the serial number and the train-type for the main trains in the Netherlands. By using a table that consists of a regex-column and a train-type column I can add new train types without changing the code.
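A table like that is easy to sketch in a text language. The Python below is only an illustration of the idea: the serial-number formats, the train-type names, and the function names are all invented, and in a real application the rows would be read from a configuration file instead of sitting in the source.

```python
import re

# Hypothetical rows (regex column, train-type column). Invented formats.
TRAIN_TYPE_ROWS = [
    (r"^LOC-[0-9]{4}$", "locomotive"),
    (r"^EMU-[0-9]{3}[A-Z]$", "electric multiple unit"),
    (r"^DMU-[0-9]{3}$", "diesel multiple unit"),
]


def compile_table(rows):
    """Compile each pattern once so lookups do not recompile it."""
    return [(re.compile(pattern), train_type) for pattern, train_type in rows]


def lookup_train_type(serial_number, table):
    """Return the type of the first row whose regex matches, else None."""
    for pattern, train_type in table:
        if pattern.search(serial_number):
            return train_type
    return None


table = compile_table(TRAIN_TYPE_ROWS)
print(lookup_train_type("EMU-204B", table))  # electric multiple unit
```

Supporting a new train type then means adding one row to the table; the lookup code stays untouched.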

Ton


tst · ‎09-18-2010

I don't do any string processing, really, but one other point for consideration is that the RegEx function is actually an XNode (if I remember correctly). This probably means that its code is also generally somewhat less efficient.

___________________
Try to take over the world!

Darin.K · ‎09-18-2010

What performance hit are you thinking of with XNodes (yes, this is one)? I roll my own and assume that once I generate the code, it is basically the same as if the underlying code had been placed there normally.

Which reminds me of another trick I use. The expanding submatches seem nice, but oftentimes I simply roll them up into an array. By using the XNode Ability VI which does the actual work (\vi.lib\regexp\Match Regular Expression_Execute.vi), I get the submatches in an array with less fuss.
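For comparison, most text-language regex libraries hand back the submatches as a single collection by default. A minimal sketch in Python (this shows the idea only; it is not how the LabVIEW VI is implemented, and the pattern and input are made up):

```python
import re


def submatches_as_list(pattern, text):
    """Return every capture group of the first match as one list.

    Groups that did not take part in the match come back as None;
    no match at all gives an empty list.
    """
    match = re.search(pattern, text)
    return list(match.groups()) if match else []


print(submatches_as_list(r"(\w+)=(\d+)\s*(\w*)", "speed=140 km"))
# ['speed', '140', 'km']
```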

tst · ‎09-19-2010

@Darin.K wrote:

What performance hit are you thinking of with XNodes (yes, this is one)?

None specifically. I'm assuming it has a hit because of the overhead of the extra code and that the feature is not public and therefore not fully polished, but it's a pure guess.

