08-29-2022 10:50 PM
Hi,
I have to extract a string enclosed in quotation marks that follows immediately after a field name (I'm doing a kind of JSON string analysis, using LV2014). Locating the field name works as expected using Match Pattern, as shown below:
But when it comes to extracting the string between the following two quotation marks, things go wrong. For example, the field names are "bot seq" or "top seq" and the input string is:
{"afe_loop_cfg":0,"n_afe_loop":0,"num_scanA":0,"num_scanB":0,"num_scanC":0,"cbTxPol":false,"cbTablePreload":true,"cbAnimAuto":true,"time auto":1,"bot seq":" 218, 64, 217, 65, 220, 62, 219, 63, 222, 60, 221, 61, 224, 58, 223, 59,\n 226, 56, 225, 57, 228, 54, 227, 55, 230, 52, 229, 53, 232, 50, 231, 51,\n 234, 48, 233, 49, 236, 46, 235, 47, 238, 44, 237, 45, 240, 42, 239, 43,\n 242, 40, 241, 41, 244, 38, 243, 39, 246, 36, 245, 37, 248, 34, 247, 35,\n 250, 32, 249, 33, 252, 30, 251, 31, 254, 28, 253, 29, 256, 26, 255, 27,\n 258, 24, 257, 25, 260, 22, 259, 23, 262, 20, 261, 21, 264, 18, 263, 19,\n 266, 16, 265, 17, 268, 14, 267, 15, 270, 12, 269, 13, 272, 10, 271, 11,\n 274, 8, 273, 9, 276, 6, 275, 7, 278, 4, 277, 5, 280, 2, 279, 3,\n 282, 0, 281, 1","top seq":" 66-211"}
The string is not multiline - i.e. \n is not a NEW_LINE character but the two characters '\' and 'n'.
I tried to use Match Regular Expression to extract the string enclosed in the quotation marks following the field name, but I can't make it work for both "bot seq" and "top seq". I expected Match Regular Expression to work with the control string ^"(.*)" but this is not the case - for bot seq and ^"(.*)" I get the following output (notice that the top seq field data is also there):
218, 64, 217, 65, 220, 62, 219, 63, 222, 60, 221, 61, 224, 58, 223, 59,\n 226, 56, 225, 57, 228, 54, 227, 55, 230, 52, 229, 53, 232, 50, 231, 51,\n 234, 48, 233, 49, 236, 46, 235, 47, 238, 44, 237, 45, 240, 42, 239, 43,\n 242, 40, 241, 41, 244, 38, 243, 39, 246, 36, 245, 37, 248, 34, 247, 35,\n 250, 32, 249, 33, 252, 30, 251, 31, 254, 28, 253, 29, 256, 26, 255, 27,\n 258, 24, 257, 25, 260, 22, 259, 23, 262, 20, 261, 21, 264, 18, 263, 19,\n 266, 16, 265, 17, 268, 14, 267, 15, 270, 12, 269, 13, 272, 10, 271, 11,\n 274, 8, 273, 9, 276, 6, 275, 7, 278, 4, 277, 5, 280, 2, 279, 3,\n 282, 0, 281, 1","top seq":" 66-211
Somehow Match Regular Expression gets confused by the quotation marks - the output also contains the next field with its data, which is not what I need. I have used Match Regular Expression very often and until now it has always worked as expected.
To work around the problem I did the string extraction with the Search Array function (attached VI), but I'm still curious why Match Regular Expression is not working. I also tried Match Pattern instead of Match Regular Expression, but the result was the same - I can't correctly get the data enclosed in quotation marks.
The question is - how can I correctly get the string enclosed in quotation marks when using Match Regular Expression or Match Pattern?
Thanks
08-30-2022 12:55 AM
I am not sure if I understood your issue correctly.
Does
"([^"]*)"
return what you want?
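A minimal sketch of how this pattern behaves, written in Python because LabVIEW block diagrams cannot be shown as text. It assumes Python's `re` module treats these basic constructs the same way as Match Regular Expression does; the sample string `rest` is a shortened, made-up stand-in for the text that follows the field name.

```python
import re

# Shortened stand-in for the text right after the field name "bot seq":
# In this raw string, \n is the two characters '\' and 'n', as in the question.
rest = r'" 218, 64,\n 282, 0, 281, 1","top seq":" 66-211"}'

# Opening quote, then any run of characters that are NOT a quote,
# then the closing quote. The capture group holds the text in between.
match = re.match(r'"([^"]*)"', rest)
value = match.group(1) if match else None

print(value)
```

Because `[^"]` can never consume a quotation mark, the match has to stop at the first closing quote. The sketch does not handle escaped quotes (`\"`) inside a value.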
08-30-2022 02:31 AM - edited 08-30-2022 02:40 AM
Hi UliB,
Thanks for the suggestion - your suggested string really works! It makes perfect sense - search for a quotation mark, then for anything other than a quotation mark, and finally for the second quotation mark.
^"([^"]*)"
I'm still curious why ^"(.*)" is not working - it's technically doing the same thing, except that >.< matches any character. Once the second quotation mark appears, ^"(.*)" should complete the search (which does not always happen in my example).
08-30-2022 02:45 AM - edited 08-30-2022 02:46 AM
Hi Luben.hristov
When I work with regular expressions, I test with https://regex101.com/
On this page, the help for * says:
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
And in "(.*)" the quote (") is a valid match for . as long as another quote (") comes in the following characters.
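The greedy behaviour can be reproduced with a small Python sketch. It assumes Python's `re` module backtracks the same way as Match Regular Expression for this pattern, and the two input strings are shortened, made-up versions of the data in the question.

```python
import re

# Text following "bot seq": (another quoted field comes after it).
rest_bot = r'" 1, 2, 3","top seq":" 66-211"}'
# Text following "top seq": (last field, no further quotes after it).
rest_top = r'" 66-211"}'

# Greedy .* first runs to the end of the string, then gives characters
# back only until the trailing quote can match, i.e. at the LAST quote.
greedy_bot = re.match(r'^"(.*)"', rest_bot).group(1)
greedy_top = re.match(r'^"(.*)"', rest_top).group(1)

print(greedy_bot)  # spills over into the next field
print(greedy_top)  # looks correct only because no later quote exists
```

This is why the pattern appears to work for the last field and fails for any field that is followed by more quoted text.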
08-30-2022 03:19 AM
Hi UliB,
Thank you for the explanation - now it makes sense why sometimes it works and sometimes not.
Personally, I use a test VI that makes it possible to experiment quickly with different regular expression strings. It does some basic syntax analysis of the regular expression and colours the characters to highlight the groups (see attached LLB).
Your link for the regular expressions is definitely very useful, thanks for the hint.
08-30-2022 07:03 AM
Hello Hristov,
there is another, easier solution for your problem. As UliB stated, the normal behavior is to be greedy. But with the modifier '?' it is possible to change the behavior of a quantifier to be lazy. As the offline help (LabVIEW 2019) for the special character '?' says:
When used immediately after a quantifier, ? modifies the quantifier to match as few times as possible. Modifiable quantifiers include *, +, and {}.
So the following search string works the same:
"bot seq":"(.*?)",
This example also shows how to do without the Match Pattern function that was used to remove the first part of the string.
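As a rough illustration of the single-pattern approach, here is a Python sketch. The helper name `extract_field` and the shortened input are made up for this example, and the trailing comma of the pattern above is left out so that the last field, which is followed by `}` rather than a comma, matches as well.

```python
import re

def extract_field(text, name):
    """Return the quoted value that follows "name": or None if not found.

    Hypothetical helper for illustration; it does not handle escaped
    quotes inside the value.
    """
    # Field name in quotes, a colon, then a lazily matched quoted value.
    pattern = '"' + re.escape(name) + r'":"(.*?)"'
    found = re.search(pattern, text)
    return found.group(1) if found else None

# Shortened stand-in for the input; \n is a literal backslash plus 'n'.
text = r'{"time auto":1,"bot seq":" 1, 2,\n 3, 4","top seq":" 66-211"}'

print(extract_field(text, "bot seq"))
print(extract_field(text, "top seq"))
```

Putting the field name into the pattern locates the field and extracts its value in one step, and the lazy `.*?` stops at the first closing quote instead of the last one.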
08-30-2022 07:04 AM - edited 08-30-2022 07:08 AM
Hi Luben.hristov
another regular expression you can try is
"(.*?)"
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Edit: daveTW was a minute faster. 😀
08-30-2022 07:08 AM
Hi Dave,
This information about the "?" modifier is new to me - it changes the behaviour of Match Regular Expression to exactly what I need.
Thank you for the great idea!
08-30-2022 07:16 AM
Thanks UliB,
You posted the information about the "?" modifier at the same time as Dave. I feel privileged to get such a quick solution for my LV problems 🙂