Following are my observations and findings while using Unicode in a LabVIEW UI application. The purpose of using Unicode in this application is to localize the UI for different languages including Japanese while the application is running. The application provides a settings page which allows the user to select the language of the UI. Based on this setting all UI components (controls and indicators) are localized into the selected language. The resource strings for all controls/indicators and languages are read from a text file containing Unicode strings.
LabVIEW for Windows has limited support for Unicode strings in the front panel controls and indicators. This is not an officially supported feature, meaning that it is not as fully tested as other released parts of the development and run-time environment. In addition, this feature is not covered under standard product support, and parts of it may change in future releases of LabVIEW, so any code developed on this feature may require changes when upgrading to a newer version of LabVIEW. If you have any feedback or questions about using Unicode in LabVIEW, post them as comments on this document or in the Developer Zone discussion forums.
The code posted as part of this document has been developed and tested in LabVIEW 2009 running on Windows XP SP3 (English). The VIs are saved back to LabVIEW 8.6. Earlier versions of LabVIEW did not include all of this Unicode support and it is suggested that you upgrade to LabVIEW 8.6 or later if you want to use Unicode in your application development.
This code has not been tested with other operating systems and I assume it will not work on non-Windows OSs, though I expect it should work in Windows Vista and Windows 7.
What is Unicode?
The answer to this question could cover many pages by itself so I will not attempt to provide a detailed or comprehensive explanation of the Unicode standard and its different character encodings. Please consult other online sources for this information. It will be helpful to be familiar with what Unicode is before proceeding with the rest of this document.
Unicode and LabVIEW
Unicode is not officially supported by the LabVIEW environment, but there is basic support of Unicode available as described in this document. Unicode can support a wide range of characters from many different languages in the same application.
Windows XP (Vista, 7) is fundamentally built on Unicode and uses Unicode strings internally, but it also supports non-Unicode applications. By default LabVIEW on Windows (English) does not use Unicode strings, but rather uses Multibyte Character Strings (MBCS). The interpretation of MBCS is based on the current code page selected in the operating system. The current code page is set using the regional settings of the OS and determines how the bytes in the strings are rendered into characters on the screen. The most common code page is 1252 used by English Windows as well as several other Western languages and comprises the commonly known extended ASCII character set.
When the regional settings in Windows are changed the OS may switch to a different code page for rendering strings. For example, if you switch to Japanese, code page 932 will be used. Using different code pages allows LabVIEW to have localized versions of the development environment. Almost all code pages include support for the basic Latin ASCII characters used in the English language, as well as a local set of characters. Therefore if you have code page 932 selected, the operating system can still render Latin ASCII characters as well as Japanese.
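The effect of the code page on how bytes are rendered can be illustrated outside of LabVIEW. The following is a minimal Python sketch, not part of the attached examples; the helper name `decode_with_code_page` is made up for illustration, and Python's `cp1252` and `cp932` codecs stand in for the Windows code pages.

```python
# Sketch: the same bytes are rendered differently depending on the code page.
# decode_with_code_page is a hypothetical helper, not a LabVIEW or Windows API.

def decode_with_code_page(raw: bytes, code_page: str) -> str:
    """Decode raw bytes with a named code page; undefined bytes become U+FFFD."""
    return raw.decode(code_page, errors="replace")

ascii_bytes = "LabVIEW".encode("ascii")
japanese_bytes = "水".encode("cp932")  # two bytes under code page 932

# Basic Latin ASCII text reads the same under both code pages.
print(decode_with_code_page(ascii_bytes, "cp1252"))
print(decode_with_code_page(ascii_bytes, "cp932"))

# The Japanese bytes only produce the intended character under code page 932.
print(decode_with_code_page(japanese_bytes, "cp932"))
print(ascii(decode_with_code_page(japanese_bytes, "cp1252")))
```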
Using Unicode instead of MBCS, an application can render characters from many different languages, alphabets or scripts without switching code pages or regional settings. In fact almost all of the scripts supported in the legacy code pages are included in Unicode, and Unicode keeps being expanded with more characters in every release. Because Unicode now supports more than 65,535 characters, the concept of planes was introduced in conjunction with surrogate pairs. Most of the characters covered by the code pages are included in Plane 0 of the Unicode standard and fit in a 2-byte representation. More complex characters, such as those used for mathematics or for ancient and extinct scripts, are located in higher planes (for example Plane 1) and are therefore encoded in UTF-16 as surrogate pairs, which take 4 bytes.
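The relationship between planes and surrogate pairs can be sketched in a few lines of Python. This is an illustration only; the function names are invented here, and the arithmetic follows the UTF-16 definition in the Unicode standard.

```python
# Sketch: Plane 0 characters use one 16-bit code unit, higher planes use two.

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside Plane 0 use surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

def utf16_code_units(ch: str) -> list[int]:
    """Return the 16-bit code units Python's own UTF-16 encoder produces."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A Plane 0 character needs a single code unit (2 bytes).
print([hex(u) for u in utf16_code_units("水")])

# U+1D11E (a musical symbol in Plane 1) needs a surrogate pair (4 bytes).
print([hex(u) for u in to_surrogate_pair(0x1D11E)])
print([hex(u) for u in utf16_code_units(chr(0x1D11E))])
```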
A common encoding form of Unicode is UTF-16 (or its older subset UCS-2). Code charts and many references list UTF-16 values in big-endian byte order, with the most significant byte first. Unicode in LabVIEW is handled as little endian, also called UTF-16LE, which is also the byte order Windows uses for its native Unicode strings. This is important to know when looking at the hexadecimal representation of strings or working with Unicode text files.
| Character | ASCII (hex) | UTF-16 big endian (hex) | UTF-16LE (hex), LabVIEW |
| z | 7A | 00 7A | 7A 00 |
| 水 | n/a | 6C 34 | 34 6C |
| Ѳ | n/a | 04 72 | 72 04 |
Table 1: Example of a few characters in ASCII and Unicode
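The byte values in Table 1 can be reproduced with a short Python sketch. The helper `hex_bytes` is a name chosen for this illustration; the codec names `utf-16-be` and `utf-16-le` are Python's standard ones.

```python
# Sketch: print the big-endian and little-endian UTF-16 bytes of a character.

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

for ch in ("z", "水", "Ѳ"):
    print(ascii(ch), "|", hex_bytes(ch, "utf-16-be"), "|", hex_bytes(ch, "utf-16-le"))
```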
When writing Unicode to a plain text file you commonly prepend a Byte Order Mark (BOM) as the first two bytes of the file. The BOM indicates to the file reader that the file contains Unicode text and whether the byte order is big-endian or little-endian. The BOM for big-endian is 0xFE FF. The BOM for little-endian, which is what LabVIEW uses, is 0xFF FE. Windows Notepad and WordPad can detect a Unicode file using the BOM and display its contents correctly.
LabVIEW for Touchpanel on Windows CE
LabVIEW for Touchpanel on Windows CE supports multi-byte character sets (MBCS), specifically double-byte character sets (DBCS). “Under this scheme, a character can be either one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte," chosen from a particular range depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding.” (http://msdn.microsoft.com/en-us/library/ey142t48(VS.80).aspx) A code page only contains the characters from one particular language such as Korean. Therefore MBCS can only support ASCII and one other set of language characters at a time, and you need to select the specific code page for the non-ASCII characters to be used in your application. To do that, look for the "language for non-Unicode programs" setting in the Windows Control Panel.
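The lead-byte and trail-byte scheme can be observed with any DBCS code page. The sketch below uses Python's `cp949` codec as a stand-in for the Korean Windows code page; the helper names are hypothetical and the sample characters are chosen only for illustration.

```python
# Sketch: in a DBCS code page a character is either one or two bytes wide,
# and characters outside that code page cannot be represented at all.

def encoded_bytes(text: str, code_page: str) -> list[tuple[str, bytes]]:
    """Encode each character separately to show its width in the code page."""
    return [(ch, ch.encode(code_page)) for ch in text]

def can_encode(text: str, code_page: str) -> bool:
    """Report whether every character of text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# "A" is a single byte; the Hangul syllable is a lead byte plus a trail byte.
for ch, raw in encoded_bytes("A한", "cp949"):
    print(ascii(ch), len(raw), raw.hex(" "))

# A Thai letter is not part of the Korean code page.
print(can_encode("한", "cp949"), can_encode("ก", "cp949"))
```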
Using Unicode in LabVIEW
Common Use Cases
A list of common uses of Unicode in an application developed using LabVIEW includes:
• All strings in the application used for display, user input, file I/O, and network communication (e.g. TCP/IP) are ASCII strings. This is the most common use of LabVIEW and does not require the use or consideration of Unicode.
Non-Unicode = Extension of ASCII based on system code page
ASCII technically only defines a 7-bit value and can accordingly represent 128 different characters, including control characters such as newline (0x0A) and carriage return (0x0D). However, ASCII characters in most applications, including LabVIEW, are stored as 8-bit values which can represent 256 different characters. The additional 128 characters in this extended ASCII range are defined by the operating system code page, also known as the "Language for non-Unicode Programs". For example, on a Western system, Windows defaults to the character set defined by the Windows-1252 code page. Windows-1252 is an extension of another commonly used encoding called ISO-8859-1. The two are very similar but differ in the range 0x80 to 0x9F.
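The difference between the two character sets can be checked with a short Python sketch. It relies on Python's `cp1252` and `latin-1` codecs as stand-ins for Windows-1252 and ISO-8859-1; the function name is made up for this example.

```python
# Sketch: find the byte values that Windows-1252 and ISO-8859-1 decode differently.

def differing_bytes() -> list[int]:
    """Return every byte value whose character differs between the two sets."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        # A few bytes are unassigned in Windows-1252; show them as U+FFFD.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(value)
    return diffs

diffs = differing_bytes()
print(f"first: 0x{diffs[0]:02X}, last: 0x{diffs[-1]:02X}, count: {len(diffs)}")

# 0x80 is the euro sign in Windows-1252 but a control character in ISO-8859-1.
print(ascii(bytes([0x80]).decode("cp1252")), ascii(bytes([0x80]).decode("latin-1")))
```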
Figure 1: LabVIEW ASCII string in Hex and normal display showing extended ASCII characters
• The application reads Unicode data from a file or other source and displays it on the user interface using a non-Unicode (ASCII-based) encoding. In this use case it is assumed the Unicode characters are limited to the subset supported by extended ASCII.
• The application reads Unicode data from a file or other source and displays it as Unicode characters on the user interface.
• The application internally uses characters encoded in a non-Unicode way, including input from the UI by the user, but needs to write data to a file or other destination in Unicode.
• The application uses Unicode strings internally including input from the UI and writes Unicode data to a file or other destination.
LabVIEW Configuration for Unicode
To use Unicode in LabVIEW you must enable Unicode support by adding the following setting in the LabVIEW.ini file. After making this change you must restart the development environment.
LabVIEW Controls and Indicator Properties
The LabVIEW string controls and indicators have two private properties related to entering and displaying Non-Unicode (extended ASCII) or Unicode characters. These properties are not exposed through the regular property node; access to these properties is provided through subVIs as part of the examples included with this document.
• Force Unicode Text
Force Unicode Text is a property which can be enabled and disabled on a string control or indicator using its context menu.
Figure 2: Setting the Force Unicode Text property on a string control
The Force Unicode Text property affects how text entered from the keyboard is converted to a string (byte stream) in the diagram. If text is passed from an ASCII keyboard and this property is turned on, then the text is automatically converted to the Unicode equivalent of the ASCII characters. Typically this means that each single-byte character is converted to its two-byte Unicode equivalent.
• InterpretAsUnicode
InterpretAsUnicode is a property which can be enabled on different text elements of different UI controls and indicators, such as the text of a string control/indicator, the caption of a control/indicator, the Boolean text of a Boolean control/indicator, etc. This property controls whether a string value passed to the text element is interpreted as an ASCII or Unicode string. SubVIs provided with the example in this document allow you to pass strings to different UI elements and select whether you are passing an ASCII or Unicode string.
Note: The state of the InterpretAsUnicode property of a string element may be changed dynamically if text is pasted or entered into the text element by the user. The display mode (InterpretAsUnicode) of the text element will automatically adapt to Unicode or ASCII depending on the type of text entered into the control.
• If you paste a Unicode string into a text element the InterpretAsUnicode property is turned on.
• If you paste a regular ASCII string into a text element the InterpretAsUnicode property is turned off.
For example, if the display mode of a string control is Unicode (InterpretAsUnicode property on) and text is entered from an ASCII keyboard, the display mode will be switched to ASCII and the current value of the string control will be interpreted and displayed as ASCII characters. This can cause issues if the Force Unicode Text property is enabled for a string control. Entering regular ASCII text will cause the string control to interpret all data as ASCII, but the Force Unicode Text property will automatically convert the new characters entered in the control to Unicode data. These two conditions combined will cause ASCII text to appear with a ‘space’ between each letter entered. These spaces are actually the extra null bytes, each of which is the second byte of an ASCII character converted to Unicode. To resolve this issue you must detect the keyboard input and set the Text.InterpretAsUnicode property of the string control to True to properly display all text as Unicode. This is shown in the examples.
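The origin of these apparent spaces can be reproduced outside of LabVIEW. The following Python sketch is only an analogy for the two properties: encoding to `utf-16-le` plays the role of Force Unicode Text, and the choice of decoder plays the role of InterpretAsUnicode. The variable names are illustrative.

```python
# Sketch: ASCII text converted to UTF-16LE but displayed as ASCII shows a
# null byte after every letter, which the UI renders like a space.

typed = "Hello"

# What Force Unicode Text stores: two bytes per ASCII character.
stored = typed.encode("utf-16-le")
print(stored.hex(" "))

# InterpretAsUnicode off: every byte is treated as its own character.
shown_as_ascii = stored.decode("cp1252")
print(ascii(shown_as_ascii))

# InterpretAsUnicode on: byte pairs are combined back into characters.
shown_as_unicode = stored.decode("utf-16-le")
print(shown_as_unicode)
```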
Labels and Captions
When localizing the name of a control or indicator on the user interface you should always use the caption of the control instead of the label. The label is part of the code of the VI (similar to a variable name) and should not be changed. The caption should be displayed instead of the label and can be changed at run-time using the VIs provided.
Listbox, Multicolumn Listbox and Table
The Listbox, Multicolumn Listbox and Table controls process Unicode strings differently from the text elements described previously. These controls do not use the InterpretAsUnicode property. Instead they look for a BOM (Byte Order Mark) on any strings passed to them. If a string passed to these controls starts with a BOM (either 0xFFFE or 0xFEFF) then the string will be handled as Unicode. This allows you to mix both Unicode and ASCII strings in the same control. The examples include subVIs to pass strings to these controls and mark them as Unicode using the BOM.
Figure 3: Adding the BOM to Unicode strings to update a listbox
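The per-string BOM convention described above can be modeled as follows. This Python sketch only mimics the decision the controls are described as making; `mark_as_unicode` and `render_item` are hypothetical names and do not correspond to the subVIs in the examples, and code page 1252 is assumed for the non-Unicode items.

```python
# Sketch: mix ASCII and Unicode items in one list by tagging Unicode items
# with a BOM, then decide per item how to interpret the bytes.

BOM_LE = b"\xff\xfe"  # little-endian BOM, the byte order LabVIEW uses
BOM_BE = b"\xfe\xff"  # big-endian BOM

def mark_as_unicode(text: str) -> bytes:
    """Encode text as UTF-16LE and prepend the little-endian BOM."""
    return BOM_LE + text.encode("utf-16-le")

def render_item(raw: bytes, code_page: str = "cp1252") -> str:
    """Interpret an item as Unicode if it starts with a BOM, else as MBCS."""
    if raw.startswith(BOM_LE):
        return raw[len(BOM_LE):].decode("utf-16-le")
    if raw.startswith(BOM_BE):
        return raw[len(BOM_BE):].decode("utf-16-be")
    return raw.decode(code_page)

items = [b"Voltage", mark_as_unicode("電圧")]
print([ascii(render_item(item)) for item in items])
```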
In order to display Unicode strings on your user interface, the fonts you are using must support all the characters you are using. If you are using an extensive set of characters from languages using non-Latin characters, you should verify that your selected fonts have the necessary character support.
Two specific fonts commonly available on Windows that include most Unicode characters are Arial Unicode MS and Lucida Sans Unicode.
Programming Unicode in LabVIEW
Converting ASCII Strings to Unicode
Included in the examples are two very simple VIs to convert ASCII (MBCS) strings to Unicode and vice versa. These VIs use functions provided by Windows to detect the current code page used for the MBCS and handle the conversion.
Figure 4: Converting between ASCII and Unicode strings in LabVIEW
The conversion VIs are polymorphic and can handle scalar strings as well as 1D and 2D string arrays.
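As a rough text-based analogy for these conversion VIs, the Python sketch below converts between MBCS bytes and UTF-16LE bytes and accepts a scalar, a 1D list or a 2D list. The function names are invented for this illustration, and the code page is passed explicitly (defaulting to an assumed 1252) rather than detected from Windows as the VIs do.

```python
# Sketch: convert MBCS <-> UTF-16LE for scalars and nested lists of strings.

def mbcs_to_unicode(value, code_page: str = "cp1252"):
    """Convert MBCS bytes (or nested lists of them) to UTF-16LE bytes."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(code_page).encode("utf-16-le")
    return [mbcs_to_unicode(item, code_page) for item in value]

def unicode_to_mbcs(value, code_page: str = "cp1252"):
    """Convert UTF-16LE bytes (or nested lists of them) to MBCS bytes.

    Characters missing from the code page are replaced with '?'.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-16-le").encode(code_page, errors="replace")
    return [unicode_to_mbcs(item, code_page) for item in value]

scalar = mbcs_to_unicode(b"Az")
table = mbcs_to_unicode([[b"Name", b"Unit"], [b"Speed", b"m/s"]])
print(scalar.hex(" "))
print(unicode_to_mbcs(table))
```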
Displaying Unicode Strings on Controls and Indicators
The attached project includes a number of examples showing how to display Unicode strings on different UI controls and indicators. For each of these control types subVIs are included to pass strings to the control and their caption and specify whether the string should be treated as Unicode or not. The following UI controls and indicators are supported with specific VIs:
- Caption of any control or indicator
- 1D String Array
- 2D String Array
- Multicolumn Listbox
Using control properties you can also access these controls inside of other data structures such as a cluster.
Figure 5: Converting an ASCII string to Unicode and displaying it on a string indicator, 1D string array and 2D string array
Figure 6: Converting an ASCII string array to Unicode and displaying it on a Listbox, Multicolumn Listbox and Table
Reading Unicode from a String Control
In order to read Unicode strings from a front panel string control, a number of settings need to be made:
1. Enable the Force Unicode Text property of the string control from its context (right-click) menu.
2. Enable the Update Value while Typing property of the string control from its context menu.
Figure 7: Enable the Force Unicode Text and Update Value while Typing properties
3. Add an event case to the Event Structure for the Value Change event of the string control. In the event case wire the control reference and new value from the event to the Tool_Unicode Update String VI as shown in the following diagram. This will update the string control as the user is typing to keep the InterpretAsUnicode property set to Unicode, while entering ASCII characters.
Figure 8: Event handler for the Value Change event of the string control
Reading and Writing Unicode Strings to Text File
When reading and writing text files it is important to know whether the contents of the file are ASCII or Unicode. The Read from Text File function in LabVIEW does not know whether the contents of the file are ASCII or Unicode. Therefore you need to check whether the data read from the file begins with a BOM (Byte Order Mark) and then process the data accordingly.
Figure 9: Read a Unicode text file and process
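The same read-and-check logic can be sketched in Python. This is an illustration rather than a translation of the example VI; `read_text_any` is a hypothetical name, and code page 1252 is assumed as the fallback for files without a BOM.

```python
# Sketch: read a file as raw bytes, then decode based on the presence of a BOM.
import codecs
from pathlib import Path

def read_text_any(path, fallback_code_page: str = "cp1252") -> str:
    """Return the file contents as text, honoring a UTF-16 BOM if present."""
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF16_LE):   # 0xFF 0xFE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # 0xFE 0xFF
        return data[2:].decode("utf-16-be")
    # No BOM: treat the contents as MBCS text in the assumed code page.
    return data.decode(fallback_code_page)
```

A call such as `read_text_any("strings.txt")` (the file name is only an example) returns the decoded text with the BOM removed.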
To write Unicode text to a file, convert all your strings to Unicode and then prepend the BOM before writing the final string to a file using the Write to Text File function.
Figure 10: Write Unicode text to a file
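The write side can be sketched the same way. The Python below assumes little-endian UTF-16 with a BOM, matching what the document describes for LabVIEW; `write_unicode_text` is a name made up for this example.

```python
# Sketch: convert text to UTF-16LE, prepend the BOM, and write the raw bytes.
import codecs
from pathlib import Path

def write_unicode_text(path, text: str) -> None:
    """Write text as UTF-16LE preceded by the little-endian BOM (0xFF 0xFE)."""
    payload = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    Path(path).write_bytes(payload)
```

Writing raw bytes rather than opening the file in text mode keeps the BOM and byte order fully under the caller's control.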