ExecuteScriptText and German umlauts

Generally, ExecuteScriptText cmd works nicely on Win10. But as soon as the string cmd contains special characters, e.g. the German umlauts "ä", "ö" or "ü", the command silently fails to execute. Try

ExecuteScriptText "C:\\Test\\Überprüfung.bat"

where the batch file Überprüfung.bat contains e.g. a line like

echo "Hello World" >test.txt

I guess this problem could be related to UTF encoding?

 

Best regards,

Joerg.

I tested Igor 8.02 on Windows 10 and I don't see a problem. I changed your test .bat file to the following:

echo "Hello World" > C:\Test\test2.txt

When I execute it using the ExecuteScriptText command you provided, test2.txt is created and contains the correct contents.

 

I may be leading you on a wild goose chase, but...

This may be an issue of precomposed versus decomposed characters.

If your file name uses precomposed characters, then your command must use precomposed characters too. In precomposed form, Ü is U+00DC ("Latin capital letter U with diaeresis").

If your file name uses decomposed characters, then your command must use decomposed characters too. In decomposed form, Ü is U+0055 ("Latin capital letter U") followed by U+0308 ("Combining diaeresis").

Windows allows you to have two files with ostensibly the same name but which use different representations of accented characters.

Here are Igor commands that call ExecuteScriptText on two distinct files:

ExecuteScriptText "C:\\Test\\\u00DCberpr\u00FCfung.bat"		// Precomposed

ExecuteScriptText "C:\\Test\\\u0055\u0308berpr\u0075\u0308fung.bat"		// Decomposed (uses combining characters)

If precomposed, strlen("Überprüfung.bat") prints 17.

If decomposed, strlen("Überprüfung.bat") prints 19.

If you copy the file name to the clipboard from the Windows desktop and then execute the strlen command on it, that will tell you if the filename is precomposed or decomposed.
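The byte counts above can also be checked outside Igor. The following is a minimal sketch in Python (not Igor, purely so that it can be run anywhere). It assumes the file name is available as a Unicode string; the helper name describe_form is hypothetical.

```python
import unicodedata

def describe_form(name: str) -> str:
    """Classify a name as precomposed (NFC), decomposed (NFD), plain, or mixed."""
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "plain"        # no accented characters, both forms are identical
    if is_nfc:
        return "precomposed"
    if is_nfd:
        return "decomposed"
    return "mixed"

precomposed = "\u00DCberpr\u00FCfung.bat"
decomposed = "U\u0308berpru\u0308fung.bat"

# UTF-8 byte counts, matching the 17 and 19 quoted above for strlen
print(len(precomposed.encode("utf-8")))  # 17
print(len(decomposed.encode("utf-8")))   # 19
print(describe_form(precomposed), describe_form(decomposed))
```

Each precomposed umlaut takes two UTF-8 bytes, while each decomposed one takes three (one for the base letter, two for the combining diaeresis), which accounts for the difference of two bytes.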

If I use the Windows 10 "on screen keyboard", set my input method to DEU (German), and press Shift and click Ü, I get a precomposed Ü. There may be other ways of entering Ü that produce decomposed text.

In reply to hrodstein

Hi Howard,

Thank you for this very insightful answer. I use the Win 10 DEU keyboard. strlen tells me that the results are decomposed, so I experimented with decomposed letters. Now I have more questions rather than fewer.

If I type

print "\u0055\u0308"

I get U ̈  instead of Ü, and an ExecuteScriptText command does not work this way either. So I wanted to know how my "Ü" is constructed. I split the "Ü" with two char2num calls and get -61 and -100, which is 0xFFC3FF9C. Honestly, I do not understand how these codes are related to the decomposed 0x00550308.

I noticed that the problem exists only with ExecuteScriptText, while CopyFile always works fine. To get ahead with my work, I programmed a workaround: I now copy all files with CopyFile to an umlaut-free path, give them umlaut-free names, let a LaTeX batch job do its work on them, and copy the result back using CopyFile. ExecuteScriptText is working again after rebooting the computer, so there is no more pressure on this issue. But I am still curious to understand it, because it will surely come back to me.
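The renaming step of such a workaround could look roughly like the following Python sketch (Python rather than Igor, so it can be tried independently). The function name ascii_safe_name and the transliteration table are assumptions for illustration, not part of the actual workaround described above.

```python
import unicodedata

# Conventional German transliterations (an assumption; extend as needed)
_UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
}

def ascii_safe_name(name: str) -> str:
    """Return an ASCII-only version of a file name."""
    # Compose first so that decomposed umlauts are matched by the table too
    composed = unicodedata.normalize("NFC", name)
    replaced = "".join(_UMLAUT_MAP.get(ch, ch) for ch in composed)
    # Decompose what is left and drop any remaining non-ASCII characters
    stripped = unicodedata.normalize("NFKD", replaced)
    return stripped.encode("ascii", "ignore").decode("ascii")

print(ascii_safe_name("\u00DCberpr\u00FCfung.bat"))    # Ueberpruefung.bat
print(ascii_safe_name("U\u0308berpru\u0308fung.bat"))  # Ueberpruefung.bat
```

Normalizing to the composed form first means the precomposed and decomposed spellings of the same name map to the same umlaut-free result.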

In reply to joerg.kunze

joerg.kunze wrote:

If I type

print "\u0055\u0308"

I get U ̈  instead of Ü.

That's due to the Lucida Console font (or whatever font you're using). If you change the history window's font to something different, such as Courier New, you'll see the text printed correctly. Or, if you copy the string into Word and set the font to Lucida Console, you'll see it displayed incorrectly there also.

Quote:
I split the "Ü" with two char2num calls and get -61 and -100, which is 0xFFC3FF9C. Honestly, I do not understand how these codes are related to the decomposed 0x00550308.

Igor stores text as UTF-8, a byte-oriented Unicode text encoding format that uses between one and four bytes per character.

0055 and 0308 are code values expressed in hexadecimal as UTF-16, a Unicode text encoding format that encodes most characters using a single 16-bit word but which encodes some characters as a sequence of two 16-bit words.
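To make the relationship concrete, here is a small Python sketch (Python rather than Igor, so it can be run independently). It treats the two values -61 and -100 as signed bytes, which is an assumption about how they were printed, and decodes them as UTF-8. The helper name signed_bytes_to_text is hypothetical.

```python
def signed_bytes_to_text(values):
    """Interpret signed byte values as UTF-8 encoded text."""
    raw = bytes(v & 0xFF for v in values)   # -61 -> 0xC3, -100 -> 0x9C
    return raw, raw.decode("utf-8")

raw, text = signed_bytes_to_text([-61, -100])
print(raw.hex())                      # c39c
print([hex(ord(ch)) for ch in text])  # ['0xdc']

# For comparison: the decomposed form in UTF-8 and in UTF-16LE
decomposed = "U\u0308"
print(decomposed.encode("utf-8").hex())      # 55cc88
print(decomposed.encode("utf-16-le").hex())  # 55000803
```

Under that assumption, the bytes 0xC3 0x9C are the UTF-8 encoding of the single code point U+00DC, that is, the precomposed Ü. The decomposed form would instead appear as the three bytes 0x55 0xCC 0x88.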

Here is a demo that shows how to view the UTF-16 codes corresponding to an Igor UTF-8 string:

Function MakeUTF16LEWave(utf8Str, wName)
	String utf8Str
	String wName						// Name for output wave
	
	// Convert UTF-8 to UTF-16LE
	Variable utf8Code = 1
	Variable utf16Code = 101		// UTF-16, little-endian
	String utf16LEStr = ConvertTextEncoding(utf8Str, utf8Code, utf16Code, 1, 0)
	
	// Make an unsigned byte wave
	Variable numCodeValues = strlen(utf16LEStr) / 2	// Two bytes per UTF-16 code value
	Make/O/N=(numCodeValues*2)/B/U $wName
	WAVE w = $wName
	w = char2num(utf16LEStr[p]) & 0xFF
	
	// Convert to an unsigned 16-bit wave
	Redimension/E=1/N=(numCodeValues)/W/U w
End

Function DemoUTF16()
	String precomposedStr = "\u00DCberpr\u00FCfung.bat"
	MakeUTF16LEWave(precomposedStr, "PrecomposedUTF16LE")
	WAVE PrecomposedUTF16LE
	
	String decomposedStr = "\u0055\u0308berpr\u0075\u0308fung.bat"
	MakeUTF16LEWave(decomposedStr, "DecomposedUTF16LE")
	WAVE DecomposedUTF16LE
	
	DoWindow/F DemoUTF16LETable
	if (V_Flag == 0)
		Edit/W=(190,45,582,442)/N=DemoUTF16LETable PrecomposedUTF16LE, DecomposedUTF16LE
		ModifyTable format(PrecomposedUTF16LE)=10,format(DecomposedUTF16LE)=10
	endif
End

 

Like aclight, I was also unable to reproduce the original problem.

Please zip and attach a "Überprüfung.bat" file that shows the problem. That should preserve the spelling of your file name and may allow me to reproduce the problem.

 

Are you calling the code from a procedure file? What encoding does the procedure file have? The "i" button on the lower left tells you that.

In reply to thomas_braun

thomas_braun wrote:
What encoding does the procedure file have?

A procedure file's text encoding comes into play only when the file is read or written. All text is stored in memory as UTF-8. So the procedure file's text encoding should not be an issue.

If a procedure file has the wrong text encoding for the text stored in it, you will see incorrect characters when editing the procedure file in Igor. Unless you see that, the file's text encoding is not relevant.

 

Your file, as unzipped, is spelled using precomposed characters, like my version of your file.

I tried calling ExecuteScriptText on your file in Igor6, Igor7, and Igor8. In all versions, it runs without returning an error.

It does not produce the Test.txt output file unless I run Igor as administrator. If I run as administrator, it produces the file in all versions.

The file is produced in the "current folder" which is the folder containing the Igor64.exe file. This is a descendant of the "Program Files" folder which is protected by the operating system. That's why I need to run as administrator.

If I change the command in the file from

echo "Hello World" > Test.txt

to

echo "Hello World" > "C:\Folder\Test.txt"

where "Folder" is a folder for which I have write access, then the Test.txt file is produced in the specified folder whether I am running as administrator or not.
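The difference between the two redirect targets can be illustrated with a short Python sketch (Python rather than a batch file, so it runs on any platform). The folder C:\Program Files\ExampleApp is a hypothetical stand-in for the folder containing Igor64.exe, and resolve_output is a hypothetical helper name.

```python
from pathlib import PureWindowsPath

def resolve_output(current_folder: str, redirect_target: str) -> PureWindowsPath:
    # A relative target is resolved against the current folder;
    # an absolute target replaces the current folder entirely.
    return PureWindowsPath(current_folder) / redirect_target

current_folder = r"C:\Program Files\ExampleApp"   # hypothetical current folder

print(resolve_output(current_folder, "Test.txt"))
print(resolve_output(current_folder, r"C:\Folder\Test.txt"))
```

The first call yields a path inside the protected folder, while the second yields C:\Folder\Test.txt regardless of the current folder, which is why the absolute form does not need administrator rights.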

Bottom line is that I don't know why I am getting a different result from you.

 

Hi Howard,

Thanks again. It seems to be a soft bug or a "Heisenbug", as my colleagues like to call it. Those are sometimes hard to reproduce. Maybe the path to the batch file also plays a role, since it contained some umlauts as well. And maybe a process got stuck. Again, rebooting helped.

As I currently have a working workaround, I would like to stop the discussion here and put our energy into other questions.

Thanks again,

Jörg (...hopefully precomposed...)

 

Quote:
ExecuteScriptText is working again after rebooting the computer.

I missed that part in your post from December 2nd.

 

Oh, sorry. It helped after changing to umlaut-free path and file names. I posted it as soon as I noticed.