Character encoding substitution | Igor Pro by WaveMetrics

The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended".[18][19] This mostly matches code page 1252, with the exception of certain C0 control characters being replaced by diacritic characters.

So it is probably Windows-1252.

You can use the ConvertTextEncoding function with sourceTextEncoding=3 (Windows-1252) and destTextEncoding=1 (UTF-8).


February 19, 2022 at 03:07 pm - Permalink

tony

Thanks, that is helpful.

On further inspection, it seems that the text uses an extended ASCII character set (something like "Code page 437"), where the mu (micro) symbol is character 0xE6.
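To illustrate why the code page matters here, this is a small sketch outside Igor, using Python's built-in codecs (the choice of Python is an assumption for illustration, not something used in this thread). It decodes the single byte 0xE6 under code page 437 and under Windows-1252:

```python
# Byte 0xE6 interpreted under two different single-byte code pages.
raw = bytes([0xE6])

as_cp437 = raw.decode("cp437")    # micro sign under code page 437
as_cp1252 = raw.decode("cp1252")  # a different letter under Windows-1252

print(repr(as_cp437), repr(as_cp1252))

# Once decoded correctly, the character can be re-encoded as UTF-8.
utf8_bytes = as_cp437.encode("utf-8")
print(utf8_bytes)
```

The same byte yields different characters depending on the assumed code page, so converting such a file as Windows-1252 would not recover the micro symbol.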

I don't know how I missed ConvertTextEncoding!

Unfortunately, it doesn't look like I can use ConvertTextEncoding to map extended ASCII variants to UTF-8. However, I can use ConvertTextEncoding to clean up any illegal bytes by passing 1 (UTF-8) for both the source and destination encodings. That's probably the best option, because I don't know how to differentiate between files that could have contemporary character encodings and files created by a legacy system.
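One common heuristic for telling the two kinds of file apart, sketched here in Python rather than Igor, is to try a strict UTF-8 decode first and fall back to a legacy code page only if that fails. The function name `to_utf8` and the default of `cp437` are assumptions for illustration; the fallback code page would need to match whatever the legacy system actually wrote.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "cp437") -> str:
    """Decode raw bytes, preferring UTF-8 and falling back to a legacy code page.

    Hypothetical helper: the default fallback (cp437) is an assumption.
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the data as single-byte legacy text.
        return raw.decode(legacy_encoding)


modern = "5 \u00b5m".encode("utf-8")  # contemporary file content
legacy = b"5 \xe6m"                   # same text as written by a CP437 system

print(to_utf8(modern))
print(to_utf8(legacy))
```

This is only a heuristic: a legacy byte sequence that happens to form valid UTF-8 would be read as UTF-8, although that is unlikely for isolated extended characters such as a lone 0xE6.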


February 20, 2022 at 03:03 am - Permalink

thomas_braun

If you are on a Mac, you should be able to use something like

iconv --from-code=CP437 --to-code=UTF8 ...

to convert the encoding.

CP437 is supported by ICU which Igor uses, at least if I understand https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings… correctly.


February 28, 2022 at 02:16 am - Permalink

tony

Thanks, Thomas, that's good to know.

For my application, this was sufficient to deal with the unlikely occurrence of such characters in legacy files:

// preserves UTF-8 and cleans up single-byte extended character sets
str = ConvertTextEncoding(str, 1, 1, 2, 1)


February 28, 2022 at 03:12 am - Permalink