In Win7, Unicode/UTF-8 text file: gibberish on Windows console (trying to display Hebrew)

Accepted answer
Score: 16

The Courier New font supports Hebrew and can be added to the command prompt. The default fonts are Consolas, Lucida Console and Raster Fonts, and none of them supports Hebrew. So add Courier New to the command prompt.

It takes a registry hack to do that:

http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-enable-more-fonts-for-the-windows-command-prompt/

http://www.techrepublic.com/blog/windows-and-office/quick-tip-add-fonts-to-the-command-prompt/
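As a rough sketch of the kind of change those articles describe, the following Python builds the text of a .reg file that lists a font for the console. The key path and the habit of naming the values with runs of zeros are stated from memory of those how-to pages, so treat both as assumptions to check against your own registry. `build_reg_file` and its default value name are hypothetical names used only for illustration.

```python
# Sketch only: this builds .reg text, it does not touch the registry.
# ASSUMPTION: console fonts are listed under this key, with value names
# made of zeros ("0", "00", "000", ...), as the linked how-to pages describe.
CONSOLE_FONT_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Console\TrueTypeFont"
)


def build_reg_file(font_name, value_name="000"):
    """Return .reg file text that lists font_name as a console font."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + CONSOLE_FONT_KEY + "]",
        '"{}"="{}"'.format(value_name, font_name),
        "",
    ]
    return "\r\n".join(lines)


print(build_reg_file("Courier New"))
```

Pick a value name that isn't already used under the key. The font still has to meet cmd's own criteria before it shows up in the console's font list.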

My registry entries are a good example of how to install fonts, but I should remove a lot of them, because most didn't get added to cmd: cmd doesn't support them.

Lucida Console and Consolas are defaults.
Raster Fonts is a default that isn't listed here, maybe because it isn't a TrueType font.
Of all the fonts I tried to add, only 3 were added (that is, are supported by cmd):
Courier New, DejaVu Sans Mono, Droid Sans Mono

DejaVu Sans Mono and Droid Sans Mono are downloadable and supported by cmd, and might have some good Unicode coverage, but they don't include Hebrew.


I have

Consolas <-- default
Courier New  <--- added
DejaVu Sans Mono  <-- added
Droid Sans Mono  <-- added
Lucida Console <-- default
Raster Fonts <-- default

Common Hebrew fonts are Miriam and David, but they can't be added to the command prompt.

For the record, BabelMap can list all fonts on your system that support Hebrew: in BabelMap, click Fonts > Font Coverage, then enter 05D0 (that's aleph). I think all these fonts exist on a default Windows 7 installation:

Aharoni, Arial, Courier New, David, FrankRuehl, Gisha, Levenim MT, Lucida Sans Unicode, Microsoft Sans Serif, Miriam, Miriam Fixed, Narkisim, Rod, Segoe WP, Tahoma, Times New Roman

But most or all of those fonts with Hebrew aren't supported in the command prompt, except Courier New. In fact most fonts, full stop, aren't supported in the command prompt, not even "Times New Roman" (because "Times New Roman" is not monospaced / fixed width, and that's one of a number of criteria for a font to be supported; the other criteria seem to be more obscure).

So now you can have Courier New added and selected for use in the command prompt.


And so you can paste Unicode characters into cmd, provided the selected font supports them.


To copy/paste, click the Copy button in charmap.

Now it's in the clipboard.

To paste it into the command prompt: in Win7, paste into the command prompt isn't Ctrl-V. You right-click and choose Paste (or, if in QuickEdit mode, just right-click).


That's the main thing.

Additionally

Often in Windows one might use Notepad and Character Map, but one should be aware of some limitations with them.

Character Map shows the first 65536 Unicode characters when the font you selected supports them, and it shows you the UTF-16 code. That's OK, you can still paste from Character Map into a cmd.exe window, but you should know that commands run in cmd.exe and pipes don't support UTF-16. So you can use Character Map to find a character, e.g. aleph (05D0), but it's worth looking up the character on http://www.fileformat.info/info/unicode/char/05d0/index.htm and seeing that while the UTF-16 code is 05D0, the UTF-8 encoding is the two bytes D7 90. The xxd and file commands are useful for seeing the real contents of a file and determining the file's type.
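If Python happens to be at hand, the aleph figures quoted above can be checked in a few lines. This is only an illustration of the same character in different encodings, not something the answer relies on.

```python
aleph = "\u05d0"  # HEBREW LETTER ALEF, the character Character Map lists as 05D0

print(hex(ord(aleph)))                   # the code point
print(aleph.encode("utf-16-be").hex())   # 05d0 (the "UTF-16 code", big-endian byte order)
print(aleph.encode("utf-16-le").hex())   # d005 (same code unit, little-endian byte order)
print(aleph.encode("utf-8").hex())       # d790 (two bytes, D7 90)
```

The little-endian form is what Notepad writes when you save as "Unicode", which is why a hex dump of such a file shows d0 05 rather than 05 d0.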

Notepad is a bit limited when it comes to Unicode, that is, any character in the Unicode character set whose UTF-16 code is > FF. And cmd is a bit limited in regard to some commands like 'type', and in regard to pipes and redirection.

If using cmd.exe you really need pipes to work, because pipes are important.

Pipes are limited to the encodings that can be specified by the CHCP command.

(Note that if CHCP tells you you are on a particular code page, e.g. 850, it's telling you the input encoding. If you run the command chcp 850 it will change both the input and output encodings. Usually they are the same, and it's simpler when they are the same. But if some other program has changed the encoding of cmd, e.g. the C# compiler has a switch that changes it, then it's best to set it again with chcp so you know both encodings are set.)

Code pages 1200 (UTF-16LE) and 1201 (UTF-16BE) exist, but CHCP supports neither: if you try one, it will say invalid code page (tested in Win7). So CHCP doesn't support UTF-16 in either byte order. There is CHCP 65001 (that's UTF-8 without BOM). And there is CHCP 862 (the old-fashioned, MS-DOS-era way of encoding Hebrew).
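To make the difference between 862 and 65001 concrete, here is a small sketch that encodes the same Hebrew word both ways. It assumes Python's `cp862` codec matches what the console calls code page 862. The helper name `encode_for_code_page` is made up for this example.

```python
text = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom", four letters

# Map the CHCP numbers discussed above to Python codec names (assumed equivalents).
CODECS_BY_PAGE = {862: "cp862", 65001: "utf-8"}


def encode_for_code_page(s, code_page):
    """Encode s the way a file for the given code page would store it."""
    return s.encode(CODECS_BY_PAGE[code_page])


for page in (862, 65001):
    data = encode_for_code_page(text, page)
    print(page, len(data), data.hex())
```

Code page 862 stores each Hebrew letter in one byte, but only covers the characters in its own table. 65001 needs two bytes per Hebrew letter and can represent any Unicode character.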

The type command supports UTF-16LE, as does Notepad (what Notepad calls Unicode is UTF-16LE), but pipes and redirection don't support that. The type command also supports any code page specified/supported by CHCP. So type supports 862 or 65001.

So you could use Notepad and save the file as UTF-8 (which is with BOM), then fiddle around to remove the BOM (that's a bit overkill). Or you could use Notepad and save it as Unicode (UTF-16LE), but then you can't use pipes (that's bad). The easiest thing to do is use a text editor like Notepad2 or Notepad++ that supports UTF-8 without BOM.
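For what it's worth, the "fiddle around to remove the BOM" step is small if you script it. A minimal sketch, assuming the file really is UTF-8 and using a made-up helper name `strip_utf8_bom`:

```python
import codecs


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# What Notepad's "UTF-8" save would produce for a file containing just aleph.
notepad_bytes = codecs.BOM_UTF8 + "\u05d0".encode("utf-8")
print(notepad_bytes.hex())                  # efbbbfd790
print(strip_utf8_bom(notepad_bytes).hex())  # d790
```

To apply it to a real file, read the file in binary mode, pass the bytes through the function, and write them back in binary mode.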

Or if doing everything from cmd you could use 862 or 65001. Though many text editors might not give good support for 862, so you might prefer 65001.

If you want to write any file in Notepad and it has a character greater than what in UTF-16 is referred to as \uFF, and you want to run commands in cmd.exe on that file, then some commands (e.g. the type command) will have problems if you don't take into account what is supported by what.

Notepad supports UTF-16BE, UTF-16LE and UTF-8 with BOM, but not UTF-8 without BOM. That's not good. There's no need to fiddle around with xxd and sed or other commands to remove the BOM, though: if you have any file with a so-called Unicode character (a character whose UTF-16 code is > \uFF, as shown by Character Map), then use Notepad2 or Notepad++.

Type supports UTF-16LE, and any code page set by CHCP, e.g. 65001 or 862.

Pipes and redirection go by whatever is set by CHCP.

Code page 862 is old, so code page 65001 is a good way to go.

xxd and file are useful for seeing how a file is encoded, which can be helpful if you have issues. But they're not absolutely necessary.
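If xxd and file aren't installed, a rough stand-in is easy to sketch: dump the first bytes as hex and look for a BOM. The function names `sniff_bom` and `hex_dump` are invented here, and the check is deliberately simple (for instance, it doesn't try to tell UTF-32 from UTF-16).

```python
import codecs

# BOMs for the encodings Notepad can write, per the discussion above.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None  # could be UTF-8 without BOM, or a code page such as 862


def hex_dump(data, count=16):
    """Return the first count bytes as space-separated hex, like a one-line xxd."""
    return " ".join("{:02x}".format(b) for b in data[:count])


sample = codecs.BOM_UTF16_LE + "\u05d0".encode("utf-16-le")
print(sniff_bom(sample), "|", hex_dump(sample))
```

A result of None doesn't identify the encoding; it only tells you there is no BOM to go by.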

So if you want to write a file for use in cmd, and it has some Unicode characters: while there are commands like xxd and sed that could be used to remove a BOM, the easiest way to make such a file is to use a text editor like Notepad2 or Notepad++, which supports UTF-8 without BOM.

Getting Hebrew displaying might be the most important thing to do first, as described above. The next thing is being able to save files in a text editor that you can display with e.g. 'type'.

And if you ever want to copy from the command prompt when not in QuickEdit mode, right-click, choose Mark, select the text, then hit ENTER. To paste, right-click and choose Paste.

A further additional point is

Apparently there are bugs in chcp 65001 where some batch files won't run and maybe some C programs won't work either (see "How to use unicode characters in Windows command line?"). And I've even seen the C# compiler crash when cmd is in code page 65001 (though one may blame the C# compiler, one could also blame 65001); see "Why is csc.exe crashing when I last left the output encoding as UTF8?".

Note: an earlier revision of this answer had some command line examples, but they were unnecessarily complex. I might at some point add some commands that demonstrate what I have been describing, but it's fairly trivial.

Score: 4

/u is for UTF-16LE, not UTF-8. This is why saving the file as UTF-16LE (what Windows/Notepad misleadingly calls "Unicode") and running with /u works, in as much as it does.

UTF-8 should be achievable with chcp 65001, but there are some nasty low-level bugs in the Microsoft C Runtime for this code page, which make some apps unreliable and some not run at all.

So yeah, I'm sorry, but UTF-8 is a second-class citizen under Windows. Anything that uses the 'ANSI' interfaces for IO, including anything that uses the C standard IO library, including the Command Prompt, won't be able to cope with it properly.

The only reliable way to get Unicode output in Command Prompt is to use the Windows-specific WriteConsoleW interface to push Unicode strings directly. Unfortunately, as this is not available cross-platform, many tools won't use it.
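As an illustration of what calling WriteConsoleW looks like, here is a minimal sketch that reaches it from Python through ctypes. It is only meaningful on Windows with a real console attached. The wrapper name `write_console_w` is made up, and on other platforms, or when output is redirected, it simply reports failure.

```python
import sys


def write_console_w(text):
    """Try to write text to the console via WriteConsoleW.

    Returns True on success, False when not on Windows or when stdout is
    not a real console (for example when output is redirected or piped).
    """
    if sys.platform != "win32":
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    std_output_handle = -11 & 0xFFFFFFFF  # STD_OUTPUT_HANDLE as a DWORD
    handle = kernel32.GetStdHandle(std_output_handle)

    # WriteConsoleW counts UTF-16 code units, not Python characters.
    units = len(text.encode("utf-16-le")) // 2
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(handle, text, units, ctypes.byref(written), None)
    return bool(ok)
```

Calling `write_console_w("\u05d0\n")` in a console window hands aleph to the console as UTF-16, without going through the code page. Whether it displays still depends on the console font.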

In any case, even when you've got the encoding right, you still have to have a font in the Command Prompt that contains the characters you want. I believe this is why you still aren't getting Hebrew in the /u+UTF-16LE route.

Summary: Command Prompt + non-ASCII == almost certain fail. Give up and find some other interface you can use that supports Unicode better.

Score: 1

You should convert file.txt to UTF-16 (Little Endian) before running type file.txt.
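One way to do that conversion, sketched in Python and assuming the file is currently UTF-8 (with or without a BOM). The helper name `utf8_to_utf16le` is invented for this example. It writes a BOM because that is what Notepad's "Unicode" files carry; the answer doesn't say whether type requires one.

```python
import codecs


def utf8_to_utf16le(data):
    """Convert UTF-8 bytes (BOM optional) to UTF-16LE bytes with a BOM."""
    text = data.decode("utf-8-sig")  # accepts UTF-8 with or without a BOM
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


print(utf8_to_utf16le("\u05d0".encode("utf-8")).hex())  # fffed005
```

For a real file, read it in binary mode, convert, and write the result to a new file in binary mode so that no further newline or encoding translation happens.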

Reference: What encoding/code page is cmd.exe using?

Score: 1

I presume you mean "Lucida Console" when you say "Lucida".

Using the charmap application I couldn't find any Hebrew characters in the font. I don't know if the font was more capable in earlier versions of Windows, but in Windows 7 there appears to be nothing outside of the European characters.

My system also has Lucida Sans Typewriter, which does include the Hebrew characters. Unfortunately the cmd window doesn't show it as a choice. You need to edit the registry to open up more choices, as shown in this question on SuperUser: https://superuser.com/questions/5035/how-to-change-the-windows-console-font

P.S. I have been unable to verify this solution because Windows is being difficult. See https://superuser.com/questions/390933/how-to-add-a-font-to-the-cmd-window-choices-in-windows-7-64-bit

Score: 1

How to get a Hebrew-enabled XP installation?

First of all, this is about an XP Home SP3, Hebrew enabled. By that I mean it is a standard XP US installation, or so I believe, with the addition of Hebrew capabilities for keyboard and display. I believe every XP CD can install such a system. In particular, I believe the following is all that is needed for such a system:

  1. Control Panel -> Date, Time, Language and Regional Options -> Language and Regional Options -> in the Language tab: 1) Click Details and add a Hebrew keyboard. 2) Check the "Install files for complex script and right-to-left languages (including Thai)" option.
  2. Control Panel -> Date, Time, Language and Regional Options -> Language and Regional Options -> in the Advanced tab: check 10004 (MAC - Arabic) and 10005 (MAC - Hebrew). Not sure if Arabic is a must-have here.

Now to the cmd console

One has to explicitly add the Courier New font to the console fonts in the registry, as described earlier. Otherwise, explicit Hebrew fonts will not be displayed.

Now when the cmd console is opened, all there is to do in order to input Hebrew characters is to enable the Courier New font and change the keyboard to a Hebrew mode. Having Windows cycle through the languages it has for the keyboard is easy: either repeatedly press left Alt combined with left Shift, or use the mouse.

As an aside, a dir command will show file names that have Hebrew characters. However, one can't just issue a

dir file_name

and see the usual output if the file name begins with a Hebrew letter. It must be

dir *file_name

I assume the asterisk character adds the BOM Unicode character.

One can also open Notepad, input Hebrew characters, save the file as UTF-8, and run the following commands in the console:

chcp 65001
type that_Notepad_file_I_saved

Saving the file as UTF-8 is done on Notepad's save screen.
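A file of the same kind can also be produced from a script instead of Notepad. The sketch below writes Hebrew text as UTF-8, with or without the BOM that Notepad adds, and then shows the resulting bytes. The helper name `write_utf8` and the file name are made up for the example.

```python
import os
import tempfile


def write_utf8(path, text, bom=False):
    """Write text as UTF-8 with CRLF line endings, optionally with a BOM."""
    encoding = "utf-8-sig" if bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="\r\n") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hebrew.txt")
    write_utf8(path, "\u05d0\n")            # aleph plus a line break, no BOM
    with open(path, "rb") as f:
        print(f.read().hex())               # d7900d0a
```

Passing `bom=True` gives the Notepad-style file that starts with EF BB BF.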
