You can't; there are no Cyrillic characters in ASCII. The chart you've shown is for one of the many "extended ASCII" character sets; specifically, it appears to be Windows-1251 (a.k.a. CP1251). In order to get a character's codepoint in this encoding, you thus need to first encode the string as CP1251 and then take the value of the resulting byte:
# Assuming Python 3
s = "Йог".encode('cp1251')
for b in s:
    print(b)
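If you ever need the reverse mapping — from a CP1251 byte value back to its character — decoding a one-byte bytes object does it. A small sketch using only the standard library:

```python
# Encode to Windows-1251, then map each resulting byte back to its character
s = "Йог".encode('cp1251')
for b in s:
    print(b, bytes([b]).decode('cp1251'))
# The first line printed is "201 Й", since Й is byte 0xC9 (201) in CP1251
```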
Answer from jwodder on Stack Overflow
glagolitsa = "А,Б,В,Г,Д,Е,Ё,Ж,З,И,Й,К,Л,М,Н,О,П,Р,С,Т,У,Ф,Х,Ц,Ч,Ш,Щ,Ъ,Ы,Ь,Э,Ю,Я"
Glagolitsa = glagolitsa.split(',')
for i in range(len(Glagolitsa)):
    char = Glagolitsa[i]
    print(ord(char))
glagolitsa = glagolitsa.lower().split(',')
for i in range(len(glagolitsa)):
    char = glagolitsa[i]
    print(ord(char))
for i in range(1040, 1104):
    print(chr(i))
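The two views can be lined up side by side — a sketch printing each uppercase letter with both its Unicode code point and its Windows-1251 byte value (Ё is left out of the range because it sits outside the contiguous А–Я run in both encodings):

```python
# For each uppercase Cyrillic letter А..Я, show the Unicode code point
# next to the Windows-1251 byte value for the same character
for u in range(1040, 1072):            # U+0410..U+042F, А..Я
    ch = chr(u)
    cp = ch.encode('cp1251')[0]        # the single CP1251 byte
    print(ch, u, cp)
# А is U+0410 (1040) in Unicode but byte 0xC0 (192) in CP1251
```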
It is neither "ASCII" nor "ASCII Russian".
Before Unicode became widespread, most computer systems used the ISO 8859 character encodings, a family of more than a dozen regional variants (Central European, Cyrillic, Greek...). Windows had its own 'code pages', very similar but with extra glyphs in otherwise-unused ranges. All these character encodings are 8-bit and differ only in the upper half (128–255).
The problem with these encodings is that it's next to impossible for a program to determine which encoding was used to save a file, unless it was specified explicitly (such as in HTML pages; however, plain text files have no such metadata tags). Read the Wikipedia article on Mojibake for a more detailed description.
In your example, the document was saved using Windows-1251 (Cyrillic), but your program reads it as if it were Windows-1252 (Western European), which has very different characters in the same positions. To the computer, it looks perfectly okay – it doesn't understand languages or scripts. (There are programs which do statistical analysis in order to determine the correct encoding, though – some web browsers have such a function.)
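This mismatch is easy to reproduce in Python, and doing so also shows why the damage is usually reversible as long as no byte was mangled along the way (a sketch; "Привет" is just a stand-in for the asker's actual text):

```python
# Reproduce the mojibake described above: text saved as Windows-1251,
# then read back as if it were Windows-1252
garbled = "Привет".encode('cp1251').decode('cp1252')
print(garbled)                          # Ïðèâåò  - nonsense, but "valid" text
# Undo it by reversing the two steps:
fixed = garbled.encode('cp1252').decode('cp1251')
print(fixed)                            # Привет
```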
There are several ways you could convert such text to Unicode:
Use online tools such as this one or this one.
Use your web browser:
Drag the .txt file into the browser.
From View → Character Encoding (or Firefox → Web Developer → Character Encoding, or Wrench → Tools → Encoding), pick the correct original encoding: "Cyrillic (Windows-1251)" in your case.
Use the Notepad2 text editor:
Open the file.
From File → Encoding → Recode..., choose the right original encoding.
Use GNU iconv, with Windows binaries available from either GnuWin32 or Gettext for Win32:
iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt
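If installing iconv is a hassle, the same conversion is only a few lines of Python — a rough sketch, with file names matching the iconv example above (the sample-file setup at the top is only there to make the snippet self-contained):

```python
# Create a sample Windows-1251 file, then convert it to UTF-8 --
# the same job as: iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt
sample = "Установка"                   # text like the question's example
with open('myfile.txt', 'wb') as f:
    f.write(sample.encode('cp1251'))

# The actual conversion: decode the raw bytes as CP1251, re-save as UTF-8
with open('myfile.txt', 'rb') as f:
    text = f.read().decode('cp1251')
with open('myfile.fixed.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```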
Windows Notepad will correctly read UTF-8 and UTF-16 encoded text.
You could convert the encoding using a program such as iconv - but you'll need to know what encoding was used.
It seems to be Windows-1251 according to a random web page found by Google.
Установка:
1) Запускаем QuidamStudioSetup3.15.exe
2) При запросе серийного номера вводим
I don't know Russian but pasting that into translate.google.com suggests that the above is plausible:
installation:
1) Run QuidamStudioSetup3.15.exe
2) When prompted, enter the serial number
So ...
iconv -f cp1251 -t utf-8 document.txt > document-utf8.txt
should convert your test file into something that can be opened and read in Notepad.
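If you are not sure which encoding to pass to -f, a crude way to narrow it down is to try a few candidates and see which ones decode the raw bytes without error — a human still has to pick the readable result, since (as noted above) several 8-bit encodings will "successfully" decode almost any byte sequence. A sketch; the candidate list is just a guess at encodings plausible for Russian text:

```python
# Stand-in for the file's raw bytes (in practice: open('document.txt', 'rb').read())
data = "Установка".encode('cp1251')

for enc in ('utf-8', 'cp1251', 'koi8-r', 'cp1252'):
    try:
        print(enc, '->', data.decode(enc))
    except UnicodeDecodeError:
        print(enc, '-> not valid in this encoding')
# UTF-8 rejects these bytes outright; the three 8-bit encodings all "work",
# but only the CP1251 output reads as actual Russian
```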