Following the instructions in this post: How to make Unicode charset in cmd.exe by default?
It's possible to bypass this encoding problem:
import subprocess
output = subprocess.check_output("chcp 65001 | powershell \"Get-ChildItem -LiteralPath 'HKLM:SOFTWARE\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Uninstall' -ErrorAction 'Stop' -ErrorVariable '+ErrorUninstallKeyPath'\"", shell=True, stderr=subprocess.STDOUT)
Answer from r3v3r53 on Stack Overflow
The output is of type bytes, so you need to decode it to a string, either with .decode('utf-8') (or whatever codec matches the output) or with str(output_bytes, 'utf-8'). Note that plain str(output_bytes) does not decode anything; it returns the repr of the bytes object ("b'...'"). Example:
import subprocess
output_bytes = subprocess.check_output("powershell \"Get-ChildItem -LiteralPath 'HKLM:SOFTWARE\\\\Microsoft\\\\Windows\\\\CurrentVersion\\\\Uninstall' -ErrorAction 'Stop' -ErrorVariable '+ErrorUninstallKeyPath'\"", shell=True, stderr=subprocess.STDOUT)
# decode the bytes to text (use the codec the child process actually emits)
output_string = output_bytes.decode('utf-8')
# alternatively
# output_string = str(output_bytes, 'utf-8')
# there are lots of \r\n in the output I encountered, so you can split
# on line boundaries to get a list
output_list = output_string.splitlines()
# once you have a list, you can loop through and print (or whatever you want)
for e in output_list:
    print(e)
The key here is to decode with the codec the output was actually encoded in, so that the correct characters appear when printing.
universal_newlines=True enables text mode. Combined with stdout=PIPE, it forces decoding of the child process' output using locale.getpreferredencoding(False), which is not utf-8 on Windows. That is why you see UnicodeDecodeError.
To read the subprocess' output using utf-8 encoding, drop universal_newlines=True:
#!/usr/bin/env python3
from subprocess import Popen, PIPE
with Popen(r'C:\path\to\program.exe "arg 1" "arg 2"',
           stdout=PIPE, stderr=PIPE) as p:
    output, errors = p.communicate()
lines = output.decode('utf-8').splitlines()
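To make the difference between decoding explicitly and decoding with the wrong codec concrete, here is a minimal sketch. It assumes a child process that writes UTF-8 bytes; the child is a throwaway Python one-liner started through sys.executable (a hypothetical stand-in for program.exe, not part of the original answer), and cp1252 is only an example of a non-UTF-8 Windows code page.

```python
import subprocess
import sys

# Hypothetical child: writes UTF-8 bytes straight to its stdout buffer.
# The escapes keep the command line itself pure ASCII.
CHILD_CODE = (
    "import sys; "
    "sys.stdout.buffer.write('123\\u00e1456\\u00df99\\n'.encode('utf-8'))"
)

proc = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

raw = proc.stdout  # bytes, because text mode was not requested

# Decoding explicitly with the codec the child used gives the right text.
lines = raw.decode("utf-8").splitlines()

# Decoding the same bytes with a different code page (cp1252 is an
# assumption here) yields mojibake instead of the original characters.
wrong = raw.decode("cp1252", errors="replace").splitlines()
```

Keeping the pipe in bytes mode and choosing the codec yourself avoids depending on whatever locale.getpreferredencoding(False) happens to return on the machine.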
str.encode("utf-8") is equivalent to "utf-8".encode(). There is no point in passing it to .communicate() unless you set stdin=PIPE and the child process expects the bytestring b'utf-8' as input.
str.encode(encoding="utf-8", errors="ignore") has the form klass.method(**kwargs). The .encode() method expects self (a string object) as its first argument, which is why you see TypeError.
>>> str.encode("abc", encoding="utf-8", errors="ignore") #XXX don't do it
b'abc'
>>> "abc".encode(encoding="utf-8", errors="ignore")
b'abc'
Do not use klass.method(obj) instead of obj.method() without a good reason.
You are not supposed to call .encode() on the class itself. What you probably want to do is something like
p1.communicate("FOOBAR".encode("utf-8"))
The error message you're getting means that the encode() function has nothing to encode, since you called it on the class, rather than on an instance (that would then be passed as the self parameter to encode()).
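Both points can be sketched in a few lines: calling .encode() on the class without an instance raises TypeError, and bytes passed to .communicate() only reach the child when stdin=PIPE is set. The echo child below is a hypothetical stand-in started through sys.executable; it is an assumption for illustration, not the program from the question.

```python
import sys
from subprocess import Popen, PIPE

# Calling the method on the class with only keyword arguments leaves
# nothing to act as self, so Python raises TypeError.
try:
    str.encode(encoding="utf-8", errors="ignore")
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# Calling it on an instance works as intended.
payload = "FOOBAR".encode("utf-8")

# Hypothetical child that copies its stdin to its stdout unchanged.
ECHO_CODE = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"

# stdin=PIPE is required for the bytes given to communicate() to be sent.
with Popen([sys.executable, "-c", ECHO_CODE], stdin=PIPE, stdout=PIPE) as p1:
    output, _ = p1.communicate(payload)
```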
(Answering my own question, hoping it could be helpful to others)
I made a short test program. This is what I have found:
- File system encoding is the key point.
- Monkey patching does not work. Well, that's OK. It is not acceptable as a solution anyway.
- LANG=C.UTF-8 requires the locale to be installed, and it was not on my system (checked with locale -a). On a second system where it was available, it worked.
- I can do the encoding explicitly and pass bytes as one of the args:
cmdresult = sub.run( [SCRIPT, tid, days, name.encode('utf-8')], ...
This works, but one question remained:
Does it comply with the docs?
All I could find is:
args should be a sequence of program arguments or else a single string
I had understood this as one string or a list of strings, but it does not actually specify which types the sequence may contain. I also passed an int to see what would happen. I got this error:
expected str, bytes or os.PathLike object
So my solution seems to be fine.
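A minimal sketch of that experiment, assuming a POSIX system: the child is a hypothetical Python one-liner (standing in for SCRIPT) that writes the raw bytes of its first argument back, so the parent can check that the explicitly encoded bytes arrive unchanged. The sample name is made up for illustration.

```python
import subprocess
import sys

name = "123\u00e1456\u00df99"  # hypothetical non-ASCII argument

# Hypothetical child: turn argv[1] back into bytes and write them out.
CHILD_CODE = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"

# str and bytes items can be mixed in the argument list.
result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE, name.encode("utf-8")],
    stdout=subprocess.PIPE,
    check=True,
)
received = result.stdout

# An int is rejected before the child is started.
try:
    subprocess.run([sys.executable, "-c", "pass", 5])
except TypeError as exc:
    int_error = str(exc)
else:
    int_error = None
```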
In the context of mod_wsgi, you should ensure you are using mod_wsgi daemon mode and set the lang/locale for the mod_wsgi daemon process group. For a much more detailed explanation which is too much to repeat here, see:
- http://blog.dscpl.com.au/2014/09/setting-lang-and-lcall-when-using.html
In Python 3.3+ you can separately indicate that you expect text in a particular encoding. The keyword argument universal_newlines=True gained the more accurate and transparent alias text=True in 3.7; the old name still works.
This keyword basically says "just use whatever encoding is default on my system" (so basically UTF-8 on anything reasonably modern, except on Windows, where you get the system's default code page).
In the absence of this keyword, subprocesses receive and return bytes in Python 3.
Of course, if you know the encoding, you can also separately .decode() the bytes you get back.
If you know the encoding it's probably useful to use the encoding= keyword argument (even if you assume it is also the system encoding; this was added in Python 3.6).
response = subprocess.check_output([...], text=True)
response = subprocess.check_output([...], encoding='utf-8')
response = subprocess.check_output([...]).decode('utf-8')
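One difference between these variants is easy to miss: encoding= (like text=True) also switches on newline translation, while a manual .decode() leaves line endings untouched. A minimal sketch, assuming a hypothetical child (a Python one-liner run through sys.executable) that emits UTF-8 text with a Windows-style line ending:

```python
import subprocess
import sys

# Hypothetical child: writes "caf\u00e9" in UTF-8 followed by \r\n.
CHILD_CODE = "import sys; sys.stdout.buffer.write(b'caf\\xc3\\xa9\\r\\n')"
cmd = [sys.executable, "-c", CHILD_CODE]

# Text mode with an explicit codec: decoded, and \r\n becomes \n.
as_text = subprocess.check_output(cmd, encoding="utf-8")

# Bytes mode plus manual decode: decoded, line ending kept as \r\n.
decoded = subprocess.check_output(cmd).decode("utf-8")
```

If the line endings matter to the caller, calling .splitlines() on either result gives the same list of lines.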
The trick to get the script running is to encode the arguments to 'utf8' and then decode the resulting bytes as 'ansi'.
command = r'C:\PROGRAM FILES\Application\bin\cfg.exe'
argument = ["-modify", "-location:123á456ß99"]
argument_ansi = []
for x in argument:
    argument_ansi.append(x.encode('utf-8').decode('ansi', 'replace'))
cmd = [command]
cmd.extend(argument_ansi)
result = subprocess.check_output(cmd, shell=False, encoding="utf-8", universal_newlines=True)
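To see what this round trip does to the argument, here is a small sketch. It uses cp1252 as a stand-in for 'ansi', which is an assumption: 'ansi' exists only on Windows and maps to the system's ANSI code page, which is cp1252 on many Western European installations. The reading that the target program later re-interprets these characters as UTF-8 bytes is also an assumption, not something the answer states.

```python
original = "-location:123\u00e1456\u00df99"  # the argument from the answer

# Assumption: cp1252 stands in for the Windows-only 'ansi' codec.
mangled = original.encode("utf-8").decode("cp1252", "replace")

# Each non-ASCII character was two UTF-8 bytes, and each byte is now read
# as one cp1252 character, so the string is two characters longer.
extra_chars = len(mangled) - len(original)

# Reversing the steps recovers the original text, which is presumably what
# happens on the receiving side.
restored = mangled.encode("cp1252").decode("utf-8")
```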