First find the encoding of the bytes, then decode them. To do this you need a byte string, which you get by adding the letter 'b' to the front of the original string literal.
Try this:
import chardet
s = "Aur\xc3\xa9lien"    # the mis-decoded text
bs = b"Aur\xc3\xa9lien"  # the same escapes as raw bytes
# chardet only guesses; on very short input the guess can be wrong or None
encoding = chardet.detect(bs)["encoding"] or "utf-8"
decoded = bs.decode(encoding)  # not named `str`, which would shadow the built-in
print(decoded)
If you are reading the text from a file, you can detect the encoding using the magic lib; see here: https://stackoverflow.com/a/16203777/1544937
You have UTF-8 decoded as latin-1, so the solution is to encode as latin-1 then decode as UTF-8.
s = "Aur\xc3\xa9lien"
print(s.encode('latin-1').decode('utf-8'))
Output:
Aurélien
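As a minimal sketch of the same round trip, the helper below (the name `fix_mojibake` is hypothetical, not from any library) re-encodes as latin-1 and decodes as UTF-8, and returns the input unchanged when it is not this kind of mis-decoded text:

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as latin-1."""
    try:
        # latin-1 maps code points 0-255 one-to-one onto bytes,
        # so this recovers the original raw bytes.
        raw = text.encode("latin-1")
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in latin-1, or not valid UTF-8: leave as-is.
        return text


print(fix_mojibake("Aur\xc3\xa9lien"))  # Aurélien
print(fix_mojibake("Aurélien"))         # already correct, returned unchanged
```

The fallback matters because applying the round trip blindly to text that was decoded correctly would raise an exception rather than repair anything.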
Neither is better than the other; they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it, and it is also compatible with Python 2.
To add to Lennart Regebro's answer, there is also a third way that can be used:
encoded3 = str.encode(original, 'utf-8')
print(encoded3)
Anyway, it is actually exactly the same as the first approach. It may also look as if the second way is syntactic sugar for the third approach.
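A quick way to check this is to compare the results directly. The sketch below assumes the two approaches from the question are `original.encode('utf-8')` and `bytes(original, 'utf-8')` (the question itself is not quoted here, and the sample string is chosen only for illustration):

```python
original = "Aurélien"  # sample text, an assumption for this example

encoded1 = original.encode("utf-8")       # method call on the str object
encoded2 = bytes(original, "utf-8")       # bytes constructor with an encoding
encoded3 = str.encode(original, "utf-8")  # the same method, called through the class

# All three produce identical bytes objects.
print(encoded1 == encoded2 == encoded3)
print(encoded3)
```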
A programming language is a means of expressing abstract ideas formally so that a machine can execute them. A programming language is considered good if it contains the constructs one needs. Python is a hybrid language, i.e. more natural and more versatile than pure OO or pure procedural languages. Sometimes functions are more appropriate than object methods, sometimes the reverse is true. It depends on one's mental picture of the problem being solved.
Anyway, the feature mentioned in the question is probably a by-product of the language implementation/design. In my opinion, this is a nice example that shows alternative ways of thinking about what is technically the same thing.
In other words, calling an object method means thinking in terms of "let the object give me the wanted result". Calling a function instead means "let the outer code process the passed argument and extract the wanted value".
The first approach emphasizes the ability of the object to do the task on its own; the second emphasizes the ability of a separate algorithm to extract the data. Sometimes the separate code is so specialized that it is not wise to add it as a general method to the class of the object.
I am receiving base64-encoded data as bytes and have to decode it, but the last line raises an error.
Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 1: ordinal not in range(128)
base64_bytes = base64_message.encode('ascii')
print(base64_bytes)
#decode
message_bytes = base64.b64decode(base64_bytes)
message = message_bytes.decode('ascii')
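The error arises because `base64.b64decode` returns raw bytes, and those bytes are only ASCII if the original payload was ASCII text. As a hedged sketch (the sample payload below is made up, since the original `base64_message` is not shown), one option is to try UTF-8 and keep the bytes as they are when the data turns out not to be text:

```python
import base64

# Hypothetical payload: base64 of UTF-8 text containing a non-ASCII character.
base64_message = base64.b64encode("Aurélien".encode("utf-8")).decode("ascii")

base64_bytes = base64_message.encode("ascii")   # base64 text itself is always ASCII
message_bytes = base64.b64decode(base64_bytes)  # raw bytes, not necessarily text

try:
    message = message_bytes.decode("utf-8")
except UnicodeDecodeError:
    # Binary data (an image, compressed data, ...) has no text decoding;
    # keep it as bytes instead of forcing a codec.
    message = message_bytes

print(message)
```

If the payload is known to be binary, skipping the decode step entirely and working with `message_bytes` is likely the simpler choice.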