If you are dealing with a zero-padded buffer, you can use rstrip to remove the trailing \x00s:
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string but keeps any nulls in the middle. It is not suitable for null-terminated strings that may contain random data after the terminator.
If you are dealing with a null-terminated string, where the first zero marks the end of the string but other characters may follow it, you should split on the first null instead, as in anregen's solution:
>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'
It splits the text at the first zero and returns the part before it. It also works with strings that contain no null character, in which case the whole string is returned.
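The two cases can be put side by side in a small sketch. It assumes Python 3 and a bytes buffer rather than the str objects used above; the helper names strip_padding and cut_at_terminator are made up for illustration.

```python
def strip_padding(buf: bytes) -> bytes:
    """Drop trailing NUL padding; NULs in the middle are kept."""
    return buf.rstrip(b"\x00")


def cut_at_terminator(buf: bytes) -> bytes:
    """Return everything before the first NUL, or the whole buffer if there is none."""
    return buf.split(b"\x00", 1)[0]


padded = b"Hello\x00\x00\x00\x00"
terminated = b"Hello\x00\x24\x4e\x32"

print(strip_padding(padded))          # b'Hello'
print(cut_at_terminator(terminated))  # b'Hello'
print(strip_padding(terminated))      # b'Hello\x00$N2' (data after the terminator survives)
```

The last line shows why rstrip is the wrong tool for a null-terminated string: nothing is stripped because the buffer does not end in a NUL.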
EDIT:
Explained rstrip in more detail and provided a correct use case.
Included alternative solution.
>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> a.replace('\x00','')
'Hello'
Use struct.unpack:
>>> import struct
>>> s = '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
>>> struct.unpack('11B',s)
(0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0)
This gives you a tuple instead of a list, but I trust you can convert it if you need to.
You can use ord() in combination with map():
>>> s = '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
>>> map(ord, s)
[0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]
Can someone explain how the datatype beginning with "\x" work?
In Python 3, a bytes object is already a sequence of integers in the range 0 to 255, so you can convert it to a list directly:
input_bytes = b"\x00\x01"
output_numbers = list(input_bytes)
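As a rough Python 3 sketch of the same idea, the following shows that iterating or indexing a bytes object yields ints, and that struct.unpack gives the same numbers as a tuple. The variable names are only illustrative.

```python
import struct

data = b"\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00"

numbers = list(data)                            # iterating bytes yields ints
fourth = data[3]                                # indexing also yields an int
unpacked = struct.unpack(f"{len(data)}B", data) # tuple of unsigned bytes
round_trip = bytes(numbers)                     # and back to bytes again

print(numbers)             # [0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]
print(fourth)              # 1
print(round_trip == data)  # True
```

Note that map(ord, ...) is a Python 2 idiom for str; on Python 3 bytes it is unnecessary, because the elements are already integers.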
Are you just looking for something like this?
for x in range(0, 8):
    print(x.to_bytes(1, byteorder='big'))
Output is:
b'\x00'
b'\x01'
b'\x02'
b'\x03'
b'\x04'
b'\x05'
b'\x06'
b'\x07'
Or the reverse:
byteslist = [b'\x00', b'\x01', b'\x02', b'\x03',
             b'\x04', b'\x05', b'\x06', b'\x07']
for x in byteslist:
    print(int.from_bytes(x, byteorder='big'))
Output:
0
1
2
3
4
5
6
7
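With a single byte the byteorder argument makes no difference. A minimal sketch with a two-byte value (258 is just an arbitrary example) shows where it starts to matter:

```python
value = 258  # 0x0102

big = value.to_bytes(2, byteorder="big")        # most significant byte first
little = value.to_bytes(2, byteorder="little")  # least significant byte first

print(big)     # b'\x01\x02'
print(little)  # b'\x02\x01'

# Reading the same two bytes with the wrong byte order gives a different number.
print(int.from_bytes(big, byteorder="big"))     # 258
print(int.from_bytes(big, byteorder="little"))  # 513
```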
I am pretty much a noob in this field of Python. I know that this kind of data represents bytes, but I don't know exactly how it works. Can someone explain? Thanks in advance.
Something along the way is encoding your values as UTF-32. Simply decode them:
>>> b = u"c\x00\x00\x00o\x00\x00\x00n\x00\x00\x00t\x00\x00\x00e\x00\x00\x00\
... n\x00\x00\x00t\x00\x00\x00-\x00\x00\x00l\x00\x00\x00e\x00\x00\x00\
... n\x00\x00\x00g\x00\x00\x00t\x00\x00\x00h\x00\x00\x00"
>>> b.decode('utf-32')
u'content-length'
The root cause is that cStringIO.StringIO(unicode_object) produces nonsense.
The current 2.X docs on docs.python.org say
Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.
This is unhelpful and incorrect; see below. The chm version of the docs supplied with the win32 installer for CPython 2.7.2 and 2.6.6 follows that with this sentence:
Calling StringIO() with a Unicode string parameter populates the object with the buffer representation of the Unicode string instead of encoding the string.
This is a correct description of the behaviour (see below). The behaviour is not brilliant. I can't imagine a good reason for that sentence being removed from the web docs.
Behaving badly:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
>>> import StringIO, cStringIO, sys
>>> StringIO.StringIO(u"fubar").getvalue()
u'fubar' <<=== unicode object
>>> cStringIO.StringIO(u"fubar").getvalue()
'f\x00u\x00b\x00a\x00r\x00' <<=== str object
>>> cStringIO.StringIO(u"\u0405\u0406").getvalue()
'\x05\x04\x06\x04' <<=== "accepts"
>>> sys.maxunicode
65535 # your sender presumably emits 1114111 (wide unicode)
>>> sys.byteorder
'little'
So in general all one needs to do is know/guess the endianness and unicode-width of the sender's Python and decode the mess with UTF-(16|32)-(B|L)E.
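A minimal Python 3 sketch of that decoding step follows. It assumes the leaked buffer has already arrived as bytes and that the sender's unicode width (2 or 4 bytes) and endianness are known or guessed; decode_leaked_buffer is a hypothetical helper, not part of any library.

```python
# Map (unicode width in bytes, byte order) of the sending Python 2 build to a codec.
CODECS = {
    (2, "little"): "utf-16-le",
    (2, "big"): "utf-16-be",
    (4, "little"): "utf-32-le",
    (4, "big"): "utf-32-be",
}


def decode_leaked_buffer(raw: bytes, width: int = 2, byteorder: str = "little") -> str:
    """Decode the raw unicode buffer that cStringIO exposed instead of encoded text."""
    return raw.decode(CODECS[(width, byteorder)])


narrow = b"f\x00u\x00b\x00a\x00r\x00"               # narrow build, little-endian
wide = b"c\x00\x00\x00o\x00\x00\x00n\x00\x00\x00"   # wide build, little-endian

print(decode_leaked_buffer(narrow))         # fubar
print(decode_leaked_buffer(wide, width=4))  # con
```

If the width or byte order is guessed wrong, the decode either raises UnicodeDecodeError or yields visibly wrong text, so trying the four combinations in turn is a workable fallback.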
In your case the sender is being rather Byzantine; for example, u'content-length'.encode('utf-8') is the str object 'content-length', which bears a remarkable similarity to what you started with. Also, foo.encode('utf8').decode('utf8') produces either foo or an exception.