bytes doesn't support item deletion because it's immutable. To "modify" strings and string-like objects you need to take a copy, so to remove olddata[start:end] do:
newdata = olddata[:start] + olddata[end:]
Of course that's a fair amount of copying, not all of which is necessary, so you might prefer to rework your code a bit for performance. You could use bytearray (which is mutable). Or perhaps you could find a way to work through the buffer (using an index or iterating over its elements), instead of needing to shorten it after each step.
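To make the three options above concrete, here is a minimal sketch. The sample data, the `start`/`end` values and the `chunk_size` are made up for illustration; only the techniques come from the answer.

```python
# Hypothetical sample data; start and end mark the span to drop.
olddata = b"abcdefgh"
start, end = 2, 5

# Immutable bytes: build a new object from the two surviving pieces.
newdata = olddata[:start] + olddata[end:]

# Mutable bytearray: delete the same span in place.
buf = bytearray(olddata)
del buf[start:end]

# Or avoid shortening at all: walk the buffer with an offset.
chunk_size = 3  # assumed chunk size, purely illustrative
chunks = []
offset = 0
view = memoryview(olddata)  # slicing a memoryview does not copy the data
while offset < len(view):
    chunks.append(bytes(view[offset:offset + chunk_size]))
    offset += chunk_size
```

The offset loop leaves the original buffer untouched, so nothing is copied until a chunk is actually materialised with `bytes(...)`.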
I think I found the proper way, just looking from another perspective:
self.data = self.data[Index:]
That is, I just copy the part I still need back into the same attribute.
There are two issues here, one of which is the actual issue, the other is confusing you, but not an actual issue. Firstly:
Your string is a bytes object, i.e. a string of 8-bit bytes. Python 3 handles this differently from text, which is Unicode. Where do you get the string from? Since you want to treat it as text, you should probably convert it to a str object, which is used to handle text. This is typically done with the .decode() method, i.e.:
somestring.decode('UTF-8')
Although calling str() also works:
str(somestring, 'UTF8')
(Note that your encoding might be something other than UTF-8.)
However, this is not your actual question. Your actual question is how to strip a bytes string. And the answer is that you do it the same way as you strip a text string:
somestring.strip()
There is no strip() builtin in either Python 2 or Python 3. There is a strip-function in the string module in Python 2:
from string import strip
But it hasn't been good practice to use that since strings got a strip() method, which is like ten years or so now. So in Python 3 it is gone.
>>> b'foo '.strip()
b'foo'
Works just fine.
If what you're dealing with is text, though, you probably should just have an actual str object, not a bytes object.
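A small sketch of both routes in Python 3, stripping the bytes directly or decoding to str first. The input value is invented, and UTF-8 is only an assumption about its encoding.

```python
raw = b"  some text \r\n"  # hypothetical input

# Strip while staying in bytes: with no argument, ASCII whitespace is removed.
stripped_bytes = raw.strip()

# Or decode to text first (assuming UTF-8) and strip the str.
text = raw.decode("utf-8").strip()
```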
Use bytes.replace to replace the substring with an empty string:
b = b'Today, in the digital age, any type of data, such as text, images, and audio, can be\r\ndigitized, stored indefinitely, and transmitted at high speeds. Notwithstanding these\r\nadvantages, digital data also have a downside. They are easy to access illegally, tamper\r\nwith, and copy for purposes of copyright violation.\r\nThere is therefore a need to hide secret identification inside certain types of digital\r\ndata. This information can be used to prove copyright ownership, to identify attempts\r\nto tamper with sensitive data, and to embed annotations. Storing, hiding, or embedding\r\nsecret information in all types of digital data is one of the tasks of the field of\r\nsteganography.\r\nSteganography is the art and science of data hiding. In contrast with cryptography,\r\nwhich secures data by transforming it into another, unreadable format, steganography\r\nmakes data invisible by hiding (or embedding) them in another piece of data, known\r\nalternatively as the cover, the host, or the carrier. The modified cover, including the\r\nhidden data, is referred to as a stego object. It can be stored or transmitted as a message.\r\nWe can think of cryptography as overt secret writing and of steganography as covert\r\nsecret writing.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b = b.replace(b'\x00', b'')
assert b.endswith(b'writing.')
Bytes objects behave like many other sequences, so slicing and indexing work as expected. Since the character you want to remove is specifically at the end, and bytes supports rstrip(), the solution is the same as stripping characters from the end of a string. Just make sure to pass the characters to strip as bytes.
>>> my_bytes = b'blah\x00\x00\x00'
>>> my_bytes.rstrip(b'\x00')
b'blah'
>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
>>> sByte[2:]
b'308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
See also https://appdividend.com/2022/07/09/python-slice-notation/
The code snippet returns sByte from and including the third byte until the end.
If you wanted to store the variable again you could do this:
>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
>>> sByte = sByte[2:]
>>> sByte
b'308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
bytes.replace doesn't work in-place, it returns a modified copy of the bytes object. You can use sByte = sByte.replace(b'\x00\x81', b'') (or bytes.removeprefix if the bytes always occur at the start). Depending on your circumstances, you can also set the errors parameter of the decode method to 'ignore': sByte = sByte.decode(encoding='utf-8', errors='ignore').
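A sketch of the prefix removal, using a shortened, made-up version of the data. `bytes.removeprefix` only exists from Python 3.9, so the fallback branch is my own addition for older versions. Note also that `errors='ignore'` only drops bytes that are invalid in the chosen encoding: `\x81` goes, but `\x00` is a valid UTF-8 character and stays.

```python
import sys

s_byte = b"\x00\x81308 921\n"  # shortened, hypothetical sample
prefix = b"\x00\x81"

if sys.version_info >= (3, 9):
    cleaned = s_byte.removeprefix(prefix)
else:
    # Fallback for older Pythons: slice only if the prefix is really there.
    cleaned = s_byte[len(prefix):] if s_byte.startswith(prefix) else s_byte

# replace() returns a new object; s_byte itself is unchanged.
replaced = s_byte.replace(prefix, b"")

# 'ignore' drops the undecodable \x81 but keeps the NUL character.
decoded = s_byte.decode(encoding="utf-8", errors="ignore")
```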
You can use StringIO to read a string like a file
>>> import StringIO
>>> s = 'Hello, World!'
>>> sio = StringIO.StringIO(s)
>>> sio.read(6)
'Hello,'
>>> sio.read()
' World!'
I would also suggest you take a look at the struct module for help with parsing binary data
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
You define the layout of the data using format strings, so 'hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes in the output shown; the native size of a long varies by platform). The format string can also specify endianness (byte order).
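As a sketch of the byte-order prefixes in Python 3: with `>` or `<` the struct module uses standard sizes (2 bytes for `h`, 4 for `l`) and no padding, so the result no longer depends on the machine. The values 1, 2, 3 are the same ones used above.

```python
import struct

# '>' = big-endian, '<' = little-endian; both use standard sizes, no alignment.
big = struct.pack(">hhl", 1, 2, 3)
little = struct.pack("<hhl", 1, 2, 3)

size = struct.calcsize(">hhl")        # 2 + 2 + 4 bytes
values = struct.unpack(">hhl", big)   # round-trips to the original numbers
```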
For example if your header format was uint, 4 byte str, uint, uint, ushort, ulong:
>>> import struct
>>> data = ''.join(chr(i) for i in range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, '\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)
To split the packet into a 32 byte header and body:
header = packet[:32]
body = packet[32:]
To further split the body into one or more entries:
entries = [body[i:i+90] for i in range(0, len(body), 90)]
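Putting the two splitting steps together, a minimal sketch might look like this. The 32 and 90 byte sizes come from the text above; the `split_packet` name and the sample packet are made up for illustration.

```python
HEADER_SIZE = 32  # header length from the example above
ENTRY_SIZE = 90   # entry length from the example above


def split_packet(packet: bytes):
    """Split a packet into its header and a list of fixed-size entries."""
    header = packet[:HEADER_SIZE]
    body = packet[HEADER_SIZE:]
    entries = [body[i:i + ENTRY_SIZE] for i in range(0, len(body), ENTRY_SIZE)]
    return header, entries


# Hypothetical packet: an all-zero header followed by two entries.
packet = bytes(HEADER_SIZE) + b"A" * ENTRY_SIZE + b"B" * ENTRY_SIZE
header, entries = split_packet(packet)
```

If the body length is not a multiple of 90, the last entry is simply shorter, so you may want to check its length before parsing it.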
If you are dealing with a zero-padded buffer then you can use rstrip to remove trailing \x00s
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string but keeps any nulls in the middle. Not suitable for null-terminated strings that may contain random data after the terminator.
If you are dealing with a null-terminated string where the first zero indicates the end of string, but there might be other characters following it, you should use anregen's solution.
>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'
It splits the text at the first zero and returns the slice. It works with strings having no null character too.
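The sessions above use Python 2 str values. With Python 3 bytes objects, the same two approaches would look roughly like this (same sample data, with the arguments passed as bytes):

```python
padded = b"Hello\x00\x00\x00\x00"      # zero-padded buffer
terminated = b"Hello\x00\x24\x4e\x32"  # null-terminated, junk after the NUL

# Zero padding: strip every trailing NUL.
from_padded = padded.rstrip(b"\x00")

# Null terminator: keep only what precedes the first NUL.
from_terminated = terminated.split(b"\x00", 1)[0]

# partition() does the same job and also works when there is no NUL at all.
no_null = b"Hello".partition(b"\x00")[0]
```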
>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> a.replace('\x00','')
'Hello'
According to the Python docs, the b prefix means that your string is a byte string. Specifically:
A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
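To convert this to a string without the trailing carriage return and newline, and without the byte prefix, decode it first (in Python 3, calling str() on a bytes object without an encoding returns its repr, b prefix included):
b'helloworld\r\n'.decode('utf-8').rstrip('\r\n')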
Try this:
b'helloworld\r\n'.strip()  # leading + trailing
or
b'helloworld\r\n'.rstrip()  # trailing only
Usually you'd use a filtered version of the object, for example:
In [63]: test
Out[63]: 'hello\x00world'
In [68]: for my_bytes in filter(lambda x: x != b'\x00', test):
....: print(my_bytes)
....:
h
e
l
l
o
w
o
r
l
d
Note I used my_bytes instead of bytes, which is a built-in name you'd rather not overwrite.
Similarly, you can also construct a filtered bytes object for further processing:
In [62]: test = b'hello\x00world'
In [63]: test
Out[63]: 'hello\x00world'
In [64]: test_without_nulls = bytes(filter(lambda x: x != b'\x00', test))
In [65]: test_without_nulls
Out[65]: 'helloworld'
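One caveat: the sessions above behave as shown on Python 2, where iterating a byte string yields one-character strings. On Python 3, iterating a bytes object yields integers, so the comparison has to be against 0 rather than b'\x00'. A sketch of the Python 3 equivalent:

```python
test = b"hello\x00world"

# In Python 3 each element is an int, so compare with 0, not b"\x00".
element_types = {type(x) for x in test}

without_nulls = bytes(filter(lambda x: x != 0, test))

# A generator expression or replace() gives the same result.
also_without_nulls = bytes(x for x in test if x != 0)
via_replace = test.replace(b"\x00", b"")
```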
I usually use bytes objects, as they do not share an interface with strings in Python 3. Certainly not byte arrays.
You can use a membership test using in:
>>> b'\x00' in bytes([1, 2, 3])
False
>>> b'\x00' in bytes([0, 1, 2, 3])
True
Here b'\x00' produces a bytes object with a single NULL byte (as opposed to b'00' which produces an object of length 2 with two bytes with integer values 48).
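As a small addition to the membership test, here is a sketch showing that Python 3 also accepts a plain integer on the left-hand side, and that find() tells you where the NULL byte is. The variable names are illustrative only.

```python
data = bytes([0, 1, 2, 3])
clean = bytes([1, 2, 3])

has_null = b"\x00" in data   # subsequence test with a bytes object
has_null_int = 0 in data     # single-byte test with an int

first_null = data.find(b"\x00")   # index of the first NULL byte
not_found = clean.find(b"\x00")   # -1 when there is none

two_bytes = b"00"  # two ASCII '0' characters, not a NULL byte
```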
I call these things bytes objects, sometimes byte strings, but the latter usually in context of Python 2 only. A bytearray is a separate, distinct type (a mutable version of the bytes type).
You can use bytes.decode function if you really need to "get rid of b": http://docs.python.org/3.3/library/stdtypes.html#bytes.decode
But it seems from your code that you do not really need to do this, you really need to work with bytes.
The b"..." is just a python notation of byte strings, it's not really there, it only gets printed. Does it cause some real problems to you?
You could use ord to extract each character's numeric value, then combine them with simple arithmetic.
>>> a = '\x02'
>>> b = '\x00'
>>> c = ord(a)*256 + ord(b)
>>> c == 0x0200
True
>>> print hex(c)
0x200
An alternate way to do this for standard-length types is to use the struct module to convert from strings of bytes to Python types.
For example:
>>> import struct
>>> byte_arr = ['\x02', '\x00']
>>> byte_str = ''.join(byte_arr)
>>> byte_str
'\x02\x00'
>>> num, = struct.unpack('>H', byte_str)
>>> num
512
In this example, the format string '>H' indicates a big-endian unsigned 2-byte integer. Other format strings can be used to specify other sizes, endianness, and signed/unsigned status.
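The sessions above are Python 2. On Python 3, where indexing a bytes object already gives integers, a sketch of the same conversion could use int.from_bytes alongside struct:

```python
import struct

byte_str = b"\x02\x00"

# Built-in conversion; the byte order must be stated explicitly.
num_big = int.from_bytes(byte_str, byteorder="big")
num_little = int.from_bytes(byte_str, byteorder="little")

# Same big-endian result via struct, as in the answer above.
(num_struct,) = struct.unpack(">H", byte_str)

# Manual arithmetic: indexing bytes yields ints, so ord() is not needed.
num_manual = byte_str[0] * 256 + byte_str[1]
```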
Try the line below instead of your last two lines. Hope it helps:
line=line.decode('utf-8','ignore').encode("utf-8")
For python 3, as mentioned in a comment in this thread, you can do:
line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
The 'ignore' parameter prevents an error from being raised if any characters are unable to be decoded.
If your line is already a bytes object (e.g. b'my string') then you just need to decode it with decode('utf-8', 'ignore').
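To show what 'ignore' actually does, here is a sketch with an invented line containing one invalid byte. The 'replace' handler is included for comparison; it keeps a visible marker instead of silently dropping data.

```python
line = b"caf\xc3\xa9 \xff ok"  # hypothetical input; \xff is never valid in UTF-8

ignored = line.decode("utf-8", "ignore")    # invalid byte silently dropped
replaced = line.decode("utf-8", "replace")  # invalid byte becomes U+FFFD

try:
    line.decode("utf-8")  # the default handler is 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```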
Just use a bytearray:
>>> a = bytearray(b'abcdef')
>>> del a[1]
>>> a
bytearray(b'acdef')
It's almost like bytes but mutable:
The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Bytearray Operations.
Using a bytearray as shown by @MSeifert above, you can extract the first n elements using slicing
>>> a = bytearray(b'abcdef')
>>> a[:3]
bytearray(b'abc')
>>> a = a[3:]
>>> a
bytearray(b'def')
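Note that `a = a[3:]` builds a new bytearray and rebinds the name. If other references to the same object should see the change, a slice deletion modifies it in place. A minimal sketch:

```python
a = bytearray(b"abcdef")
alias = a  # second reference to the same object

head = bytes(a[:3])  # copy out the first three bytes
del a[:3]            # remove them from the bytearray in place
```

After this, `alias` sees the shortened data as well, because it is still the same object.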
No, you cannot, as BytesIO is an in-memory version of a common file object.
As such it is treated as a sequence of bytes that can be overwritten or appended to, and just like a file removing elements from the front is not efficient as it requires a complete rewrite of all data following.
You probably want to look into the collections.deque() type instead.
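A rough sketch of the deque idea, assuming you are fine with storing the data as individual integer byte values (this uses much more memory per byte than bytes or BytesIO, so it suits small queues best):

```python
from collections import deque

queue = deque(b"abcdef")  # iterating bytes yields ints, so this holds ints

# Consume three bytes from the front; popleft() is O(1).
first = bytes(queue.popleft() for _ in range(3))

# Append more data at the back.
queue.extend(b"gh")

remaining = bytes(queue)
```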
I was looking for a way to clear the contents of a BytesIO. After seeing Martijn Pieters' answer, I realized that this is not possible.
However, I decided to propose a workaround (re-initializing the BytesIO):
import io
class BytesIO(io.BytesIO):
    def delete(self):
        self.close()
        super().__init__(b'')
b = BytesIO()
b.write(b'milad')
b.delete()
print(b.getvalue()) # -> b''
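For what it's worth, if the goal is only to empty the buffer (rather than remove bytes from the front), a sketch using the standard seek() and truncate() methods avoids the subclass:

```python
import io

buf = io.BytesIO()
buf.write(b"milad")

buf.seek(0)      # move the position back to the start
buf.truncate(0)  # discard everything from offset 0 onwards
emptied = buf.getvalue()

buf.write(b"new")  # the same object can be reused afterwards
reused = buf.getvalue()
```

The seek(0) matters: truncate() does not move the stream position, so without it the next write would start at the old offset.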