bytes doesn't support item deletion because it's immutable. To "modify" strings and string-like objects you need to take a copy, so to remove olddata[start:end] do:
newdata = olddata[:start] + olddata[end:]
Of course that's a fair amount of copying, not all of which is necessary, so you might prefer to rework your code a bit for performance. You could use bytearray (which is mutable). Or perhaps you could find a way to work through the buffer (using an index or iterating over its elements), instead of needing to shorten it after each step.
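As a rough sketch of the two alternatives mentioned above (the sample data, the range to remove, and the record size are all made up for illustration): a bytearray supports slice deletion in place, and an offset lets you walk a bytes object without shortening it after each step.

```python
# Option 1: bytearray is mutable, so a slice can be deleted in place.
buf = bytearray(b"headerPAYLOADtrailer")
start, end = 6, 13          # hypothetical range to remove
del buf[start:end]          # modifies buf itself, no new object
print(bytes(buf))           # b'headertrailer'

# Option 2: keep the data immutable and advance an offset instead of
# re-slicing (and re-copying) the remaining buffer after every step.
data = b"aaaabbbbcccc"
chunk_size = 4              # hypothetical fixed record size
offset = 0
chunks = []
while offset < len(data):
    chunks.append(data[offset:offset + chunk_size])
    offset += chunk_size
print(chunks)               # [b'aaaa', b'bbbb', b'cccc']
```

Which one fits better depends on how the rest of the code consumes the buffer.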
I think I found the proper way, just looking from another perspective:
self.data = self.data[Index:]
That just assigns the part I still need back to the same attribute.
Delete non-decodeable chars in string?
There are two issues here, one of which is the actual issue, the other is confusing you, but not an actual issue. Firstly:
Your string is a bytes object, i.e. a string of 8-bit bytes. Python 3 handles this differently from text, which is Unicode. Where do you get the string from? Since you want to treat it as text, you should probably convert it to a str object, which is used to handle text. This is typically done with the .decode() method, i.e.:
somestring.decode('UTF-8')
Although calling str() also works:
str(somestring, 'UTF8')
(Note that your encoding might be something other than UTF-8.)
However, this is not your actual question. Your actual question is how to strip a bytes string. And the answer is that you do that the same way as you strip a text string:
somestring.strip()
There is no strip() builtin in either Python 2 or Python 3. There is a strip-function in the string module in Python 2:
from string import strip
But it hasn't been good practice to use that since strings got a strip() method, which is like ten years or so now. So in Python 3 it is gone.
>>> b'foo '.strip()
b'foo'
Works just fine.
If what you're dealing with is text, though, you probably should just have an actual str object, not a bytes object.
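A small sketch of both routes, assuming the data is UTF-8 (the sample value is made up): strip() on bytes returns bytes, while decoding first gives you a str to strip.

```python
raw = b"  foo bar \r\n"                # made-up input

stripped_bytes = raw.strip()           # bytes in, bytes out
text = raw.decode("utf-8").strip()     # decode first when the data is text

print(stripped_bytes)                  # b'foo bar'
print(text)                            # foo bar
```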
>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
>>> sByte[2:]
b'308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
See also https://appdividend.com/2022/07/09/python-slice-notation/
The code snippet returns sByte from and including the third byte until the end.
If you wanted to store the variable again you could do this:
>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
>>> sByte = sByte[2:]
>>> sByte
b'308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n'
bytes.replace doesn't work in-place, it returns a modified copy of the bytes object. You can use sByte = sByte.replace(b'\x00\x81', b'') (or bytes.removeprefix if the bytes always occur at the start). Depending on your circumstances, you can also set the errors parameter of the decode method to 'ignore': sByte = sByte.decode(encoding='utf-8', errors='ignore').
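The three options can be compared side by side. This is only a sketch on a shortened, made-up version of the input; note that bytes.removeprefix needs Python 3.9 or later.

```python
s_byte = b"\x00\x81308 921 q53\n"      # shortened sample input

# replace() returns a new object; every occurrence of the pair is removed.
without_marker = s_byte.replace(b"\x00\x81", b"")

# removeprefix() (Python 3.9+) only touches the start of the data.
without_prefix = s_byte.removeprefix(b"\x00\x81")

# errors='ignore' drops bytes that are not valid UTF-8. Here that is only
# \x81; the NUL byte \x00 is valid UTF-8 and stays in the result.
lenient_text = s_byte.decode("utf-8", errors="ignore")

print(without_marker)                  # b'308 921 q53\n'
print(without_prefix)                  # b'308 921 q53\n'
print(repr(lenient_text))              # '\x00308 921 q53\n'
```

So errors='ignore' on its own is not a substitute for removing the prefix if the unwanted bytes happen to be decodable.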
You can use StringIO to read a string like a file
>>> from io import StringIO
>>> s = 'Hello, World!'
>>> sio = StringIO(s)
>>> sio.read(6)
'Hello,'
>>> sio.read()
' World!'
I would also suggest you take a look at the struct module for help with parsing binary data
>>> from struct import pack, unpack
>>> pack('>hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
You define the format of the data using format strings, so 'hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes in standard size). The leading '>' selects big-endian byte order with standard sizes and no padding; without a prefix, struct uses the platform's native byte order, sizes, and alignment, so the result can differ between machines.
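To illustrate the byte-order part, here is a minimal sketch packing the same three values with an explicit big-endian ('>') and little-endian ('<') prefix. Both prefixes use standard sizes, so the result is 8 bytes regardless of platform.

```python
import struct

big = struct.pack(">hhl", 1, 2, 3)      # most significant byte first
little = struct.pack("<hhl", 1, 2, 3)   # least significant byte first

print(big)                              # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(little)                           # b'\x01\x00\x02\x00\x03\x00\x00\x00'
print(struct.calcsize(">hhl"))          # 8

# Unpacking with the matching prefix recovers the original values.
print(struct.unpack("<hhl", little))    # (1, 2, 3)
```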
For example if your header format was uint, 4 byte str, uint, uint, ushort, ulong:
>>> import struct
>>> data = bytes(range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, b'\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)
To split the packet into a 32 byte header and body:
header = packet[:32]
body = packet[32:]
To further split the body into one or more entries:
entries = [body[i:i+90] for i in range(0, len(body), 90)]
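Put together, the split might look like the sketch below. The sizes 32 and 90 come from the text above; the function name split_packet and the sample packet are made up for illustration.

```python
HEADER_SIZE = 32   # header length from the text above
ENTRY_SIZE = 90    # entry length from the text above


def split_packet(packet):
    """Return (header, entries) for a packet laid out as header + N entries."""
    header = packet[:HEADER_SIZE]
    body = packet[HEADER_SIZE:]
    # Slice the body (not the whole packet) into fixed-size entries.
    entries = [body[i:i + ENTRY_SIZE] for i in range(0, len(body), ENTRY_SIZE)]
    return header, entries


# Made-up packet: a zeroed header followed by two entries.
packet = bytes(HEADER_SIZE) + b"A" * ENTRY_SIZE + b"B" * ENTRY_SIZE
header, entries = split_packet(packet)
print(len(header), len(entries))       # 32 2
```

If the body length is not a multiple of 90, the last entry will simply be shorter, so real code would probably want to check for that.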
Use bytes.replace to replace the substring with an empty string:
b = b'Today, in the digital age, any type of data, such as text, images, and audio, can be\r\ndigitized, stored indefinitely, and transmitted at high speeds. Notwithstanding these\r\nadvantages, digital data also have a downside. They are easy to access illegally, tamper\r\nwith, and copy for purposes of copyright violation.\r\nThere is therefore a need to hide secret identification inside certain types of digital\r\ndata. This information can be used to prove copyright ownership, to identify attempts\r\nto tamper with sensitive data, and to embed annotations. Storing, hiding, or embedding\r\nsecret information in all types of digital data is one of the tasks of the field of\r\nsteganography.\r\nSteganography is the art and science of data hiding. In contrast with cryptography,\r\nwhich secures data by transforming it into another, unreadable format, steganography\r\nmakes data invisible by hiding (or embedding) them in another piece of data, known\r\nalternatively as the cover, the host, or the carrier. The modified cover, including the\r\nhidden data, is referred to as a stego object. It can be stored or transmitted as a message.\r\nWe can think of cryptography as overt secret writing and of steganography as covert\r\nsecret writing.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b = b.replace(b'\x00', b'')
assert b.endswith(b'writing.')
Bytes objects behave like other sequences, which means slicing and indexing work as expected. Since the bytes you want to remove are specifically at the end and the object supports the rstrip method, the solution is the same as stripping characters from the end of a string. Just make sure to pass the characters to strip as bytes.
>>> my_bytes = b'blah\x00\x00\x00'
>>> my_bytes.rstrip(b'\x00')
b'blah'
Hello, I would like to delete all non-decodable characters from a string. I tried the following code, but it is not working:
s = "this ง, ญ, ณ, น, ม, ร, ล, ฬ is a text ���}��j)���.ߪs*i� ��zmj��q��p with something between"
line = bytes(s, 'utf-8').decode('utf-8', 'ignore')
print(s)
print(line)
These ���}� characters are read from a PDF / DOC / EXE file or something like that, and I would like to delete them from the string but keep everything else (the English text, but also the Thai characters).
How can I do this and clean the string?
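One possible reading of why the attempt has no effect: a str that already exists encodes to valid UTF-8 and decodes back unchanged, so 'ignore' never has anything to skip. Assuming the unwanted characters are U+FFFD replacement characters (an assumption, the real data may contain something else), a sketch on a shortened, made-up sample could be:

```python
# Shortened sample; \ufffd is the U+FFFD replacement character.
s = "this ง, ญ is a text \ufffd\ufffd}\ufffdj) with something between"

# Encoding and decoding a valid str is a round trip, so nothing is dropped.
round_trip = s.encode("utf-8").decode("utf-8", "ignore")
print(round_trip == s)                 # True

# Remove the replacement characters explicitly instead.
cleaned = s.replace("\ufffd", "")
print(cleaned)                         # this ง, ญ is a text }j) with something between
```

This keeps the Thai and English text, but any ordinary characters that were part of the binary junk (such as `}j)` here) are valid text and stay in the string.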
If you are dealing with a zero-padded buffer then you can use rstrip to remove trailing \x00s
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string but keeps any nulls in the middle. Not suitable for null-terminated strings that may contain random data after the terminator.
If you are dealing with a null-terminated string where the first zero indicates the end of string, but there might be other characters following it, you should use anregen's solution.
>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'
It splits the text at the first zero and returns the slice. It works with strings having no null character too.
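The difference between the two cases also applies to bytes objects. A minimal sketch (the helper names strip_padding and read_c_string are made up for illustration):

```python
def strip_padding(buf: bytes) -> bytes:
    """Remove trailing NUL padding; NULs in the middle are kept."""
    return buf.rstrip(b"\x00")


def read_c_string(buf: bytes) -> bytes:
    """Return everything before the first NUL, ignoring what follows it."""
    return buf.split(b"\x00", 1)[0]


print(strip_padding(b"Hello\x00\x00\x00\x00"))    # b'Hello'
print(strip_padding(b"Hello\x00\x24\x4e\x32"))    # b'Hello\x00$N2' (unchanged)
print(read_c_string(b"Hello\x00\x24\x4e\x32"))    # b'Hello'
print(read_c_string(b"Hello"))                    # b'Hello'
```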
>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> a.replace('\x00','')
'Hello'
You can use bytes.decode function if you really need to "get rid of b": http://docs.python.org/3.3/library/stdtypes.html#bytes.decode
But it seems from your code that you do not really need to do this, you really need to work with bytes.
The b"..." is just a python notation of byte strings, it's not really there, it only gets printed. Does it cause some real problems to you?
input:
s = 'ਅ'
a = s.encode('ascii', 'backslashreplace')
print(a)
output:
b'\\u0a05'
how do I get rid of the b'\ ? i just want it to say \u0a05
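Applied to this input, the bytes.decode suggestion would look roughly like this. The encoded value contains only ASCII, so decoding with 'ascii' is enough to turn it back into a str, which prints without the b'...' wrapper.

```python
s = "ਅ"
a = s.encode("ascii", "backslashreplace")   # bytes: b'\\u0a05'

text = a.decode("ascii")                    # str with a single backslash
print(text)                                 # \u0a05
print(len(text))                            # 6
```

The doubled backslash only appears in the repr of the bytes object; the data itself contains one backslash.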
You could use ord to extract each character's numeric value, then combine them with simple arithmetic.
>>> a = '\x02'
>>> b = '\x00'
>>> c = ord(a)*256 + ord(b)
>>> c == 0x0200
True
>>> print(hex(c))
0x200
An alternate way to do this for standard-length types is to use the struct module to convert from strings of bytes to Python types.
For example:
>>> import struct
>>> byte_arr = [b'\x02', b'\x00']
>>> byte_str = b''.join(byte_arr)
>>> byte_str
b'\x02\x00'
>>> num, = struct.unpack('>H', byte_str)
>>> num
512
In this example, the format string '>H' indicates a big-endian unsigned 2-byte integer. Other format strings can be used to specify other sizes, endianness, and signed/unsigned status.
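A short sketch of a few of those other format strings, plus int.from_bytes, which is a built-in alternative in Python 3 that is not mentioned above:

```python
import struct

raw = b"\x02\x00"

big_unsigned, = struct.unpack(">H", raw)         # big-endian unsigned
little_unsigned, = struct.unpack("<H", raw)      # little-endian unsigned
big_signed, = struct.unpack(">h", b"\xff\xfe")   # big-endian signed

print(big_unsigned)                              # 512
print(little_unsigned)                           # 2
print(big_signed)                                # -2

# int.from_bytes works for any length, not only the standard struct sizes.
via_int = int.from_bytes(raw, byteorder="big")
print(via_int)                                   # 512
```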
Say I have a massive file that would not fit into memory, would it be possible to truncate the file by removing the last byte?
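One way this could be approached, as a sketch only: file objects have a truncate() method that resizes the file to a given length without reading its contents into memory. The helper name drop_last_byte and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def drop_last_byte(path):
    """Shrink the file at `path` by one byte without reading its contents."""
    size = os.path.getsize(path)
    if size == 0:
        return                      # nothing to remove
    with open(path, "r+b") as f:    # read/write, binary, no truncation on open
        f.truncate(size - 1)


# Small demo on a temporary file standing in for the massive one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdef")
    demo_path = tmp.name

drop_last_byte(demo_path)
with open(demo_path, "rb") as f:
    remaining = f.read()
os.remove(demo_path)

print(remaining)                    # b'abcde'
```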