remove bytes from string python

How to remove a range of bytes from a bytes object in python?

stackoverflow.com › questions › 18563018 › how-to-remove-a-range-of-bytes-from-a-bytes-object-in-python

bytes doesn't support item deletion because it's immutable. To "modify" strings and string-like objects you need to take a copy, so to remove olddata[start:end] do:

newdata = olddata[:start] + olddata[end:]

Of course that's a fair amount of copying, not all of which is necessary, so you might prefer to rework your code a bit for performance. You could use bytearray (which is mutable). Or perhaps you could find a way to work through the buffer (using an index or iterating over its elements), instead of needing to shorten it after each step.

Answer from Steve Jessop on Stack Overflow
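Here is a minimal sketch of both approaches from the answer. The sample data and the `start`/`end` values are made up for illustration. Slicing builds a new `bytes` object. A `bytearray` can delete the range in place with `del`.

```python
olddata = b"HEADER:payload:TRAILER"
start, end = 0, 7  # illustrative range: the first 7 bytes, b"HEADER:"

# bytes is immutable: build a new object from the two pieces you keep
newdata = olddata[:start] + olddata[end:]
print(newdata)  # b'payload:TRAILER'

# bytearray is mutable: delete the slice in place, no reassignment needed
buf = bytearray(olddata)
del buf[start:end]
print(bytes(buf))  # b'payload:TRAILER'
```

Both versions still move the remaining bytes. The `bytearray` version avoids creating a new object on every deletion.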


Second answer (5 votes):

I think I found the proper way, just looking at it from another perspective:

self.data = self.data[Index:]

This just copies the part I need back into the same variable.
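Re-slicing with `self.data = self.data[Index:]` after every step copies the remaining buffer each time. The top answer also suggests working through the buffer with an index instead. That approach can be sketched with a `memoryview`, which slices without copying. The record format here (a 1-byte length followed by that many payload bytes) and the function name are hypothetical and do not come from the question.

```python
def read_records(data: bytes) -> list[bytes]:
    """Split data made of hypothetical records: 1-byte length + payload."""
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    records = []
    pos = 0                  # advance an index instead of shortening the buffer
    while pos < len(view):
        length = view[pos]   # indexing a memoryview of bytes gives an int
        pos += 1
        records.append(view[pos:pos + length].tobytes())  # copy only the payload
        pos += length
    return records

print(read_records(b"\x03abc\x02de"))  # [b'abc', b'de']
```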

Python.org

discuss.python.org › python help

Strip byte string and take only importante values - Python Help - Discussions on Python.org

July 7, 2023 - Hello all…good day…please help on how to strip byte string as below: input : b'\x081F304984\x0843501' output : 1F304984 thanks a lot
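The post gives only one example. Assuming from it that `\x08` acts as a field separator, splitting on that byte and decoding the second field produces the requested output. This is a guess at the format. If the separator or the field position differs, adjust accordingly.

```python
raw = b'\x081F304984\x0843501'

# split on the \x08 byte: [b'', b'1F304984', b'43501']
fields = raw.split(b'\x08')
value = fields[1].decode('ascii')  # bytes -> str
print(value)  # 1F304984
```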

Discussions

python - How to remove some bytes from a byte string? - Stack Overflow

I am trying to remove a byte (\x00\x81) from a byte string sByte. sByte = b'\x00\x81308 921 q53 246 133 137 022 1 0 1 1 1 130 C13 330 0000000199 04002201\n' I am expecting to have as a result the …

How to remove first 4 bytes from s string in python - Stack Overflow

I got a special packet in string format, which has a 32-byte header, and the body contains one or more entries, each consisting of 90 bytes. I want to process this string using Python. Can I just read …

string - Delete some specific content from byte in python 3 - Stack Overflow


Delete non-decodeable chars in string?


Stack Overflow

stackoverflow.com › questions › 9560759 › python-3-how-to-make-strip-work-for-bytes

python 3: how to make strip() work for bytes - Stack Overflow

Top answer (20 votes):

There are two issues here, one of which is the actual issue, the other is confusing you, but not an actual issue. Firstly:

Your string is a bytes object, i.e. a string of 8-bit bytes. Python 3 handles this differently from text, which is Unicode. Where do you get the string from? Since you want to treat it as text, you should probably convert it to a str object, which is used to handle text. This is typically done with the .decode() method, i.e.:

somestring.decode('UTF-8')

Although calling str() also works:

str(somestring, 'UTF8')

(Note that your encoding might be something other than UTF-8.)

However, this is not your actual question. Your actual question is how to strip a bytes string. And the answer is that you do that the same way as you strip a text string:

somestring.strip()

There is no strip() builtin in either Python 2 or Python 3. There is a strip-function in the string module in Python 2:

from string import strip

But it hasn't been good practice to use that since strings got a strip() method, which is like ten years or so now. So in Python 3 it is gone.

Second answer (7 votes):

>>> b'foo '.strip()
b'foo'

Works just fine.

If what you're dealing with is text, though, you probably should just have an actual str object, not a bytes object.
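The following short demonstration, using made-up sample data, covers the points from both answers. `bytes.strip()` works like `str.strip()`, but any explicit chars argument must itself be bytes. Decoding first gives a real `str` if the data is text.

```python
raw = b'  hello\r\n'

print(raw.strip())                  # b'hello' (default: ASCII whitespace)
print(raw.strip(b' \r\n'))          # b'hello' (explicit set of bytes to strip)
print(raw.decode('utf-8').strip())  # hello (a str, if you really have text)

try:
    raw.strip(' ')                  # a str argument is rejected
except TypeError as exc:
    print('TypeError:', exc)
```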

Educative

educative.io › answers › what-is-the-bytes-removeprefix-method-in-python

What is the bytes removeprefix() method in Python?

New in Python 3.9. If the binary data starts with the given prefix, bytes.removeprefix() returns bytes[len(prefix):]; otherwise it returns a copy of the original binary data. The article's examples use 'Test' as the prefix.
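Here is a small sketch with made-up data (Python 3.9+). It shows the two outcomes of `removeprefix()`. It also contrasts `removeprefix()` with `lstrip()`, which treats its argument as a set of bytes rather than as a prefix.

```python
data = b'TestData'

print(data.removeprefix(b'Test'))   # b'Data'  (prefix present: removed once)
print(data.removeprefix(b'Nope'))   # b'TestData'  (prefix absent: unchanged)

# lstrip() strips any of the bytes T, e, s, t from the left, in any order
print(b'TestTeam'.lstrip(b'Test'))  # b'am'
```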

Stack Overflow

stackoverflow.com › questions › 73225995 › how-to-remove-some-bytes-from-a-byte-string

python - How to remove some bytes from a byte string? - Stack Overflow

Top answer (3 votes):

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte[2:]
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

See also https://appdividend.com/2022/07/09/python-slice-notation/

The code snippet returns sByte from and including the third byte until the end.

To store the result back in the variable, reassign it:

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte = sByte[2:]
>>> sByte
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

Second answer (2 votes):

bytes.replace doesn't work in place; it returns a modified copy of the bytes object. You can use sByte = sByte.replace(b'\x00\x81', b'') (or bytes.removeprefix if the bytes always occur at the start). Depending on your circumstances, you can also set the errors parameter of the decode method to 'ignore': sByte = sByte.decode(encoding='utf-8', errors='ignore'). Note that this returns a str, and it only drops bytes that are invalid UTF-8: \x81 is removed, but \x00 is a valid (NUL) character and is kept.
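The sketch below compares the three options on a shortened version of the question's data. The shortened sample is an illustration only. Note that the `decode()` route keeps the leading NUL byte, because `\x00` is valid UTF-8.

```python
sByte = b'\x00\x81308 921 q53\n'  # shortened version of the question's data

# replace() returns a new object; it removes every occurrence, anywhere
print(sByte.replace(b'\x00\x81', b''))    # b'308 921 q53\n'

# removeprefix() (Python 3.9+) only removes the sequence at the very start
print(sByte.removeprefix(b'\x00\x81'))    # b'308 921 q53\n'

# decode with errors='ignore' drops only bytes that are invalid UTF-8:
# \x81 is dropped, but \x00 is a valid (NUL) character and survives
print(repr(sByte.decode('utf-8', errors='ignore')))  # '\x00308 921 q53\n'
```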

Stack Overflow

stackoverflow.com › questions › 23693594 › how-to-remove-first-4-bytes-from-s-string-in-python

How to remove first 4 bytes from s string in python - Stack Overflow

Top answer (7 votes):

You can use io.BytesIO to read a bytes object like a file (the Python 2 equivalent was StringIO.StringIO):

>>> import io
>>> s = b'Hello, World!'
>>> sio = io.BytesIO(s)
>>> sio.read(6)
b'Hello,'
>>> sio.read()
b' World!'

I would also suggest you take a look at the struct module for help with parsing binary data

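>>> from struct import pack, unpack
>>> pack('>hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)

You define the format of the data using format strings, so '>hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes), in big-endian byte order. The first character of the format string sets the byte order (endianness). It also determines whether standard sizes or the platform's native sizes and alignment are used. Without it, the packed result can differ between platforms.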

For example, suppose your header format was uint, 4-byte str, uint, uint, ushort, ulong. This format uses native sizes and alignment. The output below is from a 64-bit little-endian platform such as x86-64 Linux, where ulong is 8 bytes:

>>> import struct
>>> data = bytes(range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, b'\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)

Second answer (5 votes):

To split the packet into a 32-byte header and body:

header = packet[:32]
body = packet[32:]

To further split the body into one or more 90-byte entries:

entries = [body[i:i+90] for i in range(0, len(body), 90)]
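This sketch combines the slicing with a length check. It assumes the layout described in the question: a 32-byte header followed by one or more whole 90-byte entries. The function name, the constants, and the choice to raise `ValueError` are illustrative.

```python
HEADER_SIZE = 32
ENTRY_SIZE = 90

def split_packet(packet: bytes) -> tuple[bytes, list[bytes]]:
    """Split a packet into its header and a list of fixed-size entries."""
    header, body = packet[:HEADER_SIZE], packet[HEADER_SIZE:]
    if len(header) < HEADER_SIZE or not body or len(body) % ENTRY_SIZE:
        raise ValueError("expected a 32-byte header plus whole 90-byte entries")
    entries = [body[i:i + ENTRY_SIZE] for i in range(0, len(body), ENTRY_SIZE)]
    return header, entries

packet = bytes(32) + b'A' * 90 + b'B' * 90   # fake packet with two entries
header, entries = split_packet(packet)
print(len(header), len(entries))  # 32 2
```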

Stack Overflow

stackoverflow.com › questions › 51745600 › delete-some-specific-content-from-byte-in-python-3

string - Delete some specific content from byte in python 3 - Stack Overflow

Top answer (3 votes):

Use bytes.replace to replace the substring with an empty bytes object:

b = b'Today, in the digital age, any type of data, such as text, images, and audio, can be\r\ndigitized, stored indefinitely, and transmitted at high speeds. Notwithstanding these\r\nadvantages, digital data also have a downside. They are easy to access illegally, tamper\r\nwith, and copy for purposes of copyright violation.\r\nThere is therefore a need to hide secret identification inside certain types of digital\r\ndata. This information can be used to prove copyright ownership, to identify attempts\r\nto tamper with sensitive data, and to embed annotations. Storing, hiding, or embedding\r\nsecret information in all types of digital data is one of the tasks of the field of\r\nsteganography.\r\nSteganography is the art and science of data hiding. In contrast with cryptography,\r\nwhich secures data by transforming it into another, unreadable format, steganography\r\nmakes data invisible by hiding (or embedding) them in another piece of data, known\r\nalternatively as the cover, the host, or the carrier. The modified cover, including the\r\nhidden data, is referred to as a stego object. It can be stored or transmitted as a message.\r\nWe can think of cryptography as overt secret writing and of steganography as covert\r\nsecret writing.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

b = b.replace(b'\x00', b'')
assert b.endswith(b'writing.')

Second answer (2 votes):

Bytes objects behave like many other sequences, which means slicing and indexing work as expected. The bytes you want to remove are specifically at the end, and bytes objects support rstrip(). So the solution is the same as stripping characters from the end of a string. Just make sure to pass the characters to strip as bytes.

>>> my_bytes = b'blah\x00\x00\x00'
>>> my_bytes.rstrip(b'\x00')
b'blah'
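Both answers give the same result on the question's data, because all the null bytes sit at the end. They differ when a null byte also appears in the middle, as this small made-up example shows:

```python
data = b'ab\x00cd\x00\x00\x00'

print(data.rstrip(b'\x00'))        # b'ab\x00cd'  (only trailing nulls removed)
print(data.replace(b'\x00', b''))  # b'abcd'      (every null removed)
```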

Educative

educative.io › answers › what-is-the-bytes-removesuffix-method-in-python

What is the bytes removesuffix() method in Python?

If the binary data ends with the suffix and that suffix is not empty, bytes.removesuffix() returns bytes[:-len(suffix)]; otherwise it returns a copy of the original binary data. bytes[:-len(suffix)] means it returns the data from the start up to, but not including, the suffix.
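Here is a sketch with a made-up value (Python 3.9+). It also shows why the suffix must be non-empty. With an empty suffix, `-len(suffix)` is `0`, and the slice `[:-0]` returns nothing instead of the whole object.

```python
data = b'report.txt'

print(data.removesuffix(b'.txt'))  # b'report'
print(data.removesuffix(b'.csv'))  # b'report.txt' (no match: unchanged)

# Why the suffix must be non-empty: slicing with -0 would return nothing
print(data[:-len(b'')])            # b''
print(data.removesuffix(b''))      # b'report.txt'
```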


reddit.com › r/learnpython › delete non-decodeable chars in string?

r/learnpython on Reddit: Delete non-decodeable chars in string?

March 7, 2024 -

Hello, I would like to delete all non-decodable characters from a string. I tried it with the following code, but this is not working:

s = "this  ง, ญ, ณ, น, ม, ร, ล, ฬ is a text ���}��j)���.ߪs*i� ��zmj��q��p with something between"
line = bytes(s, 'utf-8').decode('utf-8', 'ignore')
print(s)
print(line)

These ��}� characters are read from a PDF/DOC/EXE file or something like that. I would like to delete them from the string but keep everything else in it (the English, but also the Thai characters).

How can I do this and clean the string?

Top answer (2 votes):

There is no way to know if a byte read is actually supposed to be a character or just coincidentally looks like an encoded code point. Your s is already a Unicode str because you defined it as a string literal. Encoding it to UTF-8 and decoding it again therefore cannot fail and removes nothing. If you actually read a bytes object from a file, the situation is different. Still, so many byte sequences are valid UTF-8-encoded characters that you are unlikely to filter much. Just reading a PDF's bytes and treating them as text is not reasonable.

Second answer (2 votes):

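If the idea is to filter out all non-Latin characters, encode s as Latin-1 and decode it the same way: s.encode('latin-1', 'ignore').decode('latin-1'). Decoding those bytes as UTF-8 instead can fail, because single bytes above 0x7F are not valid UTF-8 on their own. Note that this also removes the Thai characters. Unicode is a superset of Latin-1 and contains lots of perfectly decodable, but potentially unprintable, symbols.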
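The sketch below uses made-up data to illustrate the two cases the answers distinguish. If you still have the raw bytes, `errors='ignore'` drops sequences that are not valid UTF-8. If you already have a `str` whose garbage shows up as the replacement character U+FFFD, round-tripping through UTF-8 changes nothing. (Whether the question's `�` characters really are U+FFFD is an assumption.) In that case you remove the character explicitly. Other stray characters, such as the `}`, cannot be told apart from real text.

```python
# Case 1: raw bytes with invalid UTF-8 sequences mixed in
raw = 'text ง '.encode('utf-8') + b'\xff\xfe' + b' more'
print(raw.decode('utf-8', errors='ignore'))  # text ง  more

# Case 2: a str that already contains U+FFFD replacement characters
s = 'text \ufffd\ufffd} ง more'
print(s.encode('utf-8').decode('utf-8', 'ignore') == s)  # True: nothing removed
print(s.replace('\ufffd', ''))               # text } ง more
```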

Python Forum

python-forum.io › thread-38829.html

extract only text strip byte array

November 29, 2022 - First off, awesome forum. You guys have been super helpful to a noob... without guilting me into 'reading the manual'. This is how I learn, and I'm glad you are helping me learn.... I have various byte strings with text in them.. the bytes are alway...

Stack Overflow

stackoverflow.com › questions › 37016946 › remove-b-character-do-in-front-of-a-string-literal-in-python-3

Remove 'b' character do in front of a string literal in Python 3 - Stack Overflow

Top answer (282 votes):

This should do the trick:

pw_bytes.decode("utf-8")

Second answer (36 votes):

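Here you go:

with open('test.txt', 'rb') as f:
    ch = f.read(1)     # read one byte (a bytes object)
# one byte may be only part of a multi-byte UTF-8 character
ch = str(ch, 'utf-8')  # decode to str, same as ch.decode('utf-8')
print(ch)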

Stack Overflow

stackoverflow.com › questions › 38883476 › how-to-remove-those-x00-x00 › 38883536

python - How to remove those "\x00\x00" - Stack Overflow

Top answer (79 votes):

If you are dealing with a zero-padded buffer then you can use rstrip to remove trailing \x00s

>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'

It removes all \x00 characters at the end of the string but keeps any nulls in the middle. Not suitable for null-terminated strings that may contain random data after the terminator.

If you are dealing with a null-terminated string where the first zero indicates the end of string, but there might be other characters following it, you should use anregen's solution.

>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'

It splits the text at the first zero and returns the slice. It works with strings having no null character too.
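Both techniques from this answer apply unchanged to bytes objects, as long as the separator is also bytes. Consider a fixed-width field written by `struct`, which pads `s` fields with null bytes. Either technique recovers the original value. The 8-byte field width is an illustrative choice.

```python
import struct

# Pack a short name into a fixed 8-byte field; struct pads it with \x00
field, = struct.unpack('8s', struct.pack('8s', b'Hello'))
print(field)                       # b'Hello\x00\x00\x00'

print(field.rstrip(b'\x00'))       # b'Hello'
print(field.split(b'\x00', 1)[0])  # b'Hello'
```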

EDIT:
Explained rstrip in more detail and provided a correct use case.
Included alternative solution.

Second answer (71 votes):

>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' 
>>> a.replace('\x00','')
'Hello'

Stack Overflow

stackoverflow.com › questions › 17013089 › python-get-rid-of-bytes-b › 17013127

Python get rid of bytes b' ' - Stack Overflow

Top answer (13 votes):

You can use bytes.decode function if you really need to "get rid of b": http://docs.python.org/3.3/library/stdtypes.html#bytes.decode

But it seems from your code that you do not really need to do this; you really need to work with bytes.

Second answer (4 votes):

The b"..." is just Python's notation for byte strings. It is not part of the data; it only appears when the object is printed. Does it cause you any real problems?

Bobby Hadz

bobbyhadz.com › blog › python-remove-b-prefix-from-string

How to remove the 'b' prefix from a String in Python | bobbyhadz

Use the `bytes.decode()` method to remove the `b` prefix from a bytes object by converting it to a string.
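This example shows why `decode()` is the right tool. The `b'...'` is only part of the printed representation. Calling `str()` on a bytes object without an encoding returns that representation, with the prefix and quotes included. The sample value is made up.

```python
data = b'secret'

print(data)                  # b'secret'  (repr of a bytes object)
print(data.decode('utf-8'))  # secret     (a real str)
print(str(data, 'utf-8'))    # secret     (same as decode)
print(str(data))             # b'secret'  (no encoding: you get the repr text)
```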

reddit.com › r/learnpython › how do i get rid of the 'b'

r/learnpython on Reddit: How do I get rid of the 'b'

January 21, 2024 -

input:

s = 'ਅ'

a = s.encode('ascii', 'backslashreplace')

print(a)

output:

b'\\u0a05'

How do I get rid of the b'\ ? I just want it to say \u0a05.

Top answer (22 votes):

print(a.decode('utf-8'))

Second answer (16 votes):

You don't. That's a byte string. Alternatively, you can decode it back to a str, as suggested.
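The `backslashreplace` result contains only ASCII, so decoding it with `'ascii'` (or `'utf-8'`) turns it back into a `str`. The doubled backslash in `b'\\u0a05'` is only how the repr displays a single backslash character.

```python
s = 'ਅ'
a = s.encode('ascii', 'backslashreplace')  # repr: b'\\u0a05'

text = a.decode('ascii')  # back to str; the backslash is one character
print(text)               # \u0a05
print(len(text))          # 6
```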

Stack Overflow

stackoverflow.com › questions › 53690583 › python-remove-stray-bytes-from-string

python: remove stray bytes from string - Stack Overflow

Top answer (2 votes):

You can use regex:

import re

s = '"trackingId":"f<0x85>9\u0004+L<0x9b><0x91>\u001A<0x87>&\u0013i+T"},{"pendingInvitation":false'
print(s)
print(re.sub(r'<0x\w{2}>', '',s))

with output:

"trackingId":"f<0x85>9+L<0x9b><0x91><0x87>&i+T"},{"pendingInvitation":false
"trackingId":"f9+L&i+T"},{"pendingInvitation":false

I have searched for the pattern <0x__>, where __ is any two word characters (letters, digits, or underscore).
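Here is a tighter variant. It assumes the placeholders always contain two hexadecimal digits. `\w` also matches letters beyond `f` and the underscore, so `[0-9A-Fa-f]` is stricter. The second step is an addition that the answer does not include: it removes ASCII control characters such as `\u0004`. The first output line above likely looks free of them only because the terminal does not display them.

```python
import re

s = '"trackingId":"f<0x85>9\u0004+L<0x9b><0x91>\u001A<0x87>&\u0013i+T"}'

no_placeholders = re.sub(r'<0x[0-9A-Fa-f]{2}>', '', s)   # drop <0x..> tokens
clean = re.sub(r'[\x00-\x1f\x7f]', '', no_placeholders)  # drop control chars
print(clean)  # "trackingId":"f9+L&i+T"}
```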

Stack Overflow

stackoverflow.com › questions › 28862954 › remove-x-from-bytes

python - Remove '\x' from bytes - Stack Overflow

Top answer (3 votes):

You could use ord to extract each character's numeric value, then combine them with simple arithmetic.

>>> a = '\x02'
>>> b = '\x00'
>>> c = ord(a)*256 + ord(b)
>>> c == 0x0200
True
>>> print(hex(c))
0x200

Second answer (3 votes):

An alternate way to do this for standard-length types is to use the struct module to convert from strings of bytes to Python types.

For example:

>>> import struct
>>> byte_arr = [b'\x02', b'\x00']
>>> byte_str = b''.join(byte_arr)
>>> byte_str
b'\x02\x00'
>>> num, = struct.unpack('>H', byte_str)
>>> num
512

In this example, the format string '>H' indicates a big-endian unsigned 2-byte integer. Other format strings can be used to specify other sizes, endianness, and signed/unsigned status.
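In Python 3, `int.from_bytes()` performs the same conversion without a format string, with the byte order passed explicitly. Indexing a bytes object also yields ints directly, so the `ord()` arithmetic from the first answer works without `ord()`. The input is the two bytes from the question.

```python
data = b'\x02\x00'

print(int.from_bytes(data, 'big'))     # 512  (same as struct '>H')
print(int.from_bytes(data, 'little'))  # 2
print(hex(data[0] * 256 + data[1]))    # 0x200
```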

Python documentation

docs.python.org › 3 › library › stdtypes.html

Built-in Types — Python 3.14.4 documentation

The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped. ... The outermost leading and trailing chars argument values are stripped from the string.
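The same rule applies to `bytes.strip()`, whose argument is a set of byte values. This sketch uses the documentation's `www.example.com` sample. It contrasts `strip()` with `removeprefix()` (Python 3.9+), which removes one exact leading sequence.

```python
url = b'www.example.com'

# strip() removes any of the given bytes from both ends, in any order
print(url.strip(b'cmowz.'))       # b'example'

# removeprefix() removes one exact leading sequence
print(url.removeprefix(b'www.'))  # b'example.com'
```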

reddit.com › r/learnpython › is it possible to remove the last character (byte) of a file without opening it?

r/learnpython on Reddit: Is it possible to remove the last character (byte) of a file without opening it?

May 12, 2023 -

Say I have a massive file that would not fit into memory, would it be possible to truncate the file by removing the last byte?

Top answer (6 votes):

Without opening, no (as far as I know). Without reading it all into RAM, yes. Look into the truncate method. I haven't done this before, so test it on a dummy file, but you can probably open it in r+ mode, seek to the end minus one byte, and truncate.

Second answer (2 votes):

Something like this might work:

with open("my_file.txt", "r+b") as f:
    # Move the file pointer to the end of the file.
    f.seek(0, 2)
    # Truncate the file to the previous byte.
    f.truncate(f.tell() - 1)
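Here is an alternative sketch using `os.truncate()`, which takes a path. No explicit `open()` is needed in Python code, although the OS still opens the file internally. The helper name `drop_last_bytes` is hypothetical. Its `max(..., 0)` guard avoids a negative size on an empty file, a case the seek-based version above does not handle. The temporary file only exists for the demonstration.

```python
import os
import tempfile

def drop_last_bytes(path: str, n: int = 1) -> None:
    """Remove the last n bytes of a file without reading it into memory."""
    size = os.path.getsize(path)
    os.truncate(path, max(size - n, 0))  # never truncate to a negative size

# Demonstration on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello\n')
drop_last_bytes(tmp.name)
with open(tmp.name, 'rb') as f:
    result = f.read()
print(result)  # b'hello'
os.remove(tmp.name)
```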