For simplicity's sake, let's consider writing instead of reading for now.
So when you use open(), say like this:
with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
After executing that, a file called test.dat will be created, containing "Hello World" three times. The data won't be kept in memory after it's written to the file (unless it's still referenced by a name).
Now when you consider io.BytesIO() instead:
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")
Instead of writing the contents to a file, this writes them to an in-memory buffer, in other words a chunk of RAM. Essentially, the following would be equivalent:
buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"
To mirror the with statement in the example, there would also be a del buffer at the end.
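A quick way to check that equivalence is to build the same data both ways and compare. This is a minimal sketch; the only extra call is getvalue(), which returns everything written to the BytesIO so far.

```python
import io

# Build the data with an in-memory buffer
with io.BytesIO() as f:
    for _ in range(3):
        f.write(b"Hello World")
    from_buffer = f.getvalue()  # grab the contents before the buffer is closed

# Build the same data by concatenation
buffer = b""
for _ in range(3):
    buffer += b"Hello World"

print(from_buffer == buffer)  # True
print(from_buffer)            # b'Hello WorldHello WorldHello World'
```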
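The key difference here is performance. Python's bytes objects are immutable, so each += creates a new object and copies everything accumulated so far into it, which gets slower the bigger the result grows. io.BytesIO instead appends into an internal buffer that it resizes as needed, which makes it much faster than concatenating all the b"Hello World" chunks one by one.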
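Because bytes are immutable, += never extends the existing object; it builds a new one. A small sketch that makes this visible by holding on to the old reference, compared with a BytesIO where every write lands in the same buffer object:

```python
import io

data = b"Hello"
original = data
data += b" World"        # builds a brand-new bytes object
print(data is original)  # False: the old object was left untouched
print(original)          # b'Hello'

buf = io.BytesIO()
buf.write(b"Hello")
buf.write(b" World")     # appended to the same BytesIO object
print(buf.getvalue())    # b'Hello World'
```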
Just to prove it, here's a small benchmark and the timings I got:
- Concat: 1.3529 seconds
- BytesIO: 0.0090 seconds
import io
import time

# Concatenation: every += builds a new bytes object
begin = time.perf_counter()
buffer = b""
for _ in range(50000):
    buffer += b"Hello World"
seconds = time.perf_counter() - begin
print("Concat:", seconds)

# BytesIO: every write appends to the same in-memory buffer
begin = time.perf_counter()
buffer = io.BytesIO()
for _ in range(50000):
    buffer.write(b"Hello World")
seconds = time.perf_counter() - begin
print("BytesIO:", seconds)
Besides the performance gain, BytesIO has the advantage that it can be used in place of a file object. Say you have a function that expects a file object to write to: you can give it the in-memory buffer instead of a real file.
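For example, take a hypothetical function write_greeting() (not from any library) that only calls .write() on whatever file object it receives. The same function can target an in-memory buffer or a real file; the real file here goes into a temporary directory purely so the demo cleans up after itself.

```python
import io
import os
import tempfile


def write_greeting(f):
    """Hypothetical helper: writes to any binary file-like object."""
    f.write(b"Hello World\n")


# Target 1: an in-memory buffer
buffer = io.BytesIO()
write_greeting(buffer)
print(buffer.getvalue())  # b'Hello World\n'

# Target 2: a real file on disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dat")
    with open(path, "wb") as f:
        write_greeting(f)
    with open(path, "rb") as f:
        on_disk = f.read()

print(on_disk == buffer.getvalue())  # True
```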
The difference is that open("myfile.jpg", "rb") returns a file object backed by myfile.jpg on disk (the contents are only read when you call read() on it), whereas BytesIO, again, is just a buffer holding some data in memory.
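Both kinds of object are read the same way, through read(), seek() and tell(); only the place where the bytes live differs. A small sketch with a BytesIO created from some initial data:

```python
import io

f = io.BytesIO(b"Hello World")

print(f.read(5))  # b'Hello'  (reads from the current position)
print(f.tell())   # 5
print(f.read())   # b' World' (the rest of the buffer)
f.seek(0)         # rewind, exactly as with a file
print(f.read())   # b'Hello World'
```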
Since BytesIO is just a buffer, if you want to write its contents to a file later, you'd have to do:
import io

buffer = io.BytesIO()
# ...
with open("test.dat", "wb") as f:
    f.write(buffer.getvalue())
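A runnable sketch of that step, plus one alternative: because BytesIO is file-like, you can also rewind it and stream it into the file with shutil.copyfileobj(). The temporary directory and file names are just for the demo.

```python
import io
import os
import shutil
import tempfile

buffer = io.BytesIO()
for _ in range(3):
    buffer.write(b"Hello World")

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: write the buffer's full contents in one call
    path1 = os.path.join(tmp, "test1.dat")
    with open(path1, "wb") as f:
        f.write(buffer.getvalue())

    # Option 2: treat the buffer as a file and copy it across in chunks
    path2 = os.path.join(tmp, "test2.dat")
    buffer.seek(0)  # copyfileobj reads from the current position
    with open(path2, "wb") as f:
        shutil.copyfileobj(buffer, f)

    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        contents1 = f1.read()
        contents2 = f2.read()

print(contents1 == contents2)  # True
```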
Also, you didn't mention a version; I'm using Python 3. In the examples, I'm using the with statement instead of calling f.close().
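One practical consequence of using with on a BytesIO: leaving the block closes the buffer, just as close() would, and its memory is released, so calling getvalue() afterwards raises ValueError. A minimal sketch showing why the contents should be grabbed inside the block:

```python
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    data = f.getvalue()  # still open here

print(data)      # b'Hello World'
print(f.closed)  # True

raised = False
try:
    f.getvalue()  # the buffer was closed when the with block ended
except ValueError:
    raised = True
print(raised)    # True
```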
Using open opens a file on your hard drive. Depending on the mode you use, you can read from or write to the disk (or both).
A BytesIO object isn't associated with any real file on the disk. It's just a chunk of memory that behaves like a file does. It has the same API as a file object returned from open (with mode r+b, allowing reading and writing of binary data).
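A sketch of that "r+b"-like behaviour: you can write, seek back and read on the same object, even overwrite in place. Operations that need a real OS-level file, such as fileno(), raise io.UnsupportedOperation instead.

```python
import io

f = io.BytesIO()
f.write(b"Hello World")
f.seek(0)
print(f.read(5))     # b'Hello'

f.seek(6)
f.write(b"There")    # overwrites "World" in place, as "r+b" would
print(f.getvalue())  # b'Hello There'

has_fd = True
try:
    f.fileno()
except io.UnsupportedOperation:
    has_fd = False   # no OS file descriptor behind a BytesIO
print(has_fd)        # False
```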
BytesIO (and its close sibling StringIO, which is always in text mode) can be useful when you need to pass data to or from an API that expects to be given a file object, but where you'd prefer to pass the data directly. You can load your input data into the BytesIO before giving it to the library. After it returns, you can get any data the library wrote to the file from the BytesIO using the getvalue() method. (Usually you'd only need to do one of those, of course.)
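A standard-library example of both directions: gzip.GzipFile takes a fileobj argument, so a BytesIO can serve as the output file it compresses into and as the input file it decompresses from, without touching the disk. Closing a GzipFile does not close a fileobj that was passed in, which is why getvalue() still works after the first with block.

```python
import gzip
import io

original = b"Hello World" * 100

# Output direction: let GzipFile write into a BytesIO, then collect the bytes
out = io.BytesIO()
with gzip.GzipFile(fileobj=out, mode="wb") as gz:
    gz.write(original)
compressed = out.getvalue()

# Input direction: wrap existing bytes in a BytesIO and hand it to GzipFile
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
    restored = gz.read()

print(len(original), len(compressed))
print(restored == original)  # True
```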
Hello all,
I'm trying to wrap my head around the practical differences between:
<class 'bytes'> and <class '_io.BytesIO'>.
I read through the documentation:
https://docs.python.org/3/library/io.html?highlight=bytesio#binary-i-o
Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. This category of streams can be used for all kinds of non-text data, and also when manual control over the handling of text data is desired.
It provides some examples:
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
and
f = io.BytesIO(b"some initial binary data: \x00\x01")
So I read all this, but so what? Why would you use the io.BytesIO data type over a standard bytes data type?
EDIT: Let me provide some additional context that I just discovered after reading the documentation on lxml.
https://lxml.de/parsing.html#parsing-html
I'm using the requests library and parsing the results with lxml. Here is the example code:
from io import BytesIO
from lxml import etree
#* etree - https://lxml.de/parsing.html
#? etree stands for element tree
import requests
#? Need to know concepts
#? What are bytes
#? HTTP status codes
#? HTTP methods (GET, POST, PUT, DELETE)
#? bytes - https://docs.python.org/3/library/stdtypes.html?highlight=bytes#bytes-objects

url = 'http://localhost'
#! The URL https://nostarch.com/ doesn't seem to work
resp = requests.get(url=url)
html_bytes = resp.content  # raw response body as bytes

parser = etree.HTMLParser()
# etree.parse() expects a filename, URL or file-like object, so wrap the bytes in BytesIO
content = etree.parse(BytesIO(html_bytes), parser=parser)

print(type(html_bytes))
print(type(BytesIO(html_bytes)))

# './/a' finds <a> elements anywhere below the root
for link in content.findall('.//a'):
    print(f"{link.get('href')} -> {link.text}")

Kind regards
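The lxml snippet above wraps resp.content in BytesIO because etree.parse() expects a filename, URL or file-like object rather than raw bytes. The same pattern can be tried offline with the standard library's xml.etree.ElementTree, swapped in here so no network access or lxml install is needed (note it parses well-formed XML, not arbitrary HTML). It also has fromstring(), which takes the bytes directly.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for resp.content: a small, well-formed document as bytes
page = b'<html><body><a href="/a">First</a><a href="/b">Second</a></body></html>'

# parse() wants a filename or file object, so wrap the bytes in BytesIO
tree = ET.parse(io.BytesIO(page))
links = [(a.get("href"), a.text) for a in tree.findall(".//a")]
for href, text in links:
    print(f"{href} -> {text}")

# fromstring() accepts the bytes directly and returns the root element
root = ET.fromstring(page)
print(len(root.findall(".//a")))  # 2
```

With lxml itself, etree.fromstring(html_bytes, parser) should likewise accept the bytes directly; it returns the root element rather than a tree, which also supports findall('.//a').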