This question is old, but it looks like nobody has answered this sufficiently.
Simply:
- obj.getbuffer() creates a memoryview object.
- Every time you write, or if there is a memoryview of obj present, obj.getvalue() will need to create a new, complete value.
- If you have not written (since creation or since the last obj.getvalue() call) and there is no memoryview present, obj.getvalue() is the fastest method of access, and requires no copies.
That being the case:
- When creating another io.BytesIO, use obj.getvalue()
- For random-access reading and writing, DEFINITELY use obj.getbuffer()
- Avoid interleaving reading and writing frequently. If you must, then DEFINITELY use obj.getbuffer(), unless your file is tiny.
- Avoid using obj.getvalue() while a buffer is lying around.
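As a minimal sketch of the random-access case, the snippet below edits and reads a BytesIO in place through getbuffer(). The variable names (buf, view, middle, result, position) are only illustrative, and the view is released by the with block so that no buffer is left lying around afterwards.

```python
import io

buf = io.BytesIO(b"hello world")

# memoryview works as a context manager; leaving the block releases the view
with buf.getbuffer() as view:
    view[0:5] = b"HELLO"        # in-place write; the slice must keep its length
    middle = bytes(view[4:7])   # random-access read, copies only these 3 bytes

result = buf.getvalue()         # no view is alive any more at this point
position = buf.tell()           # the stream position was never touched

print(result, middle, position)
```

Reading and writing through the view never moves the stream position, which is why it suits random access better than repeated seek() and read() calls.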
Here, we see that it's all fast, and all well and good, if no buffer is lying around:
# time getvalue()
>>> import io
>>> i = io.BytesIO(b'f' * 1_000_000)
>>> %timeit i.getvalue()
34.6 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer()
>>> %timeit i.getbuffer()
118 ns ± 0.495 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer() and getvalue() together
>>> %timeit i.getbuffer(); i.getvalue()
173 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Everything is fine, and working about like you'd expect. But let's see what happens when there's a buffer just lying around:
>>> x = i.getbuffer()
>>> %timeit i.getvalue()
33 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Notice that we're no longer measuring in nanoseconds; we're measuring in microseconds. That's roughly a thousand times slower. If you del x, we're back to being fast. This is all because, while a memoryview exists, Python has to account for the possibility that the BytesIO may have been written to through it. So, to give a definite state to the user, it copies the buffer.
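The copying described here can be observed without any timing. This is a small sketch with illustrative names (buf, view, snapshot, after): a value taken while a view is alive is an independent snapshot, so later writes through the view do not change it.

```python
import io

buf = io.BytesIO(b"f" * 16)
view = buf.getbuffer()        # a memoryview is now alive

snapshot = buf.getvalue()     # bytes object taken while the view exists
view[0:3] = b"abc"            # write into the BytesIO through the view
after = buf.getvalue()        # a fresh value that reflects the write

view.release()                # explicit alternative to `del view`

print(snapshot)
print(after)
```

Calling view.release() (or deleting the last reference to the view) is what puts getvalue() back on the fast path.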
No, this isn't a bug. This is normal behaviour. See this answer: the bytes type in python 2.7 and PEP-358
It basically comes down to the fact that, in 2.7, bytes is just an alias for str, to smooth the transition to 3.x.
bytes doesn't exist as a separate kind of data structure in Python 2.x, so yes, it is entirely normal: str objects are bytestrings in Python 2 (unlike Python 3, where str objects are Unicode strings).
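A quick way to check this on whichever interpreter you have is sketched below. The names is_alias and on_python2 are only illustrative; the comparison should come out True on Python 2.7 and False on Python 3.

```python
import sys

# True when bytes and str are literally the same type object
is_alias = bytes is str

# True when running under a Python 2 interpreter
on_python2 = sys.version_info[0] == 2

print(is_alias, on_python2)
```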
The issue is that you are positioned at the end of the stream. Think of the position like a cursor. Once you have written b' world', your cursor is at the end of the stream. When you try to .read(), you are reading everything after the position of the cursor - which is nothing, so you get the empty bytestring.
To navigate around the stream you can use the .seek method:
>>> import io
>>> in_memory = io.BytesIO(b'hello')
>>> in_memory.write(b' world')
6
>>> in_memory.seek(0) # go to the start of the stream
0
>>> print(in_memory.read())
b' world'
Note that, much like a file opened in 'r+' mode (rather than 'w', which truncates), the initial bytes b'hello' have been overwritten by your writing of b' world', because the stream position starts at 0.
.getvalue() just returns the entire contents of the stream regardless of current position.
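The cursor behaviour can be followed step by step with .tell(). This is a minimal sketch with illustrative variable names; it records the position before and after the write and compares .read() with .getvalue().

```python
import io

stream = io.BytesIO(b"hello")
start = stream.tell()           # position right after construction

stream.write(b" world")         # overwrites from position 0, then extends
after_write = stream.tell()     # cursor now sits at the end

tail = stream.read()            # nothing lies after the cursor
whole = stream.getvalue()       # entire contents, cursor ignored

stream.seek(0)
from_start = stream.read()      # same bytes, now reachable via read()

print(start, after_write, tail, whole, from_start)
```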
This is a memory stream, but it is still a stream. The position is stored, so, as with any other stream, if you try to read after having written, you have to reposition:
import io
in_memory = io.BytesIO(b'hello')
in_memory.seek(0, 2) # seek to end (whence=2), else we overwrite
in_memory.write(b' world')
in_memory.seek(0) # seek to start
print(in_memory.read())
prints:
b'hello world'
in_memory.getvalue(), by contrast, doesn't need the final seek(0), as it returns the entire contents of the stream regardless of the current position.
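If you append often, the seek-to-end step can be wrapped up. The helper below, append_bytes, is a hypothetical name and not part of the io module; it is a sketch that uses the named constant io.SEEK_END instead of the literal 2 and restores the previous position afterwards.

```python
import io

def append_bytes(stream, data):
    """Hypothetical helper: append data, then restore the old position."""
    saved = stream.tell()            # remember where the cursor was
    stream.seek(0, io.SEEK_END)      # same as seek(0, 2)
    written = stream.write(data)     # number of bytes written
    stream.seek(saved)               # put the cursor back
    return written

in_memory = io.BytesIO(b"hello")
count = append_bytes(in_memory, b" world")
contents = in_memory.read()          # cursor is still at 0, so this reads all
print(count, contents)
```

Restoring the position is a design choice for this sketch; drop the final seek if you want the cursor left at the end after appending.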