Use the AWS SDK for Java and Apache Commons IO, like so:

import org.apache.commons.io.IOUtils;

AmazonS3 s3 = new AmazonS3Client(credentials); // anonymous credentials are possible if this isn't your bucket
S3Object object = s3.getObject("bucket", "key");
byte[] byteArray = IOUtils.toByteArray(object.getObjectContent());

Not sure what you mean by "get it removed", but IOUtils.toByteArray() will read the object's input stream to the end while converting it to a byte array; note that you should still close the stream yourself afterwards. If you mean you want to delete the object from S3, that's as easy as:

s3.deleteObject("bucket", "key"); 
Answer from Zach Musgrave on Stack Overflow
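If you'd rather not pull in Commons IO at all, the same conversion is a short plain-java.io loop. The helper below is a hypothetical sketch (the class name is not from the SDK); it works on any InputStream, including the one from getObjectContent():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the AWS SDK or Commons IO.
class StreamBytes {
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Read until EOF, accumulating into the in-memory buffer.
        // The caller is responsible for closing the stream.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

On Java 9+ you can skip the helper entirely and call readAllBytes() directly on the stream.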
Program Creek
com.amazonaws.services.s3.model.S3ObjectInputStream Java Examples
@Override
public byte[] serialize(String topic, S3ObjectInputStream data) {
    InputStream is = data.getDelegateStream();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int nRead;
    byte[] byteArray = new byte[16384];
    try {
        while ((nRead = is.read(byteArray, 0, byteArray.length)) ...
Discussions

SDK repeatedly complaining "Not all bytes were read from the S3ObjectInputStream"
With a recent upgrade to the 1.11.134 SDK, tests seeking around a large CSV file are triggering a large set of repeated warnings about closing the stream early. 2017-06-27 15:47:05,121 [ScalaTest-ma...
June 27, 2017
java - Stream dynamic image from Spring MVC endpoint without holding it in memory - Stack Overflow
I have a @Controller that returns an image to the front end, but currently it holds the whole thing in memory as a byte array before returning a ResponseEntity object. I would like to ...
Simpler / Easier mechanism to read S3 Object content as a String (like v1 getObjectAsString)
The v1 SDK has a nice method for just grabbing the content of an S3 object as a String. It would be nice if the v2 SDK had this functionality. Expected Behavior String result = s3.getObjectAsString...
November 30, 2017
java - Conversion from S3ObjectInputStream to AbstractStreamResource - Stack Overflow
Alternatively if that does not work, based on other questions here in StackOverflow, I assume you can convert S3ObjectInputStream to byte array.
July 27, 2022
Java2s
Example usage for com.amazonaws.util IOUtils toByteArray
@Override
public byte[] download(final String key) throws IOException {
    GetObjectRequest getObjectRequest = new GetObjectRequest(bucket, key);
    S3Object s3Object = amazonS3Client.getObject(getObjectRequest);
    S3ObjectInputStream objectInputStream = s3Object.getObjectContent();
    return IOUtils.toByteArray(objectInputStream);
}
AWS
S3ObjectInputStream (AWS SDK for Java - 1.12.797)
By default Apache HttpClient tries to reuse http connections by reading to the end of an attached input stream on InputStream.close(). This is efficient from a socket pool management perspective, but for objects with large payloads can incur significant overhead while bytes are read from s3 and discarded.
Tabnine
com.amazonaws.services.s3.model.S3ObjectInputStream.read java code examples | Tabnine
public boolean readBufferFromFile() throws IOException {
    int n = s3ObjectInputStream.read(bb);
    if (n == -1) {
        return false;
    } else {
        // adjust the highest used position
        bufferSize = endBuffer + n;
        // Store the data in our byte array
        for (int i = 0; i < n; i++) {
            byteBuffer[endBuffer + i] = bb[i];
        }
        return true;
    }
}
GitHub
SDK repeatedly complaining "Not all bytes were read from the S3ObjectInputStream" · Issue #1211 · aws/aws-sdk-java
June 27, 2017 - Request only the bytes you need via a ranged GET or drain the input stream after use. 2017-06-27 15:47:06,730 [ScalaTest-main-running-S3ACSVReadSuite] WARN internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection.
Author: steveloughran
Java Tips
Java Examples for com.amazonaws.services.s3.model.S3ObjectInputStream
@Override
public InputStream readFileData(FileData fileData) {
    String path = FilenameUtils.separatorsToUnix(
            FilenameUtils.normalize(extraPath + fileData.getPath() + "/" + fileData.getFileName()));
    path = StringUtils.stripStart(path, "/");
    InputStream ret = null;
    S3ObjectInputStream objectContent = null;
    try {
        S3Object object = s3Client.getObject(bucketName, path);
        if (object != null) {
            ByteArrayOutputStream temp = new ByteArrayOutputStream();
            objectContent = object.getObjectContent();
            IOUtils.copy(objectContent, temp);
            ret = new ByteArrayInputStream(temp.toByteArray());
            if (compress) {
                ret = new GZIPInputStream(ret);
            }
        }
    } catch (Exception e) {
        LOG.error("Error getting File: " + e, e);
        throw new RuntimeException(e);
    } finally {
        IOUtils.closeQuietly(objectContent);
    }
    return ret;
}
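The compress branch in that example (buffer the payload, then wrap it in GZIPInputStream) can be demonstrated without S3 at all. This is a self-contained round-trip sketch; the class and method names here are illustrative, not from the snippet above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    // Same shape as the compress branch above: buffer the payload,
    // then decompress it by reading through GZIPInputStream.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```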
Alexwlchan
Streaming large objects from S3 with ranged GET requests – alexwlchan
September 12, 2019 -

import java.io.ByteArrayInputStream

val underlying: util.Enumeration[InputStream]

val bufferedEnumeration = new util.Enumeration[ByteArrayInputStream] {
  override def hasMoreElements: Boolean = underlying.hasMoreElements
  override def nextElement(): ByteArrayInputStream = {
    val nextStream = underlying.nextElement()
    val byteArray = IOUtils.toByteArray(nextStream)
    nextStream.close()
    new ByteArrayInputStream(byteArray)
  }
}

We can drop this enumeration into another SequenceInputStream, and get a single InputStream again – but this time the S3ObjectInputStream is read and closed almost immediately.
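The same buffer-each-chunk-then-concatenate idea can be sketched in plain Java with SequenceInputStream. This is a simplified illustration (class names are mine, and plain ByteArrayInputStreams stand in for the per-range S3 streams of the article):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Collections;
import java.util.List;

class ChunkConcat {
    // Present several chunk streams as one logical stream; each chunk
    // is only opened when the previous one is exhausted.
    static InputStream concatenate(List<InputStream> chunks) {
        return new SequenceInputStream(Collections.enumeration(chunks));
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```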
GitHub
Simpler / Easier mechanism to read S3 Object content as a String (like v1 getObjectAsString) · Issue #306 · aws/aws-sdk-java-v2
November 30, 2017 - The v1 SDK has a nice method for just grabbing the content of an S3 object as a String. It would be nice if the v2 SDK had this functionality. Expected Behavior String result = s3.getObjectAsString(request); Current Behavior Need to writ...
Author: plombardi89
Google Groups
AmazonS3 putObject with InputStream length example
/*
 * Obtain the Content length of the Input stream for S3 header
 */
try {
    InputStream is = event.getFile().getInputstream();
    contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
}
Long contentLength = Long.valueOf(contentBytes.length);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength);

/*
 * Reobtain the tmp uploaded file as input stream
 */
InputStream inputStream = event.getFile().getInputstream();

/*
 * Put the object in S3
 */
try {
    s3client.putObject(new PutObjectReques
Javadoc.io
S3ObjectInputStream (AWS Java SDK for Amazon S3 1.11. ...
GitHub
S3ObjectInputStream skip is inefficient · Issue #797 · aws/aws-sdk-java
July 29, 2016 - Calling skip() on S3ObjectInputStream ... Here is what documentation says about default implementation: The skip method of this class creates a byte array and then ......
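Beyond being inefficient on S3ObjectInputStream, skip() on any InputStream may also skip fewer bytes than requested, so callers usually loop. A defensive "skip fully" helper can be sketched in plain java.io (the class name is mine, not from the issue):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class SkipFully {
    // skip() is allowed to make partial progress; loop until the
    // requested count is consumed, falling back to read() when
    // skip() reports no progress at all.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() != -1) {
                remaining--; // consumed one byte the slow way
            } else {
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            }
        }
    }
}
```

For S3 specifically, the issue's point stands: a ranged GET that starts at the desired offset avoids downloading the skipped bytes in the first place.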
Reddit
r/aws on Reddit: Writing an object to S3 from a (java) string or bytestream which contains multi-glyph emoji truncates the output.
March 20, 2023 -

I am using the Java SDK (in Kotlin) to write some files to S3. Given this String, which is 12 characters but 16 bytes long:

<p>👨🏿</p>

Then S3 writes 12 bytes to S3. The final output is:

<p>👨🏿<

Here's some of my relevant code:

println("ContentsLength (long): ${contents.length.toLong()}")
val ba = contents.toByteArray(Charsets.UTF_8)
println("ByteArray UTF_8: ${ba.size} bytes (int)")
val requestBuilder = PutObjectRequest.builder()
    .contentLength(contents.length.toLong())
    .key(key)
    .bucket(bucket)
contentType?.let {
    requestBuilder.contentType(it)
}
val request = requestBuilder.build()
s3Client.putObject(request, RequestBody.fromBytes(ba))

And here's the logs from Cloudwatch:

	2023-03-20T21:15:15.311+00:00	ContentsLength (long): 12
	2023-03-20T21:15:15.349+00:00	ByteArray UTF_8: 16 bytes (int)

So I'm definitely using UTF-8 character sets, the dark skinned man emoji requires several characters (I think 3, man+ZWJ+dark).

So it seems to me that the AWS SDK Java class software.amazon.awssdk.core.sync.RequestBody is incorrectly handling the byte array, or that software.amazon.awssdk.services.s3.S3Client.putObject() is writing the wrong number of bytes to S3.

I have also tried RequestBody.fromString(content, Charsets.UTF_8), same result. I've even tried UTF_32.

What have I done wrong?
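A likely culprit (my reading, not confirmed in the thread) is the line .contentLength(contents.length.toLong()): a Java/Kotlin string's length counts UTF-16 code units, not UTF-8 bytes, so the request declares a 12-byte body and S3 truncates the 16-byte payload at 12 bytes. The Content-Length must come from the encoded byte array, e.g. ba.size. The mismatch can be shown without any SDK calls:

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // The correct Content-Length for a UTF-8 body is the byte count,
    // not String.length(), which counts UTF-16 code units.
    static long contentLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, the dark-skin-tone man emoji "\uD83D\uDC68\uD83C\uDFFF" has length() 4 but occupies 8 UTF-8 bytes; using the byte-array size (or simply not setting contentLength when passing RequestBody.fromBytes, which already knows its own size) should avoid the truncation.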

Bloomreach
How to resolve warning: "Not all bytes were read from the S3ObjectInputStream" - Experience Manager (PaaS/OnPrem) - Bloomreach Developers Forum
September 3, 2018 - Hi, We’re using Hippo with an S3 DataStore, and have followed the instructions to Deploy the Authoring and Delivery Web Applications Separately. We’re seeing the following warning crop up repeatedly in the logs, mostly on app servers for the public site (but also a very few times on the CMS host): WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection.
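The warning fires whenever the stream is closed before EOF. The usual remedies are a ranged GET for only the bytes you need, calling abort() to give up the connection, or draining the remainder before close so the connection can be reused. The drain itself is plain java.io and can be sketched as follows (hypothetical helper name):

```java
import java.io.IOException;
import java.io.InputStream;

class Drain {
    // Read and discard everything left on the stream so the underlying
    // HTTP connection can return to the pool when it is closed.
    static long drain(InputStream in) throws IOException {
        byte[] sink = new byte[8192];
        long discarded = 0;
        int n;
        while ((n = in.read(sink)) != -1) {
            discarded += n;
        }
        return discarded;
    }
}
```

Draining is only worthwhile when few bytes remain; for a large remainder, aborting the connection is cheaper than downloading and discarding the rest of the object.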
Javadoc.io
S3ObjectInputStream (AWS Java SDK for Amazon S3 1.11.3 API)
This is efficient from a socket pool management perspective, but for objects with large payloads can incur significant overhead while bytes are read from s3 and discarded. It's up to clients to decide when to take the performance hit implicit in not reusing an http connection in order to not read unnecessary information from S3.
Profitbase
Read S3 object as byte array | Profitbase Docs
Reads the contents of an Amazon S3 object into memory as a byte array. You can compare this to downloading a file. Prefer using streaming over reading as byte array if possible. Streaming is generally faster and uses less memory, because streaming doesn't require loading the entire object into ...
Top answer
1 of 6

Since Java 7 (published back in July 2011), there’s a better way: the Files.copy() utility from java.nio.file.

Copies all bytes from an input stream to a file.

So you need neither an external library nor rolling your own byte array loops. Two examples below, both of which use the input stream from S3Object.getObjectContent().

InputStream in = s3Client.getObject("bucketName", "key").getObjectContent();

1) Write to a new file at specified path:

Files.copy(in, Paths.get("/my/path/file.jpg"));

2) Write to a temp file in system's default tmp location:

File tmp = File.createTempFile("s3test", "");
Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);

(Without specifying the option to replace existing file, you'll get a FileAlreadyExistsException.)

Also note that getObjectContent() Javadocs urge you to close the input stream:

If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3. Further, failure to close this stream can cause the request pool to become blocked.

So it should be safest to wrap everything in try-catch-finally, and do in.close(); in the finally block.

The above assumes that you use the official SDK from Amazon (aws-java-sdk-s3).
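Since Java 7 you can also let try-with-resources do the closing instead of an explicit finally block. A small sketch, with a generic InputStream standing in for the S3 object content:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyToFile {
    // The stream is closed automatically when the try block exits,
    // whether Files.copy() succeeds or throws.
    static void save(InputStream s3Content, Path target) throws IOException {
        try (InputStream in = s3Content) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```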

2 of 6

While IOUtils.copy() and IOUtils.copyLarge() are great, I would prefer the old-school way of looping through the input stream until it returns -1. Why? I used IOUtils.copy() before, but in one specific case, when I started downloading a large file from S3 and the thread was interrupted for some reason, the download would not stop; it went on and on until the whole file was downloaded.

Of course, this has nothing to do with S3, just the IOUtils library.

So, I prefer this:

InputStream in = s3Object.getObjectContent();
OutputStream out = new FileOutputStream(file);
byte[] buf = new byte[1024];
int count;
while ((count = in.read(buf)) != -1)
{
    if (Thread.interrupted())
    {
        throw new InterruptedException();
    }
    out.write(buf, 0, count);
}
out.close();
in.close();

Note: This also means you don't need additional libraries.