Use the AWS SDK for Java and Apache Commons IO, like so:

import org.apache.commons.io.IOUtils;

AmazonS3 s3 = new AmazonS3Client(credentials); // anonymous credentials are possible if this isn't your bucket
S3Object object = s3.getObject("bucket", "key");
byte[] byteArray = IOUtils.toByteArray(object.getObjectContent());

Not sure what you mean by "get it removed", but IOUtils.toByteArray() will read the object's input stream to the end while converting it to a byte array; note that you should still close the stream yourself afterwards. If you mean you want to delete the object from S3, that's as easy as:

s3.deleteObject("bucket", "key"); 
Answer from Zach Musgrave on Stack Overflow
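If you'd rather not pull in Commons IO at all, the same conversion is a short plain-java.io loop. The helper below is a hypothetical sketch (the class name is not from the SDK); it works on any InputStream, including the one from getObjectContent():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the AWS SDK or Commons IO.
class StreamBytes {
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Read until EOF, accumulating into the in-memory buffer.
        // The caller is responsible for closing the stream.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

On Java 9+ you can skip the helper entirely and call readAllBytes() directly on the stream.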
Program Creek
com.amazonaws.services.s3.model.S3ObjectInputStream Java Examples
@Override
public byte[] serialize(String topic, S3ObjectInputStream data) {
    InputStream is = data.getDelegateStream();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int nRead;
    byte[] byteArray = new byte[16384];
    try {
        while ((nRead = is.read(byteArray, 0, byteArray.length)) ...
Discussions

SDK repeatedly complaining "Not all bytes were read from the S3ObjectInputStream"
With a recent upgrade to the 1.11.134 SDK, tests seeking around a large CSV file are triggering a large set of repeated warnings about closing the stream early. 2017-06-27 15:47:05,121 [ScalaTest-ma...
June 27, 2017
java - Stream dynamic image from Spring MVC endpoint without holding it in memory - Stack Overflow
I have a @Controller that returns an image to the front end, but currently it holds the whole thing in memory as a byte array before returning a ResponseEntity object. I would like to ...
Simpler / Easier mechanism to read S3 Object content as a String (like v1 getObjectAsString)
The v1 SDK has a nice method for just grabbing the content of an S3 object as a String. It would be nice if the v2 SDK had this functionality. Expected Behavior String result = s3.getObjectAsString...
November 30, 2017
java - Conversion from S3ObjectInputStream to AbstractStreamResource - Stack Overflow
Alternatively if that does not work, based on other questions here in StackOverflow, I assume you can convert S3ObjectInputStream to byte array.
July 27, 2022
Java2s
Example usage for com.amazonaws.util IOUtils toByteArray
@Override
public byte[] download(final String key) throws IOException {
    GetObjectRequest getObjectRequest = new GetObjectRequest(bucket, key);
    S3Object s3Object = amazonS3Client.getObject(getObjectRequest);
    S3ObjectInputStream objectInputStream = s3Object.getObjectContent();
    return IOUtils.toByteArray(objectInputStream);
}
AWS
S3ObjectInputStream (AWS SDK for Java - 1.12.797)
By default Apache HttpClient tries to reuse http connections by reading to the end of an attached input stream on InputStream.close(). This is efficient from a socket pool management perspective, but for objects with large payloads can incur significant overhead while bytes are read from s3 and discarded.
Tabnine
com.amazonaws.services.s3.model.S3ObjectInputStream.read java code examples | Tabnine
public boolean readBufferFromFile() throws IOException {
    int n = s3ObjectInputStream.read(bb);
    if (n == -1) {
        return false;
    } else {
        // adjust the highest used position
        bufferSize = endBuffer + n;
        // Store the data in our byte array
        for (int i = 0; i < n; i++) {
            byteBuffer[endBuffer + i] = bb[i];
        }
        return true;
    }
}
GitHub
SDK repeatedly complaining "Not all bytes were read from the S3ObjectInputStream" · Issue #1211 · aws/aws-sdk-java
June 27, 2017 - Request only the bytes you need via a ranged GET or drain the input stream after use. 2017-06-27 15:47:06,730 [ScalaTest-main-running-S3ACSVReadSuite] WARN internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection.
Author: steveloughran
Java Tips
Java Examples for com.amazonaws.services.s3.model.S3ObjectInputStream
@Override
public InputStream readFileData(FileData fileData) {
    String path = FilenameUtils.separatorsToUnix(
            FilenameUtils.normalize(extraPath + fileData.getPath() + "/" + fileData.getFileName()));
    path = StringUtils.stripStart(path, "/");
    InputStream ret = null;
    S3ObjectInputStream objectContent = null;
    try {
        S3Object object = s3Client.getObject(bucketName, path);
        if (object != null) {
            ByteArrayOutputStream temp = new ByteArrayOutputStream();
            objectContent = object.getObjectContent();
            IOUtils.copy(objectContent, temp);
            ret = new ByteArrayInputStream(temp.toByteArray());
            if (compress) {
                ret = new GZIPInputStream(ret);
            }
        }
    } catch (Exception e) {
        LOG.error("Error getting File: " + e, e);
        throw new RuntimeException(e);
    } finally {
        IOUtils.closeQuietly(objectContent);
    }
    return ret;
}
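The compress branch in that example (buffer the payload, then wrap it in GZIPInputStream) can be demonstrated without S3 at all. This is a self-contained round-trip sketch; the class and method names here are illustrative, not from the snippet above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    // Same shape as the compress branch above: buffer the payload,
    // then decompress it by reading through GZIPInputStream.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```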
Alexwlchan
Streaming large objects from S3 with ranged GET requests – alexwlchan
September 12, 2019 -

import java.io.ByteArrayInputStream

val underlying: util.Enumeration[InputStream]

val bufferedEnumeration = new util.Enumeration[ByteArrayInputStream] {
  override def hasMoreElements: Boolean = underlying.hasMoreElements
  override def nextElement(): ByteArrayInputStream = {
    val nextStream = underlying.nextElement()
    val byteArray = IOUtils.toByteArray(nextStream)
    nextStream.close()
    new ByteArrayInputStream(byteArray)
  }
}

We can drop this enumeration into another SequenceInputStream, and get a single InputStream again – but this time the S3ObjectInputStream is read and closed almost immediately.
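The same buffer-each-chunk-then-concatenate idea can be sketched in plain Java with SequenceInputStream. This is a simplified illustration (class names are mine, and plain ByteArrayInputStreams stand in for the per-range S3 streams of the article):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Collections;
import java.util.List;

class ChunkConcat {
    // Present several chunk streams as one logical stream; each chunk
    // is only opened when the previous one is exhausted.
    static InputStream concatenate(List<InputStream> chunks) {
        return new SequenceInputStream(Collections.enumeration(chunks));
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```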
GitHub
Simpler / Easier mechanism to read S3 Object content as a String (like v1 getObjectAsString) · Issue #306 · aws/aws-sdk-java-v2
November 30, 2017 - The v1 SDK has a nice method for just grabbing the content of an S3 object as a String. It would be nice if the v2 SDK had this functionality. Expected Behavior String result = s3.getObjectAsString(request); Current Behavior Need to writ...
Author: plombardi89
Google Groups
AmazonS3 putObject with InputStream length example
/*
 * Obtain the Content length of the Input stream for S3 header
 */
try {
    InputStream is = event.getFile().getInputstream();
    contentBytes = IOUtils.toByteArray(is);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
}
Long contentLength = Long.valueOf(contentBytes.length);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength);

/*
 * Reobtain the tmp uploaded file as input stream
 */
InputStream inputStream = event.getFile().getInputstream();

/*
 * Put the object in S3
 */
try {
    s3client.putObject(new PutObjectReques
Javadoc.io
S3ObjectInputStream (AWS Java SDK for Amazon S3 1.11. ...
GitHub
S3ObjectInputStream skip is inefficient · Issue #797 · aws/aws-sdk-java
July 29, 2016 - Calling skip() on S3ObjectInputStream ... Here is what documentation says about default implementation: The skip method of this class creates a byte array and then ......
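Beyond being inefficient on S3ObjectInputStream, skip() on any InputStream may also skip fewer bytes than requested, so callers usually loop. A defensive "skip fully" helper can be sketched in plain java.io (the class name is mine, not from the issue):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class SkipFully {
    // skip() is allowed to make partial progress; loop until the
    // requested count is consumed, falling back to read() when
    // skip() reports no progress at all.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() != -1) {
                remaining--; // consumed one byte the slow way
            } else {
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            }
        }
    }
}
```

For S3 specifically, the issue's point stands: a ranged GET that starts at the desired offset avoids downloading the skipped bytes in the first place.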
Reddit
r/aws on Reddit: Writing an object to S3 from a (java) string or bytestream which contains multi-glyph emoji truncates the output.
March 20, 2023 -

I am using the Java SDK (in Kotlin) to write some files to S3. Given this String, which is 12 characters but 16 bytes long:

<p>👨🏿</p>

Then S3 writes 12 bytes to S3. The final output is:

<p>👨🏿<

Here's some of my relevant code:

println("ContentsLength (long): ${contents.length.toLong()}")
val ba = contents.toByteArray(Charsets.UTF_8)
println("ByteArray UTF_8: ${ba.size} bytes (int)")
val requestBuilder = PutObjectRequest.builder()
    .contentLength(contents.length.toLong())
    .key(key)
    .bucket(bucket)
contentType?.let {
    requestBuilder.contentType(it)
}
val request = requestBuilder.build()
s3Client.putObject(request, RequestBody.fromBytes(ba))

And here's the logs from Cloudwatch:

	2023-03-20T21:15:15.311+00:00	ContentsLength (long): 12
	2023-03-20T21:15:15.349+00:00	ByteArray UTF_8: 16 bytes (int)

So I'm definitely using UTF-8 character sets, the dark skinned man emoji requires several characters (I think 3, man+ZWJ+dark).

So it seems to me that the AWS SDK Java class software.amazon.awssdk.core.sync.RequestBody is incorrectly handling the byte array, or that software.amazon.awssdk.services.s3.S3Client.putObject() is writing the wrong number of bytes to S3.

I have also tried RequestBody.fromString(content, Charsets.UTF_8), same result. I've even tried UTF_32.

What have I done wrong?
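A likely culprit (my reading, not confirmed in the thread) is the line .contentLength(contents.length.toLong()): a Java/Kotlin string's length counts UTF-16 code units, not UTF-8 bytes, so the request declares a 12-byte body and S3 truncates the 16-byte payload at 12 bytes. The Content-Length must come from the encoded byte array, e.g. ba.size. The mismatch can be shown without any SDK calls:

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // The correct Content-Length for a UTF-8 body is the byte count,
    // not String.length(), which counts UTF-16 code units.
    static long contentLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, the dark-skin-tone man emoji "\uD83D\uDC68\uD83C\uDFFF" has length() 4 but occupies 8 UTF-8 bytes; using the byte-array size (or simply not setting contentLength when passing RequestBody.fromBytes, which already knows its own size) should avoid the truncation.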

Bloomreach
How to resolve warning: "Not all bytes were read from the S3ObjectInputStream" - Experience Manager (PaaS/OnPrem) - Bloomreach Developers Forum
September 3, 2018 - Hi, We’re using Hippo with an S3 DataStore, and have followed the instructions to Deploy the Authoring and Delivery Web Applications Separately. We’re seeing the following warning crop up repeatedly in the logs, mostly on app servers for the public site (but also a very few times on the CMS host): WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection.
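The warning fires whenever the stream is closed before EOF. The usual remedies are a ranged GET for only the bytes you need, calling abort() to give up the connection, or draining the remainder before close so the connection can be reused. The drain itself is plain java.io and can be sketched as follows (hypothetical helper name):

```java
import java.io.IOException;
import java.io.InputStream;

class Drain {
    // Read and discard everything left on the stream so the underlying
    // HTTP connection can return to the pool when it is closed.
    static long drain(InputStream in) throws IOException {
        byte[] sink = new byte[8192];
        long discarded = 0;
        int n;
        while ((n = in.read(sink)) != -1) {
            discarded += n;
        }
        return discarded;
    }
}
```

Draining is only worthwhile when few bytes remain; for a large remainder, aborting the connection is cheaper than downloading and discarding the rest of the object.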
Javadoc.io
S3ObjectInputStream (AWS Java SDK for Amazon S3 1.11.3 API)
This is efficient from a socket pool management perspective, but for objects with large payloads can incur significant overhead while bytes are read from s3 and discarded. It's up to clients to decide when to take the performance hit implicit in not reusing an http connection in order to not read unnecessary information from S3.
Profitbase
Read S3 object as byte array | Profitbase Docs
Reads the contents of an Amazon S3 object into memory as a byte array. You can compare this to downloading a file. Prefer using streaming over reading as byte array if possible. Streaming is generally faster and uses less memory, because streaming doesn't require loading the entire object into ...
Top answer
1 of 6

Since Java 7 (published back in July 2011), there’s a better way: the Files.copy() utility from java.nio.file.

Copies all bytes from an input stream to a file.

So you need neither an external library nor rolling your own byte array loops. Two examples below, both of which use the input stream from S3Object.getObjectContent().

InputStream in = s3Client.getObject("bucketName", "key").getObjectContent();

1) Write to a new file at specified path:

Files.copy(in, Paths.get("/my/path/file.jpg"));

2) Write to a temp file in system's default tmp location:

File tmp = File.createTempFile("s3test", "");
Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);

(Without specifying the option to replace existing file, you'll get a FileAlreadyExistsException.)

Also note that getObjectContent() Javadocs urge you to close the input stream:

If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3. Further, failure to close this stream can cause the request pool to become blocked.

So it should be safest to wrap everything in try-catch-finally, and do in.close(); in the finally block.

The above assumes that you use the official SDK from Amazon (aws-java-sdk-s3).
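Since Java 7 you can also let try-with-resources do the closing instead of an explicit finally block. A small sketch, with a generic InputStream standing in for the S3 object content:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyToFile {
    // The stream is closed automatically when the try block exits,
    // whether Files.copy() succeeds or throws.
    static void save(InputStream s3Content, Path target) throws IOException {
        try (InputStream in = s3Content) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```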

2 of 6

While IOUtils.copy() and IOUtils.copyLarge() are great, I would prefer the old-school way of looping through the input stream until it returns -1. Why? I used IOUtils.copy() before, but in one specific case, when I started downloading a large file from S3 and the thread was interrupted for some reason, the download would not stop; it went on and on until the whole file was downloaded.

Of course, this has nothing to do with S3, just the IOUtils library.

So, I prefer this:

InputStream in = s3Object.getObjectContent();
OutputStream out = new FileOutputStream(file);
byte[] buf = new byte[1024];
int count;
while ((count = in.read(buf)) != -1)
{
    if (Thread.interrupted())
    {
        throw new InterruptedException();
    }
    out.write(buf, 0, count);
}
out.close();
in.close();

Note: This also means you don't need additional libraries.