Use the AWS SDK for Java (v1) and Apache Commons IO, like so:
// import com.amazonaws.services.s3.AmazonS3;
// import com.amazonaws.services.s3.AmazonS3Client;
// import com.amazonaws.services.s3.model.S3Object;
// import org.apache.commons.io.IOUtils;
AmazonS3 s3 = new AmazonS3Client(credentials); // anonymous credentials are possible if this isn't your bucket
S3Object object = s3.getObject("bucket", "key");
byte[] byteArray = IOUtils.toByteArray(object.getObjectContent());
Not sure what you mean by "get it removed". Note that IOUtils.toByteArray() reads the stream to its end but does not close it, so close the object's input stream yourself (e.g. with IOUtils.closeQuietly()) once you're done. If you mean you want to delete the object from S3, that's as easy as:
s3.deleteObject("bucket", "key");
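If Commons IO would be a dependency added only for this, note (an aside, not from the original answer) that on Java 9+ the JDK's own InputStream.readAllBytes() does the same job. A minimal sketch, with an in-memory stream standing in for getObjectContent():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytes {
    public static void main(String[] args) throws IOException {
        // stand-in for s3.getObject("bucket", "key").getObjectContent()
        InputStream content = new ByteArrayInputStream("object body".getBytes());
        byte[] byteArray = content.readAllBytes(); // Java 9+
        content.close(); // readAllBytes() does not close the stream
        System.out.println(byteArray.length); // 11
    }
}
```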
(Answer above from Zach Musgrave on Stack Overflow.)
As of AWS Java SDK v2 you can use a ResponseTransformer to convert the response to different types (see the Javadoc).
Below is an example of getting the object as bytes:
GetObjectRequest request = GetObjectRequest.builder().bucket(bucket).key(key).build();
ResponseBytes<GetObjectResponse> result = s3Client.getObject(request, ResponseTransformer.toBytes());
// to get the bytes
byte[] bytes = result.asByteArray();
I am using the Java SDK (in Kotlin) to write some files to S3. Given this String, which is 12 characters but 16 bytes long:
<p>👨🏿</p>
Then only 12 bytes get written to S3, and the object is truncated. The final output is:
<p>👨🏿<
Here's some of my relevant code:
println("ContentsLength (long): ${contents.length.toLong()}")
val ba = contents.toByteArray(Charsets.UTF_8)
println("ByteArray UTF_8: ${ba.size} bytes (int)")
val requestBuilder = PutObjectRequest.builder()
.contentLength(contents.length.toLong())
.key(key)
.bucket(bucket)
contentType?.let {
requestBuilder.contentType(it)
}
val request = requestBuilder.build()
s3Client.putObject(request, RequestBody.fromBytes(ba))
And here are the logs from CloudWatch:
2023-03-20T21:15:15.311+00:00 ContentsLength (long): 12
2023-03-20T21:15:15.349+00:00 ByteArray UTF_8: 16 bytes (int)
So I'm definitely using the UTF-8 charset; the dark-skinned man emoji requires several code points (I think 3: man + ZWJ + dark skin tone).
So it seems to me that the AWS SDK Java class software.amazon.awssdk.core.sync.RequestBody is incorrectly handling the byte array, or that software.amazon.awssdk.services.s3.S3Client.putObject() is writing the wrong number of bytes to S3.
I have also tried RequestBody.fromString(content, Charsets.UTF_8), same result. I've even tried UTF_32.
What have I done wrong?
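The mismatch between the two logged numbers can be reproduced without S3 at all. A minimal Java sketch (using a plain man + dark-skin-tone emoji, so the exact counts differ slightly from the logs above):

```java
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {
    public static void main(String[] args) {
        // "<p>👨🏿</p>" — man (U+1F468) + dark skin tone (U+1F3FF)
        String contents = "<p>\uD83D\uDC68\uD83C\uDFFF</p>";
        // length() counts UTF-16 code units, not bytes
        int charLength = contents.length();
        byte[] utf8 = contents.getBytes(StandardCharsets.UTF_8);
        System.out.println("length() = " + charLength);     // 11
        System.out.println("UTF-8 bytes = " + utf8.length); // 15
        // Declaring Content-Length from length() understates the body size,
        // so anything past the declared length is cut off.
    }
}
```

Given that, setting .contentLength(contents.length.toLong()) while sending the 16-byte array looks like the suspect: building the request with ba.size.toLong() instead may resolve the truncation.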
Since Java 7 (published back in July 2011), there's a better way: the Files.copy() utility from java.nio.file.
Copies all bytes from an input stream to a file.
So you need neither an external library nor rolling your own byte array loops. Two examples below, both of which use the input stream from S3Object.getObjectContent().
InputStream in = s3Client.getObject("bucketName", "key").getObjectContent();
1) Write to a new file at specified path:
Files.copy(in, Paths.get("/my/path/file.jpg"));
2) Write to a temp file in system's default tmp location:
File tmp = File.createTempFile("s3test", "");
Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
(Without specifying the option to replace existing file, you'll get a FileAlreadyExistsException.)
Also note that getObjectContent() Javadocs urge you to close the input stream:
If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3. Further, failure to close this stream can cause the request pool to become blocked.
So it should be safest to wrap everything in try-catch-finally, and do in.close(); in the finally block.
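On Java 7+ the same close-the-stream discipline reads more cleanly with try-with-resources. A self-contained sketch, with an in-memory stream standing in for getObjectContent() (the S3 call itself is assumed):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyToFile {
    public static void main(String[] args) throws IOException {
        // stand-in for s3Client.getObject("bucketName", "key").getObjectContent()
        byte[] fakeObjectBytes = "hello from S3".getBytes();
        Path target = Files.createTempFile("s3test", "");
        // try-with-resources closes the stream even if Files.copy() throws,
        // which is exactly what the getObjectContent() Javadoc asks for
        try (InputStream in = new ByteArrayInputStream(fakeObjectBytes)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        System.out.println(Files.size(target)); // 13
    }
}
```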
The above assumes that you use the official SDK from Amazon (aws-java-sdk-s3).
While IOUtils.copy() and IOUtils.copyLarge() are great, I prefer the old-school way of looping through the input stream until it returns -1. Why? I used IOUtils.copy() before, but in one specific case, when a thread downloading a large file from S3 was interrupted, the download would not stop; it kept going until the whole file was downloaded.
Of course, this has nothing to do with S3, just the IOUtils library.
So, I prefer this:
InputStream in = s3Object.getObjectContent();
OutputStream out = new FileOutputStream(file);
byte[] buf = new byte[1024];
int count;
while ((count = in.read(buf)) != -1)
{
    if (Thread.interrupted())
    {
        throw new InterruptedException();
    }
    out.write(buf, 0, count);
}
out.close();
in.close();
Note: this approach also means you don't need any additional libraries.
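The loop above can be sketched as a runnable, self-contained method; in-memory streams stand in for getObjectContent() and the output file so it runs without S3:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class InterruptibleCopy {
    // Same loop as above; 'in' would be s3Object.getObjectContent()
    // and 'out' a FileOutputStream in real code
    static void copy(InputStream in, OutputStream out)
            throws IOException, InterruptedException {
        byte[] buf = new byte[1024];
        int count;
        while ((count = in.read(buf)) != -1) {
            if (Thread.interrupted()) {
                throw new InterruptedException(); // abandon the download mid-stream
            }
            out.write(buf, 0, count);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[5000]; // spans several 1024-byte reads
        Arrays.fill(data, (byte) 7);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out);
        System.out.println(Arrays.equals(data, out.toByteArray())); // true
    }
}
```

The interruption check costs one flag read per buffer, which is negligible next to the network read it guards.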