I wrote a method to read the contents of an S3 object. It looks fine to me.¹
There are multiple records in the S3 object; what is the best way to read all of the lines?
Your code should read all of the lines.
Does it only read the first line of the object?
No. It should read all of the lines.² The while loop reads until readLine() returns null, and that only happens when you reach the end of the stream.
How to make sure all the lines are read?
If you are getting fewer lines than you expect, then either the S3 object contains fewer lines than you think, or something is causing the object stream to close prematurely.
For the former, count the lines as you read them and compare the total with the expected line count.
The latter could be due to a timeout when reading a very large file. See How to read file chunk by chunk from S3 using aws-java-sdk for some ideas on how to deal with that problem.
¹ Actually, it would be better if you used try-with-resources to ensure that the S3 stream is always closed. But that won't cause you to "lose" lines.
² This assumes that the S3 service doesn't time out the connection, and that you are not requesting a part (chunk) or a range in the URI request parameters; see https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html .
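To make the pattern concrete, here is a minimal, runnable sketch of the read-all-lines loop described above, with try-with-resources to guarantee the stream is closed and a counter you can compare against the expected line count. A ByteArrayInputStream stands in for the real S3 object stream; with the AWS SDK you would pass s3Object.getObjectContent() instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadAllLines {
    // Reads every line from the stream and returns the number of lines read.
    static int countLines(InputStream in) throws IOException {
        int count = 0;
        // try-with-resources ensures the (S3) stream is always closed.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null only at the end of the stream.
            while ((line = reader.readLine()) != null) {
                count++;
                // process 'line' here
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for s3Object.getObjectContent().
        InputStream fake = new ByteArrayInputStream(
                "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countLines(fake)); // prints 3
    }
}
```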
My usual approach (InputStream -> BufferedReader.lines() -> batches of lines -> CompletableFuture) won't work here because the underlying S3ObjectInputStream eventually times out for huge files.
So I created a new class, S3InputStream, which doesn't care how long it stays open and reads byte blocks on demand using short-lived AWS SDK calls. You provide a byte[] buffer that will be reused; new byte[1 << 24] (16 MiB) appears to work well.
package org.harrison;

import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

/**
 * An {@link InputStream} for S3 files that does not care how big the file is.
 *
 * @author stephen harrison
 */
public class S3InputStream extends InputStream {
    private static class LazyHolder {
        private static final AmazonS3 S3 = AmazonS3ClientBuilder.defaultClient();
    }

    private final String bucket;
    private final String file;
    private final byte[] buffer;
    private long lastByteOffset;

    private long offset = 0;
    private int next = 0;
    private int length = 0;

    public S3InputStream(final String bucket, final String file, final byte[] buffer) {
        this.bucket = bucket;
        this.file = file;
        this.buffer = buffer;
        this.lastByteOffset = LazyHolder.S3.getObjectMetadata(bucket, file).getContentLength() - 1;
    }

    @Override
    public int read() throws IOException {
        if (next >= length) {
            fill();
            if (length <= 0) {
                return -1;
            }
            next = 0;
        }
        // Mask to 0..255 so bytes >= 0x80 are not returned as negative values,
        // which a caller would mistake for end-of-stream.
        return buffer[next++] & 0xFF;
    }

    private void fill() throws IOException {
        // Use > rather than >=: lastByteOffset is the index of the last byte,
        // so there is still data left to read while offset == lastByteOffset.
        if (offset > lastByteOffset) {
            length = -1;
        } else {
            try (final InputStream inputStream = s3Object()) {
                length = 0;
                int count;
                // Bulk-read the ranged response into the reusable buffer.
                while (length < buffer.length
                        && (count = inputStream.read(buffer, length, buffer.length - length)) != -1) {
                    length += count;
                }
                if (length > 0) {
                    offset += length;
                }
            }
        }
    }

    private InputStream s3Object() {
        // Request only the next chunk: bytes offset .. offset + buffer.length - 1.
        final GetObjectRequest request = new GetObjectRequest(bucket, file)
                .withRange(offset, offset + buffer.length - 1);

        return LazyHolder.S3.getObject(request).getObjectContent();
    }
}
The aws-java-sdk already provides streaming functionality for your S3 objects. Call getObject and obtain an InputStream from the result:
1) AmazonS3Client.getObject(GetObjectRequest getObjectRequest) -> S3Object
2) S3Object.getObjectContent()
Note: The method is a simple getter and does not actually create a stream. If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3. Further, failure to close this stream can cause the request pool to become blocked.
(from the AWS Java SDK docs)
Got the answer via other medium. Sharing it here:
The warning indicates that you called close() without reading the whole file. This is problematic because S3 is still trying to send the data and you're leaving the connection in a sad state.
There are two options here:
- Read the rest of the data from the input stream so the connection can be reused.
- Call s3ObjectInputStream.abort() to close the connection without reading the data. The connection won't be reused, so you take some performance hit with the next request to re-create the connection. This may be worth it if it's going to take a long time to read the rest of the file.
Following option #1 of Chirag Sejpal's answer, I used the statement below to drain the S3AbortableInputStream and ensure that the connection can be reused:
com.amazonaws.util.IOUtils.drainInputStream(s3ObjectInputStream);
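Draining simply means reading and discarding whatever is left on the stream so the underlying HTTP connection can go back to the pool. A stdlib-only sketch of what IOUtils.drainInputStream does (a ByteArrayInputStream stands in for the S3AbortableInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Drain {
    // Reads and discards everything remaining on the stream and returns
    // the number of bytes drained.
    static long drain(InputStream in) throws IOException {
        byte[] scratch = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(scratch)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100_000]);
        in.read(new byte[1000]);       // pretend we read only part of the object
        System.out.println(drain(in)); // prints 99000
    }
}
```

Whether draining beats abort() depends on how much is left: draining a few kilobytes is cheaper than re-establishing a connection, but draining gigabytes is not.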
Here's a solution which actually streams the data line by line:

from io import TextIOWrapper
from gzip import GzipFile
...

# get StreamingBody from botocore.response
response = s3.get_object(Bucket=bucket, Key=key)

# if gzipped
gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
data = TextIOWrapper(gzipped)

for line in data:
    # process line
You may find https://pypi.python.org/pypi/smart_open useful for your task.
From documentation:
for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
    print line