No, there isn't a way to direct S3 to fetch a resource, on your behalf, from a non-S3 URL and save it in a bucket.
The only "fetch"-like operation S3 supports is the PUT/COPY operation, where S3 supports fetching an object from one bucket and storing it in another bucket (or the same bucket), even across regions, even across accounts, as long as you have a user with sufficient permission for the necessary operations on both ends of the transaction. In that one case, S3 handles all the data transfer, internally.
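For the S3-to-S3 case, a minimal sketch of the copy request might look like the following. It assumes an AWS SDK for JavaScript v2 client passed in as `s3`; the helper names and the bucket and key values are hypothetical. Note that a single copy request is limited to objects of up to 5 GB; larger objects need a multipart copy.

```javascript
// Build the parameters for a server-side copy. CopySource is
// "<bucket>/<key>" with the key URL-encoded; slashes are kept so the
// key path stays readable.
function buildCopyParams(sourceBucket, sourceKey, targetBucket, targetKey) {
  const encodedKey = sourceKey.split('/').map(encodeURIComponent).join('/');
  return {
    Bucket: targetBucket,
    Key: targetKey,
    CopySource: `${sourceBucket}/${encodedKey}`,
  };
}

// `s3` is assumed to be an AWS SDK v2 client, e.g. new AWS.S3().
// S3 moves the bytes itself; nothing is downloaded to the caller.
async function copyWithinS3(s3, sourceBucket, sourceKey, targetBucket, targetKey) {
  const params = buildCopyParams(sourceBucket, sourceKey, targetBucket, targetKey);
  return s3.copyObject(params).promise();
}
```

The credentials behind `s3` need read access to the source object and write access to the target bucket, as described above.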
Otherwise, the only way to take a remote object and store it in S3 is to download the resource and then upload it to S3 -- however, there's nothing preventing you from doing both things at the same time.
To do that, you'll need to write some code, presumably using either asynchronous I/O or threads, so that you can receive a stream of downloaded data and upload it at the same time, probably in symmetric chunks, using S3's Multipart Upload capability. Multipart upload lets you write individual chunks (minimum 5 MB each, except the last) which, with a final request, S3 will validate and consolidate into a single object of up to 5 TB. It also supports parallel upload of chunks and allows your code to retry any failed chunks without restarting the whole job, since the individual chunks don't have to be uploaded or received by S3 in linear order.
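As a rough illustration of the chunking arithmetic, here is a sketch that picks a part size and lists the parts for an object of known size. The function names and the 8 MiB default are assumptions of mine; the 5 MiB minimum and the 10,000-part ceiling are S3's documented multipart limits.

```javascript
const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB; // S3 minimum for every part except the last
const MAX_PARTS = 10000;       // S3 maximum number of parts per upload

// Pick a part size: the preferred size, raised as needed so the object
// fits in MAX_PARTS parts and no part falls below MIN_PART_SIZE.
function choosePartSize(totalBytes, preferredBytes = 8 * MIB) {
  const neededForLimit = Math.ceil(totalBytes / MAX_PARTS);
  return Math.max(MIN_PART_SIZE, preferredBytes, neededForLimit);
}

// List the parts as { partNumber, start, end } with inclusive byte offsets.
// Part numbers start at 1, as S3 requires.
function planParts(totalBytes, partSize = choosePartSize(totalBytes)) {
  const parts = [];
  for (let start = 0; start < totalBytes; start += partSize) {
    parts.push({
      partNumber: parts.length + 1,
      start,
      end: Math.min(start + partSize, totalBytes) - 1,
    });
  }
  return parts;
}
```

Each entry can then be uploaded (and retried) independently, because S3 identifies a chunk by its part number rather than by arrival order.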
If the origin supports HTTP range requests, you wouldn't necessarily even need to receive a "stream": you could discover the size of the object and then GET chunks by range and multipart-upload them. Do this with threads or async I/O handling multiple ranges in parallel, and you will likely be able to copy an entire object faster than you can download it in a single monolithic download, depending on the factors limiting your download speed.
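A sketch of the range-based variant, under some assumptions: the object size is already known (for instance from the `Content-Length` of a HEAD request), and `fetchRange` and `uploadPart` are hypothetical callbacks you supply, wrapping your HTTP client and an S3 part upload respectively. Only the range arithmetic and the bounded parallelism are shown.

```javascript
// Format an inclusive byte range as an HTTP Range header value.
function rangeHeader(start, end) {
  return `bytes=${start}-${end}`;
}

// Split [0, totalBytes) into inclusive ranges of at most partSize bytes.
function splitRanges(totalBytes, partSize) {
  const ranges = [];
  for (let start = 0; start < totalBytes; start += partSize) {
    ranges.push({
      partNumber: ranges.length + 1,
      start,
      end: Math.min(start + partSize, totalBytes) - 1,
    });
  }
  return ranges;
}

// Copy all ranges with at most `concurrency` transfers in flight.
// fetchRange(headerValue) -> Promise<Buffer> and
// uploadPart(partNumber, body) -> Promise<etag> are hypothetical callbacks.
async function copyByRanges({ totalBytes, partSize, concurrency, fetchRange, uploadPart }) {
  const queue = splitRanges(totalBytes, partSize);
  const completed = [];
  async function worker() {
    // Each worker pulls the next pending range until the queue is empty.
    for (let r = queue.shift(); r !== undefined; r = queue.shift()) {
      const body = await fetchRange(rangeHeader(r.start, r.end));
      const etag = await uploadPart(r.partNumber, body);
      completed.push({ PartNumber: r.partNumber, ETag: etag });
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  // The final "complete" request lists parts in ascending PartNumber order.
  return completed.sort((a, b) => a.PartNumber - b.PartNumber);
}
```

The returned list has the shape the final multipart "complete" request needs; retry logic for a failed range would go inside the worker loop.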
I've achieved aggregate speeds in the range of 45 to 75 Mbits/sec while uploading multi-gigabyte files into S3 from outside of AWS using this technique.
Answer from Michael - sqlbot on Stack Overflow
I answered this in another question; here's the gist:
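require 'aws-sdk-s3'
require 'open-uri'

object = Aws::S3::Object.new(bucket_name: 'target-bucket', key: 'target-key')
# upload_stream performs a multipart upload fed from the write stream
object.upload_stream do |write_stream|
  IO.copy_stream(URI.open('http://example.com/file.ext'), write_stream)
end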
This is not a 'direct' pull by S3, though. At least it doesn't download each file and then upload it in serial; it streams the data 'through' the client. If you run the above on an EC2 instance in the same region as your bucket, I believe this is as 'direct' as it gets, and as fast as a direct pull would ever be.
Best way to upload files to S3 from front-end web app
What is the best way to upload files from a front end web app to S3? The way I currently have my infrastructure is:
- User submits a POST request to my /uploads route
- The /uploads route in API Gateway has an authorizer in place to check for authentication and then directs traffic to my Lambda function
- My Lambda function generates a pre-signed URL and returns it to the front end
- Front end takes the pre-signed URL and uploads files to the S3 bucket
The problem I have is that I basically have no security (no checks on file type, size, etc.).
Anyone who has this URL can use it to upload a file.
Besides setting bucket policies, is there a better way?
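One direction worth exploring for the flow above is a presigned POST rather than a presigned PUT URL, because a POST policy can carry conditions that S3 enforces at upload time. The sketch below only builds the parameters; the function name, the defaults, and the commented-out call are my assumptions, based on `createPresignedPost` in the AWS SDK for JavaScript v2. S3 checks the declared size and `Content-Type` field, not the actual file contents, so server-side validation after upload may still be needed.

```javascript
// Build parameters for a presigned POST whose limits S3 itself enforces.
function buildPresignedPostParams(bucket, key, contentType, maxBytes, expiresSeconds = 300) {
  return {
    Bucket: bucket,
    Expires: expiresSeconds, // how long the form stays valid, in seconds
    // Fields are signed as exact-match conditions, so the client cannot
    // change the key or the declared content type.
    Fields: { key, 'Content-Type': contentType },
    Conditions: [
      // Reject uploads outside this size range (in bytes).
      ['content-length-range', 1, maxBytes],
    ],
  };
}

// Hypothetical usage inside the Lambda function, with an AWS SDK v2 client:
// s3.createPresignedPost(params, (err, data) => {
//   // return data.url and data.fields to the front end
// });
```

The front end then submits a multipart/form-data POST to the returned URL with the returned fields plus the file itself.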
It sounds like you want S3 itself to download the file from a remote server where you only pass the URL of the resource to S3.
This is not currently supported by S3.
It needs an API client to actually transfer the content of the object to S3.
I thought I should share the code I used to achieve something similar. I was working on the backend; you could possibly do something similar in the frontend, but be mindful that your AWS credentials are likely to be exposed there.
For my purposes, I wanted to download a file from an external URL and ultimately get back the S3 URL of the uploaded file.
I also used axios to fetch the file in an uploadable format and file-type to detect the file's MIME type, but neither is a requirement.
Below is a snippet of my code:
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const axios = require('axios');
const FileType = require('file-type'); // assumes a version that exposes fromBuffer

const s3 = new AWS.S3();
const BUCKET_NAME = 'your-bucket-name'; // placeholder

async function uploadAttachmentToS3(type, buffer) {
  const params = {
    // you can derive the file name from the URL or pass it in as a parameter
    Key: 'yourfolder/directory/filename',
    Body: buffer,
    Bucket: BUCKET_NAME,
    ContentType: type,
    ACL: 'public-read' // becomes a public URL
  }
  // notice use of the upload function, not the putObject function
  try {
    const response = await s3.upload(params).promise()
    return response.Location
  } catch (err) {
    return { type: 'error', err: err }
  }
}

async function downloadAttachment(url) {
  try {
    const response = await axios.get(url, { responseType: 'arraybuffer' })
    // with responseType 'arraybuffer' the data is raw bytes, not base64
    const buffer = Buffer.from(response.data)
    // fromBuffer resolves to undefined when the type is not recognized
    const detected = await FileType.fromBuffer(buffer)
    const type = detected ? detected.mime : 'application/octet-stream'
    return uploadAttachmentToS3(type, buffer)
  } catch (err) {
    return { type: 'error', err: err }
  }
}

// call from inside an async function (or an ES module with top-level await)
let myS3Url = await downloadAttachment(url)
I hope it helps people who still struggle with similar issues. Good luck!