First off, JavaScript in the client is probably not the best language for this, nor the best approach. It might work, but it's worth knowing what fits best when choosing an approach to a problem. It will also save you from clicking through ~800 download-confirmation popups.
You can get the files programmatically by learning what your browser does to fetch one file and then reproducing that in bulk.
After inspecting the network calls you can see that the page calls an endpoint, and that endpoint returns a link to the file you can download.
That makes this easy: now you just need a script, in any language, to retrieve them all.
I've chosen JavaScript, but not client-side: Node.js, which means this runs from your own computer.
You could do the same with bash, Python or any other language.
To run this, do the following:
- Go to a new, empty directory
- Run npm install axios
- Create a file with the code pasted below; let's call it crawler.js
- Run node crawler.js
This has been tested using node v8.15.0.
// NOTE: Require this to make a request and save the link as file 20190813:Alevale
const axios = require('axios');
const fs = require('fs');

const now = new Date();
const daysOfYear = [];
const baseUrl = 'https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/';

// Build every date from 2016-01-01 up to today as a YYYY-MM-DD string
for (let d = new Date(2016, 0, 1); d <= now; d.setDate(d.getDate() + 1)) {
  daysOfYear.push(new Date(d).toISOString().substring(0, 10));
}

const waitFor = (time) => {
  return new Promise((resolve) => setTimeout(resolve, time));
};

const getUrls = async () => {
  for (const day of daysOfYear) {
    console.log('getting day', baseUrl + day);
    // NOTE: Throttle the calls to not overload the server 20190813:Alevale
    await waitFor(4000);
    await axios.get(baseUrl + day)
      .then((response) => {
        console.log(response.data);
        if (response.data && response.data.download_url) {
          return response.data.download_url;
        }
        return Promise.reject('Could not retrieve response.data.download_url');
      })
      .then((url) => {
        // Return (and so await) the download too, so `day` still refers
        // to the right date when the file is written
        return axios({
          method: 'get',
          url,
          responseType: 'stream'
        }).then((response) => {
          // NOTE: Save the file as 2019-08-13 20190813:Alevale
          response.data.pipe(fs.createWriteStream(`${day}.csv`));
        });
      })
      .catch((error) => {
        console.log(error);
      });
  }
};

getUrls();
(Answer from Alejandro Vales on Stack Overflow.)
Instead of simulating the user, you can get the download link from: https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/2019-08-07 and just change the date at the end to the date of the file you want, then use axios to GET that URL.
This will save you some time (in case you don't really need to simulate the user's click, etc.).
You will then get a response like this:
{
  "download_url": "https://d3u9ukmkxau9he.cloudfront.net/eod/2019-08-07.csv?Expires=1566226156&Signature=QRUk3tstuNX5KYVPKJSWrXsSXatkWS-eFBIGUufaTEMJ~rgpVi0iPCe1AXl5pbQVdBQxOctpixCbyNz6b9ycDgYNxEdZqPr2o2pDe8cRL655d3zXdICnEGt~dU6p35iMAJkMpPSH~jbewhRSCPUwWXQBfOiEzlHwxru9lPnDfsdSnk3iI3GyR8Oc0ZP50EdUMHF7MjWSBRbCIwnu6wW4Jh0bPmZkQDQ63ms5QxehsmtuGLOgcrC6Ky1OffVQj~ihhmBt4LGhZTajjK4WO18hCP3urKt03qpC4bOvYvJ3pxvRkae0PH1f-vbTWMDkaWHHVCrzqZhkAh3FlvMTWj8D4g__&Key-Pair-Id=APKAIAXOVAEOGN2AYWNQ"
}
and then you can use axios to GET that URL and download your file.
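If you only need one file, the whole flow is just two requests. A minimal sketch (same endpoint and download_url field as above; error handling omitted):
const axios = require('axios');
const fs = require('fs');

// Step 1: ask the endpoint for the signed link.
// Step 2: stream the CSV that link points at to disk.
const day = '2019-08-07'; // any date in YYYY-MM-DD form
axios.get(`https://a4dzytphl9.execute-api.ap-southeast-1.amazonaws.com/prod/eod/${day}`)
  .then(({ data }) => axios.get(data.download_url, { responseType: 'stream' }))
  .then(({ data }) => data.pipe(fs.createWriteStream(`${day}.csv`)));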
Download File from URL
There are a couple of ways to do this. As mentioned, using the developer tools could work (more likely it will give you the URL to the file), and right-clicking the link will work. Alternatively, there are these options.
In Chrome
- Go to the URL
- Right-click the webpage
- Select Save As...
For verification purposes, here are png, jpg, and mp3 links. Follow them and try these steps. However, in my experience, if you already have a URL to a file, opening up Chrome and following these steps is rather tedious, so here is an alternative.
In Command Line
- Open your favorite terminal emulator
- Type curl -O URL
- where the O is a capital letter O
- and URL is the URL to the file, e.g. http://example.com/file.mp3
For PowerShell, this example works great:
Invoke-WebRequest -Uri http://files.animatedsuperheroes.com/themes/spiderman94.mp3 -OutFile "c:\Spiderman94.mp3"
This was confirmed with Win10 x64 1607.
What you could do is set up a certain parameter, for example: https://example.com/?autodownload=1
then add a piece of JavaScript that, either immediately or after some timeout, automatically downloads the file. Something like:
document.addEventListener("DOMContentLoaded", function (event) {
  if (window.location.href.indexOf("autodownload") > -1) {
    setTimeout(function () {
      window.open("https://example.com/docs/some_file.xlsx");
    }, 3000); // your timeout in milliseconds
  }
});
Of course this solution will not work if the browser blocks opening new windows. A second possibility with this approach is to replace the window.open call with a click on the anchor itself, as sketched below.
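A minimal sketch of that second possibility, reusing the same autodownload parameter (the file path is just an example; the download attribute only works for same-origin files):
document.addEventListener("DOMContentLoaded", function () {
  if (window.location.href.indexOf("autodownload") > -1) {
    setTimeout(function () {
      // Click a temporary anchor instead of calling window.open,
      // so popup blocking does not get in the way.
      var a = document.createElement("a");
      a.href = "https://example.com/docs/some_file.xlsx";
      a.download = "some_file.xlsx";
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
    }, 3000); // your timeout in milliseconds
  }
});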
You can trigger a file download with JavaScript by firing click() on an <a> tag with the download attribute. Something like this:
const $link = document.createElement('a');
$link.href = 'docs/some_file.xlsx';
$link.download = 'some_file.xlsx';
$link.click();
Keep in mind, however, that browsers may limit automatic file downloads to prevent sites from doing dodgy things. I've seen this in particular when a site tries to download multiple files, so I think you'll be fine just downloading a single file.
Generally, the best way around this is to ensure you only perform the action in response to user input, such as a click, though that doesn't sound like it would work for the solution you're looking for here; see the sketch below anyway.
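If you can gate it on a click, a small sketch (the button id is a placeholder):
// Browsers treat downloads started inside a real click handler as
// user-initiated, so they are far less likely to be blocked.
document.getElementById('download-btn').addEventListener('click', () => {
  const $link = document.createElement('a');
  $link.href = 'docs/some_file.xlsx';
  $link.download = 'some_file.xlsx';
  $link.click();
});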
Hey guys, I hope this is the right place to ask.
I have a work project that involves using our software (Intelex) to navigate to a list of reports, download the reports, then rename and edit them for different data storage.
What I am hoping to do is find an automated tool to download all the files (1200+ .docx files, each a few pages long).
I have to open each report in Intelex, scroll to the bottom section, open that section, and download the report.
I am hoping there is a tool that can make this download process waayyy faster.
If you guys can help me, I would greatly appreciate it.
Dear User,
Thank you for reaching out to the Microsoft 365 community. To get the file to "download" instead of "open", please modify the download URL like below:
https://<tenant>.sharepoint.com/sites/<site>/_layouts/download.aspx?SourceUrl=<file path>
E.g.
https://contoso.sharepoint.com/sites/siteA/_layouts/download.aspx?SourceUrl=https://contoso.sharepoint.com/sites/siteA/Shared%20Documents/sample.xlsx
For the <file path>, please locate the file on SharePoint > right-click it > Details > scroll to the bottom and copy the path.
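If you need to build such links in code, a small sketch (hypothetical helper; only the SourceUrl value needs URL-encoding):
// Wrap a SharePoint file path in the download.aspx redirect so the
// browser downloads the file instead of opening it.
function toDownloadUrl(siteUrl, filePath) {
  return `${siteUrl}/_layouts/download.aspx?SourceUrl=${encodeURIComponent(filePath)}`;
}

console.log(toDownloadUrl(
  'https://contoso.sharepoint.com/sites/siteA',
  'https://contoso.sharepoint.com/sites/siteA/Shared Documents/sample.xlsx'
));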
We are committed to ensuring you have a smooth experience with Microsoft 365 products, and we appreciate your patience and understanding as we work through this challenge.
If there is anything further we can assist you with in the meantime, please do not hesitate to reach out.
Best regards,
Sophia
Hello Sophia,
And is there a way to create an automatic-download link from Teams, starting from the link that is shared with all the company users?
Three dots >> Copy Link >> Settings >> only people from the organization
I need to find a way to automatically allow downloading such a file without granting access to the Teams site.
&download=1 does not work for me (the file opens but doesn't download itself).
Thanks :)
Part of my job is checking a government website once per week to ensure I have the latest copy of a PDF document downloaded to a hard drive. Is there a way I can automatically download and replace the file each week?
Evidently I was wrong. PythonAnywhere is really easy. (But you must use a paid plan in order to access most outside servers, i.e. to get files from any website you want. As of right now, the cheapest plan they have is $5 a month.)
Stack Overflow isn't a code-writing service, but enjoy this little exception.
Here's a basic implementation of what you've asked for:
We're going to use an HTTP server written in Python, using a framework called Flask. That server will provide two functions:
- serve the basic form you've described
- let us request a file from the backend, which it will return in a way that forces the browser to download it (and not display it, for example)
Step 1
Create a "pythonanywhere" account.
Attention: your website will be hosted on username.pythonanywhere.com, choose wisely.
hit "open web tab"*
Step 2
Select Flask with the latest Python version in "Add a new web app" (Python 3.10 in this case) and hit Next.
Step 3
Verify the website works: click on the link and you should see "Hello from Flask!".
Step 4
Go to the "Files" tab, enter the mysite directory
and create a new directory called "templates"
Step 5
Create a new file called index.html and put this HTML in it (CSS and form taken from w3schools):
<!DOCTYPE html>
<html>
<style>
  input[type=text], select {
    width: 100%;
    padding: 12px 20px;
    margin: 8px 0;
    display: inline-block;
    border: 1px solid #ccc;
    border-radius: 4px;
    box-sizing: border-box;
  }
  input[type=submit] {
    width: 100%;
    background-color: #4CAF50;
    color: white;
    padding: 14px 20px;
    margin: 8px 0;
    border: none;
    border-radius: 4px;
    cursor: pointer;
  }
  input[type=submit]:hover {
    background-color: #45a049;
  }
  div {
    border-radius: 5px;
    background-color: #f2f2f2;
    padding: 20px;
  }
</style>
<body>
  <h3>File Proxy</h3>
  <div>
    <!-- notice the @app.route("/download", ...) in the Flask file -->
    <form method="POST" action="/download">
      <label for="fpath">File Path</label>
      <input type="text" id="fpath" name="fpath" placeholder="Path...">
      <input type="submit" value="Submit">
    </form>
  </div>
</body>
</html>
Hit save
Step 6
Go back to the "Files" tab, and enter mysite directory again.
edit the flask_app.py file
and replace the code in it with the following:
from flask import Flask, request, render_template, Response
import requests

app = Flask(__name__)

content_type_to_extension = {"text/html": ".html"}
FALLBACK_EXTENSION = ".bin"

# get the path from the form, GET it ourselves, and return it with a
# content-type that will force the client's browser to download it
# and not attempt to display it
@app.route("/download", methods=["POST"])
def loadData():
    data = request.form["fpath"]
    r = requests.get(data)
    response = Response(
        r.content,
        status=r.status_code,
        content_type="application/stream"
        # content_type=r.headers['content-type']
    )
    # basic naming attempt: if the URL has a path segment after the last /,
    # assume (maybe falsely, this is a POC) that it already carries a file
    # extension (you can add regex tests and whatnot)
    try:
        fname = data.replace('://', '')
        fname = fname[fname.rindex("/") + 1:]
        assert len(fname) > 1
    except (IndexError, ValueError, AssertionError):
        fname = data.replace("/", "-")
        # if we can't find the correct extension, fall back to .bin
        ext = content_type_to_extension.get(r.headers['content-type'].split(";")[0], FALLBACK_EXTENSION)
        fname += ext
    response.headers["Content-Disposition"] = f'attachment; filename={fname}'
    return response

@app.route("/")
def index():
    return render_template("index.html")  # that html file in /templates

if __name__ == "__main__":
    app.run(host='0.0.0.0')
Hit save
Step 7
Go back to the "web" tab, and hit that green "reload" button. (or the blue reload icon next to the save in the flask_app.py edit panel)
And go to your website.
The requests library requires a schema to be specified, so don't forget the http(s):// prefix, or add it via Python.
If you don't have a paid account, checking the error logs in the "Web" tab shows
OSError('Tunnel connection failed: 403 Forbidden')
or
Network is unreachable
so you've got to go with a paid plan, or find some other way to host your web server.
For the record, what we're doing here is creating a "dumbed-down" web GUI version of a CLI HTTP client like wget or curl.
You could just run wget <your-url> in bash / PowerShell and it would be basically the same.
This code can work if you want to download a page from the same domain.
For example, you can download https://example.com/main.html if your domain is https://example.com/.
However, if your domain is https://example2.org/, it does not work.
The reason you cannot download arbitrary pages from the internet is that the browser uses CORB/CORS for safety reasons.
<head>
  <script src="https://requirejs.org/docs/release/2.3.5/minified/require.js"></script>
</head>
<button type="button" id="down">Download</button>
<input id="InputURL" placeholder="The URL you want to download." style="width:100%">
<p id="HiddenLable" style="">
  The HTML
</p>
<script language="javascript">
  var xmlHttp; // shared between viewweb() and Callback()

  document.getElementById('down').onclick = function () {
    viewweb(document.getElementById('InputURL').value);
  };

  // Fetch the page at TheLink with a (synchronous) XMLHttpRequest
  function viewweb(TheLink) {
    if (window.ActiveXObject) // old IE
      xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
    else if (window.XMLHttpRequest) // other browsers
      xmlHttp = new XMLHttpRequest();
    xmlHttp.onreadystatechange = Callback;
    xmlHttp.open("GET", TheLink, false); // read
    xmlHttp.send(null);
  }

  function Callback() {
    if (xmlHttp.readyState == 4) {
      if (xmlHttp.status == 200) {
        var value = xmlHttp.responseText;
        document.getElementById("HiddenLable").innerHTML = value;
        download("DownLoad.html", value);
      }
    }
  }

  // Save `content` as a local file named `filename` via a temporary blob URL.
  // Do not call viewweb() from here; the value may not be downloaded yet.
  function download(filename, content) {
    var blob = new Blob([content], { type: 'text/plain' });
    var url = window.URL.createObjectURL(blob);
    var a = document.createElement('a');
    a.style = "display: none";
    a.href = url;
    a.download = filename;
    document.body.appendChild(a);
    a.click();
    setTimeout(function () {
      document.body.removeChild(a);
      window.URL.revokeObjectURL(url);
    }, 5);
  }
</script>
If you try to download an HTML page from a different domain, you will see this in the console:
Access to XMLHttpRequest at 'https://******/' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
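One way around this, as in the Node.js answer earlier, is to fetch the page outside the browser, where the same-origin policy does not apply. A minimal sketch:
// Node.js has no same-origin policy, so a cross-domain GET just works.
const axios = require('axios');
const fs = require('fs');

axios.get('https://example.org/')
  .then((response) => fs.writeFileSync('DownLoad.html', response.data))
  .catch(console.error);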
I'm a baby IT admin and we've got an unusual situation. Basically, we have an awful beast of a database website we can query for reports. I can build a report and create a static URL that, if opened in a browser, will automatically download the report as a file (CSV, PDF, etc.). But it's not a file path; on a cursory look I believe it's a PHP script or something.
Ideally, I'd want to use curl and set up a cron job or something, but I can't get past the first step: how do I grab that file in the terminal? I'm trying to automatically download reports to an Ubuntu server.
I was thinking maybe of using Selenium and running a browser headless or something, but I feel like that's overkill. Any help pointing in the right direction would be appreciated.
EDIT: Solution was putting the URL in quotes. Derp. Thank you all for your help!
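For anyone with the same setup, the working shape was roughly this (hypothetical path and URL; the quotes are the fix, keeping the shell from interpreting & and ? in the query string):
# weekly crontab entry: every Monday at 06:00, fetch and overwrite the stored copy
0 6 * * 1 curl -L -o /srv/reports/latest.csv "https://dbhost.example/reports.php?id=42&format=csv"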