Archive.org is a fantastic source for all kinds of data. It even makes it convenient to download by supplying a .torrent with every submission!
The only problem is that the torrents almost never work correctly on archives with lots of files.
It’s a good thing Archive made a Python tool to download and upload directly from their servers!
However, on many of the archives I’ve tried, it fails to download files surprisingly often. In the tool’s progress output, each file is represented by a letter, and ‘e’ means that there was an error of some kind. It’s not a problem with my internet, because some archives download fine and some just don’t.
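As an aside, I believe the tool in question is the official internetarchive Python package (which also installs an "ia" command-line client). If the downloads keep erroring out, here's a minimal sketch, assuming the package's download() helper and its retry options ("example-item" is a placeholder identifier):

```python
from internetarchive import download  # pip install internetarchive

# Download every file in an item, retrying transient errors and
# skipping files that already finished on a previous run.
download(
    "example-item",        # placeholder: the part after /details/ in the item URL
    verbose=True,
    retries=5,             # retry transient server errors a few times
    ignore_existing=True,  # don't re-download files that already completed
)
```

Because of ignore_existing, re-running the script after a failure picks up where it left off, which can work around sporadic errors.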
Here’s how you can download files from Archive consistently, without any problems.
-
Firstly, install Internet Download Manager, if you don’t have it already.
-
Then, using Chrome, install Linkclump. In its Options menu, create a new action (or edit the existing action, if there is one) and set Right mouse button to Copy selected links to clipboard.
-
Next, open the Archive page you want to download from. In Download Options, select Show All. Right-click and drag to select everything you want to download in the list (Linkclump copies the links to your clipboard), then paste them into Notepad and save it as a text document.
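If you'd rather not select the links by hand, here's a sketch that builds the same text file from an item's file list, assuming the internetarchive package and the standard archive.org download URL layout:

```python
import urllib.parse
from internetarchive import get_item  # pip install internetarchive

# "example-item" is a placeholder: the part after /details/ in the item URL.
item = get_item("example-item")

# One direct download URL per line, ready for IDM's Tasks > Import > From text file.
with open("archive_links.txt", "w") as out:
    for f in item.files:
        name = urllib.parse.quote(f["name"])  # escape spaces etc. in file names
        out.write(f"https://archive.org/download/{item.identifier}/{name}\n")
```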
-
Finally, open Internet Download Manager. Under Tasks, select Import > From text file and select the .txt document with all the Archive links. It will show a list of the links it found and prompt you to check which ones you want to download. Click Check All, then manually uncheck any files you don’t want.
-
Once you click OK, it will add all the links to your download queue and begin downloading them. It may freeze after about a hundred links or so; it hasn’t crashed, just be patient.
Thanks for reading! If you have any issues, reply to this post or PM me and I’ll try my best to help. Now get hoarding!
That's a nice trick.
I've been using Internet Download Manager for a while now (a fantastic and very versatile tool for Windows), and in the past I'd just copy and paste links into a txt document manually. I never knew about anything like Linkclump; this would definitely help.
AAAAAAAAA.
I desperately needed this just a few weeks ago, for a small side-project. What's done is done, though. Thanks. :)
How to Download Archive.org Books as PDFs You Can Keep
On archive.org, some books can only be borrowed for 1 hour. These books don't have the usual download button. Does anyone know how to download these kinds of books?
How do I download a book from archive.org?
Here's a bookmarklet method to download borrowed "scanned" books from archive.org if anyone's interested. Naturally make sure to delete the files once your book borrow time has expired!
https://gist.github.com/cemerson/043d3b455317d762bb1378aeac3679f3
Is it possible to download books from Internet Archive?
A. How to Get Archive Download Links even when no download links are available
This (usually)† works so long as you can "Borrow" (and in particular, doesn't require that you be able to "Borrow for 14 days").
Click on "Borrow" (you must be logged into Archive).
Let's say the book link is https://archive.org/details/XXXXX/
Then simply go to https://archive.org/services/loans/loan/?action=media_url&identifier=XXXXX&format=lcp_pdf&redirect=1 and your browser should download the LCPL file.
(Unfortunately this doesn't seem to work for books that you can't "Borrow".)
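Since the loan URL is just a template, here's a tiny sketch that builds it for you. The lcp_loan_url helper is my own, not part of any Archive API; you still have to be logged in and have clicked "Borrow", so open the printed URL in that same browser:

```python
def lcp_loan_url(identifier: str) -> str:
    # identifier is the XXXXX part of https://archive.org/details/XXXXX/
    return (
        "https://archive.org/services/loans/loan/"
        f"?action=media_url&identifier={identifier}&format=lcp_pdf&redirect=1"
    )

print(lcp_loan_url("XXXXX"))  # open the printed URL in your logged-in browser
```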
B. How to Convert Archive LCP Files to PDFs that you can keep
Pre-reqs:
I. Download and install Thorium
II. Download and install Python
III. Download "lcpdedrm.py" https://drive.google.com/drive/folders/16czfYfWp-UY4MaUZ15azNVHSK2SEe_5h
Now,
In Archive, download the "LCPL" file (either as described above or, if a download link is available, by clicking the download LCPDF link).
(You might have to wait a few minutes before this step.) Open Thorium, then drag and drop the downloaded LCPL file into it.
Click the three dots (⋮) > "Save as" and save the file as an LCPDF file (e.g. "abc-def-123.lcpdf"; it should be saved as an LCPDF automatically, but if it isn't, rename it so it ends in ".lcpdf"). Ensure there are no special characters such as colons ":" or extra periods "." in the file name, apart from the extension's dot (to be sure, you can just rename the LCPDF to e.g. "x.lcpdf"; a small rename helper is sketched after these steps).
Run the Python script lcpdedrm.py and follow the instructions. In particular, select the file "abc-def-123.lcpdf" (and NOT the original LCPL file). Enter the passphrase (which should be the Archive email account you used to download the LCP file).
You should get an output folder "abc-def-123" (this folder will be in the same folder that contains "abc-def-123.lcpdf"). This output folder will contain the PDF "XXXXX.pdf" (and also possibly a JPG with the cover).
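If the file-name rule above trips you up, here's a small helper for the rename step. The sanitize_lcpdf function is my own sketch, not part of lcpdedrm.py:

```python
import re
from pathlib import Path

def sanitize_lcpdf(path_str: str) -> Path:
    """Rename an .lcpdf file so its base name has no colons, periods, etc."""
    path = Path(path_str)
    safe_stem = re.sub(r"[^A-Za-z0-9_-]", "-", path.stem)  # replace unsafe chars
    return path.rename(path.with_name(safe_stem + ".lcpdf"))  # returns the new path

print(sanitize_lcpdf("My Book. Vol.1.lcpdf"))  # -> My-Book--Vol-1.lcpdf
```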
Optional: Now upload the PDF book you downloaded to LibGen and everywhere too!
Having trouble? Ensure that you've VERY carefully followed ALL of the above instructions to the letter. Try again, slowly and carefully. If you're still having trouble, post a comment sharing your problem and the Archive link (so I can try it out).
†For some files, you may get the following message when trying to download the LCP file (and so unfortunately this method won't work): error: "lcp_pdf unavailable". (I believe this error arises for most books that were uploaded to Archive recently, say 2025 or later).
Edit (2025-10-30): Here's another method that's much easier than the above: Internet Archive Downloader (extensions for Firefox, Edge, and Chrome).
So far, I've tried this new method on only one book: the quality is similar to the LCPDF method above, but note that the file size is about 10x larger (166 MB vs 15 MB).
I tried different ways to download a site and finally found the Wayback Machine downloader, which was built by Hartator (so all credit goes to him), but I simply did not notice his comment on the question. To save you time, I decided to add the wayback_machine_downloader gem as a separate answer here.
The site at http://www.archiveteam.org/index.php?title=Restoring lists these ways to download from archive.org:
- Wayback Machine Downloader, a small tool in Ruby to download any website from the Wayback Machine. Free and open-source. My choice! As of 2025, the original project hadn't received updates for 3 years and had stopped working for at least some sites; StrawberryMaster/wayback-machine-downloader is a fork that worked better (see the usage sketch after this list).
- Warrick - Main site seems down.
- Wayback downloaders - a service that will download your site from the Wayback Machine and even add a plugin for WordPress. Not free.
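For the gem, I believe basic usage is just the site URL (output goes to a ./websites/ folder by default, if I remember the gem correctly). A sketch invoking it from Python:

```python
import subprocess

# Assumes the Ruby gem is installed, e.g.: gem install wayback_machine_downloader
# (or the StrawberryMaster fork mentioned above).
subprocess.run(
    ["wayback_machine_downloader", "http://example.com"],  # placeholder site
    check=True,  # raise if the downloader exits with an error
)
```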
This can be done using a bash shell script combined with wget.
The idea is to use some of the URL features of the wayback machine:
- http://web.archive.org/web/*/http://domain/* will list all saved pages from http://domain/ recursively. It can be used to construct an index of pages to download and avoid heuristics to detect links in webpages. For each link, there is also the date of the first version and the last version.
- http://web.archive.org/web/YYYYMMDDhhmmss*/http://domain/page will list all versions of http://domain/page for year YYYY. Within that page, specific links to versions can be found (with exact timestamp).
- http://web.archive.org/web/YYYYMMDDhhmmssid_/http://domain/page will return the unmodified page http://domain/page at the given timestamp. Notice the id_ token.
These are the basics to build a script to download everything from a given domain.
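The answer above describes this with bash and wget; here's the same idea as a hedged Python sketch. Instead of scraping the /web/*/ listing pages, it uses the Wayback Machine's CDX API (a machine-readable version of that same index), then fetches each page through an id_ URL. The domain and the naive URL-to-path mapping are placeholders:

```python
import os
import urllib.parse
import urllib.request

DOMAIN = "example.com"  # placeholder

# CDX API: machine-readable version of http://web.archive.org/web/*/DOMAIN/*
cdx = "http://web.archive.org/cdx/search/cdx?" + urllib.parse.urlencode({
    "url": DOMAIN + "/*",
    "fl": "timestamp,original",
    "filter": "statuscode:200",
    "collapse": "urlkey",  # one capture per distinct URL
})
with urllib.request.urlopen(cdx) as resp:
    lines = resp.read().decode().splitlines()

for line in lines:
    timestamp, original = line.split(" ", 1)
    # The id_ token returns the raw page, without the Wayback toolbar.
    snapshot = f"http://web.archive.org/web/{timestamp}id_/{original}"
    path = urllib.parse.urlparse(original).path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"  # naive mapping; ignores query strings
    local = os.path.join(DOMAIN, path)
    os.makedirs(os.path.dirname(local) or ".", exist_ok=True)
    try:
        urllib.request.urlretrieve(snapshot, local)
        print("saved", local)
    except Exception as exc:
        print("failed", snapshot, exc)  # keep going on individual failures
```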