Hit Ctrl+S and save it as an HTML file (not MHTML). Then, in the <head> tag, add a <base href="http://downloaded_site's_address.com"> tag. For this webpage, for example, it would be <base href="http://stackoverflow.com">.
This makes sure that all relative links point back to where they're supposed to instead of to the folder you saved the HTML file in, so all of the resources (CSS, images, JavaScript, etc.) load correctly instead of leaving you with just HTML.
See MDN for more details on the <base> tag.
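If you'd rather not edit the file by hand, the tag can be spliced in with sed. This is a sketch, assuming the saved page has a literal `<head>` tag; the file names and the stackoverflow.com URL are example values:

```shell
# Create a stand-in for a saved page (in practice this is your Ctrl+S output)
printf '<html><head><title>t</title></head><body>hi</body></html>\n' > page.html

# Splice a <base> tag in right after the opening <head> tag
sed 's|<head>|<head><base href="http://stackoverflow.com">|' page.html > patched.html

# Show the injected tag
grep -o '<base[^>]*>' patched.html
```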
The HTML, CSS, and JavaScript are sent to your computer when you request them over HTTP (for instance, when you enter a URL in your browser), so you have those parts and can replicate them on your own PC or server. But if the website has server-side code (databases, some type of authentication, etc.), you will not have access to it, and therefore won't be able to replicate it on your own PC/server.
I built a fairly simple website for a business that showcases its work online in several categories. It uses ColdFusion to generate a lot of sub-pages for each category by reading the file system and merging the results with metadata supplied by a spreadsheet file.
I'm finally fed up with my CF service (not to mention, who programs in CF anymore?) and I'm looking to get away from it entirely. In the end this website is static, so I'm looking for a good way to scrape the entire rendered site down for use on an alternative static server. What's a good way to do this nowadays? I've done some searching, but "site scraper" now seems to mean tools for scraping pricing metadata rather than the actual website files.
HTTrack works like a champ for copying the contents of an entire site. This tool can even grab the pieces needed to make a website with active code content work offline. I am amazed at the stuff it can replicate offline.
This program will do all you require of it.
Happy hunting!
Wget is a classic command-line tool for this kind of task. It comes with most Unix/Linux systems, and you can get it for Windows too. On a Mac, Homebrew is the easiest way to install it (brew install wget).
You'd do something like:
wget -r --no-parent http://example.com/songs/
For more details, see Wget Manual and its examples, or e.g. these:
wget: Download entire websites easy
Wget examples and scripts
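To see the recursive options work end to end without hammering a live site, you can mirror a tiny local server. This is just a sketch: the port number, directory names, and the availability of python3 are all assumptions here.

```shell
# Build a one-page "site" and serve it locally (port is an arbitrary example)
mkdir -p site
echo '<html><body><p>hello</p></body></html>' > site/index.html
python3 -m http.server 8731 --directory site >/dev/null 2>&1 &
SERVER=$!
sleep 1

# -r recurse, --no-parent stay below the start URL,
# -p fetch page requisites (CSS/images/JS), -k rewrite links for local viewing,
# -P put everything under ./mirror
wget -q -r --no-parent -p -k -P mirror http://localhost:8731/

kill $SERVER
ls mirror/localhost:8731/index.html
```

wget writes the mirror under a directory named after the host, so the page lands in `mirror/localhost:8731/index.html`; against a real site it would be `mirror/example.com/...`.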
If you're on a Mac, install Homebrew, then use brew to install wget, and finally use the wget command to download the page.
wget -p -k -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US;rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' https://www.google.com
On the web page, view its source, then copy it and paste it into Notepad. For the other assets, open the CSS and JS links from the page source and save each one according to its file extension.
wget -erobots=off --no-parent --wait=3 --limit-rate=20K -r -p -U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" -A htm,html,css,js,json,gif,jpeg,jpg,bmp http://example.com
This runs in the console. It will grab a site, wait 3 seconds between requests, limit its download rate so it doesn't kill the site, and mask itself as a browser so the site doesn't cut you off with an anti-leech mechanism.
Note the -A parameter that indicates a list of the file types you want to download.
You can also use another option, -D domain1.com,domain2.com, to indicate a series of domains you want to download from if the site uses other servers to host different kinds of files. There's no safe way to automate that for all cases if you don't get the files.
wget is commonly preinstalled on Linux, but can be trivially compiled for other Unix systems or downloaded easily for Windows: GNUwin32 WGET
Use this for good and not evil.
Good, Free Solution: HTTrack
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.
It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
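Besides the GUI, HTTrack also ships a command-line mode. The snippet below only prints a typical invocation rather than running it (mirroring needs network access and an installed httrack); the URL and output directory are example values:

```shell
# Typical HTTrack command-line mirror: fetch the site into ./mirror, verbose.
# Printed here instead of executed; substitute your own URL and output path.
CMD='httrack "https://www.example.com/" -O ./mirror -v'
echo "$CMD"
```

Run the printed command in a terminal; like the GUI, it builds a browsable local copy under the `-O` directory.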