WebLech is a fully featured website download/mirror tool written in Java. It supports many of the features needed to download websites and emulates standard web-browser behaviour as closely as possible. WebLech is multithreaded and will feature a GUI console.
Similar in some respects to tools such as wget (in recursive retrieval mode), WebSuck or Teleport Pro, WebLech allows you to "spider" a website and recursively download all the pages on it. You can then browse the site offline at your convenience, or even "mirror" the website and re-publish it yourself. Note that WebLech is not suited to downloading single URLs -- use wget for that kind of thing.
Features
WebLech has a number of features that make it useful:
* Open Source MIT Licence means it's totally free and you can do what you want with it
* Pure Java code means you can run it on any Java-enabled computer
* Multi-threaded operation for downloading lots of files at once
* Supports basic HTTP authentication for accessing password-protected sites
* HTTP referer support maintains link information between pages (needed to spider some websites)
* Lots of configuration options:
    o Depth-first or breadth-first traversal of the site
    o Candidate URL filtering, so you can stick to one web server, one directory, or just spider the whole web
    o Configurable caching of downloaded files allows restart without needing to download everything again
    o URL prioritisation, so you can get interesting files first and leave boring files till last (or ignore them completely)
    o Checkpointing so you can snapshot spider state in the middle of a run and restart without lots of processing
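
To make the traversal and filtering options above more concrete, here is a minimal Java sketch of how a spider's queue of pending URLs might combine them. It is not taken from the WebLech source and does not show WebLech's configuration format. The class SpiderFrontier, its methods, and the example.com host and /docs/ path are hypothetical names chosen for illustration. Breadth-first order is assumed to mean "oldest queued URL first" and depth-first to mean "most recently discovered URL first".

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from WebLech.
class SpiderFrontier {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    private final boolean depthFirst;
    private final String allowedHost;        // assumed filter: one web server
    private final String allowedPathPrefix;  // assumed filter: one directory

    SpiderFrontier(boolean depthFirst, String allowedHost, String allowedPathPrefix) {
        this.depthFirst = depthFirst;
        this.allowedHost = allowedHost;
        this.allowedPathPrefix = allowedPathPrefix;
    }

    // Candidate URL filter: accept only URLs on the allowed host
    // whose path starts with the allowed directory prefix.
    boolean accepts(String url) {
        try {
            URI uri = URI.create(url);
            String host = uri.getHost();
            String path = uri.getPath();
            return host != null && host.equalsIgnoreCase(allowedHost)
                    && path != null && path.startsWith(allowedPathPrefix);
        } catch (IllegalArgumentException e) {
            return false; // malformed URLs are simply skipped
        }
    }

    // Queue a discovered URL if it passes the filter and has not been
    // queued before. Returns true if the URL was queued.
    boolean offer(String url) {
        if (!accepts(url) || !seen.add(url)) {
            return false;
        }
        pending.addLast(url);
        return true;
    }

    // Next URL to download, or null when nothing is left.
    // Depth-first takes the newest URL, breadth-first the oldest.
    String next() {
        return depthFirst ? pending.pollLast() : pending.pollFirst();
    }

    public static void main(String[] args) {
        SpiderFrontier frontier = new SpiderFrontier(false, "example.com", "/docs/");
        frontier.offer("http://example.com/docs/a.html");
        frontier.offer("http://example.com/docs/b.html");
        frontier.offer("http://other.org/docs/c.html"); // rejected by the filter
        for (String url = frontier.next(); url != null; url = frontier.next()) {
            System.out.println(url);
        }
    }
}
```

A real spider would call offer for every link found in a downloaded page and loop on next until it returns null. Passing true as the first constructor argument switches the same queue to depth-first order.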
WebLech URL Spider Home Page
http://weblech.sourceforge.net/
Download WebLech URL Spider
http://prdownloads.sourceforge.net/weblech/weblech-0.0.3.tar.gz?download