SuckIT
SuckIT allows you to recursively visit and download a website's content to your disk.
suckit 0.2.0
CLI arguments
USAGE:
suckit [FLAGS] [OPTIONS] <url>
FLAGS:
-c, --continue-on-error Flag to enable or disable exit on error
--disable-certs-checks Dissable SSL certificates verification
--dry-run Do everything without saving the files to the disk
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Enable more information regarding the scraping process
--visit-filter-is-download-filter Use the dowload filter in/exclude regexes for visiting as well
OPTIONS:
-a, --auth <auth>...
HTTP basic authentication credentials space-separated as "username password host". Can be repeated for
multiple credentials as "u1 p1 h1 u2 p2 h2"
--delay <delay>
Add a delay in seconds between downloads to reduce the likelihood of getting banned [default: 0]
-d, --depth <depth>
Maximum recursion depth to reach when visiting. Default is -1 (infinity) [default: -1]
-e, --exclude-download <exclude-download>
Regex filter to exclude saving pages that match this expression [default: $^]
--exclude-visit <exclude-visit>
Regex filter to exclude visiting pages that match this expression [default: $^]
--ext-depth <ext-depth>
Maximum recursion depth to reach when visiting external domains. Default is 0. -1 means infinity [default:
0]
-i, --include-download <include-download>
Regex filter to limit to only saving pages that match this expression [default: .*]
--include-visit <include-visit>
Regex filter to limit to only visiting pages that match this expression [default: .*]
-j, --jobs <jobs> Maximum number of threads to use concurrently [default: 1]
-o, --output <output> Output directory
--random-range <random-range>
Generate an extra random delay between downloads, from 0 to this number. This is added to the base delay
seconds [default: 0]
-t, --tries <tries> Maximum amount of retries on download failure [default: 20]
-u, --user-agent <user-agent> User agent to be used for sending requests [default: suckit]
ARGS:
<url> Entry point of the scraping
This is an unofficial package of the SuckIT source which can be found at https://github.com/Skallwar/suckit. Made using the snap configuration found at https://github.com/popey/suckit-snap