The latest update to WebCopy has just been released, and includes two new features which expand the usefulness of the product.
These features are considered experimental at this stage - they haven't been tested as fully as other features, and as a result they might not work properly or might have unintended side effects.
Multiple Hosts
One of the odder omissions in WebCopy was that it wouldn't crawl other hosts. You could copy subdomains, but what if you used a CDN with a completely different domain name? Fortunately, that gap has now been closed. The Additional Hosts configuration page lets you specify additional domains to crawl.
Now, when WebCopy finds an external URI, it will check to see if the domain is listed as safe to crawl. If it is, it will promptly download the linked resource, and then attempt to scan it for further links, and expand from there.
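As a rough illustration of the check described above, here is a minimal Python sketch. It is not WebCopy's actual code: the function name `is_safe_to_crawl`, its parameters, and the exact-match comparison of host names are all assumptions made for the example, and it expects absolute URIs.

```python
from urllib.parse import urlsplit


def is_safe_to_crawl(uri, primary_host, additional_hosts):
    """Return True if the URI's host is the primary host or a listed additional host.

    Hypothetical helper: compares host names exactly and case-insensitively.
    """
    host = (urlsplit(uri).hostname or "").lower()
    if not host:
        # No host component (for example a mailto: link), so nothing to crawl.
        return False
    allowed = {primary_host.lower()} | {h.lower() for h in additional_hosts}
    return host in allowed


# Example: a site that serves its images from a CDN on a different domain.
hosts = ["cdn.example.net"]
print(is_safe_to_crawl("https://cdn.example.net/img/logo.png", "www.example.com", hosts))
print(is_safe_to_crawl("https://other.example.org/page.html", "www.example.com", hosts))
```

A crawler built this way would download and scan any resource for which the check passes, and skip everything else.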
As these additional hosts can be jumped into from any level, some project settings won't apply to the additional hosts - for example the Crawl Above Root setting. Therefore it is important to make sure you use rules to control how content is downloaded.
Proxy Server Support
Previously, WebCopy would always use the system-defined proxy server settings. Now you can configure your own independent settings on a per-project basis, so that all requests made during a crawl are sent via the proxy you specify.
Odds and ends
With these features being new and only tested in a limited fashion, there could be bugs or side effects - please let us know if you experience any problems.
As is usual for these updates, there are also a handful of bug fixes and minor new functionality, mostly around UI interactions, but also including a fix for WebCopy treating certain URIs as subdomains even though they weren't.
We hope you enjoy this update to the product!
Original URL of this content is https://www.cyotek.com/blog/products/webcopy-1-0-9-0-released-multiple-hosts-and-proxy-server-support?source=rss.