Version 0.18 (2014-09-10)
- New tool: BootCaTExtractor.jar performs the same task as retrieve_and_clean_pages_from_url_list.pl but, unlike the Perl script, supports UTF-8, language filtering and document size filtering;
- UrlCollector.jar no longer requires the “market” parameter.