====== Using an external downloader ======

When you download a very long list of URLs, BootCaT sometimes crashes. We're working on a fix; in the meantime, here's a handy workaround.

Even if BootCaT crashed while downloading files, the folder created for the failed attempt at building your corpus contains a file called ''urls_list_final.txt''. This is the list of all the URLs you collected in the first stage of the corpus creation process. You can simply try again using the [[bootcat:help:corpus_creation_mode|Custom URLs]] corpus creation mode.

Another solution is to download the files with an external program and then use BootCaT to clean them in the [[bootcat:help:corpus_creation_mode|Local files]] corpus creation mode. Here's a step-by-step guide to downloading the files with the freeware downloader WinWget and then turning them into a corpus with BootCaT.

===== Download and configure WinWget =====

  * Visit the WinWget site at https://www.astatix.com/tools/winwget.php and download the WinWget zip file
  * Unzip the ''WinWget.zip'' file

{{ :bootcat:tutorials:external_downloader:010.png?nolink |}}

  * Download Wget for Windows from https://eternallybored.org/misc/wget/1.20.3/32/wget.exe and move ''wget.exe'' to the WinWget folder

{{ :bootcat:tutorials:external_downloader:020.png?nolink |}}

  * Double-click WinWget to start the application, then click Tools -> Options

{{ :bootcat:tutorials:external_downloader:030.png?nolink |}}

  * Click Browse and select the ''wget.exe'' file you downloaded earlier
  * Then select the folder where you want the web pages to be saved

{{ :bootcat:tutorials:external_downloader:040.png?nolink |}}

  * Click OK

===== Downloading URLs =====

  * Create a new download job

{{ :bootcat:tutorials:external_downloader:050.png?nolink |}}

  * Select the ''urls_list_final.txt'' file

{{ :bootcat:tutorials:external_downloader:060.png?nolink |}}

  * Add a double-quote character (''"'') at the beginning and at the end of the file path, so that it looks something like ''"C:\Users\john\Desktop\urls_list_final.txt"''. **The important part is that there must be double quotes at the beginning and at the end of the line.**

{{ :bootcat:tutorials:external_downloader:070.png?nolink |}}

{{ :bootcat:tutorials:external_downloader:080.png?nolink |}}

  * Click OK: the job is now ready to run. Click the Run button

{{ :bootcat:tutorials:external_downloader:090.png?nolink |}}

  * After a while (from a few seconds to several minutes, depending on the number of URLs), you'll see that the job is complete

{{ :bootcat:tutorials:external_downloader:100.png?nolink |}}

  * Close WinWget and move on to creating the corpus

===== Creating the corpus =====

  * Start BootCaT and choose a name and a language for the corpus as usual; when BootCaT asks how you want to proceed, select ''Local files''

{{ :bootcat:tutorials:external_downloader:110.png?nolink |}}

  * BootCaT will ask you to select the folder containing the downloaded pages: click the folder **once** and then click ''Open''

{{ :bootcat:tutorials:external_downloader:120.png?nolink |}}

  * You'll be taken to the corpus creation page: click ''Build corpus'' and you're done!

{{ :bootcat:tutorials:external_downloader:130.png?nolink |}}
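If you'd rather not install WinWget, the download step can also be scripted. The following is a minimal Python 3 sketch, not part of BootCaT or WinWget, that reads ''urls_list_final.txt'', skips blank lines and saves each page into a folder you can then select in ''Local files'' mode. The script name ''fetch_urls.py'', the ''page_NNNNN.html'' file names and the 30-second timeout are hypothetical choices; every file is saved with an ''.html'' extension regardless of its actual content type.

```python
"""Hypothetical alternative to WinWget: fetch every URL listed in urls_list_final.txt."""
import sys
import urllib.request
from pathlib import Path
from typing import Callable, List, Tuple


def read_url_list(list_path: Path) -> List[str]:
    """Return the non-empty lines of the URL list, stripped of surrounding spaces."""
    text = list_path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: List[str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Tuple[int, List[str]]:
    """Save each page as page_NNNNN.html; return (pages saved, URLs that failed)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    failed: List[str] = []
    for index, url in enumerate(urls, start=1):
        try:
            body = fetch(url)
        except Exception as error:  # one bad URL must not stop the whole run
            failed.append(url)
            print(f"skipped {url}: {error}", file=sys.stderr)
            continue
        (out_dir / f"page_{index:05d}.html").write_bytes(body)
        saved += 1
    return saved, failed


def main(argv: List[str]) -> int:
    """Command-line entry point: fetch_urls.py URL_LIST OUTPUT_FOLDER."""
    if len(argv) != 3:
        print("usage: python fetch_urls.py urls_list_final.txt OUTPUT_FOLDER", file=sys.stderr)
        return 2
    urls = read_url_list(Path(argv[1]))
    saved, failed = download_all(urls, Path(argv[2]))
    print(f"saved {saved} pages, {len(failed)} failed")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it as ''python fetch_urls.py urls_list_final.txt downloaded_pages'' and then select the ''downloaded_pages'' folder when BootCaT asks for the folder containing the downloaded pages. A URL that fails is reported and skipped instead of stopping the run, which matters when the list is long.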