Differences
This shows you the differences between two versions of the page.
| bootcat:help:corpus_creation_mode [2016/11/14 12:29] – eros | bootcat:help:corpus_creation_mode [2023/04/19 10:56] (current) – eros | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Corpus creation mode ====== | ====== Corpus creation mode ====== | ||
| - | Version 0.71 of the BootCaT frontend introduced the possibility of skipping some of the steps involved in the corpus | + | You can choose between | 
| - | You can now choose between the following creation " | + |  | 
| - | + | * [[bootcat: | |
| - |  | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | |
| {{: | {{: | ||
| Line 38: | Line 37: | ||
| In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | ||
| - | You'll be asked to provide a text file containing one URL per line. | + | You'll be asked to provide a text file containing one **valid** | 
| - | You'll have to edit the list separately using a text editor (like Notepad for Windows or TextEdit for Mac) and save it in '' | + | You'll have to edit the list separately using a text editor (like [[https:// | 
| The text file should look like this: | The text file should look like this: | ||
| Line 46: | Line 45: | ||
| < | < | ||
| http:// | http:// | ||
| - | http:// | + | https:// | 
| + | https:// | ||
| http:// | http:// | ||
| - | ... | + | http:// | 
| </ | </ | ||
| - | **N.B.**: only URLs pointing to HTML files will be downloaded | + | NB: up to version 1.21, BootCaT does not accept URL lists encoded as " | 
| ===== Local files (advanced) ===== | ===== Local files (advanced) ===== | ||
| - | Using this mode, BootCaT will process all files contained in a folder | + | Using this mode, BootCaT will process all files contained in a folder on your computer. Files will be cleaned and the corpus files will be created. | 
| + | |||
| + | Most common file formats are supported, including '' | ||
| + | |||
| + | ===== Local queries (advanced) ===== | ||
| + | |||
| + | Using this mode, you can query Google normally using a web browser and save the result pages to a folder. Then you can tell BootCaT where this folder is and it will extract the URLs from the queries you saved. | ||
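
The URL list mode described above expects a text file with one valid URL per line. Since the list has to be prepared by hand in a text editor, it can help to check it before loading it. The following is a minimal Python sketch of such a check and is not part of BootCaT. The function names, the rule that only absolute `http`/`https` URLs count as valid, the skipping of blank lines, and the default `utf-8` encoding are all assumptions made for illustration; they do not describe BootCaT's own validation or its encoding requirements.

```python
from urllib.parse import urlsplit


def is_valid_url(line: str) -> bool:
    """Return True if the line looks like an absolute http(s) URL.

    Assumption: "valid" here means an http or https scheme plus a host,
    with no whitespace inside the URL. BootCaT's own rules may differ.
    """
    candidate = line.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_url_lines(lines):
    """Split lines into (valid_urls, problems).

    problems is a list of (line_number, text) pairs, numbered from 1.
    Blank lines are skipped (an assumption, not documented behaviour).
    """
    valid, problems = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue
        if is_valid_url(text):
            valid.append(text)
        else:
            problems.append((number, text))
    return valid, problems


def check_url_file(path, encoding="utf-8"):
    """Check a URL list file; the encoding default is a hypothetical choice."""
    with open(path, encoding=encoding) as handle:
        return check_url_lines(handle)
```

Calling `check_url_file("urls.txt")` (a hypothetical file name) returns the usable URLs together with the line numbers of any entries that need fixing in the text editor before the list is given to BootCaT.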