Differences
This shows you the differences between two versions of the page.
| bootcat:help:corpus_creation_mode [2019/11/05 12:55] – eros | bootcat:help:corpus_creation_mode [2023/04/19 10:56] (current) – eros | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| You can choose between the following creation " | You can choose between the following creation " | ||
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| - | * [[bootcat: | + | * [[bootcat: | 
| {{: | {{: | ||
| Line 37: | Line 37: | ||
| In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | ||
| - | You'll be asked to provide a text file containing one URL per line. | + | You'll be asked to provide a text file containing one **valid** | 
| - | You'll have to edit the list separately using a text editor (like Notepad for Windows or TextEdit for Mac) and save it in '' | + | You'll have to edit the list separately using a text editor (like [[https:// | 
| The text file should look like this: | The text file should look like this: | ||
| Line 45: | Line 45: | ||
| < | < | ||
| http:// | http:// | ||
| - | http:// | + | https:// | 
| + | https:// | ||
| http:// | http:// | ||
| - | ... | + | http:// | 
| </ | </ | ||
| - | **N.B.**: only URLs pointing to HTML files will be downloaded | + | NB: up to version 1.21, BootCaT does not accept URL lists encoded as " | 
| ===== Local files (advanced) ===== | ===== Local files (advanced) ===== | ||
| - | Using this mode BootCaT will process all files contained in a folder | + | Using this mode, BootCaT will process all files contained in a folder on your computer. Files will be cleaned and the corpus files will be created. | 
| + | |||
| + | Most common file formats are supported, including '' | ||
| + | |||
| + | ===== Local queries (advanced) ===== | ||
| + | |||
| + | Using this mode, you can query Google normally using a web browser and save the result pages to a folder. Then you can tell BootCaT where this folder is and it will extract the URLs from the queries you saved. | ||
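
The URL list mode described above expects a text file with one valid URL per line. The following is a minimal sketch of how such a list could be checked before handing it to BootCaT. It is not part of BootCaT, and the function names `is_valid_url` and `check_url_lines` are hypothetical. It assumes that a "valid" URL means an absolute `http` or `https` address with a host name and no embedded whitespace, and that blank lines can be skipped; BootCaT's own rules may differ.

```python
from urllib.parse import urlsplit


def is_valid_url(line: str) -> bool:
    """Return True if the line looks like an absolute http(s) URL.

    Assumption: only http and https are treated as valid here,
    mirroring the schemes used in the example list.
    """
    candidate = line.strip()
    # Reject empty strings and anything with whitespace inside the URL.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_url_lines(lines):
    """Split an iterable of lines into (valid_urls, problems).

    problems is a list of (line_number, text) pairs, with line numbers
    starting at 1. Blank lines are skipped (an assumption of this sketch).
    """
    valid, problems = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue
        if is_valid_url(text):
            valid.append(text)
        else:
            problems.append((number, text))
    return valid, problems
```

To check a file, pass an open file object, for example `check_url_lines(open("urls.txt", encoding="utf-8"))`, where `urls.txt` is a placeholder file name. Any entries in `problems` can then be fixed in a text editor before the list is loaded.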