Differences
This shows you the differences between two versions of the page.
bootcat:help:corpus_creation_mode [2019/11/05 11:55] – eros | bootcat:help:corpus_creation_mode [2023/04/19 08:56] (current) – eros | ||
---|---|---|---|
Line 3: | Line 3: | ||
You can choose between the following creation " | You can choose between the following creation " | ||
- | * [[bootcat: | + | * [[bootcat: |
- | * [[bootcat: | + | * [[bootcat: |
- | * [[bootcat: | + | * [[bootcat: |
- | * [[bootcat: | + | * [[bootcat: |
- | * [[bootcat: | + | * [[bootcat: |
{{: | {{: | ||
Line 37: | Line 37: | ||
In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | ||
- | You'll be asked to provide a text file containing one URL per line. | + | You'll be asked to provide a text file containing one **valid** |
- | You'll have to edit the list separately using a text editor (like Notepad for Windows or TextEdit for Mac) and save it in '' | + | You'll have to edit the list separately using a text editor (like [[https:// |
The text file should look like this: | The text file should look like this: | ||
Line 45: | Line 45: | ||
< | < | ||
http:// | http:// | ||
- | http:// | + | https:// |
+ | https:// | ||
http:// | http:// | ||
- | ... | + | http:// |
</ | </ | ||
- | **N.B.**: only URLs pointing to HTML files will be downloaded | + | NB: up to version 1.21, BootCaT does not accept URL lists encoded as " |
===== Local files (advanced) ===== | ===== Local files (advanced) ===== | ||
- | Using this mode BootCaT will process all files contained in a folder | + | Using this mode BootCaT will process all files contained in a folder on your computer. Files will be cleaned and the corpus files will be created. |
+ | |||
+ | Most common file formats are supported, including '' | ||
+ | |||
+ | ===== Local queries (advanced) ===== | ||
+ | |||
+ | Using this mode, you can query Google normally using a web browser and save the result pages to a folder. Then you can tell BootCaT where this folder is and it will extract the URLs from the queries you saved. |
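
The URL-list mode described above expects a text file with one valid URL per line. The sketch below shows one way such a list could be checked before handing it to BootCaT. It is not part of BootCaT: the function names are hypothetical, and "valid" is assumed here to mean an absolute ''http'' or ''https'' URL with a host name and no embedded whitespace, which may be stricter or looser than BootCaT's own check.

```python
from urllib.parse import urlsplit


def is_valid_url(line: str) -> bool:
    """Return True if the line looks like an absolute http(s) URL.

    Assumption: 'valid' means scheme http/https, a non-empty host,
    and no whitespace inside the URL.
    """
    candidate = line.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_url_lines(lines):
    """Split lines into (valid_urls, rejected).

    rejected holds (line_number, text) pairs so problems are easy to
    locate in a text editor. Blank lines are skipped.
    """
    valid, rejected = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue
        if is_valid_url(text):
            valid.append(text)
        else:
            rejected.append((number, text))
    return valid, rejected
```

To check a real list, open the file and pass the file object (which iterates line by line) to ''check_url_lines'', then fix or delete the rejected lines in your text editor before loading the list into BootCaT.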