Differences
This shows you the differences between two versions of the page.
bootcat:help:corpus_creation_mode [2016/11/14 12:29] – eros | bootcat:help:corpus_creation_mode [2019/11/08 09:08] – [Custom URLs (advanced)] eros | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Corpus creation mode ====== | ====== Corpus creation mode ====== | ||
- | Version 0.71 of the BootCaT frontend introduced the possibility of skipping some of the steps involved in the corpus creation procedure. | + | You can choose between the following creation " |
- | + | ||
- | You can now choose between the following creation " | + | |
* [[bootcat: | * [[bootcat: | ||
Line 9: | Line 7: | ||
* [[bootcat: | * [[bootcat: | ||
* [[bootcat: | * [[bootcat: | ||
+ | * [[bootcat: | ||
{{: | {{: | ||
Line 38: | Line 37: | ||
In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). | ||
- | You'll be asked to provide a text file containing one URL per line. | + | You'll be asked to provide a text file containing one **valid** |
- | You'll have to edit the list separately using a text editor (like Notepad for Windows or TextEdit for Mac) and save it in '' | + | You'll have to edit the list separately using a text editor (like [[https:// |
The text file should look like this: | The text file should look like this: | ||
Line 46: | Line 45: | ||
< | < | ||
http:// | http:// | ||
- | http:// | + | https:// |
+ | https:// | ||
http:// | http:// | ||
+ | http:// | ||
... | ... | ||
</ | </ | ||
- | **N.B.**: only URLs pointing to HTML files will be downloaded (typical extensions for such files are '' | + | NB: up to version 1.21, BootCaT does not accept URL lists encoded as " | ||
===== Local files (advanced) ===== | ===== Local files (advanced) ===== | ||
Using this mode, BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created. | Using this mode, BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created. | ||
+ | |||
+ | Most common file formats are supported, including '' | ||
+ | |||
+ | ===== Local queries (advanced) ===== | ||
+ | |||
+ | Using this mode, you can run your Google queries in a web browser as usual and save the result pages to a folder. You then tell BootCaT where this folder is, and it will extract the URLs from the result pages you saved. | ||
+ |
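
To make the "Local queries" idea more concrete, here is a minimal Python sketch of what pulling URLs out of a folder of saved result pages could look like. It is not BootCaT's own implementation, which this page does not describe. The names `LinkCollector`, `is_valid_url` and `extract_urls` are hypothetical, and the sketch simply assumes that every absolute `http`/`https` link found in a saved `.htm`/`.html` file is a candidate URL, without filtering out search-engine navigation links.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_valid_url(url):
    """Assumption: 'valid' means an absolute http(s) URL with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def extract_urls(folder):
    """Return the unique absolute URLs linked from saved pages in folder.

    Subfolders are searched too; URLs keep their first-seen order.
    """
    seen = set()
    urls = []
    for path in sorted(Path(folder).rglob("*.htm*")):
        if not path.is_file():
            continue
        parser = LinkCollector()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        for link in parser.links:
            if is_valid_url(link) and link not in seen:
                seen.add(link)
                urls.append(link)
    return urls
```

Writing the result out with `"\n".join(extract_urls("saved_pages"))` (where `saved_pages` is a placeholder folder name) would give a plain text file with one URL per line, the same shape as the list expected by the "Custom URLs" mode.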