Differences
This shows you the differences between two versions of the page.
bootcat:help:corpus_creation_mode [2013/06/15 15:19] – eros | bootcat:help:corpus_creation_mode [2019/11/05 12:55] – eros | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Corpus creation mode ====== | ====== Corpus creation mode ====== | ||
- | Version 0.71 of the BootCaT frontend introduced the possibility of skipping some of the steps involved in the corpus creation procedure. | + | You can choose between the following creation " |
- | + | ||
- | You can now choose between the following creation " | + | |
* [[bootcat: | * [[bootcat: | ||
* [[bootcat: | * [[bootcat: | ||
* [[bootcat: | * [[bootcat: | ||
+ | * [[bootcat: | ||
+ | * [[bootcat: | ||
{{: | {{: | ||
Line 19: | Line 19: | ||
===== Custom tuples (advanced) ===== | ===== Custom tuples (advanced) ===== | ||
- | In this mode you skip the seed selection | + | In this mode you skip the seed selection |
Remember that each line will become a single query to the search engine, therefore phrases should be enclosed in quotes. Your tuples should look like this: | Remember that each line will become a single query to the search engine, therefore phrases should be enclosed in quotes. Your tuples should look like this: | ||
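To illustrate the quoting rule, here is a minimal Python sketch that writes one query per line and wraps multi-word phrases in double quotes. It is not part of BootCaT: the function names and the sample terms are hypothetical, and it only assumes what the text above states (one tuple per line, phrases in quotes).

```python
def format_tuple(terms):
    """Join seed terms into one query line, quoting multi-word phrases."""
    parts = []
    for term in terms:
        term = term.strip()
        already_quoted = term.startswith('"') and term.endswith('"')
        # A term containing whitespace is a phrase: wrap it in double quotes
        # so the search engine treats it as a single unit.
        if len(term.split()) > 1 and not already_quoted:
            term = f'"{term}"'
        parts.append(term)
    return " ".join(parts)


def format_tuples(tuples):
    """Return the text of a tuples file: one query (tuple) per line."""
    return "\n".join(format_tuple(t) for t in tuples)


if __name__ == "__main__":
    # Hypothetical seed terms, for illustration only.
    sample = [
        ["machine translation", "corpus", "alignment"],
        ["terminology", "parallel corpus", "glossary"],
    ]
    print(format_tuples(sample))
```

Terms that are already quoted are left untouched, so the helper can be run again on a list you have partly edited by hand.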
Line 35: | Line 35: | ||
===== Custom URLs (advanced) ===== | ===== Custom URLs (advanced) ===== | ||
- | In this mode you'll skip directly to the final stap: you'll be asked to provide | + | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). |
- | You'll have to edit the list separately | + | You'll be asked to provide a text file containing one URL per line. |
+ | |||
+ | You'll have to edit the list separately | ||
The text file should look like this: | The text file should look like this: | ||
Line 47: | Line 49: | ||
... | ... | ||
</ | </ | ||
+ | |||
+ | **N.B.**: only URLs pointing to HTML files will be downloaded (typical extensions for such files are '' | ||
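Before handing a URL list to BootCaT, it can help to check that the file really contains one URL per line. The following Python sketch does only that; it is an illustration, not BootCaT code, and the function name `check_url_lines` is hypothetical. It does not test whether a URL points to an HTML file, since that generally cannot be determined from the address alone.

```python
from urllib.parse import urlsplit


def check_url_lines(lines):
    """Split lines into well-formed http(s) URLs and rejected entries.

    Returns (valid, rejected), where rejected holds (line_number, text) pairs.
    Blank lines are ignored.
    """
    valid, rejected = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        try:
            parts = urlsplit(line)
        except ValueError:
            # urlsplit raises ValueError on some malformed addresses.
            rejected.append((number, line))
            continue
        single_token = len(line.split()) == 1  # exactly one URL on the line
        if single_token and parts.scheme in ("http", "https") and parts.netloc:
            valid.append(line)
        else:
            rejected.append((number, line))
    return valid, rejected


if __name__ == "__main__":
    import sys

    # Usage: python check_urls.py urls.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        ok, bad = check_url_lines(handle)
    print(f"{len(ok)} usable URLs, {len(bad)} rejected")
    for number, text in bad:
        print(f"line {number}: {text}")
```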
+ | |||
+ | ===== Local files (advanced) ===== | ||
+ | |||
+ | Using this mode BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created. |
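The folder traversal described above can be pictured with a short Python sketch. This is not how BootCaT is implemented: the names `build_corpus` and `clean` are hypothetical, and the cleaning step here is only a whitespace-normalising placeholder, because the cleaning BootCaT actually performs is not described on this page.

```python
from pathlib import Path


def clean(text):
    """Placeholder cleaning step (assumption): collapse runs of whitespace."""
    return " ".join(text.split())


def build_corpus(folder, output_file):
    """Write the cleaned text of every file under folder into one text file.

    Subfolders are visited recursively. Returns the number of files processed.
    """
    folder = Path(folder)
    output_file = Path(output_file)
    count = 0
    with output_file.open("w", encoding="utf-8") as out:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and the output file if it lives in the folder.
            if not path.is_file() or path.resolve() == output_file.resolve():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            out.write(clean(text) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage: python build_corpus.py my_folder corpus.txt
    processed = build_corpus(sys.argv[1], sys.argv[2])
    print(f"{processed} files written to {sys.argv[2]}")
```

Reading with `errors="replace"` keeps the sketch from stopping on files that are not valid UTF-8; a real pipeline would more likely detect the encoding or skip binary files.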