Differences
This shows you the differences between two versions of the page.
bootcat:help:corpus_creation_mode [2013/06/15 13:33] – created eros | bootcat:help:corpus_creation_mode [2019/11/05 13:00] – [Custom URLs (advanced)] eros | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Corpus creation mode ====== | ====== Corpus creation mode ====== | ||
- | Version 0.71 of the BootCaT frontend introduced the possibility of skipping some of the steps involved in the corpus | + | You can choose between |
- | You can now choose between the following creation " | + | * [[bootcat:help: |
+ | * [[bootcat: | ||
+ | * [[bootcat: | ||
+ | * [[bootcat: | ||
+ | * [[bootcat: | ||
- | * Simple mode (recommended) | + | {{: |
- | * Custom tuples (advanced) | + | |
- | * Custom URLs (advanced) | + | |
===== Simple mode (recommended) ===== | ===== Simple mode (recommended) ===== | ||
Line 13: | Line 15: | ||
This is the standard method for creating a BootCaT corpus: you choose seeds, build random tuples, collect URLs and finally build the corpus. | This is the standard method for creating a BootCaT corpus: you choose seeds, build random tuples, collect URLs and finally build the corpus. | ||
- | If you're a novice user this is the mode you should use (see the [[bootcat: | + | If you're a novice user this is the mode you should use (see the [[bootcat: |
===== Custom tuples (advanced) ===== | ===== Custom tuples (advanced) ===== | ||
- | In this mode you skip the seed selection | + | In this mode you skip the seed selection |
Remember that each line will become a single query to the search engine, so phrases should be enclosed in quotes. Your tuples should look like this: | Remember that each line will become a single query to the search engine, so phrases should be enclosed in quotes. Your tuples should look like this: | ||
Line 33: | Line 35: | ||
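The quoting rule for custom tuples can be sketched in Python. This is only an illustration, not part of BootCaT: the helper name `format_tuple`, the sample terms, and the choice of double quotes are assumptions made for the example.

```python
def format_tuple(terms):
    """Join terms into one query line, quoting multi-word phrases.

    Hypothetical helper for preparing a custom tuples file by hand;
    double quotes are an assumption about the expected quote style.
    """
    parts = []
    for term in terms:
        term = term.strip()
        already_quoted = term.startswith('"') and term.endswith('"')
        # Any term containing whitespace is a phrase and gets quoted.
        if any(ch.isspace() for ch in term) and not already_quoted:
            term = f'"{term}"'
        parts.append(term)
    return " ".join(parts)


# Sample data invented for the illustration: one tuple per output line.
sample_tuples = [
    ["corpus", "web as corpus", "linguistics"],
    ["seed", "random tuples"],
]
tuple_lines = [format_tuple(t) for t in sample_tuples]
```

Each entry of `tuple_lines` could then be written to the tuples file on its own line, since every line becomes one search engine query.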
===== Custom URLs (advanced) ===== | ===== Custom URLs (advanced) ===== | ||
- | In this mode you'll skip directly to the final stap: you'll be asked to provide | + | In this mode you'll skip directly to the final step, the one where the corpus is built using a list of Internet addresses (or URLs). |
- | You'll have to edit the list separately | + | You'll be asked to provide a text file containing one URL per line. |
+ | |||
+ | You'll have to edit the list separately | ||
The text file should look like this: | The text file should look like this: | ||
Line 45: | Line 49: | ||
... | ... | ||
</ | </ | ||
+ | |||
+ | **N.B.**: you need to provide a list of valid URLs, i.e. each line must begin with '' | ||
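Before handing a URL list to BootCaT, it can help to check it yourself. The following is a minimal sketch, not BootCaT's own validation: the function name `check_url_lines` is hypothetical, and the sketch assumes the required prefix is `http://` or `https://`, which the text above does not spell out.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes count as valid for the URL list.
ASSUMED_SCHEMES = ("http", "https")


def check_url_lines(lines):
    """Split lines into valid URLs and (line number, text) rejects.

    Blank lines are skipped. A line is accepted only if it parses
    with an assumed scheme and a non-empty host part.
    """
    valid, rejected = [], []
    for number, raw in enumerate(lines, start=1):
        url = raw.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.scheme in ASSUMED_SCHEMES and parts.netloc:
            valid.append(url)
        else:
            rejected.append((number, url))
    return valid, rejected
```

For example, a line such as `www.example.com/page` has no scheme and would end up in the rejected list together with its line number, so it can be fixed in the text file.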
+ | |||
+ | ===== Local files (advanced) ===== | ||
+ | |||
+ | In this mode, BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created.
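To preview which files a folder-and-subfolders traversal would pick up, a short Python sketch can list them. This only illustrates the recursive walk; it does not reproduce BootCaT's cleaning step or its output file, and the function name `list_files` is an assumption made for the example.

```python
from pathlib import Path


def list_files(folder):
    """Return all regular files under folder, including subfolders.

    Hypothetical preview helper: directories themselves are skipped,
    and the result is sorted so the listing is stable between runs.
    """
    root = Path(folder)
    return sorted(p for p in root.rglob("*") if p.is_file())
```

Calling `list_files` on the folder you plan to select gives a quick way to spot files you did not intend to include before building the corpus.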