bootcat:help:corpus_creation_mode [2015/01/27 15:08] – [Custom URLs (advanced)] eros | bootcat:help:corpus_creation_mode [2019/11/05 13:00] – [Custom URLs (advanced)] eros |
---|
====== Corpus creation mode ====== | ====== Corpus creation mode ====== |
| |
Version 0.71 of the BootCaT frontend introduced the possibility of skipping some of the steps involved in the corpus creation procedure. | You can choose between the following creation "modes": |
| |
You can now choose between the following creation "modes": | |
| |
* [[bootcat:help:corpus_creation_mode#simple_mode_recommended|Simple mode]] (recommended) | * [[bootcat:help:corpus_creation_mode#simple_mode_recommended|Simple mode]] (recommended) |
* [[bootcat:help:corpus_creation_mode#custom_tuples_advanced|Custom tuples]] (advanced) | * [[bootcat:help:corpus_creation_mode#custom_tuples_advanced|Custom tuples]] (advanced) |
* [[bootcat:help:corpus_creation_mode#custom_urls_advanced|Custom URLs]] (advanced) | * [[bootcat:help:corpus_creation_mode#custom_urls_advanced|Custom URLs]] (advanced) |
| * [[bootcat:help:corpus_creation_mode#local_files|Local files]] (advanced) |
| * [[bootcat:help:corpus_creation_mode#local_queries|Local queries]] (advanced) |
| |
{{:bootcat:help:corpus_creation_modes.png?nolink|}} | {{:bootcat:help:corpus_creation_modes.png?nolink|}} |
</file> | </file> |
| |
**N.B.**: only URLs pointing to HTML files will be downloaded (typical extensions for such files are ''.htm'', ''.html'', ''.php'', ''.asp''). If the list you provide contains URLs ending in PDF, DOC, DOCX etc., BootCaT will display an error and refuse to proceed. To continue, you'll have to remove the links to unsupported file formats from the list. | **N.B.**: you need to provide a list of valid URLs, i.e. each line must begin with ''http://'' or ''https://''. |
| |
| ===== Local files (advanced) ===== |
| |
| In this mode, BootCaT processes all files contained in a folder (and its subfolders) on your computer. The files are cleaned and merged into a single text file. |
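
To make the "Local files" behaviour described above more concrete, here is a minimal Python sketch of the general idea: walk a folder and its subfolders, clean each file, and write everything to a single text file. It is not BootCaT's actual implementation. The function names ''build_corpus'' and ''clean_text'' are hypothetical, and the cleaning step (collapsing whitespace) is only a stand-in, since the cleaning BootCaT performs is not described here.

```python
import os
import re


def clean_text(text: str) -> str:
    """Stand-in cleaning step (assumption): collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def build_corpus(folder: str, output_path: str) -> int:
    """Merge the cleaned contents of every file under `folder` into one file.

    Returns the number of files that contributed text to the output.
    """
    out_abs = os.path.abspath(output_path)
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(folder):
            dirs.sort()  # visit subfolders in a predictable order
            for name in sorted(files):
                path = os.path.join(root, name)
                # Skip the output file if it lives inside the input folder.
                if os.path.abspath(path) == out_abs:
                    continue
                with open(path, encoding="utf-8", errors="replace") as src:
                    cleaned = clean_text(src.read())
                if cleaned:  # ignore files that are empty after cleaning
                    out.write(cleaned + "\n")
                    count += 1
    return count
```

A call such as ''build_corpus("my_documents", "corpus.txt")'' (both paths are placeholders) would write one line of cleaned text per non-empty input file. The sketch assumes plain-text input; handling formats such as HTML or PDF would need a real text-extraction step in place of ''clean_text''.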