bootcat:help:corpus_creation_mode

Differences

This shows you the differences between two versions of the page.

bootcat:help:corpus_creation_mode [2019/11/08 09:08] – [Custom URLs (advanced)] eros
bootcat:help:corpus_creation_mode [2023/04/19 10:56] (current) eros
Line 3:
 You can choose between the following creation "modes":

-  * [[bootcat:help:corpus_creation_mode#simple_mode_recommended|Simple mode]] (recommended)
+  * [[bootcat:help:corpus_creation_mode#simple_mode_recommended|Simple mode]]
-  * [[bootcat:help:corpus_creation_mode#custom_tuples_advanced|Custom tuples]] (advanced)
+  * [[bootcat:help:corpus_creation_mode#custom_tuples_advanced|Custom tuples]]
-  * [[bootcat:help:corpus_creation_mode#custom_urls_advanced|Custom URLs]] (advanced)
+  * [[bootcat:help:corpus_creation_mode#custom_urls_advanced|Custom URLs]]
-  * [[bootcat:help:corpus_creation_mode#local_files|Local files]] (advanced)
+  * [[bootcat:help:corpus_creation_mode#local_files|Local files]]
-  * [[bootcat:help:corpus_creation_mode#local_queries|Local queries]] (advanced)
+  * [[bootcat:help:corpus_creation_mode#local_queries|Local queries]]

 {{:bootcat:help:corpus_creation_modes.png?nolink|}}
Line 49:
 http://some.site.com/index.html
 http://random.docs.org/thesis.docx
-...
 </file>

-NB: up to version 1.21, BootCaT does not accept URLs lists encoded as "UTF8 **with BOM**", the issue will be solved in future versions of BootCaT.
+NB: up to version 1.21, BootCaT does not accept URLs lists encoded as "UTF8 **with BOM**", please make sure your URL list is saved as "UTF8" (**without BOM**), the issue will be solved in future versions of BootCaT.
 ===== Local files (advanced) =====

-Using this mode BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created.
+Using this mode BootCaT will process all files contained in a folder on your computer. Files will be cleaned and the corpus files will be created.

 Most common file formats are supported, including ''html'', ''pdf'' and ''doc'' files.
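
Side note on the NB in the Custom URLs hunk: a URL list that was saved as "UTF8 with BOM" can be converted to plain "UTF8" without opening a text editor. What follows is a minimal Python sketch and is not part of BootCaT. The function name ''strip_utf8_bom'' and the file name ''urls.txt'' are hypothetical, chosen only for illustration. It assumes the list is small enough to be read into memory in one go.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was left untouched. (Hypothetical helper, not a BootCaT function.)
    """
    p = Path(path)
    data = p.read_bytes()
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


# Example call, assuming a URL list named urls.txt in the current folder:
# strip_utf8_bom("urls.txt")
```

The file is handled as raw bytes so that only the three BOM bytes are dropped and the URLs themselves are written back unchanged.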
  • Last modified: 2019/11/08 09:08
  • by eros