bootcat:help:corpus_creation_mode

Differences

This shows you the differences between two versions of the page.

bootcat:help:corpus_creation_mode [2019/11/08 09:08] – [Custom URLs (advanced)] eros
bootcat:help:corpus_creation_mode [2023/04/19 09:50] – [Local files (advanced)] eros
Line 49: Line 49:
 http://some.site.com/index.html http://some.site.com/index.html
 http://random.docs.org/thesis.docx http://random.docs.org/thesis.docx
-... 
 </file> </file>
  
-NB: up to version 1.21, BootCaT does not accept URLs lists encoded as "UTF8 **with BOM**", the issue will be solved in future versions of BootCaT.
+NB: up to version 1.21, BootCaT does not accept URLs lists encoded as "UTF8 **with BOM**", please make sure your URL list is saved as "UTF8" (**without BOM**), the issue will be solved in future versions of BootCaT.
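If your editor cannot save the URL list as UTF-8 without BOM, the byte order mark can be removed after the fact. The following is a minimal sketch in Python, not part of BootCaT: the function name `strip_utf8_bom` and the file name `urls.txt` are hypothetical, and the script assumes the list is otherwise valid UTF-8.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    (Hypothetical helper, not a BootCaT function.)
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF
    if data.startswith(codecs.BOM_UTF8):
        file_path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

For example, `strip_utf8_bom("urls.txt")` would rewrite a URL list in place and leave a file that has no BOM untouched.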
 ===== Local files (advanced) ===== ===== Local files (advanced) =====
  
-Using this mode BootCaT will process all files contained in a folder (and its subfolders) on your computer. Files will be cleaned and a single text file will be created.
+Using this mode BootCaT will process all files contained in a folder on your computer. Files will be cleaned and the corpus files will be created.
  
 Most common file formats are supported, including ''html'', ''pdf'' and ''doc'' files. Most common file formats are supported, including ''html'', ''pdf'' and ''doc'' files.
  • bootcat/help/corpus_creation_mode.txt
  • Last modified: 2023/04/19 10:56
  • by eros