bootcat:release_notes:1.21

Differences

This shows you the differences between two versions of the page.

bootcat:release_notes:1.21 [2019/07/01 10:04]
eros
bootcat:release_notes:1.21 [2019/07/15 12:14]
eros
Line 1: Line 1:
 ====== Version 1.21 ======
  
-  * **NEW (feature)**: pseudo-XML versions of the extracted plain text files are now created in the ''xml_corpus'' folder;
+  * **NEW (feature)**: pseudo-XML versions of the extracted plain text files are now created in the ''xml_corpus'' folder; a single ''corpus.xml'' file is also created, containing the merged version of the pseudo-XML corpus; the XML version of the corpus contains more metadata than the plain text version:
+    * ''id'' (the URL of the original file),
+    * ''content_type'' of the original file,
+    * ''filename'' of the downloaded file;
  
-  * **NEW (feature)**: two new files are created, ''corpus.txt'' and ''corpus.xml'' containing the merged versions of the plain text and pseudo-XML corpus;
+  * **NEW (feature)**: in the "Project Definition" step, you can now add up to three user-defined XML attributes to the XML version of the corpus
+
+  * **NEW (feature)**: the name of the corpus is now prepended to the names of downloaded files, individual corpus text files and XML corpus files; this makes it possible to easily merge different corpora in the same folder; files are still progressively numbered;
  
   * **BUGFIX**: fixed a bug that prevented the download timeout from working properly, causing BootCaT to wait forever for certain URLs to download
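
The metadata attributes listed above for the XML corpus (''id'', ''content_type'', ''filename'') could be read back with a short script. The following is only a minimal sketch: the release notes do not show the layout of the pseudo-XML files, so the ''doc'' tag name, the double-quoted attribute syntax, the sample file names and the ''read_doc_metadata'' helper are all assumptions made for illustration. A regular expression is used rather than an XML parser because pseudo-XML may not be well-formed.

```python
import re

# Assumption: each document starts with an opening tag such as
# <doc id="..." content_type="..." filename="...">. The tag name "doc"
# and the attribute quoting are illustrative guesses, not BootCaT's
# documented format.
DOC_TAG = re.compile(r'<doc\b([^>]*)>')
ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')


def read_doc_metadata(text):
    """Return one dict of attributes per <doc ...> opening tag found in text."""
    return [dict(ATTRIBUTE.findall(m.group(1))) for m in DOC_TAG.finditer(text)]


# Hypothetical sample standing in for the contents of corpus.xml.
sample = (
    '<doc id="http://example.com/a.html" content_type="text/html" '
    'filename="mycorpus_1.html">\n'
    'Some extracted text.\n'
    '</doc>\n'
    '<doc id="http://example.com/b.pdf" content_type="application/pdf" '
    'filename="mycorpus_2.pdf">\n'
    'More extracted text.\n'
    '</doc>\n'
)

for meta in read_doc_metadata(sample):
    print(meta["id"], meta["content_type"], meta["filename"])
```

Any user-defined attributes added in the "Project Definition" step would, under the same assumptions, appear as extra keys in each returned dictionary.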
  • bootcat/release_notes/1.21.txt
  • Last modified: 2019/10/29 14:47
  • by eros