bootcat:release_notes:1.21

bootcat:release_notes:1.21 [2019/07/01 12:03]
eros created
bootcat:release_notes:1.21 [2019/07/10 13:53]
eros
Line 1:
 ====== Version 1.21 ======
  
-  * **NEW (feature)**: pseudo-XML versions of the extracted plain text files are now created in the ''xml_corpus'' folder;
+  * **NEW (feature)**: pseudo-XML versions of the extracted plain text files are now created in the ''xml_corpus'' folder; a single ''corpus.xml'' file is also created, containing the merged version of the pseudo-XML corpus; the XML version of the corpus contains more metadata than the plain text version: ''id'' (the URL of the original file), ''content_type'' of the original file, ''filename'' of the downloaded file; it's also possible to add custom XML attributes to the corpus (see next bullet point);
  
-  * **NEW (feature)**: two new files are created ''corpus.txt'' and ''corpus.xml'' containing the merged versions of the plain text and pseudo-XML version of the corpus;
+  * **NEW (feature)**: in the "Project Definition" step, you can now add up to three user-defined XML attributes to the XML version of the corpus
 + 
 +  * **NEW (feature)**: a random string is now appended to the names of downloaded files, individual corpus text files and XML corpus files; this makes it possible to easily merge different corpora in the same folder; file names still start with a progressive number;
  
   * **BUGFIX**: fixed a bug that prevented the download timeout from working properly, causing BootCaT to wait forever for certain URLs to download
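
The file naming change described above (a progressive number followed by a random string) can be illustrated with a minimal sketch. This is not BootCaT's actual code: the function name ''corpus_file_name'', the zero-padded four-digit counter, the underscore separator and the 8-character hexadecimal suffix are all assumptions made for illustration. It only shows why two runs that both start counting at 1 can share a folder without overwriting each other's files.

```python
import secrets


def corpus_file_name(index: int, extension: str = "txt", suffix_bytes: int = 4) -> str:
    """Hypothetical naming scheme: progressive number first, random string after.

    The exact format used by BootCaT is not documented here; this layout is an
    assumption chosen only to illustrate the idea.
    """
    random_part = secrets.token_hex(suffix_bytes)  # 2 * suffix_bytes hex characters
    return f"{index:04d}_{random_part}.{extension}"


# Two independent runs both number their files 1, 2, 3 ...
run_a = [corpus_file_name(i) for i in range(1, 4)]
run_b = [corpus_file_name(i) for i in range(1, 4)]

# ... yet the random part makes a name clash between the runs very unlikely,
# so both sets of files can be copied into the same folder.
merged_folder = sorted(run_a + run_b)
for name in merged_folder:
    print(name)
```

Because the counter comes first and is zero-padded, sorting the merged folder by name still groups files by their position within each run.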
  • bootcat/release_notes/1.21.txt
  • Last modified: 2019/10/29 15:47
  • by eros