Page ''bootcat:release_notes:1.21'': created 2019/07/01 10:03 by eros; revised 2019/07/15 12:14 by eros.
====== Version 1.21 ======

  * **NEW (feature)**: pseudo-XML versions of the extracted plain text files are now created in the ''xml_corpus'' folder; a single ''corpus.xml'' file is also created, containing the merged pseudo-XML corpus; the XML version of the corpus contains more metadata than the plain text version:
    * ''id'' (the URL of the original file),
    * ''content_type'' of the original file,
    * ''filename'' of the downloaded file;

  * **NEW (feature)**: in the "Project Definition" step, you can now add up to three user-defined XML attributes to the XML version of the corpus;

  * **NEW (feature)**: the name of the corpus is now prepended to the names of downloaded files, individual corpus text files and XML corpus files, so different corpora can easily be merged in the same folder; files are still progressively numbered;

  * **BUGFIX**: fixed a bug that prevented the download timeout from working properly, causing BootCaT to wait forever for certain URLs to download.
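These notes do not specify the exact markup written to ''xml_corpus'' and ''corpus.xml''. The following is only a rough Python sketch. It assumes that each document is wrapped in a hypothetical ''<text>'' element carrying the ''id'', ''content_type'' and ''filename'' attributes plus at most three user-defined ones. It also assumes that the merged file uses a hypothetical ''<corpus>'' root. The helper names ''text_record'' and ''merge_corpus'' are illustrative and do not describe BootCaT's actual format.

```python
from xml.sax.saxutils import escape, quoteattr

# The release note allows up to three user-defined attributes.
MAX_USER_ATTRIBUTES = 3


def text_record(body, url, content_type, filename, user_attrs=None):
    """Wrap one document's plain text in a hypothetical <text> element.

    The built-in metadata (id, content_type, filename) comes first,
    followed by any user-defined attributes in insertion order.
    """
    user_attrs = dict(user_attrs or {})
    if len(user_attrs) > MAX_USER_ATTRIBUTES:
        raise ValueError(
            f"at most {MAX_USER_ATTRIBUTES} user-defined attributes are allowed"
        )
    attrs = {"id": url, "content_type": content_type, "filename": filename}
    for name in user_attrs:
        if name in attrs:
            raise ValueError(f"user-defined attribute {name!r} clashes with built-in metadata")
    attrs.update(user_attrs)
    # quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    attr_str = " ".join(f"{name}={quoteattr(str(value))}" for name, value in attrs.items())
    # escape() makes the body safe as XML character data.
    return f"<text {attr_str}>\n{escape(body)}\n</text>"


def merge_corpus(records):
    """Concatenate records into one corpus.xml-style string under a hypothetical <corpus> root."""
    return "<corpus>\n" + "\n".join(records) + "\n</corpus>\n"
```

For example, ''text_record("a < b", "http://example.com/page", "text/html", "mycorpus_1.txt", {"topic": "law"})'' produces a ''<text>'' element whose attributes are ''id'', ''content_type'', ''filename'' and ''topic'', in that order. The file name there is also hypothetical. The sketch escapes the text so the output is well-formed XML. Since BootCaT describes its output as pseudo-XML, the real files may not be escaped this way, so a lenient reader is the safer assumption.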
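The note does not describe how BootCaT implements its download timeout. The following generic Python example illustrates the behaviour the fix restores: giving up on an unresponsive URL instead of waiting forever. It uses a hypothetical ''fetch_with_timeout'' helper built on the standard library's ''urllib.request.urlopen''; it is not BootCaT's code.

```python
import http.client
import urllib.request


def fetch_with_timeout(url, timeout_s=10.0):
    """Hypothetical helper: return the body of `url`, or None if the download fails.

    `timeout_s` bounds each blocking socket operation (connect, each read),
    so an unresponsive server makes the call fail instead of hanging forever.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, TimeoutError/socket.timeout and connection errors are all
        # OSError subclasses; malformed HTTP responses raise HTTPException.
        return None
```

The ''timeout'' argument of ''urlopen'' applies to each blocking socket operation rather than to the whole download. A server that keeps trickling bytes can therefore still hold a download open longer than ''timeout_s''; enforcing an overall limit would need a separate deadline.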