Version 1.21

  • NEW (feature): a pseudo-XML version of each extracted plain text file is now created in the xml_corpus folder, together with a single corpus.xml file that merges the whole pseudo-XML corpus; the XML version of the corpus contains more metadata than the plain text version:
    • id (the URL of the original file),
    • content_type of the original file,
    • filename of the downloaded file;
  • NEW (feature): in the “Project Definition” step, you can now add up to three user-defined XML attributes to the XML version of the corpus;
  • NEW (feature): the name of the corpus is now prepended to the names of downloaded files, individual corpus text files and XML corpus files; this makes it possible to easily merge different corpora in the same folder; files are still progressively numbered;
  • BUGFIX: fixed a bug that prevented the download timeout from working properly, which caused BootCaT to wait forever for certain URLs to download.
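
To make the new metadata and file naming more concrete, here is a minimal Python sketch of how one document might be wrapped in pseudo-XML. Only the attribute names id, content_type and filename, the limit of three user-defined attributes, and the corpus-name prefix come from the notes above. The `doc` element name, the escaping of text and attribute values, the four-digit numbering and both function names are assumptions made for illustration; they are not BootCaT's actual output format or code.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_document(text, url, content_type, filename, extra_attrs=None):
    """Wrap plain text in a pseudo-XML element with per-document metadata.

    The <doc> element name is hypothetical; the three built-in attribute
    names are the ones listed in the release notes.
    """
    attrs = {"id": url, "content_type": content_type, "filename": filename}
    extra_attrs = extra_attrs or {}
    # The notes mention at most three user-defined attributes.
    if len(extra_attrs) > 3:
        raise ValueError("at most three user-defined attributes are allowed")
    attrs.update(extra_attrs)
    # quoteattr() adds the surrounding quotes and escapes &, < and >.
    attr_text = " ".join(
        f"{name}={quoteattr(value)}" for name, value in attrs.items()
    )
    # Escaping the body text is an assumption; a pseudo-XML format may not do it.
    return f"<doc {attr_text}>\n{escape(text)}\n</doc>\n"


def numbered_name(corpus_name, index, extension):
    """Build a progressively numbered file name prefixed with the corpus name.

    The underscore separator and zero padding are assumptions.
    """
    return f"{corpus_name}_{index:04d}.{extension}"
```

Because every file name starts with the corpus name, files from two corpora (say, `demo_0001.txt` and `other_0001.txt`) can sit in the same folder without clashing, which is the merging scenario the notes describe.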
  • Last modified: 2019/07/15 14:14
  • by eros