xml_corpus folder; a single corpus.xml file is also created, containing the merged version of the pseudo-XML corpus; the XML version of the corpus contains more metadata than the plain text version:
  - id, a unique identifier for the document, consisting of the corpus name followed by a number,
  - filename, the name of the downloaded file (basically, the id plus the file extension),
  - uri, the URI of the original file,
  - content_type, the content type of the original file;
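
The metadata fields listed above can be read back with a short script. The following is a minimal sketch only: it assumes that each document in corpus.xml opens with a tag of the form `<doc id="..." filename="..." uri="..." content_type="...">`. The element name `doc`, the attribute layout, the sample values and the helper `read_metadata` are all hypothetical and should be checked against the actual files. A regular expression is used instead of an XML parser because the corpus is described as pseudo-XML and may not be well-formed.

```python
import re

# Assumption: every document starts with an opening tag such as
#   <doc id="mycorpus_1" filename="mycorpus_1.html" uri="..." content_type="text/html">
# The tag name "doc" and the attribute syntax are hypothetical.
DOC_TAG = re.compile(r"<doc\b([^>]*)>")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def read_metadata(text):
    """Return one dict of attributes per opening <doc ...> tag found in text."""
    return [dict(ATTR.findall(m.group(1))) for m in DOC_TAG.finditer(text)]


# Made-up sample standing in for the contents of corpus.xml.
sample = """<doc id="mycorpus_1" filename="mycorpus_1.html" uri="http://example.com/a" content_type="text/html">
Some text.
</doc>
<doc id="mycorpus_2" filename="mycorpus_2.pdf" uri="http://example.com/b.pdf" content_type="application/pdf">
More text.
</doc>"""

records = read_metadata(sample)
for record in records:
    print(record["id"], record["filename"], record["content_type"])
```

To run this on a real corpus, replace `sample` with the text read from corpus.xml and adjust the two patterns to whatever tag and attribute syntax the file actually uses.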