corpora:itwac

Differences

This shows you the differences between two versions of the page.

corpora:itwac [2017/10/19 10:13] – created eros
corpora:itwac [2017/10/19 10:19] (current) eros
Line 1: Line 1:
-===== ITWAC Tagset =====
+===== ITWaC =====
  
-<code>
-ADJ adjective
-ADV adverb (excluding -mente forms)
-ADV:mente adverb ending in -mente
-ART article
-ARTPRE preposition + article
-AUX:fin finite form of auxiliary
-AUX:fin:cli finite form of auxiliary with clitic
-AUX:geru gerundive form of auxiliary
-AUX:geru:cli gerundive form of auxiliary with clitic
-AUX:infi infinitival form of auxiliary
-AUX:infi:cli infinitival form of auxiliary with clitic
-AUX:ppast past participle of auxiliary
-AUX:ppre present participle of auxiliary
-CHE che
-CLI clitic
-CON conjunction
-DET:demo demonstrative determiner
-DET:indef indefinite determiner
-DET:num numeral determiner
-DET:poss possessive determiner
-DET:wh wh determiner
-NEG negation
-NOCAT non-linguistic element
-NOUN noun
-NPR proper noun
-NUM number
-PRE preposition
-PRO:demo demonstrative pronoun
-PRO:indef indefinite pronoun
-PRO:num numeral pronoun
-PRO:pers personal pronoun
-PRO:poss possessive pronoun
-PUN non-sentence-final punctuation mark
-SENT sentence-final punctuation mark
-VER2:fin finite form of modal/causal verb
-VER2:fin:cli finite form of modal/causal verb with clitic
-VER2:geru gerundive form of modal/causal verb
-VER2:geru:cli gerundive form of modal/causal verb with clitic
-VER2:infi infinitival form of modal/causal verb
-VER2:infi:cli infinitival form of modal/causal verb with clitic
-VER2:ppast past participle of modal/causal verb
-VER2:ppre present participle of modal/causal verb
-VER:fin finite form of verb
-VER:fin:cli finite form of verb with clitic
-VER:geru gerundive form of verb
-VER:geru:cli gerundive form of verb with clitic
-VER:infi infinitival form of verb
-VER:infi:cli infinitival form of verb with clitic
-VER:ppast past participle of verb
-VER:ppast:cli past participle of verb with clitic
-VER:ppre present participle of verb
-WH wh word
-</code>
+  * **itWaC**: a 2 billion word corpus constructed from the Web limiting the crawl to the **.it** domain and using medium-frequency words from the [[corpora:Repubblica]] corpus and basic Italian vocabulary lists as seeds. The corpus was POS-tagged with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]] using this [[corpora:tagsets:italian|tagset]], and lemmatized using the [[http://sslmit.unibo.it/morphit|Morph-it!]] lexicon, more information available {{http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:wacky_2008.pdf|here}}.
+
+  * semantically and syntactically annotated **Italian Wikipedia**
+    * [[http://medialab.di.unipi.it/Project/QA/wikiCoNLL.bz2|CoNLL format]] ([[http://medialab.di.unipi.it/wiki/Tanl_Tagsets|tagset]])
+    * [[http://medialab.di.unipi.it/Project/QA/wikiMT.bz2|MultiTag format]]
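
The tags in the removed tagset listing are colon-separated: a main category (e.g. VER), an optional subtype (e.g. fin), and an optional final "cli" marking an attached clitic. As a minimal sketch, assuming that reading of the listing holds for every tag, a tag could be split into its parts as follows. The names `ParsedTag` and `parse_tag` are hypothetical and are not part of TreeTagger or any corpus tool.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    """Hypothetical container for the parts of one tag from the listing."""
    category: str            # main category, e.g. "VER", "DET", "NOUN"
    subtype: Optional[str]   # e.g. "fin", "demo", "mente"; None if absent
    has_clitic: bool         # True when the tag ends in ":cli"


def parse_tag(tag: str) -> ParsedTag:
    """Split a colon-separated tag such as "VER:fin:cli" into its parts.

    Assumption: "cli" only counts as a clitic marker when it is the last
    of several colon-separated parts; the bare tag "CLI" is a category.
    """
    parts = tag.split(":")
    category = parts[0]
    has_clitic = len(parts) > 1 and parts[-1] == "cli"
    middle = parts[1:-1] if has_clitic else parts[1:]
    subtype = middle[0] if middle else None
    return ParsedTag(category, subtype, has_clitic)


if __name__ == "__main__":
    for example in ("VER:fin:cli", "ADV:mente", "NOUN", "CLI"):
        print(example, parse_tag(example))
```

Splitting on the colon keeps the main category available for coarse-grained counts while preserving the finer distinctions when they are needed.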
  • corpora/itwac.1508400825.txt.gz
  • Last modified: 2017/10/19 10:13 by eros