====== UkWaC ======

UkWaC is a 2-billion-word corpus constructed from the Web by limiting the crawl to the **.uk** domain and using medium-frequency words from the [[http://www.natcorp.ox.ac.uk/|BNC]] as seeds. The corpus was POS-tagged and lemmatized with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]]. The tagset is available [[corpora:tagsets:english|here]]; more information can be found in this {{:corpora:wacky_2008.pdf|paper}}.

===== Tagset =====

Consult the [[corpora:tagsets:english|tagset]].

Last modified: 2018/10/15 10:31 by eros