====== UkWaC ======

UkWaC is a 2-billion-word corpus built by crawling the Web, with the crawl restricted to the **.uk** domain and seeded with medium-frequency words from the [[http://www.natcorp.ox.ac.uk/|BNC]]. The corpus was POS-tagged and lemmatized with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]]. The tagset is available [[corpora:tagsets:english|here]]; more information can be found in this {{:corpora:wacky_2008.pdf|paper}}.

===== Tagset =====

Consult the [[corpora:tagsets:english|tagset]].