====== UkWaC ======
  
UkWaC is a 2-billion-word corpus constructed from the Web by limiting the crawl to the **.uk** domain and using medium-frequency words from the [[http://www.natcorp.ox.ac.uk/|BNC]] as seeds. The corpus was POS-tagged and lemmatized with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]]. The tagset is available [[corpora:tagsets:english|here]]; more information can be found in this {{:corpora:wacky_2008.pdf|paper}}.
===== Tagset =====
  
Consult the [[corpora:tagsets:english|tagset]].
  • corpora/ukwac.txt
  • Last modified: 2018/10/15 10:31
  • by eros