====== FrWaC ======

FrWaC is a 1.6 billion word corpus constructed from the Web by limiting the crawl to the **.fr** domain and using medium-frequency words from the Le Monde Diplomatique corpus and basic French vocabulary lists as seeds. The corpus was POS-tagged and lemmatized with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]] using this [[corpora:tagsets:french|tagset]]. More information is available {{:papers:wacky_2008.pdf|here}}.
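
As a rough illustration of working with TreeTagger-style annotation, the sketch below reads tab-separated ''word / POS / lemma'' lines and groups them into sentences. It assumes a one-token-per-line layout with ''<s>'' and ''</s>'' markup lines around each sentence; that layout, the ''Token'' and ''read_sentences'' names, and the sample sentence are assumptions made for illustration and are not taken from the corpus itself.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One annotated token (hypothetical helper type)."""
    word: str
    pos: str
    lemma: str


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of tokens.

    Assumes tab-separated word/POS/lemma lines, with sentences
    closed by a '</s>' markup line. Other markup lines are skipped.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) == 3:
            # A regular token line: word, POS tag, lemma.
            sentence.append(Token(*fields))
        elif line.startswith("</s>") and sentence:
            # End of sentence: emit what has been collected so far.
            yield sentence
            sentence = []
        # Any other line (e.g. '<s>' or document markup) is ignored.
    if sentence:
        # Emit a trailing sentence that was never explicitly closed.
        yield sentence


# Invented sample, not an excerpt from the corpus.
sample = [
    "<s>\n",
    "Le\tDET:ART\tle\n",
    "chat\tNOM\tchat\n",
    "dort\tVER:pres\tdormir\n",
    ".\tSENT\t.\n",
    "</s>\n",
]

sentences = list(read_sentences(sample))
lemmas = [tok.lemma for tok in sentences[0]]
```

Passing an open file handle instead of ''sample'' would stream a large file sentence by sentence without loading it into memory, which matters for a corpus of this size.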