<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="https://docs.sslmit.unibo.it/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="https://docs.sslmit.unibo.it/feed.php">
        <title>Docs - corpora</title>
        <description></description>
        <link>https://docs.sslmit.unibo.it/</link>
        <image rdf:resource="https://docs.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg" />
        <dc:date>2026-04-30T06:48:07+00:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:bulletin&amp;rev=1522845175&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:desert_island_discs_corpus&amp;rev=1770736002&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:dewac&amp;rev=1508402159&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:epic&amp;rev=1523360866&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:eptic&amp;rev=1527153217&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:frwac&amp;rev=1508404311&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:itwac&amp;rev=1508401177&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:list&amp;rev=1776844968&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:nomadlingo&amp;rev=1767947541&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:repubblica&amp;rev=1508404697&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:ukwac&amp;rev=1539592280&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:victorian-edwardian_novels&amp;rev=1730371704&amp;do=diff"/>
                <rdf:li rdf:resource="https://docs.sslmit.unibo.it/doku.php?id=corpora:wipo&amp;rev=1510911936&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="https://docs.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg">
        <title>Docs</title>
        <link>https://docs.sslmit.unibo.it/</link>
        <url>https://docs.sslmit.unibo.it/lib/exe/fetch.php?media=wiki:dokuwiki.svg</url>
    </image>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:bulletin&amp;rev=1522845175&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2018-04-04T12:32:55+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>bulletin</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:bulletin&amp;rev=1522845175&amp;do=diff</link>
        <description>&quot;Bulletin&quot; Corpus
Publisher: Press and Information Office of the Federal Government (Presse- und Informationsamt der Bundesregierung), Neustädtische Kirchstr. 15, 10117 Berlin
CD-ROM project management: Arvid Brunnemann, Dr. Hackethal
Design and production of the CD-ROM version: EasyBrowse EP-Servicegesellschaft mbH, Voßstraße 15 a, 19053 Schwerin, Telephone: (03 85) 71 01 85, Fax: (03 85) 73 38 83</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:desert_island_discs_corpus&amp;rev=1770736002&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-02-10T15:06:42+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>desert_island_discs_corpus</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:desert_island_discs_corpus&amp;rev=1770736002&amp;do=diff</link>
        <description>Desert Island Discs Corpus

A corpus of transcriptions of BBC's Desert Island Discs episodes (1951-2025)

The Desert Island Discs corpus is a diachronic collection of nearly 12 million words of spoken English. It contains transcripts from the complete archive of 74 years of</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:dewac&amp;rev=1508402159&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2017-10-19T08:35:59+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>dewac</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:dewac&amp;rev=1508402159&amp;do=diff</link>
        <description>DeWaC

DeWaC is a 1.7 billion word corpus constructed from the Web by limiting the crawl to the .de domain and using medium-frequency words from the Süddeutsche Zeitung corpus and basic German vocabulary lists as seeds. The corpus was POS-tagged and lemmatized with the</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:epic&amp;rev=1523360866&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2018-04-10T11:47:46+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>epic</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:epic&amp;rev=1523360866&amp;do=diff</link>
        <description>E.P.I.C. (European Parliament Interpreting Corpus)

EPIC is an open, parallel, trilingual (Italian, English and Spanish) corpus of European Parliament speeches and their corresponding interpretations currently being compiled at DIT (University of Bologna).

Two research grants provided by the</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:eptic&amp;rev=1527153217&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2018-05-24T09:13:37+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>eptic</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:eptic&amp;rev=1527153217&amp;do=diff</link>
        <description>E.P.T.I.C.

The European Parliament Translation and Interpreting Corpus (EPTIC) is developed at the University of Bologna and a few other universities responsible for different language components. At present EPTIC comprises texts in English, Italian and French; other language components, including Finnish, Polish and Slovene, are on the way.
The corpus is currently available on the NoSketch Engine platform (Rychlý 2007).
EPTIC is an intermodal and a parallel corpus of a comple…</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:frwac&amp;rev=1508404311&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2017-10-19T09:11:51+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>frwac</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:frwac&amp;rev=1508404311&amp;do=diff</link>
        <description>FrWaC

FrWaC is a 1.6 billion word corpus constructed from the Web by limiting the crawl to the .fr domain and using medium-frequency words from the Le Monde Diplomatique corpus and basic French vocabulary lists as seeds. The corpus was POS-tagged and lemmatized with the</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:itwac&amp;rev=1508401177&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2017-10-19T08:19:37+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>itwac</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:itwac&amp;rev=1508401177&amp;do=diff</link>
        <description>ITWaC

	*  itWaC: a 2 billion word corpus constructed from the Web by limiting the crawl to the .it domain and using medium-frequency words from the Repubblica corpus and basic Italian vocabulary lists as seeds. The corpus was POS-tagged with the TreeTagger using this tagset, and lemmatized using the</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:list&amp;rev=1776844968&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-04-22T08:02:48+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>list</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:list&amp;rev=1776844968&amp;do=diff</link>
        <description>Corpora

Tagsets

	*  English
	*  English (EPTIC)
	*  French
	*  French (EPTIC)
	*  French (EPTIC - Freeling)
	*  German
	*  Italian
	*  Italian (EPTIC)
	*  Spanish

List of available corpora

Multilingual corpora

	*  E.P.I.C. (English, Italian, Spanish)
	*  E.P.T.I.C. (English, Finnish, French, Italian, Polish, Slovene)
	*  Nomadlingo - a corpus documenting multilingual, naturally occurring interactions among European digital nomads</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:nomadlingo&amp;rev=1767947541&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-01-09T08:32:21+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>nomadlingo</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:nomadlingo&amp;rev=1767947541&amp;do=diff</link>
        <description>NomadLingo 1.1

General description

NomadLingo is the first publicly available corpus documenting multilingual, naturally occurring interactions among European digital nomads, a rapidly growing yet understudied transnational community (Tedesco 2025). The corpus contains transcripts of extracts from naturally occurring conversations which were audio-recorded between November 2023 and April 2024 at social events organised and promoted within digital nomad communities based in Madeira and Canary I…</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:repubblica&amp;rev=1508404697&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2017-10-19T09:18:17+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>repubblica</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:repubblica&amp;rev=1508404697&amp;do=diff</link>
        <description>Repubblica

Click here to consult the &quot;la Repubblica&quot; corpus

The “la Repubblica” corpus is a very large corpus of Italian newspaper text (approximately 380M tokens).

The corpus is tokenized, POS-tagged (with the TreeTagger trained with ad-hoc resources), lemmatized (with Morph-it) and categorized in terms of genre and topic (with</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:ukwac&amp;rev=1539592280&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2018-10-15T08:31:20+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>ukwac</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:ukwac&amp;rev=1539592280&amp;do=diff</link>
        <description>UkWaC

UkWaC is a 2 billion word corpus constructed from the Web by limiting the crawl to the .uk domain and using medium-frequency words from the BNC as seeds. The corpus was POS-tagged and lemmatized with the TreeTagger. The tagset is available here; more information can be found in this</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:victorian-edwardian_novels&amp;rev=1730371704&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-10-31T10:48:24+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>victorian-edwardian_novels</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:victorian-edwardian_novels&amp;rev=1730371704&amp;do=diff</link>
        <description>Victorian-Edwardian Novels

A corpus of novels written in the late 19th and early 20th centuries, built using the texts collected by the 100 English Novels Project on GitHub.

The corpus has been lemmatized and tagged with TreeTagger, using the BNC tagset.</description>
    </item>
    <item rdf:about="https://docs.sslmit.unibo.it/doku.php?id=corpora:wipo&amp;rev=1510911936&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2017-11-17T09:45:36+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>wipo</title>
        <link>https://docs.sslmit.unibo.it/doku.php?id=corpora:wipo&amp;rev=1510911936&amp;do=diff</link>
        <description>WIPO

This corpus is based on texts from the World Intellectual Property Organization.

At the moment only the English version is available.

More info is available [here]

Tagset

Consult the tagset</description>
    </item>
</rdf:RDF>
