E.P.T.I.C.
The European Parliament Translation and Interpreting Corpus (EPTIC) is developed at the University of Bologna and a few other universities responsible for different language components. At present EPTIC comprises texts in English, Italian and French; further language components, including Finnish, Polish and Slovene, are in preparation. The corpus is currently available on the NoSketch Engine platform (Rychlý 2007). EPTIC is an intermodal, parallel corpus with a complex structure. Its data are derived from the official website of the European Parliament, which provides, among other things, videos and verbatim reports of the plenary sessions, together with interpretations of the speeches and their translations (the latter only for plenaries held until mid-2011). Each language combination component in EPTIC includes the following aligned elements:
- sources – spoken: orthographic transcript of the source speech for simultaneous interpretations, aligned with the video of the original speech (for now video alignment is only available for the Italian and English subcorpora)
- sources – written: verbatim report of the source speech in the form available at the EP website, source texts of the translations
- targets – interpreted: transcripts of the interpretations, which are available at the EP website only in audio format
- targets – translated: translations of the verbatim reports available at the EP website
EPTIC includes language combinations of English and other official EU languages. The subcorpora completed so far cover the following interpreting/translation directions: English-Italian, Italian-English, English-French and French-English. The current sizes of the individual subcorpora are shown in Table 1 below. EPTIC consists mostly of short speeches of 100 to 300 words, but there are also medium speeches of 301 to 1000 words and long speeches exceeding 1000 words. The speeches are delivered at European Parliament plenary sessions on a range of topics (e.g. politics, agriculture) by different speakers, mostly MEPs but also commissioners and guests. When searching the corpus, queries can be filtered using contextual metadata, which makes it possible to narrow a query down to, e.g., speeches delivered by a particular speaker or speeches of a particular length. The filtering options are described in Table 2.
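The length categories mentioned above can be illustrated with a minimal Python sketch. It is not part of EPTIC or NoSketch Engine; the function name `length_category` is hypothetical, and the 300-word upper bound for short speeches is an assumption inferred from the medium range starting at 301 words.

```python
def length_category(word_count: int, short_max: int = 300, medium_max: int = 1000) -> str:
    """Map a word count to 'short', 'medium' or 'long'.

    The default thresholds are assumptions based on the ranges described
    in the text (medium: 301 to 1000 words, long: more than 1000 words).
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    if word_count <= short_max:
        return "short"
    if word_count <= medium_max:
        return "medium"
    return "long"


# Illustrative word counts, not taken from the corpus
for n in (250, 301, 1000, 1001):
    print(n, length_category(n))
```

Keeping the thresholds as parameters makes it easy to adjust them if the corpus documentation specifies different boundaries.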
As the corpus is aligned at sentence level, all four aligned components can be searched in parallel. In addition, the corresponding excerpt of the video of the source speech can be displayed.
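To make the idea of a four-way sentence alignment more concrete, here is a rough Python sketch of how one aligned unit could be represented and searched. The class `AlignedUnit`, the function `search_parallel` and the example sentences are all hypothetical illustrations; they do not reflect the actual storage format of EPTIC or the NoSketch Engine query interface.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AlignedUnit:
    """One sentence-level alignment across the four EPTIC components."""
    source_spoken: Optional[str] = None
    source_written: Optional[str] = None
    target_interpreted: Optional[str] = None
    target_translated: Optional[str] = None


def search_parallel(units: List[AlignedUnit], term: str) -> List[AlignedUnit]:
    """Return every unit in which any of the four components contains `term`."""
    needle = term.lower()
    hits = []
    for unit in units:
        texts = (getattr(unit, f.name) for f in fields(unit))
        if any(needle in text.lower() for text in texts if text):
            hits.append(unit)
    return hits


# Made-up example sentences, not taken from the corpus
units = [
    AlignedUnit(
        source_spoken="thank you Mr President",
        source_written="Thank you, Mr President.",
        target_interpreted="grazie signor Presidente",
        target_translated="Grazie, signor Presidente.",
    ),
    AlignedUnit(
        source_spoken="we need a common policy",
        source_written="We need a common policy.",
        target_interpreted="serve una politica comune",
        target_translated="Abbiamo bisogno di una politica comune.",
    ),
]

for hit in search_parallel(units, "presidente"):
    print(hit.source_written, "|", hit.target_translated)
```

Because a hit in any one component returns the whole unit, the spoken, written, interpreted and translated versions of the same sentence can be compared side by side.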
[Table 1: current sizes of the individual EPTIC subcorpora]
Filters
Filter | Description |
---|---|
text.id | refers to the ID of the text |
text.date | date on which the speech was delivered at the EP |
text.length | length of the speech in general (short, medium, long) |
text.lengthw | exact text length in words |
text.duration | duration of the speech (short, medium, long) |
text.durations | duration of the speech in seconds |
text.speed | refers to the pace of delivery of the speech (low, medium, fast) |
text.speedwm | speed of delivery expressed in words per minute |
text.delivery | read vs. impromptu |
text.topic | the general topic of the speech |
text.topicspec | title of the debate |
text.type | source-spoken/ source-written/ target-interpreted/ target-translated |
text.wordcount | length of the speech in words |
speaker.name | name of the speaker |
speaker.gender | gender of the speaker |
speaker.country | country the speaker represents |
speaker.native | whether the speaker is speaking their native language or a foreign language |
speaker.politfunc | the speaker’s political function at the EP |
speaker.politgroup | the speaker’s political group |
st.language | source text language |
st.length | source text length in general (short, medium, long) |
st.lengthw | source text length in words |
st.duration | source text duration in general (short, medium, long) |
st.durations | source text duration in seconds |
st.speed | pace of the original speaker in general (low, medium, fast) |
st.speedwm | pace of the original speaker in words per minute |
st.delivery | mode of delivery of the source speech (read vs. impromptu) |
interpreter.gender | gender of the interpreter |
interpreter.native | whether the interpretation is delivered into the interpreter's native language or a foreign language |
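As a rough illustration of how the filters in the table could be combined, the following Python sketch applies them to metadata records held in memory. This is not the NoSketch Engine interface: the functions `filter_texts` and `words_per_minute` and the two sample records are hypothetical, and the assumption that a words-per-minute value such as `text.speedwm` is obtained from `text.lengthw` and `text.durations` is ours, not a documented fact about the corpus.

```python
from typing import Any, Dict, List


def words_per_minute(length_words: int, duration_seconds: float) -> float:
    """Delivery speed in words per minute (assumed definition)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return length_words / (duration_seconds / 60)


def filter_texts(records: List[Dict[str, Any]], criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the records whose metadata match every key/value pair in `criteria`."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Made-up metadata records using filter names from the table
records = [
    {"text.id": "demo-001", "text.type": "source-spoken",
     "text.delivery": "read", "text.lengthw": 300, "text.durations": 120},
    {"text.id": "demo-002", "text.type": "target-interpreted",
     "text.delivery": "impromptu", "text.lengthw": 450, "text.durations": 180},
]

read_sources = filter_texts(
    records, {"text.type": "source-spoken", "text.delivery": "read"}
)
for record in read_sources:
    speed = words_per_minute(record["text.lengthw"], record["text.durations"])
    print(record["text.id"], speed)
```

Passing the criteria as a dictionary keeps the dotted filter names (e.g. `text.delivery`) usable as keys, which would not be possible with Python keyword arguments.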