The header

The header contains extra-linguistic information on each speech. It is made up of a number of fields, which provide information about the transcript file, the speech and the speaker. The following is an example of the template we use:

(date: 25-02-04-p speech number: 017 language: en type: org-en

duration: short timing: 24

text length: short number of words: 69

speed: high words per minute: 172

source text delivery: impromptu

speaker: Cox, Patrick gender: M country: Ireland mother tongue: yes

political function: President of the European Parliament political group: ELDR

topic: Procedure & Formalities specific topic: speeches on matters of political importance comments: NA)

The first group of four fields (date, speech number, language and type) makes up a reference code, which is used to classify the speeches. In the date field, the first number (25) indicates the day, the second (02) the month (in this case, February) and the third the year (04, that is, 2004). The letter that follows, (m) or (p), tells us whether the speech was delivered during a morning or an afternoon sitting (in this particular case, in the afternoon). The speech number (in our example, 017) is a sequential number we assign to speeches.

The abbreviations “en”, “it” and “es” indicate, respectively, a speech in English, Italian or Spanish. “org” and “int” indicate whether it is an original speech (i.e. a source text) or an interpretation (i.e. a target text). If it is an interpreted speech, we indicate both source and target languages, for example “int-en-it” means that the speech was interpreted from English into Italian.
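As an illustration, the date and type fields described above could be decoded along the following lines. This is a minimal Python sketch, not part of the corpus tools: the function names and the SpeechType tuple are hypothetical, and two-digit years are assumed to belong to the 2000s.

```python
from datetime import date
from typing import NamedTuple, Optional, Tuple

SITTINGS = {"m": "morning", "p": "afternoon"}


class SpeechType(NamedTuple):
    kind: str                # "org" (original) or "int" (interpretation)
    source: str              # language of the source speech
    target: Optional[str]    # target language, only for interpretations


def parse_date_field(value: str) -> Tuple[date, str]:
    """Decode a value such as '25-02-04-p' into a date and a sitting."""
    day, month, year, sitting = value.split("-")
    # Assumption: two-digit years are offsets from 2000.
    return date(2000 + int(year), int(month), int(day)), SITTINGS[sitting]


def parse_type_field(value: str) -> SpeechType:
    """Decode a value such as 'org-en' or 'int-en-it'."""
    kind, *languages = value.split("-")
    if kind == "org" and len(languages) == 1:
        return SpeechType(kind, languages[0], None)
    if kind == "int" and len(languages) == 2:
        return SpeechType(kind, languages[0], languages[1])
    raise ValueError(f"unrecognised type field: {value!r}")


print(parse_date_field("25-02-04-p"))
print(parse_type_field("int-en-it"))
```

Returning None as the target language for original speeches keeps "org" and "int" values in a single structure, which may be convenient when source and target texts are processed together.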

This reference code is followed by a number of fields containing information on the speech, namely duration, text length and speed. We have recorded the exact figures indicating the number of seconds (timing), the number of words and the words per minute (calculated by dividing the number of words by the duration expressed in minutes; in the example above, 69 words in 24 seconds gives roughly 172 words per minute). We have also classified the duration of speeches as short, medium or long (short: < 120 secs; medium: 121-360 secs; long: > 360 secs).
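The figures in the example header are consistent with a rate expressed per minute, that is, the number of words divided by the timing in seconds and multiplied by 60. A minimal Python sketch of this calculation follows; the function name is hypothetical, and the way the recorded figure is rounded (172 in the example) is not stated here, so the sketch returns the unrounded value.

```python
def words_per_minute(number_of_words: int, timing_seconds: float) -> float:
    """Speaking rate: number of words divided by the duration in minutes."""
    if timing_seconds <= 0:
        raise ValueError("timing must be a positive number of seconds")
    return number_of_words * 60 / timing_seconds


# Figures taken from the example header: 69 words in 24 seconds.
rate = words_per_minute(69, 24)
print(rate)  # 172.5
```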

The same applies to text length, classified as short, medium or long (short: < 300 words; medium: 301-1000 words; long: > 1000 words).

Speed was classified as low, medium or high (low: < 130 w/m; medium: 131-160 w/m; high: > 160 w/m). It must be pointed out that these values were calculated on the basis of the present corpus of speeches, and can therefore only be considered representative of this type of material, that is, speeches delivered during a specific group of plenary sittings of the European Parliament. Indeed, in different contexts (e.g. the Italian conference interpreting market) a speech lasting 5 minutes (300 seconds) would be considered short, as opposed to medium, since simultaneous interpreters normally work in shifts of about 30 minutes. Likewise, a speech delivered at an average speed of 150 w/m is fast (not medium) by normal conference interpreting standards. However, owing to the specific rules for the allocation of speaking time in European Parliament sittings (click on Source Texts in the left-hand side bar for more information), most MEPs try to say as much as possible in the shortest possible time and therefore tend to speak very fast. In this sense and in this particular context, 150 w/m can be considered a medium speed.
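The three classifications (duration, text length and speed) can be summarised in a short Python sketch. The function names are hypothetical. The thresholds as stated leave the boundary values themselves (120 seconds, 300 words, 130 w/m) unassigned, so the sketch assumes that a value equal to a boundary falls into the lower band; this is an assumption, not a rule taken from the corpus.

```python
def classify(value: float, lower: float, upper: float, labels: tuple) -> str:
    """Return the first label up to `lower`, the second up to `upper`, else the third."""
    first, second, third = labels
    # Assumption: values equal to a boundary are placed in the lower band.
    if value <= lower:
        return first
    if value <= upper:
        return second
    return third


def classify_duration(seconds: float) -> str:
    return classify(seconds, 120, 360, ("short", "medium", "long"))


def classify_text_length(words: int) -> str:
    return classify(words, 300, 1000, ("short", "medium", "long"))


def classify_speed(words_per_minute: float) -> str:
    return classify(words_per_minute, 130, 160, ("low", "medium", "high"))


# Values from the example header: 24 seconds, 69 words, 172 w/m.
print(classify_duration(24), classify_text_length(69), classify_speed(172))
```

With the example header's values this yields short, short and high, matching the duration, text length and speed fields of the template.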

Other information related to the speech includes source text delivery (that is, mode of presentation of the source speech), classified as impromptu, read or mixed. This information is recorded in the transcripts of interpreted speeches as well, since it is important to know whether the source text was read or improvised when analysing the target text.

We have grouped the speeches on the basis of macro-categories indicating the general topic of each speech and we have also recorded the specific topic under discussion in the debate. Specific topics are varied, ranging from the Parmalat fraud case to human rights in Afghanistan. A full list of specific topics, with corresponding clip numbers, is available in the archive (click on Multimedia Archive in the left-hand side bar).

The next fields in the header contain information on the speaker: name, gender, country of origin, mother tongue, political function and political group. When the speaker is an interpreter, no values are assigned to the fields name, country, political function and political group (indicated as NA, that is, not assigned).

The labels “European Commission” and “European Council” indicate that the speaker is either a Commissioner or a European Council Minister: in both cases, we record the field of action of the Commissioner or the Council configuration in the space reserved for comments at the end of the header.

European Commission's areas of responsibility:

  • Agriculture and Fisheries
  • Administrative Reform
  • Competition
  • Enterprise and Information Society
  • Internal Market
  • Research
  • Development and Humanitarian Aid
  • Enlargement
  • External Relations
  • Trade
  • Health and Consumer Protection
  • Education and Culture
  • Budget
  • Environment
  • Justice and Home Affairs
  • Employment and Social Affairs
  • Regional Policy
  • Economic and Monetary Affairs
  • Relations with the European Parliament, Transport and Energy
  • President of the European Commission

European Council configurations:

  • General Affairs and External Relations
  • Economic and Financial Affairs
  • Cooperation in the fields of Justice and Home Affairs
  • Employment, Social Policy, Health and Consumer Affairs
  • Competitiveness
  • Transport, Telecommunications and Energy
  • Agriculture and Fisheries
  • Environment
  • Education, Youth and Culture

Finally, the label “guest” indicates that the speaker does not belong to a European Union institution: s/he could be a head of state or government, an intellectual, a politician from a country outside the EU, etc.

The last field is the space reserved for comments. As was mentioned above, this space is used to add information on Commissioners and European Council Ministers, but also to indicate whether the speaker has a noticeable accent (Scottish, Welsh, Irish; Andalusian, Latin American), to comment on any technical problems in the recordings and record any unusual features of each speech which are considered potentially useful for later analysis.
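For readers who wish to process headers automatically, the following Python sketch shows one possible way of splitting a header into its fields. It is only an illustration under stated assumptions: the function and constant names are hypothetical, the list of field names is taken from the example template above, and the sketch assumes that no field value contains a field name followed by a colon.

```python
import re

# Field names as they appear in the example template.
FIELDS = [
    "date", "speech number", "language", "type",
    "duration", "timing", "text length", "number of words",
    "speed", "words per minute", "source text delivery",
    "speaker", "gender", "country", "mother tongue",
    "political function", "political group",
    "topic", "specific topic", "comments",
]

# Longest names first, so that "specific topic" is tried before "topic".
_names = "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True))
FIELD_RE = re.compile(r"(?<![\w-])(" + _names + r"):\s*")


def parse_header(header: str) -> dict:
    """Split a header into a {field name: value} dictionary."""
    text = " ".join(header.split())      # collapse line breaks and extra spaces
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]                # drop the enclosing parentheses
    matches = list(FIELD_RE.finditer(text))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        fields[current.group(1)] = text[current.end():end].strip()
    return fields


EXAMPLE = """(date: 25-02-04-p speech number: 017 language: en type: org-en
duration: short timing: 24
text length: short number of words: 69
speed: high words per minute: 172
source text delivery: impromptu
speaker: Cox, Patrick gender: M country: Ireland mother tongue: yes
political function: President of the European Parliament political group: ELDR
topic: Procedure & Formalities specific topic: speeches on matters of political importance comments: NA)"""

header = parse_header(EXAMPLE)
print(header["speaker"], "|", header["specific topic"])
```

Each value runs from the end of one field name to the start of the next, which is why values containing spaces or commas (such as the speaker's name) are kept whole.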
