
Search Parameters

EPIC can be interrogated by carrying out a simple query or an advanced query (see Advanced Query how-to) in each sub-corpus or in the aligned corpora. From the Project Description page, users can choose between Source texts, Target texts and Aligned texts. For example, clicking on Source texts gives access to the three sub-corpora “org-en”, “org-it” and “org-es”, that is, the English, Italian and Spanish source texts, respectively. A user who wants to query the sub-corpus of English source texts (org-en) can either query the whole sub-corpus (that is, search for all occurrences of a word or phrase in all the English source texts) or restrict the search to a subset of texts by selecting one or more of the search options. The search parameters are based on the header fields (see above) and refer either to speech features or to speaker features.

The “Duration” search parameter makes it possible to search for a certain phrase only in short, medium or long speeches (less than 2 minutes, between 2 and 6 minutes and more than 6 minutes, respectively).

Similarly, the “Text length” parameter makes it possible to restrict the query on the basis of the number of words in each speech. The user can select all the texts that are shorter than 300 words, those between 300 and 1000 words, or those longer than 1000 words.

The “Speed” parameter enables users to choose speeches delivered at low, medium or high speed (fewer than 130 words per minute, between 130 and 160 words per minute, and more than 160 words per minute, respectively). See the section on the header above for more details.
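The three numeric parameters above all derive from two quantities, a speech's word count and its duration, with speed being words divided by minutes. The following minimal Python sketch shows how a speech would fall into each band. The function names and band labels are hypothetical, and since the text does not say which side of each threshold a boundary value falls on, the handling of exact boundaries (e.g. exactly 130 words per minute counting as "medium") is an assumption.

```python
# Hypothetical sketch: classify a speech into the duration, text-length and
# speed bands described above. Boundary handling at exact thresholds is assumed.

def duration_band(seconds: float) -> str:
    """Short: < 2 min; medium: 2-6 min; long: > 6 min."""
    minutes = seconds / 60
    if minutes < 2:
        return "short"
    if minutes <= 6:
        return "medium"
    return "long"


def length_band(words: int) -> str:
    """Short: < 300 words; medium: 300-1000 words; long: > 1000 words."""
    if words < 300:
        return "short"
    if words <= 1000:
        return "medium"
    return "long"


def speed_band(words: int, seconds: float) -> str:
    """Speed in words per minute: low < 130; medium 130-160; high > 160."""
    wpm = words / (seconds / 60)
    if wpm < 130:
        return "low"
    if wpm <= 160:
        return "medium"
    return "high"


# A 4-minute speech of 520 words is delivered at exactly 130 words per minute.
print(duration_band(240), length_band(520), speed_band(520, 240))
# medium medium medium
```

Note that the bands are independent: a short speech can still be fast, which is why the parameters can be selected separately.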

The “Source text Delivery” option makes it possible to filter speeches according to delivery mode: read, impromptu, and mixed.

The “Topic” search parameter enables users to select texts according to the following macro-categories:

  • Agriculture & Fisheries
  • Economics & Finance
  • Employment
  • Environment
  • Health
  • Justice
  • Politics
  • Procedure & Formalities
  • Society & Culture
  • Science & Technology
  • Transport

Users can also select speeches on the basis of speaker characteristics. If the speaker is a source text speaker (that is, not an interpreter), s/he is always in one of the following categories:

  • MEP
  • President of the European Parliament
  • Vice-President of the European Parliament
  • European Commission
  • European Council
  • guest

When the speaker is an MEP, we also indicate the political group to which s/he belongs; this field does not apply to other types of speakers. The political groups represented in the European Parliament at the time the EPIC speeches were recorded are as follows:

  • PPE-DE (Group of the European People's Party (Christian Democrats) and European Democrats)
  • PSE (Socialist Group in the European Parliament)
  • ALDE (Group of the Alliance of Liberals and Democrats for Europe)
  • Verts/ALE (Group of the Greens/European Free Alliance)
  • GUE/NGL (Confederal Group of the European United Left - Nordic Green Left)
  • IND/DEM (Independence/Democracy Group)
  • UEN (Union for Europe of the Nations Group)
  • NI (Non-attached Members)

It is also possible to select the speaker's gender and country of origin.

The field “Mother tongue” can be used to select speeches made by native speakers only or, vice versa, by non-native speakers. This is particularly relevant for the speeches in English, which is often used as a lingua franca by non-native speakers (e.g. Commissioners and Council Ministers normally use English when they visit the Parliament).

The various search parameters can be combined to further restrict the speeches on which a query is carried out. For example, one can select all the English source texts on economics & finance delivered by non-native speakers from the European Commission; one may query a section of EPIC made up of speeches on employment issues delivered by MEPs belonging to the Socialist Group; or one can carry out separate searches on the speeches delivered by Irish speakers and by UK speakers, and so on.
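Combining parameters narrows the selection: a text is kept only if it satisfies every chosen criterion. The minimal Python sketch below illustrates that logic on speech headers. The `Header` record, its field names and the three sample records are invented for illustration and do not reproduce EPIC's actual header format or interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical header record; field names are illustrative, not EPIC's own.
@dataclass(frozen=True)
class Header:
    text_id: str
    topic: str
    speaker_type: str
    political_group: Optional[str]  # only set when speaker_type == "MEP"
    country: str
    native: bool


def select(headers: List[Header], **criteria) -> List[Header]:
    """Keep headers matching every given field=value criterion (logical AND).

    An unknown field name raises AttributeError rather than silently matching.
    """
    return [
        h for h in headers
        if all(getattr(h, field) == value for field, value in criteria.items())
    ]


# Invented sample records, for illustration only.
HEADERS = [
    Header("en_001", "Economics & Finance", "European Commission", None, "Finland", False),
    Header("en_002", "Employment", "MEP", "PSE", "United Kingdom", True),
    Header("en_003", "Employment", "MEP", "PPE-DE", "Ireland", True),
]

socialist_employment = select(HEADERS, topic="Employment",
                              speaker_type="MEP", political_group="PSE")
print([h.text_id for h in socialist_employment])  # ['en_002']
```

A query for a word or phrase would then be run only over the texts returned by the selection.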

The display options available in EPIC allow users to display words, lemmas, POS tags, and the transcript showing how the words were actually uttered, including mispronounced words (click on Transcription Conventions for more details).

  • Last modified: 2018/04/05 13:20
  • by eros