Creating the EPIC multimedia archive

The multimedia archive is currently stored on the hard disk of a dedicated machine, but there are plans to load it on an Internet server to enable external researchers to access the audio and video clips as well as the transcripts which make up the EPIC corpus.

The EP plenary sessions were recorded off the satellite news channel EbS (Europe by Satellite), which enables viewers to select different sound channels for different EU languages. Four workstations, each consisting of a TV and a video recorder, were used for each plenary to obtain one recording of the original sound channel and one recording each of the English, Italian and Spanish sound channels (that is, of the interpreters working in the three booths).

The part-sessions recorded include the following plenaries: February, March, April and July (2004). See the official 2004 EP calendar.

We used 240-minute VHS tapes. Each part-session generally lasted four days (Monday to Thursday), and we used about two tapes per day per language; this reflects the EbS broadcasting schedule, which does not cover EP part-sessions in their entirety. Moreover, owing to technical difficulties with the satellite broadcasts or with our recording equipment, it was not always possible to record everything that EbS broadcast. It must also be noted that EbS broadcasts press conferences and stock footage that European TV channels can use when reporting on EU affairs. As a result, although we used 28 VHS tapes per plenary on average, the recordings had to be edited to select only the debates. The next step was digitisation.
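The tape figures above can be cross-checked with a little arithmetic. The following is only a rough sketch: the constant and function names are illustrative, and "about two tapes per day" is treated as exactly two, which gives an upper estimate rather than the reported average.

```python
# Rough estimate of VHS tapes per part-session, using the figures quoted
# in the text. All names here are illustrative, not part of the project.
DAYS_PER_PART_SESSION = 4       # Monday to Thursday
TAPES_PER_DAY_PER_CHANNEL = 2   # "about two" in the text, taken as exactly 2
SOUND_CHANNELS = 4              # original + English, Italian, Spanish
TAPE_MINUTES = 240              # nominal length of one VHS tape


def estimated_tapes(days=DAYS_PER_PART_SESSION,
                    tapes_per_day=TAPES_PER_DAY_PER_CHANNEL,
                    channels=SOUND_CHANNELS):
    """Number of tapes if every day needs the full quota on every channel."""
    return days * tapes_per_day * channels


def tape_capacity_hours(tapes, minutes_per_tape=TAPE_MINUTES):
    """Nominal recording capacity of a set of tapes, in hours."""
    return tapes * minutes_per_tape / 60


if __name__ == "__main__":
    print(estimated_tapes())        # upper estimate per part-session
    print(tape_capacity_hours(28))  # nominal capacity of the reported average
```

With these assumptions the upper estimate is 32 tapes per part-session, against the reported average of 28. Note that 28 tapes correspond to 112 hours of nominal tape capacity across the four channels, which is not the same as the amount of debate actually recorded.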

The VHS tapes with the recordings of the original speakers are being digitised as video files, since visual information is potentially useful for later analysis of the corpus. By contrast, the interpreted speeches are digitised as audio files only: the images on those tapes are exactly the same (the plenary speakers), whereas our interest lies in the audio information, namely the interpreters' performances. For each plenary, we thus obtain one video file (the original version) and three audio files (the English, Italian and Spanish interpretations respectively).

The recordings of the original speakers are converted into digital video files using Pinnacle Studio 9.0, a video capture and editing program. The chosen format for the video files is “.mpeg1”.

The recordings of the interpreted speeches are digitised using Cool Edit Pro 2.0, a sound editor. The chosen format is “.wav” (sample rate = 32,000 Hz; channels = mono; resolution = 8 bit), which ensures very good audio quality for possible future studies of prosodic features (distribution of pauses, hesitations, etc.). There are plans to upload the EPIC archive to a dedicated Web server from which researchers will be able to download the clips. When the project reaches that stage, the “.wav” clips will be converted into a lighter format, probably “.mp3”.
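As an illustration of the audio settings listed above, a digitised file could be checked against them with Python's standard `wave` module. This is a minimal sketch, not part of the project's actual workflow; the function names and the `EXPECTED` table are hypothetical.

```python
import wave

# Target format as stated in the text: 32,000 Hz, mono, 8 bit.
# An 8-bit resolution corresponds to a sample width of 1 byte.
EXPECTED = {"framerate": 32000, "nchannels": 1, "sampwidth": 1}


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and EXPECTED."""
    with wave.open(path, "rb") as wav:
        actual = {
            "framerate": wav.getframerate(),
            "nchannels": wav.getnchannels(),
            "sampwidth": wav.getsampwidth(),
        }
    return [
        f"{key}: expected {EXPECTED[key]}, found {value}"
        for key, value in actual.items()
        if value != EXPECTED[key]
    ]


def uncompressed_bytes_per_hour(framerate=32000, nchannels=1, sampwidth=1):
    """Size of one hour of uncompressed PCM audio, excluding the file header."""
    return framerate * nchannels * sampwidth * 3600
```

An empty list from `check_wav_format` means the file matches the stated settings. At these settings one hour of audio takes 115,200,000 bytes (roughly 115 MB), which helps explain the plan to convert the clips to a lighter format before publishing them online.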

Once the original recording of each plenary has been converted into a video file, all the speeches made in Italian, English and Spanish are selected and saved as individual clips (video files for the original speakers and audio files for the interpreters).
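For the audio side, saving one speech as an individual clip amounts to copying a time range out of the full recording. The sketch below shows one way this could be done with Python's standard `wave` module. It is an assumption for illustration only: the text does not say how the clips were actually cut, the function name and parameters are hypothetical, and video clips are not covered.

```python
import wave


def extract_clip(source_path, clip_path, start_seconds, end_seconds):
    """Copy the audio between two time points into a new WAV file.

    Returns the number of frames written to the clip.
    """
    if start_seconds < 0 or end_seconds <= start_seconds:
        raise ValueError("need 0 <= start_seconds < end_seconds")

    with wave.open(source_path, "rb") as source:
        rate = source.getframerate()
        total_frames = source.getnframes()
        start_frame = int(start_seconds * rate)
        if start_frame >= total_frames:
            raise ValueError("start_seconds is beyond the end of the recording")
        # Clamp the end point so a clip may run up to the end of the file.
        end_frame = min(int(end_seconds * rate), total_frames)

        source.setpos(start_frame)
        frames = source.readframes(end_frame - start_frame)

        # The clip keeps the sample rate, channel count and resolution
        # of the source recording.
        with wave.open(clip_path, "wb") as clip:
            clip.setnchannels(source.getnchannels())
            clip.setsampwidth(source.getsampwidth())
            clip.setframerate(rate)
            clip.writeframes(frames)

    return end_frame - start_frame
```

A call such as `extract_clip("session_en.wav", "speech_01_en.wav", 754.0, 912.5)` would save the interpretation of one speech, where the file names and time stamps are invented for the example.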

The archive includes video clips of each source language speaker, audio clips of the corresponding interpreted target texts, and the transcripts of all the texts.

  • Last modified: 2018/04/05 13:23
  • by eros