Differences
This shows you the differences between two versions of the page.
| corpora:desert_island_discs_corpus [2025/09/16 17:32] – created eros | corpora:desert_island_discs_corpus [2026/02/10 16:06] (current) – adriano | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| ===== A corpus of transcriptions of BBC's Desert Island Discs episodes (1951-2025) ===== | ===== A corpus of transcriptions of BBC's Desert Island Discs episodes (1951-2025) ===== | ||
| - | The **Desert Island Discs** corpus is a diachronic collection of nearly 12 million words of spoken English. It contains transcripts from the complete archive of 74 years of of [[https:// | + | The **Desert Island Discs** corpus is a diachronic collection of nearly 12 million words of spoken English. It contains transcripts from the complete archive of 74 years of [[https:// |
| - | The corpus | + | The corpus |
| - | The following table illustrates the available metadata (full list of metadata concerning hosts, guests, episodes and turn). | + | The corpus can be accessed freely from the [[https:// |
| + | |||
| + | The following table illustrates the **available metadata** (full list of metadata concerning hosts, guests, episodes and turn). | ||
| ^Metadata field type ^ Metadata field name ^ Metadata field value ^ Metadata source ^ | ^Metadata field type ^ Metadata field name ^ Metadata field value ^ Metadata source ^ | ||
| - | |On host & guest|Host' | + | |**On host & guest**|Host' |
| | |Guest' | | |Guest' | ||
| | |Guest' | | |Guest' | ||
| Line 30: | Line 32: | ||
| | |Guest' | | |Guest' | ||
| | |Guest' | | |Guest' | ||
| - | |On recording|Recording: | + | |**On recording**|Recording: exact date|E.g. 1994-02-06|BBC archive| |
| | |Recording: decade|E.g. 1990|Inferred from other metadata| | | |Recording: decade|E.g. 1990|Inferred from other metadata| | ||
| | |Recording: year|E.g. 1994|Inferred from other metadata| | | |Recording: year|E.g. 1994|Inferred from other metadata| | ||
| | |Text ID|E.g. DouglasAdams_1994|Inferred from other metadata| | | |Text ID|E.g. DouglasAdams_1994|Inferred from other metadata| | ||
| - | |On turn|Turn type|guest, host, intro, music, other, thanking and ending|Heuristics based on WhisperX output| | + | |**On turn**|Turn type|guest, host, intro, music, other, thanking and ending|Heuristics based on WhisperX output| |
| - | + | ||
| - | Summary statistics on the corpus are provided below (corpus size information). | + | ==== Corpus statistics ==== |
| + | |||
| + | Summary statistics on **corpus | ||
| |Number of texts (episodes)|2, | |Number of texts (episodes)|2, | ||
| Line 51: | Line 55: | ||
| Further information on how the corpus was compiled can be found in the article. | Further information on how the corpus was compiled can be found in the article. | ||
| + | |||
| + | ==== Copyright and Use ==== | ||
| + | |||
| + | This corpus contains transcriptions of copyrighted radio programme recordings. Copyright in the original audio material remains with the respective rightsholders. The recordings were obtained from publicly available sources and processed for non-commercial research and teaching purposes. The original recordings are available via the BBC’s official online archive. | ||
| + | |||
| + | Only textual transcriptions are made available through this platform. Access is provided exclusively via a query-based interface, which returns short textual excerpts (concordances). Audio files and full transcripts are not distributed. | ||
| + | |||
| + | The corpus may be used for research and educational purposes only. Users may not reconstruct full programmes, systematically extract content, or redistribute materials beyond what is permitted by applicable copyright law. | ||
| + | |||
| + | For copyright-related concerns, please contact: [[adriano.ferraresi@unibo.it|adriano.ferraresi@unibo.it]] | ||
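The three fields marked "Inferred from other metadata" in the metadata table can be illustrated with a short sketch. It assumes, based only on the example values given there (1994-02-06, 1994, 1990, DouglasAdams_1994), three things. The year is taken from the exact recording date. The decade rounds the year down to a multiple of ten. The Text ID joins the guest's name, with spaces removed, to the year. The class and field names are hypothetical, and the corpus's actual compilation pipeline may handle cases differently, for example guests who appear more than once in the same year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EpisodeRecord:
    """Hypothetical per-episode record; field names are illustrative only."""
    guest_name: str
    recording_date: date  # "Recording: exact date" (source: BBC archive)

    @property
    def year(self) -> int:
        # Assumption: "Recording: year" is the year of the exact recording date.
        return self.recording_date.year

    @property
    def decade(self) -> int:
        # Assumption: "Recording: decade" rounds the year down to a multiple of 10.
        return self.year // 10 * 10

    @property
    def text_id(self) -> str:
        # Assumed pattern, inferred from the single example "DouglasAdams_1994":
        # guest name with all whitespace removed, an underscore, then the year.
        return f"{''.join(self.guest_name.split())}_{self.year}"


record = EpisodeRecord("Douglas Adams", date.fromisoformat("1994-02-06"))
print(record.year, record.decade, record.text_id)  # 1994 1990 DouglasAdams_1994
```

Deriving the inferred fields as properties keeps them consistent with the single stored date, so they cannot drift out of sync with it.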