Differences
This shows you the differences between two versions of the page.
| corpora:desert_island_discs_corpus [2026/02/10 16:02] – adriano | corpora:desert_island_discs_corpus [2026/02/10 16:06] (current) – adriano | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| ===== A corpus of transcriptions of BBC's Desert Island Discs episodes (1951-2025) ===== | ===== A corpus of transcriptions of BBC's Desert Island Discs episodes (1951-2025) ===== | ||
| - | The **Desert Island Discs** corpus is a diachronic collection of nearly 12 million words of spoken English. It contains transcripts from the complete archive of 74 years of of [[https:// | + | The **Desert Island Discs** corpus is a diachronic collection of nearly 12 million words of spoken English. It contains transcripts from the complete archive of 74 years of [[https:// |
| + | |||
| + | The corpus features | ||
| The corpus can be accessed freely from the [[https:// | The corpus can be accessed freely from the [[https:// | ||
| - | The following table illustrates the available metadata (full list of metadata concerning hosts, guests, episodes and turn). | + | The following table illustrates the **available metadata** (full list of metadata concerning hosts, guests, episodes and turn). |
| ^Metadata field type ^ Metadata field name ^ Metadata field value ^ Metadata source ^ | ^Metadata field type ^ Metadata field name ^ Metadata field value ^ Metadata source ^ | ||
| - | |On host & guest|Host' | + | |**On host & guest**|Host' |
| | |Guest' | | |Guest' | ||
| | |Guest' | | |Guest' | ||
| Line 30: | Line 32: | ||
| | |Guest' | | |Guest' | ||
| | |Guest' | | |Guest' | ||
| - | |On recording|Recording: | + | |**On recording**|Recording: exact date|E.g. 1994-02-06|BBC archive| |
| | |Recording: decade|E.g. 1990|Inferred from other metadata| | | |Recording: decade|E.g. 1990|Inferred from other metadata| | ||
| | |Recording: year|E.g. 1994|Inferred from other metadata| | | |Recording: year|E.g. 1994|Inferred from other metadata| | ||
| | |Text ID|E.g. DouglasAdams_1994|Inferred from other metadata| | | |Text ID|E.g. DouglasAdams_1994|Inferred from other metadata| | ||
| - | |On turn|Turn type|guest, host, intro, music, other, thanking and ending|Heuristics based on WhisperX output| | + | |**On turn**|Turn type|guest, host, intro, music, other, thanking and ending|Heuristics based on WhisperX output| |
| ==== Corpus statistics ==== | ==== Corpus statistics ==== | ||
| - | Summary statistics on the corpus are provided below. | + | Summary statistics on **corpus |
| |Number of texts (episodes)|2, | |Number of texts (episodes)|2, | ||
| Line 62: | Line 64: | ||
| The corpus may be used for research and educational purposes only. Users may not reconstruct full programmes, systematically extract content, or redistribute materials beyond what is permitted by applicable copyright law. | The corpus may be used for research and educational purposes only. Users may not reconstruct full programmes, systematically extract content, or redistribute materials beyond what is permitted by applicable copyright law. | ||
| - | For copyright-related concerns, please contact: [[mailto:adriano.ferraresi@unibo.it|adriano.ferraresi@unibo.it]] | + | For copyright-related concerns, please contact: [[adriano.ferraresi@unibo.it|adriano.ferraresi@unibo.it]] |
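
The metadata table above marks recording year, recording decade and Text ID as "Inferred from other metadata". The sketch below shows one way such values could be derived from the exact recording date; it is an illustration only, not the corpus authors' actual procedure. The function names, the `guest_name` input and the Text ID rule (guest name with whitespace removed, an underscore, then the year) are assumptions based solely on the examples `1994-02-06`, `1994`, `1990` and `DouglasAdams_1994` shown in the table.

```python
from datetime import date


def recording_year(exact_date: str) -> int:
    """Extract the year from an ISO date string such as '1994-02-06'."""
    return date.fromisoformat(exact_date).year


def recording_decade(year: int) -> int:
    """Round a year down to the start of its decade, e.g. 1994 -> 1990."""
    return (year // 10) * 10


def text_id(guest_name: str, year: int) -> str:
    """Build an ID from the guest name and year.

    Assumed rule (hypothetical): strip all whitespace from the name,
    then append an underscore and the recording year.
    """
    return "".join(guest_name.split()) + f"_{year}"


if __name__ == "__main__":
    year = recording_year("1994-02-06")
    print(year, recording_decade(year), text_id("Douglas Adams", year))
```

Names containing punctuation or guests with several appearances in the same year would need extra handling, which the table does not describe.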