tutorials:b4b

Differences

This shows you the differences between two versions of the page.

tutorials:b4b [2019/11/05 15:31] – [Storing the results] eros tutorials:b4b [2019/11/06 11:36] (current) – [Location and time] albarron
Line 5: Line 5:
 ====== Location and time ====== ====== Location and time ======
  
-The tutorial will be held in Room 4 of the PhD Lab on Wednesday 6 November from 9.15 to 10.45. +The tutorial was held in Room 4 of the PhD Lab on Wednesday 6 November from 9.15 to 10.45. 
  
 ===== Requirements ===== ===== Requirements =====
Line 15: Line 15:
   * **Linux**. Nothing extra. You are ready to go.   * **Linux**. Nothing extra. You are ready to go.
  
 +===== Resources =====
 +
 +We'll use a small subset of the English-Italian part of the Europarl parallel corpus.
 +
 +Download the two files here: {{:tutorials:b4b:en.zip|English}} and {{:tutorials:b4b:it.zip|Italian}}
  
 ===== Why is bash relevant? ===== ===== Why is bash relevant? =====
Line 41: Line 46:
  
 Commands: ''cat'', ''more'', ''less'', ''most'', ''wc'', ''nano'', ''head'', ''shuf'' Commands: ''cat'', ''more'', ''less'', ''most'', ''wc'', ''nano'', ''head'', ''shuf''
 +
 +
 ==== Grabbing information in a file from the command line ==== ==== Grabbing information in a file from the command line ====
    
Line 67: Line 74:
  
 Commands: ''man'' Commands: ''man''
 +
 +==== Exercises ====
 +
 +**EXERCISE 1**. Let us "measure" a file: bytes, megabytes, lines, words, etc.
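
One possible way to approach Exercise 1 is sketched below with ''wc'' and ''awk''. The sample file and the ''measure'' helper are hypothetical stand-ins introduced here for illustration; with the real data you would pass one of the unzipped corpus files instead.

```shell
# Hypothetical sample file standing in for one side of the corpus.
sample="$(mktemp)"
printf 'Resumption of the session\nI declare resumed the session\n' > "$sample"

# measure FILE: print the size of FILE in bytes, lines, words and megabytes.
# (Helper name chosen for this sketch; it is not part of the tutorial.)
measure() {
  # Reading from stdin keeps the file name out of wc's output;
  # tr removes the padding that some wc implementations add.
  bytes=$(wc -c < "$1" | tr -d ' ')
  lines=$(wc -l < "$1" | tr -d ' ')
  words=$(wc -w < "$1" | tr -d ' ')
  echo "bytes=$bytes lines=$lines words=$words"
  # Shell arithmetic is integer-only, so awk does the division.
  awk -v b="$bytes" 'BEGIN { printf "megabytes=%.6f\n", b / (1024 * 1024) }'
}

measure "$sample"
```

Here a megabyte is taken as 1024 * 1024 bytes; divide by 1000000 instead if you prefer the decimal definition.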
 +
 +**EXERCISE 2**. Shuffle a parallel corpus so that neighbouring sentences come from different speeches, keeping the two languages aligned. 
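
A minimal sketch for Exercise 2, assuming the two sides are line-aligned and that no sentence contains a tab character. The file names ''en.txt'' and ''it.txt'' are hypothetical, and ''shuf'' is assumed to be the GNU coreutils version. The idea is to glue each sentence pair onto one line, shuffle the pairs, and split them again.

```shell
# Hypothetical toy corpus: line N of en.txt translates line N of it.txt.
workdir="$(mktemp -d)"
printf 'one\ntwo\nthree\nfour\n' > "$workdir/en.txt"
printf 'uno\ndue\ntre\nquattro\n' > "$workdir/it.txt"

# 1. paste joins the two files line by line, separated by a tab.
# 2. shuf permutes the joined lines, so each pair moves together.
paste "$workdir/en.txt" "$workdir/it.txt" | shuf > "$workdir/shuffled.tsv"

# 3. cut splits the pairs back into one file per language.
cut -f1 "$workdir/shuffled.tsv" > "$workdir/en.shuf.txt"
cut -f2 "$workdir/shuffled.tsv" > "$workdir/it.shuf.txt"

# Show the shuffled pairs side by side.
paste "$workdir/en.shuf.txt" "$workdir/it.shuf.txt"
```

Shuffling each file separately with ''shuf'' would break the alignment, because the two runs would use different random orders.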
 +
 +**EXERCISE 3**. Find the most frequent tokens in the two parts of a parallel corpus and analyse them.
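
Exercise 3 could be tackled with the classic ''sort | uniq -c'' pipeline. The sketch below treats whitespace-separated strings as tokens, which is an assumption; the ''top_tokens'' helper and the sample text are hypothetical.

```shell
# Hypothetical sample text; use one side of the corpus in practice.
text="$(mktemp)"
printf 'the cat and the dog\nthe end\n' > "$text"

# top_tokens FILE N: print the N most frequent tokens of FILE with counts.
# (Helper name chosen for this sketch; it is not part of the tutorial.)
top_tokens() {
  tr -s '[:space:]' '\n' < "$1" |  # one token per line
    sed '/^$/d' |                  # drop an empty line left by leading whitespace
    sort |                         # group identical tokens together
    uniq -c |                      # count each group
    sort -rn |                     # highest counts first
    head -n "$2"                   # keep the top N
}

top_tokens "$text" 3
```

Run it once per language and compare the two lists. The counts are case-sensitive and punctuation stays attached to words, so "The" and "the," would be counted as different tokens unless the text is normalised first.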
 +
 +**EXERCISE 4**. Get all words that are cognates with respect to Italian from a TSV dictionary. Afterwards, count the number of tokens that belong to each family. 
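
The layout of the TSV dictionary is not described here, so the following is only a sketch under a made-up layout: column 1 holds the word, column 2 its family, and column 3 its Italian cognate, with "-" when there is none. Both the layout and the sample entries are assumptions; adapt the column numbers to the real file.

```shell
# Hypothetical dictionary: word <TAB> family <TAB> Italian cognate ("-" if none).
dict="$(mktemp)"
printf 'night\tGermanic\tnotte\n' > "$dict"
printf 'noche\tRomance\tnotte\n' >> "$dict"
printf 'dog\tGermanic\t-\n' >> "$dict"
printf 'nuit\tRomance\tnotte\n' >> "$dict"

# Keep only the rows that have an Italian cognate in column 3.
cognates="$(mktemp)"
awk -F '\t' '$3 != "-"' "$dict" > "$cognates"

# Count how many of the selected words fall into each family (column 2).
cut -f2 "$cognates" | sort | uniq -c | sort -rn
```

The first step answers the "get all cognates" part; the second reuses the counting pipeline on the family column alone.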
 +
  • Last modified: 2019/11/05 15:31
  • by eros