tutorials:b4b

Differences

This shows you the differences between two versions of the page.

tutorials:b4b [2019/11/05 15:31] – [Storing the results] eros
tutorials:b4b [2019/11/05 17:42] – [Exercises] albarron
Line 15: Line 15:
   * **Linux**. Nothing extra. You are ready to go.
  
 +===== Resources =====
 +
 +We'll use a small subset of the English-Italian part of the Europarl parallel corpus.
 +
 +Download the two files here: {{:tutorials:b4b:en.zip|English}} and {{:tutorials:b4b:it.zip|Italian}}.
  
 ===== Why is bash relevant? =====
Line 41: Line 46:
  
 Commands: ''cat'', ''more'', ''less'', ''most'', ''wc'', ''nano'', ''head'', ''shuf''
 +
 +
 ==== Grabbing information in a file from the command line ====
    
Line 51: Line 58:
 All the operations carried out so far show their results in the terminal, but they neither alter the contents of the file nor store the results anywhere. Now we learn how to store them.
  
-Commands: ''>'', ''>>''
+Commands: ''>'', ''%%>>%%''
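
As a minimal sketch of the two redirection operators, the snippet below writes the output of a command to a file with ''>'' and then appends to the same file with ''%%>>%%''. The file names ''sample.txt'' and ''top.txt'' are made up for illustration, and ''tail'' is used here although it is not in the command list above.

```shell
# Work in a throwaway directory so that no existing file is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample file standing in for a corpus file.
printf 'first line\nsecond line\nthird line\n' > sample.txt

# '>' creates top.txt, or overwrites it if it already exists.
head -n 2 sample.txt > top.txt

# '>>' appends to the end of top.txt instead of overwriting it.
tail -n 1 sample.txt >> top.txt

# Show what ended up in the file.
cat top.txt
```

Running the ''head'' line a second time would reset ''top.txt'' to two lines, because ''>'' always starts from an empty file.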
  
 ==== Understanding the structure of the commands ====
Line 67: Line 74:
  
 Commands: ''man''
 +
 +==== Exercises ====
 +
 +**EXERCISE 1**. Let us "measure" a file: bytes, megabytes, lines, words, etc.
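
One possible way to approach this, sketched with ''wc'' on a tiny made-up file (''sample.txt'' and its two sentences are placeholders, not the actual corpus). The conversion to megabytes assumes 1 MB = 1024 * 1024 bytes and uses ''awk'', which is not among the commands listed above.

```shell
# Hypothetical sample file; replace "$f" with a real corpus file.
tmpdir=$(mktemp -d)
f="$tmpdir/sample.txt"
printf 'Resumption of the session\nI declare the session resumed\n' > "$f"

# Reading from standard input makes wc print only the number,
# without the file name.
bytes=$(wc -c < "$f")
lines=$(wc -l < "$f")
words=$(wc -w < "$f")

echo "bytes: $bytes"
echo "lines: $lines"
echo "words: $words"

# Bash arithmetic is integer-only, so the division is done in awk.
awk -v b="$bytes" 'BEGIN { printf "megabytes: %.6f\n", b / (1024 * 1024) }'
```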
 +
 +**EXERCISE 2**. Shuffle a parallel corpus so that sentences from different speeches are mixed together.
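
A sketch of one approach, assuming the two sides are line-aligned plain-text files and that no sentence contains a tab character. Shuffling each file separately would break the alignment, so the lines are first glued together, shuffled as pairs, and then split again. The file names (''en.txt'', ''it.txt'' and the outputs) are hypothetical, and ''paste'' and ''cut'' are not in the command lists above.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Tiny made-up parallel corpus: line N of en.txt translates line N of it.txt.
printf 'one\ntwo\nthree\nfour\n' > en.txt
printf 'uno\ndue\ntre\nquattro\n' > it.txt

# Join the files line by line with a tab, then shuffle the pairs.
paste en.txt it.txt | shuf > shuffled.tsv

# Split the shuffled pairs back into one file per language.
cut -f 1 shuffled.tsv > en.shuf.txt
cut -f 2 shuffled.tsv > it.shuf.txt

# Check by eye that the pairs still match.
paste en.shuf.txt it.shuf.txt
```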
 +
 +**EXERCISE 3**. Find the most frequent tokens in the two parts of a parallel corpus and analyse them.
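
A minimal sketch of a frequency count, assuming that tokens are simply whitespace-separated strings (no lowercasing or punctuation handling). The file ''en.txt'' and its contents are made up, and ''tr'', ''sort'' and ''uniq'' are not in the command lists above. The same pipeline would be run on each side of the corpus.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/en.txt"

# Hypothetical sample text standing in for one side of the corpus.
printf 'the session of the parliament\nthe vote of the house\n' > "$f"

# 1. tr: put every token on its own line.
# 2. sort: bring identical tokens together (uniq needs sorted input).
# 3. uniq -c: count each distinct token.
# 4. sort -rn: order by count, highest first.
# 5. head: keep the top of the list.
top=$(tr -s '[:space:]' '\n' < "$f" | sort | uniq -c | sort -rn | head -n 3)

echo "$top"
```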
 +
 +**EXERCISE 4**. Extract from a TSV dictionary all words that are cognates with respect to Italian. Afterwards, count the number of tokens that belong to each family.
 +
  • tutorials/b4b.txt
  • Last modified: 2019/11/06 11:36
  • by albarron