tutorials:b4b

Differences

This shows you the differences between two versions of the page.

tutorials:b4b [2019/11/05 15:23] – [How to display and edit a file] eros
tutorials:b4b [2019/11/06 11:36] (current) – [Location and time] albarron
Line 5: Line 5:
 ====== Location and time ======
  
-The tutorial will be held in Room 4 of the PhD Lab on Wednesday 6 November from 9.15 to 10.45.
+The tutorial was held in Room 4 of the PhD Lab on Wednesday 6 November from 9.15 to 10.45.
  
 ===== Requirements =====
Line 15: Line 15:
   * **Linux**. Nothing extra. You are ready to go.
  
 +===== Resources =====
 +
 +We'll use a small subset of the English-Italian part of the Europarl parallel corpus.
 +
 +Download the two files here: {{:tutorials:b4b:en.zip|English}} and {{:tutorials:b4b:it.zip|Italian}}
  
 ===== Why is bash relevant? =====
Line 41: Line 46:
  
 Commands: ''cat'', ''more'', ''less'', ''most'', ''wc'', ''nano'', ''head'', ''shuf''
 +
 +
 ==== Grabbing information in a file from the command line ====
    
Line 51: Line 58:
 All the operations carried out show their result in the terminal, but do not alter the contents nor are stored anywhere. Now we learn how to store them.
  
-Commands: ''te'', ''>'', ''>>''
+Commands: ''>'', ''%%>>%%''
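
For readers of this revision, the two redirection operators kept in the new command list can be illustrated with a minimal sketch. The file names (''sample.txt'', ''top.txt'') and the temporary directory are assumptions made for the example and do not come from the tutorial; ''tail'' is used only to pick the last line.

```shell
# Throwaway directory so that no existing file is overwritten.
workdir=$(mktemp -d)

# Hypothetical three-line file standing in for a corpus file.
printf 'first line\nsecond line\nthird line\n' > "$workdir/sample.txt"

# '>' creates top.txt (or overwrites it) with the first two lines.
head -n 2 "$workdir/sample.txt" > "$workdir/top.txt"

# '>>' appends the last line to top.txt instead of overwriting it.
tail -n 1 "$workdir/sample.txt" >> "$workdir/top.txt"

# Display the stored result.
cat "$workdir/top.txt"
```

Running the first redirection again with ''>'' would discard whatever had been appended, which is the practical difference between the two operators.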
  
 ==== Understanding the structure of the commands ====
Line 67: Line 74:
  
 Commands: ''man''
 +
 +==== Exercises ====
 +
 +**EXERCISE 1**. Let us "measure" a file: bytes, megabytes, lines, words, etc.
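
One possible way to approach Exercise 1 is sketched below with ''wc''. The sample file is a hypothetical stand-in for a corpus file, and a megabyte is taken here as 1024 × 1024 bytes, which is an assumption (some tools use 10^6 instead).

```shell
# Hypothetical two-line sample file in a throwaway directory.
workdir=$(mktemp -d)
file="$workdir/sample.txt"
printf 'one two three\nfour five\n' > "$file"

# Reading from standard input makes wc print only the number, not the file name.
bytes=$(wc -c < "$file")
lines=$(wc -l < "$file")
words=$(wc -w < "$file")

echo "bytes: $bytes"
echo "lines: $lines"
echo "words: $words"

# Convert bytes to megabytes (assumed here to be 1024 * 1024 bytes each).
awk -v b="$bytes" 'BEGIN { printf "megabytes: %.6f\n", b / (1024 * 1024) }'
```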
 +
 +**EXERCISE 2**. Shuffle a parallel corpus in order to have sentences from different speeches. 
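
A sketch of one way to shuffle both sides of a parallel corpus together for Exercise 2, so that line N of one file still translates line N of the other. The file names are hypothetical, and the sketch assumes that no sentence contains a tab character. ''paste'' and ''cut'' are not in the command lists shown in this comparison.

```shell
# Hypothetical four-sentence parallel corpus in a throwaway directory.
workdir=$(mktemp -d)
printf 'en 1\nen 2\nen 3\nen 4\n' > "$workdir/en.txt"
printf 'it 1\nit 2\nit 3\nit 4\n' > "$workdir/it.txt"

# paste joins line N of both files with a tab, so each pair travels together;
# shuf then permutes the pairs at random.
paste "$workdir/en.txt" "$workdir/it.txt" | shuf > "$workdir/pairs.tsv"

# cut splits the shuffled pairs back into two aligned files.
cut -f 1 "$workdir/pairs.tsv" > "$workdir/en.shuf.txt"
cut -f 2 "$workdir/pairs.tsv" > "$workdir/it.shuf.txt"

# Show the shuffled pairs side by side.
paste "$workdir/en.shuf.txt" "$workdir/it.shuf.txt"
```

Shuffling each file separately with ''shuf'' would break the alignment, which is why the two sides are joined first.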
 +
 +**EXERCISE 3**. Find the most frequent tokens in the two parts of a parallel corpus and analyse them.
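
For Exercise 3, a frequency list can be sketched with a short pipeline. The sample text is made up, tokens are assumed to be separated by single spaces, and ''tr'', ''sort'' and ''uniq'' are not in the command lists shown in this comparison. The same pipeline would be run once per language.

```shell
# Hypothetical sample text in a throwaway directory.
workdir=$(mktemp -d)
printf 'the house is big\nthe garden is small\nthe end\n' > "$workdir/en.txt"

# 1. tr puts one token per line by turning spaces into newlines.
# 2. sort groups identical tokens so that uniq -c can count them.
# 3. sort -rn orders the counts from highest to lowest.
# 4. head keeps the two most frequent tokens.
tr -s ' ' '\n' < "$workdir/en.txt" \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 2 > "$workdir/en.top.txt"

cat "$workdir/en.top.txt"
```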
 +
 +**EXERCISE 4**. Get all words which are cognates wrt Italian from a tsv dictionary. Afterwards, count the number of tokens which belong to each family. 
 +
  • tutorials/b4b.1572963794.txt.gz
  • Last modified: 2019/11/05 15:23
  • by eros