
Bash for Beginners

This quick-and-dirty tutorial is an introduction to bash. The tutorial is given in English.

Location and time

The tutorial was held in Room 4 of the PhD Lab on Wednesday 6 November from 9.15 to 10.45.

To follow the tutorial, you will need a laptop. Depending on your operating system, you will need one of the following:

  • Windows. You have to download both KiTTY and kscp. You can find a zip file with both here.
  • Mac. Nothing extra. You are ready to go.
  • Linux. Nothing extra. You are ready to go.

We'll use a small subset of the English-Italian part of the Europarl parallel corpus.

Download the two files here: English and Italian

  • Quick and easy text and data processing
  • The right way to interact with real computing software
  • One gate to Python and deep learning

You will first learn how to set up a remote connection to the machine.

Afterwards, you will see what it means to “live” in a multi-user setting. You will learn how to list files and directories, as well as how to move around.

Commands: ssh, ls, pwd, cd, mkdir
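As a minimal sketch of moving around, the session below works inside a throwaway directory created with mktemp, so nothing in your home directory is touched. The directory names (corpus, raw) are made up for illustration, and the ssh line appears only as a comment because the user and host names are hypothetical.

```shell
# Connecting to the remote machine would look like this
# (user and host are hypothetical placeholders):
#   ssh user@host

# Create a scratch directory and move into it.
workdir=$(mktemp -d)
cd "$workdir"
pwd                   # print the directory you are in

mkdir -p corpus/raw   # create a directory and a subdirectory in one go
ls                    # list the contents: shows "corpus"

cd corpus/raw         # move down two levels
pwd
cd ../..              # move back up two levels
pwd
```

The -p option of mkdir creates the missing parent directory (corpus) along with raw.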

Files can simply be displayed (without modifying them) or opened for editing. You will learn to do both.

Commands: cat, more, less, most, wc, nano, head, shuf
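A small sketch of the non-interactive viewing commands, run on a three-line sample file created on the spot (the sentences are invented, not taken from the corpus). The pagers and the editor (more, less, nano) are interactive, so they appear only as comments.

```shell
# Build a tiny sample file to look at.
sample=$(mktemp)
printf 'resumption of the session\nplease rise for a minute\nthank you\n' > "$sample"

cat "$sample"           # dump the whole file to the terminal
head -n 2 "$sample"     # show only the first two lines
wc -l "$sample"         # count lines (prints the count and the file name)

# Keep the line count in a variable; tr strips the padding some systems add.
n_lines=$(wc -l < "$sample" | tr -d ' ')
echo "the file has $n_lines lines"

# Interactive tools, to try by hand:
#   less "$sample"      (quit with q)
#   nano "$sample"      (exit with Ctrl+X)
```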

Until now, the operations you have performed are quite basic and not too different from what you can do with standard tools. Now we start to do interesting stuff. In this section you will learn how to sort, filter, modify, and combine the information in a file.

Commands: sort, grep, sed, column
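The sketch below tries sort, grep, and sed on a four-line word list made up for the occasion. None of the commands changes the file; each one only prints its result to the terminal.

```shell
# A small word list, one word per line.
wordlist=$(mktemp)
printf 'parliament\ncommission\nparliament\ncouncil\n' > "$wordlist"

sort "$wordlist"                          # alphabetical order
sort -u "$wordlist"                       # sorted, duplicates removed
grep 'parliament' "$wordlist"             # keep only the matching lines
grep -c 'parliament' "$wordlist"          # count the matching lines
sed 's/parliament/Parliament/' "$wordlist"  # replace text on each line

n_matches=$(grep -c 'parliament' "$wordlist")
```

In the sed expression, s/old/new/ replaces the first occurrence of old on every line; adding g at the end (s/old/new/g) would replace all occurrences.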

All the operations carried out so far show their result in the terminal, but they neither alter the contents of the file nor store the result anywhere. Now we learn how to store the results.

Commands: >, >>
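A short sketch of the difference between the two operators, using a temporary file: > creates the file or overwrites it, whereas >> appends to the end.

```shell
out=$(mktemp)

echo 'first line' > "$out"      # > creates the file (or overwrites it)
echo 'second line' >> "$out"    # >> appends at the end
cat "$out"
after_append=$(wc -l < "$out" | tr -d ' ')

echo 'replaced' > "$out"        # > again: the previous contents are gone
cat "$out"
after_overwrite=$(wc -l < "$out" | tr -d ' ')
```

Be careful with >: it discards the previous contents of the file without asking.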

We have played with quite a few commands already. Let us understand how commands are usually structured.
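As a rough guide, most commands follow the pattern "command, options, arguments". The sketch below illustrates it with sort on an invented two-column file; the file contents are only an example.

```shell
# Two columns: a label and a number.
notes=$(mktemp)
printf 'b 2\na 10\nc 1\n' > "$notes"

sort "$notes"              # command + argument (the file)
sort -r "$notes"           # one option: -r reverses the order
sort -n -k 2 "$notes"      # -k takes a value: sort on field 2, -n numerically
sort -rn -k 2 "$notes"     # short options can be merged: -r -n is -rn

smallest=$(sort -n -k 2 "$notes" | head -n 1)
largest=$(sort -rn -k 2 "$notes" | head -n 1)
```

Without -n the second column would be compared as text, so 10 would come before 2.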

Let's start making things interesting: all these commands can be chained, each one passing its output on to the next, at no extra cost. These are the so-called one-liners.

Commands: awk, |
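A possible one-liner, as a sketch: awk extracts the second word of every line, and the pipe (|) hands the result to the next command. The input is invented, and uniq, which is not in the command list above, is assumed here to count repeated lines.

```shell
text=$(mktemp)
printf 'the house\nthe session\nthe house\n' > "$text"

# second word of each line -> sort -> count duplicates -> most frequent first
awk '{print $2}' "$text" | sort | uniq -c | sort -rn

# Same pipeline, keeping only the most frequent word in a variable.
top=$(awk '{print $2}' "$text" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most frequent second word: $top"
```

uniq only merges adjacent identical lines, which is why the sort before it is needed.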

Commands: man

EXERCISE 1. Let us “measure” a file: bytes, megabytes, lines, words, etc.
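One possible starting point, sketched on a tiny invented file rather than on the corpus: wc gives bytes, lines, and words, and awk can do the division for megabytes (taken here as 1024 × 1024 bytes, which is an assumption about the intended unit).

```shell
f=$(mktemp)
printf 'one two three\nfour five\n' > "$f"

n_bytes=$(wc -c < "$f" | tr -d ' ')
n_lines=$(wc -l < "$f" | tr -d ' ')
n_words=$(wc -w < "$f" | tr -d ' ')
echo "bytes=$n_bytes lines=$n_lines words=$n_words"

# awk handles the floating-point division that plain bash arithmetic cannot.
awk -v b="$n_bytes" 'BEGIN { printf "%.6f MB\n", b / (1024 * 1024) }'
```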

EXERCISE 2. Shuffle a parallel corpus in order to have sentences from different speeches.
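A hedged sketch of one way to approach this: glue the two sides together line by line, shuffle the glued lines, and split them again, so that every sentence stays next to its translation. It assumes that the sentences contain no tab characters and that shuf (GNU coreutils) is available; paste and cut are not in the command lists above, and the sample sentences are invented.

```shell
en=$(mktemp)
it=$(mktemp)
printf 'hello\nthank you\ngood morning\n' > "$en"
printf 'ciao\ngrazie\nbuongiorno\n' > "$it"

# paste joins line N of both files with a tab; shuf reorders the joined lines.
shuffled=$(mktemp)
paste "$en" "$it" | shuf > "$shuffled"

# cut splits the two sides apart again (tab is the default separator).
cut -f 1 "$shuffled" > "$en.shuf"
cut -f 2 "$shuffled" > "$it.shuf"

paste "$en.shuf" "$it.shuf"    # check: the pairs are still aligned
```

Shuffling the two files separately would not work, since each would end up in a different random order.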

EXERCISE 3. Find the most frequent tokens in the two parts of a parallel corpus and analyse them.
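A possible sketch, under the simplifying assumption that tokens are separated by single or repeated spaces: put one token per line with tr, then count and rank them. The helper name top_tokens and the sample sentences are invented for illustration; tr and uniq are not in the command lists above.

```shell
# top_tokens FILE N: print the N most frequent tokens of FILE with their counts.
top_tokens() {
    tr -s ' ' '\n' < "$1" | sort | uniq -c | sort -rn | head -n "$2"
}

en=$(mktemp)
it=$(mktemp)
printf 'the session is open\nthe vote is closed\nthe house\n' > "$en"
printf 'la casa\nla seduta\nil voto\n' > "$it"

top_tokens "$en" 3
top_tokens "$it" 3

top_en=$(top_tokens "$en" 1 | awk '{print $2}')
top_it=$(top_tokens "$it" 1 | awk '{print $2}')
```

On real text you may also want to lowercase the tokens and separate punctuation first, otherwise "The" and "the" are counted as different tokens.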

EXERCISE 4. Get all the words that are cognates with respect to Italian from a tsv dictionary. Afterwards, count the number of tokens that belong to each family.

  • tutorials/b4b.txt
  • Last modified: 2019/11/06 10:36
  • by albarron