
BootCaT front-end tutorial - Part 3

It's time to generate the queries that will be sent to the search engine (i.e. Google) using the tuples we generated earlier. The queries we generate here will be used in the next step to open a browser and save results.
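To make the idea more concrete, here is a minimal sketch in Python of how a tuple could be turned into a query and then into a search URL. This is not the front-end's actual code: the function names, the quoting of multi-word seeds, the example seeds and the Google URL pattern are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def tuple_to_query(seeds):
    """Join the seeds of one tuple into a single query string.

    Assumption: multi-word seeds are wrapped in double quotes so that
    the search engine treats them as a phrase.
    """
    parts = []
    for seed in seeds:
        seed = seed.strip()
        parts.append(f'"{seed}"' if " " in seed else seed)
    return " ".join(parts)


def query_to_url(query, base="https://www.google.com/search"):
    """Build a search URL for a query (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


# Hypothetical tuples for a corpus about dogs.
tuples = [
    ("dog", "leash", "puppy"),
    ("golden retriever", "kennel", "vet"),
]

queries = [tuple_to_query(t) for t in tuples]
urls = [query_to_url(q) for q in queries]

for query, url in zip(queries, urls):
    print(query, "->", url)
```

Each tuple yields exactly one query, so the number of queries equals the number of tuples generated earlier.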

A number of parameters can be specified here, but, for the purposes of this tutorial, we'll just accept the default values and click on “Next”.

In this step, each of the queries generated in the previous step is opened in a web browser. Each query consists of one of the tuples (combinations of our seeds) we generated earlier. The results point to texts that are relevant to the domain we are interested in; how narrow or broad that domain is depends on how specialized or general the seeds are.

Click on “Open in browser”. A message will appear explaining what's about to happen and naming the folder where you'll need to save the results page. You can also open that folder by clicking on “Open folder”.

Once you click on “OK”, your default Web browser will open and show the results of the query. The page will look something like this:

Now save the page using the “Save page” function of your browser (on Windows you can just press CTRL-S, on macOS CMD-S). A dialog box will appear asking where you want to save the page. Select the folder BootCaT Corpora → dogs → queries:

Once you're done saving the results of all queries, click on “Collect URLs” and you'll be taken to the next step:

:!: You can choose to click on “Open All in Browser” to send all queries to the browser with a single click, but this sometimes results in Google blocking the operation.

:!: In this step we only collect the URLs (i.e. the Internet addresses) of pages; the actual pages will be downloaded in a later step.
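For readers curious about what collecting URLs from a saved results page might involve, here is a rough Python sketch. It is not how the front-end itself works: the class and function names, the file pattern and the rule for skipping the search engine's own links are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_urls(html, skip_hosts=("google.",)):
    """Return absolute http(s) links, in order, without duplicates.

    Assumption: links whose host contains one of skip_hosts belong to
    the search engine itself and are not search results.
    """
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for link in parser.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # relative links, javascript:, mailto:, etc.
        host = parsed.netloc.lower()
        if any(skip in host for skip in skip_hosts):
            continue
        if link not in urls:
            urls.append(link)
    return urls


def collect_from_folder(folder):
    """Collect URLs from every saved .htm/.html page in a folder."""
    urls = []
    for page in sorted(Path(folder).glob("*.htm*")):
        html = page.read_text(encoding="utf-8", errors="replace")
        for url in collect_urls(html):
            if url not in urls:
                urls.append(url)
    return urls
```

Calling `collect_from_folder` on the queries folder where the results pages were saved would return one list of addresses. Nothing is downloaded at this point; the sketch only reads the files already on disk.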

  • Last modified: 2018/02/07 13:51
  • by eros