BootCaT front-end tutorial - Part 3

Before we can query the search engine, we need to provide BootCaT with a Bing AppId (see this page for more information on AppIds).

Once you have obtained your Bing AppId, paste it in the box and click “Next”.

If you want BootCaT to remember your AppId the next time you use it, leave the relevant box checked (this is not recommended if you're using a public or shared computer).

It's time to query the search engine (i.e. Bing) using the tuples we generated earlier. Each tuple (a combination of our seeds) from the previous step is submitted as a query, and the search engine looks for web pages that contain it. This identifies texts that are relevant to the domain we are interested in; how narrowly they match that domain depends on how specialized or general the seeds are.
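To make the idea of "querying with tuples" concrete, here is a minimal Python sketch of how a tuple of seeds could be turned into a query string. It is an illustration only, not BootCaT's actual code: the function name `tuple_to_query`, the example seeds, and the choice to wrap multi-word seeds in double quotes are all assumptions.

```python
def tuple_to_query(seed_tuple):
    """Join the seeds of one tuple into a single query string (hypothetical helper)."""
    parts = []
    for seed in seed_tuple:
        seed = seed.strip()
        # Assumption: multi-word seeds are quoted so they are searched as a phrase.
        parts.append(f'"{seed}"' if " " in seed else seed)
    return " ".join(parts)


# Made-up tuples for illustration; real ones come from the previous step.
tuples = [("dog", "leash", "puppy training"), ("kennel", "dog", "vet")]
queries = [tuple_to_query(t) for t in tuples]

for q in queries:
    print(q)
```

Each resulting string corresponds to one query sent to the search engine.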

The search engine will return only a limited number of pages for each query (i.e. tuple) we submit; the default value is 10 URLs per query and we won't change it.

Increasing the number of pages will result in a larger corpus, but its contents will tend to become less relevant.
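As a rough guide to corpus size, the number of URLs collected can never exceed the number of tuples multiplied by the number of URLs per query. The short Python sketch below shows this arithmetic; the function name `max_urls` and the example figures are assumptions for illustration, and the real total is usually lower because different queries often return the same pages.

```python
def max_urls(num_tuples, urls_per_query=10):
    """Upper bound on collected URLs: one result list per tuple (hypothetical helper)."""
    return num_tuples * urls_per_query


# Example figures only: 10 tuples with the default of 10 URLs per query,
# then the same 10 tuples with 50 URLs per query.
default_bound = max_urls(10)
larger_bound = max_urls(10, urls_per_query=50)
print(default_bound, larger_bound)
```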

Some advanced options are available at this step, but we won't discuss them here. For now, just click “Collect URLs” to start collecting URLs from the search engine.

This might take a while, depending on the number of tuples, Internet traffic and the speed of your connection. In the lower text area you can see the URLs as they are collected from the search engine.

In this step we only collect the URLs (i.e. the Internet addresses) of pages; the actual pages will be downloaded in a later step.
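The collection step can be pictured as the loop sketched below in Python. This is a simplified illustration under stated assumptions, not BootCaT's implementation: `collect_urls` is a hypothetical name, `fake_search` is a stand-in that returns made-up example.org addresses instead of calling a real search engine, and removing duplicate URLs is an assumption about sensible behaviour. Note that nothing is downloaded here; only addresses are gathered.

```python
def collect_urls(queries, search, urls_per_query=10):
    """Gather up to urls_per_query URLs for each query, skipping duplicates."""
    seen = set()
    collected = []
    for query in queries:
        # Keep only the first urls_per_query results for this query.
        for url in search(query)[:urls_per_query]:
            if url not in seen:  # assumption: duplicate URLs are kept only once
                seen.add(url)
                collected.append(url)
    return collected


def fake_search(query):
    """Stand-in for a search engine: returns 15 made-up URLs per query."""
    first_word = query.split()[0]
    return [f"http://example.org/{first_word}/{i}" for i in range(15)]


queries = ["dog leash", "dog vet", "kennel vet"]
urls = collect_urls(queries, fake_search)
print(len(urls))
```

Because the first two example queries return the same made-up pages, the final list is shorter than three times ten, which mirrors how overlapping results reduce the number of distinct URLs in practice.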