Differences
This shows you the differences between two versions of the page.
bootcat:tutorials:basic_3 [2012/05/30 15:22] – created eros | bootcat:tutorials:basic_3 [2018/02/07 14:51] – [Collecting URLs] eros | ||
---|---|---|---|
Line 1: | Line 1: | ||
[[bootcat: | [[bootcat: | ||
[[bootcat: | [[bootcat: | ||
+ | ---- | ||
====== BootCaT front-end tutorial - Part 3 ====== | ====== BootCaT front-end tutorial - Part 3 ====== | ||
- | ==== Bing AppId ==== | + | ==== Generating queries |
- | Before we can query the search engine, we need to provide BootCaT with a Bing AppId (see [[help: | + | It's time to generate the queries that will be sent to the search engine |
- | Once you have obtained your Bing AppId, paste it in the box and click " | + | A number of parameters can be specified here, but, for the purposes of this tutorial, we'll just accept |
- | {{: | + | {{bootcat: |
- | :!: If you want BootCaT to remember your AppId the next time you use it, leave the relevant box checked (this is not recommended if you're using a public or shared computer). | + | ==== Collecting URLs ==== |
- | ==== Collect URLs ==== | + | Here we open each of the queries generated in the previous step in a web browser. Each query consists of the tuples (combinations of our seeds) |
- | + | ||
- | It's time to query the search engine (i.e. Bing) using the tuples we generated earlier. | + | |
- | The search engine will return only a limited number of pages for each query (i.e. tuple) we submit; | + | {{bootcat: |
+ | |||
+ | Click on "Open in browser", | ||
+ | |||
+ | {{bootcat: | ||
+ | |||
+ | Once you click on " | ||
+ | |||
+ | {{bootcat: | ||
- | {{:tutorials: | + | Now save the page using your browser's "Save page" function (on Windows press CTRL-S, on macOS CMD-S). A dialog box will appear asking where you want to save the page; select the folder **BootCaT Corpora -> dogs -> queries**: |
- | :!: Increasing the number of pages will result in a larger corpus, but its contents will tend to become less relevant. | + | {{bootcat:tutorials:basic_steps: |
- | Some advanced options are available on this step, but we won't discuss them here, for now just click " | + | Once you're done saving the results of all queries, click on " |
- | This might take a while, depending on the number of tuples, Internet traffic and speed of your connection. In the lower text area you can see the URLs that are being collected from the search engine. | + | {{bootcat: |
- | {{:tutorials:basic_steps:009.png|}} | + | :!: You can choose to click on "Open All in Browser" |
:!: In this step we only collect the URLs (i.e. the Internet addresses) of pages; the actual pages will be downloaded in a later step. | :!: In this step we only collect the URLs (i.e. the Internet addresses) of pages; the actual pages will be downloaded in a later step. | ||
+ | ====== ====== | ||
+ | ---- | ||
[[bootcat: | [[bootcat: | ||
[[bootcat: | [[bootcat: |
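
The tutorial describes each query as a tuple, i.e. a combination of the seed terms. The following is only a minimal sketch of that idea, not BootCaT's actual implementation: the function name `build_queries`, the seed list, the tuple size of 3, and the number of queries are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def build_queries(seeds, tuple_size=3, n_queries=10, rng_seed=0):
    """Return up to n_queries query strings, each joining tuple_size distinct seeds.

    Hypothetical helper: parameter names and defaults are illustrative only.
    """
    # Enumerate every unordered combination of seeds of the requested size.
    all_tuples = list(combinations(seeds, tuple_size))
    # Pick a random subset without repetition; a fixed seed keeps runs reproducible.
    rng = random.Random(rng_seed)
    chosen = rng.sample(all_tuples, min(n_queries, len(all_tuples)))
    # Quote multi-word seeds so a search engine would treat them as phrases.
    return [" ".join(f'"{s}"' if " " in s else s for s in t) for t in chosen]


# Illustrative seeds for a corpus about dogs (not taken from the tutorial).
seeds = ["dog", "puppy", "leash", "kennel", "dog food"]
queries = build_queries(seeds, tuple_size=3, n_queries=5)
for q in queries:
    print(q)
```

Each printed line is one query string of the kind that would be opened in the browser during the "Collecting URLs" step. Larger tuples tend to give more specific queries, while more queries give more result pages to save.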