tutorials:basic_3

Differences

This shows you the differences between two versions of the page.

tutorials:basic_3 [2011/08/23 17:35] eros | tutorials:basic_3 [2012/05/30 15:22] (current) – removed eros
Line 1: Line 1:
-====== BootCaT front-end tutorial - Part 3 ====== 
  
-[[tutorials:basic_2|Back to part 2 of the tutorial]] 
- 
-==== Bing AppId ==== 
- 
-Before we can query the search engine, we need to provide BootCaT with a Bing AppId; see [[help:search_engine_key|this page]] for more information. 
- 
-Once you've obtained your Bing AppId, paste it in the box and click "Next". 
- 
-{{:tutorials:basic_steps:007.png|}} 
- 
-:!: If you want BootCaT to remember your AppId the next time you use it, leave the relevant box checked (this is not recommended if you're using a public or shared computer). 
- 
-==== Collect URLs ==== 
- 
-It's time to query the search engine (i.e. Bing) using the tuples we generated earlier. BootCaT searches the web through the search engine, looking for pages that contain the tuples (combinations of our seeds) generated in the previous step. This identifies texts that are relevant to the domain we are interested in; how specialized or general the resulting corpus is depends on how specialized or general the seeds are. 
-  
-The search engine will return only a limited number of pages for each query (i.e. tuple) we submit; the default value is 10 URLs per query and we won't change it. 
- 
-{{:tutorials:basic_steps:008.png|}} 
- 
-:!: Increasing the number of pages will result in a larger corpus, but its contents will tend to become less relevant. 
- 
-Some advanced options are available in this step, but we won't discuss them here. For now, just click "Collect URLs" to start collecting **URLs** from the search engine. 
- 
-This might take a while, depending on the number of tuples, Internet traffic and the speed of your connection. In the lower text area you can see the URLs as they are being collected from the search engine. 
- 
-{{:tutorials:basic_steps:009.png|}} 
- 
-:!: In this step we only collect the URLs (i.e. the Internet addresses) of pages; the actual pages will be downloaded in a later step. 
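
The URL collection step described above can be pictured with a small sketch. This is not BootCaT's own code: the function names ''collect_urls'' and ''fake_search'', the ''max_per_query'' parameter and the example.org addresses are all hypothetical, and the removal of duplicate URLs is an assumption made for illustration. The sketch only shows the idea of sending one query per tuple and keeping at most 10 URLs per query.

```python
from typing import Callable, Iterable, List, Sequence


def collect_urls(
    tuples: Iterable[Sequence[str]],
    search: Callable[[str], List[str]],
    max_per_query: int = 10,
) -> List[str]:
    """Send one query per tuple and gather the returned URLs.

    Only URLs are gathered here; no page is downloaded.
    Dropping duplicate URLs is an assumption of this sketch.
    """
    seen = set()
    urls = []
    for seeds in tuples:
        query = " ".join(seeds)  # a tuple is a combination of seeds
        # Keep at most max_per_query results for this query.
        for url in search(query)[:max_per_query]:
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a search engine; returns made-up URLs."""
    slug = query.replace(" ", "_")
    shared = ["http://example.org/shared"]  # returned for every query
    return shared + [f"http://example.org/{slug}/{i}" for i in range(15)]


if __name__ == "__main__":
    example_tuples = [
        ("corpus", "linguistics", "web"),
        ("seed", "tuple", "query"),
    ]
    for url in collect_urls(example_tuples, fake_search):
        print(url)
```

Raising ''max_per_query'' in this sketch yields more URLs per tuple, which mirrors the trade-off noted above: a larger corpus, but one whose contents tend to be less relevant.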
- 
-[[tutorials:basic_4|Tutorial part 4]] 
  • tutorials/basic_3.1314113718.txt.gz
  • Last modified: 2011/08/23 17:35
  • by eros