bootcat:tutorials:basic_4

Differences

This shows you the differences between two versions of the page.

bootcat:tutorials:basic_4 [2012/05/30 13:23] – created erosbootcat:tutorials:basic_4 [2018/02/07 14:40] (current) eros
Line 1: Line 1:
 [[bootcat:tutorials:basic_5|{{ :buttons:next.png|}}]] [[bootcat:tutorials:basic_5|{{ :buttons:next.png|}}]]
 [[bootcat:tutorials:basic_3|{{:buttons:previous.png|}}]] [[bootcat:tutorials:basic_3|{{:buttons:previous.png|}}]]
 +----
 ====== BootCaT front-end tutorial - Part 4 ====== ====== BootCaT front-end tutorial - Part 4 ======
  
-==== Edit the URL list ====+==== Editing the URL list ====
  
 In this step you can choose to remove URLs you think might not be interesting. Just for fun, try unchecking the box next to a couple of URLs: notice how the number of "Selected URLs" changes when you check/uncheck the boxes. You can also click on the URLs to visit the web page and decide whether you want to include the page in your corpus or not. In this step you can choose to remove URLs you think might not be interesting. Just for fun, try unchecking the box next to a couple of URLs: notice how the number of "Selected URLs" changes when you check/uncheck the boxes. You can also click on the URLs to visit the web page and decide whether you want to include the page in your corpus or not.
  
-{{:tutorials:basic_steps:010.png|}}+{{ bootcat:tutorials:basic_steps:010.png?nolink |}}
  
-:!: Notice how the number of "Total URLs" appears to be wrong: we generated 15 queries and instructed BootCaT to retrieve 10 URLs per query, so the total should be 150. What happened then? Simple: quite a few URLs were retrieved more than once (this is because the queries can be very similar to one another, as the tuples overlap to a large extent) and duplicates were automatically eliminated by BootCaT.+:!: Notice how the number of "Retrieved URLs" appears to be wrong: we generated 15 queries and instructed BootCaT to retrieve 10 URLs per query, so the total should be 150. What happened then? Simple: quite a few URLs were retrieved more than once (this is because the queries can be very similar to one another, as the tuples overlap to a large extent) and duplicates were automatically eliminated by BootCaT.
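
The gap between the requested and the reported number of URLs can be reproduced with a small sketch. This is only an illustration of the general idea of duplicate elimination, not BootCaT's actual code: the function name ''dedupe_urls'', the example URLs and the scaled-down numbers (3 queries with 4 URLs each instead of 15 with 10) are all assumptions made for the example.

```python
def dedupe_urls(results_per_query):
    """Flatten per-query URL lists, keeping only the first occurrence of each URL."""
    seen = set()
    unique = []
    for urls in results_per_query:
        for url in urls:
            if url not in seen:
                seen.add(url)
                unique.append(url)
    return unique


# Hypothetical search results: similar queries return overlapping URLs.
results = [
    ["http://example.org/a", "http://example.org/b",
     "http://example.org/c", "http://example.org/d"],
    ["http://example.org/b", "http://example.org/c",
     "http://example.org/e", "http://example.org/f"],
    ["http://example.org/a", "http://example.org/e",
     "http://example.org/g", "http://example.org/h"],
]

# Number of URLs asked for: queries multiplied by URLs per query.
requested = sum(len(urls) for urls in results)
unique_urls = dedupe_urls(results)

print("Requested:", requested)
print("After removing duplicates:", len(unique_urls))
```

In this toy data four of the twelve retrieved URLs are repeats, so the list shrinks in the same way the URL count in the screenshot falls short of 150.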
  
 Click "Next". Click "Next".
-==== Build corpus ====+ 
 +==== Building the corpus ====
  
 This is the final step.  This is the final step. 
Line 24: Line 25:
 The purpose of this stage is to get rid of elements which are part of the downloaded web pages, but that are very unlikely to be of interest to corpus users. However, since this process is automated, the cleaning process is far from perfect, so be aware that some unwanted elements will still be present in the corpus. The purpose of this stage is to get rid of elements which are part of the downloaded web pages, but that are very unlikely to be of interest to corpus users. However, since this process is automated, the cleaning process is far from perfect, so be aware that some unwanted elements will still be present in the corpus.
  
-{{:tutorials:basic_steps:011.png|}}+{{ bootcat:tutorials:basic_steps:011.png?nolink |}}
  
 Click on "Build corpus" to start the corpus creation process. This will take a while, depending on Internet traffic, connection speed and number of URLs to download. Click on "Build corpus" to start the corpus creation process. This will take a while, depending on Internet traffic, connection speed and number of URLs to download.
Line 30: Line 31:
 Go make a cup of tea while you wait. Go make a cup of tea while you wait.
  
-{{:tutorials:basic_steps:012.png|}}+{{ bootcat:tutorials:basic_steps:012.png?nolink |}}
  
 Once the download is complete, click "Open corpus folder". Once the download is complete, click "Open corpus folder".
  
-{{:tutorials:basic_steps:013.png|}}+{{ bootcat:tutorials:basic_steps:013.png?nolink |}}
  
 The contents of the folder where the corpus data is stored will be displayed. The contents of the folder where the corpus data is stored will be displayed.
  
-{{:tutorials:basic_steps:014.png|}}+{{ bootcat:tutorials:basic_steps:014.png?nolink |}}
  
 +====== ======
 +----
 [[bootcat:tutorials:basic_3|{{:buttons:previous.png|}}]] [[bootcat:tutorials:basic_3|{{:buttons:previous.png|}}]]
 [[bootcat:tutorials:basic_5|{{ :buttons:next.png|}}]] [[bootcat:tutorials:basic_5|{{ :buttons:next.png|}}]]
  • bootcat/tutorials/basic_4.1338384206.txt.gz
  • Last modified: 2012/05/30 13:23
  • by eros