tutorials:basic_4

Differences

This shows you the differences between two versions of the page.

tutorials:basic_4 [2010/04/13 16:31] eros
tutorials:basic_4 [2012/05/30 15:23] (current) – removed eros
Line 1: Line 1:
-====== BootCaT front-end tutorial - Part 4 ====== 
  
-[[tutorials:basic_3|Back to part 3 of the tutorial]] 
- 
-==== What now? ==== 
- 
-Congratulations, you have created your first web corpus! 
- 
-Now you can use your favourite corpus analysis tools to work with your corpus. Here's a [[http://sslmit.unibo.it/~eros/teaching_software.php#concordancers|list of programs]] you might find useful. 
- 
-If you want to manually inspect the corpus you just created, there are a number of text editors you can use. If you're on Mac or Linux, you already have everything you need. If you're on Windows, we strongly recommend the free [[http://notepad-plus.sourceforge.net|Notepad++]], since the default Windows Notepad will not display the corpus correctly. 
- 
-==== Not happy with your corpus? ==== 
- 
-If you're happy with the corpus that you have created, then go ahead and have fun using it! Otherwise, if the semi-automatically built corpus does not meet your requirements, repeat the procedure with a different set of seeds (e.g. more seeds to make the corpus more specific and focussed), and/or modify the parameters used to generate the tuples. 
- 
-==== You're gonna need a bigger corpus ==== 
- 
-Whether you believe in the old adage that "more data is better data" or you simply want to experiment some more, you might want to build a larger corpus. The easiest way to do this is to repeat the process with more seeds: more seeds let you generate more tuples/queries, which in turn yield more URLs and more documents. 
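To see why more seeds give you more tuples, here is a minimal Python sketch. It is not BootCaT's own code: the function `make_tuples`, its parameters, and the seed words are all hypothetical, and it simply assumes that a tuple is an unordered combination of distinct seeds.

```python
import itertools
import random


def make_tuples(seeds, tuple_size=3, n_tuples=10, rng_seed=0):
    """Draw up to n_tuples distinct, unordered combinations of seeds.

    Hypothetical helper for illustration only; the real tool may
    sample tuples differently.
    """
    # Every possible combination of tuple_size distinct seeds.
    all_combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    rng = random.Random(rng_seed)
    # We cannot return more tuples than there are combinations.
    k = min(n_tuples, len(all_combos))
    return rng.sample(all_combos, k)


# Made-up seed lists: the second one is twice as long as the first.
small = ["corpus", "linguistics", "concordance", "keyword"]
large = small + ["collocation", "lemma", "token", "annotation"]

print(len(make_tuples(small, n_tuples=100)))  # 4 seeds allow only 4 triples
print(len(make_tuples(large, n_tuples=100)))  # 8 seeds allow 56 triples
```

Doubling the seed list here raises the number of possible three-word tuples from 4 to 56, which is why adding seeds is the easiest way to get more queries.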
- 
-Use [[http://www.antlab.sci.waseda.ac.jp/software.html|AntConc]] or [[http://www.lexically.net/wordsmith/|WordSmith Tools]] (or whatever other tool you might have) to generate a list of keywords from your new corpus. Then you can use the most relevant keywords as seeds for a new web corpus. 
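If you would rather script the keyword step, the sketch below shows the general idea in Python. It is only an approximation under stated assumptions: it ranks words by a simple smoothed frequency ratio against a reference text, which is not the statistic AntConc or WordSmith Tools compute, and the names `tokenize` and `keywords` as well as the sample texts are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def keywords(target_text, reference_text, top_n=5):
    """Rank words of the target text by how over-represented they are.

    Hypothetical helper: uses an add-one smoothed ratio of relative
    frequencies, not a log-likelihood or chi-square keyness score.
    """
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    t_total = sum(target.values())
    r_total = sum(reference.values())

    def score(word):
        # Add-one smoothing keeps words absent from the reference finite.
        t_rel = (target[word] + 1) / (t_total + 1)
        r_rel = (reference[word] + 1) / (r_total + 1)
        return t_rel / r_rel

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(target, key=lambda w: (-score(w), w))
    return ranked[:top_n]


# Tiny made-up texts standing in for your corpus and a reference corpus.
my_corpus = "corpus corpus corpus the the tool"
reference = "the the the the a a tool tool"
print(keywords(my_corpus, reference, top_n=3))
```

The words at the top of the list are candidates for the next round of seeds; you would still pick the most relevant ones by hand.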
  • tutorials/basic_4.1271169117.txt.gz
  • Last modified: 2010/04/13 16:31
  • by eros