These scripts were used to prepare the data. They rely on two utilities:

* WGET (part of the GNU project)
* TIDY (http://tidy.sourceforge.net/)
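
For orientation, here is a minimal sketch of how the two utilities are
typically chained: WGET downloads a page and TIDY cleans up its markup.
It is not taken from the scripts themselves. The helper names, file names
and the choice of TIDY options (-q, -asxhtml, -o) are assumptions made for
illustration only.

```python
import shutil
import subprocess


def build_commands(url, raw_path, clean_path):
    """Return the wget and tidy command lines for one page.

    Hypothetical helper: the original scripts may use different options.
    """
    # -O writes the downloaded document to raw_path.
    fetch = ["wget", "-O", raw_path, url]
    # -q suppresses nonessential output, -asxhtml converts to XHTML,
    # -o names the output file; the input file comes last.
    clean = ["tidy", "-q", "-asxhtml", "-o", clean_path, raw_path]
    return fetch, clean


def fetch_and_clean(url, raw_path, clean_path):
    """Download one page with wget, then clean it with tidy."""
    for tool in ("wget", "tidy"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    fetch, clean = build_commands(url, raw_path, clean_path)
    subprocess.run(fetch, check=True)

    # tidy exits with status 1 when it only reports warnings, so only
    # status 2 and above (errors) is treated as a failure here.
    result = subprocess.run(clean)
    if result.returncode >= 2:
        raise RuntimeError(f"tidy reported errors for {raw_path}")
```

A call such as fetch_and_clean("http://example.com/page.html",
"page.raw.html", "page.xhtml") would then leave the cleaned copy in
page.xhtml; the URL and file names here are placeholders.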

Note that the scripts were run at the end of 2003. If the structure of
the source pages has changed since then, they will no longer work.
