WhyWaitForever

London England


WebGet - Retrieve web pages

This page describes a number of batch command files which retrieve web pages. For each example the command file, the retrieved page and the report can be viewed. The command files have been stored as text files to prevent accidental execution. The examples range from a simple single-page retrieval to fairly complex multiple retrievals.

Most of these examples set the following environment variables.

  • myurl - the URL of the page to be retrieved
  • mypag - the retrieved page
  • myrep - the report
  • mycnt - the submission count
  • mydly - the time delay between submissions
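
You may want to drive the same workflow from a script rather than a batch file. In that case the five variables can be gathered into a single settings object. The following is a minimal Python sketch and is not part of VisibleWeb. The class and function names and the default values are hypothetical. The sketch also assumes that mydly holds milliseconds, as the two-thousand-millisecond delay in WebGet-j11 suggests.

```python
import os
from dataclasses import dataclass


@dataclass
class WebGetSettings:
    """Hypothetical container for the five environment variables."""
    url: str        # myurl: the URL of the page to be retrieved
    page: str       # mypag: where the retrieved page is saved
    report: str     # myrep: where the report is written
    count: int      # mycnt: number of submissions
    delay_ms: int   # mydly: delay between submissions (assumed milliseconds)


def load_settings(env=None):
    """Read the settings from a mapping (os.environ by default).

    The defaults are illustrative assumptions, not VisibleWeb defaults.
    """
    env = os.environ if env is None else env
    return WebGetSettings(
        url=env.get("myurl", ""),
        page=env.get("mypag", "page.html"),
        report=env.get("myrep", "report.xml"),
        count=int(env.get("mycnt", "1")),
        delay_ms=int(env.get("mydly", "0")),
    )
```

Passing a plain dictionary instead of os.environ makes the function easy to try out, for example load_settings({"myurl": "http://www.example.com/", "mycnt": "15", "mydly": "2000"}).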

VisibleWeb is subject to a programme of continuous improvement, so some details of the files and reports may differ slightly from those listed below.


WebGet-j01 - single page retrieval

    This command file retrieves a single web page from an "open" web site. The majority of web sites are "open" and do not need a user identifier or password to be supplied before access is granted.

    webget-j01-bat.txt command file
    webget-01.html retrieved page
    webget-01.xml report formatted as XML
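
    The retrieve-save-report cycle of this example can be sketched in Python with the standard library. This is an illustration of the idea, not WebGet itself. The function name and the XML element names (report, url, page, bytes, seconds) are hypothetical and do not reproduce VisibleWeb's report layout.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def retrieve_page(url, page_path, report_path, timeout=30):
    """Fetch one page, save it to page_path and write a small XML report.

    The report's element names are hypothetical, not VisibleWeb's format.
    Returns the number of bytes retrieved.
    """
    started = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    elapsed = time.perf_counter() - started

    # Save the retrieved page exactly as received.
    with open(page_path, "wb") as page_file:
        page_file.write(body)

    # Record what was retrieved, and how long it took, as XML.
    report = ET.Element("report")
    ET.SubElement(report, "url").text = url
    ET.SubElement(report, "page").text = page_path
    ET.SubElement(report, "bytes").text = str(len(body))
    ET.SubElement(report, "seconds").text = f"{elapsed:.3f}"
    ET.ElementTree(report).write(report_path, encoding="utf-8", xml_declaration=True)
    return len(body)
```

    urlopen also accepts file:// URLs, so the function can be tried offline against a local file before you point it at a live site.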


WebGet-j02 - single secure page retrieval

    This command file retrieves a single web page which requires a user identifier ("myuser") and password ("mypass") to be supplied before access is granted.

    The user identifier is separated from the password by the colon character (":"). The user identifier and password combination must appear immediately after the leading "http://" and be terminated by the at character ("@"). The URL therefore begins "http://myuser:mypass@", followed by the host name and path. The password can be an empty string, for example "myuser:@".

    webget-j02-bat.txt command file
    webget-02.html retrieved page
    webget-02.xml report formatted as XML
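
    The URL convention above can be unpacked in Python as follows. This is a sketch only. It assumes the site uses HTTP Basic authentication, which is the usual meaning of credentials embedded in a URL. This page does not document how VisibleWeb itself handles them. Both function names are hypothetical.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Split 'http://user:pass@host/path' into (clean_url, user, password).

    An empty password ("myuser:@") is returned as "". Returns
    (url, None, None) when the URL carries no user identifier.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None, None
    host = parts.netloc.rpartition("@")[2]  # drop the "user:pass@" prefix
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, unquote(parts.username), unquote(parts.password or "")


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header value (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

    Basic authentication only base64-encodes the pair and does not encrypt it. Over a plain "http://" connection, the user identifier and password can therefore be read by anyone who can see the traffic.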


WebGet-j03 - single page retrieval with referer

    This command file retrieves a single web page and sets the referer for the retrieved page. Some web sites vary their processing according to the page the visitor appears to have come from, which they identify from the "referer" (the HTTP Referer header).

    webget-j03-bat.txt command file
    webget-03.html retrieved page
    webget-03.xml report formatted as XML
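
    In HTTP terms, setting the referer means sending a Referer header with the request. The Python sketch below shows this with the standard library. The function name and example URLs are hypothetical. The request would then be sent with urllib.request.urlopen(req).

```python
import urllib.request


def build_request(url, referer=None):
    """Return a urllib Request that optionally carries a Referer header.

    HTTP spells the header "Referer", a historical misspelling of "referrer".
    """
    headers = {}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


req = build_request(
    "http://www.example.com/next.html",
    referer="http://www.example.com/start.html",
)
```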


WebGet-j04 - single secure page retrieval with referer


WebGet-j05 - single secure page retrieval with referer and user agent

    This command file retrieves a single secure web page and sets both the referer and the user agent. Some web sites return different content depending on the user agent. The user agent is set as part of the licence.

    webget-j05-bat.txt command file
    webget-05.html retrieved page
    webget-05.xml report formatted as XML
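
    This example combines three settings in one request: the user identifier and password, the referer and the user agent. The Python sketch below rests on the same assumption of HTTP Basic authentication made for WebGet-j02. The function name and the placeholder user agent "ExampleAgent/1.0" are hypothetical. In VisibleWeb the user agent comes from the licence.

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def build_secure_request(url, referer, user_agent):
    """Build a Request for 'http://user:pass@host/...' with Referer and User-Agent.

    Assumes HTTP Basic authentication: the credentials are removed from the
    URL and sent in an Authorization header instead.
    """
    parts = urlsplit(url)
    headers = {"Referer": referer, "User-Agent": user_agent}
    if parts.username is not None:
        pair = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        token = base64.b64encode(pair.encode("utf-8")).decode("ascii")
        headers["Authorization"] = "Basic " + token
        host = parts.netloc.rpartition("@")[2]  # host without "user:pass@"
        url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return urllib.request.Request(url, headers=headers)


req = build_secure_request(
    "http://myuser:mypass@www.example.com/p.html",
    "http://www.example.com/",
    "ExampleAgent/1.0",  # placeholder user agent string
)
```

    urllib normalises header names with str.capitalize(), so the user agent has to be read back as req.get_header("User-agent").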


WebGet-j06 - single page retrieval with report as text

    This command file retrieves a single web page and produces a report formatted as text. It is strongly recommended that systems are implemented with reports formatted as XML.

    webget-j06-bat.txt command file
    webget-06.html retrieved page
    webget-06.txt report formatted as text


WebGet-j07 - single page retrieval with report as HTML

    This command file retrieves a single web page and produces a report formatted as HTML. It is strongly recommended that systems are implemented with reports formatted as XML.

    webget-j07-bat.txt command file
    webget-07.html retrieved page
    webget-07.htm report formatted as HTML


WebGet-j08 - single page retrieval with report as HTML and CSS

    This command file retrieves a single web page and produces a report formatted as HTML and CSS. It is strongly recommended that systems are implemented with reports formatted as XML.

    webget-j08-bat.txt command file
    webget-08.html retrieved page
    webget-08.htm report formatted as HTML and CSS
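
    WebGet-j06, j07 and j08 produce the same kind of report in text, HTML, and HTML with CSS. The Python sketch below renders one set of report fields in each of these formats and as XML. It suggests one plausible reason for preferring XML: another program can parse the report back into its fields reliably. The function names and field names are hypothetical, and field names must be valid XML element names (no spaces, for example).

```python
import html
import xml.etree.ElementTree as ET


def report_as_xml(fields):
    """Render a flat dict of report fields as an XML string."""
    root = ET.Element("report")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def report_as_text(fields):
    """Render the same fields as 'name: value' lines."""
    return "\n".join(f"{name}: {value}" for name, value in fields.items())


def report_as_html(fields, css=None):
    """Render the fields as an HTML table, optionally with an embedded stylesheet."""
    rows = "".join(
        f"<tr><th>{html.escape(str(n))}</th><td>{html.escape(str(v))}</td></tr>"
        for n, v in fields.items()
    )
    style = f"<style>{css}</style>" if css else ""
    return f"<html><head>{style}</head><body><table>{rows}</table></body></html>"


fields = {"url": "http://www.example.com/?a=1&b=2", "bytes": 15}
```

    The text and HTML versions are easy to read but awkward to process further. The XML version round-trips through ET.fromstring without loss.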


WebGet-j09 - single web form page retrieval for form data extraction


WebGet-j11 - multiple retrieval of single page

    This command file retrieves a single web page fifteen times with a delay of two seconds (2,000 milliseconds) between retrievals.

    webget-j11-bat.txt command file
    webget-11.html retrieved page
    webget-11.xml report formatted as XML
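
    A repeated retrieval with a fixed pause can be sketched in Python as below. The sketch assumes the delay falls only between retrievals, giving fourteen pauses for fifteen retrievals. This page does not say whether WebGet also waits after the last one. The function name is hypothetical. The fetch and sleep functions are passed in so that the loop can be tested without a network.

```python
import time


def retrieve_repeatedly(fetch, count, delay_ms, sleep=time.sleep):
    """Call fetch() `count` times, pausing delay_ms between consecutive calls.

    There is no pause before the first call or after the final call.
    Returns the list of values returned by fetch().
    """
    results = []
    for i in range(count):
        if i > 0:
            sleep(delay_ms / 1000.0)  # time.sleep takes seconds, not milliseconds
        results.append(fetch())
    return results
```

    With real retrievals, fetch would be a function that downloads the page, for example one wrapping urllib.request.urlopen(url).read().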


WebGet-j12 - multiple retrieval of single page


WebGet-j13 - multiple retrieval of single page


WebGet-j14 - multiple retrieval of single page


WebGet-j15 - multiple retrieval of single page

    This command file retrieves a single web page four times with a delay of one minute, twenty seconds and five hundred milliseconds (80,500 milliseconds) between retrievals.

    webget-j15-bat.txt command file
    webget-15.html retrieved page
    webget-15.xml report formatted as XML
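
    The arithmetic behind this delay is shown below, assuming (as the description says) that the delay falls only between retrievals. Four retrievals then leave three gaps. The helper name is hypothetical.

```python
def to_milliseconds(minutes=0, seconds=0, milliseconds=0):
    """Convert a delay given as minutes, seconds and milliseconds to milliseconds."""
    return (minutes * 60 + seconds) * 1000 + milliseconds


delay_ms = to_milliseconds(minutes=1, seconds=20, milliseconds=500)  # 80500
# Four retrievals leave three gaps between them.
total_wait_ms = delay_ms * (4 - 1)  # 241500, i.e. 4 minutes 1.5 seconds
```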


WebGet-j16 - multiple retrieval of single page

    This command file retrieves a single web page five times, with a random delay of between two and ten seconds between retrievals.

    webget-j16-bat.txt command file
    webget-16.html retrieved page
    webget-16.xml report formatted as XML
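
    Random delays in the stated range can be generated as in the Python sketch below. The sketch assumes a uniform distribution, which this page does not specify. It also assumes one delay per gap, giving four delays for five retrievals. The function name is hypothetical. A seeded random.Random can be passed in to make the sequence repeatable.

```python
import random


def random_delays_ms(retrievals, low_s=2, high_s=10, rng=None):
    """Return one random pause, in milliseconds, for each gap between retrievals.

    Each pause is drawn uniformly from [low_s, high_s] seconds (an assumed
    distribution) and rounded to whole milliseconds.
    """
    rng = random.Random() if rng is None else rng
    return [round(rng.uniform(low_s, high_s) * 1000) for _ in range(retrievals - 1)]
```

    Before each retrieval after the first, one of these values would be divided by 1000 and passed to time.sleep.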


WebGet-j21 - multiple retrieval of different pages


WebGet-j22 - multiple retrieval of single page


WebGet-j23 - multiple retrieval of different pages


Life's too short why wait forever
Copyright © 2000 - 2005. WhyWaitForever. All rights reserved.