VisibleWeb - Support - Tutorial
WebGet - Retrieve web pages
This page describes a number of batch command files which retrieve web pages. For each example the command file, the retrieved page and the report can be viewed. The command files have been stored as text files to prevent accidental execution. The examples range from a simple single retrieval to fairly complex multiple retrievals.
The examples mainly modify the following environment variables.
- myurl - the URL of the page to be retrieved
- mypag - the retrieved page
- myrep - the report
- mycnt - the submission count
- mydly - the time delay between submissions
VisibleWeb is subject to a programme of continuous improvement, which means the details of some of the files and reports may not be exactly the same as those listed below.
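As a rough illustration of how a script might pick these variables up, the Python sketch below reads them from the environment. The `read_settings` helper and its default values are assumptions made for this example; they are not part of VisibleWeb.

```python
import os

def read_settings(env=None):
    """Collect the five tutorial variables from an environment mapping."""
    env = os.environ if env is None else env
    return {
        "myurl": env.get("myurl", ""),        # URL of the page to be retrieved
        "mypag": env.get("mypag", ""),        # file that receives the page
        "myrep": env.get("myrep", ""),        # file that receives the report
        "mycnt": int(env.get("mycnt", "1")),  # submission count (default assumed)
        "mydly": env.get("mydly", "0"),       # kept as text: the delay format is not specified here
    }
```

Passing a plain dictionary instead of the real environment, for example `read_settings({"myurl": "http://www.example.com/", "mycnt": "3"})`, makes the helper easy to try out.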
WebGet-j01 - single page retrieval
This command file retrieves a single web page from an "open" web site. The majority of web sites are "open" and do not need a user identifier or password to be supplied before access is granted.
| webget-j01-bat.txt | command file |
| webget-01.html | retrieved page |
| webget-01.xml | report formatted as XML |
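For readers who want to see the idea of a single retrieval outside WebGet, here is a minimal Python sketch using the standard library. It is not the WebGet command file: the `retrieve_page` helper and the `opener` parameter are hypothetical names introduced for this example, and no report is produced.

```python
import urllib.request

def retrieve_page(url, page_path, opener=urllib.request.urlopen):
    """Fetch url, write the body to page_path and return the byte count."""
    with opener(url) as response:
        body = response.read()
    # Store the raw bytes so the saved page matches what the server sent.
    with open(page_path, "wb") as page_file:
        page_file.write(body)
    return len(body)
```

A call such as `retrieve_page("http://www.example.com/", "webget-01.html")` would save the page under the file name used in this example; the URL is a placeholder.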
WebGet-j02 - single secure page retrieval
This command file retrieves a single web page which requires a user identifier ("myuser") and password ("mypass") to be supplied before access is granted.
The user identifier is separated from the password by the colon character (":"). The user identifier and password combination must appear immediately after the leading "http://" and be terminated by the at character ("@"). The password can be an empty string, for example "myuser:@".
| webget-j02-bat.txt | command file |
| webget-02.html | retrieved page |
| webget-02.xml | report formatted as XML |
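The URL layout described above can be checked with Python's standard URL parser. This is only a sketch of the syntax; the host name `www.example.com` and the `split_credentials` helper are assumptions made for the example.

```python
from urllib.parse import urlsplit

def split_credentials(url):
    """Return (user, password, host) from a URL with embedded credentials."""
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname

# User identifier and password, separated by ":" and terminated by "@".
print(split_credentials("http://myuser:mypass@www.example.com/page.html"))
# An empty password: the colon is still present before the "@".
print(split_credentials("http://myuser:@www.example.com/page.html"))
```

In the second call the parser reports the password as an empty string, which matches the "myuser:@" form.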
WebGet-j03 - single page retrieval with referer
This command file retrieves a single web page and sets the referer for the retrieved page. Some web sites vary their processing according to the page previously retrieved, and use the "referer" to identify that page.
| webget-j03-bat.txt | command file |
| webget-03.html | retrieved page |
| webget-03.xml | report formatted as XML |
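At the HTTP level the referer is sent as a request header named "Referer". The Python sketch below builds such a request without sending it; the URLs and the `request_with_referer` helper are placeholders chosen for this example, not values taken from the command file.

```python
import urllib.request

def request_with_referer(url, referer):
    """Build (but do not send) a request that carries a Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})

req = request_with_referer("http://www.example.com/next.html",
                           "http://www.example.com/previous.html")
print(req.get_header("Referer"))
```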
WebGet-j04 - single secure page retrieval with referer
This command file retrieves a single secure web page and sets the referer for the retrieved page.
| webget-j04-bat.txt | command file |
| webget-04.html | retrieved page |
| webget-04.xml | report formatted as XML |
WebGet-j05 - single secure page retrieval with referer and user agent
This command file retrieves a single secure web page and sets the referer page and the user agent. Some web sites implement processing dependent on the user agent. The user agent is set as part of the licence.
| webget-j05-bat.txt | command file |
| webget-05.html | retrieved page |
| webget-05.xml | report formatted as XML |
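The user agent also travels as a request header, "User-Agent". The sketch below shows one way to set it alongside the referer in Python; the agent string "ExampleAgent/1.0" and the `request_with_agent` helper are invented for illustration and say nothing about the value set by the VisibleWeb licence.

```python
import urllib.request

def request_with_agent(url, referer, user_agent):
    """Build (but do not send) a request with Referer and User-Agent headers."""
    headers = {"Referer": referer, "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)

req = request_with_agent("http://www.example.com/page.html",
                         "http://www.example.com/index.html",
                         "ExampleAgent/1.0")
# urllib stores header names capitalised, so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```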
WebGet-j06 - single page retrieval with report as text
This command file retrieves a single web page and produces a report formatted as text. It is strongly recommended that systems are implemented with reports formatted as XML.
| webget-j06-bat.txt | command file |
| webget-06.html | retrieved page |
| webget-06.txt | report formatted as text |
WebGet-j07 - single page retrieval with report as HTML
This command file retrieves a single web page and produces a report formatted as HTML. It is strongly recommended that systems are implemented with reports formatted as XML.
| webget-j07-bat.txt | command file |
| webget-07.html | retrieved page |
| webget-07.htm | report formatted as HTML |
WebGet-j08 - single page retrieval with report as HTML and CSS
This command file retrieves a single web page and produces a report formatted as HTML and CSS. It is strongly recommended that systems are implemented with reports formatted as XML.
| webget-j08-bat.txt | command file |
| webget-08.html | retrieved page |
| webget-08.htm | report formatted as HTML and CSS |
WebGet-j09 - single web form page retrieval for form data extraction
This command file retrieves a single web page and extracts the form data in the format that can be edited and submitted through FormGet or FormPost.
| webget-j09-bat.txt | command file |
| webget-09.html | retrieved page |
| webget-09.txt | extracted form data |
| webget-09.xml | report formatted as XML |
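To give a feel for what form data extraction involves, the following Python sketch collects the named input fields of a page. The output format WebGet writes for FormGet or FormPost is not described here, so the list of name and value pairs returned below is an assumption, as are the `FormFieldParser` and `extract_fields` names.

```python
from html.parser import HTMLParser

class FormFieldParser(HTMLParser):
    """Collect (name, value) pairs from the <input> tags of a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Inputs without a name are never submitted, so skip them.
        if attributes.get("name"):
            self.fields.append((attributes["name"], attributes.get("value") or ""))

def extract_fields(html):
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields
```

For example, `extract_fields('<form><input name="q" value="abc"></form>')` yields one pair for the field `q`.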
WebGet-j11 - multiple retrieval of single page
This command file retrieves a single web page fifteen times with a delay of two seconds (two thousand milliseconds) between retrievals.
| webget-j11-bat.txt | command file |
| webget-11.html | retrieved page |
| webget-11.xml | report formatted as XML |
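The count and delay behaviour can be pictured as a simple loop. The Python sketch below assumes the delay falls only between retrievals, so fifteen retrievals involve fourteen pauses; `repeat_with_delay` and its `sleep` parameter are hypothetical names for this example.

```python
import time

def repeat_with_delay(action, count, delay_ms, sleep=time.sleep):
    """Call action count times, pausing delay_ms milliseconds between calls."""
    results = []
    for attempt in range(count):
        if attempt > 0:
            sleep(delay_ms / 1000.0)  # time.sleep expects seconds
        results.append(action())
    return results
```

With `count=15` and `delay_ms=2000` this mirrors the settings described for this example; `action` would be whatever performs one retrieval.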
WebGet-j12 - multiple retrieval of single page
This command file retrieves a single web page five times with a delay of four seconds between retrievals.
| webget-j12-bat.txt | command file |
| webget-12.html | retrieved page |
| webget-12.xml | report formatted as XML |
WebGet-j13 - multiple retrieval of single page
This command file retrieves a single web page three times with a delay of one minute between retrievals.
| webget-j13-bat.txt | command file |
| webget-13.html | retrieved page |
| webget-13.xml | report formatted as XML |
WebGet-j14 - multiple retrieval of single page
This command file retrieves a single web page two times with a delay of one hour between retrievals.
| webget-j14-bat.txt | command file |
| webget-14.html | retrieved page |
| webget-14.xml | report formatted as XML |
WebGet-j15 - multiple retrieval of single page
This command file retrieves a single web page four times with a delay of one minute, twenty seconds and five hundred milliseconds between retrievals.
| webget-j15-bat.txt | command file |
| webget-15.html | retrieved page |
| webget-15.xml | report formatted as XML |
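Mixed delays like this one are easier to compare once reduced to a single unit. The short Python sketch below converts hours, minutes, seconds and milliseconds into milliseconds; how the delay is actually written in the command file is not shown here, so `to_milliseconds` is purely an arithmetic aid.

```python
def to_milliseconds(hours=0, minutes=0, seconds=0, milliseconds=0):
    """Convert a mixed delay into a total number of milliseconds."""
    total_seconds = (hours * 60 + minutes) * 60 + seconds
    return total_seconds * 1000 + milliseconds

print(to_milliseconds(minutes=1, seconds=20, milliseconds=500))  # 80500
print(to_milliseconds(hours=1))                                  # 3600000
```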
WebGet-j16 - multiple retrieval of single page
This command file retrieves a single web page five times with a random delay of between two and ten seconds between retrievals.
| webget-j16-bat.txt | command file |
| webget-16.html | retrieved page |
| webget-16.xml | report formatted as XML |
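A random delay can be sketched in Python as follows. The uniform distribution, the whole-millisecond resolution and the `random_delays_ms` helper are assumptions for this example; the text above only states the two to ten second range.

```python
import random

def random_delays_ms(count, low_ms, high_ms, rng=random):
    """Return one delay per gap between retrievals, each within [low_ms, high_ms]."""
    gaps = max(count - 1, 0)  # no delay is needed after the last retrieval
    return [rng.randint(low_ms, high_ms) for _ in range(gaps)]

# Five retrievals with delays of between two and ten seconds.
print(random_delays_ms(5, 2000, 10000))
```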
WebGet-j21 - multiple retrieval of different pages
This command file retrieves three different web pages using a "referred to" list and stores each of them in the same page file.
| webget-j21-bat.txt | command file |
| webget-21u.txt | referred to URL file |
| webget-21.html | retrieved page |
| webget-21.xml | report formatted as XML |
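A "referred to" file can be thought of as a plain list of entries. The Python sketch below assumes one URL per line with blank lines ignored; the real layout of webget-21u.txt may differ, and `parse_list_file` is a name made up for this example.

```python
def parse_list_file(text):
    """Return the non-empty lines of a list file with whitespace trimmed."""
    return [line.strip() for line in text.splitlines() if line.strip()]

sample = "http://www.example.com/a.html\nhttp://www.example.com/b.html\n"
print(parse_list_file(sample))
```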
WebGet-j22 - multiple retrieval of single page
This command file retrieves a single web page three times and stores each copy in a different page file. If the web page contains information that changes over time, this approach can be used to obtain trend information.
| webget-j22-bat.txt | command file |
| webget-22p.txt | referred to page file |
| webget-22-1.html | retrieved page 1 |
| webget-22-2.html | retrieved page 2 |
| webget-22-3.html | retrieved page 3 |
| webget-22.xml | report formatted as XML |
WebGet-j23 - multiple retrieval of different pages
This command file retrieves three different web pages and stores each in a different page file. Typically this approach is used to track a share portfolio.
| webget-j23-bat.txt | command file |
| webget-23u.txt | referred to URL file |
| webget-23p.txt | referred to page file |
| webget-23-1.html | retrieved page 1 |
| webget-23-2.html | retrieved page 2 |
| webget-23-3.html | retrieved page 3 |
| webget-23.xml | report formatted as XML |
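The combinations of URL list and page list used in the multiple-page examples can be summarised as a pairing rule. The Python sketch below assumes that a single entry on one side is reused for every entry on the other side; this rule and the `pair_urls_with_pages` helper are an interpretation for illustration, not documented WebGet behaviour.

```python
def pair_urls_with_pages(urls, pages):
    """Pair each URL with the page file that will receive it."""
    urls, pages = list(urls), list(pages)
    # A single URL or a single page file is reused for every retrieval.
    if len(urls) == 1:
        urls = urls * len(pages)
    if len(pages) == 1:
        pages = pages * len(urls)
    if len(urls) != len(pages):
        raise ValueError("URL list and page list have different lengths")
    return list(zip(urls, pages))

# Three different URLs stored in three different page files.
print(pair_urls_with_pages(
    ["http://www.example.com/a", "http://www.example.com/b", "http://www.example.com/c"],
    ["webget-23-1.html", "webget-23-2.html", "webget-23-3.html"],
))
```

The URLs here are placeholders; the page file names are the ones listed for this example.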