WhyWaitForever

London England

Design Paper 7

A Guide To Web Forms

This paper gives businesses, web agencies and IT services an understanding of web form processing so that they can assess the risks associated with implementing a web form system.

Developing a simple web form processing system is trivial. Developing a robust and resilient web form system that can handle the worst that malicious automated programs can deliver is a very challenging software engineering task.

WhyWaitForever developed VisibleWeb to allow developers to independently test many of the critical capabilities of their web applications. It is very useful in demonstrating to business sponsors the possible pressures that their applications will need to cope with and to clarify the contingency processes that may need to be invoked.

  1. Web Forms
  2. Web forms provide the mechanisms for exchanging information between customers and web site owners.

    A web site owner can take the information supplied by a customer, process this information and undertake the work that is required.

    It is difficult for a web site owner to verify the status of a customer who has submitted a web form. Customers can be considered to fall into a number of distinct categories:

    • An actual customer interested in the service or product.
    • A naive or malicious customer.
    • A competitor.
    • A market analyst or researcher or information aggregator.

    A large proportion of web traffic on the Internet is generated by automated processes. This type of traffic includes that created by the vast variety of web robots.

    It is straightforward for competitors and market analysts to obtain or to develop a web form robot and to configure it to complete and submit web forms automatically and periodically to track any information available through the form. Such information can include prices, special deals or offers and terms and conditions. The retrieved information can be fed into a change detection analysis system and important changes can be flagged to the human analyst by standard alert processes such as instant messaging or email notification or text messaging.

    Web form robots which are created and used for legitimate business purposes can easily be executed by naive or malicious customers.

  3. Web Form Robots
  4. Web form robots simulate the actions of a human using a web browser to enter information on a web page that contains a form.

    Step 1 - Casing

    In order to configure a robot, the steps a human would take need to be analysed by manual inspection.

    Initial Investigation

    The form might be held in a web page in an unsecured area and so can be accessed using "http".

    The form might be held in a secure area such as that provided by "Secure Sockets Layer (SSL)" and so needs to be accessed using "https". This approach is meant to protect the information exchanged from eavesdropping by third parties. The encryption processes might have inbuilt trap doors allowing parties (usually the security services) to access the encrypted information. These access rights are usually tightly restricted and are governed by statute. SSL provides verification and authentication of the provider of the form. It adds nothing to what is known about the person or robot completing the form.

    The form might comprise a number of linked forms on different web pages. In less secure sites hidden fields are used to pass previously entered information to the next web page which contains the next form in the sequence. When all information has been received the final web page submits the form for processing by the web server.

    Security Registration

    The form might be held in an area where a user identifier and password need to be provided before access. Irrespective of whether the form page is protected or not, the program requested by the "action" statement is not necessarily protected.

    In "open" user groups the user identifier and password are usually set up through a registration form page. The password may be the user's choice or it may be automatically generated. The password may be displayed on a secure web page or it may be emailed to an email address provided by the user.

    In "closed" user groups there may be additional automated and manual steps in distributing a user identifier and password. These manual processes can include contacting the user by telephone or writing to the user at their registered address.

    If the processes are all automated then web robots can be configured to register themselves as a new user whenever the information that they require has been stopped by manual or automated intruder detection processes. A web robot could extract pertinent information from the returned "password" page or from the returned email, re-configure, re-register and reschedule itself.

    It is conceivable that a web robot could be configured to register all users who are listed in a quality information source such as the UK Electoral Register or the UK Census or the UK telephone directory.

    UK Computer Misuse Act

    A web form developer should make it absolutely clear in the text of the form who the acceptable users of the web form are. Where appropriate there should be explicit text stating that the form should only be used by humans and that any submission by an automated process is expressly forbidden.

    In the UK multiple submissions by web robots could be construed as an attack. Such attacks on particular organisations or types of companies can be handled under anti-terrorism legislation.

    Securing Form Data

    The most secure way of protecting form data is to encrypt it using an encoding process that can be validated as conforming to the mathematical encryption algorithm selected as pertinent to the level of risk. These algorithms are relatively straightforward mathematically and so can be directly implemented in a programming language of choice. If "trap doors" are not required they can be omitted. UK RIP (Regulation of Investigatory Powers) legislation in this area can be very constraining.
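
    As an illustration of encrypting form data before it is stored or forwarded, the following Java sketch uses the AES-GCM implementation in the standard javax.crypto library rather than a hand-written implementation of the algorithm. The class name, method names and the choice of AES-GCM with a 128-bit key are assumptions made for this example; key storage and distribution are not covered.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: encrypts and decrypts a string of form data with AES-GCM. */
final class FormDataCipher {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // length of the authentication tag

    /** Generates a fresh 128-bit AES key. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a random nonce followed by the ciphertext and authentication tag. */
    static byte[] encrypt(SecretKey key, String formData) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(formData.getBytes(StandardCharsets.UTF_8));
            byte[] message = new byte[IV_BYTES + sealed.length];
            System.arraycopy(iv, 0, message, 0, IV_BYTES);
            System.arraycopy(sealed, 0, message, IV_BYTES, sealed.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(); fails if the message has been altered. */
    static String decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    A new random nonce is generated for every message, so encrypting the same form data twice gives different output.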

    Web Page Sources

    It is necessary to obtain the source for all elements of the web page. Web browser software usually provides "view source" options. These options can be disabled. Web browser software can also copy sources to the "Temporary Internet Files" directory cache, from where they can be retrieved.

    Even without using a powerful tool such as VisibleWeb a standard utility such as WGET can be used to retrieve both the web page and the elements that make up the web page. Many versions of this utility can be configured to provide the HTTP header information to allow the retrieval to be passed off as a retrieval by something other than WGET. The standard web browser suppliers take a dim view of misrepresentation. In a testing situation where the web server can be designed to undertake different actions dependent on the particular browser this facility can be very useful.
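
    In a testing situation the same effect can be obtained from a program. The following Java sketch builds, but does not send, a request that carries a chosen "User-Agent" header, using the java.net.http classes available from Java 11. The class name, method name, address and browser string are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for testing how a web server reacts to different browser types. */
final class TestRequests {

    /** Builds a GET request for the given address that presents the given User-Agent. */
    static HttpRequest withUserAgent(String address, String userAgent) {
        return HttpRequest.newBuilder(URI.create(address))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }
}
```

    A test harness could send such requests to its own web server with java.net.http.HttpClient and compare the pages returned for each browser string.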

    Form Action Statement and Fields

    From the elements of the web page the action statement (which may be deferred into an ECMAScript procedure or function) as well as all the fields need to be identified.

    In most forms local ECMAScript validation can be bypassed if necessary. Of special interest are functions to replace "special" characters in input fields. These might be significant in back-office processing. A comma delimited file might disallow commas within particular fields. A maliciously configured robot could add extra commas or special characters which may cause problems in the back-office processing.
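
    One possible defence against the delimiter problem described above is for the form processing program to quote every field before writing it to the comma delimited file. The following Java sketch assumes that the back-office reader accepts fields wrapped in double quotes with embedded quotes doubled; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: writes form values as one record of a comma delimited file. */
final class CsvLine {

    /** Quotes one field so that commas, quotes and line breaks cannot split it. */
    static String escape(String value) {
        // A line break would start a new record in many readers, so flatten it to a space.
        String flat = value.replace("\r\n", " ").replace('\r', ' ').replace('\n', ' ');
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + flat.replace("\"", "\"\"") + "\"";
    }

    /** Joins untrusted form values into a single delimited record. */
    static String toRecord(List<String> values) {
        return values.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }
}
```

    Because the escaping is done on the web server it still applies when local ECMAScript validation has been bypassed.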

    Repetitive Submissions

    An analysis should be undertaken to see if forms can be resubmitted manually. Checks should be made on the minimum amount of data that needs to be changed to allow a re-submission.

    Step 2 - Ranging

    The valid range for each field needs to be identified and the robot configured accordingly.

    Step 3 - Disguise

    If there are security intruder detection mechanisms in place to prevent repeated submissions from the same source IP address then a range of user identifiers and IP addresses need to be used. MS IIS uses session variables which can be used to limit access to a particular page by a particular user. To counter this, periodically, a web robot might need to break a network connection and reconnect to be allocated a different IP address.

    Intruder detection processes should look for multiple submissions from the same IP address or repeat ranges of IP addresses.
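
    A minimal sketch of such a check, written in Java, counts the submissions received from each address within a sliding time window. The class name, method name, limit and window length are assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical detector: flags an address that submits too often within a time window. */
final class SubmissionMonitor {
    private final int maxSubmissions;
    private final long windowMillis;
    private final Map<String, Deque<Long>> recentByAddress = new HashMap<>();

    SubmissionMonitor(int maxSubmissions, long windowMillis) {
        this.maxSubmissions = maxSubmissions;
        this.windowMillis = windowMillis;
    }

    /** Records a submission and returns true if this address has exceeded the limit. */
    synchronized boolean recordAndCheck(String ipAddress, long nowMillis) {
        Deque<Long> times = recentByAddress.computeIfAbsent(ipAddress, k -> new ArrayDeque<>());
        // Discard submissions that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() > maxSubmissions;
    }
}
```

    To watch a range of addresses rather than a single address, the key passed in could be an address prefix instead of the full address.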

    Step 4 - Scheduling

    If there are security intruder detection mechanisms in place to prevent repeated submissions over time then the scheduling time intervals between submissions could be adjusted to avoid detection.

    A form submission where pre-requisite pages have not been visited could trigger investigation by the intruder detection processes. Cases where the time delay between a visit to a pre-requisite page and the form submission is too short might also merit investigation by an intruder detection process.
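
    The two checks described above could be sketched in Java as follows, assuming that the web server can supply a session identifier for each request. The class name, method names and verdict names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check that a form page was visited, and not too recently, before submission. */
final class PrerequisiteCheck {
    enum Verdict { OK, NO_PREREQUISITE, TOO_FAST }

    private final long minDelayMillis;
    private final Map<String, Long> formPageVisits = new HashMap<>();

    PrerequisiteCheck(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /** Called when the page containing the form is served to a session. */
    synchronized void recordFormPageVisit(String sessionId, long nowMillis) {
        formPageVisits.put(sessionId, nowMillis);
    }

    /** Called when a submission arrives; each recorded visit permits one submission. */
    synchronized Verdict checkSubmission(String sessionId, long nowMillis) {
        Long visited = formPageVisits.remove(sessionId);
        if (visited == null) {
            return Verdict.NO_PREREQUISITE;
        }
        if (nowMillis - visited < minDelayMillis) {
            return Verdict.TOO_FAST;
        }
        return Verdict.OK;
    }
}
```

    Any verdict other than OK would be passed to the intruder detection process for investigation rather than being rejected outright, since a returning customer with a cached page could also trigger it.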

    If the web form uses the GET action method the standard utility WGET can be used. If the web form uses the POST action method a program that can post form data, such as VisibleWeb, needs to be used; more recent versions of WGET also provide a "--post-data" option. The execution of WGET can be time scheduled using a scheduler such as CRONTAB for Unix or AT for Windows. The utilities SED and AWK can be used to amend and disguise the form data between executions.

    Step 5 - Retrieval and Analysis

    The returned page needs to be safely stored and processed to see if it contains error or warning messages. If there are no error messages present the returned data should be the information that would have been returned if the form had been submitted by a valid human customer.

    In many situations it is important to detect differences in the information retrieved over time. Tracking a share price is an obvious example. A combination of SED, AWK and DIFF can be used to detect differences.
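
    The same idea can be sketched in Java. The example below reports the lines of the latest retrieval that did not appear in the previous one; it is a simple set comparison, not the line-by-line algorithm used by DIFF, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical change detector for two retrievals of the same page. */
final class SnapshotChanges {

    /** Returns the non-blank lines of current that were not present in previous. */
    static List<String> newLines(List<String> previous, List<String> current) {
        Set<String> seen = new HashSet<>();
        for (String line : previous) {
            seen.add(line.trim());
        }
        List<String> changes = new ArrayList<>();
        for (String line : current) {
            String trimmed = line.trim();
            // Leading and trailing white space is ignored so that layout changes are not reported.
            if (!trimmed.isEmpty() && !seen.contains(trimmed)) {
                changes.add(trimmed);
            }
        }
        return changes;
    }
}
```

    A non-empty result would be the trigger for an alert to the human analyst.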

  5. Web Page Standards
  6. Creating web pages that conform to standards should be seen as a first defence. Conformance implies that implementation issues including security and intruder detection have been considered. A "clean" implementation suggests to a malicious user that the risks associated with an attack are higher than might be the case elsewhere. In consequence it might discourage some malicious users. An excellent basic set of tools for validation and conformance is as follows:

    For maximum benefit these tools should be used in "strict" form.

    In addition, utilities such as SED and AWK, which are available through the GNU open source movement, can be used to clean up source code further by removing comments. Comments can provide invaluable assistance to malicious users planning attacks.
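
    As an alternative to SED and AWK, comment removal could be built into a publishing step. The following Java sketch removes HTML comments with a regular expression; the class and method names are hypothetical, and the pattern is deliberately simple, so it would also remove comment markers that a page relies on, such as those sometimes wrapped around script content.

```java
import java.util.regex.Pattern;

/** Hypothetical publishing step: removes HTML comments from a page before release. */
final class CommentStripper {
    // DOTALL lets a single comment span several lines; ".*?" stops at the first "-->".
    private static final Pattern HTML_COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Returns the page source with every HTML comment removed. */
    static String stripHtmlComments(String html) {
        return HTML_COMMENT.matcher(html).replaceAll("");
    }
}
```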

    • GNU [ http://www.gnu.org/ ]

  7. Web Page Security Certificates, Devices and Biometrics
  8. More and more Internet users are aware of the issues that surround Internet security. With web forms it is essential that the form is protected by "Secure Sockets Layer (SSL)" processing so that information is protected from eavesdroppers. Naturally the security certificate should relate to the web site owner, who may be different from the web site host.

    There are "One Time Password" electronic devices that can be issued to an individual for their sole use and that provide authentication which cannot be refuted. These are expensive to purchase, issue and maintain.

    In time biometric security validation devices might become available. Pessimistically, at the point of transmission the thumbprint or eye scan or ear print used as a digital signature is actually a series of bytes. It seems obvious, but any series of bytes can be spoofed.

  9. Web Form Processing
  10. For increased security, most companies need their Internet web servers to be hosted outside the web site owner's trusted, firewall-protected network. Remember the battle cry "through the router, over the firewall and into Grandma's house we go".

    If the web site host is on the same local network as that of other trusted systems then all systems are exposed to very high security risks. Connections increase the risks which range from relatively minor denial of service attacks through to full penetration attacks perhaps using paging file rifling and buffer overflow techniques.

    The Internet usually allows "anonymous" multiple connections and does not require passwords. In contrast most trusted networks allow only user identifiers which are assigned to known individuals and passwords which allow only a few failed login attempts before the access rights of the user identifier are immediately revoked.
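
    The trusted network behaviour described above, where access is revoked after a few failed login attempts, could be sketched in Java as follows. The class name, method names and the limit passed to the constructor are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical guard: revokes a user identifier after too many consecutive failed logins. */
final class LoginGuard {
    private final int maxFailures;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Set<String> revoked = new HashSet<>();

    LoginGuard(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    synchronized boolean isRevoked(String userId) {
        return revoked.contains(userId);
    }

    /** Counts a failed attempt and revokes the identifier once the limit is reached. */
    synchronized void recordFailure(String userId) {
        int count = failures.merge(userId, 1, Integer::sum);
        if (count >= maxFailures) {
            revoked.add(userId);
        }
    }

    /** A successful login resets the count, but never restores a revoked identifier. */
    synchronized void recordSuccess(String userId) {
        if (!revoked.contains(userId)) {
            failures.remove(userId);
        }
    }
}
```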

    It is vital both to provide separation between network segments and to deny unconstrained users access to the trusted network, where time can be used to crack the most secure of defences.

    Local Validation

    This is the processing carried out by the web browser software on the local machine. It usually comprises the execution of ECMAScripts which validate the entered data prior to submission. In many web sites comments are left in the scripts.

    Where Java Jar files are used these can be unzipped and the Java classes decompiled to reveal the processing and validation.

    Web Server

    The program in the form action attribute can undertake validation, information storage and receipt acknowledgement.

    In a number of web sites standard programs are provided by the web site host which take all the values and place these in a file. Sometimes it is possible to specify whether the file should be comma delimited and if fields are mandatory.

    Where validation is minimal or non-existent it is trivial to generate form data which is erroneous or incomplete or to generate multiple duplicates of form data.
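
    A minimal sketch of web server validation, written in Java, is shown below. It checks that mandatory fields are present and that each value lies within its valid range. The field names "name" and "quantity", their limits and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical server-side validator for a form with "name" and "quantity" fields. */
final class FormValidator {

    /** Returns the names of the fields that failed; an empty list means all checks passed. */
    static List<String> validate(Map<String, String> form) {
        List<String> problems = new ArrayList<>();

        // Mandatory text field with a maximum length.
        String name = valueOf(form, "name");
        if (name.isEmpty() || name.length() > 100) {
            problems.add("name");
        }

        // Mandatory whole number of one to three digits, and at least 1.
        String quantity = valueOf(form, "quantity");
        if (!quantity.matches("\\d{1,3}") || Integer.parseInt(quantity) < 1) {
            problems.add("quantity");
        }
        return problems;
    }

    /** Treats a missing field as an empty value and ignores surrounding white space. */
    private static String valueOf(Map<String, String> form, String field) {
        String value = form.get(field);
        return value == null ? "" : value.trim();
    }
}
```

    These checks repeat what local validation may already have done, because a web robot never runs the local scripts.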

    Back-office Server

    The form data is retrieved from the main web server. The utility WGET can be used to retrieve the form data if it is accessible via HTTP or FTP. Many web servers allow access through FTP alone. In these cases the FTP utility programs should be used. After the file has been retrieved it can be replaced with an initialised file where all the form data previously held has been deleted.

    The form data needs to be validated and erroneous, incomplete and duplicate data needs to be removed. The utilities SED and AWK are useful for removing duplicates. It should be noted that duplicates might have different time stamps.
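
    Duplicate removal can also be sketched in Java. The example below assumes that each record is one line of a comma delimited file whose first field is the time stamp, so two records are treated as duplicates when everything after the first field matches. The record layout and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical back-office step: drops repeated submissions that differ only in time stamp. */
final class Deduplicator {

    /** Keeps the first occurrence of each record, comparing all fields except the first. */
    static List<String> removeDuplicates(List<String> records) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            int comma = record.indexOf(',');
            // Everything after the first comma is the data; the first field is the time stamp.
            String data = comma < 0 ? record : record.substring(comma + 1);
            // Set.add returns false when the normalised data has been seen before.
            if (seen.add(data.trim().toLowerCase(Locale.ROOT))) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```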

    Data which has been cleaned up can be updated through the web site owner's back-office transaction and DBMS applications.

  11. PostForm Utility Program
  12. A web robot that automatically and periodically posts form data could comprise the following components.

    1. A scheduler.
    2. A string editor to amend the form data between each submission and to disguise the data where necessary.
    3. A network program to connect across the Internet, post the data and receive the response.
    4. A string and file processing program to handle the response and take appropriate action.

    The following code extract from a Java program highlights how the java.net.* classes could be used for the network processing. Similar simple programs can be written using Perl or TCL.

    
    import java.io.*;
    import java.net.*;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class PostForm {

        /* Posts two form fields to the given address and returns the response text.
           Further fields can be appended to formData in the same way. */
        static String post(String address, String user, String pw,
                           String fieldName1, String valueName1,
                           String fieldName2, String valueName2) {

            StringBuilder responseData = new StringBuilder();

            try {

                /* form data */
                String formData =
                        fieldName1 + "=" + URLEncoder.encode(valueName1, "UTF-8") + "&" +
                        fieldName2 + "=" + URLEncoder.encode(valueName2, "UTF-8");

                /* URL connection; setDoOutput(true) makes an HTTP connection use POST */
                URL url = new URL(address);
                URLConnection urlConnection = url.openConnection();
                urlConnection.setDoInput(true);
                urlConnection.setDoOutput(true);
                /* HTTP Basic authentication: Base64 encoding of "user:password" */
                urlConnection.setRequestProperty(
                        "Authorization", "Basic " + Base64.getEncoder().encodeToString(
                                (user + ":" + pw).getBytes(StandardCharsets.UTF_8)));
                urlConnection.setUseCaches(false);
                urlConnection.setRequestProperty(
                        "Content-Type", "application/x-www-form-urlencoded");

                /* Post form data */
                try (OutputStream outputStream = urlConnection.getOutputStream()) {
                    outputStream.write(formData.getBytes(StandardCharsets.UTF_8)); }

                /* Receive response */
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                    String inputString;
                    while ((inputString = reader.readLine()) != null) {
                        responseData.append(inputString).append("\n"); }
                }

            }
            catch (MalformedURLException e1) {
                System.err.println("MalformedURLException: " + e1); }
            catch (IOException e2) {
                System.err.println("IOException: " + e2.getMessage()); }

            return responseData.toString();
        }
    }
    
    

    More sophisticated web robots can connect using "Dial Up Networking" on demand and connect through to a number of different ISPs. These ISPs are chosen to be the same as those used by a typical customer. In the UK these could include BT, Freeserve, AOL, Claranet and others used by residential customers. Tracking back to the originating company without access to police or security resources would be very difficult.

    If the web robot activity can be demonstrated as being illegal then it is possible for the authorities to trace through the ISP and telephone company logs to find the owner of the telephone line that was used.

    Only a very unintelligent person would launch a web form robot from an Internet gateway that could be easily traced from the records in the web server activity logs. Nevertheless it is not unusual to find log records listing the browser type as "Robot X" and the originating IP address as "gateway.mycompany.com".

  13. Contingency and Fallback
  14. Take the scenario that every day for the last few months, on average, one hundred and twenty customers have signed up to the services and products on offer through a web form for which no registration is required.

    On Friday the 13th, thirteen thousand one hundred and twenty customers appear to sign up. All the customers appear genuine as they have been signed up with data from the UK Electoral Roll. The originating IP addresses are scattered across the address ranges used by most UK residential ISPs. The confirmation signup pack (cost per pack £1.50) needs to be sent out by snail mail.

    There are two options:

    1. The signup packs are sent out.
      • Vast numbers of complaints are received.
      • The following Friday a further thirteen thousand appear to sign up.
    2. The signup packs are not sent out.
      • An apology / explanation web page is placed on the web site.
      • Registration via email is now required with telephone confirmation of password.
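
    The scenario above suggests an automated check that compares each day's signups with the recent average before any packs are posted. The following Java sketch is illustrative only; the class name, the method names and the factor of five used in the note below are assumptions rather than part of the scenario.

```java
/** Hypothetical contingency check run before signup packs are posted. */
final class SignupSpike {

    /** True if today's count exceeds the recent daily average by more than the given factor. */
    static boolean isSuspicious(int todaysCount, double recentDailyAverage, double factor) {
        return todaysCount > recentDailyAverage * factor;
    }

    /** Cost in pence of posting one pack for every signup. */
    static long postingCostPence(int signups, int pencePerPack) {
        return (long) signups * pencePerPack;
    }
}
```

    With the figures in the scenario, isSuspicious(13120, 120, 5) is true, and postingCostPence(13120, 150) gives 1,968,000 pence, or £19,680, for a single day of packs.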

  15. Summary
  16. The following is a list of measures that balance risk against usability and implementation cost.

    1. All elements should conform to the highest web implementation standards.
    2. The web site host should be remote from the trusted network.
    3. SSL should be used specifying the web site owner's certificate.
    4. Local validation should be used.
    5. Web server validation should be used.
    6. Back office validation should be used.
    7. Multiple automated submissions should be expected, detected and appropriate measures taken.

    It is possible to implement a robust and resilient web form processing application but full testing using a product such as VisibleWeb reduces the chances of surprises.


Life's too short why wait forever
Copyright © 2000 - 2005. WhyWaitForever. All rights reserved.