Author: Michael Elfial
There are many components designed to help developers manage uploads sent by clients (browsers, client-side components). Some of them are free, others are commercial, but they all share one weakness: they treat the upload as a file upload and concentrate on that. In fact the uploaded information can be anything, and certain ASP pages may want to do processing that differs from a simple file save. I want to show some server-side techniques that may be useful, or at least give you some ideas. But let us start with the usual upload.
If you are already familiar with some of the details you can skip the first topics and go directly to the second part of this article.
The term upload is used when the client sends the server more than just headers in the request. The information being sent may vary: it may contain files, text fields, dynamically generated values and so on. In most cases such an operation needs to pack different data types into one block and send it to the server. So a standard was born a few years ago (RFC 1867), and even if it looks a bit strange today it is the only standard supported by all browsers and other clients, so we need to obey it. I'll try to find time to write an article on some non-standard techniques in the future and show how the work can be done with less effort if you have some control over the clients (to install additional software, etc.). However, if we want to minimize the need for additional software we have to follow the standard. Now let me say a few words about it:
In general the data sent looks this way (an upload request sample):
POST /somescript.asp HTTP/1.0
<headers>
---UNIQUE-DELIMITER
<part-headers>
<part-data>
---UNIQUE-DELIMITER
<part-headers>
<part-data>
---UNIQUE-DELIMITER--
The general request headers are not the subject of this article. After them all the chunks of data are packed as parts and sent one by one. Each chunk can be something different, so a set of headers is needed to describe it: <part-headers>. Each browser sends different things, but there are some minimal headers you can expect to receive (Content-Disposition on every part, Content-Type at least on file parts):
Content-Type - defines the content type of the part and may include additional elements depending on the implementation (on the client side)
Content-Disposition - defines the "meaning" of the part. This includes the form field name, file name (if the part represents a file) and may include additional elements.
The unique delimiter looks strange today (when we have standards such as XML), but remember that the first "form based upload" implementations were created in 1995. The unique delimiter comes from the standards intended for mail attachments, and in fact a file upload is very similar to the body of a mail message with attachments. The delimiter must be selected carefully (by the browser or other client): it must be a sequence that is not found in any part of the uploaded data. We assume this is so, and we can determine what the delimiter is by inspecting the first "text line" of the uploaded data.
Here comes one of the most difficult problems when handling uploads: what is a "text line"? Usually it is defined the way text files define it on the client OS. However, different OSes use different line delimiters and we cannot rely on hard-coded values. Fortunately most OSes in use today use some combination (or a single one) of the carriage-return and line-feed characters (ASCII codes &H0D, &H0A). This gives us a chance: everything from the beginning of the uploaded data to the first occurrence of one of these characters is the unique delimiter, and then we just need to see what combination of these characters follows the delimiter up to the next non-line-end character. This gives us the line delimiter used in the uploaded data.
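The detection described above can be sketched in a few lines of VBScript. This is only an illustration and not the CUpload code: the function names are hypothetical, and it assumes the upload is already available as a string (in a real page the data arrives as binary and is handled by a stream object).

```vbscript
' Hypothetical helper: True if ch is a carriage return or a line feed.
Function IsLineEndChar(ch)
    IsLineEndChar = (ch = vbCr Or ch = vbLf)
End Function

' Hypothetical helper: inspects the start of the uploaded data.
' Returns a two-element array: (0) the unique delimiter,
' (1) the line break that follows it.
Function DetectDelimiter(data)
    Dim pos, endPos
    ' Everything up to the first line-end character is the delimiter
    pos = 1
    Do While pos <= Len(data)
        If IsLineEndChar(Mid(data, pos, 1)) Then Exit Do
        pos = pos + 1
    Loop
    ' The run of line-end characters that follows is the line break
    endPos = pos
    Do While endPos <= Len(data)
        If Not IsLineEndChar(Mid(data, endPos, 1)) Then Exit Do
        endPos = endPos + 1
    Loop
    DetectDelimiter = Array(Left(data, pos - 1), Mid(data, pos, endPos - pos))
End Function
```

The sketch relies on the first part always starting with a header line, so the run of line-end characters after the delimiter is exactly one line break.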
Why do we need all these details?
The <part-data> is always followed by a new-line, so if it is a file we will want to extract the real file and cut off the new-line the client posted after it. Therefore we need to know how the client represents the new-line. Also, the <part-data> is separated from the headers by two new-lines (if you prefer to think of it this way, by one empty header line).
In other words the uploaded data is sent in a manner which combines pure textual techniques with data parts which may contain anything - text or binary data. This encoding is named multipart/form-data.
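To make the layout of a single part concrete, here is a minimal sketch that separates the headers from the data and cuts the trailing new-line. SplitPart is a hypothetical name, the part is assumed to be a string holding everything between two delimiter lines, and eol is the line break detected earlier.

```vbscript
' Hypothetical helper: splits one part into its header block and its data.
' part - the text between two delimiter lines
' eol  - the line break used by the client (e.g. vbCrLf)
' Returns a two-element array: (0) header block, (1) data.
Function SplitPart(part, eol)
    Dim sepPos, body
    ' The first empty line (two line breaks in a row) ends the headers
    sepPos = InStr(1, part, eol & eol, vbBinaryCompare)
    If sepPos = 0 Then
        ' No empty line found: treat everything as headers
        SplitPart = Array(part, "")
        Exit Function
    End If
    body = Mid(part, sepPos + 2 * Len(eol))
    ' Cut the line break the client appends after the data
    If Right(body, Len(eol)) = eol Then
        body = Left(body, Len(body) - Len(eol))
    End If
    SplitPart = Array(Left(part, sepPos - 1), body)
End Function
```

Only the first empty line is treated as the separator, so empty lines inside the data itself stay untouched.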
The headers can be complex. In general they may contain something like this:
Header-name: value1; namedvalue=value; namedvalue2="value"; "value"
Also we can expect folded (continued) header lines (I have never seen this in an upload, but who knows):
Header-name: value1; namedvalue=value;
 namedvalue2="value"; "value"
In that case the continuation line begins with blank space.
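One simple way to deal with continued lines is to join them back before decoding the headers. The sketch below is an assumption about how this could be done, not the approach taken by CUpload; UnfoldHeaders is a hypothetical name.

```vbscript
' Hypothetical helper: joins continuation lines back onto their header.
' A line break followed by a space or a tab marks a continuation line.
Function UnfoldHeaders(headerBlock, eol)
    Dim s
    s = Replace(headerBlock, eol & " ", " ")
    s = Replace(s, eol & vbTab, " ")
    UnfoldHeaders = s
End Function
```

After this step every header occupies exactly one line, so the block can be split on the line break and each line decoded separately.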
So reading the headers may be quite a complex task that requires us to handle the quotes and the ; character correctly if we want to extract the correct values. For example the Opera browser sends a Content-Type header for each part, formed like this:
Content-Type: application/octet-stream; name="filename.ext"
If we neglect the fact that Content-Type may contain more than just the type name, we will read a very strange value from it. Some other browsers may not use the correct case for the header names, so checking the header name in a case-insensitive manner will protect us from mistakes.
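A header decoder that respects quotes and compares names case-insensitively might look like the sketch below. All function names are hypothetical and the code is not taken from CUpload; it only illustrates the rules just described and ignores escaped quotes inside quoted values.

```vbscript
' Splits s at every sep character that lies outside double quotes.
Function SplitOutsideQuotes(s, sep)
    Dim i, ch, inQuotes, current, count, result()
    count = 0
    inQuotes = False
    current = ""
    For i = 1 To Len(s)
        ch = Mid(s, i, 1)
        If ch = """" Then inQuotes = Not inQuotes
        If ch = sep And Not inQuotes Then
            ReDim Preserve result(count)
            result(count) = Trim(current)
            count = count + 1
            current = ""
        Else
            current = current & ch
        End If
    Next
    ReDim Preserve result(count)
    result(count) = Trim(current)
    SplitOutsideQuotes = result
End Function

' Removes one pair of surrounding double quotes, if present.
Function Unquote(s)
    If Len(s) >= 2 And Left(s, 1) = """" And Right(s, 1) = """" Then
        Unquote = Mid(s, 2, Len(s) - 2)
    Else
        Unquote = s
    End If
End Function

' Returns the header name in lower case, or "" if the line has no colon.
Function HeaderName(line)
    Dim colonPos
    colonPos = InStr(line, ":")
    If colonPos = 0 Then
        HeaderName = ""
    Else
        HeaderName = LCase(Trim(Left(line, colonPos - 1)))
    End If
End Function

' Returns the first (unnamed) element, e.g. the bare content type.
Function HeaderMainValue(line)
    Dim elements
    elements = SplitOutsideQuotes(Mid(line, InStr(line, ":") + 1), ";")
    HeaderMainValue = Unquote(elements(0))
End Function

' Returns a named element such as name or filename, or "" if absent.
Function HeaderParam(line, paramName)
    Dim elements, i, eqPos
    HeaderParam = ""
    elements = SplitOutsideQuotes(Mid(line, InStr(line, ":") + 1), ";")
    For i = 0 To UBound(elements)
        eqPos = InStr(elements(i), "=")
        ' An element that starts with a quote is an unnamed quoted value
        If eqPos > 0 And Left(elements(i), 1) <> """" Then
            If LCase(Trim(Left(elements(i), eqPos - 1))) = LCase(paramName) Then
                HeaderParam = Unquote(Trim(Mid(elements(i), eqPos + 1)))
                Exit Function
            End If
        End If
    Next
End Function
```

With the Opera header shown above, HeaderMainValue returns only application/octet-stream, and HeaderParam(line, "name") returns filename.ext without the quotes.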
Forms intended to send files or other data encoded as described above are coded in HTML like this:
<FORM METHOD="POST" ENCTYPE="multipart/form-data" ACTION="_URL_">
  <INPUT NAME="FileField" TYPE="FILE">
  <INPUT NAME="TextField" TYPE="TEXT" VALUE="something">
  <INPUT TYPE="SUBMIT" VALUE="Submit me">
</FORM>
The encoding type must be explicitly specified because the default encoding used for forms is URL encoding, which is neither efficient nor applicable for non-textual data. The TYPE="FILE" input fields are treated by the browser as an instruction to display UI that allows the user to select a file to upload. The form may contain other fields, more file fields and so on. Each field is encoded as a separate part in the upload block.
Most browsers will not ignore the file fields if the ENCTYPE is not specified. However instead of sending the file they will send just its file name. This behavior can be useful, but do not forget to check if the browsers your clients will use support this feature.
As in any form, a field name may appear more than once. This means a field named "File1" may appear several times, and on the server side the field name alone may not be enough! For those who are not familiar enough with ASP, let me recall the behavior of the Request object for normal (non-multipart) forms. The Request.Form collection contains a collection of fields for each name found in the form. So the following is always possible and is a good way to determine what the form contains:
Request.Form("some_field_name").Count
If the field name is used only once (as in most cases) this will return 1, but if the form has many fields with the same name the number returned will be their count. Therefore Request.Form("field_name") returns a collection and not a string! This collection behaves conveniently and returns a comma-separated list of the field values in IIS, or the first value in ALP, but we must be aware that it is a collection and not just a value. In VBScript this difference is not so important because of the Set/Let assignment distinction: Set obtains the collection, while Let (equivalent to a simple assignment without a keyword) fetches the string the collection returns as its default value. But ASP programmers who use JScript know this problem very well: in JScript, using Request.Form("name") directly may cause many problems. So remember that Request.Form("name")(index) is the way to access each field separately. We will want the same for the upload.
The CUpload class is implemented in the upl-inc.asp file. To use it you need to include the file in your pages. The object does its decoding work when it is created, and most of the custom work is then done over the collection the object builds during instantiation. So working with the object looks like this:
Set upl = New CUpload
' and the collection is in upl.Post
' It is very similar to the Request.Form
' But we need to pay a bit more attention to the
' fact that it contains sub-collections.
To get a text field value we will use syntax like this:
Dim v
v = upl.Post("Field_name")(1)
' Which means the value of the first field named "Field_name"
The uploaded forms may contain mixed fields with the same name. For example you may have 3 fields named "AField": the first can be a normal text field, the second a file field and the third a text field again. Combining the values in a comma-delimited string is impossible because the file can be anything (image, exe, etc.), and using a few more brackets will keep us from stupid mistakes that are not fatal in normal forms but may cause much more trouble in the case of an upload (and consume a lot of memory for nothing).
The CUpload object has mostly private members used internally. You need only the Post property (or its alias, the Value property) to access the prepared/decoded data. The Post collection is implemented using the VarDictionary object from the newObjects ActiveX Pack1 library. In other words CUpload decodes the upload and puts each part in a separate VarDictionary collection, together with some additional attributes that inform the application about the nature of the field. To make it clear let us use an example:
Suppose we submit this form:
<FORM METHOD="POST" ENCTYPE="multipart/form-data" ACTION="_URL_">
  <INPUT NAME="Field1" TYPE="FILE">
  <INPUT NAME="Field1" TYPE="TEXT" VALUE="something">
  <INPUT NAME="Field2" TYPE="TEXT" VALUE="something">
  <INPUT TYPE="SUBMIT" VALUE="Submit me">
</FORM>
The collection content will be as follows:
upl.Post("Field1").Count ' will return 2 - 2 elements with the name Field1
upl.Post("Field2").Count ' will return 1 - 1 element with the name Field2
upl.Post("Field1")(1) ' will refer to a collection that represents the first (file) field
upl.Post("Field1")(2) ' will refer to a collection that represents the second (text) field
upl.Post("Field2")(1) ' will refer to a collection that represents the text field Field2
The mentioned collections are configured to behave in a convenient way and return the value of the field. However they are collections and contain more data which is important:
Set F = upl.Post("Field1")(1)
We get the file field; let us see what it contains:
F("Value") - contains the binary data that represents the file
F("IsFile") - Boolean value that is true (this is a file - not a text field)
F("ContentLength") - the size of the binary data/file
F("ContentType") - the content type of the data
F("FileName") - the file name (without paths)
F("FileNameExtension") - the file name extension
In the case of a text field only these values will exist:
Set T = upl.Post("Field1")(2)
T("Value") - string - the value of the form field
T("IsFile") - false - not a file field
CUpload will also attach additional values to the file fields if there are more headers in the corresponding part of the uploaded data. Sometimes browsers send more headers: for example a plug-in may want to encode or zip the file, or specify some more data about it. In such cases the browser will attach more headers to indicate these details. They are added to the collection under names equal to the header name, with the header value stored as a string. If the application expects such behavior it may check for certain headers and process the values further, using external components for example.
CUpload also uses the SFStream object (from ActiveX Pack1) to handle the new-line problem (see part 1) and to search for the unique delimiters. If you take a look at the CUpload source code you will see that the code is quite simple and most of it is dedicated to the headers. Everything else is done by the SFStream object, so we need only the header decoding logic in the VBScript code.
Only the ASP code is listed here. See the uplfile.asp file for a functional example.
If Request.ServerVariables("REQUEST_METHOD") = "POST" Then
    ' The upload may occur only if the POST method is used.
    ' To be more precise we should also check
    ' the CONTENT_TYPE server variable.
    Set upl = New CUpload
    Set Post = upl.Post ' To keep the lines below short
    Set sf = Server.CreateObject("newObjects.utilctls.SFMain")
    ' Create the main object from the Storages and Files library.
    ' We will use it to create a file later.
    If Post("File").Count > 0 Then
        ' We have such a field
        Set F = Post("File")(1)
        If F("IsFile") Then
            ' Server.MapPath expects a virtual path, so a virtual
            ' folder (assumed here to be /Uploads/) is mapped to disk
            Set target = sf.CreateFile(Server.MapPath("/Uploads/" & F("FileName")))
            ' Create the file. We will save the content of the field in it
            target.WriteBin F("Value")
            ' Write all the data from the field there
            Set target = Nothing ' close the file
            ' That is all
        Else
            ' This is not a file - do something else or issue an error
        End If
    End If
End If
Well, the code above is full of if-then constructions, but in the normal case you can strip most of them - we just want to show what can be done with a field from the upload. In fact the file manipulation is outside CUpload, and we can do something else with a certain field instead of just saving it as a file - which is the topic of the next article.
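The comment in the listing mentions that a more precise check would also look at the CONTENT_TYPE server variable. A minimal sketch of such a check follows; IsMultipartUpload is a hypothetical helper and not part of CUpload.

```vbscript
' Hypothetical helper: True only for POST requests whose content type
' starts with multipart/form-data (a boundary parameter usually follows).
Function IsMultipartUpload(requestMethod, contentType)
    IsMultipartUpload = (UCase(requestMethod) = "POST") And _
        (LCase(Left(Trim(contentType), 19)) = "multipart/form-data")
End Function

' Possible use in an ASP page (the & "" forces plain strings):
' If IsMultipartUpload(Request.ServerVariables("REQUEST_METHOD") & "", _
'                      Request.ServerVariables("CONTENT_TYPE") & "") Then
'     Set upl = New CUpload
' End If
```

Comparing only the first 19 characters keeps the check independent of the delimiter value the client puts after the type name.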
upload1.zip - This article, the CUpload class in an ASP include file and a sample page to demonstrate it in action.
ActiveX pack1 - The sample code uses the storages and files components from it. If you have ALP 1.1 beta or later installed the pack is already on your machine.
Copyright newObjects and Michael Palazov - Elfial 2002