Gigabit File Uploads Over HTTP
Please see the Node.js version of this article, and the updated version that uses an NGINX reverse proxy and asynchronous code.
Large file uploads, how do I love thee? Let me count the ways. These days having to deal with large file uploads is rather commonplace in IT environments, and by large file uploads I mean files that are over 500 MB in size. Sure, we have replication technologies that can help us keep our data in sync, but there is still the need to move large amounts of data on a regular basis.
Most of the time that I had to move anything over 500 MB in size, I would typically split the file into several smaller files using a file compression utility and then upload those smaller files via FTP or Secure Shell (SSH). Once all of the smaller files had been uploaded, I would use the file compression utility to recreate the original file. However, that approach required the setup of an FTP server or a Secure Shell server and the use of a third-party file compression utility.
So I asked: "With the prevalence of web browsers and web servers in the IT environment, could I accomplish the same thing using the HTTP protocol?"
A quick search on the Internet showed that web servers will typically limit the maximum file size that can be uploaded to anywhere between 2 GB and 4 GB, and in addition to that most web browsers will only allow you to upload around 2 GB. I suppose the reason for that is that the Content-Length header is being treated as a signed 32-bit integer in the web browser, and the maximum value of a signed 32-bit integer is 2,147,483,647 (about 2 GB).
As I continued to search, I began to look at the HTML5 specification and the APIs that are a part of that specification, because the indication was that these new APIs would allow me to upload files greater than 2 GB in size over the HTTP protocol. I also came across code examples on the Internet that indicated what could be done, but no complete example of how it could be done.
A probable solution?
The HTML5 API specifications include several pieces of interesting technology that have been implemented in modern web browsers, including the following (a short sketch combining them appears after this list):
- The File API specification - This specification provides a standard way of interacting with local files on my computer. The specification allows me to place multiple files from my computer into an array and also allows me to slice a file into byte ranges.
- The Web Worker API specification - This specification allows me to spawn background scripts that run in parallel with the main page. Think threading in JavaScript. The benefit of this is that I can hand off a long-running task to a web worker without blocking the UI on my main page or other scripts.
- The XMLHttpRequest Level 2 specification - This specification provides new capabilities for transferring data between the web browser and the web server. The capability that I am most interested in is the ability to send data to the server in a variety of formats. This will allow me to upload data to the server using FormData objects.
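To make the combination concrete, here is a minimal sketch of the three APIs working together. The element id, worker file name, and upload url are illustrative placeholders, not the ones used later in CelerFT:

// In the page: slice a selected file (File API) and hand the chunk to a
// background script (Web Worker API)
document.getElementById('demo_file').addEventListener('change', function () {
    var file = this.files[0];                 // File API: the selected file
    var chunk = file.slice(0, 1024 * 1024);   // File API: first 1 MB byte range

    var worker = new Worker('demoworker.js'); // Web Worker API: background script
    worker.postMessage({ 'blob': chunk, 'filename': file.name });
});

// In demoworker.js: POST the chunk as multipart/form-data (XMLHttpRequest Level 2)
self.onmessage = function (e) {
    var formData = new FormData();
    formData.append('Slice', e.data.blob, e.data.filename);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload', true);
    xhr.send(formData);
};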
By combining these technologies I will be able to upload files greater than 2 GB to a web server by:
- Using the File API specification to select one or multiple files in the web browser.
- Passing the selected file or files to one or more web workers dedicated to processing the files.
- In the web worker that is dedicated to processing the file, using the File API specification to slice the file into chunks. This is similar to splitting a large file into smaller files with a file compression utility. We will use the SparkMD5 library to generate the MD5 checksum for the file, and then send each chunk back to the main page (the parent page from now on). Note that generating the MD5 checksum for the file and uploading the file should happen in parallel.
- In the parent page, sending the chunk to a web worker that is dedicated to the task of sending the file chunks to the web server.
- In the web worker that is dedicated to sending the file chunks to the web server, creating a pool of XMLHttpRequest Level 2 objects that use FormData objects to POST the file chunks to the web server as multipart/form-data. In the case where the web browser does not support the FormData object, we will manually create the multipart/form-data request and convert the file chunk to a base64 string.
- At the web server, receiving each chunk from the client and saving each chunk as an individual file. The web server will continue to do this until it receives a command to merge all of the individual chunks into a single file. Once the file is created on the web server, it will send a success message to the web browser that includes the MD5 checksum of the saved file.
Let's Get Started
Now that we have identified how we are going to go about implementing large file uploads over the HTTP protocol, let's get to work. Before we write any code, however, we need to download the jQuery 2.x library from http://code.jquery.com/jquery-2.1.1.min.js and the SparkMD5 library from https://github.com/satazor/SparkMD5/archive/master.zip. The jQuery 2.x library will be used to provide client-side JavaScript functionality, and the SparkMD5 library provides an MD5 algorithm, written in JavaScript, that will be used to generate an MD5 checksum for local files on the client side.
We will be using Visual Studio Express 2013 for Web as the development environment. We start off by creating a new project and selecting an ASP.NET Web Application.
Figure 1 Create New Project
In the next screen we select the template for an empty project and select the option to add folders and core references for a Web API project.
Figure 2 Add template for Web API Project
After our project has been created, go to the Solution Explorer, add a new folder, and name it Scripts. We will place the jQuery 2.x and SparkMD5 libraries in the Scripts folder.
Figure 3 Add required libraries
The Client Side
The next stage is to actually start creating the file upload application, and since this is a web-based application we need to have a client-side application and a server-side application. The client-side application will be a browser-based application that should run in the latest versions of all major web browsers.
We will add a new HTML file to our project and name it "Default.html". We will set "Default.html" as the startup page in our project. We are going to use the "Default.html" page to achieve a few things:
- The Default.html page will host the HTML form that provides the standard <input type="file"> element that will be used to select the files from the local file system.
- The Default.html page will provide status information on the file upload progress using the XMLHttpRequest object upload event.
- The Default.html page will display a table that shows the checksum of the file that is to be uploaded on the local file system, and the checksum of the file after it has been uploaded to the remote system.
- The Default.html page will display any error messages that occur either on the client side or on the server side.
Figure 4 Add Default.html page
The following figure shows the Default.html page being displayed in the Chrome web browser. Please note that I have used the <div> and <fieldset> tags to create the layout shown on the page. I don't believe that what I did with the <fieldset> tag is necessarily the standard way of using this tag, but it achieved the result that I wanted.
Figure 5 The CelerFT User Interface
The HTML code shows that all we have is a simple HTML form with a file input element; this element has a multiple attribute that allows us to select any number of files from the local file system to be uploaded.
<fieldset class="top">
    <ol style="list-style:none">
        <!-- NOTE: the input elements below are reconstructed; the ids for the
             directory and async inputs match the selectors used in the scripts,
             but the other ids and attributes may differ from the original -->
        <li class="form">
            Please enter the name for the destination directory:
            <input type="text" id="select_directory" />
        </li>
        <li class="form">
            Please select up to 5 files to upload:
            <input type="file" id="select_files" multiple />
        </li>
        <li class="form">
            Upload files greater than 500MB in size using asynchronous mode:
            <input type="checkbox" id="select_asyncstate" />
        </li>
        <li class="form">
            Select the bytes per chunk:
            <select id="select_bytesperchunk">
                <option>50MB</option><option>20MB</option><option>10MB</option>
                <option>5MB</option><option>2MB</option><option selected="selected">1MB</option>
                <option>500K</option><option>256K</option><option>128K</option><option>64K</option>
            </select>
        </li>
        <li class="form">
            <input type="button" id="upload_button" value="Upload File" style="float:right" />
            <input type="button" id="cancel_workers" value="Cancel Uploads" style="float:right;margin-left: 10px" />
        </li>
    </ol>
</fieldset>
As an added measure, we use jQuery's $(document).ready(function () {}); handler to check that the browser supports all of the HTML5 APIs that we need to use in the file upload tool.
<!-- Check browser compatibility -->
<script type="text/javascript">
    $(document).ready(function () {

        if (!(window.File && window.Blob && window.FormData)) {
            alert("Please upgrade your browser to one that supports the HTML5 file api.");
            return;
        }

        if (typeof Worker == "undefined") {
            alert("Please upgrade your browser to one that supports the HTML5 webworker api.");
            return;
        }
    });
</script>
A click event is tied to the "Upload File" button, and when it fires some basic checks are done to ensure that we have provided the mandatory input that the form requires. Once all of the validation checks are completed, two sets of web workers are created.
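The wiring itself is straightforward. Here is a minimal, hedged sketch of what the click handler might look like; the element ids and the validation checks shown are illustrative placeholders, not necessarily CelerFT's actual code:

// Hedged sketch of the "Upload File" click wiring. The element ids and the
// validation logic are placeholders; the worker-creation code shown in the
// next two listings runs inside this handler.
$('#upload_button').on('click', function () {

    var files = $('#select_files')[0].files;

    // Basic validation of the mandatory form input
    if ($('#select_directory').val() == '' || files.length == 0) {
        alert('Please enter a destination directory and select at least one file.');
        return;
    }

    var workers = [];        // file processing workers
    var uploadworkers = [];  // upload workers

    // ... create the upload workers and the file processing workers here ...
});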
The first set of web workers created are the ones that will be used to upload the file chunks to the web server. An upload worker is created by calling the Worker() constructor and providing the URI of the script that is to be executed in the worker thread; in this instance the URI is "Scripts/workeruploadchunk.js".

In addition, the upload worker's onmessage property is set so that the parent page can receive notifications from the upload worker. The parent page and the web worker communicate by passing messages. The types of messages that can be sent between the upload worker and the parent page are "checksum", "status", "progress", and "error" messages, and when the parent page receives any of these message types it updates the appropriate element on the parent page. This is necessary because the upload worker does not have access to the DOM of the parent page.
The onerror property of the upload worker is set to write all errors to the Error Messages area of the Default.html page. All of the upload workers are pushed into an array after they have been created.
// Create the upload workers
// (NOTE: the loop header and the element ids used in this handler are
// reconstructed from the surrounding text and may differ from the original)
for (var i = 0; i < files.length; i++) {

    var uploadworker = new Worker("Scripts/workeruploadchunk.js");

    uploadworker.onmessage = function (e) {

        var data = e.data;
        switch (data.type) {
            // Update the checksum value in the table
            case 'checksum':
                $("#remote_checksum" + data.id).text(data.message);
                break;
            // Update the legend of the progressbar
            case 'status':
                $("#file_name" + data.id).text(data.message);
                break;
            // Update the progressbar itself
            case 'progress':
                $("#progressbar" + data.id).progressbar("value", data.percentage);
                break;
            // Show error messages and stop the workers
            case 'error':
                $("#error_messages").append('<p>' + data.message + '</p>');
                for (var j = 0; j < workers.length; j++) {
                    workers[j].terminate();
                }
                for (var k = 0; k < uploadworkers.length; k++) {
                    uploadworkers[k].terminate();
                }
                break;
        }
    };

    // Write all web worker errors to the Error Messages area
    uploadworker.onerror = function (e) {
        $("#error_messages").append('<p>' + e.message + '</p>');
    };

    uploadworkers.push(uploadworker);
}
The creation of the web worker that will carve the file into chunks and create the local MD5 checksum of the file is very similar to the creation of the upload worker. The only difference is that the URI for this web worker is "Scripts/workerprocessfile.js", and that this web worker passes additional message types to the parent page: the "upload" and "merge" messages. When the parent page receives an upload message from the file processing worker it does not change any element in the DOM; instead it passes the data that is sent in the message to the upload worker. The upload worker will then send the file chunk to the web server. Similarly, when the file processing worker has created the last file chunk it will send a merge message to the parent page, and the parent page will forward this message to the upload worker.
This message passing system between the file processing worker and the upload worker is based on the fact that when a worker is created it is assigned an ID ranging from 0 to 4. This ID is used to associate a file processing worker with an upload worker, so the file processing worker with ID 0 is associated with the upload worker with ID 0. Along with this, each element of the page that is to be updated by a message received from a web worker has an id that ends in a number, so "progressbar0" refers to the progress bar that will be updated by the upload worker with ID 0.
// Create the file processing web workers
// (NOTE: the loop header and the element ids used in this handler are
// reconstructed and may differ from the original)
for (var j = 0; j < files.length; j++) {

    var file = files[j];

    // Update the table with the file name
    $("#file_name" + j).text(file.name);

    // Create the file processing web worker
    var worker = new Worker("Scripts/workerprocessfile.js");

    worker.onmessage = function (e) {

        var data = e.data;
        switch (data.type) {
            // Update the checksum value in the table
            case 'checksum':
                $("#local_checksum" + data.id).text(data.message);
                break;
            // Update the legend of the progressbar
            case 'status':
                $("#file_name" + data.id).text(data.message);
                break;
            // Update the progressbar itself
            case 'progress':
                $("#progressbar" + data.id).progressbar("value", data.percentage);
                break;
            // Send an upload message to the upload web worker
            case 'upload':
                uploadworkers[data.id].postMessage({
                    'chunk': data.blob, 'filename': data.filename,
                    'directory': $("#select_directory").val(),
                    'chunkCount': data.chunkCount,
                    'asyncstate': data.asyncstate, 'id': data.id
                });
                break;
            // Send a merge message to the upload web worker
            case 'merge':
                uploadworkers[data.id].postMessage({
                    'filename': data.filename,
                    'chunkCount': data.chunkCount, 'id': data.id
                });
                break;
            // Show error messages and stop the web workers
            case 'error':
                $("#error_messages").append('<p>' + data.message + '</p>');
                for (var k = 0; k < workers.length; k++) {
                    workers[k].terminate();
                }
                for (var m = 0; m < uploadworkers.length; m++) {
                    uploadworkers[m].terminate();
                }
                break;
        }
    };

    // Check if we are going to do an async upload of a large file
    var uploadlargfileasync = false;
    if ($('#select_asyncstate').prop('checked')) {
        uploadlargfileasync = true;
    }

    // Start the file upload by sending a postMessage to the file processing web worker
    worker.postMessage({
        'cmd': 'start', 'files': file,
        'directory': $("#select_directory").val(),
        'bytesperchunk': $("#select_bytesperchunk").val(),
        'uploadlargfileasync': uploadlargfileasync, 'id': j
    });

    workers.push(worker);
}
The parent page starts the file upload by sending a "postMessage" to the file processing web worker.
The Web Workers
In the "Scripts/workerprocessfile.js"
file the "self.onmessage" event is
triggered when the file processing web worker receives a message from the
parent page. In the "self.onmessge" function
the message payload sent by the parent page is accessible to the web worker in
the Event.data. There are several methods for passing data between the parent
page and the web worker and the one that will be used in the file upload tool
is to copy the data between the parent page and the web worker.
In the file processing web worker the file chunks are
created by using the "slice" method of the
File API specification. The "slice"
method is used to create a blob that containing the data in the specified range
of bytes from the source file. Once a file chunk is created the chunk is passed
back to the parent page via a "upload"
message.
In the file processing worker we keep track of the number of
file chunks that are created and uploaded by using a "chunkCount"
JSON array. The "chunkCount" array keeps track
of the total number of file chunks that will be created, the current file chunk,
the number of file chunks that have been uploaded, and the time at which we
started to process the file. When the number of uploaded file chunks is equal
to the total number of chunks the file processing web worker sends a "merge" message to the parent page.
// This worker is used to split the file into chunks,
// and calculate the file checksum.
// Each chunk is sent back to the parent page to be
// uploaded by another worker

// Import additional scripts
// MD5 checksum library https://github.com/satazor/SparkMD5
importScripts('/Scripts/spark-md5.js');

// Global variables
// Note IE 10 does not recognize the const declaration so we have to use var
var LARGE_FILE = 500 * 1024 * 1024;
var workerdata = '';
var asyncstate = true;

// 1MB chunk size. The default
var BYTES_PER_CHUNK = 1 * 1024 * 1024;

// This function is used to read the file, calculate the checksum,
// and send the file chunk to the web worker that uploads the file chunk
function processFile(blob) {

    // Size of the file
    var SIZE = blob.size;

    // The total number of file chunks
    var Total_Number_of_Chunks = Math.ceil(blob.size / BYTES_PER_CHUNK);

    // Object used to hold the total number of chunks, the number of chunks
    // that have been uploaded, and the current chunk.
    // This information is sent to the web worker that uploads the file chunks
    var chunkCount = {
        currentNumber: 1,
        numberOfChunks: Total_Number_of_Chunks,
        numberOfUploadedChunks: 0,
        starttime: new Date()
    };

    var start = 0;
    var end = BYTES_PER_CHUNK;

    var fileReader = new FileReaderSync();
    var spark = new SparkMD5.ArrayBuffer();

    while (start < SIZE) {

        var chunk = blob.slice(start, end);

        // Read the chunk into another variable to calculate the checksum
        var chunk1 = fileReader.readAsArrayBuffer(chunk);
        spark.append(chunk1);

        // Send the chunk back to the parent page
        self.postMessage({
            'type': 'upload', 'filename': blob.name, 'blob': chunk,
            'chunkCount': chunkCount, 'asyncstate': asyncstate, 'id': workerdata.id
        });

        chunkCount.currentNumber++;
        chunkCount.numberOfUploadedChunks++;

        start = end;
        end = start + BYTES_PER_CHUNK;

        if (chunkCount.numberOfUploadedChunks == chunkCount.numberOfChunks) {

            // All done, calculate the checksum
            var md5hash = spark.end();
            self.postMessage({ 'type': 'checksum', 'message': md5hash.toUpperCase(), 'id': workerdata.id });

            // Merge the file on the remote server
            self.postMessage({ 'type': 'merge', 'filename': blob.name, 'chunkCount': chunkCount, 'id': workerdata.id });
        }
    }
}

// This is where we start.
// The parent sends us the file as a part of the data
self.onmessage = function (e) {

    workerdata = e.data;

    // If we have an id greater than 4 then we abort. We upload five files
    // at a time, with ids 0 - 4.
    if (workerdata.id > 4) {
        self.postMessage({ 'type': 'error', 'message': "We can only upload five files at a time.", 'id': workerdata.id });
        return;
    }

    // If we have a large file we will use a synchronous upload by default.
    // A large file is greater than 500MB
    if (workerdata.files.size > LARGE_FILE && workerdata.uploadlargfileasync == false) {
        asyncstate = false;
    }

    // Configure the bytes per chunk.
    // The default is 1MB
    switch (workerdata.bytesperchunk) {
        case '50MB':
            BYTES_PER_CHUNK = 50 * 1024 * 1024;
            break;
        case '20MB':
            BYTES_PER_CHUNK = 20 * 1024 * 1024;
            break;
        case '10MB':
            BYTES_PER_CHUNK = 10 * 1024 * 1024;
            break;
        case '5MB':
            BYTES_PER_CHUNK = 5 * 1024 * 1024;
            break;
        case '2MB':
            BYTES_PER_CHUNK = 2 * 1024 * 1024;
            break;
        case '1MB':
            BYTES_PER_CHUNK = 1 * 1024 * 1024;
            break;
        case '500K':
            BYTES_PER_CHUNK = 500 * 1024;
            break;
        case '256K':
            BYTES_PER_CHUNK = 256 * 1024;
            break;
        case '128K':
            BYTES_PER_CHUNK = 128 * 1024;
            break;
        case '64K':
            BYTES_PER_CHUNK = 64 * 1024;
            break;
        default:
            BYTES_PER_CHUNK = 1024 * 1024;
    }

    // Process the file for uploading
    // Send a status message to the parent page
    self.postMessage({ 'type': 'status', 'message': "Uploading file " + workerdata.files.name, 'id': workerdata.id });

    // Start processing the file
    processFile(workerdata.files);
}
The upload worker is started when an upload message from the file processing worker is received in the Default.html page:
case 'upload':
    uploadworkers[data.id].postMessage({
        'chunk': data.blob, 'filename': data.filename,
        'directory': $("#select_directory").val(),
        'chunkCount': data.chunkCount,
        'asyncstate': data.asyncstate,
        'id': data.id
    });
    break;
The upload worker receives this message in the "self.onmessage" event. On receipt of the message the upload worker creates six XMLHttpRequest objects in the XMLHttpRequest worker pool. It then posts a file chunk to each of the UploadChunk urls on the web server. The UploadChunk urls are REST-based urls.

The upload function is the heart of the upload worker script. In this function we get an XMLHttpRequest object from the worker pool and assign handlers to the "upload.onprogress" and "onreadystatechange" events of the XMLHttpRequest object. The "upload.onprogress" event allows us to send file upload progress messages back to the parent page; this is how the progress bar is updated.

The "onreadystatechange" event is fired during communication with the web server. The state that we are most interested in is state 4, which is fired when the request has finished and the response is ready. This allows us to get the HTTP status code that is returned from the web server.
The file chunk is uploaded to the web server using multipart/form-data encoding, and this is supported by the FormData object. However, if the browser does not support the FormData object we use the "buildFormData" function to manually create the multipart/form-data encoded data by base64 encoding the file chunk. This is done by creating a FileReader object and converting the file chunk into a base64 encoded string using the "readAsDataURL" method of the FileReader object. The base64 string is extracted from the "data:URL" using a regular expression. It should be noted that some web browsers do not support the asynchronous version of the FileReader object in a web worker, and as such we have to use the synchronous version of the FileReader object, which is created by the FileReaderSync constructor.
// This worker is used to upload the file chunks to the backend server.
// The file chunks are sent to the worker by the parent page

// Urls for the WebAPI functions
var webapiUrl = "/api/CelerFTFileUpload/UploadChunk";
var webapiGetMergeAllUrl = "/api/CelerFTFileUpload/MergeAll";

// Global variables
// Note IE 10 does not recognize the const declaration so we have to use var
var LARGE_FILE = 500 * 1024 * 1024;
var workerdata = '';
var blocks = [];
var xhrworkerspool = '';
var totalnumberofchunks = 0;
var uploadedchunk = 0;
var urlcount = 0;
// Function used to create the XMLHttpRequest worker pool
function XHRWorkerPool(numberofxhr) {

    this.xhrworkerpool = [];
    for (var i = 0; i < numberofxhr; i++) {
        var xhr = new XMLHttpRequest();
        this.xhrworkerpool.push(xhr);
    }
}

XHRWorkerPool.prototype.getWorker = function () {

    var xhr;
    if (this.xhrworkerpool.length > 0) {
        xhr = this.xhrworkerpool.pop();
    }
    else {
        xhr = new XMLHttpRequest();
    }

    return xhr;
}

XHRWorkerPool.prototype.releaseWorker = function (xhr) {
    this.xhrworkerpool.push(xhr);
}

XHRWorkerPool.prototype.terminateWorkers = function () {
    for (var i = 0; i < this.xhrworkerpool.length; i++) {
        this.xhrworkerpool[i].abort();
    }
}
// Function used to create the multipart/form-data in browsers
// that don't support FormData
function buildFormData(chunk) {

    // Transform the data into a base64 string
    var reader = new FileReaderSync();
    var dataUrl = reader.readAsDataURL(chunk);
    var chunkdata = dataUrl.match(/,(.*)$/)[1];

    // Create the form request
    // Hard code the boundary
    var boundary = '----12345678wertysdfg';

    // NOTE: the exact part headers below are reconstructed; the part is
    // named "Slice" to match what the server side reads
    // We start a new part in our body's request
    var data = '--' + boundary + '\r\n';
    // We provide the mime type of the file. In this case it is text for a base64 encoded file
    data += 'Content-Disposition: form-data; name="Slice"; filename="blob"\r\n';
    data += 'Content-Type: text/plain\r\n';
    // There is always a blank line between the meta-data and the data
    data += '\r\n';
    // We append the binary data to our body's request
    data += chunkdata + '\r\n';
    // Once we are done, we "close" the body's request
    data += '--' + boundary + '--\r\n';

    reader = null;
    return data;
}
// Function used to send the request to the server to merge the file chunks
// into one file
function mergeall(filename, chunkCount) {

    var xhr = new XMLHttpRequest();

    xhr.onreadystatechange = function (e) {

        if (this.readyState == 4 && this.status == 200) {

            // Update the UI with the information that we have finished the
            // file upload, indicate the time taken, and update the UI with
            // the remote file checksum
            if (chunkCount.numberOfUploadedChunks == chunkCount.numberOfChunks) {

                var endtime = new Date();
                var timetaken = (((endtime.getTime() - chunkCount.starttime.getTime()) / 1000) / 60);
                var md5hash = this.responseText.split(",");

                self.postMessage({
                    'type': 'status',
                    'message': filename + " uploaded successfully. It took " + timetaken.toFixed(2) + " minutes to upload.",
                    'id': workerdata.id
                });

                self.postMessage({ 'type': 'checksum', 'message': md5hash[1], 'id': workerdata.id });
            }
        }

        // A 400 message indicates that we can't merge all of the files as yet.
        // So queue up the merge request to run in 5 seconds
        if (this.readyState == 4 && this.status == 400) {
            setTimeout(function () { mergeall(filename, chunkCount); }, 5000);
        }
    };

    // Send the request to merge the file
    xhr.open('GET', webapiGetMergeAllUrl + '/?filename=' + filename + '&directoryname=' + workerdata.directory
        + '&numberOfChunks=' + chunkCount.numberOfChunks, false);
    xhr.send(null);
    xhr = null;
}
// Function used to upload the file chunks
function upload(chunk, filename, chunkCount, uploadurl, asyncstate) {

    // Grab a worker from the pool
    var xhr = xhrworkerspool.getWorker();

    // xhr.upload causes an error in IE. Use the try catch block to
    // catch the failure in IE, and then set up the progress handler in
    // the catch routine.
    try {
        if (asyncstate == true) {
            xhr.upload.onprogress = function (e) {
                //var progress = parseInt((e.loaded * 100 / e.total), 10);
                var progress = parseInt((chunkCount.currentNumber * 100 / chunkCount.numberOfChunks), 10);
                self.postMessage({ 'type': 'progress', 'percentage': progress, 'id': workerdata.id });
            };
        }
    }
    catch (e) {
        xhr.onprogress = function (e) {
            //var progress = parseInt((e.loaded * 100 / e.total), 10);
            var progress = parseInt((chunkCount.currentNumber * 100 / chunkCount.numberOfChunks), 10);
            self.postMessage({ 'type': 'progress', 'percentage': progress, 'id': workerdata.id });
        };
    }

    xhr.onreadystatechange = function (e) {

        if (this.readyState == 4 && this.status == 200) {
            // Send back progress information for synchronous uploads.
            // The upload.onprogress method only fires on asynchronous uploads
            // and we are doing synchronous uploads
            if (asyncstate == false) {
                var progress = parseInt((chunkCount.currentNumber * 100 / chunkCount.numberOfChunks), 10);
                self.postMessage({ 'type': 'progress', 'percentage': progress, 'id': workerdata.id });
            }
        }

        if (this.readyState == 4 && this.status == 415) {
            // Tried to upload data that is not multipart/form-data.
            // End the upload
            self.postMessage({ 'type': 'error', 'message': "Upload Error: " + this.responseText, 'id': workerdata.id });
        }

        if (this.readyState == 4 && this.status == 413) {
            // Tried to upload a chunk that is greater than the maximum upload size.
            // End the upload
            self.postMessage({ 'type': 'error', 'message': "Upload Error: " + this.responseText, 'id': workerdata.id });
        }

        if (this.readyState == 4 && this.status == 500) {
            // A fatal error occurred on the server side.
            // Send the error message and end the web worker
            self.postMessage({ 'type': 'error', 'message': "Server Error: " + this.responseText, 'id': workerdata.id });
        }
    };

    // Open the url and upload the file chunk
    xhr.open('POST', uploadurl + '?filename=' + filename + '&directoryname=' + workerdata.directory
        + '&chunkNumber=' + chunkCount.currentNumber + '&numberOfChunks=' + chunkCount.numberOfChunks,
        asyncstate);

    var formData = '';

    if (typeof FormData == "undefined") {

        // The browser does not support the FormData object.
        // We will manually create the form data
        formData = buildFormData(chunk);

        // Create the form with the appropriate headers
        xhr.setRequestHeader("Content-Type", "multipart/form-data; boundary=----12345678wertysdfg");
        xhr.setRequestHeader("CelerFT-Encoded", "base64");
    }
    else {
        // The browser supports the FormData object.
        // Create the form and append the file chunk as the "Slice" part
        // (NOTE: the append call is reconstructed; the part name matches
        // what the server side reads)
        formData = new FormData();
        formData.append('Slice', chunk, filename);
    }

    // Send the form
    xhr.send(formData);
    formData = null;
    xhrworkerspool.releaseWorker(xhr);
}
// This is where we start
// The upload information is sent as the message payload in e.data
self.onmessage = function (e) {

    workerdata = e.data;

    // Create the xhr upload workers.
    // We will upload to multiple urls
    xhrworkerspool = new XHRWorkerPool(6);

    var uploadurl;

    if (workerdata.chunk != null) {

        if (urlcount >= 6) {
            urlcount = 0;
        }

        if (urlcount == 0) {
            uploadurl = webapiUrl;
        }
        else {
            uploadurl = webapiUrl + urlcount;
        }

        upload(workerdata.chunk, workerdata.filename, workerdata.chunkCount, uploadurl, workerdata.asyncstate);
        urlcount++;
    }
    else {
        mergeall(workerdata.filename, workerdata.chunkCount);
    }
}
The Server Side
The web server side of the file uploader tool is based on
ASP.NET Web API version 2.x. The ASP.NET Web API version 2.x is a RESTFUL based
API and we will be using action based routing to call the methods exposed by
the backend web server. This requires that we edit the"
WebApiConfig.cs" file and an {action} in the
routing template.
Figure 6 Modify the WebApiConfig.cs file to add action parameter
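Since the figure may not render here, the following is a minimal sketch of the modified route template. It follows the standard ASP.NET Web API convention; the exact CelerFT file may differ slightly:

using System.Web.Http;

public static class WebApiConfig
{
    public static void Register(HttpConfiguration config)
    {
        // The {action} segment lets urls such as
        // /api/CelerFTFileUpload/UploadChunk map to controller methods by name
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{action}/{id}",
            defaults: new { id = RouteParameter.Optional }
        );
    }
}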
In the Solution Explorer we are going to add a Controller
that will accept the file uploads. The Controller is named
CelerFTFileUploadController.
Figure 7 Add the CelerFTFileUpload Controller
The CelerFTFileUploadController exposes several methods. The "UploadChunk" methods are POST methods that accept the file uploads. The reason for using multiple "UploadChunk" POST methods is so that we can write to all of the methods in parallel.
[System.Web.Http.HttpPost]
public async Task<HttpResponseMessage> UploadChunk(string filename, string directoryname, int chunknumber, int numberofChunks)
{
    HttpResponseMessage returnmessage = await ProcessChunk(filename, directoryname, chunknumber, numberofChunks);
    return returnmessage;
}
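The additional numbered endpoints that the upload worker targets (it appends a counter to the base url, producing UploadChunk1 through UploadChunk5) presumably just delegate to the same helper. A hedged sketch of the pattern, which may differ from the exact CelerFT code:

// Hedged sketch: numbered endpoints let the client POST to
// /api/CelerFTFileUpload/UploadChunk1 ... UploadChunk5 in parallel.
[System.Web.Http.HttpPost]
public async Task<HttpResponseMessage> UploadChunk1(string filename, string directoryname, int chunknumber, int numberofChunks)
{
    return await ProcessChunk(filename, directoryname, chunknumber, numberofChunks);
}

// ... UploadChunk2 through UploadChunk5 follow the same pattern ...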
The "UploadChunk" POST methods
implement the same functionality and this is defined in the "ProcessChunk"
method. The "ProcessChunk" method checks to see if we are uploading
multipart/form-data encoded data, and that we are not trying to upload a file
chunk that is greater than 50MB.
Once we have passed those checks the next check is to
look for the "CelerFT-Encoded" header. If this header is
present it means that we are uploading the file chunk as a base64 encoded
string. This means that we have to extract the base64 encoded string from the
multipart/from-data encoded data and convert it to byte array using the "Convert.FromBase64String"
method.
If the "CelerFT-Encoded" header is
not present then we read the uploaded data into a byte array. Once we have read
the file chunk into a byte array we save the byte array to file in a temporary
directory. The temporary directory that is created is based on the name of the
file.
Each file chunk is saved as a separate file and we pad
the file names so that we can give each file a sequential number. This is
important because when we try to merge the file chunks into a single file if
the names of the file are not in the right order then we run the risk of
creating a corrupt file. The sequential number is created by passing the "chunknumber"
to the "UploadChunk" method and appending the "chunknumber"
to the file name.
private async Task<HttpResponseMessage> ProcessChunk(string filename, string directoryname, int chunknumber, int numberofChunks)
{
    // Check if the request contains multipart/form-data.
    // (NOTE: the error response here is reconstructed; the client checks
    // for a 415 status code)
    if (!Request.Content.IsMimeMultipartContent())
    {
        return Request.CreateErrorResponse(HttpStatusCode.UnsupportedMediaType,
            "Unsupported media type");
    }

    // Check that we are not trying to upload a file chunk greater than 50MB
    Int32 maxinputlength = 51 * 1024 * 1024;
    if (Convert.ToInt32(HttpContext.Current.Request.InputStream.Length) > maxinputlength)
    {
        return Request.CreateErrorResponse(HttpStatusCode.RequestEntityTooLarge,
            "Maximum upload chunk size exceeded");
    }

    try
    {
        byte[] filedata = null;

        // If we have the custom header then we are processing hand-made multipart/form-data
        if (HttpContext.Current.Request.Headers["CelerFT-Encoded"] != null)
        {
            // Read in the request
            var base64file = new HttpPostedFileWrapper(HttpContext.Current.Request.Files["Slice"]);
            if (base64file == null)
            {
                return Request.CreateErrorResponse(HttpStatusCode.BadRequest, "No file chunk uploaded");
            }

            // Convert the base64 string into a byte array
            // (NOTE: the buffer is sized from the uploaded part's stream; reconstructed)
            var base64filedata = new byte[base64file.InputStream.Length];
            await base64file.InputStream.ReadAsync(base64filedata, 0,
                Convert.ToInt32(base64file.InputStream.Length));
            var base64string = System.Text.UTF8Encoding.UTF8.GetString(base64filedata);
            filedata = Convert.FromBase64String(base64string);
        }
        else
        {
            var file = new HttpPostedFileWrapper(HttpContext.Current.Request.Files["Slice"]);
            if (file == null)
            {
                return Request.CreateErrorResponse(HttpStatusCode.BadRequest, "No file chunk uploaded");
            }

            filedata = new byte[file.InputStream.Length];
            await file.InputStream.ReadAsync(filedata, 0, Convert.ToInt32(file.InputStream.Length));
        }

        if (filedata == null)
        {
            return Request.CreateErrorResponse(HttpStatusCode.BadRequest, "No file chunk uploaded");
        }

        // Write the byte array to a file.
        // The chunk number is padded so that the chunk files sort in sequential order.
        // getFileFolder is a helper method (not shown in this excerpt) that
        // resolves the destination folder under the upload root.
        string baseFilename = Path.GetFileNameWithoutExtension(filename);
        string extension = Path.GetExtension(filename);
        string tempdirectoryname = Path.GetFileNameWithoutExtension(filename);

        var localFilePath = getFileFolder(directoryname + "\\" + tempdirectoryname) + "\\" +
            baseFilename + "." + chunknumber.ToString().PadLeft(16, Convert.ToChar("0")) +
            extension + ".tmp";

        var outputFile = File.Open(localFilePath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read);
        var input = new MemoryStream(filedata);
        await input.CopyToAsync(outputFile);
        input.Close();
        outputFile.Close();
        filedata = null;

        return new HttpResponseMessage()
        {
            StatusCode = HttpStatusCode.OK
        };
    }
    catch (Exception ex)
    {
        return Request.CreateErrorResponse(HttpStatusCode.InternalServerError, ex);
    }
}
The "MergeAll" is a GET method and
is used to merge all of the file chunks that are saved in a temporary folder
into the file.
[System.Web.Http.HttpGet]
public HttpResponseMessage MergeAll(string filename, string directoryname, int numberofChunks)
{
    string tempdirectoryname = Path.GetFileNameWithoutExtension(filename);
    var localFilePath = getFileFolder(directoryname + "\\" + tempdirectoryname) + "\\";

    string baseFilename = Path.GetFileNameWithoutExtension(filename);
    string extension = Path.GetExtension(filename);

    var diSource = new DirectoryInfo(localFilePath);

    // If the number of uploaded chunks is less than the total number of chunks then
    // return an error. This will happen in asynchronous file uploads where the final
    // chunk arrives before other chunks
    if (diSource.GetFiles("*.tmp").Length != numberofChunks)
    {
        return Request.CreateErrorResponse(HttpStatusCode.BadRequest,
            "Number of file chunks less than total count");
    }

    var outputFile = File.Open(localFilePath + baseFilename + extension,
        FileMode.OpenOrCreate, FileAccess.Write);

    try
    {
        // Get all of the file chunks in the directory and use them to create the final file.
        // All of the file chunks are named in sequential order.
        foreach (FileInfo fiPart in diSource.GetFiles("*.tmp"))
        {
            byte[] filedata = System.IO.File.ReadAllBytes(fiPart.FullName);
            outputFile.Write(filedata, 0, filedata.Length);
            File.Delete(fiPart.FullName);
        }

        outputFile.Flush();
        outputFile.Close();

        // Move the file to the top level directory
        string oldfilelocation = localFilePath + baseFilename + extension;
        string newfilelocation = getFileFolder(directoryname + "\\") + baseFilename + extension;

        // Check if the file exists. If it does, delete it and then move the file
        if (System.IO.File.Exists(newfilelocation))
        {
            System.IO.File.Delete(newfilelocation);
        }
        System.IO.File.Move(oldfilelocation, newfilelocation);

        // Delete the temporary directory
        System.IO.Directory.Delete(localFilePath);

        // Get the MD5 hash for the file and send it back to the client.
        // GetHashFromFile is a helper method (not shown in this excerpt).
        //string checksumMd5 = GetHashFromFile(localFilePath + baseFilename + extension, MD5);
        string checksumMd5 = GetHashFromFile(newfilelocation, MD5);

        // (NOTE: the response body is reconstructed; the client splits it
        // on ',' and reads the checksum from the second field)
        return new HttpResponseMessage()
        {
            Content = new StringContent(filename + "," + checksumMd5),
            StatusCode = HttpStatusCode.OK
        };
    }
    catch (Exception ex)
    {
        return Request.CreateErrorResponse(HttpStatusCode.InternalServerError, ex);
    }
}
The Web.Config
The web.config file of the web server has to be edited to support large file uploads. Note that maxRequestLength is specified in kilobytes, while maxAllowedContentLength is specified in bytes.
<!-- httpRuntime belongs in the system.web section (wrapper reconstructed) -->
<system.web>
    <httpRuntime maxRequestLength="2147483647" requestLengthDiskThreshold="10240" targetFramework="4.5" />
</system.web>
<system.webServer>
    <asp enableChunkedEncoding="true" />
    <security>
        <requestFiltering>
            <requestLimits maxAllowedContentLength="2147483648" />
        </requestFiltering>
    </security>
</system.webServer>
The Test Results
I have carried out several tests with the file upload tool. It took around 3 minutes to send a 975 MB file across a 100 Mbps WAN link to a remote server. I have also been able to upload five 9 GB files in parallel across the same 100 Mbps WAN link in 135 minutes.
The default method of uploading a large file is to synchronously POST the file chunks to the file upload methods. In testing I found that when I tried to do an asynchronous upload of a file around 2 GB in size, using the default chunk size of 1 MB, the browser would crash. However, when I used a bigger chunk size, say 20 MB, I could do an asynchronous upload of the 2 GB file. So the lesson learnt is that a synchronous upload of a gigabyte-sized file will work; however, if you are going to attempt an asynchronous upload, use a chunk size of 20 MB or greater.
Figure 8 File upload testing with CelerFT
Please test it and let me know your results. The complete source code for the CelerFT tool can be found in this GitHub repository: https://github.com/adminastoolsmith/CelerFToverHTTP.