The Admin As A Toolsmith: Gigabit File uploads Over HTTP

Gigabit File uploads Over HTTP - The Node.js with NGINX version

Please see the ASP.NET version of this article. It provides background information that might not be covered here.

Please see the original Node.js version of this article.

One of the things we wanted to do after blogging about Gigabit File uploads with Node.js was to see how we could improve the performance of the application. In the previous version, the code was mostly synchronous; as a result, the application had high CPU usage, performed a lot of I/O, and used a fair amount of memory. All in all, that version was written to demonstrate how to do Gigabit File uploads over HTTP rather than for performance.

Now that the concept has been established, it is time to see how the application's performance can be improved.

The Performance Tuning

To address Gigabit File upload performance, we want to look at the following areas:

Implementing a reverse proxy server in front of the Node.js server.
Offloading the file upload requests to the reverse proxy.
Converting the MergeAll blocking synchronous code to non-blocking asynchronous code.
Creating a separate API for each type of upload request. At present, the UploadChunk API call is used to manage all uploads.
Removing the checksum calculation from the MergeAll API call. A GetChecksum API will be created to calculate the checksum of the uploaded file.
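The GetChecksum API itself is not shown in this post. As a rough sketch of how such an endpoint might compute a checksum without loading a multi-gigabyte file into memory, the file can be streamed through a hash from Node's crypto module. MD5, the lower-case hex output, and the getChecksum name are assumptions for illustration; the sketch targets a current Node.js release rather than the 5.3.0 used for testing.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical GetChecksum helper (MD5 assumed): feed the file to the hash
// one stream chunk at a time, so memory use does not grow with file size.
function getChecksum(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('md5');
    const stream = fs.createReadStream(filePath);
    stream.on('error', reject);
    stream.on('data', (chunk) => hash.update(chunk));
    // Resolve with the digest as a lower-case hex string.
    stream.on('end', () => resolve(hash.digest('hex')));
  });
}
```

Usage: getChecksum('/path/to/uploaded/file').then(console.log). Keeping this out of MergeAll means the merge returns as soon as the file is written, and the client asks for the checksum only when it needs it.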

The performance testing was conducted on a CentOS 7 virtual machine running NGINX 1.9.9 and Node.js 5.3.0. This is a departure from our previous blog post, where the work was done on a Windows 2012 platform.

The Reverse Proxy

Node.js allows you to build fast, scalable network applications capable of handling a huge number of simultaneous connections with high throughput. This means that, out of the box, Node.js is quite capable of handling Gigabit File uploads.

So why would we want to put a reverse proxy in front of our Node.js server in this scenario? Because offloading the file handling to the NGINX web server reduces the overhead on the Node.js backend, which should provide a performance boost. The following figure shows how this is achieved.

Figure 1: Offloading file uploads to the NGINX reverse proxy

The client computer uploads the file chunks by calling the XFileName API. When the NGINX reverse proxy sees a call to /api/CelerFTFileUpload/UploadChunk/XFileName, it saves the file chunk to the NGINX private temporary directory, because we have enabled the NGINX client_body_in_file_only directive. The NGINX private temporary directory can be found under /tmp, because the PrivateTmp option is set to true in the NGINX systemd unit file. Please consult the systemd man pages for more information on the PrivateTmp option.
After the file chunk has been saved, NGINX sets the X-File-Name header to the path of the saved chunk (the $request_body_file variable) and forwards the request to Node.js without the chunk data.
When Node.js receives the X-File-Name header, it moves the file chunk out of the NGINX private temporary directory and saves it to the file upload directory with the correct name.
Once all of the file chunks have been uploaded, the client calls the MergeAll API, which NGINX passes directly to Node.js. Node.js then merges all of the uploaded file chunks to create the file.

We used the following NGINX configuration:

# redirect CelerFT
location = /api/CelerFTFileUpload/UploadChunk/XFileName {
    aio on;
    directio 10M;
    client_body_temp_path /tmp/nginx 1;
    client_body_in_file_only on;
    client_body_buffer_size 10M;
    client_max_body_size 60M;

    proxy_pass_request_headers on;
    # Do not forward the chunk; Node.js reads it from the file named in
    # X-File-Name. (proxy_set_body takes a value, so "proxy_set_body off;"
    # would send the literal text "off" as the request body.)
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_redirect off;
    proxy_ignore_client_abort on;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    ##proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-File-Name $request_body_file;
    proxy_pass http://127.0.0.1:1337;
    # proxy_redirect default;

    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;

    access_log off;
    error_log /var/log/nginx/nginx.upload.error.log;
}

The key parameter is the X-File-Name header, which NGINX sets to $request_body_file, the path of the temporary file holding the chunk. The Node.js backend then has to process the individual chunks. The crucial part of the code is finding out where the NGINX private temporary directory has been created, because this is where NGINX writes the file chunks. NGINX sees that private directory as its own /tmp, so the path in the header has to be prefixed with the directory's real location on the host. Under systemd, this directory gets a different name each time NGINX is restarted, so we have to look up its name before we can move the file chunk to its final destination.

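// Assumes: fs is fs-extra (which provides fs.move), app is an Express
// application, and response.locals.localfilepath has been set by earlier
// middleware that is not shown here.
app.post('*/api/CelerFTFileUpload/UploadChunk/XFileName*', function (request, response) {

    // Check if we are uploading using an X-File-Name header.
    // This means that we have offloaded the file upload to the
    // web server (NGINX) and we are sending up the path to the actual
    // file in the header. The file chunk will not be in the body
    // of the request.
    if (request.headers['x-file-name']) {

        // Temporary location of our uploaded file.
        // NGINX uses a private path under /tmp on CentOS;
        // we need to get the name of that path.
        var temp_dir = fs.readdirSync('/tmp');
        var nginx_temp_dir = [];

        for (var i = 0; i < temp_dir.length; i++) {
            if (temp_dir[i].indexOf('nginx.service') !== -1) {
                nginx_temp_dir.push(temp_dir[i]);
            }
        }

        if (nginx_temp_dir.length === 0) {
            response.status(500).send('NGINX private temporary directory not found');
            return;
        }

        var temp_path = '/tmp/' + nginx_temp_dir[0] + request.headers['x-file-name'];

        fs.move(temp_path, response.locals.localfilepath, {}, function (err) {
            if (err) {
                response.status(500).send(err);
                return;
            }

            // Send back a successful response with the file name.
            // send() ends the response, so no separate response.end() is needed.
            response.status(200).send(response.locals.localfilepath);
        });
    }
    else {
        // No header means there is no chunk to move; reply instead of
        // leaving the request open.
        response.status(400).send('Missing X-File-Name header');
    }
});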
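The route above finds the private directory with fs.readdirSync, which blocks the event loop on every chunk upload. A minimal non-blocking alternative is sketched below for a current Node.js release (fs/promises is not available in the Node.js 5.3.0 used for testing). It splits the lookup into a pure path helper and an async wrapper. Both function names are hypothetical, and, like the route, it takes the first /tmp entry whose name contains nginx.service.

```javascript
const fsp = require('fs/promises');
const path = require('path');

// Pure helper: pick NGINX's systemd private tmp directory out of a /tmp
// listing and map the X-File-Name header (the chunk path as NGINX sees it,
// e.g. /tmp/nginx/1/0000000001) onto the host filesystem.
// Returns null when no matching directory exists.
function resolveChunkPath(tmpEntries, xFileName, tmpRoot = '/tmp') {
  const privateDir = tmpEntries.find((name) => name.includes('nginx.service'));
  if (!privateDir) {
    return null;
  }
  // path.join appends the absolute header path as a suffix, giving
  // <tmpRoot>/<privateDir>/tmp/nginx/1/0000000001.
  return path.join(tmpRoot, privateDir, xFileName);
}

// Non-blocking wrapper: list /tmp with fs.promises instead of readdirSync.
async function findChunkPath(xFileName, tmpRoot = '/tmp') {
  const entries = await fsp.readdir(tmpRoot);
  return resolveChunkPath(entries, xFileName, tmpRoot);
}
```

In the route, a call such as findChunkPath(request.headers['x-file-name']) could take the place of the readdirSync loop, with a null result answered by an error response. Keeping the path logic pure makes it testable without a running NGINX.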

The MergeAll Asynchronous API

In the previous blog post we used the fs.readdirSync and fs.readFileSync functions quite extensively. fs.readdirSync was called each time we needed to check whether we had uploaded all of the file chunks, and fs.readFileSync was called when we merged the uploaded file chunks to create the file.

Both of these functions are synchronous, so the MergeAll API blocked the Node.js event loop every time they were called.

The getfilesWithExtensionName function that was being called in the MergeAll API was replaced with an fs.readdir call, which is used to check that we have uploaded all of the file chunks.

The original getfilesWithExtensionName function:

function getfilesWithExtensionName(dir, ext) {
    var matchingfiles = [];

    // fs-extra's ensureDirSync creates dir if it does not exist
    if (fs.ensureDirSync(dir)) {
        return matchingfiles;
    }

    var files = fs.readdirSync(dir);

    for (var i = 0; i < files.length; i++) {
        if (path.extname(files[i]) === '.' + ext) {
            matchingfiles.push(files[i]);
        }
    }

    return matchingfiles;
}
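For comparison, a non-blocking version of the same helper could look like the following sketch. In the post the helper was replaced by an inline fs.readdir call, so this is only an illustration; it assumes a current Node.js release with fs/promises, and the name getFilesWithExtension is hypothetical. One deliberate difference: a missing directory yields an empty list instead of being created.

```javascript
const fsp = require('fs/promises');
const path = require('path');

// Non-blocking counterpart to getfilesWithExtensionName: list the files in
// dir whose extension is ext (given without the leading dot).
async function getFilesWithExtension(dir, ext) {
  let files;
  try {
    files = await fsp.readdir(dir);
  } catch (err) {
    // A directory that does not exist simply has no matching files.
    if (err.code === 'ENOENT') {
      return [];
    }
    throw err;
  }
  return files.filter((name) => path.extname(name) === '.' + ext);
}
```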

The MergeAll API now uses the asynchronous fs.readdir function to check whether all of the file chunks have been uploaded. fs.readdir passes the names of the files in the chunk directory to its callback as an array, which we named fileslist. Once all of the file chunks have been uploaded, we copy the names of the .tmp files into an array named files, as shown here:

for (var i = 0; i < fileslist.length; i++) {
    if (path.extname(fileslist[i]) == '.tmp') {
        files.push(fileslist[i]);
    }
}
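The chunk check can also be pulled out into a small pure function that is easy to test. In this hypothetical selectChunks helper, only .tmp entries are counted, so a stray file in the chunk directory does not cause a mismatch, and the names are sorted explicitly because fs.readdir does not document any ordering. Sorting assumes the chunk file names sort lexicographically in upload order (for example, zero-padded part numbers); the CelerFT chunk naming scheme is not shown in this post.

```javascript
const path = require('path');

// Pick the chunk files out of a directory listing and confirm that all of
// them have arrived. numberOfChunks may arrive as a query-string value, so
// it is converted with Number() before comparing.
// Returns the chunk names in merge order, or null if chunks are missing.
function selectChunks(fileslist, numberOfChunks) {
  const expected = Number(numberOfChunks);
  const files = fileslist.filter((name) => path.extname(name) === '.tmp');
  if (!Number.isInteger(expected) || files.length !== expected) {
    return null;
  }
  // Assumption: lexicographic order equals upload order.
  return files.sort();
}
```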

Next, we use fs.createWriteStream to create the output file.

// Create the output file
var outputFile = fs.createWriteStream(filename);

We then use a recursive function named mergefiles to merge the file chunks into the final output file. The mergefiles function uses fs.createReadStream to read each file in the files array and write its contents to the output file. It is first called with the index set to 0; when a read stream emits its 'end' event, mergefiles calls itself with the index incremented by one, so the chunks are merged one after another. Once the index reaches the length of the files array, the output file is closed.

var index = 0;

// Recursive function used to merge the files
// in a sequential manner
var mergefiles = function (index) {

    // If the index matches the number of items in the array,
    // end the function and finalize the output file
    if (index == files.length) {
        outputFile.end();
        return;
    }

    console.log(files[index]);

    // Use a read stream to read each file and write it to the write stream
    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

    rstream.on('data', function (data) {
        outputFile.write(data);
    });

    // Move on to the next chunk once this one has been read completely
    rstream.on('end', function () {
        mergefiles(index + 1);
    });

    // Delete the chunk once its read stream has closed
    rstream.on('close', function () {
        fs.removeSync(localFilePath + '/' + files[index]);
    });

    rstream.on('error', function (err) {
        console.log('Error in file merge - ' + err);
        response.status(500).send(err);
    });
};

mergefiles(index);
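The 'data' handler above calls outputFile.write(data) without checking its return value. write() returns false once the write stream's internal buffer is full, so if the disk is slower than the reads, data piles up in memory. A sketch of the same sequential merge that respects this backpressure uses pipe() with { end: false }, which pauses each read stream while the output is busy and keeps the output open for the next chunk. The mergeChunks name and its promise-based interface are assumptions, and it uses path.join in place of string concatenation.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sequential merge: append each chunk file in `files` (already
// in merge order) from `dir` to `outputPath`. pipe() pauses a read stream
// whenever the write stream's buffer is full, so memory use stays bounded;
// { end: false } keeps the output open for the next chunk.
function mergeChunks(dir, files, outputPath) {
  return new Promise((resolve, reject) => {
    const output = fs.createWriteStream(outputPath);
    output.on('error', reject);
    // 'finish' fires after output.end() once all data has been flushed.
    output.on('finish', resolve);

    const mergeNext = (index) => {
      if (index === files.length) {
        output.end();
        return;
      }
      const input = fs.createReadStream(path.join(dir, files[index]));
      input.on('error', (err) => {
        output.destroy();
        reject(err);
      });
      // Start the next chunk only after this one has been fully read.
      input.on('end', () => mergeNext(index + 1));
      input.pipe(output, { end: false });
    };

    mergeNext(0);
  });
}
```

Chunk deletion is left to the caller in this sketch, whereas the original removes each chunk as soon as its read stream closes.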

Here is the complete code for the MergeAll API call:

// Request to merge all of the file chunks into one file.
// Assumes fs is fs-extra, path is the core path module, app is an Express
// application and uploadpath is defined elsewhere in the application.
app.get('*/api/CelerFTFileUpload/MergeAll*', function (request, response) {

    if (request.method == 'GET') {

        // Get the extension from the file name
        var extension = path.extname(request.param('filename'));

        // Get the base file name
        var baseFilename = path.basename(request.param('filename'), extension);

        var localFilePath = uploadpath + request.param('directoryname') + '/' + baseFilename;
        var filename = localFilePath + '/' + baseFilename + extension;

        // Array to hold files to be processed
        var files = [];

        // Use the asynchronous readdir function to list the chunks.
        // This does not block the event loop while the directory is read.
        fs.readdir(localFilePath, function (error, fileslist) {

            if (error) {
                console.log(error);
                response.status(400).send('Number of file chunks less than total count');
                return;
            }

            if ((fileslist.length) != request.param('numberOfChunks')) {
                response.status(400).send('Number of file chunks less than total count');
                return;
            }

            // Check if all of the file chunks have been uploaded.
            // Note we only want the files with a *.tmp extension
            if ((fileslist.length) == request.param('numberOfChunks')) {

                for (var i = 0; i < fileslist.length; i++) {
                    if (path.extname(fileslist[i]) == '.tmp') {
                        files.push(fileslist[i]);
                    }
                }

                if (files.length != request.param('numberOfChunks')) {
                    response.status(400).send('Number of file chunks less than total count');
                    return;
                }

                // Create the output file
                var outputFile = fs.createWriteStream(filename);

                // Done writing the file. Move it to the top level directory
                outputFile.on('finish', function () {

                    console.log('file has been written ' + filename);

                    // New name for the file
                    var newfilename = uploadpath + request.param('directoryname') + '/' + baseFilename + extension;

                    // Move the merged file up one level. With an empty options
                    // object, fs-extra's move does not overwrite an existing file.
                    fs.move(filename, newfilename, {}, function (err) {
                        if (err) {
                            console.log(err);
                            response.status(500).send(err);
                            return;
                        }

                        // Delete the temporary directory
                        fs.remove(localFilePath, function (err) {
                            if (err) {
                                response.status(500).send(err);
                                return;
                            }

                            // Send back a successful response with the file name
                            response.status(200).send('Successfully merged file ' + filename);
                        });
                    });
                });

                var index = 0;

                // Recursive function used to merge the files
                // in a sequential manner
                var mergefiles = function (index) {

                    // If the index matches the number of items in the array,
                    // end the function and finalize the output file
                    if (index == files.length) {
                        outputFile.end();
                        return;
                    }

                    console.log(files[index]);

                    // Use a read stream to read each file and write it to the write stream
                    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

                    rstream.on('data', function (data) {
                        outputFile.write(data);
                    });

                    rstream.on('end', function () {
                        mergefiles(index + 1);
                    });

                    rstream.on('close', function () {
                        fs.removeSync(localFilePath + '/' + files[index]);
                    });

                    rstream.on('error', function (err) {
                        console.log('Error in file merge - ' + err);
                        response.status(500).send(err);
                    });
                };

                mergefiles(index);
            }
        });
    }
});
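The 'finish' handler in the code above nests the fs.move and fs.remove callbacks. On a current Node.js release the same two steps can be flattened with fs/promises, as in the sketch below; finalizeMerge is a hypothetical name. Two differences from fs-extra are worth noting: fs.promises.rename replaces an existing destination file on Linux instead of failing, and it fails with EXDEV if the source and destination are on different filesystems, a case fs-extra's move handles by copying.

```javascript
const fsp = require('fs/promises');
const path = require('path');

// Move the merged file from the per-upload chunk directory up one level,
// then delete the chunk directory and anything left in it.
// Resolves with the final path of the merged file.
async function finalizeMerge(chunkDir, mergedName) {
  const source = path.join(chunkDir, mergedName);
  const destination = path.join(path.dirname(chunkDir), mergedName);
  await fsp.rename(source, destination);
  await fsp.rm(chunkDir, { recursive: true, force: true });
  return destination;
}
```

Any rejection propagates to the caller, so a single try/catch around await finalizeMerge(...) can send the 500 response instead of repeating the error branch at each nesting level.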

Other Improvements

As mentioned, we also created a separate API call for each type of file upload supported by CelerFT:

The Base64 API call handles uploads in which the CelerFT-Encoded header is set to base64.
The FormData API call handles all multipart/form-data uploads.
The XFileName API call is used to offload file uploads to the NGINX reverse proxy.
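To illustrate what the split removes, here is a sketch of the checks a single catch-all UploadChunk handler would need in order to tell these three upload types apart from the request headers alone. The classifyUpload function and its return values are hypothetical; only the header names come from this post. With one endpoint per type, the URL the client calls already carries this information.

```javascript
// Illustration only: classify an incoming chunk upload by its headers.
// Node.js lower-cases incoming header names, hence the lower-case keys.
function classifyUpload(headers) {
  if (headers['x-file-name']) {
    return 'XFileName'; // chunk already written to disk by NGINX
  }
  const encoding = (headers['celerft-encoded'] || '').toLowerCase();
  if (encoding === 'base64') {
    return 'Base64';
  }
  const contentType = (headers['content-type'] || '').toLowerCase();
  if (contentType.startsWith('multipart/form-data')) {
    return 'FormData';
  }
  return null; // not one of the upload types described in the post
}
```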

The preliminary tests showed marked improvements in the performance of the backend server during the file uploads. Please feel free to download CelerFT and provide feedback on its performance.

The code for this project can be found in my GitHub repository under the nginxasync branch.


Friday, 15 April 2016
