Gigabit File Uploads Over HTTP - The Node.js with NGINX Version
Please see the ASP.NET version of this article. It provides background information that might not be covered here.
Please see the original Node.js version of this article.
One of the things that we wanted to do after blogging about Gigabit File uploads with Node.js was to see how we could improve the performance of the application. In the previous version of the application the code was mostly synchronous, and as a result we had high CPU usage, did quite a lot of I/O, and used up a fair amount of memory. All in all, that version was written to demonstrate the concept of Gigabit File uploads over HTTP rather than for performance.
Now that we have established the concept, it is time to see how the application's performance can be improved.
The Performance Tuning
The areas that we want to look at to address the Gigabit File upload performance are:
- Implementing a reverse proxy server in front of the Node.js server.
- Offloading the file upload requests to the reverse proxy.
- Converting the MergeAll blocking synchronous code to non-blocking asynchronous code.
- Creating an API for each backend request. As it is now the UploadChunk API call is used to manage all uploads.
- Removing the checksum calculation from the MergeAll API call. A GetChecksum API will be created to calculate the checksum of the uploaded file (a minimal sketch of such an endpoint follows this list).
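A GetChecksum endpoint could look something like the following minimal sketch, which streams the merged file through Node's crypto module so the hash is calculated without reading the whole file into memory. The route name and request parameters mirror the other CelerFT API calls, but they are assumptions here rather than code taken from the repository.

// Hypothetical GetChecksum endpoint - the route name, the request parameters and
// the uploadpath variable are assumptions modelled on the other CelerFT API calls
var crypto = require('crypto');

app.get('*/api/CelerFTFileUpload/GetChecksum*', function (request, response) {

    var filename = uploadpath + request.param('directoryname') + '/' + request.param('filename');

    // Stream the merged file through an MD5 hash so we never hold the whole file in memory
    var md5sum = crypto.createHash('md5');
    var rstream = fs.createReadStream(filename);

    rstream.on('data', function (data) {
        md5sum.update(data);
    });

    rstream.on('end', function () {
        response.status(200).send(md5sum.digest('hex').toUpperCase());
    });

    rstream.on('error', function (err) {
        response.status(500).send(err);
    });
});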
The performance testing was conducted on a CentOS 7 virtual machine running NGINX version 1.9.9 and Node.js version 5.3.0. This is a departure from our previous blog post, because that work was done on a Windows 2012 platform.
The Reverse Proxy
Node.js allows you to build fast, scalable network applications capable of handling a huge number of simultaneous connections with high throughput. This means that from the very start Node.js is quite capable of handling the Gigabit File uploads.
So why would we want to use a reverse proxy in front of our Node.js server in this scenario? We want to do this because offloading the file handling to the NGINX web server will reduce the overhead on the Node.js backend, and this should provide a performance boost. The following figure shows how this is achieved.
Figure 1 Offloading file upload to NGINX reverse proxy
- The client computer uploads the file chunks by calling the XFileName API. Once the NGINX reverse proxy sees a call to /api/CelerFTFileUpload/UploadChunk/XFileName it saves the file chunk to the NGINX private temporary directory, because we have enabled the NGINX client_body_in_file_only directive. The NGINX private temporary directory can be found under /tmp. This happens because the PrivateTmp configuration option is set to true in the NGINX systemd unit file (an excerpt is shown after this list). Please consult the systemd man pages for more information on the PrivateTmp configuration option.
- After the file chunk has been saved, NGINX sets the X-File-Name header to the name of the file chunk and forwards the request to Node.js.
- Once Node.js receives the X-File-Name header it moves the file chunk from the NGINX private temporary directory and saves it to the file upload directory with the correct name.
- Once all of the file chunks have been uploaded the client calls the MergeAll API, which NGINX passes straight through to Node.js. When Node.js receives the MergeAll request it merges all of the uploaded file chunks to create the file.
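For reference, the nginx.service unit shipped with CentOS 7 enables the private /tmp. The excerpt below assumes the typical location of the unit file; the exact path may differ on your system.

# /usr/lib/systemd/system/nginx.service (excerpt)
[Service]
PrivateTmp=true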
We used the following NGINX configuration:
# redirect CelerFT
location = /api/CelerFTFileUpload/UploadChunk/XFileName {

    aio on;
    directio 10M;

    client_body_temp_path /tmp/nginx 1;
    client_body_in_file_only on;
    client_body_buffer_size 10M;
    client_max_body_size 60M;

    proxy_pass_request_headers on;
    proxy_set_body off;
    proxy_redirect off;
    proxy_ignore_client_abort on;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    ##proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-File-Name $request_body_file;

    proxy_pass http://127.0.0.1:1337;
    # proxy_redirect default;

    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;

    access_log off;
    error_log /var/log/nginx/nginx.upload.error.log;
}
The key parameter is the X-File-Name header, which is set to the name of the file chunk that NGINX saved. The Node.js backend then has to process the individual chunks. The crucial part of the code is finding out where the NGINX private temporary directory is created, because this is where NGINX writes the file chunks. Under systemd the NGINX private temporary directory has a different name each time NGINX is restarted, so we have to get the name of that directory before we can move the file chunk to its final destination.
app.post('*/api/CelerFTFileUpload/UploadChunk/XFileName*', function (request, response) {

    // Check if we are uploading using an x-file-name header
    // This means that we have offloaded the file upload to the
    // web server (NGINX) and we are sending up the path to the actual
    // file in the header. The file chunk will not be in the body
    // of the request
    if (request.headers['x-file-name']) {

        // Temporary location of our uploaded file
        // NGINX uses a private file path in /tmp on CentOS
        // so we need to get the name of that path
        var temp_dir = fs.readdirSync('/tmp');
        var nginx_temp_dir = [];

        for (var i = 0; i < temp_dir.length; i++) {
            if (temp_dir[i].match('nginx.service')) {
                nginx_temp_dir.push(temp_dir[i]);
            }
        }

        var temp_path = '/tmp/' + nginx_temp_dir[0] + request.headers['x-file-name'];

        fs.move(temp_path, response.locals.localfilepath, {}, function (err) {
            if (err) {
                response.status(500).send(err);
                return;
            }

            // Send back a successful response with the file name
            response.status(200).send(response.locals.localfilepath);
            response.end();
        });
    }
});
The MergeAll Asynchronous API
In the previous blog post we used the fs.readdirSync and fs.readFileSync function calls quite extensively. fs.readdirSync was called each time we needed to check whether or not we had uploaded all of the file chunks, and fs.readFileSync was called when we merged all of the uploaded file chunks to create the file.
Each of these function calls is synchronous and caused the MergeAll API to block each time it was called.
The getfilesWithExtensionName function that was being called in the MergeAll API was replaced with an fs.readdir call, which is used to check that we have uploaded all of the file chunks.
The getfilesWithExtensionName function:
function getfilesWithExtensionName(dir, ext) {

    var matchingfiles = [];

    if (fs.ensureDirSync(dir)) {
        return matchingfiles;
    }

    var files = fs.readdirSync(dir);

    for (var i = 0; i < files.length; i++) {
        if (path.extname(files[i]) === '.' + ext) {
            matchingfiles.push(files[i]);
        }
    }

    return matchingfiles;
}
The MergeAll API was written to use the fs.readdir function to check if we have uploaded all of the file chunks.
Each call to fs.readdir populates an array named fileslist with the file names. Once we have uploaded all of the file chunks we populate an array named files with all of the file names, as shown.
for (var i = 0; i < fileslist.length; i++) {
    if (path.extname(fileslist[i]) == '.tmp') {
        //console.log(fileslist[i]);
        files.push(fileslist[i]);
    }
}
The next thing that is done is to use fs.createWriteStream to create the output file.
// Create the output file
var outputFile = fs.createWriteStream(filename);
We then used a recursive function named mergefiles to merge the file chunks into the final output file. In the mergefiles function we use fs.createReadStream to read each file in the files array and write it to the output file. The mergefiles function is called with the index set to 0, and once a chunk has been read to the end the function calls itself with the index incremented by one.
var index = 0;

// Recursive function used to merge the files
// in a sequential manner
var mergefiles = function (index) {

    // If the index matches the number of items in the array
    // end the function and finalize the output file
    if (index == files.length) {
        outputFile.end();
        return;
    }

    console.log(files[index]);

    // Use a read stream to read the files and write them to the write stream
    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

    rstream.on('data', function (data) {
        outputFile.write(data);
    });

    rstream.on('end', function () {
        //fs.removeSync(localFilePath + '/' + files[index]);
        mergefiles(index + 1);
    });

    rstream.on('close', function () {
        fs.removeSync(localFilePath + '/' + files[index]);
        //mergefiles(index + 1);
    });

    rstream.on('error', function (err) {
        console.log('Error in file merge - ' + err);
        response.status(500).send(err);
        return;
    });
};

mergefiles(index);
The complete code for the MergeAll API call:
// Request to merge all of the file chunks into one file
app.get('*/api/CelerFTFileUpload/MergeAll*', function (request, response) {

    if (request.method == 'GET') {

        // Get the extension from the file name
        var extension = path.extname(request.param('filename'));

        // Get the base file name
        var baseFilename = path.basename(request.param('filename'), extension);

        var localFilePath = uploadpath + request.param('directoryname') + '/' + baseFilename;

        var filename = localFilePath + '/' + baseFilename + extension;

        // Array to hold files to be processed
        var files = [];

        // Use the asynchronous readdir function to process the files
        // This provides better i/o
        fs.readdir(localFilePath, function (error, fileslist) {

            if (error) {
                response.status(400).send('Number of file chunks less than total count');
                //response.end();
                console.log(error);
                return;
            }

            //console.log(fileslist.length);
            //console.log(request.param('numberOfChunks'));

            if ((fileslist.length) != request.param('numberOfChunks')) {
                response.status(400).send('Number of file chunks less than total count');
                //response.end();
                return;
            }

            // Check if all of the file chunks have been uploaded
            // Note we only want the files with a *.tmp extension
            if ((fileslist.length) == request.param('numberOfChunks')) {

                for (var i = 0; i < fileslist.length; i++) {
                    if (path.extname(fileslist[i]) == '.tmp') {
                        //console.log(fileslist[i]);
                        files.push(fileslist[i]);
                    }
                }

                if (files.length != request.param('numberOfChunks')) {
                    response.status(400).send('Number of file chunks less than total count');
                    //response.end();
                    return;
                }

                // Create the output file
                var outputFile = fs.createWriteStream(filename);

                // Done writing the file. Move it to the top level directory
                outputFile.on('finish', function () {

                    console.log('file has been written ' + filename);
                    //runGC();

                    // New name for the file
                    var newfilename = uploadpath + request.param('directoryname') + '/' + baseFilename + extension;

                    // Check if the file exists at the top level; if it does delete it
                    // Use move with the overwrite option
                    fs.move(filename, newfilename, {}, function (err) {

                        if (err) {
                            console.log(err);
                            response.status(500).send(err);
                            //runGC();
                            return;
                        }
                        else {

                            // Delete the temporary directory
                            fs.remove(localFilePath, function (err) {

                                if (err) {
                                    response.status(500).send(err);
                                    //runGC();
                                    return;
                                }

                                // Send back a successful response with the file name
                                response.status(200).send('Successfully merged file ' + filename);
                                //response.end();
                                //runGC();
                            });

                            // Send back a successful response with the file name
                            //response.status(200).send('Successfully merged file ' + filename + ", " + md5results.toUpperCase());
                            //response.end();
                        }
                    });
                });

                var index = 0;

                // Recursive function used to merge the files
                // in a sequential manner
                var mergefiles = function (index) {

                    // If the index matches the number of items in the array
                    // end the function and finalize the output file
                    if (index == files.length) {
                        outputFile.end();
                        return;
                    }

                    console.log(files[index]);

                    // Use a read stream to read the files and write them to the write stream
                    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

                    rstream.on('data', function (data) {
                        outputFile.write(data);
                    });

                    rstream.on('end', function () {
                        //fs.removeSync(localFilePath + '/' + files[index]);
                        mergefiles(index + 1);
                    });

                    rstream.on('close', function () {
                        fs.removeSync(localFilePath + '/' + files[index]);
                        //mergefiles(index + 1);
                    });

                    rstream.on('error', function (err) {
                        console.log('Error in file merge - ' + err);
                        response.status(500).send(err);
                        return;
                    });
                };

                mergefiles(index);
            }

            /*else {
                response.status(400).send('Number of file chunks less than total count');
                //response.end();
                return;
            }*/
        });
    }
});
Other Improvements
As mentioned, the other thing that we did was to create an API call for each type of file upload that is supported by CelerFT.
- The Base64 API call will handle uploads in which the CelerFT-Encoded header is set to base64.
- The FormData API call will handle all multipart/form-data uploads.
- The XFileName API call will be used to offload file uploads to the NGINX reverse proxy (a sketch of the route layout follows this list).
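A rough sketch of how these routes can be laid out is shown below. The handler bodies are omitted and the exact signatures live in the nginxasync branch, so treat this as an outline rather than the actual implementation.

// Illustrative route layout only - handler bodies are omitted
app.post('*/api/CelerFTFileUpload/UploadChunk/Base64*', function (request, response) {
    // The chunk arrives as base64 text because the CelerFT-Encoded header is set to base64
});

app.post('*/api/CelerFTFileUpload/UploadChunk/FormData*', function (request, response) {
    // The chunk arrives as a standard multipart/form-data upload
});

app.post('*/api/CelerFTFileUpload/UploadChunk/XFileName*', function (request, response) {
    // The chunk has already been written to disk by NGINX; its path arrives in the X-File-Name header
});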
The preliminary tests showed marked improvements in the performance of the backend server during the file uploads. Please feel free to download CelerFT and provide feedback on its performance.
The code for this project can be found at my GitHub repository under the nginxasync branch.