Monday, March 30, 2009

speeding up your nginx server with memcached

Nginx is a high-performance web and proxy server (for both web and mail). It is commonly used as a front-end proxy to an Apache web server. Nginx is known to be slow when serving dynamic pages like PHP, because it normally uses the FastCGI method, which is slow. Therefore, it's a good idea to run Apache as a back-end server behind nginx and serve dynamic PHP pages from Apache. If your site's PHP pages can be cached for a certain time, you can use nginx's proxy module and the proxy_store directive to automatically cache the HTML output of Apache-served PHP pages in nginx. Here, I'll show you how to use nginx's memcached module together with Danga Interactive's memcached daemon to store your content in memory and serve it from there. Serving content from memory is faster than serving it from disk. memcached's default listening port is 11211; you can find instructions on Danga Interactive's website on how to compile and run memcached.
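Once compiled, starting memcached is a one-liner. The flag values below are illustrative examples, not taken from the article (64 MB of cache, the default port, running as an unprivileged user), and the netcat check assumes you have nc installed:

```shell
# Start memcached as a daemon: 64 MB cache, default port 11211, user "nobody"
memcached -d -m 64 -p 11211 -u nobody

# Sanity check: memcached speaks a plain text protocol over TCP
printf 'stats\r\nquit\r\n' | nc 127.0.0.1 11211 | head -3
```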

Now let's look at the nginx configuration for the memcached setup. Suppose we have two Apache web servers running on two different physical machines, with IP addresses 192.168.2.3 and 192.168.2.4. We'll use them as back-end servers, with an nginx server in front of them at the IP address 192.168.2.1. First of all, we have to tell nginx about these back-end servers; we use the nginx upstream module for this purpose. As you can see below, we define an upstream named "backend" containing the two Apache servers' IP addresses. The upstream module also lets you assign a weight to each server: our first server has better hardware than the second, so we give it a weight of 2. This block belongs in the http section of the nginx configuration file (nginx.conf).



upstream backend {
    server 192.168.2.3 weight=2;
    server 192.168.2.4;
}

We have created our upstream configuration. Next, we have to tell nginx which files should be served by the memcached module. I have decided to serve only certain image types from memcached. The following configuration belongs in the server section of the nginx configuration. The location directive tells nginx to handle every URL that ends with one of the given extensions (.jpg, .png and .gif). As a first step, nginx looks the URL up in memcached. memcached is a simple key/value in-memory database; every entry has a unique key, and in our case the key is the URL. If nginx finds the key (the URL) in memcached, it fetches the contents for that key from memcached and sends them back to the client. This operation runs entirely from memory. If the key (the URL) is not found, the request falls through to a 404, and as you can see, we catch the 404 error and pass the request on to our back-end Apache servers. Nginx then sends Apache's response back to the client.


location ~* \.(jpg|png|gif)$ {
    access_log off;
    expires max;
    add_header Last-Modified "Thu, 26 Mar 2000 17:35:45 GMT";
    set $memcached_key $uri;
    memcached_pass 127.0.0.1:11211;
    error_page 404 = /fetch;
}

location /fetch {
    internal;
    access_log off;
    expires max;
    add_header Last-Modified "Thu, 26 Mar 2000 17:35:45 GMT";
    proxy_pass http://backend;
    break;
}

Of course, there is a drawback here: nginx's memcached module never puts anything into memcached automatically. You have to store the content in it yourself, for example with a script. In our example, if we forget to store a file in memcached, it will always be served by the back-end Apache servers. Here is a simple PHP script that finds the given image types and loads them into memcached for nginx.


<?php

// Recursively collect the image files we want to cache.
function rscandir($base = '', &$data = array()) {
    $entries = array_diff(scandir($base), array('.', '..'));

    foreach ($entries as $entry) {
        if (is_dir($base . $entry)) {
            $data = rscandir($base . $entry . '/', $data);
        } elseif (is_file($base . $entry)) {
            $ext = substr($entry, -4);
            if (!strcmp($ext, '.jpg') || !strcmp($ext, '.png')
                || !strcmp($ext, '.gif')) {
                $data[] = $base . $entry;
            }
        }
    }

    return $data;
}

$mylist = rscandir("/var/www/mysite");

// Strip the document root so the key matches the request URI
// (set $memcached_key $uri on the nginx side).
$srch   = array('/var/www/mysite');
$newval = array('');

$memcache_obj = memcache_connect("192.168.2.1", 11211);

foreach ($mylist as $key => $val) {
    $url = str_replace($srch, $newval, $val);
    echo "$key => $val -> " . filesize($val) . "\n";
    $value = file_get_contents($val);
    // Store under the URI key; no compression flag, no expiry.
    memcache_add($memcache_obj, $url, $value, false, 0);
}
?>


You only need to run this script once; it will find all of the given image types and store them in memcached. I run it on one of the Apache back-end servers, and it stores the data in the memcached instance running on the nginx server at 192.168.2.1.
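The critical detail in the script above is that the memcached key must be exactly the request URI, because the nginx side does `set $memcached_key $uri`. As a cross-check, here is a Python sketch of the same populate logic (this is my re-implementation, not the article's; the document root and extensions are the article's example values, and `client` stands for any memcached client object with a `set()` method, which is an assumption):

```python
import os

DOCROOT = "/var/www/mysite"
EXTENSIONS = (".jpg", ".png", ".gif")

def url_key(path, docroot=DOCROOT):
    """Derive the memcached key (the request URI) from a file path,
    mirroring the str_replace() call in the PHP script."""
    return path[len(docroot):] if path.startswith(docroot) else path

def find_images(docroot=DOCROOT):
    """Recursively collect image files under the document root,
    like the PHP rscandir() function."""
    for dirpath, _dirs, files in os.walk(docroot):
        for name in files:
            if name.lower().endswith(EXTENSIONS):
                yield os.path.join(dirpath, name)

def populate(client, docroot=DOCROOT):
    """Store every image under its URI key, so nginx's
    $memcached_key $uri lookup will hit it."""
    for path in find_images(docroot):
        with open(path, "rb") as f:
            client.set(url_key(path, docroot), f.read())
```

If the keys stored here don't match the URIs nginx asks for (a missing leading slash is the usual mistake), every request will fall through to the 404 handler and be served by Apache.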

14 comments:

Anonymous said...

You make it sound in the beginning of your post that serving plain PHP over NGINX is slower than serving plain PHP over APACHE ... is that correct?

If so, why?

Levent Serinol said...

Nginx uses the FastCGI method, which is slow. You can see the difference by testing Apache and nginx with big PHP codebases and high concurrency. For better performance on nginx, it looks like many pre-forked php-cgi processes are needed. If you're serving PHP static content, or have a chance to cache your PHP files' HTML output for a certain time, then you can run nginx as a front-end server with the proxy module and Apache with mod_php as the back-end server, so you can cache the HTML output of your PHP files. On static content like images and HTML, nginx is superior to Apache: it is lightweight, faster and more resource-friendly.

Jason said...

I would like to see some results on the claim that nginx+fastcgi is slower than apache+mod*.

In my benchmarks, I used a small EC2 instance and ran nginx+fastcgi, apache+fastcgi and apache+modperl. The application code functionality was the same in each case.

nginx+fastcgi: 91 reqs/s
apache+fastcgi: 90 reqs/s
apache+modperl: 84 reqs/s

Plus the main benefit of not embedding your application code in the apache process (mod_perl/mod_php/mod_python) is that you create a tiered architecture. This allows you to accept/route connections without having to tie up the application and its large memory footprint.

For instance with mod_*, if a slow connection makes a request, the Apache thread (big memory footprint) is tied up for the entire length of the connection. In FastCGI land, a slow connection first hits nginx, which proxies the request to FastCGI, which processes the request (fast) and sends the result back to nginx, which then begins to stream the data back to the requester. While nginx is doing that, the FastCGI process is able to handle other requests.

Levent Serinol said...

Try your test with a huge PHP application and compare nginx+fcgi+php with apache+mod_php by sending many concurrent connections. When you reach your predefined FastCGI max process limit, nginx will begin to slow down. This is not a problem of nginx; it's a problem of the FastCGI implementation. Of course, nginx is faster than Apache and lightweight on resources, but FastCGI is a bottleneck for nginx.

Rense said...

Hi,

I want to try out this configuration because I manage a high volume dynamic site with a lot of PHP in it.

I have a couple of questions though:

- I probably need more than one nginx memcache. Is this possible, and if so, how does it affect the configuration scripts? Can I just let my load balancer do layer-4 leased-connection load balancing over the nginx instances?

- What is the ratio of nginx to Apache?

- Most of my traffic is SSL, where does that fit in? Do I need to terminate that in NGINX first before I can pass anything to the backend? (and how will it return?).

Thanks a lot, great article btw :)

Rens

Anonymous said...

Hi Levent,
Thank you for this great explanation.

Like Rense, I want to know whether it's possible to configure nginx with a memcached cluster (memcached or nginx memcache)?

Configuring nginx's 'memcached_pass' with several memcached IPs is not the solution, because memcache pairs (key/value) are not distributed between all memcached daemons. It seems they act as a backup if the first daemon fails.

Am I right?

Thank you for your help.
Myxans

Anonymous said...

memcached_pass is part of the picture, but not all of it.

memcached_next_upstream is the option you are looking for, I think. In our environment, we use a valid upstream name in the memcached_pass directive, and define what to do if the data is not found.
Example config:

upstream memcache {
    server x.x.x.x:11211;
    server x.x.x.x:11211;
    server x.x.x.x:11211;
}

(normal config options)

memcached_next_upstream not_found;
memcached_pass www3.memcache;

With this config, if nginx gets a "not found" response from memcached, it will try the next server in the upstream block, and continue until the key is found or it runs out of upstreams, at which point we hand the request back to Apache, which handles the rest of the communication with the customer.

Adam said...

I could serve about 9000 requests a second from the static image file. Now that it is in memcached, I can only serve 2500 a second. It should work great for content, but I don't think it does for static files.

Colin said...

Thank you for this article, with a minor tweak this has worked really quite well. It certainly makes a difference on marginal hardware or network based storage. Tests are showing a real improvement in load times.

Anonymous said...

Levent, Great article.

I am setting up a nginx frontend to multiple apache backends to have a high available setup. I used your howto and it is working great.

Yet when testing things, I tried to use the PHP script to load images into memcached, but it's not returning anything from the cache. I am able to test memcached with simple scripts and it returns data, but not with yours. It runs fine, or seems to. Can you give me any insight into what it could be?
Michael

Anonymous said...

Hi!

Do you think using memcached for storing the nginx cache is somehow better than having the nginx cache dir on a tmpfs volume?

Ben Lancaster said...

nginx + fastcgi definitely isn't "slower" than apache + mod_php, or even apache + fastcgi.

Performance can be gauged in so many ways too, e.g. execution time, resource usage per request, concurrent requests and so on.

I use the Symfony framework. Every request that goes via ANY framework has a very large overhead (due to the sheer number of classes and files involved in the request), and in my experience, execution times are quicker (albeit marginally) with nginx + fastcgi when compared to Apache. The big benefit you get with nginx + fastcgi is concurrency. In my experience, the nginx+fastcgi combo is capable of handling so many more concurrent requests with a lower overhead than even the most tuned Apache server.

Sid Ahuja said...

What about Last-Modified header? In the example you have a hard-coded value for Last-Modified. How do we configure this to be set dynamically?

nginx sets the Last-Modified header automatically when it handles the request itself but when the request is passed to memcached, this header is not included in the HTTP response.

Wuvist said...

@Adam

How do you test?

"9000 requests a second from the static image file."
Is the 9000 requests for the same file?
If so, nginx should cache everything in memory, thus it's very fast.

However, the real world scenario is very different.

Try benchmarking a large number (more than 10K) of image paths; you should get a different result.