Nginx: Everything about proxy_pass

With the advent of microservices, ingress routing and routing between services have become an ever-increasing demand. I currently default to nginx for this - with no plausible reason or experience to back this decision, just because it seems to be the most used tool at the moment.

However, the often-needed proxy_pass directive has driven me crazy because of its - to me unintuitive - behavior. So I decided to take notes on how it works, what is possible with it, and how to circumvent some of its quirks.

First, a note on https

By default, proxy_pass does not verify the certificate of the endpoint if it is https (how can this be the default behavior, really?!). This can be useful internally, but usually you want to enable verification very explicitly. Especially if you proxy to publicly routed endpoints, which I have done in the past, make sure to set proxy_ssl_verify to on. You can also authenticate against the upstream server that you proxy_pass to using client certificates and more; have a look at the available options at https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/.
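
Here is a minimal sketch of what explicit verification can look like - the upstream host and the CA bundle path are placeholder assumptions you will need to adapt to your system:

location /secure/ {
    proxy_pass https://upstream.example.com/;
    # Verify the upstream's certificate chain (off by default!):
    proxy_ssl_verify on;
    # CA bundle used for verification; this path is common on Debian/Ubuntu
    # but depends on your system:
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    # Send the hostname via SNI so the upstream can present the right certificate:
    proxy_ssl_server_name on;
}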

A simple example

proxy_pass is usually used when there is an nginx instance that handles many things and delegates some of those requests to other servers. One example is an ingress in a Kubernetes cluster that spreads requests among the different microservices responsible for the specific locations. Or you can use nginx to directly deliver static files for a frontend, while some server-side rendered content or an API is delivered by a WebApp such as ASP.NET Core or flask.

Let's imagine we have a WebApp running on http://localhost:5000 and want it to be available on http://localhost:8080/webapp/, here's how we would do it in a minimal nginx.conf:

daemon off;
events {
}
http {
    server {
        listen 8080;
        # Requests to /webapp/... are forwarded to the upstream's /api/...
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}

You can save this to a file, e.g. nginx.conf, and run it with

nginx -c $(pwd)/nginx.conf

Now you can access http://localhost:8080/webapp/, and all requests will be forwarded to http://localhost:5000/api/. Note how the /webapp/ prefix is "cut away" by nginx. That's how locations work: they cut off the part specified in the location specification and pass the rest on to the "upstream" - which is what whatever server sits behind nginx is called.
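
To see the prefix replacement in action, send a quick test request (assuming some WebApp is actually listening on port 5000):

curl -i "http://localhost:8080/webapp/foo?bar=baz"
# the upstream receives this as: /api/foo?bar=baz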

To slash or not to slash

Except for when you use variables in the proxy_pass upstream definition, as we will see below, the location and the upstream definition are tied together very simply. That's why you need to be aware of the slashes, because strange things can happen when you don't get them right.

Here is a handy table that shows you how the request will be received by your WebApp, depending on how you write the location and proxy_pass declarations. Assume all requests go to http://localhost:8080:

location    proxy_pass                   request               received by upstream
--------    ----------                   -------               --------------------
/webapp/    http://localhost:5000/api/   /webapp/foo?bar=baz   /api/foo?bar=baz
/webapp/    http://localhost:5000/api    /webapp/foo?bar=baz   /apifoo?bar=baz
/webapp     http://localhost:5000/api/   /webapp/foo?bar=baz   /api//foo?bar=baz
/webapp     http://localhost:5000/api    /webapp/foo?bar=baz   /api/foo?bar=baz
/webapp     http://localhost:5000/api    /webappfoo?bar=baz    /apifoo?bar=baz

In other words: you usually want a trailing slash, you never want to mix with and without trailing slash, and you only want no trailing slash when you want to concatenate a certain path component (which I guess is quite rarely the case). Note how query parameters are preserved!

$uri and $request_uri

You have two ways to circumvent the cutting-off of the location: First, you can simply repeat the location in the proxy_pass definition, which is quite easy:

location /webapp/ {
    proxy_pass http://127.0.0.1:5000/api/webapp/;
}

That way, your upstream WebApp will receive /api/webapp/foo?bar=baz in the above examples.

Another way to repeat the location is to use $request_uri or $uri. The difference is that $request_uri preserves the query parameters, while $uri discards them:

location    proxy_pass                              request               received by upstream
--------    ----------                              -------               --------------------
/webapp/    http://localhost:5000/api$request_uri   /webapp/foo?bar=baz   /api/webapp/foo?bar=baz
/webapp/    http://localhost:5000/api$uri           /webapp/foo?bar=baz   /api/webapp/foo

Note how in the proxy_pass definition there is no slash between "api" and the variable. This is because both $uri and $request_uri always include a leading slash, which would lead to a double slash if you wrote "api/$request_uri".
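
Spelled out as a config block, the $request_uri variant from the table looks like this:

location /webapp/ {
    # $request_uri is the full original URI including the query string,
    # so the upstream receives /api/webapp/foo?bar=baz:
    proxy_pass http://127.0.0.1:5000/api$request_uri;
}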

Capture regexes

While this is not exclusive to proxy_pass, I find it generally handy to be able to use regexes to forward parts of a request to an upstream WebApp, or to reformat it. Example: your public URI should be http://localhost:8080/api/cart/items/123, and your upstream API handles it in the form http://localhost:5000/cart_api?items=123. In this case, or more complicated ones, you can use a regex to capture parts of the request URI and transform it into the desired format.

# $1 captures the resource name ("items"), $2 the id ("123"):
location ~ ^/api/cart/([a-z]*)/(.*)$ {
    proxy_pass http://127.0.0.1:5000/cart_api?$1=$2;
}

Use try_files with a WebApp as fallback

A use case I came across: I wanted nginx to handle all static files in a folder and, if a file is not available, forward the request to a backend. For example, this was the case for a Vue single-page application (SPA) that is delivered through flask - because the master HTML needs some server-side tuning - and I wanted nginx to handle the static files instead of flask. (This is recommended by the official gunicorn docs.)

You might have everything for your SPA except for your index.html available at /app/wwwroot/, and http://localhost:5000/ will deliver your server-tuned index.html.

Here's how you can do this:

location /spa/ {
    root /app/wwwroot/;
    # Serve the file if it exists, otherwise hand the request to the backend:
    try_files $uri @backend;
}
location @backend {
    proxy_pass http://127.0.0.1:5000;
}

Note that you cannot specify any URI part in the proxy_pass directive inside the named @backend location. Nginx will tell you:

nginx: [emerg] "proxy_pass" cannot have URI part in location given by regular expression,
or inside named location, or inside "if" statement, 
or inside "limit_except" block in /home/daniel/projects/nginx_blog/nginx.conf:28

That's why your backend should receive any request and return the index.html for it, or at least for the routes that are handled by the frontend's router.

Let nginx start even when not all upstream hosts are available

One reason that I used 127.0.0.1 instead of localhost so far is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx tries to resolve all hosts defined in proxy_pass directives on startup, and fails to start when they cannot be resolved. However, especially in microservice environments, it is very fragile to require all upstream services to be available at the time the ingress, load balancer or some intermediate router starts.
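
For illustration - the service name here is made up, and the exact error wording may vary between versions:

location /svc/ {
    # If "unavailable-service" cannot be resolved when nginx starts,
    # startup aborts with something like:
    #   nginx: [emerg] host not found in upstream "unavailable-service"
    proxy_pass http://unavailable-service:5000/;
}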

You can circumvent nginx's requirement for all hosts to be resolvable at startup by using variables inside the proxy_pass directives. HOWEVER, for some unfathomable reason, if you do so, you need a dedicated resolver directive to resolve these names. For Kubernetes, you can use kube-dns.kube-system here. For other environments, you can use your internal DNS, or for publicly routed upstream services you can even use a public DNS such as 1.1.1.1 or 8.8.8.8.
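
As a sketch of the Kubernetes variant - the service names below are assumptions for illustration:

location /svc/ {
    # Resolve upstream names at request time via the cluster DNS
    # (assuming kube-dns is reachable under this name):
    resolver kube-dns.kube-system.svc.cluster.local valid=10s;
    # The variable defers resolution, so nginx starts even while
    # some-service is unavailable:
    set $upstream http://some-service.default.svc.cluster.local:5000;
    proxy_pass $upstream;
}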

Additionally, using variables in proxy_pass completely changes how URIs are passed on to the upstream. When just changing

proxy_pass https://localhost:5000/api/;

to

set $upstream https://localhost:5000;
proxy_pass $upstream/api/;

... which you might think should result in exactly the same behavior, you might be surprised. The former will hit your upstream server with /api/foo?bar=baz for our example request to /webapp/foo?bar=baz. The latter, however, will hit your upstream server with /api/. No foo. No bar. And no baz. :-(

We need to fix this by putting the request together from two parts: First, the path after the location prefix, and second the query parameters. The first part can be captured using the regex we learned above, and the second (query parameters) can be forwarded using the built-in variables $is_args and $args. If we put it all together, we will end up with a config like this:

daemon off;
events {
}
http {
    server {
        access_log /dev/stdout;
        error_log /dev/stdout;
        listen 8080;
        # My home router in this case:
        resolver 192.168.178.1;
        location ~ ^/webapp/(.*)$ {
            # Use a variable so that localhost:5000 may be down while nginx starts:
            set $upstream http://localhost:5000;
            # Put together the upstream request path using the captured component
            # after the location path, and the query parameters:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}

While localhost is not a great example here, this works with your services' arbitrary DNS names, too. I find this very valuable in production, because an nginx refusing to start because of a probably very unimportant service can be quite a hassle while wrangling a production issue. However, it makes the location directive much more complex. From a simple location /webapp/ with a proxy_pass http://localhost:5000/api/, it has become this behemoth. I think it's worth it, though.

Better logging format for proxy_pass

To debug issues - or simply to have enough information at hand for future investigations - you can maximize the information about what is going on in the locations that use proxy_pass.

I found this handy log_format and enhanced it with a custom variable: if you set $upstream in all locations that use proxy_pass, you can use this log_format and have the often much-needed information in your log:

log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: '
                            '"$upstream": "$request" upstream_response_time $upstream_response_time '
                            'msec $msec request_time $request_time';

Here is a full example:

daemon off;
events {
}
http {
    log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: '
                                '"$upstream": "$request" upstream_response_time $upstream_response_time '
                                'msec $msec request_time $request_time';
    server {
        listen 8080;

        location /webapp/ {
            access_log /dev/stdout upstream_logging;
            set $upstream http://127.0.0.1:5000/api/;
            proxy_pass $upstream;
        }
    }
}

However, I have not found a way to log the actual URI that is forwarded to $upstream, which would be one of the most important things to know when debugging proxy_pass issues.

Conclusion

I hope that you have found helpful information in this article that you can put to good use in your development and production nginx configurations.

Reposted from: https://dev.to/danielkun/nginx-everything-about-proxypass-2ona

