I am in the process of migrating an online B2B enterprise built on design patterns from a 1990's dial-up ISP to a more modern infrastructure. It has taken over 20 years of effort to build a network of services which are difficult to manage to and insecure. Replacing this, with no downtime, is something of a challenge. A really important stepping stone to the target architecture is getting every exposed service routed via a proxy. This makes it much, MUCH simpler for us to:
- (re)route services within our network
- Upgrade services to HTTP2
- Provision certificates / manage encryption
- Configure browser-side caching
- Analyse traffic
- ...and more
Running a handful of BigIp F5s is unfortunately not an option, so my switchboard is a stack of Ubuntu + nginx.
Up till recently the onboarding exercise had gone really well - but then I started encountering
421 errors. My initial reading kept leading me to old bugs in Chrome and other issues with HTTP2. These are neatly summarized
by Kevin as follows:
This is caused by the following sequence of events:
- The server and client both support and use HTTP/2.
- The client requests a page at
foo.example.com
.
- During TLS negotiation, the server presents a certificate which is valid for both
foo.example.com
and bar.example.com
(and the client accepts it). This could be done with a wildcard certificate or a SAN certificate.
- The client reuses the connection to make a request for
bar.example.com
.
- The server is unable or unwilling to support cross-domain connection reuse (for example because you configured their SSL differently and Apache wants to force a TLS renegotiation), and serves HTTP 421.
- The client does not automatically retry with a new connection (see for example Chrome bug #546991, now fixed). The relevant RfC
says that the client MAY retry, not that it SHOULD or MUST. Failing to
retry is not particularly user-friendly, but might be desirable for a
debugging tool or HTTP library.
However, I was able to reproduce the bug in the first request from a browser instance I had just started. Also I was able to see the issue across sites using distinct certificates. I was confused.
Eventually I looked at the logs for the origin server (Apache 2.4.26). I hadn't considered that before as I knew it did not support HTTP2. But low and behold, there in the logs, a 421 error against my request.
[Fri Jun 12 18:44:47.945706 2020] [ssl:error] [pid 21423:tid 140096556701440] AH02032: Hostname foo.example.net provided via SNI and hostname bar.example.com provided via HTTP have no compatible SSL setupSo disabling SSL session re-use on the connection from the nginx proxy to the origin resolved the issue:
proxy_ssl_session_reuse off;
This does mean slightly more overhead between the proxy and the origin, but since both are on the same LAN, its not really noticeable.
Although its something of a gray area, I don't think this is a bug with nginx - I had multiple sites in nginx pointing at the same backend URL. It would be interesting to check if the issue occurs when I have multiple unique DNS names for the origin - one for each nginx front end - if it still occurs there, then that is probably a bug.