Eliminating 502 Proxy Errors

While I was working on an infrastructure refresh and consolidation project for one of my clients, we came across a legacy archive of public data totalling several hundred gigabytes. There are a few ways to handle data like this, each with its own advantages and disadvantages.

Decisions

  1. Move the data into the web container

    • Advantages
      • Local data is easy to manage with standard tools
    • Disadvantages
      • Bloated containers
      • The data must be synchronized to every container
      • Higher cost
  2. Move the data onto a central NFS server

    • Advantages
      • A single location to manage data
      • Local data is easy to manage with standard tools
    • Disadvantages
      • Additional resources to manage (HA NFS)
      • Higher cost
  3. Move the data into a public bucket on object storage

    • Advantages
      • No additional resources to deploy or manage
      • A single location to manage data
      • Lower cost
    • Disadvantages
      • Data stored in object storage is harder to manage.
        It requires additional software to present a quasi-POSIX filesystem, or a web interface.

This was a perfect use case for offloading the data to object storage in the cloud. Among my favourite object storage providers are the fine folks over at Backblaze. Their B2 service offers S3-compatible object storage at a fraction of the cost of Amazon S3, Google Cloud Storage, or Azure Blob Storage. Seriously, check out their pricing calculator.

The First Crack

I added the following directives to the Apache configuration, and it worked. Well, mostly!

   SSLProxyEngine on

   # archives now hosted on backblaze b2
   <Location /archive/content/>
     ProxyPreserveHost Off
     ProxyPass "https://f000.backblazeb2.com/file/<bucket-name>/archive/content/"
   </Location>

Let’s work through these directives.

  • SSLProxyEngine: B2 is only available over HTTPS, so we need to enable Apache’s SSL proxy engine for the outbound connection.
  • Location: You’re likely familiar with the <Directory> directive; <Location> is similar but matches request URLs instead of filesystem paths. Here it applies to any request we receive under /archive/content/.
  • ProxyPreserveHost: This directive determines whether the client’s original Host header (our own hostname) is passed on to the backend. This matters because B2’s certificates don’t list our hostname as a CN or SAN (Subject Alternative Name), so we’d get a certificate error with it enabled. Disable it, and Apache sends the backend’s hostname instead.
  • ProxyPass: This directive proxies any request under /archive/content/ to Backblaze, appending the rest of the path to the B2 URL, so /archive/content/image.jpg is fetched from .../archive/content/image.jpg in the bucket.
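To make the combined effect of <Location>, ProxyPass and ProxyPreserveHost concrete, here is a minimal Python sketch of the mapping described above. It is a simplified model, not Apache’s implementation: it ignores query strings, URL encoding and everything else mod_proxy handles, and `example-bucket` and the function name `proxy_request` are hypothetical stand-ins.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Simplified model of the <Location> + ProxyPass configuration above.
# "example-bucket" is a hypothetical stand-in for <bucket-name>.
LOCATION = "/archive/content/"
BACKEND = "https://f000.backblazeb2.com/file/example-bucket/archive/content/"


def proxy_request(path: str, client_host: str,
                  preserve_host: bool = False) -> Optional[Tuple[str, str]]:
    """Return (backend URL, Host header) for a request, or None when the
    request falls outside the <Location> block and is served locally."""
    if not path.startswith(LOCATION):
        return None
    # ProxyPass swaps the matched prefix for the backend URL and keeps the rest.
    url = BACKEND + path[len(LOCATION):]
    # ProxyPreserveHost On would forward the client's Host header (our own
    # hostname); Off sends the backend's hostname, which B2's certificate covers.
    host = client_host if preserve_host else urlsplit(url).netloc
    return url, host
```

For example, `proxy_request("/archive/content/image.jpg", "www.example.com")` (with `www.example.com` as a placeholder for our hostname) yields the B2 URL ending in `/archive/content/image.jpg` and a Host header of `f000.backblazeb2.com`.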

That Was Too Easy

It didn’t take long for Apache to start complaining about connection errors to Backblaze. Only a small percentage of requests failed, mostly under high concurrency, and each failure left messages like these in the Apache error log:

   [Sun Feb 09 22:11:20.008414 2020] [proxy:error] [pid 17464:tid 140027392677632] [client xx.xx.xx.xx:57900] AH00898: Error reading from remote server returned by /archive/content/image.jpg, referer: https://<redacted>.com/<redacted>/
   [Sun Feb 09 22:11:21.103086 2020] [proxy_http:error] [pid 27370:tid 140027375892224] (70014)End of file found: [client xx.xx.xx.xx:58898] AH01102: error reading status line from remote server f000.backblazeb2.com:443, referer: https://<redacted>.com/<redacted>/
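Because the failures were intermittent, it helps to know how often each one shows up. Below is a small Python sketch that tallies proxy errors by module and AH message ID. It assumes the default error log format shown above; `count_proxy_errors` is a hypothetical helper, not part of any Apache tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Matches "[proxy:error]" or "[proxy_http:error]" and the first AHxxxxx message ID after it.
PROXY_ERROR = re.compile(r"\[(?P<module>proxy\w*):error\].*?(?P<code>AH\d{5})")


def count_proxy_errors(lines: Iterable[str]) -> Counter:
    """Count proxy errors keyed by (module, AH message ID)."""
    counts: Counter = Counter()
    for line in lines:
        match = PROXY_ERROR.search(line)
        if match:
            counts[(match["module"], match["code"])] += 1
    return counts
```

Usage might look like `count_proxy_errors(open("/var/log/apache2/error.log")).most_common()`, where the log path is an assumption (it is the Debian/Ubuntu default and varies by distribution).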

The Second Crack

I suspected that Backblaze was closing connections once they had been idle for a while. When Apache then tried to reuse one of those connections, it logged the errors above and returned a 502 Proxy Error to the client. Let’s disable keepalives and downgrade the proxied requests from HTTP/1.1 to HTTP/1.0:

   SSLProxyEngine on

   # archives now hosted on backblaze b2
   <Location /archive/content/>
     ProxyPreserveHost Off
     ProxyPass "https://f000.backblazeb2.com/file/<bucket-name>/archive/content/"

     # disable keepalives / http 1.1 to prevent proxy 502 errors.
     SetEnv force-proxy-request-1.0 1
     SetEnv proxy-nokeepalive 1
   </Location>

I’ve added the two SetEnv lines, plus a comment, below ProxyPass.

  • force-proxy-request-1.0: This tells Apache to speak the less feature-rich HTTP/1.0 protocol to the backend, which disables HTTP/1.1 features such as connection reuse.
  • proxy-nokeepalive: This disables keepalives (multiple requests over the same connection), so Apache closes the backend connection after each request.
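The failure mode behind those log lines can be reproduced in miniature. The Python sketch below is a toy model, not Apache or B2: Backblaze’s real idle timeout is unknown here, so the hypothetical backend simply closes each connection right after responding, which makes the race deterministic. Reusing that connection yields an immediate end of file (APR’s error 70014 in the log above is "End of file found"), while opening a fresh connection per request, which is what the two SetEnv lines give us, does not.

```python
import socket
import threading
import time
from typing import Tuple

# Toy stand-in for the backend, NOT Backblaze: it answers one request per
# connection and then closes the socket, like a server dropping a connection
# it considers idle. The request and response bytes are illustrative only.
REQUEST = b"GET /archive/content/image.jpg HTTP/1.1\r\nHost: localhost\r\n\r\n"
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"


def start_backend() -> int:
    """Start the toy backend on a free localhost port and return the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()

    def serve() -> None:
        while True:
            conn, _addr = server.accept()
            with conn:  # leaving the block closes the connection
                conn.recv(4096)
                conn.sendall(RESPONSE)

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1]


def send_request(sock: socket.socket) -> bytes:
    """Send one request; return the response, or b"" if the peer had closed."""
    try:
        sock.sendall(REQUEST)
        data = b""
        while len(data) < len(RESPONSE):
            chunk = sock.recv(4096)
            if not chunk:  # end of file: the other side already hung up
                break
            data += chunk
        return data
    except (ConnectionResetError, BrokenPipeError):
        return b""


def reuse_one_connection(port: int) -> Tuple[bytes, bytes]:
    """Keepalive style: two requests over one connection with an idle gap."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        first = send_request(sock)
        time.sleep(0.2)  # idle gap; by now the backend has closed its end
        second = send_request(sock)
    return first, second


def fresh_connection_per_request(port: int) -> Tuple[bytes, bytes]:
    """No-keepalive style: every request opens its own connection."""
    responses = []
    for _ in range(2):
        with socket.create_connection(("127.0.0.1", port)) as sock:
            responses.append(send_request(sock))
    return responses[0], responses[1]
```

With `port = start_backend()`, `reuse_one_connection(port)` gets the full response for the first request but an empty result for the second, whereas `fresh_connection_per_request(port)` gets both responses. The trade-off is the same one Apache makes with these settings: a new TCP and TLS handshake per request in exchange for never reusing a connection the far end has already dropped.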

That’s it. No more connection errors.

Public object storage buckets are a viable way of keeping your web containers lean and your storage costs low, and of reducing the infrastructure needed for synchronized datasets.



Vince Hillier is the President and Founder of Revenni Inc. He is an open source advocate specializing in systems engineering and infrastructure. Beyond building solid architecture that doesn't break the bank, he's interested in information security, privacy, and performance.