Error 502 Bad Gateway
This error occurs when the server is overloaded. A lot of websites show a custom "oops" page or something similar in that situation. Our web server (Nginx) passes page requests to workers (in our case, PHP), and if all of them are busy, the result is a 502.
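As a rough illustration of that mechanism, here is a toy Python model (not Nginx or PHP code). It has a fixed number of workers and a bounded backlog of waiting requests. A request that arrives when every worker is busy and the backlog is full is rejected, which stands in for a 502. The function name, the tick-based timing and all of the numbers are hypothetical. In a PHP-FPM setup the backlog would presumably correspond to the pool's listen.backlog setting.

```python
from collections import deque


def simulate(arrivals_per_tick, workers, backlog, service_ticks):
    """Toy model of a fixed worker pool behind a bounded backlog queue.

    Returns (started, rejected, still_queued). A rejected request
    stands in for a 502; this is an illustration, not Nginx/PHP code.
    """
    busy = []        # remaining ticks of work for each busy worker
    queue = deque()  # requests waiting for a free worker
    started = rejected = 0

    for arrivals in arrivals_per_tick:
        # One tick of work passes; workers that finish become free.
        busy = [t - 1 for t in busy if t > 1]
        # Freed workers first drain requests already waiting in the backlog.
        while queue and len(busy) < workers:
            queue.popleft()
            busy.append(service_ticks)
            started += 1
        # New arrivals go to a free worker, else wait in the backlog,
        # else are rejected because the backlog is full.
        for _ in range(arrivals):
            if len(busy) < workers:
                busy.append(service_ticks)
                started += 1
            elif len(queue) < backlog:
                queue.append(None)
            else:
                rejected += 1

    return started, rejected, len(queue)
```

For example, with 4 workers that each need 2 ticks per request and 10 requests arriving every tick for 5 ticks, simulate([10] * 5, 4, 0, 2) starts 12 requests and rejects 38, while simulate([10] * 5, 4, 5000, 2) starts the same 12 and leaves 38 waiting. In this model a deeper backlog turns errors into waiting; it does not add capacity.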
Our old web server (lighttpd) handled this situation differently: the browser appeared to keep loading the page until it eventually timed out, as if the site weren't up at all.
Although I haven't been working on the site over the break, I'm aware of the 502s and was addressing them even before the break. I have audited the site's access logs, found a few things and made the following changes:
- I am now blocking a few IPs that were hammering the site.
- Googlebot hits the site at about 4 requests/second. The site can handle that, but the crawl rate is higher than what Google documents, so I sent them a message.
- We get a ton of requests from Guildwork plugin users. I isolated Guildwork requests to a separate worker pool, in the hope that they won't interfere with the site's main worker pool.
- Each time an item tooltip is loaded, it makes a request. I configured Nginx to cache these so they don't hit PHP at all (that was a big one).
- The worker pools had a default backlog setting of 128 requests; I raised it to 5000.
- The shout page is very popular, and it appears some third-party program is accessing shout data. That program should be blocked now. If someone wants shout data for third-party use, ask Guildwork.com. I'm considering making that page available to authenticated users only.
- I also started using the DB slave for achievement point calculations.
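The tooltip caching happens in Nginx, so repeat requests never reach PHP. The idea can be sketched in Python as a small time-to-live cache in front of a backend function. This is a conceptual model rather than how Nginx's cache works internally, and the TTLCache class, the render_tooltip handler and the 300-second TTL are all hypothetical.

```python
import time


class TTLCache:
    """Minimal response cache: serve a stored body until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, body)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: the backend is not called
        body = fetch(key)  # cache miss or expired entry: call the backend
        self._store[key] = (now + self.ttl, body)
        return body


backend_calls = []


def render_tooltip(item_id):
    # Hypothetical stand-in for the PHP handler that renders a tooltip.
    backend_calls.append(item_id)
    return f"<tooltip for item {item_id}>"


cache = TTLCache(ttl_seconds=300)
for _ in range(1000):
    cache.get_or_fetch(123, render_tooltip)
print(len(backend_calls))  # prints 1
```

A thousand tooltip loads for the same item cost one backend call per TTL window. The trade-off is that a cached tooltip can be up to one TTL out of date.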
According to the logs, throughout December 502s occurred at 5 to 20 minute intervals. Since my changes, none have occurred, which is about 24 hours as of this post. Traffic to the website increased in December due to a number of factors. The catch-22 of making the site faster is that request volume can increase, and you can find yourself back at the 502s. Increasing volume is a good problem to have, though. The site has plenty of hardware power for now, so I'm focusing on optimizations. I'm not going to lie: the site could use plenty of optimizing.
This post is in response to a few people who are concerned about it, and some who think I'm not doing anything about it. This site isn't WordPress, vBulletin or another CMS that I can just put a prepackaged caching solution or plugin in front of. I aim to make Guildwork data, AH transactions and stock available as soon as they come in. We do a lot of data collection and processing on the backend: over half of our database queries are inserts, which is significant considering how many reads we get. When a 502 occurs, I will have the server notify me, and I will investigate.