This article was last updated 210 days ago, so the information in it may have developed or changed. If anything is out of date, please leave a message in the comment section.

Attack Review

When I woke up at 5 o'clock, I happened to glance at my email:

Oh shit, are you messing with me again? Last time it was 02:16 and this time 02:21, so is around 2:20 am someone's working hours? A quick check of the traffic report:

Take a look at the duration:

It was completely over by 2:21:37. The packet content was similar to last time, so I won't post it. In short, it was a flood of HTTP/2 GET requests against my blog homepage. This attack lasted 5 minutes (16 minutes last time), with a total of 6.49 million requests (10 million last time) and 23.7G of request bandwidth (35G last time). In other words, this attack took about 1/3 of the time to achieve about 2/3 of the effect of the last one. What does this mean? It means that my ability to absorb requests this time was roughly twice that of last time.

Why is this so? Because I changed the optimization method. Remember the reminder about the worker quota in the email during the last attack?
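As a quick sanity check of the "1/3 of the time, 2/3 of the effect, about twice the capacity" comparison, the figures quoted above can be run through a few lines of Python. This is only arithmetic on the numbers in the text; the dictionary layout and variable names are made up for illustration, and treating requests per minute as "response ability" is an interpretation.

```python
# Figures quoted in the text (the previous attack's numbers are the rounded ones given there).
attacks = {
    "previous": {"minutes": 16, "requests": 10_000_000, "volume_g": 35.0},
    "current": {"minutes": 5, "requests": 6_490_000, "volume_g": 23.7},
}


def per_minute(attack):
    """Return (requests per minute, traffic volume per minute) for one attack."""
    return attack["requests"] / attack["minutes"], attack["volume_g"] / attack["minutes"]


prev_rpm, prev_vpm = per_minute(attacks["previous"])
cur_rpm, cur_vpm = per_minute(attacks["current"])

time_ratio = attacks["current"]["minutes"] / attacks["previous"]["minutes"]
request_ratio = attacks["current"]["requests"] / attacks["previous"]["requests"]
volume_ratio = attacks["current"]["volume_g"] / attacks["previous"]["volume_g"]
request_rate_ratio = cur_rpm / prev_rpm
volume_rate_ratio = cur_vpm / prev_vpm

print(f"duration ratio:          {time_ratio:.2f}")
print(f"total request ratio:     {request_ratio:.2f}")
print(f"total volume ratio:      {volume_ratio:.2f}")
print(f"requests/minute ratio:   {request_rate_ratio:.2f}")
print(f"volume/minute ratio:     {volume_rate_ratio:.2f}")
```

The duration ratio comes out a little under one third and the totals at roughly two thirds, so the per-minute rates are a bit more than double, which matches the rough estimate above.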

There was no such reminder this time. After the last attack I felt that optimizing access with my self-built Workers KV setup was too fragile: although the quota of 100,000 requests a day is not used up under normal circumstances, it gets very miserable when some bored idiot launches this kind of unskilled DDoS attack. So I decided to try the official counterpart of my self-built setup: APO (Automatic Platform Optimization) for WordPress. Its implementation principle is similar to my self-built version. Both cache WordPress content as static HTML on the edge network and both rely on Workers KV, but APO seems to place no limit on the number of Worker invocations, and it integrates very well with the cache function.

However, although the response capacity doubled, I found a problem that I had missed last time. To borrow the famous line from Fist of the North Star: although I look alive, I am actually dead! Why do I say this? Because although the website could be opened, the content was this:

Normally it looks like this:

Why did the content change? Then I realized that my main WordPress site was dead and the load balancer had transferred the traffic directly to the backup WordPress site, whose data I had not yet synchronized. Then I looked at the uptime monitoring, and sure enough:

This is because I had changed the address of the uptime monitoring page to https://blog.tangwudi.com/me. This link was created on the main WordPress site a few days ago and did not exist on the backup site, so after the load balancer switched the traffic to the backup site, the monitoring showed the site as down even though it was running normally.

As for the small amount of green in the middle, my guess is that the main site was mostly dead but still had a bit of soul left. Occasionally the soul came back and could answer a few requests, so the load balancer occasionally sent a few requests to it; at those moments the monitoring page existed, so the monitor showed green. Then the main site died again immediately, the load balancer switched back to the backup site, and the monitoring page disappeared again~

The same thing can be seen in the Changting Lei Chi (Chaitin SafeLine WAF) statistics from that time:

The 61.1% share of 404 errors was caused by the main site being down before the traffic was switched to the backup site (there is a timeout wait before nginxWebUI switches traffic to the backup site). This can be indirectly confirmed by an email I received:

Why does this email support the above speculation? WordPress runs on PHP, and because of how PHP works, operations such as scheduled tasks are only executed when triggered by an access request; this includes WordPress automatic updates. While the main site is healthy, the backup site receives no requests, so its automatic update never runs even though it is enabled. At 2:21 the main site was killed (the attack traffic also ended at about that time. Did it stop because it detected that my site was no longer responding? That would be quite economical). After the timeout, the blog traffic was switched to the backup site, the first requests triggered the backup site's WordPress automatic update, and the upgrade was completed at 2:23. PS: my main site had already completed the same automatic version upgrade the day before yesterday.
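The "scheduled tasks only run when a request arrives" behaviour described above can be modelled with a toy scheduler in Python. This is a simplified illustration, not WordPress code; the class name, the task name and the time values are all hypothetical.

```python
class RequestTriggeredScheduler:
    """Toy model: due tasks are only executed as a side effect of handling a request."""

    def __init__(self):
        self.pending = []   # list of (due_time, task_name)
        self.executed = []  # list of (run_time, task_name)

    def schedule(self, due_time, task_name):
        self.pending.append((due_time, task_name))

    def handle_request(self, now):
        """Serve a request at time `now`, then run every task that is already due."""
        due = sorted(task for task in self.pending if task[0] <= now)
        self.pending = [task for task in self.pending if task[0] > now]
        for _, task_name in due:
            self.executed.append((now, task_name))


backup_site = RequestTriggeredScheduler()
backup_site.schedule(due_time=100, task_name="auto_update")

# While the main site is healthy, no request reaches the backup site,
# so the task stays pending long after it became due.
ran_before_failover = list(backup_site.executed)

# First request after failover (hypothetical time 500) finally triggers the update.
backup_site.handle_request(now=500)
ran_after_failover = list(backup_site.executed)

print(ran_before_failover)
print(ran_after_failover)
```

In this model the update runs at the time of the first request, not at its scheduled time, which is why an update email arriving right after the failover hints at when the backup site started receiving traffic.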

The direct evidence is that the main site basically could not be opened even via its intranet address, and it returned to normal after I restarted the main site's docker container.

Now that I think about it, I happened to have synchronized data to the backup site just before the last attack, so the primary and backup sites looked exactly the same and I didn't notice this problem. Later, because of my obsessive-compulsive tendencies, I restarted the primary site's docker container, which resurrected the dead primary site and let it take over the traffic again, so there was nothing left to notice.

Well, now that the analysis is clear, let’s summarize the advantages and disadvantages of the current blog service architecture based on the information at hand.
1. Advantages

- APO is really awesome. Both the response speed of normal access and the ability to withstand attacks have improved greatly.
- Cloudflare's anycast + unlimited cache architecture once again worked wonders against a DDoS attack. In the end, only 1.6% of the attack requests and 1.6% of the attack bandwidth reached the home data center.
- The nginx-based load balancing works normally and achieves the most basic failover.
- The active/standby redundant WordPress architecture in the same data center still works.
2. Disadvantages
- The current performance bottleneck is WordPress. The attack traffic that enters the home data center through the cloudflare tunnel puts little pressure on nginx (neither the Changting Lei Chi WAF nor the nginxWebUI load balancing is under any pressure), but the main WordPress site itself gets killed.
- The load balancing provided by nginxWebUI has no performance problem (it currently uses the stream-based layer-4 TCP proxy mode), but its functions are too simple. The default health check (which is not really a health check) only tests whether a connection can be established to the TCP port in use, which is a bit crude, and there are no statistical charts, which is very inconvenient. Of course, this is partly because I use all the default values, but I really have no motivation to study and tinker with this simplified load balancing function.
- The frequency of data synchronization between the main and backup WordPress sites in the home data center is a problem, but for now once a week is enough, and being one or two articles behind is acceptable.
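To illustrate why a port-level check is crude compared with a page-level one, here is a minimal Python sketch using only the standard library. It does not reflect how nginxWebUI or zevenet implement their checks; the function names `tcp_check` and `http_check` are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def tcp_check(host, port, timeout=2.0):
    """Layer-4 style check: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url, timeout=2.0):
    """Layer-7 style check: healthy only if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses (4xx/5xx), refused connections and timeouts.
        return False
```

A server that still accepts connections but no longer serves the expected page passes `tcp_check` and fails `http_check`. The same distinction explains the uptime monitor above: the backup site was reachable, but the monitored page did not exist on it.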

Process Optimization

The purpose of process optimization is to address points 1 and 2 of the shortcomings mentioned above.

First point:

The so-called WordPress performance bottleneck only appears under abnormal attack traffic. Under normal circumstances, with the APO cache in place, far fewer requests need to go back to the origin, and my small website has almost no traffic in normal times anyway. So I only need to solve the "abnormal traffic" problem during an attack. Where does the abnormality come from? I have already configured a site-wide rate limit per IP in cloudflare:

High-frequency access limits per IP are also configured on the Changting Lei Chi WAF:

Therefore, the main cause of the abnormality is not repeated access from a single IP but the number of concurrent connections. As long as the number of concurrent connections is limited, the number of requests that can reach WordPress at the same time has a fixed upper bound, so no matter how many abnormal requests arrive, it makes no difference to WordPress. The Changting Lei Chi Community Edition does not have this function, and I don't want to modify its underlying nginx. nginxWebUI could be changed, but I want to switch directly to a dedicated open source load balancer. So after thinking about it, I decided to deploy another nginx in front of the Changting Lei Chi WAF (my Baota panel is back again), dedicated to limiting the number of concurrent connections for WordPress:

Then this nginx forwards to the Changting Lei Chi WAF through the Baota panel's reverse proxy.
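The idea behind the concurrency cap can be sketched in Python. In nginx this kind of limit is typically configured with the `limit_conn_zone` and `limit_conn` directives, but the exact settings used here are not shown in the text, so the following is only a conceptual model. `ConcurrencyLimiter`, `FakeWordPress`, `simulate_burst` and the numbers 5 and 50 are all hypothetical.

```python
import threading
import time


class ConcurrencyLimiter:
    """Admit at most `limit` requests at a time and reject the rest at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, backend):
        # Non-blocking acquire: if every slot is busy, refuse instead of queueing.
        if not self._slots.acquire(blocking=False):
            return 503
        try:
            return backend()
        finally:
            self._slots.release()


class FakeWordPress:
    """Stand-in backend that records how many requests it serves at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # pretend to render a page
        with self._lock:
            self.active -= 1
        return 200


def simulate_burst(limit, clients):
    """Send `clients` requests at once through a limiter; return (peak, statuses)."""
    backend = FakeWordPress()
    limiter = ConcurrencyLimiter(limit)
    statuses = []
    statuses_lock = threading.Lock()

    def client():
        status = limiter.handle(backend)
        with statuses_lock:
            statuses.append(status)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return backend.peak, statuses


peak, statuses = simulate_burst(limit=5, clients=50)
print("peak concurrent requests at backend:", peak)
print("served:", statuses.count(200), "rejected:", statuses.count(503))
```

However many clients arrive, the backend never sees more than `limit` requests at the same time; the excess is turned away cheaply before it reaches the slow application.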

Second point:

Since nginxWebUI's functionality is really too simple, I officially brought in zevenet to replace it and completed the cutover. The steps are as follows:

Similarly, add the backup site; the only difference is that you need to set the Priority. The final result is as follows:

Next, on the Changting Lei Chi WAF, change the IP and port that originally pointed to the nginxWebUI load balancing so that they point to the VIP and port on zevenet:

Now I can finally get rid of the embarrassment of nginx load balancing showing nothing by default.
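The active/standby behaviour that the Priority setting is meant to produce can be pictured with a small Python sketch. This is not zevenet's implementation; the `Backend` class, the `pick_backend` function and the convention that a lower number means a more preferred backend are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    priority: int  # assumption for this sketch: lower number = more preferred
    healthy: bool = True


def pick_backend(backends):
    """Return the healthy backend with the best priority, or None if all are down."""
    healthy = [backend for backend in backends if backend.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda backend: backend.priority)


main_site = Backend("wordpress-main", priority=1)
backup_site = Backend("wordpress-backup", priority=2)
farm = [main_site, backup_site]

print(pick_backend(farm).name)   # main site serves while it is healthy

main_site.healthy = False
print(pick_backend(farm).name)   # traffic fails over to the backup site

main_site.healthy = True
print(pick_backend(farm).name)   # main site takes over again once it recovers
```

With this rule the backup site only receives traffic while the main site is marked unhealthy, which is the behaviour observed during the attack.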

Summary

After the above process optimization, the request route for accessing the blog inside the home data center through the cloudflare tunnel has become:
cloudflare tunnel -> Baota nginx -> Changting Lei Chi WAF -> zevenet load balancer -> main WordPress site.

The first problem was solved by using nginx on the Baota panel to control the overall number of concurrent connections entering the blog. The second was solved by replacing the previous nginxWebUI load balancing with the community edition of zevenet, a well-known open source load balancing platform, which brings better scalability and much more convenient operation and maintenance through its GUI. Although it has fewer functions than the commercial version, it is more than enough for a general environment.

The optimization of the blog's internal access path is now complete. Next, I look forward to the next attack to see whether it can kill my main site again (actually it doesn't even have to kill it; if traffic fails over to the backup site, I count that as a loss~).

Comments

obaby

Macintosh Chrome 118.0.0.0

9 months ago
2024-5-11 11:15:08

I've been hit by all kinds of attacks recently, and CF's defense has been relatively effective.
- tangwudi
  Owner
  obaby
  
  Macintosh Chrome 124.0.0.0
  
  9 months ago
  2024-5-11 11:26:06
  
  Yes, it mainly depends on the optimization method you use. CF can block most of them, but even the small part that gets through may kill the application, which is why I go to so much trouble to optimize the access path.
Autumn Wind on Weishui River

Windows Chrome 123.0.0.0

9 months ago
2024-5-11 10:14:14

Spurred on by DDoS attacks, learning to defend yourself is actually quite interesting. Just don't let being attacked affect your mentality. If the main site really gets destroyed, so be it. I don't make money from my personal blog. It's just a place to record my daily life. If it can help others, that's the best. If it's destroyed, then it's gone.
- tangwudi
  Owner
  Autumn Wind on Weishui River
  
  Macintosh Chrome 124.0.0.0
  
  9 months ago
  2024-5-11 10:19:55
  
  Yes, this one is impressive. The attack has been going on since around 1 a.m., and it was still going on when you posted this comment. I was writing the summary when this guy started again.
- tangwudi
  Owner
  Autumn Wind on Weishui River
  
  Macintosh Chrome 124.0.0.0
  
  9 months ago
  2024-5-11 10:22:57
  
  There's no real loss either way, but I would lose face. As a technician, how can I admit defeat?
