The Bloodbath Caused by a "Steamed Bun": How a Tailscale Upgrade Disrupted Blog Access for Several Days

This article was last updated 196 days ago. The information in it may have developed or changed. If it is invalid, please leave a message in the comment section.

Article Summary

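After a Tailscale upgrade, the fixed private IP addresses of the Mac devices changed. The disaster recovery site's probe script therefore misjudged the home data center as down, and the Tencent Cloud disaster recovery site kept taking over the blog service. Because the home data center and Tencent Cloud have different network structures, Cloudflare Tunnel's multi-origin load balancing misbehaved, causing unstable access and 502 errors. The problem was confirmed after group members pointed it out, and it was finally resolved by adjusting the policy. The incident confirms the rule that multiple changes should not be made in a single cutover.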

Qwen3-14B · 2026-06-18

A few days ago, I noticed that the Tailscale versions running on my devices were all over the place. Being a bit obsessive about such things, I upgraded every device running Tailscale to version 1.78.1, except the QNAP NAS and OpenWrt, whose latest official installation packages are still at 1.58.

Unexpectedly, among all the upgraded devices (macOS, iPhone, iPad, and every Linux host), it was macOS that ran into trouble. A normal upgrade only changes the Tailscale version and leaves the device's fixed private IP address (the one starting with 100) untouched. This time, however, all three macOS devices, including the base-model M1 Mac mini that runs the main blog site, were assigned new fixed private IP addresses. As a result, the "Machines" page of the Tailscale admin console showed two records for each macOS device: the old one, marked offline, and the new one, marked online.
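A change like this can be caught by comparing a snapshot of the tailnet taken before the upgrade with one taken after it. The following is only a sketch: it assumes the snapshots come from `tailscale status --json` and that the output contains `Self`, `Peer`, `HostName`, `TailscaleIPs`, and `Online` fields as I understand them, so check the field names against your own version. The hostnames and addresses are made up for illustration.

```python
from typing import Dict, Tuple


def ipv4_by_host(status: dict) -> Dict[str, str]:
    """Map each online node's hostname to its Tailscale IPv4 address.

    `status` is assumed to be the parsed output of `tailscale status --json`.
    """
    nodes = [status.get("Self") or {}]
    nodes += list((status.get("Peer") or {}).values())
    result: Dict[str, str] = {}
    for node in nodes:
        name = node.get("HostName")
        # Keep the first address without a colon, i.e. the IPv4 one.
        ipv4 = next((ip for ip in node.get("TailscaleIPs") or [] if ":" not in ip), None)
        # Skip stale offline records such as the leftover pre-upgrade entries.
        if name and ipv4 and node.get("Online", True):
            result[name] = ipv4
    return result


def changed_hosts(before: dict, after: dict) -> Dict[str, Tuple[str, str]]:
    """Return {hostname: (old_ip, new_ip)} for hosts whose address changed."""
    old, new = ipv4_by_host(before), ipv4_by_host(after)
    return {h: (old[h], new[h]) for h in old if h in new and old[h] != new[h]}


# Hypothetical snapshots: the Mac mini gets a new record with a new address.
before = {
    "Self": {"HostName": "tencent-dr", "TailscaleIPs": ["100.64.0.1"], "Online": True},
    "Peer": {"k1": {"HostName": "macmini", "TailscaleIPs": ["100.64.0.10"], "Online": True}},
}
after = {
    "Self": {"HostName": "tencent-dr", "TailscaleIPs": ["100.64.0.1"], "Online": True},
    "Peer": {
        "k1": {"HostName": "macmini", "TailscaleIPs": ["100.64.0.10"], "Online": False},
        "k2": {"HostName": "macmini", "TailscaleIPs": ["100.64.0.99"], "Online": True},
    },
}
print(changed_hosts(before, after))
```

Running a check like this right after an upgrade would flag any address that other scripts still depend on.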

I simply deleted the old records without thinking much about it, and forgot the most critical point: the disaster recovery site on my Tencent Cloud host uses the reachability of the Mac mini's previous fixed private IP address as the key indicator of whether the home data center is down (for details, see the article "Home data center series: using Cloudflare Tunnel to let a disaster recovery site automatically take over when the WordPress main site fails").

This had a very serious consequence. Once I upgraded Tailscale on the Mac mini, its original fixed private IP address became invalid, so the probe script on the Tencent Cloud host always concluded that the home data center was down and kept the disaster recovery site enabled. In other words, throughout this period the Cloudflare tunnel serving the blog domain had two connectors, whereas in the normal state it should have only one.
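The failure mode is easy to see in a stripped-down version of such a probe. The sketch below is not my actual script: the function names, the TCP check, the three-attempt threshold, and the address `100.64.0.10` are all assumptions for illustration. It declares the primary down when every probe fails, which is exactly what happens forever once the probed address no longer exists.

```python
import socket
from typing import Callable


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def primary_is_down(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Treat the primary as down only if all `attempts` probes fail."""
    return not any(probe() for _ in range(attempts))


def decide_action(down: bool, dr_active: bool) -> str:
    """Decide what the disaster recovery host should do with its connector."""
    if down and not dr_active:
        return "start_dr_connector"
    if not down and dr_active:
        return "stop_dr_connector"
    return "no_change"


def stale_probe() -> bool:
    """Stands in for probing an address that no longer exists: always fails.

    A real probe could be: lambda: tcp_reachable("100.64.0.10", 80)
    """
    return False


print(decide_action(primary_is_down(stale_probe), dr_active=False))
```

With a stale address the probe can never succeed, so the disaster recovery connector is started and never stopped, even though the home data center is perfectly healthy.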

Of course, a single Cloudflare tunnel can support multiple connectors. This is the multi-origin load balancing that comes built into Cloudflare Tunnel (for details, see the article "Cloudflare tutorial series for home data centers (Part 9): Introduction to common Zero Trust functions and multi-scenario usage tutorials"). There is one big premise, though: the network structure behind every origin must be exactly the same. Pointing the tunnel at localhost generally works without problems, and it is also the most common way to use Cloudflare Tunnel for load balancing across multiple origins.
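The reason localhost is safe is that every connector resolves it to its own machine, while a LAN address only means something inside one site. A rough way to express that check is sketched below. The helper name and the example service URLs are hypothetical, and this only tests whether a URL points at the loopback interface, not whether the service behind it is actually identical on every origin.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def is_loopback_service(service_url: str) -> bool:
    """Return True if the service URL targets the connector's own machine."""
    host = urlsplit(service_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        # A DNS name other than localhost: it may resolve differently per site.
        return False


# Hypothetical service targets for a tunnel's public hostname.
for url in ("http://localhost:8080", "http://192.168.1.20:8080"):
    print(url, is_loopback_service(url))
```

A target that fails this check, such as a private LAN address, only works for connectors that sit in the same network as that address.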

However, my home data center has a complex structure: the host running the Cloudflare tunnel, the host running the internal WAF, and the host running the main blog site are all separate machines. Its network layout is also necessarily different from that of the Tencent Cloud lightweight server. The disaster recovery site on the Tencent Cloud host therefore needs some technical "special handling" to be able to take over the blog service when the home data center goes down (it can only take over the blog service and nothing else, because the performance of the lightweight cloud host is limited).

I'm too lazy to go into the specific technical details, but the end result was that accessing the blog became a matter of luck. Some requests were routed to the main blog site in the home data center, while others were routed to the disaster recovery site on Tencent Cloud. On top of that, because of this "special handling", Cloudflare intermittently failed to reach the origin server and frequently returned a 502 error saying that the origin was inaccessible.

Moreover, the error appeared quite randomly. For the same article, some people hit it while others could read the page normally, which made it look more like a "paranormal problem".

In fact, I had also felt that blog access was a bit off these past few days. But I had recently adjusted many intranet policies (including the multi-line DNS function of iKuai) and tinkered with a few things (using two Apple TVs, one as a backup proxy egress and the other as a unified Tailscale exit node), so I assumed my own tinkering was to blame. In addition, I could access the blog normally over cellular data (in hindsight, cellular access simply happened to behave better), so I didn't pay much attention to it.

At this point I would like to thank two friends in the group, jdejdndns and RadiantHope, for their reminders.

Without their reminders, I would not have been able to confirm that there really was a problem with blog access. Thank you both!

This experience also fully confirms a rule of thumb for cutovers: do not make multiple changes in one cutover, otherwise when a problem occurs you will not know where to start looking.
