Intermittent 522 errors across multiple Cloudflare-fronted sites. A perfectly healthy-looking server. A wrong diagnosis that felt right. And an invisible IPv6 routing failure that was the real killer all along.
The Symptom
My web server was slow. Not a little slow — Cloudflare was returning 522 Connection Timed Out errors across multiple sites. Pages that did manage to load would then hang trying to fetch static assets like CSS and JavaScript. A fresh reboot didn't fix it. The problem came back within minutes.
The Misleading First Look
Every sysadmin's first instinct is to check the usual suspects:
| Resource | Status |
|----------|--------|
| CPU | 12 cores, 83% idle |
| RAM | 32GB total, 25GB free |
| Disk | 26% used, 0% I/O wait |
| Load Average | 0.07 |
| DNS | 38ms resolution |
Everything looked perfectly healthy. The server was practically asleep. So why were users getting timeout errors?
The Setup
The server runs a fairly common stack:
- Apache 2.4 (event MPM) serving ~250 virtual hosts
- PHP-FPM (multiple versions: 7.4, 8.2, 8.3, 8.5)
- MySQL 8.0, Redis, Memcached
- BT Panel (aaPanel) for server management
- Cloudflare in front of all sites
A quick curl to localhost confirmed the web server itself was fast: sub-millisecond responses over HTTP, about 5ms over HTTPS. The bottleneck was somewhere between Cloudflare's edge and the origin, not inside the server.
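That localhost check can be reproduced with curl's timing write-out variables. A minimal sketch, assuming the vhosts answer on the loopback interface; the hostname is a placeholder:

```shell
# Time a vhost directly against the origin, bypassing Cloudflare and DNS.
# --resolve pins the hostname to 127.0.0.1 so the real site config is exercised.
probe_origin() {
  local host="$1" scheme="${2:-http}"
  curl -sk -o /dev/null \
       --resolve "${host}:80:127.0.0.1" \
       --resolve "${host}:443:127.0.0.1" \
       -w 'connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
       "${scheme}://${host}/"
}
# Usage: probe_origin example.com https
```

If the totals here are in single-digit milliseconds while edge requests time out, the problem is on the network path, not in the web server.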
The Wrong Diagnosis: Dead Reverse Proxies
Digging into the Apache error logs, I found a flood of proxy errors and "long lost child came home" warnings — classic signs of thread starvation. Across ~250 sites, about 20 had reverse proxy rules pointing to backends that no longer existed — dead local ports and unreachable external servers.
Combined with a 600-second Apache timeout and KeepAlive disabled, these dead backends were consuming worker threads and starving the entire server. I cleaned them all up, reduced the timeout to 100 seconds, enabled KeepAlive, and felt confident I'd found the root cause.
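The cleanup amounts to a couple of directives. A sketch of the changed settings, assuming a stock Apache 2.4 layout; the file path is an assumption, the values come from the changes described above:

```apache
# /etc/apache2/conf-available/tuning.conf (path is an assumption)
Timeout 100     # was 600: dead backends could pin a worker for 10 minutes
KeepAlive On    # was Off: lets Cloudflare's edge reuse connections
```

With the event MPM, each stalled proxy connection still occupies a worker until `Timeout` expires, which is why a 600-second value plus ~20 dead backends can starve the pool.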
I was wrong.
The 522 errors kept coming back. The intermittent failures persisted. The cleanup helped Apache's overall health, but it wasn't why requests were timing out at the Cloudflare edge.
The Real Investigation
With the proxy red herring out of the way, I ran a systematic availability test across all four Cloudflare-fronted domains sharing this origin server (names redacted): `c.**`, `c.**`, `j*.ee`, and `c*u.com`.
Baseline: Something Is Very Wrong
A 60-second burst test with the OS choosing its preferred IP version:
| Domain | Total | OK | Fail | Success% | Avg (ms) |
|--------|-------|----|------|----------|----------|
| `c.**` | 9 | 4 | 5 | 44.4% | 70 |
| `c.**` | 13 | 8 | 5 | 61.5% | 59 |
| `j*.ee` | 11 | 6 | 5 | 54.5% | 201 |
| `c*u.com` | 8 | 2 | 6 | 25.0% | 53 |
Over half the requests were failing. Every failure was status 000: the TCP connection never established within curl's 10-second timeout, so no HTTP response came back at all. When connections did succeed, responses were fast (50–100ms). And the failures came in periodic bursts, not randomly.
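The burst test itself is easy to reproduce. A sketch, assuming a POSIX shell and curl; the 60-second window and 10-second timeout mirror the test described above, and the URL is a placeholder:

```shell
# Hammer one URL for N seconds, counting successes vs failures.
# curl reports http_code 000 when no HTTP response was ever received.
burst_test() {
  local url="$1" secs="${2:-60}" ok=0 fail=0
  local deadline=$(( $(date +%s) + secs ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    case "$code" in
      2??|3??) ok=$((ok + 1)) ;;
      *)       fail=$((fail + 1)) ;;  # includes 000 connection timeouts
    esac
  done
  echo "$url ok=$ok fail=$fail"
}
# Usage: burst_test https://example.com 60
```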
The Clue: Which IP Version?
I checked what IP version the server was actually using for outbound connections:
| Target | Outgoing IP | Version |
|--------|-------------|---------|
| 1.1.1.1 (forced v4) | `43.***.***.98` | IPv4 |
| [2606:4700::1111] | `2400:****:****:****::2020` | IPv6 |
| `c.**` (actual) | `2400:****:****:****::2020` | IPv6 ← OS preferred v6 |
The OS was routing all Cloudflare traffic over IPv6 by default.
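That check can be scripted with curl's `%{remote_ip}` write-out variable, which reports the peer address the OS actually connected to. A sketch; hostnames are placeholders:

```shell
# Report which address family the OS chose for a given host.
outbound_family() {
  local ip
  ip=$(curl -s -o /dev/null -m 10 -w '%{remote_ip}' "https://$1/") || ip="unreachable"
  case "$ip" in
    unreachable|"") echo "$1 -> no connection" ;;
    *:*)            echo "$1 -> $ip (IPv6)" ;;  # colon means an IPv6 literal
    *)              echo "$1 -> $ip (IPv4)" ;;
  esac
}
# Usage: outbound_family one.one.one.one
```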
IPv4 vs IPv6: Head-to-Head
I ran parallel tests forcing -4 and -6 on curl for each domain:
| Domain | IPv4 Rate | IPv4 OK/Total | IPv6 Rate | IPv6 OK/Total |
|--------|-----------|---------------|-----------|---------------|
| `c.**` | 55% | 6/11 | 71% | 12/17 |
| `c.**` | 40% | 4/10 | **0%** | **0/59** |
| `j*.ee` | 25% | 2/8 | 44% | 4/9 |
| `c*u.com` | 55% | 6/11 | 64% | 9/14 |
`c.**` was **100% broken on IPv6**: all 59 attempts failed. The other domains showed failures on both protocols, but IPv6 was clearly the unstable path. The IPv4 failures were likely collateral damage from the same network-level routing trouble on shared infrastructure.
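The forced-protocol comparison boils down to running the same probe twice with curl's `-4` and `-6` flags. A sketch; a real run would loop and tally success rates, and the domain is a placeholder:

```shell
# Probe one domain over IPv4 and IPv6 explicitly and print both results.
compare_families() {
  local flag code
  for flag in -4 -6; do
    code=$(curl -s -o /dev/null -m 10 "$flag" -w '%{http_code}' "https://$1/")
    echo "$1 $flag -> $code"
  done
}
# Usage: compare_families example.com   (000 means the connection never opened)
```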
The Smoking Gun: Disable IPv6
I disabled IPv6 at the kernel level:
```shell
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
```
Then ran the same test:
| Domain | Total | OK | Fail | Success% | Avg (ms) | P50 (ms) | P95 (ms) |
|--------|-------|----|------|----------|----------|----------|----------|
| `c.**` | 56 | 56 | 0 | **100%** | 55 | 52 | 62 |
| `c.**` | 56 | 56 | 0 | **100%** | 64 | 59 | 69 |
| `j*.ee` | 55 | 55 | 0 | 100% | 78 | 73 | 106 |
| `c*u.com` | 56 | 56 | 0 | 100% | 51 | 46 | 58 |
223 out of 223 requests successful. Zero timeouts. Zero failures.
Manual browsing confirmed it — every site loaded instantly and consistently.
The Root Cause
An intermittent IPv6 routing failure between the server (`2400:****:****:****::2020`) and Cloudflare's HKG (Hong Kong) edge.
The symptoms:

- TCP connections over IPv6 would periodically fail to establish
- Failures manifested as complete 10-second connection timeouts (status 000), not HTTP errors
- Failures occurred in periodic bursts, not randomly
- All Cloudflare-fronted domains were affected
- `c.**` was additionally 100% unreachable over IPv6 (likely missing or misconfigured AAAA records)
The server OS (Linux) preferred IPv6 when both A and AAAA records were available, causing the majority of outbound traffic to route over the broken IPv6 path.
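That preference comes from getaddrinfo's default address selection (RFC 6724), which ranks IPv6 above IPv4. The ordering can be inspected without opening a single connection; a sketch, with the hostname as a placeholder:

```shell
# Print the unique addresses getaddrinfo returns, in preference order.
# If an IPv6 address is listed first, outbound connections try IPv6 first.
addr_order() {
  getent ahosts "$1" | awk '{print $1}' | uniq
}
# Usage: addr_order example.com
```

A less drastic alternative to disabling IPv6 outright is lowering IPv6 precedence in /etc/gai.conf, though that only influences programs that resolve through getaddrinfo.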
Before vs After
| Domain | Before (IPv6 active) | After (IPv4 only) |
|--------|----------------------|--------------------|
| `c.**` | 44–85% | **100%** |
| `c.**` | 0–62% | **100%** |
| `j*.ee` | 25–55% | 100% |
| `c*u.com` | 25–62% | 100% |
Response times:

- Before: 50–310ms with 10,000ms timeout spikes
- After: 42–170ms, median 46–73ms, no spikes
The Fix
Immediate (applied at 19:00 UTC):
```shell
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
```
Permanent (applied at 19:10 UTC):
Added to `/etc/sysctl.conf`:
```
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
```
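After editing `/etc/sysctl.conf`, `sysctl -p` reloads it without a reboot. A small read-only check to confirm the kernel actually picked the setting up (safe to run anywhere; on non-Linux systems it just reports unknown):

```shell
# Read the live value straight from /proc; 1 means IPv6 is disabled.
verify_ipv6_disabled() {
  local v
  v=$(cat /proc/sys/net/ipv6/conf/all/disable_ipv6 2>/dev/null || echo unknown)
  echo "net.ipv6.conf.all.disable_ipv6=$v"
}
# Usage: verify_ipv6_disabled
```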
Lessons Learned
1. IPv6 failures are invisible. Nothing in CPU, RAM, disk, or load average will tell you that your IPv6 path is broken. The server looks perfectly healthy while half your requests silently die.
2. Check your IP version. When debugging connectivity, always verify which protocol the OS is actually using. A server with both A and AAAA records might be routing everything over a broken IPv6 path without you knowing.
3. Obvious problems can be red herrings. The dead reverse proxy backends were real, visible in logs, and satisfying to fix. But they weren't the actual cause. Don't stop investigating just because you found a problem — make sure it's the problem.
4. Test systematically. Parallel IPv4 vs IPv6 tests with forced protocol selection made the root cause undeniable. Without that controlled comparison, I might have kept chasing Apache configuration ghosts.
5. Report upstream. IPv6 routing issues on the hosting provider's network should be reported — the path to Cloudflare HKG was unreliable and may affect other customers.
What About the Dead Proxies?
The reverse proxy cleanup from my initial (wrong) diagnosis wasn't wasted effort. Dead backends pointing to unreachable external servers with a 600-second timeout were consuming Apache threads unnecessarily. Fixing that, reducing the timeout to 100 seconds, and enabling KeepAlive were all legitimate improvements to server health.
But those were maintenance issues, not the outage cause. The outage was IPv6.
Final Thought
The irony is that I wrote an entire detailed blog post about dead reverse proxies being the root cause, complete with diagnostic commands and architectural explanations. It was thorough, well-reasoned, and wrong. The real culprit was two sysctl lines away from being revealed.
Sometimes the hardest problems aren't about what you can see in the logs. They're about what the logs never show you.