I've been building a push system recently. To increase the scalability of the system, the best practice is to make each connection as stateless as possible, so that when a bottleneck appears, the capacity of the whole system can easily be expanded by adding more machines. Speaking of load balancing and reverse proxying, Nginx is probably the most famous and widely acknowledged option. However, TCP proxying is a rather recent addition: Nginx introduced TCP load balancing and reverse proxying in v1.9, released in late May this year, and it still lacks many features. HAProxy, on the other hand, as the pioneer of TCP load balancing, is mature and stable. I chose HAProxy to build the system and eventually reached 300k concurrent TCP socket connections. I could have achieved a higher number if it were not for my rather outdated client PC.
Step 1. Tuning the Linux system
300k concurrent connections is not an easy job even for a high-end server. To begin with, we need to tune the Linux kernel configuration to make the most of our server.
File Descriptors
Since sockets are treated as files from the system's perspective, the default file descriptor limit is rather small for our 300k target. Modify /etc/sysctl.conf to add the following lines:
fs.file-max = 10000000
fs.nr_open = 10000000
These lines increase the system-wide file descriptor limit to 10 million.
Next, modify /etc/security/limits.conf to add the following lines:
* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000
If you run HAProxy as a non-root user, the first two lines should do the job. However, if you run HAProxy as root, you need to declare the limits for the root user explicitly.
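To double-check that the new limits are actually in effect (a quick sanity check, assuming a typical Linux box where sysctl and ulimit behave as usual), you can run:

sysctl -p                      # reload /etc/sysctl.conf
cat /proc/sys/fs/file-max      # should print 10000000
cat /proc/sys/fs/nr_open       # should print 10000000
ulimit -n                      # per-process limit for the current session

Note that the limits.conf values only apply to new login sessions, so log in again (or restart the HAProxy service) before trusting the ulimit output.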
TCP Buffer
Holding such a huge number of connections costs a lot of memory. To reduce memory use, modify /etc/sysctl.conf to add the following lines.
net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
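For reference: the three numbers in tcp_rmem and tcp_wmem are the minimum, default, and maximum buffer size per socket in bytes, while tcp_mem counts memory pages for the whole TCP stack. As a rough back-of-the-envelope estimate (assuming every connection stays near the 4 KB default set above), 300,000 connections × (4 KB read buffer + 4 KB write buffer) is roughly 2.4 GB of kernel memory just for socket buffers, which is why lowering the per-socket default matters here.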
Step 2. Tuning HAProxy
After tuning the Linux kernel, we need to tune HAProxy to better fit our requirements.
Increase Max Connections
In HAProxy, there is a "max connection cap" both globally and per backend. To raise the global cap, we need to add a line of configuration under the global scope.
maxconn 2000000
Then we add the same line to our backend scope, which makes our backend look like this:
backend pushserver
mode tcp
balance roundrobin
maxconn 2000000
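For completeness, here is a minimal sketch of how the surrounding sections could look; the frontend name and bind port are hypothetical, not the author's full production config. Depending on the HAProxy version, a frontend that does not set its own maxconn may be capped at a much lower default (historically 2000), so it is worth setting it explicitly:

global
    maxconn 2000000

frontend push_in
    mode tcp
    bind *:1883
    maxconn 2000000
    default_backend pushserver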
Tuning Timeout
By default, HAProxy detects dead connections and closes inactive ones. However, the default idle thresholds are too low for a scenario where connections have to be kept open in a long-polling fashion. On my client side, the long-lived socket connection to the push server was always closed by HAProxy, because the heartbeat interval in my client implementation is 4 minutes, and a more frequent heartbeat would be a heavy burden for both the client (actually an Android device) and the server. To raise these limits, add the following lines to your backend. The numbers are in milliseconds by default.
timeout connect 5000
timeout client 50000
timeout server 50000
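Whichever values you pick, timeout client and timeout server must comfortably exceed your clients' heartbeat interval (4 minutes in my case, so well above 240000 ms), otherwise HAProxy will still reap the idle connections. A hedged alternative worth testing is HAProxy's timeout tunnel directive, which takes over from the client/server timeouts once data has flowed in both directions on a long-lived TCP session, for example:

timeout tunnel 1h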
Configuring Source IP to solve port exhaustion
When you reach around 30k simultaneous connections, you will encounter the problem of "port exhaustion". It results from the fact that each reverse-proxied connection occupies an ephemeral port of a local IP. The default port range available for outgoing connections is roughly 30k to 60k; in other words, we only have about 30k ports available per IP. This is not enough. We can increase this range by modifying /etc/sysctl.conf to add the following line.
net.ipv4.ip_local_port_range = 1000 65535
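To see whether the new range took effect and how close you are to exhausting it (a quick check, assuming the standard procps and iproute2 tools), you can run:

sysctl net.ipv4.ip_local_port_range   # should print 1000 65535
ss -s                                 # summary of sockets currently in use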
But this does not solve the root problem: we will still run out of ports once that roughly 64k cap is reached.
The ultimate solution to this port exhaustion issue is to increase the number of available IPs. First of all, we bind a new IP to a new virtual network interface.
ifconfig eth0:1 192.168.8.1
This command binds an intranet address to a virtual network interface eth0:1 whose hardware interface is eth0. The command can be executed several times to add an arbitrary number of virtual network interfaces. Just remember that the IPs should be in the same subnet as your real application server; in other words, there cannot be any kind of NAT in the link between HAProxy and the application server, otherwise this will not work.
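On newer distributions where ifconfig is no longer installed by default, the rough equivalent using iproute2 is the following (the /24 prefix is an assumption; use whatever matches your subnet):

ip addr add 192.168.8.1/24 dev eth0 label eth0:1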
Next, we need to configure HAProxy to use these fresh IPs. There is a source keyword that can be used either in the backend scope or as an argument of a server line. In our experiment the backend-scope variant did not seem to work, so we chose the argument form. This is how the HAProxy config file looks:
backend mqtt
mode tcp
balance roundrobin
maxconn 2000000
server app1 127.0.0.1:1883 source 192.168.8.1
server app2 127.0.0.1:1883 source 192.168.8.2
server app3 127.0.0.1:1883 source 192.168.8.3
server app4 127.0.0.1:1884 source 192.168.8.4
server app5 127.0.0.1:1884 source 192.168.8.5
server app6 127.0.0.1:1884 source 192.168.8.6
Here is the trick: you need to declare the servers as multiple entries and give them different names. If you set the same name for all the entries, HAProxy will simply not work. If you look at the HAProxy status report, you will see that even though these entries have the same backend address, HAProxy still treats them as different servers.
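To see why six entries are enough: each server line gets its own (source IP, destination IP:port) tuple, and with the port range configured above each tuple can carry roughly 64,000 proxied connections, so six lines give a theoretical ceiling of about 380,000 connections, comfortably above the 300k target.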
That’s all for the configuration! Now your HAProxy should be able to handle over 300k concurrent TCP connections, just as mine.
I'm not sure about the IP source exhaustion solution:
the "net.ipv4.ip_local_port_range = 1000 65535" tweak makes sense.
This will allow ~60,000 conns targeting a single backend server (having its own IP in a real-world scenario).
The next 60,000 conns can target the next backend server (having a different IP than the first backend, and so on).
Adding additional IPs to the local network interface is only required when targeting a single backend.
Yeah, it's just as you said.
Our backend server can handle over 60,000 connections; that's why we have to do this to maximize the capacity of the backend server.
On OpenWrt I keep adjusting ulimit, but after running for a while it still ends up in a half-dead state. Despairing.
Hi there,
Thanks for a great tutorial. I studied it twice trying to fix an issue we are having with our Chrome Ext. and a PHP Ratchet backend server. The problem is that there is a limit in HAProxy or PHP itself (or Debian?) that caps the number of concurrent connections.
We had a PHP WebSocket server running on port 8080 with a limit of around 1000 (1024?) concurrent connections, so we implemented HAProxy and it now load-balances traffic from 8080 to 8081, 8082, 8083 and so on (multiple instances of the WebSocket server on different ports to handle more clients) … unfortunately, after hours of digging around (a few of the things from your tutorial were already implemented) and configuration changes, 2000 (2048?) is the highest number we can reach!
Do you have any idea what might be wrong? Would you have time to have a look at our setup and infrastructure?
Thanks!
Is your PHP side able to handle 2000 connections?
Note that if you're connecting to 127.0.0.1, you don't need to bind to a "public" address, just use 127.X.Y.Z, they're all yours!
Correct!
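To illustrate what the commenter means (the addresses below are just an example): the entire 127.0.0.0/8 block is routed to the loopback interface on Linux, so when the application server itself listens on 127.0.0.1, the demo backend could use loopback source addresses without creating any virtual interfaces at all:

server app1 127.0.0.1:1883 source 127.0.0.2
server app2 127.0.0.1:1883 source 127.0.0.3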
I don't understand the significance of this comment:
"Note that if you're connecting to 127.0.0.1, you don't need to bind to a "public" address, just use 127.X.Y.Z, they're all yours!"
Can you explain in more detail?
Thanks for sharing!
hi
tnx for your Article.
I saw you use the loopback IP address (127.0.0.1) on the backend.
Do the haproxy service and your app run on the same server?
This is just a demo config. In this demo, yes.
Hi
I use haproxy-1.5.14-3.el7.x86_64 on CentOS 7.2 with kernel 3.10.0-327.18.2.el7.x86_64
I set two IPs on the haproxy server, for example eth0 = 10.10.10.1 and virtual interface eth0:1 = 10.10.10.2, and use one backend server with IP 10.10.10.11
I use "source" in the haproxy configuration file to send requests from the two IP addresses (eth0 = 10.10.10.1 and eth0:1 = 10.10.10.2) to the backend side. Please see this config:
backend test
mode tcp
log global
option tcplog
option tcp-check
balance roundrobin
server myapp-A 10.10.10.11:9999 check source 10.10.10.1
server myapp-B 10.10.10.11:9999 check source 10.10.10.2
With this scenario, I get 120k connections on the backend side (10.10.10.11) and everything is ok.
To get more connections, I added another backend server, for example 10.10.10.12. Please see this config:
backend test
mode tcp
log global
option tcplog
option tcp-check
balance roundrobin
server myapp-A 10.10.10.11:9999 check source 10.10.10.1
server myapp-B 10.10.10.11:9999 check source 10.10.10.2
server myapp-C 10.10.10.12:9999 check source 10.10.10.1
server myapp-D 10.10.10.12:9999 check source 10.10.10.2
In this scenario I expected to get 120k on each backend server, but no! Each backend server only gets 60k connections!
What was wrong?
Can you help me?
Tnx
Looks like the proxy has exhausted its ports. You need more IPs for each proxy.
backend mqtt
mode tcp
balance roundrobin
maxconn 2000000
server app1 127.0.0.1:1883 source 192.168.8.1
server app2 127.0.0.1:1883 source 192.168.8.2
server app3 127.0.0.1:1883 source 192.168.8.3
server app4 127.0.0.1:1884 source 192.168.8.4
server app5 127.0.0.1:1884 source 192.168.8.5
server app6 127.0.0.1:1884 source 192.168.8.6
In the above configuration, does it mean that we will have two MQTT nodes running on ports 1883 and 1884?
Yes. The server should be able to handle requests on both ports.
Setting the hard and soft limits to 10 million like you posted will result in a broken system – this is too much even for our Dell R630's that are running CentOS 6.7 (128GB memory)!
1 million is the maximum that you can set these to – I think you have a typo.
Not quite sure about CentOS. Was using Debian and able to reach the number.
You need to raise the file descriptor limits to be able to set more than 1 million. I solved that the other day and it is hard to google. Look at sysctl fs.nr_open, which is set to 1 million by default, and fs.file-max. Then you will be able to set ulimit above 1 million.
Petr
Hello,
We have two Redis web servers behind haproxy, but I need all traffic to go to Redis-web1 only, and haproxy should divert traffic to Redis-web2 only when Redis-web1 is down.
Is this possible? Please suggest.
Thanks
Sushil R
What happens if one uses haproxy to proxy traffic to remote servers?
Will the virtual network interface still work? I noticed you using localhost, which means the apps will be running locally where haproxy is, but for cases where the apps are running on another server, is this still possible?
If it is possible, does it mean I will have to create the virtual interfaces on the remote servers? I am guessing that will not be possible, right?
Please let me know if you understand my question.
Thanks!!!
It's definitely doable; creating the virtual interfaces will just be more complicated. Meanwhile, your remote server should be configured to accept multiple connections from the same host.
Hi,
What's the significance of having the server listen on two different port numbers in this setup? The server won't have any port exhaustion issues, because it is not initiating outbound connections the way haproxy is.
regards,
Just a demo of load balancing. There's no actual use for it if you only have one server.
Is it real production level? Can you give detailed specifications, e.g. RAM, processor, CPU?
I kind of forgot. It's just a normal server configuration, like 16 physical cores with 64GB RAM IIRC.
Pingback: How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections | Cong Nghe Thong Tin - Quang Tri He Thong
These lines increase the system-wide file descriptor limit to 10 million.
Next, modify /etc/security/limits.conf to add the following lines:
* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000
The above setting is harmful; it will prevent you from logging into your server. Apply it with caution.
I am using haproxy to load-balance my MQTT broker cluster. Each MQTT broker can handle up to 100,000 connections easily. But the problem I am facing with haproxy is that it only handles up to 30k connections per node. Whenever any node gets near 32k connections, the haproxy CPU suddenly spikes to 100% and all connections start dropping.
The problem with this is that for every 30k connections, I have to roll out another MQTT broker. How can I increase it to at least 60k connections per MQTT broker node?
Note: I cannot add virtual network interfaces in a DigitalOcean VPC.
My config –
bind 0.0.0.0:1883
maxconn 1000000
mode tcp
# sticky session load balancing – new feature
balance source
stick-table type string len 32 size 200k expire 30m
stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
option clitcpka # for TCP keep-alive
option tcplog
timeout client 600s
timeout server 2h
timeout check 5000
server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
I have done the net.ipv4.ip_local_port_range = 1000 65535 thing.
Running haproxy 2.4 on ubuntu 20