Contents
Preface
As mentioned at the end of a previous article (see article: Docker series Traefik file dynamic configuration practice: efficient implementation of local network load balancing), I was not satisfied with using Traefik for load balancing (hot standby) between the main blog site and the backup blog site on the intranet of my home data center. After some thought, I decided to try HAProxy. After all, it is the traditional, purpose-built solution for application load balancing outside Docker and Kubernetes environments: Traefik was originally designed for microservices and containerized environments, and it is a somewhat awkward fit for traditional load balancing. And, of course, the most important thing is that it gives me material for another article.
Note: This article assumes some familiarity with traditional load balancing (also known as "application delivery"), because it leans on a fair amount of background knowledge from that field. Without it, you may feel uncomfortable while reading (side effects such as dizziness and drowsiness).
HAProxy Introduction
HAProxy is a high-performance, reliable open source load balancer and reverse proxy software, widely used in Web services, databases and other high-concurrency scenarios. It supports L4 (transport layer) and L7 (application layer) traffic distribution, and provides rich features, including multiple scheduling algorithms, fine-grained health checks, SSL/TLS termination, session persistence, etc. With its excellent performance, stability and flexibility, HAProxy has become the preferred load balancing solution for many enterprises and large-scale websites.
Compared with other common free application load-balancing software, HAProxy has unique advantages in comprehensiveness, performance, and configuration:
Software | Features | Disadvantages compared with HAProxy |
---|---|---|
Nginx | Lightweight, widely used in reverse proxy, supports L4 and L7. | The load balancing function is not the core, the performance and flexibility are slightly inferior, and the configuration complexity is higher than HAProxy. |
LVS | Kernel-level load balancing, with extremely high performance, is suitable for ultra-large-scale traffic scenarios. | It only supports L4, lacks application layer functions, is complex to configure, and is difficult to debug. |
Apache | It is feature-rich, modularly designed, and suitable for existing Apache environments. | The performance is far inferior to HAProxy, the configuration is complex and the support for high concurrency is poor. |
Keepalived | Often used together with LVS to provide high availability, health checks, and failover in simple scenarios. | Limited when used standalone; it focuses on health checking and high availability rather than being a full-featured load balancer. |
Pen/Balance | A lightweight load balancer that is simple and easy to use and suitable for small-scale environments. | The functions are very limited and only support basic load balancing requirements, which is not suitable for complex scenarios. |
Therefore, from the perspective of traditional professional load balancing, HAProxy is one of the most comprehensive and mature free load-balancing solutions, especially in the following respects:
1. Performance and stability
• High performance: HAProxy is purpose-built for load balancing and highly optimized; a single instance can easily support hundreds of thousands or even millions of concurrent connections.
• Stability: proven over years of production use, HAProxy runs very stably and is deployed at high-traffic websites and enterprises around the world.
2. Comprehensive functions
• Support multiple protocols: Supports TCP and HTTP/HTTPS protocols, covering load balancing requirements from L4 (transport layer) to L7 (application layer).
• Diversified scheduling algorithms: supports multiple load-balancing algorithms such as round-robin, least connections, source-address hash, and weighted distribution.
• Health Check: Provides fine-grained backend health checks to ensure that only healthy backend servers receive traffic.
• SSL/TLS Termination: Supports termination and forwarding of SSL/TLS traffic, simplifying backend configuration.
• HTTP layer functionality: Supports advanced features such as URL rewriting, request and response modification, and session persistence.
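As a sketch of what these L7 features look like in practice, the fragment below routes requests by URL path and injects a request header. All the names here (web_front, api_backend, web_backend) are illustrative and not part of the configuration used later in this article:

```
# Hypothetical frontend demonstrating L7 features: path-based routing
# plus request-header manipulation (names are illustrative)
frontend web_front
    bind *:80
    mode http
    acl is_api path_beg /api                        # match URLs beginning with /api
    http-request set-header X-Forwarded-Proto http  # add a header before forwarding
    use_backend api_backend if is_api               # API traffic goes to its own pool
    default_backend web_backend                     # everything else to the web pool
```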
3. Easy to expand and integrate
• High Availability: HAProxy can be used with Keepalived to achieve a high availability architecture (Active-Passive or Active-Active).
• Dynamic backend management: Supports dynamic addition or removal of backend servers without restarting the service.
• Logging and Monitoring: Provides detailed logging capabilities and can be integrated with monitoring tools such as Prometheus and Grafana.
4. Community and Support
• Strong community: HAProxy has a very active community and detailed documentation, making it easy to find solutions when you encounter problems.
• Corporate Support: HAProxy offers a commercial version (HAProxy Enterprise), which includes more features (such as advanced health checks, fast patch support, etc.) and professional support services.
Why is HAProxy the most comprehensive?
• Wide range of functions: from small applications to enterprise-level deployments, from simple round-robin to complex session persistence, HAProxy can meet almost every need.
• High performance and flexibility: Even in high-concurrency scenarios, HAProxy can still flexibly handle complex rules.
• Adapt to traditional application scenarios: Compared to modern tools like Traefik, HAProxy's configuration method is more traditional, intuitive, and more suitable for non-containerized environments.
• Ease of use and documentation: Although the configuration file may seem a bit complicated at first glance, the logic is clear and the documentation is rich, so the learning cost is relatively low.
Therefore, HAProxy is the first choice in traditional professional load-balancing scenarios, especially in applications requiring stability, performance, and flexibility.
Another note: I originally had high hopes for Traefik, but after trying it out I was still disappointed.
HAProxy deployment
Deployment method selection
There are two ways to deploy HAProxy: native installation via the system package manager (APT is used as the example below; the comparison table calls this "source deployment") and Docker installation. The advantages and disadvantages of the two methods compare as follows:
Comparison Items | Source code deployment | Deployment via Docker |
---|---|---|
Advantages | | |
Deployment complexity | Installation through the package manager is simple and fast, and no additional tool support is required. | No complex dependencies are required, and it can be quickly started with just a Docker environment, making it suitable for cross-platform deployment. |
Flexibility | Configuration directly modifies system files, which is easy to operate and has no container restrictions. | It can be adjusted by mounting configuration files and environment variables, with strong isolation and adaptability to different scenarios. |
Performance | No container overhead, stable performance, suitable for high-concurrency scenarios. | The performance is suitable for most general scenarios, with strong environmental isolation and little impact on the host environment. |
Update and maintenance | Updates via `apt upgrade` are straightforward, simplifying maintenance. | Image updates are quick and easy: just pull the new version of the image and restart the container. |
Dependency Management | Package managers automatically handle dependencies, reducing conflict issues. | All dependencies are encapsulated in the image, which has strong isolation and will not interfere with the host environment. |
Portability | Deployed using the native operating system method, it is suitable for long-term fixed environments. | The image runs across platforms, has strong environmental compatibility, and is suitable for distributed deployment requirements. |
Resource consumption | There is no additional container overhead, and resource utilization is high, which is suitable for resource-sensitive scenarios. | Containerization technology is well optimized, resource isolation is efficient, and modern systems have almost no additional burden. |
Debugging Difficulty | Directly use system tools for debugging, which is flexible in operation and easy to troubleshoot problems. | Container isolation facilitates quick debugging and troubleshooting in different environments. |
Community Support | The official repository and documentation are maintained stably and are suitable for long-term support needs. | The Docker community is active and resourceful, allowing for quick resolution of issues. |
Disadvantages | | |
Deployment complexity | Administrator privileges are required and support may be limited depending on the operating system version. | The Docker environment needs to be installed in advance, which increases learning and maintenance costs. |
Flexibility | Flexibility is limited by the operating system version; parameters cannot be adjusted as freely as with a true from-source build. | The dependency versions in the image are fixed, and the image must be rebuilt to adjust functionality or optimize. |
Performance | Additional system-level optimization may be required to meet specific performance requirements. | Containers have a small performance overhead, and in extreme scenarios their performance is not as good as that of local services running directly. |
Update and maintenance | Version updates depend on the system repository and may lag behind the official latest version. | Images must be pulled and verified regularly, since image sources vary in reliability and quality. |
Dependency Management | Due to the limitation of system repository dependency versions, some versions may be incompatible. | The dependent versions in the image cannot be changed, which lacks flexibility. |
Portability | It depends on the operating system environment, has weak cross-platform capabilities, and is not suitable for heterogeneous environments. | It requires Docker support to run and has poor compatibility with traditional environments. |
Debugging Difficulty | You need to be familiar with the debugging tools and log management methods of system services, which has a certain threshold for novices. | Debugging may require entering the container, which adds some complexity. |
Community Support | The community has fewer resources and problem solving may take more time. | The quality of images varies, so you need to identify official and trustworthy sources. |
These two methods have their own advantages and disadvantages. You can choose according to your needs. I will introduce both deployment methods.
From the perspective of stability, source (native) deployment has more advantages than Docker deployment:
- Fewer layers of dependencies
Source code deployment directly depends on the operating system and compilation environment, reducing compatibility issues caused by changes in the container environment (such as Docker version, basic image, etc.).
- Greater controllability
Source code deployment can finely control versions, compilation parameters, and dependencies, and is suitable for production environments that require specific optimizations, while Docker images may use generic configurations and have low flexibility.
- Updates and maintenance stability
When deploying source code, updates are subject to manual operations and testing processes, and the risks are controllable; however, Docker image updates may introduce additional incompatible or unnecessary components.
- Performance and resource optimization
Source code deployment runs directly on the host machine without the need for an additional abstraction layer of the container, and performance and resource usage may be better.
Applicable scenarios: The source code method is more suitable for core business systems that have extremely high requirements for performance and stability.
Source code deployment (APT method)
Strictly speaking, there are several non-Docker ways to deploy HAProxy (compiling from source, or installing from packages); I will take the simplest, the APT method, as the example:
```shell
apt update
apt install haproxy
```
Make sure the HAProxy service has started; you can then check its status with:
systemctl status haproxy
If everything is normal, the status will be shown as active (running):
For HAProxy installed via APT, the haproxy.cfg configuration file is located at:
/etc/haproxy/haproxy.cfg
You can view its contents with the following command:
cat /etc/haproxy/haproxy.cfg
The output is as follows:
Deployment in Docker mode
Deployment by docker run
1. Create a working directory:
mkdir -p /docker/haproxy
2. Create the configuration file haproxy.cfg in advance:
vim /docker/haproxy/haproxy.cfg
Then paste the predefined configuration file contents into it and save it (for the specific format of the configuration file, refer to the "Configuration file: haproxy.cfg" section at the end of the article).
3. Optional step: if HAProxy needs to offload (decrypt) SSL, place the SSL certificate and private-key files at a fixed path on the host; here we assume the /etc/ssl/certs directory.
4. The command for deployment using docker run is as follows:
```shell
docker run --name haproxy -d -p 80:80 -p 443:443 --restart=always \
  -v /docker/haproxy:/usr/local/etc/haproxy \
  -v /etc/ssl/certs/certificate.crt:/etc/ssl/certs/mydomain.pem \
  -v /etc/ssl/certs/private.key:/etc/ssl/private/mydomain.key \
  haproxy:latest
```
Deployment using docker-compose
If you use docker-compose to deploy, steps 1, 2, and 3 in the previous section remain unchanged, but starting from step 4, they become:
4. Create a docker-compose.yml file:
vim /docker/haproxy/docker-compose.yml
Then paste the following content in and save it:
```yaml
version: '3.8'
services:
  haproxy:
    image: haproxy:latest
    container_name: haproxy
    restart: always
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /docker/haproxy:/usr/local/etc/haproxy
      - /etc/ssl/certs/certificate.crt:/etc/ssl/certs/mydomain.pem
      - /etc/ssl/certs/private.key:/etc/ssl/private/mydomain.key
```
5. Start the service:
```shell
cd /docker/haproxy
docker-compose up -d
```
Note: Deploying HAProxy in Docker is not very flexible. You have to decide up front which ports will provide load-balancing services (such as 80 and 443 in the configuration above), because the -p parameters that map host ports into the container must be fixed when the container is created. This is acceptable for HTTP applications (different applications can be distinguished by domain name), but it is unfriendly to TCP and UDP applications. Therefore, if you need to load-balance not only HTTP applications but also TCP and UDP applications, native deployment is recommended.
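To illustrate the contrast: under a native deployment, load balancing a new plain-TCP application only requires adding a frontend/backend pair to the configuration and reloading, with no port mapping planned in advance. A minimal sketch (the MySQL service name and addresses are assumed purely for illustration):

```
# Hypothetical TCP load balancing for a database listening on 3306
frontend mysql_front
    bind *:3306
    mode tcp
    default_backend mysql_backend

backend mysql_backend
    mode tcp
    balance leastconn                  # prefer the server with the fewest connections
    server db1 192.168.1.21:3306 check
    server db2 192.168.1.22:3306 check
```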
Configuration file: haproxy.cfg
Prerequisite knowledge: Basic concepts of load balancing
Note: To understand the meaning of some options in the HAProxy configuration file, it is necessary to briefly describe several key concepts in load balancing. Readers not interested in technical details can skip this part.
1. VIP (Virtual IP Address)
• Popular explanation: VIP is like a virtual house number, which is the "entrance address" displayed by the load balancing system to the outside world.
• Effect: when you visit a website, you are actually visiting this VIP, and the system assigns your request to a specific server for processing. Even if the backend servers change (for example, their number increases or decreases), the VIP remains the fixed address for access.
Note: regarding VIPs, since HAProxy itself cannot directly handle the underlying network protocols (such as ARP), it can only respond on IP addresses bound to its host's network interfaces. To use a specific virtual IP address (VIP) with HAProxy, you need to manually bind the VIP to a network card in the operating system, or manage the VIP through tools such as Keepalived and ensure the operating system answers ARP requests. Compared with the built-in VIP and ARP support of professional load balancers (such as F5 and A10), HAProxy relies more on the host's network configuration for these functions.
2. Server (IP and port of specific application)
• Popular explanation: servers are the workers that handle specific tasks. They have their own "address" (IP) and "job number" (port).
• Effect: each server is responsible for completing user requests, such as loading web pages or processing data. The load balancer assigns tasks to these servers according to the rules.
3. Service Group (a collection of multiple servers)
• Popular explanation: A service group can be understood as a team of workers who work together to complete the same type of tasks.
• Effect: by organizing servers into groups, the load balancer can distribute work more efficiently. For example, when one worker is too busy or takes leave, the task is assigned to other workers.
4. Health Check
• Popular explanation: Health checks are like giving workers a physical exam to make sure they can work properly.
• Effect: the load balancer regularly checks whether each server is healthy (for example, whether the network is reachable and whether the service responds normally). If a server is found to be "sick", it temporarily stops assigning tasks to it to avoid affecting the user experience.
5. Session persistence
• Popular explanation: Session persistence is like equipping workers with a special toolkit, ensuring that the same customer is always served by the same worker.
• Effect: in load balancing, session persistence ensures that requests from the same user are always sent to the same server. For example, when you browse products on a shopping website, each request is directed to the same server, so the site can remember your previous actions and provide a continuous experience.
6. Working mode (TCP, UDP, HTTP)
• Popular explanation: working modes are like different working environments. TCP, UDP, and HTTP are like three different types of workplaces, each with its own rules and requirements.
• Effect: the load balancer handles different types of traffic by selecting different working modes:
• TCP Mode: Handles traditional network communication traffic, such as FTP, telnet, etc.
• UDP Mode: Processing traffic with high real-time requirements, such as video, voice, etc.
• HTTP Mode: Specialized in handling web browsing traffic, data requests when users visit websites through browsers.
7. Connection optimization (connection reuse, limiting maximum connections)
• Popular explanation: Connection optimization is like managing the efficiency of workers. Connection reuse means using the same tool multiple times, and limiting the maximum number of connections ensures that workers are not overwhelmed by too much work.
• Effect:
• Connection reuse: allowing multiple requests to share the same connection avoids the overhead of repeatedly establishing connections and improves efficiency.
• Limit maximum connections: to prevent a server from being overloaded, the load balancer sets a maximum connection count so that each server handles an appropriate number of requests without being overwhelmed.
8. SNAT (Source Address Translation)
• Popular explanation: SNAT is like adding a unified "identity tag" to the task, ensuring that the operations of each worker will not be confused and that the feedback of the task can be returned correctly.
• Effect: SNAT (Source Address Translation) modifies the source IP address of a packet. When multiple clients access a server through a load balancer, SNAT can replace the client's source IP with the load balancer's IP. The server then sees the load balancer, not the client's real IP, as the source of the request.
9. SSL offload
• Popular explanation: SSL Offload is like setting up a special decryption station at the entrance of the factory, decrypting and organizing all the encrypted information before sending it to the workshop for processing, so that the workshop can focus on production without spending time on decryption.
• Effect: SSL offload moves HTTPS encryption and decryption work onto the load balancer or a dedicated device. The client communicates with the load balancer over an encrypted connection (HTTPS), and the load balancer decrypts the traffic and forwards the plaintext to the backend servers (usually over HTTP). This reduces the computing burden on the backend servers and improves overall performance.
The usage of SNAT varies depending on the load balancing deployment method.
1. SNAT in serial deployment mode
In serial deployment mode, the load balancer is located in the path of the traffic, inserted directly between the client and the server. In this way, SNAT is optional because the load balancer is already in the path of the traffic and it can achieve source address translation by modifying the source IP. However, even without SNAT, the load balancer can still correctly forward traffic to the server and process the response.
• Example:Assuming that the load balancer is located in the middle of the traffic, the request initiated by the client first reaches the load balancer, and the load balancer then forwards the request to the backend server. If there is no SNAT, the server will see the client's real IP address, and the load balancer is only responsible for traffic distribution. However, if SNAT is enabled, the load balancer will change the source IP of the request to its own IP address, so that the requests seen by the server are from the load balancer.
2. SNAT in bypass deployment mode
In bypass deployment mode, the load balancer is outside the traffic path, and there is no direct connection between the client and the server. At this time, SNAT is necessary because it ensures symmetry of the round-trip path of the traffic. The load balancer must modify the source IP address of the client request, otherwise the server will return the response to the client's original IP address instead of the load balancer's IP, which will cause the request to fail or the response to be lost.
• Example: In a bypass deployment, after the client request reaches the server, the server returns a response directly to the client. Without SNAT, the server will try to send the response directly back to the client's real IP address, but because the traffic does not pass through the load balancer, the response will not reach the client correctly. By enabling SNAT, the load balancer replaces the client's source IP with its own IP address, ensuring that the server's response can be correctly returned to the client through the load balancer.
3. Use of SNAT in different working modes
• HTTP Mode: in HAProxy's HTTP mode, explicit SNAT configuration is usually unnecessary. HAProxy is a full proxy: it terminates the client connection and opens its own connection to the backend, so backend responses naturally return through HAProxy. The client's real IP is typically conveyed to the backend via the X-Forwarded-For header rather than as the packet's source address. Since each HTTP request and response exchange is independent, no extra SNAT configuration is needed to keep the session path correct.
• TCP/UDP mode: in TCP mode (and UDP, on load balancers that support it), connections are typically long-lived and the load balancer continuously relays traffic. HAProxy's backend connections still originate from HAProxy itself, but the source directive lets you explicitly control the SNAT address, ensuring that all backend responses return through the load balancer via a predictable, routable address. If the chosen source address is not valid on the HAProxy host, the return path becomes asymmetric and responses cannot reach the client correctly.
Summary
• Serial deployment mode: SNAT is optional, the load balancer is inserted directly into the traffic path and can choose whether to perform source address translation.
• Bypass deployment mode: SNAT is required to ensure that the round-trip path of the connection is symmetric and to prevent responses from flowing to the wrong destination.
• HTTP Mode: Since each request is an independent short connection, SNAT is usually not required.
• TCP/UDP mode: SNAT is required to ensure connection persistence and the correct return path.
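In HAProxy terms, pinning the SNAT address for a long-lived TCP service can be sketched as below. The addresses are illustrative, and the source address must already be valid on the HAProxy host:

```
# Hypothetical TCP backend with an explicit SNAT source address
backend tcp_backend
    mode tcp
    balance roundrobin
    source 192.168.1.100                  # backend servers see this IP, so replies return via HAProxy
    server srv1 192.168.1.31:9000 check
    server srv2 192.168.1.32:9000 check
```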
In addition: I do not want to go too deep into load-balancing principles in this article; they cannot be explained in a few words and are not the focus here. Friends interested in the underlying technology can refer to a slide deck I wrote earlier (converted to PDF); the link is: Introduction to basic concepts of load balancing. The deck works best as a live presentation; reading it alone conveys at most 30% of the experience (the animations are lost in the PDF conversion), but it is still worth a look. I originally wanted to find a good article and post a link, but after a long search I could not find a suitable one, so I trimmed the irrelevant parts from my old deck and posted it instead. Since it was a pre-sales document for customers, it is not overly technical and suits an entry-level introduction.
haproxy.cfg content explanation
No matter which method you use to deploy HAProxy, you will eventually deal with the configuration file haproxy.cfg. With the APT deployment, the file is generated automatically with default contents (default path: /etc/haproxy/haproxy.cfg); with Docker deployment, you need to prepare it in advance (because it must be mounted with the -v parameter).
The following sample haproxy.cfg exercises the basic load-balancing concepts described above. It assumes the HTTP applications to be load balanced are served at 192.168.1.11:8080, 192.168.1.12:8080, 192.168.1.13:8080, and 192.168.1.14:8080, with domain names used to distinguish access requests. Health checks, session persistence, and SNAT with a specified source address are configured, along with connection reuse and per-server maximum connection limits:
```
# Global settings
global
    log stdout format raw local0    # log to stdout for easy debugging and monitoring
    maxconn 2000                    # global connection cap to prevent overload
    tune.ssl.default-dh-param 2048  # default DH parameter size for SSL connections
    daemon                          # run in the background
    stats socket /var/run/haproxy.sock mode 660 level admin  # admin socket for runtime changes
    nbthread 4                      # use 4 threads to exploit multi-core CPUs

# Default settings
defaults
    log global                   # inherit global logging settings
    option httplog               # enable HTTP request logging
    option dontlognull           # do not log empty connections
    timeout connect 5s           # connection timeout
    timeout client 30s           # client inactivity timeout
    timeout server 30s           # server inactivity timeout
    timeout http-request 10s     # HTTP request timeout
    timeout http-keep-alive 15s  # keep-alive timeout
    maxconn 1000                 # default maximum connections
    retries 3                    # retry failed connections 3 times

# Frontend - HTTP
frontend http_front
    bind *:80                 # listen on port 80
    mode http                 # HTTP mode
    option http-server-close  # close the client-facing connection after each request
    acl host_app1 hdr(host) -i app1.example.com  # match domain app1.example.com
    acl host_app2 hdr(host) -i app2.example.com  # match domain app2.example.com
    use_backend app1_backend if host_app1  # route app1.example.com to app1_backend
    use_backend app2_backend if host_app2  # route app2.example.com to app2_backend
    default_backend default_backend        # fall back to the default backend if no domain matches

# Frontend - HTTPS
frontend https_front
    bind *:443 ssl crt /etc/ssl/private/haproxy.pem  # listen on 443 with SSL, using haproxy.pem
    mode http
    option http-server-close
    acl host_app1 hdr(host) -i app1.example.com
    acl host_app2 hdr(host) -i app2.example.com
    use_backend app1_backend if host_app1
    use_backend app2_backend if host_app2
    default_backend default_backend

# Backend - application 1
backend app1_backend
    mode http
    option httpchk GET /health  # health check path
    balance roundrobin          # round-robin balancing
    server app1 192.168.1.11:8080 check  # backend server 1
    server app2 192.168.1.12:8080 check  # backend server 2
    option redispatch           # redispatch to a healthy server if the persisted one is down
    option http-keep-alive      # keep-alive towards the backend
    default-server maxconn 100  # per-server connection limit

# Backend - application 2
backend app2_backend
    mode http
    option httpchk GET /health
    balance roundrobin
    source 192.168.1.100        # use this address for SNAT instead of the default HAProxy host IP
    server app3 192.168.1.13:8080 check
    server app4 192.168.1.14:8080 check

# Default backend
backend default_backend
    mode http
    balance roundrobin
    server default1 192.168.1.15:8080 check

# Statistics page
listen stats
    bind *:8085                      # listen on port 8085
    mode http
    stats enable                     # enable the statistics page
    stats uri /stats                 # statistics page path
    stats realm Haproxy\ Statistics  # statistics page title / auth realm
    stats auth admin:password        # username and password for the stats page
    stats admin if TRUE              # allow admin actions from the page
```
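Whenever this file is edited, it is worth validating it before reloading. Assuming the APT install with the default path, something like:

```
# Check the configuration syntax; haproxy exits non-zero and names the
# offending line if there is an error
haproxy -c -f /etc/haproxy/haproxy.cfg

# Apply the changes (systemd-managed installs)
systemctl reload haproxy
```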
Interpreting the above configuration from the perspective of the nine basic load-balancing concepts introduced earlier:
- VIP (Virtual IP Address)
Configuration:
```
bind *:80
bind *:443 ssl crt /etc/ssl/private/haproxy.pem
```
Explanation: the VIP appears in the configuration as a listening IP and port combination. By default, HAProxy responds on all IP addresses bound to the host (the * wildcard); to use a specific VIP, the address must be bound in the operating system in advance, or managed through a tool such as Keepalived.
- Server (IP and port of specific application)
Configuration:
```
server app1 192.168.1.11:8080 check
server app2 192.168.1.12:8080 check
```
Explanation: each server line defines a backend server by IP and port. The check parameter enables health checks to ensure the server is available before it receives traffic.
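Server lines accept further per-server parameters beyond check; a hedged sketch of some common ones (the values and the extra "spare" server are illustrative, not part of the configuration above):

```
# Illustrative per-server parameters
server app1 192.168.1.11:8080 check weight 2 maxconn 100  # gets ~2x the traffic of weight-1 peers
server app2 192.168.1.12:8080 check weight 1
server spare 192.168.1.19:8080 check backup               # only used when all regular servers are down
```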
- Service group (a collection of multiple servers)
Configuration:
```
backend app1_backend
    server app1 192.168.1.11:8080 check
    server app2 192.168.1.12:8080 check
```
Explanation: a backend section represents a service group containing multiple servers; the service group is the basic unit of HAProxy load balancing.
- Health Check
Configuration:
```
option httpchk GET /health
```
Explanation: the configuration checks server health via HTTP requests to the /health path. If a backend server returns a response code other than 2xx or 3xx, it is considered unavailable.
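The check behaviour can be tuned further. The sketch below (the timings are illustrative assumptions) requires an exact 200 response and controls how quickly a server is marked down or back up:

```
# Hypothetical stricter health checking for app1_backend
backend app1_backend
    mode http
    option httpchk GET /health
    http-check expect status 200           # require exactly 200, not any 2xx/3xx
    default-server inter 5s fall 3 rise 2  # probe every 5s; 3 failures = DOWN, 2 successes = UP
    server app1 192.168.1.11:8080 check
    server app2 192.168.1.12:8080 check
```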
- Session persistence
HAProxy does not configure session persistence by default. If you want to configure it, HTTP mode differs slightly from TCP mode: TCP (and UDP, on load balancers that support it) generally uses persistence based on the source IP address, while HTTP mode can use cookie-based persistence in addition to source-IP persistence. (Note that stock HAProxy only provides tcp and http modes; it does not proxy arbitrary UDP traffic itself.)
Configuration:
HTTP mode cookie-based session persistence
```
cookie SERVERID insert indirect nocache
```
Explanation: a cookie is inserted for each client, marking the server it is associated with. Subsequent requests from that client are directed to the same server, maintaining session continuity.
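Note that the cookie directive works together with a cookie value on each server line; a complete (illustrative) backend might look like:

```
# Hypothetical cookie-based persistence: each server gets its own cookie value
backend app1_backend
    mode http
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server app1 192.168.1.11:8080 check cookie app1
    server app2 192.168.1.12:8080 check cookie app2
```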
Source-IP-based session persistence (TCP mode)
```
# Create a stick-table for source-address session persistence
stick-table type ip size 200k expire 30m store conn_cur,conn_rate
# Persist sessions based on the client source IP address
stick on src
```
Explanation: Source address-based session persistence records the client's IP address and binds it to a backend server. In this way, each time the same client sends a request, the load balancer will always forward its request to the same server based on the record. This method does not rely on the client to support cookies, and is suitable for scenarios where the protocol does not support cookies or needs to simplify session management.
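Put together in a backend, source-IP persistence looks roughly like this (the backend name, addresses, and table sizes are illustrative):

```
# Hypothetical TCP backend pinning each client IP to one server
backend tcp_backend
    mode tcp
    balance roundrobin
    stick-table type ip size 200k expire 30m  # remember up to ~200k client IPs for 30 minutes
    stick on src                              # pin each source IP to the server it first hit
    server srv1 192.168.1.31:9000 check
    server srv2 192.168.1.32:9000 check
```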
- Working mode (TCP, UDP, HTTP)
Configuration:
```
mode http   # in frontend http_front
mode tcp    # in a hypothetical TCP backend (e.g. tcp_backend)
```
Explanation: HTTP mode is used to process web traffic, while TCP mode handles other application traffic (such as databases). Different modes suit different protocol requirements. Note that the sample configuration above uses HTTP mode throughout; the tcp_backend shown here is for illustration only.
- Connection optimization (connection reuse, limiting maximum connections)
Configuration:
option http-keep-alive
default-server maxconn 100
Explanation: enables HTTP keep-alive (connection reuse) to reduce the overhead of repeatedly establishing and closing connections, and limits the maximum number of connections to a single server to avoid overload.
- SNAT (Source Address Translation)
Configuration:
source 192.168.1.100
Explanation: source specifies the source IP address HAProxy uses for SNAT (the address must be a valid IP on the HAProxy host) instead of the default bound listening address.
- SSL offload
Configuration:
bind *:443 ssl crt /etc/ssl/certs/haproxy.pem
Explanation: listens for HTTPS traffic and decrypts it with the SSL certificate (haproxy.pem); the decrypted plaintext traffic is then forwarded to the backend servers over HTTP, reducing their computing load.
Note: SSL offloading only needs to be configured on the frontend; all backends are uniformly configured in HTTP mode.
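A minimal sketch of the frontend/backend split under SSL offload (the names and addresses are illustrative; the certificate path follows the example above):

```
frontend https_front
    bind *:443 ssl crt /etc/ssl/certs/haproxy.pem   # TLS terminated here
    mode http
    default_backend app1_backend

backend app1_backend
    mode http                                       # plain HTTP to the servers
    server server1 192.168.1.11:8080 check
```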
Summary:
This configuration file comprehensively covers the key functions of HAProxy, including VIP binding, server management, health check, working mode selection, connection optimization, SNAT configuration, SSL offload, etc.
Note 1: The configuration above covers only the options I consider commonly used for personal deployments. HAProxy has many other options that I have not listed one by one; if you have other special requirements, please refer to the official documentation: https://docs.haproxy.org/.
Note 2: To demonstrate as many of these options as possible in a single sample configuration, the config looks fairly complex; in everyday use, without so many requirements, the configuration file will be much simpler.
Note 3: The nine points above are just the load balancing concepts I consider most commonly used. Load balancing is a technical field of its own (albeit a niche one) with far too many technical points, and it is mainly applied commercially; individuals do not need to understand it deeply, so I will not go into further detail in this article. Interested readers can search for more online.
Make the modified configuration file content effective
If you customized the haproxy.cfg content according to the previous demonstration, you need to reload the haproxy configuration for the modified content to take effect:
systemctl reload haproxy
After that, assuming the HAProxy host's IP is 192.168.1.200, you only need to set up the relevant DNS records (resolve app1.example.com and app2.example.com to 192.168.1.200) to achieve the following load balancing behavior:
- Accessing http://app1.example.com or https://app1.example.com will be load balanced across 192.168.1.11:8080 and 192.168.1.12:8080
- Accessing http://app2.example.com or https://app2.example.com will be load balanced across 192.168.1.13:8080 and 192.168.1.14:8080
- Directly accessing http://192.168.1.200 or https://192.168.1.200 will go to the default backend server: 192.168.1.15:8080
Command line tools related to HAproxy
HAProxy provides several commonly used command line tools. Combined with some system-provided tools, you can view and manage its status, configuration, and runtime information in real time. The following are some commonly used tools and commands related to HAproxy:
1. haproxy -v – Check the HAProxy version
• This command is used to view the version information of HAProxy.
haproxy -v
Output example:
HAProxy version 2.3.9-2+deb11u1 2021/12/01
Copyright 2000-2021 HAProxy Technologies
2. haproxy -c -f /path/to/haproxy.cfg – Check the configuration file syntax
• Before starting or reloading HAProxy, you can use this command to check whether the configuration file syntax is correct.
haproxy -c -f /etc/haproxy/haproxy.cfg
Output example:
Configuration file /etc/haproxy/haproxy.cfg is valid
3. haproxy -st <pid> – replace a running HAProxy process
• -st is used when starting a new HAProxy process: the new process takes over the listeners and sends an immediate stop signal to the old PIDs listed after -st. For a graceful takeover that lets the old processes finish serving their existing connections first, use -sf instead:
haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)
4. haproxy -d – Run HAProxy in debug mode
• Starting HAProxy with lowercase -d keeps it in the foreground and prints detailed debugging information (uppercase -D, by contrast, runs it as a background daemon).
haproxy -d -f /etc/haproxy/haproxy.cfg
5. haproxy -p /var/run/haproxy.pid – specify the PID file
• The -p option tells HAProxy where to write the PIDs of its processes when starting; to see the PID of a running instance, simply read that file:
cat /var/run/haproxy.pid
6. netstat or ss – View HAProxy ports and connection status
• Use the netstat or ss command to view the ports HAProxy is listening on and the state of established connections (the -p flag is needed so the process name appears for grep to match; it usually requires root):
netstat -tulnp | grep haproxy
Or using ss (a more modern tool):
ss -tulnp | grep haproxy
7. HAProxy Stats Page (View status via web interface)
• If enabled in the HAProxy configuration Statistics Page(stats), you can access this page through your browser to view the real-time status of HAProxy.
Example configuration:
listen stats
    bind *:8085                       # listen on port 8085; adjust for your environment
    mode http                         # HTTP mode
    stats enable                      # enable the statistics function
    stats uri /stats                  # statistics page access path
    stats realm Haproxy\ Statistics   # page title
    stats auth admin:password         # access credentials
    stats admin if TRUE               # administrator privileges
Then you can visit http://<HAProxy-IP>:8085/stats (using the port and path configured above) and log in with the configured credentials.
8. socat or nc – View HAProxy status via Unix socket
• HAProxy supports stats queries via Unix sockets. After enabling the stats socket in the configuration file, you can interact with HAProxy through the command line.
Example configuration:
global
    stats socket /var/run/haproxy.sock mode 600 level admin
Access the socket via socat or nc (netcat):
echo "show stat" | socat unix-connect:/var/run/haproxy.sock stdio
or
echo "show stat" | nc -U /var/run/haproxy.sock
This allows you to query the status of HAProxy in real time, such as the number of connections, number of requests, server status, etc.
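The output of show stat is CSV, so it is easy to post-process. Below is a minimal Python sketch; the sample text is illustrative (real output has many more columns), and the function name parse_show_stat is my own:

```python
import csv
import io

# Illustrative sample of `show stat` output: a CSV whose header line
# begins with "# " and lists field names such as pxname, svname, status.
sample = """# pxname,svname,scur,smax,status
app1_backend,server1,3,10,UP
app1_backend,server2,0,10,DOWN
"""

def parse_show_stat(raw: str):
    """Parse HAProxy `show stat` CSV output into a list of dicts."""
    cleaned = raw.lstrip("# ")  # drop the leading "# " on the header line
    return list(csv.DictReader(io.StringIO(cleaned)))

rows = parse_show_stat(sample)
down = [r["svname"] for r in rows if r["status"] != "UP"]
print(down)  # → ['server2']
```

In practice, `raw` would come from the socket command shown above (for example via `subprocess.run` around `socat`, or Python's own `socket` module).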
9. haproxy -q – quiet mode (not a status query)
• Despite what its name might suggest, the -q flag only runs HAProxy in quiet mode, suppressing startup messages; it does not query status. To query the current status, use the stats socket shown above, for example:
echo "show info" | socat unix-connect:/var/run/haproxy.sock stdio
10. ps or top – View HAProxy processes
• You can use the ps or top command to view the process status of HAProxy to understand information such as the number of processes and CPU usage.
ps aux | grep haproxy
or:
top -p $(pgrep -d',' -f haproxy)
Summary:
• Use haproxy -v to check the version.
• Use haproxy -c -f to check the configuration file.
• Use netstat or ss to view the ports and connections that HAProxy is listening on.
• If the stats page is enabled in the configuration, you can view real-time status in a browser.
• View Unix socket status via socat or nc.
• Use ps or top to view the HAProxy process and resource usage.
These commands and tools help monitor and manage HAProxy's health, connectivity, and performance.
Afterword
HAProxy can work standalone: if you are familiar with its configuration file, you can use it without relying on any other application, and many commercial environments do exactly that. However, HAProxy is after all managed through configuration files. Although the format is simple, it is still not very friendly to those uncomfortable with a CLI, which is why various third-party web GUI projects exist (many of them seem to have disappeared by now). The best known is roxy-wi (also called haproxy-wi), but it exists purely for the sake of graphical configuration and is not a necessity; keep that in mind. I will introduce roxy-wi in a later article.