November 2024
M T W T F S S
 123
45678910
11121314151617
18192021222324
252627282930  

Categories

November 2024
M T W T F S S
 123
45678910
11121314151617
18192021222324
252627282930  

HTTPS TLS performance optimization details

HTTPS TLS performance optimization details

HTTPS (HTTP over SSL) is a security-oriented HTTP channel and can be understood as HTTP + SSL/TLS, that is, adding an SSL/TLS layer under HTTP as a security foundation. The predecessor of TLS is SSL. Currently, TLS 1.2 is widely used.

 

 

TLS performance tuning
TLS is widely considered to slow down services, mainly because early CPUs are still slow, and only a few sites can afford cryptographic services. But today’s computing power is no longer the bottleneck of TLS. In 2010, Google enabled encryption on its e-mail service by default, after which they stated that SSL/TLS no longer costly calculations:

In our front-end services, SSL/TLS calculations account for less than 1% of CPU load, less than 10KB of memory per connection, and less than 2% of network overhead.

1. 
The speed of delay and connection management network communication is determined by two major factors: bandwidth and delay.

Bandwidth: Used to measure how much data can be sent in a unit of time. 
Delay: Describe the time required for a message to be sent from one end to the other. 
Among them, bandwidth is a secondary factor because usually you can buy more bandwidth at any time; This is unavoidable because it is a limitation that is imposed when data is transmitted over a network connection.

Latency has a particularly large impact on TLS because it has its own well-designed handshaking, adding an additional two round trips during connection initialization.

1.1 TCP Optimization 
Each TCP connection has a speed limit called a congestion window that is initially small and grows over time with guaranteed reliability. This mechanism is called slow start.

Therefore, for all TCP connections, the startup is slow and worse for the TLS connection because the TLS handshake protocol consumes precious initial connection bytes (when the congestion window is small). If the congestion window is large enough, there is no additional delay for slow start. However, if the long handshake protocol exceeds the size of the congestion window, the sender must split it into two blocks, send a block first, wait for confirmation (a round trip), increase the congestion window, and then send the rest.

1.1.1 Congestion Window Tuning The 
startup speed limit is called the initial congestion window. RFC6928 recommends that the initial congestion window be set to 10 network segments (approximately 15 KB). The early advice was to start with 2-4 network segments.

 

 

 

 

On older Linux platforms, you can change the initial congestion window of the route:

# ip route | while read p; do ip route change $p initcwnd 10; done

1.1.2 Preventing Slow Start Slow Start 
Slow start can affect the connection over a period of time without any traffic, reducing its speed, and the speed drops very quickly. On Linux, you can disable slow start when the connection is idle:

# sysctl -w net.ipv4.tcp_slow_start_after_idle=0 can be made permanent by adding this setting to the /etc/sysctl.conf configuration.

1.2 Long Connections In 
most cases, the TLS performance impact is concentrated on the start handshake phase of each connection. An important optimization technique is to keep every connection as close as possible with the number of connections allowed.

The current trend is to use an event-driven WEB server to handle all communications by using a fixed thread pool (even a single thread), thereby reducing the cost of each connection and the possibility of being attacked.

The disadvantage of long connections is that after the last HTTP connection is completed, the server waits for a certain amount of time before closing the connection, although a connection does not consume too many resources, but it reduces the overall scalability of the server. Long connections are suitable for scenarios where the client bursts a large number of requests.

When configuring large long connection timeouts, it is important to limit the number of concurrent connections to avoid server overload. Adjust the server by testing to run within capacity limits. If TLS is handled by OpenSSL, make sure that the server correctly sets the SSL_MODE_RELEASE_BUFFERS flag.

1.3 HTTP/2.0 
HTTP/2.0 is a binary protocol that provides features such as multiplexing and header compression to improve performance.

1.4 CDNs 
use CDNs to achieve world-class performance, using geographically dispersed servers to provide edge caching and traffic optimization.

The further away the user is from your server, the slower the access to the network, in which case connection establishment is a big limiting factor. For the server to be as close to the end user as possible, the CDN operates a large number of geographically distributed servers, which can provide two ways to reduce latency, namely edge caching and connection management.

1.4.1 Edge Cache 
Since the CDN server is close to the user, you can provide your file to the user just as if your server is really there.

1.4.2 Connection Management 
If your content is dynamic, user-specific, you cannot cache data through the CDN for a long time. However, a good CDN can help with connection management even without any cache, which is that it can eliminate most of the cost of establishing a connection through a long connection that is maintained for a long time.

Most of the time spent establishing a connection is spent waiting. To minimize waiting, the CDN routes traffic to its closest point to the destination through its own basic settings. Because it is the CDN’s own fully controllable server, it can maintain long internal connections for a long time.

When using a CDN, the user connects to the nearest CDN node. This is only a short distance. The network delay of the TLS handshake is also very short. The existing long-distance connection can be directly reused between the CDN and the server. This means that the user and server have established a valid connection with the CDN Fast Initial TLS handshake.

2. TLS protocol optimization 
After connection management, we can focus on the performance characteristics of TLS and have the knowledge of security and speed tuning of the TLS protocol.

2.1 Key Exchange The 
maximum cost of using TLS is the CPU-intensive cryptographic operations used for security parameter negotiation except for delays. This part of the communication is called key exchange. The CPU consumption of key exchange largely depends on the server’s chosen private key algorithm, private key length, and key exchange algorithm.

Key length 
The difficulty of cracking a key depends on the length of the key. The longer the key, the more secure it is. However, a longer key also means that it takes more time for encryption and decryption.

Key Algorithms 
There are currently two key algorithms available: RSA and ECDSA. The current RSA key algorithm recommends a minimum length of 2048 bits (112-bit encryption strength), and 3072 bits (128-bit encryption strength) will be deployed in the future. ECDSA is superior to RSA in terms of performance and security. 256-bit ECDSA (128-bit encryption strength) provides the same security as 3072-bit RSA, but with better performance.

Key Exchange 
There are currently two key exchange algorithms available: DHE and ECDHE. Which DHE is too slow is not recommended. The performance of the key exchange algorithm depends on the length of the configured negotiation parameters. For DHE, the commonly used 1024 and 2048 bits provide 80 and 112 bit security levels, respectively. For ECDHE, security and performance depend on something called a **curve**. Secp256r1 provides a 128-bit security level.

In practice, you cannot combine key and key exchange algorithms at will, but you can use combinations specified by the protocol.

2.2 Certificate During 
a complete TLS handshake, the server sends its certificate chain to the client for authentication. The length and correctness of the certificate chain have a great influence on the performance of the handshake.

Using as few certificates as possible 
for each certificate in the certificate chain increases the handshaking packet. Too many certificates in the certificate chain may cause the TCP initial congestion window to overflow.

Including Only 
Required Certificates It is a common mistake to include non-required certificates in the certificate chain. Each such certificate will add an additional 1-2 KB to the handshake protocol.

Providing a complete certificate chain 
server must provide a complete certificate chain that is trusted by the root certificate.

Using elliptic curve certificate chains 
Because ECDSA private key length uses fewer bits, ECDSA certificates can be smaller.

Avoiding the binding of too many domain names with the same certificate 
Each additional domain name increases the size of the certificate, which has a significant impact on a large number of domain names.

2.3 Revocation Checks 
Although the status of certificate revocation is constantly changing and the behavior of user agents in revocation of certificates is very different, as a server, the only thing to do is to deliver the revocation information as quickly as possible.

Certificate OCSP using OCSP information is designed to provide real-time queries, allowing the user agent to request only access to the website’s revocation information, and the query is brief and fast (an HTTP request). In contrast, the CRL is a list containing a large number of revoked certificates.

Using OCSP Responders with Fast and Reliable OCSP Responders The performance of OCSP Responders 
differs between different CAs and you check their historical OCSP Responders before submitting them to the CA. Another criterion for choosing a CA is how quickly it updates OCSP responders.

Deploying OCSP stapling 
OCSP stapling is a protocol feature that allows revocation information (entire OCSP response) to be included in the TLS handshake. After it is enabled, by giving the user agent all the information to revoke the check for better performance, the user agent can be omitted to obtain the CA’s OCSP response program through a separate connection to query the revocation information.

2.4 Protocol Compatibility 
If your server is incompatible with the features of some new version protocols (eg TLS 1.2), the browser may need to make multiple attempts with the server to negotiate an encrypted connection. The best way to ensure good TLS performance is to upgrade the latest TLS protocol stack to support newer protocol versions and extensions.

2.5 Hardware Acceleration 
As the CPU speed continues to increase, software-based TLS implementations have run fast enough on normal CPUs to process large numbers of HTTPS requests without specialized encryption hardware. However, installing an accelerator card may increase speed.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>