{"id":7610,"date":"2018-06-20T22:10:41","date_gmt":"2018-06-20T14:10:41","guid":{"rendered":"http:\/\/rmohan.com\/?p=7610"},"modified":"2018-06-20T22:10:41","modified_gmt":"2018-06-20T14:10:41","slug":"https-tls-performance-optimization-details","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=7610","title":{"rendered":"HTTPS TLS performance optimization details"},"content":{"rendered":"<h1 class=\"aTitle\"><strong>HTTPS TLS performance optimization details<\/strong><\/h1>\n<p>HTTPS (HTTP over SSL) is a security-oriented HTTP channel and can be understood as HTTP + SSL\/TLS, that is, adding an SSL\/TLS layer under HTTP as a security foundation.\u00a0The predecessor of TLS is SSL. Currently, TLS 1.2 is widely used.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2018\/06\/TLS.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-7611\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2018\/06\/TLS.png\" alt=\"\" width=\"423\" height=\"275\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/TLS.png 423w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/TLS-300x195.png 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/TLS-150x98.png 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/TLS-400x260.png 400w\" sizes=\"(max-width: 423px) 100vw, 423px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><strong><span>TLS performance tuning<\/span><\/strong><br \/>\n<span>TLS is widely considered to slow down services, mainly because early CPUs were slow and only a few sites could afford cryptographic services.\u00a0But today, computing power is no longer the bottleneck for TLS.\u00a0In 2010, Google enabled encryption on its e-mail service by default, after which they stated that SSL\/TLS is no longer computationally expensive:<\/span><\/p>\n<p><span>In our front-end services, SSL\/TLS calculations account for less than 1% of CPU load, less than 10 KB of memory per connection, and less than 2% 
of network overhead.<\/span><\/p>\n<p><span>1. Latency and connection management\u00a0<\/span><br \/>\n<span>The speed of network communication is determined by two major factors: bandwidth and latency.<\/span><\/p>\n<p><span>Bandwidth: measures how much data can be sent per unit of time.\u00a0<\/span><br \/>\n<span>Latency: the time required for a message to travel from one end to the other.\u00a0<\/span><br \/>\n<span>Of the two, bandwidth is the secondary factor because you can usually buy more of it at any time. Latency, by contrast, is unavoidable because it is a limit imposed whenever data is transmitted over a network connection.<\/span><\/p>\n<p><span>Latency has a particularly large impact on TLS because the TLS handshake is elaborate, adding two extra round trips during connection initialization.<\/span><\/p>\n<p><span>1.1 TCP Optimization\u00a0<\/span><br \/>\n<span>Each TCP connection has a speed limit called the congestion window, which is initially small and grows over time as the connection proves reliable.\u00a0This mechanism is called slow start.<\/span><\/p>\n<p><span>Therefore, every TCP connection starts slowly, and it is worse for TLS connections because the TLS handshake consumes the precious initial bytes of the connection (while the congestion window is small).\u00a0If the congestion window is large enough, slow start adds no extra delay.\u00a0However, if a long handshake exceeds the size of the congestion window, the sender must split it into two chunks: send the first chunk, wait for an acknowledgment (one round trip), grow the congestion window, and then send the rest.<\/span><\/p>\n<p><span>1.1.1 Congestion Window Tuning\u00a0<\/span><br \/>\n<span>The starting speed limit is called the initial congestion window.\u00a0RFC 6928 recommends that the initial congestion window be set to 10 network segments (approximately 15 KB).\u00a0The earlier advice was to start with 2-4 network 
segments.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2018\/06\/webserver.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-7612\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2018\/06\/webserver.jpg\" alt=\"\" width=\"651\" height=\"365\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/webserver.jpg 651w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/webserver-300x168.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/webserver-150x84.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2018\/06\/webserver-400x224.jpg 400w\" sizes=\"(max-width: 651px) 100vw, 651px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><span>On older Linux platforms, you can change the initial congestion window of each route:<\/span><\/p>\n<p><span># ip route | while read p; do ip route change $p initcwnd 10; done<\/span><\/p>\n<p><span>1.1.2 Preventing Slow Start After Idle\u00a0<\/span><br \/>\n<span>Slow start can also kick in on a connection that has carried no traffic for a while, reducing its speed, and the speed drops very quickly.\u00a0On Linux, you can disable slow start for idle connections:<\/span><\/p>\n<p><span># sysctl -w net.ipv4.tcp_slow_start_after_idle=0<\/span><\/p>\n<p><span>This can be made permanent by adding the setting to the \/etc\/sysctl.conf configuration.<\/span><\/p>\n<p><span>1.2 Long Connections\u00a0<\/span><br \/>\n<span>In most cases, the TLS performance impact is concentrated in the initial handshake phase of each connection.\u00a0An important optimization technique is to keep each connection open for as long as possible, within the number of connections the server allows.<\/span><\/p>\n<p><span>The current trend is to use an event-driven web server that handles all communication with a fixed thread pool (or even a single thread), thereby reducing the cost of each connection and the exposure to attack.<\/span><\/p>\n<p><span>The disadvantage of long connections is that 
after the last HTTP request is completed, the server waits for a certain amount of time before closing the connection. Although a single connection does not consume many resources, holding many of them open reduces the overall scalability of the server.\u00a0Long connections are suitable for scenarios where the client sends a large number of requests in bursts.<\/span><\/p>\n<p><span>When configuring large long-connection timeouts, it is important to limit the number of concurrent connections to avoid server overload.\u00a0Tune the server through testing so that it runs within its capacity limits.\u00a0If TLS is handled by OpenSSL, make sure that the server correctly sets the SSL_MODE_RELEASE_BUFFERS flag.<\/span><\/p>\n<p><span>1.3 HTTP\/2\u00a0<\/span><br \/>\n<span>HTTP\/2 is a binary protocol that provides features such as multiplexing and header compression to improve performance.<\/span><\/p>\n<p><span>1.4 CDNs\u00a0<\/span><br \/>\n<span>Using a CDN can achieve world-class performance: it uses geographically dispersed servers to provide edge caching and traffic optimization.<\/span><\/p>\n<p><span>The further the user is from your server, the slower the network access, and in that case connection establishment is a major limiting factor.\u00a0To get servers as close to the end user as possible, a CDN operates a large number of geographically distributed servers, which reduce latency in two ways: edge caching and connection management.<\/span><\/p>\n<p><span>1.4.1 Edge Cache\u00a0<\/span><br \/>\n<span>Because the CDN server is close to the user, it can deliver your files to the user as if your own server were located there.<\/span><\/p>\n<p><span>1.4.2 Connection Management\u00a0<\/span><br \/>\n<span>If your content is dynamic and user-specific, you cannot cache it on the CDN for long.\u00a0However, a good CDN can help with connection management even without any caching: it can eliminate most of the cost of establishing a connection through a long 
connection that is maintained for a long time.<\/span><\/p>\n<p><span>Most of the time spent establishing a connection is spent waiting.\u00a0To minimize waiting, the CDN routes traffic through its own infrastructure to the point closest to the destination.\u00a0Because these are servers fully under the CDN&#8217;s control, it can keep long-lived internal connections open.<\/span><\/p>\n<p><span>When using a CDN, the user connects to the nearest CDN node. The distance is short, so the network latency of the TLS handshake is also short, and the existing long-distance connection between the CDN and the server can be reused directly.\u00a0This means that, after a fast initial TLS handshake with the CDN, the user effectively has a working connection to the server.<\/span><\/p>\n<p><span>2. TLS protocol optimization\u00a0<\/span><br \/>\n<span>With connection management covered, we can focus on the performance characteristics of TLS itself and on tuning the protocol for both security and speed.<\/span><\/p>\n<p><span>2.1 Key Exchange\u00a0<\/span><br \/>\n<span>Apart from latency, the biggest cost of using TLS is the CPU-intensive cryptographic operations used to negotiate security parameters.\u00a0This part of the communication is called key exchange.\u00a0The CPU cost of key exchange depends largely on the server&#8217;s chosen private key algorithm, private key length, and key exchange algorithm.<\/span><\/p>\n<p><span>Key length\u00a0<\/span><br \/>\n<span>The difficulty of cracking a key depends on the length of the key. 
The longer the key, the more secure it is.\u00a0However, a longer key also means that encryption and decryption take more time.<\/span><\/p>\n<p><span>Key Algorithms\u00a0<\/span><br \/>\n<span>There are currently two key algorithms available: RSA and ECDSA.\u00a0For RSA, the currently recommended minimum key length is 2048 bits (112-bit security strength), with 3072 bits (128-bit security strength) to be deployed in the future.\u00a0ECDSA is superior to RSA in both performance and security: 256-bit ECDSA (128-bit security strength) provides the same security as 3072-bit RSA, but with better performance.<\/span><\/p>\n<p><span>Key Exchange\u00a0<\/span><br \/>\n<span>There are currently two key exchange algorithms available: DHE and ECDHE.\u00a0Of these, DHE is slow and not recommended.\u00a0The performance of a key exchange algorithm depends on the strength of the configured negotiation parameters.\u00a0For DHE, the commonly used 1024 and 2048 bits provide 80-bit and 112-bit security levels, respectively.\u00a0For ECDHE, security and performance depend on the chosen <strong>curve<\/strong>.\u00a0secp256r1 provides a 128-bit security level.<\/span><\/p>\n<p><span>In practice, you cannot combine key and key exchange algorithms at will; you can only use the combinations specified by the protocol.<\/span><\/p>\n<p><span>2.2 Certificate\u00a0<\/span><br \/>\n<span>During a complete TLS handshake, the server sends its certificate chain to the client for authentication.\u00a0The length and correctness of the certificate chain have a great influence on the performance of the handshake.<\/span><\/p>\n<p><span>Using as few certificates as possible\u00a0<\/span><br \/>\n<span>Each certificate in the certificate chain increases the size of the handshake. 
Too many certificates in the certificate chain may overflow the TCP initial congestion window.<\/span><\/p>\n<p><span>Including only required certificates\u00a0<\/span><br \/>\n<span>It is a common mistake to include unnecessary certificates in the certificate chain. Each such certificate adds an extra 1-2 KB to the handshake.<\/span><\/p>\n<p><span>Providing a complete certificate chain\u00a0<\/span><br \/>\n<span>The server must provide a complete certificate chain that leads to a trusted root certificate.<\/span><\/p>\n<p><span>Using elliptic curve certificate chains\u00a0<\/span><br \/>\n<span>Because ECDSA keys use fewer bits, ECDSA certificates can be smaller.<\/span><\/p>\n<p><span>Avoiding binding too many domain names to the same certificate\u00a0<\/span><br \/>\n<span>Each additional domain name increases the size of the certificate, which becomes significant when there are many domain names.<\/span><\/p>\n<p><span>2.3 Revocation Checks\u00a0<\/span><br \/>\n<span>Although certificate revocation status changes constantly and user agents differ widely in how they check revocation, the server has only one job: to deliver the revocation information as quickly as possible.<\/span><\/p>\n<p><span>Using certificates with OCSP information\u00a0<\/span><br \/>\n<span>OCSP is designed to provide real-time lookups, allowing the user agent to request revocation information only for the website being visited, so the query is brief and fast (one HTTP request).\u00a0In contrast, a CRL is a list containing a large number of revoked certificates.<\/span><\/p>\n<p><span>Using CAs with fast and reliable OCSP responders\u00a0<\/span><br \/>\n<span>OCSP responder performance differs between CAs, so check a CA&#8217;s historical OCSP responder performance before committing to that CA.\u00a0Another criterion for choosing a CA is how quickly it updates its OCSP 
responders.<\/span><\/p>\n<p><span>Deploying OCSP stapling\u00a0<\/span><br \/>\n<span>OCSP stapling is a protocol feature that allows revocation information (the entire OCSP response) to be included in the TLS handshake.\u00a0When it is enabled, the user agent receives everything it needs for the revocation check in the handshake itself, so it does not have to open a separate connection to the CA&#8217;s OCSP responder, which improves performance.<\/span><\/p>\n<p><span>2.4 Protocol Compatibility\u00a0<\/span><br \/>\n<span>If your server is\u00a0incompatible\u00a0with features of newer protocol versions (e.g. TLS 1.2), the browser may need several attempts to negotiate an encrypted connection with the server.\u00a0The best way to ensure good TLS performance is to upgrade to the latest TLS stack so that newer protocol versions and extensions are supported.<\/span><\/p>\n<p><span>2.5 Hardware Acceleration\u00a0<\/span><br \/>\n<span>As CPU speeds continue to increase, software-based TLS implementations run fast enough on ordinary CPUs to process large numbers of HTTPS requests without specialized cryptographic hardware.\u00a0However, installing an accelerator card may still increase speed.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>HTTPS TLS performance optimization details <\/p>\n<p>HTTPS (HTTP over SSL) is a security-oriented HTTP channel and can be understood as HTTP + SSL\/TLS, that is, adding an SSL\/TLS layer under HTTP as a security foundation. The predecessor of TLS is SSL. 
Currently, TLS 1.2 is widely used.<\/p>\n<p>&nbsp;<\/p>\n<\/p>\n<p>&nbsp;<\/p>\n<p>TLS performance tuning TLS is widely [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7610"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7610"}],"version-history":[{"count":1,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7610\/revisions"}],"predecessor-version":[{"id":7613,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7610\/revisions\/7613"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7610"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7610"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7610"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}