{"id":6726,"date":"2017-05-08T16:56:40","date_gmt":"2017-05-08T08:56:40","guid":{"rendered":"http:\/\/rmohan.com\/?p=6726"},"modified":"2017-05-08T16:56:40","modified_gmt":"2017-05-08T08:56:40","slug":"centos-rhel-7-tips-on-troubleshooting-ntp-chrony-issues","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=6726","title":{"rendered":"CentOS \/ RHEL 7 : Tips on Troubleshooting NTP \/ chrony Issues"},"content":{"rendered":"<p>\nThe chrony service does not change the time<br \/>\nA common misconception is that the chrony service sets the system time to the value returned by the NTP server. This is incorrect: based on the answers from the NTP server, chrony normally only tells the system clock to run faster or slower until the offset is corrected. For this reason, even when the time is wrong and the NTP server is working, the time is not corrected immediately.<br \/>\nThe only time chrony sets the time<\/p>\n<p>When the chrony service starts, a setting in its configuration file (\/etc\/chrony.conf on CentOS \/ RHEL 7, \/etc\/chrony\/chrony.conf on some other systems) tells it to actually set the time if specific conditions occur:<\/p>\n<p># Force system clock correction at boot time.<br \/>\nmakestep 1000 10<br \/>\nThis means that if, during the first 10 clock updates after it starts, chrony detects that the time is off by more than 1000 seconds, it sets the clock directly.<\/p>\n<p>Some useful commands<\/p>\n<p>The following commands are useful for troubleshooting chrony-related issues.<\/p>\n<p># chronyc tracking<br \/>\n# chronyc sources<br \/>\n# chronyc sourcestats<br \/>\n# systemctl status chronyd<br \/>\n# chronyc activity<br \/>\n# timedatectl<br \/>\nCheck chronyd status<\/p>\n<p>To check the status of the chronyd daemon:<\/p>\n<p># systemctl status -l chronyd<br \/>\n\u25cf 
chronyd.service - NTP client\/server<br \/>\n   Loaded: loaded (\/usr\/lib\/systemd\/system\/chronyd.service; enabled; vendor preset: enabled)<br \/>\n   Active: active (running) since Fri 2016-08-12 13:22:22 IST; 1s ago<br \/>\n  Process: 33263 ExecStartPost=\/usr\/libexec\/chrony-helper update-daemon (code=exited, status=0\/SUCCESS)<br \/>\n  Process: 33259 ExecStart=\/usr\/sbin\/chronyd $OPTIONS (code=exited, status=0\/SUCCESS)<br \/>\n Main PID: 33261 (chronyd)<br \/>\n   CGroup: \/system.slice\/chronyd.service<br \/>\n           \u2514\u250033261 \/usr\/sbin\/chronyd<\/p>\n<p>Aug 12 13:22:22 NVMBD1S11BKPMED03 systemd[1]: Starting NTP client\/server...<br \/>\nAug 12 13:22:22 NVMBD1S11BKPMED03 chronyd[33261]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)<br \/>\nAug 12 13:22:22 NVMBD1S11BKPMED03 chronyd[33261]: Frequency 0.000 +\/- 1000000.000 ppm read from \/var\/lib\/chrony\/drift<br \/>\nAug 12 13:22:22 NVMBD1S11BKPMED03 systemd[1]: Started NTP client\/server.<br \/>\nThe chronyc sources command<\/p>\n<p>Running chronyc sources -v shows the current state of the NTP servers configured on the system. Here is example output in which ntp.example.com appears as a valid, online server:<\/p>\n<p># chronyc sources -v<br \/>\n210 Number of sources = 1<\/p>\n<p>  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.<br \/>\n \/ .- Source state '*' = current synced, '+' = OK for sync, '?' = unreachable,<br \/>\n| \/                'x' = time may be in error, '~' = time is too variable.<br \/>\n||                                                 .- xxxx [ yyyy ] +\/- zzzz<br \/>\n||                                                \/   xxxx = adjusted offset,<br \/>\n||         Log2(Polling interval) -.             
|    yyyy = measured offset,<br \/>\n||                                  \\            |    zzzz = estimated error.<br \/>\n||                                   |           |<br \/>\nMS Name\/IP address           Stratum Poll LastRx Last sample<br \/>\n============================================================================<br \/>\n^* ntp.example.com          3    6     40    +31us[  -98us] +\/-  118ms<br \/>\nNote that when no source is in Source state \u2018*\u2019, chrony is not synchronized, which usually indicates a problem with the NTP server.<\/p>\n<p>Source state \u2018~\u2019 means that the time is too variable<br \/>\nIf the Source state is \u2018~\u2019, it probably means that the server is accessible but its time is too variable. This can happen if the server responds too slowly, or if its response time varies from one reply to the next. You can check the response times of pings to the server to see whether they are slow or variable. This state has also been observed when the server runs on a virtual machine that is too slow, which causes timing issues.<\/p>\n<p>Chrony check and restart every hour<\/p>\n<p>Once an hour, a cron job checks the output of the chronyc sources -v command by running the script \/usr\/sbin\/palladion_chrony_healthcheck, which runs \/usr\/sbin\/palladion_check_chrony and checks its result:<\/p>\n<p>If \/usr\/sbin\/palladion_check_chrony returns 1, there was no online source (no source with Source state \u2018*\u2019), so chrony is restarted in an attempt to re-initialize the server status.<br \/>\nIf \/usr\/sbin\/palladion_check_chrony returns 0, everything is OK: chrony does not need to be restarted because it already has a valid online source.<br \/>\n# cat \/etc\/cron.d\/chrony<br \/>\nSHELL=\/bin\/sh<br \/>\nPATH=\/usr\/local\/sbin:\/usr\/local\/bin:\/sbin:\/bin:\/usr\/sbin:\/usr\/bin<br \/>\n#<br \/>\n# Check chrony every hour and restart if necessary.<br \/>\n#<br \/>\n16 * * * *     root    \/usr\/sbin\/palladion_chrony_healthcheck<br 
\/>\nChrony logs<\/p>\n<p>There are several chrony logs that can be used for troubleshooting. Most of them are located in \/var\/log\/chrony\/. Note that the most recently written file is not always the *.log one; sometimes a rotated file such as *.log.2 or *.log.3 is more recent. Here is an example that lists the files sorted by modification time, most recent first:<\/p>\n<p># ls -lisaht  \/var\/log\/chrony\/<br \/>\ntotal 1.5M<br \/>\n3801115 580K -rw-r--r--  1 root root 574K Oct 21 14:56 measurements.log.3<br \/>\n3801131 544K -rw-r--r--  1 root root 540K Oct 21 14:56 statistics.log.3<br \/>\n3801166 356K -rw-r--r--  1 root root 350K Oct 21 14:56 tracking.log.3<br \/>\n3801089 4.0K drwxr-xr-x 16 root root 4.0K Oct 21 00:01 ..<br \/>\n3801114 4.0K drwxr-xr-x  2 root root 4.0K Oct 21 00:01 .<br \/>\n3801128    0 -rw-r--r--  1 root root    0 Oct 21 00:01 tracking.log<br \/>\n3801110    0 -rw-r--r--  1 root root    0 Oct 21 00:01 measurements.log<br \/>\n3801120    0 -rw-r--r--  1 root root    0 Oct 21 00:01 statistics.log<br \/>\n3801167    0 -rw-r--r--  1 root root    0 Oct 20 00:01 tracking.log.1<br \/>\n3801165    0 -rw-r--r--  1 root root    0 Oct 20 00:01 statistics.log.1<br \/>\n3801159    0 -rw-r--r--  1 root root    0 Oct 20 00:01 measurements.log.1<br \/>\n&#8230;&#8230;&#8230;&#8230;<br \/>\nTry setting only one NTP server by entering its IP address<\/p>\n<p>If you have been using two or more NTP servers so far (either because several were configured or because you entered an FQDN that resolves to several IP addresses), try configuring a single NTP server by entering only one IP address. This may solve your NTP-related issue.<\/p>\n<p>Tracing the communication with the NTP server<\/p>\n<p>To double-check whether the NTP server is answering, you can trace the traffic between chrony and the NTP server for a period of time while monitoring the server:<br \/>\n1. 
Start a pcap trace with tcpdump on NTP port 123 and leave it running until the issue appears (run it in \u2018screen\u2019 or with \u2018nohup\u2019 so that it is not stopped if you disconnect from the shell).<br \/>\n2. As soon as the issue reappears, get a System Diagnostics covering the entire history from the time you set the server to a DNS name until the gap reoccurred. If this produces a file that is too big, just get the System Diagnostics for Current data and, in addition, copy all the files from \/var\/log\/chrony\/ and all files matching \/var\/log\/syslog*. Remember to stop the trace you started in step 1.<\/p>\n","protected":false},"excerpt":{"rendered":"<p> The chrony service does not change the time A common misconception is that the chrony service sets the system time to the value returned by the NTP server. This is incorrect: based on the answers from the NTP server, chrony normally only tells the system clock to run faster [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[73],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6726"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6726"}],"version-history":[{"count":1,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6726\/revisions"}],"predecessor-version":[{"id":6727,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6726\/revisions\/6727"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6726"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6726"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6726"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}