{"id":6522,"date":"2017-03-02T14:47:06","date_gmt":"2017-03-02T06:47:06","guid":{"rendered":"http:\/\/rmohan.com\/?p=6522"},"modified":"2017-03-02T14:47:06","modified_gmt":"2017-03-02T06:47:06","slug":"use-tar-pigz-ssh-to-achieve-efficient-transmission-of-large-data","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=6522","title":{"rendered":"Use tar + pigz + ssh to achieve efficient transmission of large data"},"content":{"rendered":"<p>Use tar + pigz + ssh to achieve efficient transmission of large data<\/p>\n<p>When copying a large amount of data between hosts, for example more than 100GB of raw MySQL data, the usual practice is:<\/p>\n<p>Pack the data into a tar.gz file on the source host<br \/>\nCopy it to the target host with scp or rsync<br \/>\nUnpack the file on the target host<\/p>\n<p>These three steps run one after another; they cannot execute at the same time, which makes the whole process inefficient.<\/p>\n<p>Instead, we can turn the process into a single data stream whose stages all run concurrently (non-blocking mode). This generally improves efficiency to 3 times the original or more. The pipeline is:<\/p>\n<p>disk read &#8212;&gt; pack &#8212;&gt; compress &#8212;&gt; transmit &#8212;&gt; decompress &#8212;&gt; unpack &#8212;&gt; disk write<\/p>\n<p>tar | pigz | ssh | gzip -d | tar<\/p>\n<p>For example, to copy the local test directory into the \/data directory on the target host (&quot;targetIP&quot; below), the command is:<\/p>\n<p>tar -c test\/ | pigz | ssh -c arcfour128 targetIP &quot;gzip -d | tar -xC \/data&quot;<\/p>\n<p>Note that the decompression step here still uses the relatively slow gzip; if it is replaced with lz4 (which has to be compiled and installed separately), efficiency can be improved considerably.<\/p>\n<p>If you do not need to unpack on the target host, the command becomes:<\/p>\n<p>tar -c test\/ | pigz | ssh -c arcfour128 targetIP &quot;cat &gt; \/data\/test.tar.gz&quot;<\/p>\n<p>Note: because streaming compression is used, the -i parameter must be added when this archive is decompressed later, i.e. tar -ixf \/data\/test.tar.gz.<\/p>\n<p>About pigz: pigz is an efficient parallel compression tool that can use the spare capacity of every CPU core, while traditional gzip can only use a single core. For example, on a server with two 8-core CPUs, pigz compresses the same data at least 7-8 times faster than gzip (it generally does not reach the theoretical 16x speedup, because it is limited by disk read\/write speed and memory resources).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Use tar + pigz + ssh to achieve efficient transmission of large data<\/p>\n<p>When copying a large amount of data between hosts, for example more than 100GB of raw MySQL data, the usual practice is:<\/p>\n<p>Pack the data into a tar.gz file on the source host Copy it to the target host with scp or rsync Unpack the [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6522"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6522"}],"version-history":[{"count":1,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6522\/revisions"}],"predecessor-version":[{"id":6523,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/6522\/revisions\/6523"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6522"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan
.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}