May 2024
Using tar + pigz + ssh to transfer large amounts of data efficiently

Previously, when copying large amounts of data between hosts (for example, more than 100 GB of raw MySQL data), we usually did the following:

1. Pack the data into a tar.gz file on the source host.
2. Copy the file to the target host with scp or rsync.
3. Extract the file on the target host.
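For comparison, the traditional three steps can be sketched as a local dry run. This assumes bash, GNU tar and gzip; the scp/rsync step is replaced by a plain cp into a second directory, and the directory names are hypothetical. Each step writes its complete output before the next one starts.

```shell
#!/usr/bin/env bash
# Sequential baseline: pack, copy, unpack, each step waiting for the previous one.
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/src/test" "$work/target/data"
printf 'hello\n' > "$work/src/test/a.txt"   # hypothetical sample data

# 1. Pack the directory into a tar.gz file on the source side.
tar -czf "$work/src/test.tar.gz" -C "$work/src" test

# 2. Copy the archive (stands in for: scp test.tar.gz target_ip:/data/).
cp "$work/src/test.tar.gz" "$work/target/data/"

# 3. Extract on the target side, then delete the intermediate archive.
tar -xzf "$work/target/data/test.tar.gz" -C "$work/target/data"
rm "$work/target/data/test.tar.gz"
```

The whole archive exists on disk twice before extraction even begins, which is the cost the streaming approach avoids.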

These three steps run one after another: each must finish before the next can begin, so they cannot overlap, and the whole process is inefficient.

Instead, we can turn the process into a single data stream in which all stages run at the same time (non-blocking). This can generally make the copy more than 3 times faster than the original method. The flow is as follows:

Disk read -> packing -> compression -> transmission -> decompression -> unpacking -> disk write

tar | pigz (or gzip) | ssh | gzip -d | tar
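The stages above can be tried on a single machine before involving the network. The sketch below assumes bash, GNU tar and gzip, and uses pigz when it is installed; the ssh hop is simply left out. Because every stage is connected by a pipe, packing, compression, decompression and unpacking all run at the same time and no intermediate file is written.

```shell
#!/usr/bin/env bash
# Local streaming pipeline: tar -> pigz/gzip -> gzip -d -> tar, with no temp file.
set -euo pipefail   # pipefail: the pipeline fails if any stage fails

work=$(mktemp -d)
mkdir -p "$work/src/test" "$work/data"
printf 'hello\n' > "$work/src/test/a.txt"   # hypothetical sample data

# Prefer pigz (parallel gzip) and fall back to gzip if it is not installed.
compress=$(command -v pigz || command -v gzip)

# All four processes start together; data flows through the pipes in chunks.
tar -cf - -C "$work/src" test | "$compress" | gzip -d | tar -xf - -C "$work/data"
```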

For example, to copy the local test directory into the /data directory on the target host, run the following (target_ip stands for the target host's address):

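tar -c test/ | pigz | ssh -c arcfour128 target_ip "gzip -d | tar -xC /data"

The -c arcfour128 option selects a fast but weak cipher to reduce encryption overhead. The arcfour ciphers are no longer supported by current OpenSSH releases; on a modern system, drop the option or choose a supported cipher such as aes128-gcm@openssh.com.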
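To reuse the command, it can be wrapped in a small bash function. The name stream_copy, its argument order and the aes128-gcm@openssh.com cipher are illustrative assumptions, not part of the original command. Setting pipefail makes the command report failure when tar on the source side fails, which a plain pipeline would otherwise hide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: stream_copy SRC_DIR HOST DEST_DIR
# Packs SRC_DIR, compresses it (in parallel if pigz is present),
# and unpacks it under DEST_DIR on HOST, all as one stream.
stream_copy() {
  local src=$1 host=$2 dest=$3
  local compress
  compress=$(command -v pigz || command -v gzip)   # fall back to gzip
  # printf %q quotes DEST_DIR so the remote shell sees it as a single word.
  tar -cf - "$src" | "$compress" \
    | ssh -c aes128-gcm@openssh.com "$host" "gzip -d | tar -xf - -C $(printf '%q' "$dest")"
}
```

Usage: stream_copy test/ target_ip /data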

Of course, decompression here still uses the slower gzip. Switching to lz4 (which may need to be compiled and installed separately) can improve throughput considerably, but it must replace pigz on the source side as well, because lz4 and gzip cannot read each other's formats.
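A sketch of the lz4 variant, assuming the lz4 command-line tool is installed on both hosts. The function name stream_copy_lz4 is hypothetical; lz4 -c writes compressed data to standard output and lz4 -d -c decompresses to standard output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: stream_copy_lz4 SRC_DIR HOST DEST_DIR
# lz4 runs on BOTH ends: it compresses on the source and decompresses
# on the target, since gzip cannot read lz4 data.
stream_copy_lz4() {
  local src=$1 host=$2 dest=$3
  tar -cf - "$src" | lz4 -c \
    | ssh "$host" "lz4 -d -c | tar -xf - -C $(printf '%q' "$dest")"
}
```

Usage: stream_copy_lz4 test/ target_ip /data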

If you do not need to extract the data on the target host, the command becomes:

tar -c test/ | pigz | ssh -c arcfour128 target_ip "cat > /data/test.tar.gz"

Note: because the archive was produced by streaming compression, add the -i (--ignore-zeros) option when extracting it, for example: tar -ixf /data/test.tar.gz
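The archive-only variant and the later extraction can also be dry-run locally. This sketch assumes bash and GNU tar; a plain shell redirect stands in for the remote "cat > /data/test.tar.gz", and -z is passed explicitly rather than relying on GNU tar's compression auto-detection.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
mkdir -p "$work/src/test" "$work/data" "$work/restore"
printf 'hello\n' > "$work/src/test/a.txt"   # hypothetical sample data
compress=$(command -v pigz || command -v gzip)

# Save the compressed stream as a file instead of unpacking it
# (the redirect stands in for: ssh target_ip "cat > /data/test.tar.gz").
tar -cf - -C "$work/src" test | "$compress" > "$work/data/test.tar.gz"

# Later, extract it with -i (--ignore-zeros) as the note recommends.
tar -ixzf "$work/data/test.tar.gz" -C "$work/restore"
```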

About pigz: pigz is a parallel implementation of gzip that puts the spare capacity of every CPU core to work on compression, whereas traditional gzip can use only a single core. For example, on a server with two 8-core CPUs, compressing the same data with pigz is generally at least 7 to 8 times faster than with gzip. It usually does not reach the theoretical 16 times, because it is limited by disk read/write speed and memory resources.
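The pipeline mixes pigz on the source with gzip -d on the target, which works because pigz writes ordinary gzip-format output. The sketch below checks that on local sample data, assuming bash, seq and nproc from GNU coreutils; -p sets the number of pigz compression threads, and gzip is used instead when pigz is not installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
seq 1 200000 > "$work/sample.txt"   # hypothetical, easily compressible test data

if command -v pigz >/dev/null; then
  # One compression thread per available core.
  pigz -p "$(nproc)" -c "$work/sample.txt" > "$work/sample.txt.gz"
else
  gzip -c "$work/sample.txt" > "$work/sample.txt.gz"
fi

# Plain gzip decompresses pigz output; cmp fails the script on any mismatch.
gzip -dc "$work/sample.txt.gz" | cmp - "$work/sample.txt"
```

To see the speed difference on your own hardware, the compression step can be prefixed with time and run once with pigz and once with gzip.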
