{"id":7665,"date":"2018-07-09T08:52:24","date_gmt":"2018-07-09T00:52:24","guid":{"rendered":"http:\/\/rmohan.com\/?p=7665"},"modified":"2018-07-09T10:08:52","modified_gmt":"2018-07-09T02:08:52","slug":"rsync-3","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=7665","title":{"rendered":"rsync"},"content":{"rendered":"<p>There are many commands to copy a directory in Linux. In current Linux distributions the differences between them are very small: all of them can preserve links, timestamps and ownership, and all of them can handle sparse files.<\/p>\n<p>I tested each of them by copying a Linux kernel source tree, running every command twice and keeping the lower result.<br \/>\nThe original directory size is 639660032 bytes. Without the sparse option, every method produces a copy of exactly 675446784 bytes.<\/p>\n<table>\n<thead>\n<tr>\n<th>Tool<\/th>\n<th>Non-sparse<\/th>\n<th>Sparse<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>rsync<\/th>\n<td><code>rsync -a src \/tmp<\/code><\/td>\n<td><code>rsync -a -S src \/tmp<\/code><\/td>\n<\/tr>\n<tr>\n<th>cpio<\/th>\n<td><code>find src -depth | cpio -pdm \/tmp<\/code><\/td>\n<td><code>find src -depth | cpio -pdm --sparse \/tmp<\/code><\/td>\n<\/tr>\n<tr>\n<th>cp<\/th>\n<td><code>cp -a --sparse=never src \/tmp<\/code><\/td>\n<td><code>cp -a --sparse=always src \/tmp<\/code><\/td>\n<\/tr>\n<tr>\n<th>tar<\/th>\n<td><code>tar -c src | tar -x -C \/tmp<\/code><\/td>\n<td><code>tar -c -S src | tar -x -C \/tmp<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>SCP: Secure Copy<\/h2>\n<p><strong>Secure Copy<\/strong> is just like the <code>cp<\/code> command, but secure. 
More importantly, it can send files to remote servers over SSH!<\/p>\n<p>Copy a file or a directory to a remote server:<\/p>\n<pre><code class=\"hljs bash\"><span class=\"hljs-comment\"># Copy a file:<\/span>\r\n$ scp \/path\/to\/<span class=\"hljs-built_in\">source<\/span>\/file.ext username@hostname.com:\/path\/to\/destination\/file.ext\r\n\r\n<span class=\"hljs-comment\"># Copy a directory:<\/span>\r\n$ scp -r \/path\/to\/<span class=\"hljs-built_in\">source<\/span>\/dir username@hostname.com:\/path\/to\/destination<\/code><\/pre>\n<p>This will attempt to connect to <code>hostname.com<\/code> as user <code>username<\/code>. It will prompt you for a password unless a password-less SSH key is set up between the two computers. Once the connection is authenticated, the file is copied to the remote server.<\/p>\n<p>Because scp runs over SSH, it also accepts several of the flags you would normally pass to the <code>ssh<\/code> command. 
For example, you can add <code>-v<\/code> (or <code>-vvv<\/code> for even more detail) to get verbose output about the connection attempt and the file transfer.<\/p>\n<p>You can also use the <code>-i<\/code> (identity file) flag to specify which SSH identity file to use:<\/p>\n<pre><code class=\"hljs bash\">$ scp -i ~\/.ssh\/some_identity.pem \/path\/to\/source\/file.ext username@hostname:\/path\/to\/destination\/file.ext<\/code><\/pre>\n<p><strong>Here are some other useful flags:<\/strong><\/p>\n<ul>\n<li><code>-p<\/code> (lowercase) &#8211; Preserve modification times, access times, and modes from the original file<\/li>\n<li><code>-P<\/code> (uppercase) &#8211; Connect to an alternate port (note that <code>ssh<\/code> itself uses lowercase <code>-p<\/code> for this)<\/li>\n<li><code>-c<\/code> (lowercase) &#8211; Choose a cipher other than the default for encryption (the default depends on your OpenSSH version)<\/li>\n<li><code>-C<\/code> (uppercase) &#8211; Enable compression, which can speed up transfers over slow links (already compressed files gain little from it)<\/li>\n<li><code>-l<\/code> &#8211; Limit the bandwidth used, in kilobits per second (8 bits to a byte!)\n<ul>\n<li>e.g. Limit to 50 KB\/s (50 &#215; 8 = 400 Kbit\/s): <code>scp -l 400 ~\/file.ext user@host.com:~\/file.ext<\/code><\/li>\n<\/ul>\n<\/li>\n<li><code>-q<\/code> &#8211; Quiet output<\/li>\n<\/ul>\n<h2>Rsync: Sync Files Across Hosts<\/h2>\n<p><strong>Rsync<\/strong> is another secure way to transfer files. Rsync can detect which parts of a file differ between source and destination and send only those, saving bandwidth and time when transferring files.<\/p>\n<p>Just like <code>scp<\/code>, <code>rsync<\/code> can use SSH to connect to remote hosts and send\/receive files from them. 
Mostly the same rules and SSH-related flags apply to <code>rsync<\/code> as well.<\/p>\n<p>Copy files to a remote server:<\/p>\n<pre><code class=\"hljs bash\"><span class=\"hljs-comment\"># Copy a file:<\/span>\r\n$ rsync \/path\/to\/<span class=\"hljs-built_in\">source<\/span>\/file.ext username@hostname.com:\/path\/to\/destination\/file.ext\r\n\r\n<span class=\"hljs-comment\"># Copy a directory (no trailing slash on the source, so this creates \/path\/to\/destination\/dir):<\/span>\r\n$ rsync -r \/path\/to\/<span class=\"hljs-built_in\">source<\/span>\/dir username@hostname.com:\/path\/to\/destination<\/code><\/pre>\n<p>To use a specific SSH identity file and\/or SSH port, we need to do a little more work. We&#8217;ll use the <code>-e<\/code> flag, which lets us choose the remote shell program (and its options) used to transfer files.<\/p>\n<pre><code class=\"hljs bash\"><span class=\"hljs-comment\"># Send files over SSH on port 8888 using a specific identity file:<\/span>\r\n$ rsync -e <span class=\"hljs-string\">'ssh -p 8888 -i \/home\/username\/.ssh\/some_identity.pem'<\/span> \/<span class=\"hljs-built_in\">source<\/span>\/file.ext username@hostname:\/destination\/file.ext<\/code><\/pre>\n<p><strong>Here are some other common <a href=\"http:\/\/linux.die.net\/man\/1\/rsync\">flags<\/a> to use:<\/strong><\/p>\n<ul>\n<li><code>-v<\/code> &#8211; Verbose output<\/li>\n<li><code>-z<\/code> &#8211; Compress file data during the transfer<\/li>\n<li><code>-c<\/code> &#8211; Compare files based on checksum instead of modification time and size<\/li>\n<li><code>-r<\/code> &#8211; Recursive<\/li>\n<li><code>-S<\/code> &#8211; Handle <a href=\"http:\/\/gergap.wordpress.com\/2013\/08\/10\/rsync-and-sparse-files\/\">sparse files<\/a> efficiently<\/li>\n<li>Symlinks:\n<ul>\n<li><code>-l<\/code> &#8211; Copy symlinks as symlinks<\/li>\n<li><code>-L<\/code> &#8211; Replace each symlink with the file\/dir it points to (copy the actual file)<\/li>\n<\/ul>\n<\/li>\n<li><code>-p<\/code> &#8211; Preserve permissions<\/li>\n<li><code>-h<\/code> &#8211; Output numbers in a 
human-readable format<\/li>\n<li><code>--exclude=\"PATTERN\"<\/code> &#8211; Exclude files matching a pattern\n<ul>\n<li>e.g. Exclude the .git directory: <code>--exclude=\".git\"<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>There are many <a href=\"http:\/\/linux.die.net\/man\/1\/rsync\">other options<\/a> as well &#8211; you can do a LOT with rsync!<\/p>\n<p><strong>Do a Dry-Run:<\/strong><\/p>\n<p>I often do a dry run of rsync to preview which files will be copied. This is useful for making sure your flags are correct and that you won&#8217;t overwrite files you don&#8217;t want to.<\/p>\n<p>For this, we can use the <code>-n<\/code> or <code>--dry-run<\/code> flag:<\/p>\n<pre><code class=\"hljs bash\"><span class=\"hljs-comment\"># Preview a copy of the current directory (nothing is transferred)<\/span>\r\n$ rsync -vzcrSLhp --dry-run .\/ username@hostname.com:\/var\/www\/some-site.com\r\n<span class=\"hljs-comment\">#&gt; building file list ... done<\/span>\r\n<span class=\"hljs-comment\">#&gt; ... list of directories\/files and some metadata here ...<\/span><\/code><\/pre>\n<p><strong>Resume a Stalled Transfer:<\/strong><\/p>\n<p>Once in a while a large file transfer might stall or fail (with either <code>scp<\/code> or <code>rsync<\/code>). 
We can actually use rsync to finish the transfer!<\/p>\n<p>For this, we can use the <code>--partial<\/code> flag, which tells rsync to keep partially transferred files instead of deleting them, so the next run can pick up where the previous one stopped:<\/p>\n<pre><code class=\"hljs bash\">$ rsync --partial --progress largefile.ext username@hostname:\/path\/to\/largefile.ext<\/code><\/pre>\n<p><strong>The Archive Option:<\/strong><\/p>\n<p>There&#8217;s also a <code>-a<\/code> or <code>--archive<\/code> option, which is a handy shortcut for the options <code>-rlptgoD<\/code>:<\/p>\n<ul>\n<li><code>-r<\/code> &#8211; Copy recursively<\/li>\n<li><code>-l<\/code> &#8211; Copy symlinks as symlinks<\/li>\n<li><code>-p<\/code> &#8211; Preserve permissions<\/li>\n<li><code>-t<\/code> &#8211; Preserve modification times<\/li>\n<li><code>-g<\/code> &#8211; Preserve group<\/li>\n<li><code>-o<\/code> &#8211; Preserve owner (requires permission to change file ownership, usually root)<\/li>\n<li><code>-D<\/code> &#8211; Preserve <a href=\"http:\/\/en.wikipedia.org\/wiki\/Device_file\">special\/device files<\/a>. Same as <code>--devices --specials<\/code>. 
(creating device files usually requires root)<\/li>\n<\/ul>\n<pre><code class=\"hljs bash\"><span class=\"hljs-comment\"># Copy using the archive option and print some stats<\/span>\r\n$ rsync -a --stats \/<span class=\"hljs-built_in\">source<\/span>\/dir\/path username@hostname:\/destination\/dir\/path<\/code><\/pre>\n<p>1) tar + pigz over netcat<\/p>\n<p>On the source:<\/p>\n<pre><code class=\"hljs bash\">tar -cf - \/backup\/ | pv | pigz | nc -l 8888<\/code><\/pre>\n<p>On the destination:<\/p>\n<pre><code class=\"hljs bash\">nc master.active.ai 8888 | pv | pigz -d | tar xf - -C \/<\/code><\/pre>\n<p>2) tar + lz4 over SSH<\/p>\n<pre><code class=\"hljs bash\">time tar -c \/backup\/ | pv | lz4 -B4 | ssh -c aes128-ctr root@192.168.1.73 \"lz4 -d | tar -xC \/\"<\/code><\/pre>\n<p>3) copy files using netcat<\/p>\n<p>4) rsync (50 MB\/s)<\/p>\n<pre><code class=\"hljs bash\">time rsync -aHAXWxv --numeric-ids --no-i-r --info=progress2 -e \"ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x\" \/backup\/ root@192.168.1.73:\/backup\/<\/code><\/pre>\n<p>When copying to the local file system I always use the following rsync options:<\/p>\n<pre><code># rsync -avhW --no-compress --progress \/src\/ \/dst\/\r\n<\/code><\/pre>\n<p>Here&#8217;s my reasoning:<\/p>\n<pre><code>-a is for archive, which preserves ownership, permissions etc.\r\n-v is for verbose, so I can see what's happening (optional)\r\n-h is 
for human-readable, so the transfer rate and file sizes are easier to read (optional)\r\n-W is for copying whole files only, skipping the delta-transfer algorithm, which should reduce CPU load\r\n--no-compress, as there's no shortage of bandwidth between local devices\r\n--progress so I can see the progress of large files (optional)\r\n<\/code><\/pre>\n<p>5) tar over SSH (70 MB\/s)<\/p>\n<pre><code class=\"hljs bash\">time tar cvf - \/backup\/* | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 \"tar xf - -C \/\"<\/code><\/pre>\n<p>The same, with <code>pv<\/code> to show progress:<\/p>\n<pre><code class=\"hljs bash\">time tar cvf - \/backup\/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 \"tar xf - -C \/\"<\/code><\/pre>\n<p>The same, with <code>-S<\/code> so sparse files are handled efficiently:<\/p>\n<pre><code class=\"hljs bash\">time tar -cpSf - \/backup\/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 \"tar xf - -C \/\"<\/code><\/pre>\n<p>6) Split a gzip-compressed tar archive into 10 MB pieces<\/p>\n<pre><code class=\"hljs bash\">tar cvf - ubuntu.iso | gzip -9 - | split -b 10M -d - .\/disk\/ubuntu.tar.gz.<\/code><\/pre>\n<p>7) Parallel rsync: copy the directory structure first, then hand the files to several rsync processes<\/p>\n<pre><code class=\"hljs bash\">#!\/bin\/bash\r\n# SETUP OPTIONS\r\nexport SRCDIR=\"\/folder\/path\"\r\nexport DESTDIR=\"\/folder2\/path\"\r\nexport THREADS=\"8\"\r\n\r\n# RSYNC DIRECTORY STRUCTURE (include every directory, exclude everything else)\r\nrsync -zr -f\"+ *\/\" -f\"- *\" \"$SRCDIR\/\" \"$DESTDIR\/\"\r\n# THE FOLLOWING MAY BE FASTER BUT IS NOT AS FLEXIBLE:\r\n# cd \"$SRCDIR\"; find . -type d -print0 | cpio -0pdm \"$DESTDIR\/\"\r\n\r\n# FIND ALL FILES AND PASS THEM TO MULTIPLE RSYNC PROCESSES\r\ncd \"$SRCDIR\" &amp;&amp; find . ! -type d -print0 | xargs -0 -n1 -P\"$THREADS\" -I% rsync -az % \"$DESTDIR\/%\"\r\n\r\n# IF YOU WANT TO LIMIT THE IO PRIORITY,\r\n# PREPEND THE FOLLOWING TO THE rsync AND find COMMANDS ABOVE:\r\n#   ionice -c2<\/code><\/pre>\n<p>The same approach, copying to a remote host:<\/p>\n<pre><code class=\"hljs bash\"># Uses the SRCDIR, DESTDIR and THREADS settings above; DESTDIR is on remotehost.\r\n# aes128-ctr is used because arcfour is not available in current OpenSSH releases.\r\nrsync -zr -f\"+ *\/\" -f\"- *\" -e 'ssh -c aes128-ctr' \"$SRCDIR\/\" remotehost:\"$DESTDIR\/\" \\\\\r\n&amp;&amp; \\\\\r\ncd \"$SRCDIR\" &amp;&amp; find . ! -type d -print0 | xargs -0 -n1 -P\"$THREADS\" -I% rsync -az -e 'ssh -c aes128-ctr' % remotehost:\"$DESTDIR\/%\"<\/code><\/pre>\n<div>\n<header class=\"entry-header\">\n<h1 class=\"entry-title\">Parallelizing rsync<\/h1>\n<\/header>\n<div class=\"entry-content\">\n<p>Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the <a title=\"ILRI, Kenya Research Computing\" href=\"http:\/\/hpc.ilri.cgiar.org\/\">ILRI, Kenya Research Computing<\/a> cluster: two drives failed simultaneously on the underlying RAID5. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML.<\/p>\n<p>After replacing the failed disks, rebuilding the array, and formatting my bricks, I decided I would use <code>rsync<\/code> to pre-seed my bricks from the good node before bringing <code>glusterd<\/code> back up.<\/p>\n<p><strong>tl;dr<\/strong>: <code>rsync<\/code> is amazing, but it\u2019s single-threaded and struggles when you tell it to sync large directory hierarchies. <a href=\"https:\/\/mjanja.ch\/2014\/07\/parallelizing-rsync\/#sync_brick\">Here\u2019s how you can speed it up<\/a>.<\/p>\n<h2>rsync #fail<\/h2>\n<p>I figured syncing the brick hierarchy from the good node to the bad node was simple enough, so I stopped the <code>glusterd<\/code> service on the bad node and invoked:<\/p>\n<pre><code># rsync -aAXv --delete --exclude=.glusterfs storage0:\/path\/to\/bricks\/homes\/ storage1:\/path\/to\/bricks\/homes\/<\/code><\/pre>\n<p>After a day or so I noticed I had only copied ~1.5TB (over 1 hop on a dedicated 10GbE switch!), and I realized something must be wrong. 
I attached to the <code>rsync<\/code> process with <code>strace -p<\/code> and saw a bunch of system calls in one particular user\u2019s directory. I dug deeper:<\/p>\n<pre><code># find \/path\/to\/bricks\/homes\/ukenyatta\/maker\/genN_datastore\/ -type d | wc -l\r\n1398640<\/code><\/pre>\n<p>So this one particular directory in one user\u2019s home contained over a million <em>other<\/em> directories and $god knows how many files, and this command itself took several hours to finish! To make matters worse, careful trial-and-error inspection of other users\u2019 home directories revealed more massive directory structures as well.<\/p>\n<p>What we\u2019ve learned:<\/p>\n<ul>\n<li><code>rsync<\/code> is single-threaded<\/li>\n<li><code>rsync<\/code> generates a list of files to be synced <strong>before<\/strong> it starts the sync<\/li>\n<li><a title=\"MAKER - Department of Human Genetics - University of Utah\" href=\"http:\/\/www.yandell-lab.org\/software\/maker.html\">MAKER<\/a> creates a ton of output files\/directories<\/li>\n<\/ul>\n<p>It\u2019s pretty clear (now) that a recursive <code>rsync<\/code> on my huge directory hierarchy is out of the question!<\/p>\n<h2>rsync #winning<\/h2>\n<p>I had a look around and saw lots of people complaining about <code>rsync<\/code> being \u201cslow\u201d and others suggesting tips to speed it up. 
One very promising strategy was described on <a title=\"Parallel rsync\" href=\"http:\/\/web.archive.org\/web\/20140903090503\/https:\/\/wiki.ncsa.illinois.edu\/display\/~wglick\/Parallel+Rsync\">this wiki<\/a> and there\u2019s a great discussion in the comments.<\/p>\n<p>Basically, he describes a clever use of <code>find<\/code> and <code>xargs<\/code> to split up the problem set into smaller pieces that <code>rsync<\/code> can process more quickly.<\/p>\n<h2 id=\"sync_brick\">sync_brick.sh<\/h2>\n<p>So here\u2019s my adaptation of his script for the purpose of syncing failed GlusterFS bricks, <code>sync_brick.sh<\/code>:<\/p>\n<pre><code>#!\/usr\/bin\/env bash\r\n# borrowed \/ adapted from: https:\/\/wiki.ncsa.illinois.edu\/display\/~wglick\/Parallel+Rsync\r\n\r\n# RSYNC SETUP\r\nRSYNC_PROG=\/usr\/bin\/rsync\r\n# note the important use of --relative to use relative paths so we don't have to specify the exact path on dest\r\nRSYNC_OPTS=\"-aAXv --numeric-ids --progress --human-readable --delete --exclude=.glusterfs --relative\"\r\n# note: arcfour is not available in current OpenSSH releases; use a supported cipher (e.g. aes128-ctr) there\r\nexport RSYNC_RSH=\"ssh -T -c arcfour -o Compression=no -x\"\r\n\r\n# ENV SETUP\r\nSRCDIR=\/path\/to\/good\/brick\r\nDESTDIR=\/path\/to\/bad\/brick\r\n# Recommended: match the number of CPUs\r\nTHREADS=4\r\nBAD_NODE=server1\r\n\r\n# stop here if the source directory is missing\r\ncd \"$SRCDIR\" || exit 1\r\n\r\n# COPY\r\n# note the combination of -print0 and -0!\r\nfind . -mindepth 1 -maxdepth 1 -print0 | \\\\\r\n    xargs -0 -n1 -P$THREADS -I% \\\\\r\n        $RSYNC_PROG $RSYNC_OPTS \"%\" $BAD_NODE:$DESTDIR<\/code><\/pre>\n<p>Pay attention to the source\/destination paths, the number of <code>THREADS<\/code>, and the <code>BAD_NODE<\/code> name, then you should be ready to roll.<\/p>\n<h2>The Magic, Explained<\/h2>\n<p>It\u2019s a bit of magic, but here are the important parts:<\/p>\n<ul>\n<li>The <code>-aAXv<\/code> options to <code>rsync<\/code> tell it to <strong>archive<\/strong>, preserve <strong>ACLs<\/strong>, and preserve <strong>eXtended<\/strong> attributes. 
Extended attributes are <a title=\"What is this new .glusterfs directory in 3.3?\" href=\"http:\/\/joejulian.name\/blog\/what-is-this-new-glusterfs-directory-in-33\/\">critically important in GlusterFS &gt;= 3.3<\/a>, and also if you\u2019re using SELinux.<\/li>\n<li>The <code>--exclude=.glusterfs<\/code> option to <code>rsync<\/code> tells it to ignore this directory at the root of the brick, as the self-heal daemon (<code>glustershd<\/code>) will rebuild it based on the files\u2019 extended attributes once we restart the <code>glusterd<\/code> service.<\/li>\n<li>The <code>--relative<\/code> option to <code>rsync<\/code> saves us from constructing the destination path ourselves, as <code>rsync<\/code> recreates each relative source path (e.g. <code>.\/somedir<\/code>) under the top of our destination.<\/li>\n<li>The <code>RSYNC_RSH<\/code> options influence <code>rsync<\/code>\u2019s use of SSH, basically telling it to use very weak encryption and disable any unnecessary features for non-interactive sessions (tty, X11, etc.).<\/li>\n<li>Using <code>find<\/code> with <code>-mindepth 1<\/code> and <code>-maxdepth 1<\/code> just means we only look at the files\/directories directly inside <code>$SRCDIR<\/code>, so each top-level entry gets its own <code>rsync<\/code> process.<\/li>\n<li>Using <code>xargs<\/code> with <code>-n1<\/code> and <code>-P<\/code> tells it to use 1 argument per command line, and to run up to <code>$THREADS<\/code> processes at a time.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>There are many commands to copy a directory in Linux. In current Linux distributions the differences between them are very small: all of them can preserve links, timestamps and ownership, and all of them can handle sparse files.<\/p>\n<p>I tested each of them by copying a Linux kernel source tree, running every command twice and keeping the lower result. 
The original directory size is [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[73],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7665"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7665"}],"version-history":[{"count":5,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7665\/revisions"}],"predecessor-version":[{"id":7670,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/7665\/revisions\/7670"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7665"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7665"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7665"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}