
EC2 volume size incorrect

Expanding the Storage Space of an EBS Volume on Linux

The ext4 filesystem on /dev/xvdg (mounted on /databackup) was full. After the underlying EBS volume has been enlarged, resize2fs grows the filesystem online without unmounting it:

[root@data ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda1 xfs 859G 818G 42G 96% /
devtmpfs devtmpfs 4.1G 0 4.1G 0% /dev
tmpfs tmpfs 4.2G 0 4.2G 0% /dev/shm
tmpfs tmpfs 4.2G 26M 4.1G 1% /run
tmpfs tmpfs 4.2G 0 4.2G 0% /sys/fs/cgroup
/dev/xvdg ext4 212G 201G 4.1k 100% /databackup
/dev/mapper/vg_newlvm-centos7_newvol ext4 106G 20G 81G 20% /data1
tmpfs tmpfs 821M 0 821M 0% /run/user/1004
tmpfs tmpfs 821M 0 821M 0% /run/user/0

[root@data ~]# resize2fs /dev/xvdg
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/xvdg is mounted on /databackup; on-line resizing required
old_desc_blocks = 25, new_desc_blocks = 50
The filesystem on /dev/xvdg is now 104857600 blocks long.

[root@data ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda1 xfs 859G 818G 42G 96% /
devtmpfs devtmpfs 4.1G 0 4.1G 0% /dev
tmpfs tmpfs 4.2G 0 4.2G 0% /dev/shm
tmpfs tmpfs 4.2G 26M 4.1G 1% /run
tmpfs tmpfs 4.2G 0 4.2G 0% /sys/fs/cgroup
/dev/xvdg ext4 423G 201G 203G 50% /databackup
/dev/mapper/vg_newlvm-centos7_newvol ext4 106G 20G 81G 20% /data1
tmpfs tmpfs 821M 0 821M 0% /run/user/1004
tmpfs tmpfs 821M 0 821M 0% /run/user/0
[root@data ~]#
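
One note to add: resize2fs can only grow into space that the block device already offers. If the EBS volume itself still needs to be enlarged, that happens on the AWS side first; a minimal sketch with the AWS CLI (volume ID and target size are placeholders), plus the equivalent grow step for an XFS filesystem such as / above:

aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 400

# For an XFS filesystem (like / on /dev/xvda1 above), grow with xfs_growfs instead of resize2fs:
xfs_growfs /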

rsync: keep incremental data in sync

The following command copies only incremental (changed) data and keeps the destination in sync with the remote server.

  1. It copies only incremental data.
  2. It deletes data from the destination if it has been deleted at the source.
  3. It copies data again from the source if it has been deleted at the destination.
  4. In short, this command keeps both environments in sync.

rsync -avWe ssh --delete-before (source) root@localhost:(destination)
rsync -avW --delete-before -e ssh (source) root@localhost:(destination)

Example:

rsync -avWe ssh --delete-before /data root@192.168.1.4:/data
rsync -avW --delete-before -e ssh /data root@192.168.1.4:/data


To delete files in the target, add the --delete option to your command. For example:

rsync -avh source/ dest/ --delete
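
If you want this sync to run on a schedule, a hypothetical cron entry using the example command above could look like this (the log path is a placeholder):

0 * * * * rsync -avW --delete-before -e ssh /data root@192.168.1.4:/data >> /var/log/rsync-data.log 2>&1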

Apache Configure CORS Headers for Whitelist Domains

In the current implementation of Cross-Origin Resource Sharing (CORS), the Access-Control-Allow-Origin header can carry only a single origin or a wildcard as its value. This is not ideal when multiple clients connect to the same virtual server and you simply want to allow a list of known client domains.

Since only a single domain can be returned in the header, Apache must read the incoming Origin header and match it against the list of whitelisted (accepted) domains. If a match is found, the origin is echoed back to the client as the value of Access-Control-Allow-Origin.

Use the following configuration snippet in the Apache virtual host ".conf" file or in the site's ".htaccess" file. Ensure mod_headers and mod_setenvif (which provides SetEnvIfNoCase) are enabled.

<IfModule mod_headers.c>
   SetEnvIfNoCase Origin "https?://(www\.)?(domain\.com|staging\.domain\.com)(:\d+)?$" ACAO=$0
   Header set Access-Control-Allow-Origin %{ACAO}e env=ACAO
</IfModule>

The regular expression https?://(www\.)?(domain\.com|staging\.domain\.com)(:\d+)?$ is matched against the Origin header, which browsers send on cross-origin requests. The pattern matches both the http and https protocols, an optional www. subdomain, then the actual host names on your whitelist, followed by an optional port number. This example will therefore enable:

* http://domain.com
* https://domain.com
* http://www.domain.com
* https://www.domain.com
* http://staging.domain.com
* https://staging.domain.com
* http://www.staging.domain.com
* https://www.staging.domain.com

If you send a request from http://staging.domain.com/app/, the response would include the header:

Access-Control-Allow-Origin: http://staging.domain.com

If you sent another request from https://www.domain.com/client/, the response would include the header:

Access-Control-Allow-Origin: https://www.domain.com
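
To verify the behaviour from the command line, a quick check like the following (assuming curl is available and the vhost above is live) should echo the origin back in the response headers:

curl -sI -H "Origin: https://www.domain.com" https://domain.com/ | grep -i "access-control-allow-origin"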

Linux: include hidden files in tar archive

When you create a tar archive of a directory tree using a shell wildcard, hidden (dot) files at the top level are normally not included, because the shell’s * does not match them. Here’s how to include the hidden files.

Say you have a web directory called “/var/www/html/mysite/” that contains the following tree:

.htaccess
index.php
logo.jpg
style.css
admin_dir/.htaccess
admin_dir/includes.php
admin_dir/index.php

Normally you would use the tar command like this:

tar czf mysite.tar.gz /var/www/html/mysite/*

This way everything is tarred except the top-level .htaccess file: the shell’s * does not match dot files, although hidden files inside admin_dir/ are still included because the directory itself matches.

The solution is actually quite simple: replace the asterisk (*) with a dot (.):

tar czf mysite.tar.gz /var/www/html/mysite/.

This way the .htaccess files are included in the tarfile.
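
A small variation (not from the original note): using tar’s -C option changes into the directory first, so the archive does not store the leading /var/www/html/mysite path:

tar czf mysite.tar.gz -C /var/www/html/mysite .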

Reset your WordPress password via email

If you’ve forgotten your WordPress admin password, you can reset it via email from the WordPress dashboard login page following these steps:

  1. Go to your WordPress login page (example.com/wp-admin)
  2. Click on Lost your password? at the bottom
  3. Enter the Username or E-mail of your WordPress admin user, then click on Get New Password
  4. You should get an email with the subject [WordPress Site] Password Reset. The body of this email contains a link below the text To reset your password, visit the following address; go ahead and click on that link.
  5. Type in your New password, confirm it, then click on Reset Password

 

WordPress makes it super easy to reset your password. You can simply go to the login screen and click on the ‘Lost your password’ link.

You can also reset a WordPress password directly from phpMyAdmin:

Server: localhost » Database: wordpress » Table: wp_users

Edit the row for your admin user here to change the user name, the email address, or the password hash (user_pass).
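
If you prefer the command line over phpMyAdmin, the same change can be made with the mysql client; a minimal sketch, assuming the database is named wordpress, the table prefix is wp_, and the admin login is admin (WordPress accepts an MD5 hash here and upgrades it to a stronger hash on the next login):

mysql -u root -p wordpress -e "UPDATE wp_users SET user_pass = MD5('NewStrongPass123') WHERE user_login = 'admin';"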

Adding SSL to Apache on EC2 with Amazon Linux

These notes assume you have Apache installed and working on EC2 with Amazon Linux, but it’s fairly similar for other versions of Linux.

Install OpenSSL and the Apache Connector

// for Apache 2.2
yum install openssl mod_ssl
// for Apache 2.4
yum install openssl mod24_ssl
// restart Apache
service httpd restart

// or install the Apache 2.4 packages and mod_ssl together
yum install openssl mod24_ssl httpd24-tools httpd24

Test SSL

https://yourserver.com/

This will bring up the default self-signed certificate that was created when mod_ssl was installed.

Generate Key

cd  /etc/pki/tls/private
openssl genrsa -out domain-name.key 2048
chown root.root domain-name.key
chmod 600 domain-name.key

Generate Request

mkdir -p /ec2-user/domain-name/ssl
cd /ec2-user/domain-name/ssl
sudo openssl req -new -key /etc/pki/tls/private/domain-name.key -out domain-name.pem

Once the request has been generated and sent to your certificate authority, they will send back two .crt files: the domain certificate and the CA bundle certificate. You can rename them to domain-name.crt and domain-name-bundle.crt.

// put crt file on the server with correct permissions
cp domain-name.crt /etc/pki/tls/certs/domain-name.crt
chown root.root /etc/pki/tls/certs/domain-name.crt
chmod 600 /etc/pki/tls/certs/domain-name.crt
  
cp domain-name-bundle.crt /etc/pki/tls/certs/domain-name-bundle.crt
chown root.root /etc/pki/tls/certs/domain-name-bundle.crt
chmod 600 /etc/pki/tls/certs/domain-name-bundle.crt
  

It’s important to set these permissions on the files, or Apache and OpenSSL will not work.
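
Before restarting Apache it can be worth confirming that the certificate and the key actually belong together; a quick sanity check with openssl (the two checksums must be identical):

openssl x509 -noout -modulus -in /etc/pki/tls/certs/domain-name.crt | openssl md5
openssl rsa -noout -modulus -in /etc/pki/tls/private/domain-name.key | openssl md5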

Configure Apache SSL

// backup the conf file
cp /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.bkp
// edit the file
nano /etc/httpd/conf.d/ssl.conf


// search for the .key file line below and change the localhost.key

SSLCertificateKeyFile /etc/pki/tls/private/domain-name.key

// search for the .crt file line below and change the localhost.crt

SSLCertificateFile /etc/pki/tls/certs/domain-name.crt

// search for the bundle.crt file line below and point to the new bundle.crt

SSLCACertificateFile /etc/pki/tls/certs/domain-name-bundle.crt

// restart Apache
service httpd restart

This allows one SSL Domain on the server. If you want to have more than one SSL domain on the server it’s a bit more setup. I’ll cover that in a different post.

How to Improve rsync Performance

I need to transfer 10TB of data from one machine to another. Those 10TB of files live on a large RAID that spans 7 disks, and the target machine has another large RAID spanning 12 disks. It is not easy to copy those files locally, so I decided to copy them over the LAN.

Four options came to mind: scp, rsync, rsyncd (rsync as a daemon) and netcat.

scp

scp is handy and easy to use, but it comes with two disadvantages: it is slow and not fault-tolerant. Since scp offers the highest security, all data is encrypted before the transfer. This slows down the overall performance because of the extra encryption overhead (which makes the transferred data slightly larger) and the extra computational load (more CPU). If the transfer is interrupted, there is no easy way to resume the process other than transferring everything again. Here are some example commands:

#Source machine
#Typical speed is about 20 to 30MB/s
scp -r /data target_machine:/data

#Or you can enable the compression on the fly
#Depending on the type of your data, if your data is already compressed, you may see no or negative speed improvement
scp -rC /data target_machine:/data

rsync

rsync is similar to scp. It comes with encryption (via SSH), so the data is safe in transit. It also allows you to transfer only the newer files, which reduces the amount of data being transferred. However, it comes with a few disadvantages: a long decision time, encryption overhead, and extra computational cost (data comparison, encryption and decryption, etc.). For example, if I use rsync to transfer 10TB of files from one machine to another (where the target directory is blank), it can easily take 5 hours just to determine which files need to be transferred before the actual data transfer starts.

#Run on the target machine
rsync -avzr -e ssh --delete-after source_machine:/data/ /data/

#Use a less secure encryption algorithm to speed up the process
rsync -avzr --rsh="ssh -c blowfish" --delete-after source_machine:/data/ /data/

#Use an even less secure algorithm to get the top speed
rsync -avzr --rsh="ssh -c arcfour" --delete-after source_machine:/data/ /data/

#By default, rsync decides what to transfer based on file size and modification time,
#and its delta-transfer algorithm then checksums file contents block by block.
#--whole-file skips the delta-transfer algorithm and copies whole files, which reduces CPU load on a fast LAN.
rsync -avzr --rsh="ssh -c arcfour" --delete-after --whole-file source_machine:/data/ /data/

Anyway, no matter what you do, the top speed of rsync on a consumer-grade gigabit network is around 45MB/s. On average, the speed is around 25-35MB/s. Keep in mind that this number does not include the decision time, which can be a few hours.

rsyncd (rsync as a daemon)

Thanks to a reader’s comment, I got a chance to investigate running rsync as a daemon. The idea is similar to plain rsync. On the server, we run rsync as a service/daemon and specify which directory we want to “export” to the clients (e.g., /usr/ports). When files change on the server, it records the changes so that when the clients talk to the server, the decision time is shorter. Here is how to set up the rsync server on FreeBSD:

sudo nano /usr/local/etc/rsyncd.conf

And this is my configuration file:

pid file = /var/run/rsyncd.pid

#Notice that I use derrick here instead of other systems users, such as nobody
#That's because nobody does not have permission to access the path, i.e., /data/
#Either you make the source directory available to "nobody", or you change the daemon user.
uid = derrick
gid = derrick
use chroot = no
max connections = 4
syslog facility = local5
pid file = /var/run/rsyncd.pid

[mydata]
   path = /data/
   comment = data

Don't forget to include the following in /etc/rc.conf, so that the service will be started automatically:

rsyncd_enable="YES"

#Let's start the rsync service:
sudo /usr/local/etc/rc.d/rsyncd start

To pull the files from the server to the clients, run the following:

rsync -av myserver::mydata /data/

#Or you can enable compression
rsync -avz myserver::mydata /data/

To my surprise, it works much better than running rsync alone. Here is some data I collected while transferring 10TB of files from ZFS to ZFS:

Bandwidth measured on the client machine: 70MB/s

zpool IO speed on the client side: 75MB/s

P.S. Initially the speed was about 45-60MB/s; after I tweaked my zpool, I got the top speed up to 75-80MB/s. Please check out here for references.

I noticed that the decision time is much shorter than with plain rsync. The process is also much more stable, with none of the interruptions I used to see, e.g.:

rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(521) [receiver=3.1.0]
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(632) [generator=3.1.0]
rsync: [receiver] write error: Broken pipe (32)

NetCat

NetCat is similar to cat, except that it works at the network level. I decided to use netcat for the initial transfer; if it is interrupted, I let rsync kick in to finish the process. Netcat does not encrypt the data, so the overhead is very small. If you transfer files within a local network and you don’t care about security, netcat is a perfect choice.

There is only one disadvantage of using netcat: it handles a single stream at a time. That doesn’t mean you need to run netcat for every single file. Instead, we can tar the files before feeding them to netcat and untar them at the receiving end. As long as we do not compress the files, the CPU usage stays small.

#Open two terminals, one for the source and another one for the target machine.

#On the target machine:
#Go to the directory, e.g., 
cd /data

#Run the following:
nc -l 9999 | tar xvfp -

#On the source machine:
#Go to the directory, e.g.,
cd /data

#Pick a port number that is not being used, e.g., 9999
tar -cf - . | nc target_machine 9999

Unlike rsync, the process starts right away, and the maximum speed is around 45 to 60MB/s on a gigabit network.

Conclusion

Candidate  Top Speed (w/o compression)  Top Speed (w/ compression)  Resume  Stability  Instant Start?
scp        40MB/s                       25MB/s                      No      Low        Instant
rsync      25MB/s                       50MB/s                      Yes     Medium     Long preparation
rsyncd     30MB/s                       70MB/s                      Yes     High       Short preparation
netcat     60MB/s (tar w/o -z)          40MB/s (tar w/ -z)          No      Very high  Instant

 

#1 scp
   Pros: can be sped up by choosing a simpler cipher; can recursively copy directories
   Cons: none noted
#2 rsync
   Pros: flexible and convenient for directory synchronization
   Cons: possible, but not easy, to configure it NOT to use encryption
#3 sftp
   Pros: can be sped up by choosing a simpler cipher
   Cons: can’t recursively copy directories

Note: each of these tools consumes about 5-10% of CPU on both the sender and the receiver machine, apparently for encryption/decryption.

TEST DETAILS

These are the actual commands and generated output by these tools.

rsync

default "-a" --archive mode
"-z" compress

rsync -a --progress DIR2REPLICATE root@10.10.10.2:/tmp
   411533312   0%   45.38MB/s    0:26:04
  1407877120   1%   44.41MB/s    0:26:16
  1716748288   2%   42.09MB/s    0:27:36
  2002550784   2%   46.47MB/s    0:24:541
  2382397440   3%   45.31MB/s    0:25:24
  2762407936   3%   45.34MB/s    0:25:15
rsync -az --progress DIR2REPLICATE root@10.10.10.2:/tmp
991383915 100%   13.67MB/s    0:01:09
   990955265 100%   14.02MB/s    0:01:07
   202624740 100%   15.42MB/s    0:00:12
   202771784 100%   15.87MB/s    0:00:12
    91676674 100%   12.86MB/s    0:00:06
    91628045 100%   11.76MB/s    0:00:07
  1082301721 100%   16.86MB/s    0:01:01
  1081744094 100%   17.14MB/s    0:01:00
   444531263 100%   13.06MB/s    0:00:32
   444311917 100%   12.97MB/s    0:00:32
    25956199 100%   11.99MB/s    0:00:02
    25387962 100%   16.94MB/s    0:00:01
    94059363 100%   15.51MB/s    0:00:05
    94189273 100%   14.61MB/s    0:00:06
   369550738 100%   16.31MB/s    0:00:21
   370924791 100%   15.96MB/s    0:00:22
   143659839 100%   14.75MB/s    0:00:09
   141681760 100%   14.58MB/s    0:00:09
    74662680 100%   14.45MB/s    0:00:04
    73882769 100%   12.73MB/s    0:00:05
     1809543 100%   13.59MB/s    0:00:00
###
### "-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key
###
rsync -a -P -e "ssh -T -c arcfour -o Compression=no -x" DIR2REPLICATE root@10.10.10.2:/tmp
  1081744094 100%   65.35MB/s    0:00:15
444531263 100%   56.34MB/s    0:00:07
444311917 100%   61.61MB/s    0:00:06
369550738 100%   53.94MB/s    0:00:06
370924791 100%   60.03MB/s    0:00:05
23319017231 100%   65.89MB/s    0:05:37
23308793162 100%   64.88MB/s    0:05:42
11951287020 100%   65.68MB/s    0:02:53
3453648896  28%   68.11MB/s    0:02:0

scp

default "-r" recursive
"-C" compress, "-r" recursive
scp -r DIR2REPLICATE root@10.10.10.2:/tmp
100%  193MB  64.4MB/s   00:03
100%  424MB  60.6MB/s   00:07
100%  945MB  63.0MB/s   00:15
100%  945MB  59.1MB/s   00:16
100% 1032MB  64.5MB/s   00:16
100% 1032MB  60.7MB/s   00:17
100%  749MB  53.5MB/s   00:14
100% 1253MB  62.6MB/s   00:20
18% 4615MB  62.6MB/s   05:18
scp -Cr DIR2REPLICATE root@10.10.10.2:/tmp
100%  193MB  16.1MB/s   00:12
100%  424MB  14.6MB/s   00:29
100%  945MB  15.0MB/s   01:03
100%  945MB  14.8MB/s   01:04
100%  424MB  14.1MB/s   00:30
100%  352MB  17.6MB/s   00:20
100%  193MB  17.6MB/s   00:11
100%  135MB  16.9MB/s   00:08
100% 1032MB  17.8MB/s   00:58
100% 1032MB  17.8MB/s   00:58
100%  354MB  17.7MB/s   00:20
100%  749MB  18.3MB/s   00:41
100% 1253MB  18.4MB/s   01:08
  6% 1518MB  17.7MB/s   21:43
"-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key.
scp -c arcfour -r DIR2REPLICATE root@10.10.10.2:/tmp
100%  424MB 141.3MB/s   00:03
100%  945MB 135.0MB/s   00:07
100%  945MB 189.1MB/s   00:05
100%  424MB 141.2MB/s   00:03
100%  352MB 117.5MB/s   00:03
100% 1032MB 147.4MB/s   00:07
100% 1032MB 147.5MB/s   00:07
100%  749MB 149.8MB/s   00:05
100% 1253MB 156.6MB/s   00:08
100%   24GB 142.0MB/s   02:53
100%  595MB 119.1MB/s   00:05
100%   82GB 138.3MB/s   10:09
51% 9099MB 141.3MB/s   01:01
sftp

default behavior
"-R" to increase request queue length (default is 64)
"-B" to increase read/write request size (default is 32 KB)
sftp  root@10.10.10.2:/tmp
10% 2363MB  57.8MB/s   05:43 ETA
15% 3349MB  58.1MB/s   05:25 ETA
32% 7311MB  59.3MB/s   04:11 ETA
35% 7803MB  60.6MB/s   03:58 ETA
43% 9594MB  62.1MB/s   03:23 ETA
69%   15GB  58.6MB/s   01:55 ETA
77%   17GB  62.1MB/s   01:20 ETA
sftp  -R 128 -B 65536 root@10.10.10.2:/tmp
  2%  551MB  58.9MB/s   06:08 ETA
  8% 1806MB  62.3MB/s   05:28 ETA
41% 9170MB  60.6MB/s   03:35 ETA
56%   12GB  62.6MB/s   02:32 ETA
100%   22GB  62.5MB/s   05:56
"-c arcfour" cipher is defined in RFC 4253; it is plain RC4 with a 128-bit key.
sftp -oCiphers=arcfour root@10.10.10.2:/tmp
3%  711MB 142.5MB/s   02:31 ETA
18% 4115MB 146.0MB/s   02:04 ETA
23% 5156MB 148.1MB/s   01:55 ETA
28% 6379MB 144.6MB/s   01:49 ETA
34% 7672MB 144.0MB/s   01:41 ETA
37% 8389MB 143.7MB/s   01:36 ETA
62%   14GB 143.8MB/s   00:58 ETA
85%   19GB 142.4MB/s   00:22 ETA
92%   20GB 142.3MB/s   00:12 ETA
100%   22GB 144.4MB/s   02:34

TEST ENVIRONMENT

The test was performed between two servers interconnected by a private 10 Gbit link with a 9000 MTU ("jumbo frames"). The copied files were large (hundreds of GB) binary files.

iperf network bandwidth test between 10.10.10.2 and 10.10.10.1
Network interface configuration (10 Gbit, MTU 9000)
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
————————————————————
[  4] local 10.10.10.2 port 5001 connected with 10.10.10.1 port 57279
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  11.5 GBytes  9.89 Gbits/sec
[  5]  0.0-30.0 sec  34.6 GBytes  9.89 Gbits/sec
[  4]  0.0- 0.9 sec  1000 MBytes  9.80 Gbits/sec
[  5]  0.0- 8.8 sec  9.77 GBytes  9.53 Gbits/sec
[  4]  0.0- 8.7 sec  10.0 GBytes  9.89 Gbits/sec
[  5]  0.0-86.8 sec   100 GBytes  9.89 Gbits/sec
[root@10.10.10.2]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE="bond0"
BOOTPROTO=none
IPADDR=10.10.10.2
NETMASK=255.255.255.192
ONBOOT="yes"
USERCTL=no
TXQUEUELEN=100000
MTU=8912
BONDING_OPTS="mode=1 miimon=200 primary=eth0"

 

ALTERNATIVE METHODS TO MOVE DATA FAST

Copy directories via netcat: tar | nc. This yields roughly 250 MB/s (about 1 TB/hr).

### On receiver ###
nc -v -l 5555 | tar -xvf -

### On sender: test2del - a large directory to move ###
time tar -cvf - test2del | nc -v 10.100.100.2 5555

### Output calculated ###
11GB in  27.465 s = 293 MB/s
42GB in 2m51.513s = 249 MB/s (~1 TB/hr)
42GB in 2m50.630s = 251 MB/s (~1 TB/hr)

Replacing IP Address in Apache2 config files with SED

Suppose I have just mirrored my VPS (starting from a clone and then rsync-ing all the needed files). Obviously I need to change the IP address contained in all the config files, but I’m lazy.
So let’s use sed to do it all at once, with a single-line command.
I need to replace the IP address “192.168.100.5” with “192.168.100.4” in all files contained in /etc/apache2/*.

Our command for one file should be:

$ sed -i 's/192.168.100.5/192.168.100.4/g' /etc/apache2/sites-available/default

We want to do this on a bunch of files that contain that string, but sed on its own will not recurse through directories, so we run it through a loop.
For this purpose we can use the Linux find utility, ending up with sed inside the loop. The command looks like this:

$ find /etc/apache2/sites-available/ -type f -exec sed -i 's/192\.168\.100\.5/192\.168\.100\.4/g' {} \;
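
A quick way to confirm nothing was missed (a hypothetical check; adjust the pattern to whichever address you replaced):

grep -rl '192\.168\.100\.5' /etc/apache2/ || echo "no files still contain the old address"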

And while we are here: if you obtained the backup machine from a clone of the production machine and you are maintaining the filesystem in sync, this is what you should do with the relevant files once you have finished syncing.

 

sed -i -r 's/192.168.1.35$/192.168.1.14/g'

find . -type f -exec sed -i 's/192\.168\.1\.35/192\.168\.1\.14/g' {} \;

rsync

There are many commands to copy a directory in Linux. The differences between them on a current Linux distribution are very small. All of them support preserving links, times, ownership, and sparse files.

I tested them by copying a Linux kernel source tree. I ran each command twice and kept the lower result.
The original directory size is 639660032 bytes. All methods produce exactly the same size of 675446784 bytes without the sparse option.

        Non-sparse                            Sparse
rsync   rsync -a src /tmp                     rsync -a -S src /tmp
cpio    find src -depth | cpio -pdm /tmp      find src -depth | cpio -pdm --sparse /tmp
cp      cp -a --sparse=never src /tmp         cp -a --sparse=always src /tmp
tar     tar -c src | tar -x -C /tmp           tar -c -S src | tar -x -C /tmp

SCP: Secure Copy

Secure Copy is just like the cp command, but secure. More importantly, it has the ability to send files to remote servers via SSH!

Copy a file to a remote server:

# Copy a file:
$ scp /path/to/source/file.ext username@hostname.com:/path/to/destination/file.ext

# Copy a directory:
$ scp -r /path/to/source/dir username@server-host.com:/path/to/destination

This will attempt to connect to hostname.com as user username. It will ask you for a password if there’s no SSH key setup (or if you don’t have a password-less SSH key setup between the two computers). If the connection is authenticated, the file will be copied to the remote server.

Since this works just like SSH (using SSH, in fact), we can add flags normally used with the SSH command as well. For example, you can add the -v and/or -vvv to get various levels of verbosity in output about the connection attempt and file transfer.

You can also use the -i (identity file) flag to specify an SSH identity file to use:

$ scp -i ~/.ssh/some_identity.pem /path/to/source/file.ext username@hostname:/path/to/destination/file.ext

Here are some other useful flags:

  • -p (lowercase) – Preserves modification times, access times, and modes from the original file
  • -P – Choose an alternate port
  • -c (lowercase) – Choose another cipher other than the default AES-128 for encryption
  • -C – Compress files before copying, for faster upload speeds (already compressed files are not compressed further)
  • -l – Limit bandwidth used, in kilobits per second (8 bits to a byte!).
    • e.g. Limit to 50 KB/s: scp -l 400 ~/file.ext user@host.com:~/file.ext
  • -q – Quiet output
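
As a hypothetical example combining a few of the flags above: preserve times and modes, compress, connect on port 2222, and cap the transfer at roughly 100 KB/s (800 kilobits per second):

$ scp -p -C -P 2222 -l 800 ~/file.ext username@hostname.com:~/file.ext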

Rsync: Sync Files Across Hosts

Rsync is another secure way to transfer files. Rsync has the ability to detect file differences, giving it the opportunity to save bandwidth and time when transferring files.

Just like scp, rsync can use SSH to connect to remote hosts and send/receive files from them. The same (mostly) rules and SSH-related flags apply for rsync as well.

Copy files to a remote server:

# Copy a file
$ rsync /path/to/source/file.ext username@hostname.com:/path/to/destination/file.ext

# Copy a directory:
$ rsync -r /path/to/source/dir username@hostname.com:/path/to/destination/dir

To use a specific SSH identity file and/or SSH port, we need to do a little more work. We’ll use the -e flag, which lets us choose/modify the remote shell program used to send files.

# Send files over SSH on port 8888 using a specific identity file:
$ rsync -e 'ssh -p 8888 -i /home/username/.ssh/some_identity.pem' /source/file.ext username@hostname:/destination/file.ext

Here are some other common flags to use:

  • -v – Verbose output
  • -z – Compress files
  • -c – Compare files based on checksum instead of mod-time (create/modified timestamp) and size
  • -r – Recursive
  • -S – Handle sparse files efficiently
  • Symlinks:
    • -l – Copy symlinks as symlinks
    • -L – Transform symlink into referent file/dir (copy the actual file)
  • -p – Preserve permissions
  • -h – Output numbers in a human-readable format
  • --exclude="" – Files to exclude
    • e.g. Exclude the .git directory: --exclude=".git"

There are many other options as well – you can do a LOT with rsync!

Do a Dry-Run:

I often do a dry-run of rsync to preview what files will be copied over. This is useful for making sure your flags are correct and you won’t overwrite files you don’t wish to:

For this, we can use the -n or --dry-run flag:

# Copy the current directory
$ rsync -vzcrSLhp --dry-run ./ username@hostname.com:/var/www/some-site.com
#> building file list ... done
#> ... list of directories/files and some meta data here ...

Resume a Stalled Transfer:

Once in a while a large file transfer might stall or fail (while either using scp or rsync). We can actually use rsync to finish a file transfer!

For this, we can use the --partial flag, which tells rsync to not delete partially transferred files but keep them and attempt to finish its transfer on a next attempt:

$ rsync --partial --progress largefile.ext username@hostname:/path/to/largefile.ext

The Archive Option:

There’s also a -a or --archive option, which is a handy shortcut for the options -rlptgoD:

  • -r – Copy recursively
  • -l – Copy symlinks as symlinks
  • -p – Preserve permissions
  • -t – Preserve modification times
  • -g – Preserve group
  • -o – Preserve owner (User needs to have permission to change owner)
  • -D – Preserve special/device files. Same as --devices --specials. (User needs permissions to do so)

# Copy using the archive option and print some stats
$ rsync -a --stats /source/dir/path username@hostname:/destination/dir/path


1) tar + pigz + netcat

Copy from the source:

tar -cf - /backup/ | pv | pigz | nc -l 8888

On the destination:

nc master.active.ai 8888 | pv | pigz -d | tar xf - -C /

2) tar + lz4 over ssh

time tar -c /backup/ | pv | lz4 -B4 | ssh -c aes128-ctr root@192.168.1.73 "lz4 -d | tar -xC /backup"

3) copy files using netcat

4) rsync

~50 MB/sec

rsync -aHAXWxv --numeric-ids --no-i-r --info=progress2 -e "ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x" /backup/ root@192.168.1.73:/backup/

time rsync -aHAXWxv --numeric-ids --no-i-r --info=progress2 -e "ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x" /backup/ root@192.168.1.73:/backup/


When copying to the local file system I always use the following rsync options:

# rsync -avhW --no-compress --progress /src/ /dst/

Here’s my reasoning:

-a is for archive, which preserves ownership, permissions etc.
-v is for verbose, so I can see what's happening (optional)
-h is for human-readable, so the transfer rate and file sizes are easier to read (optional)
-W is for copying whole files only, without delta-xfer algorithm which should reduce CPU load
--no-compress as there's no lack of bandwidth between local devices
--progress so I can see the progress of large files (optional)

~70 MB/sec

5) tar over ssh

time tar cvf - /backup/* | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

time tar cvf - /backup/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

time tar -cpSf - /backup/* | pv | ssh -T -c chacha20-poly1305@openssh.com,aes192-cbc -o Compression=no -x root@192.168.1.73 "tar xf - -C /"

6) tar + gzip + split (into 10MB chunks)

tar cvf - ubuntu.iso | gzip -9 - | split -b 10M -d - ./disk/ubuntu.tar.gz.



#!/bin/bash
# SETUP OPTIONS
export SRCDIR="/folder/path"
export DESTDIR="/folder2/path"
export THREADS="8"
# RSYNC DIRECTORY STRUCTURE
rsync -zr -f"+ */" -f"- *" $SRCDIR/ $DESTDIR/
# FOLLOWING MAYBE FASTER BUT NOT AS FLEXIBLE
# cd $SRCDIR; find . -type d -print0 | cpio -0pdm $DESTDIR/
# FIND ALL FILES AND PASS THEM TO MULTIPLE RSYNC PROCESSES
cd $SRCDIR  &&  find . ! -type d -print0 | xargs -0 -n1 -P$THREADS -I% rsync -az % $DESTDIR/%
# IF YOU WANT TO LIMIT THE IO PRIORITY, 
# PREPEND THE FOLLOWING TO THE rsync & cd/find COMMANDS ABOVE:
#   ionice -c2
rsync -zr -f"+ */" -f"- *" -e 'ssh -c arcfour' $SRCDIR/ remotehost:/$DESTDIR/ \
&& \
cd $SRCDIR  &&  find . ! -type d -print0 | xargs -0 -n1 -P$THREADS -I% rsync -az -e 'ssh -c arcfour' % remotehost:/$DESTDIR/% 

Parallelizing rsync

Last week I had a massive hardware failure on one of the GlusterFS storage nodes in the ILRI, Kenya Research Computing cluster: two drives failed simultaneously on the underlying RAID5. As RAID5 can only withstand one drive failure, the entire 31TB array was toast. FML.

After replacing the failed disks, rebuilding the array, and formatting my bricks, I decided I would use rsync to pre-seed my bricks from the good node before bringing glusterd back up.

tl;dr: rsync is amazing, but it’s single threaded and struggles when you tell it to sync large directory hierarchies. Here’s how you can speed it up.

rsync #fail

I figured syncing the brick hierarchy from the good node to the bad node was simple enough, so I stopped the glusterd service on the bad node and invoked:

# rsync -aAXv --delete --exclude=.glusterfs storage0:/path/to/bricks/homes/ storage1:/path/to/bricks/homes/

After a day or so I noticed I had only copied ~1.5TB (over 1 hop on a dedicated 10GbE switch!), and I realized something must be wrong. I attached to the rsync process with strace -p and saw a bunch of system calls in one particular user’s directory. I dug deeper:

# find /path/to/bricks/homes/ukenyatta/maker/genN_datastore/ -type d | wc -l
1398640

So this one particular directory in one user’s home contained over a million other directories and $god knows how many files, and this command itself took several hours to finish! To make matters worse, careful trial and error inspection of other user home directories revealed more massive directory structures as well.

What we’ve learned:

  • rsync is single threaded
  • rsync generates a list of files to be synced before it starts the sync
  • MAKER creates a ton of output files/directories

It’s pretty clear (now) that a recursive rsync on my huge directory hierarchy is out of the question!

rsync #winning

I had a look around and saw lots of people complaining about rsync being “slow” and others suggesting tips to speed it up. One very promising strategy was described on this wiki and there’s a great discussion in the comments.

Basically, he describes a clever use of find and xargs to split up the problem set into smaller pieces that rsync can process more quickly.

sync_brick.sh

So here’s my adaptation of his script for the purpose of syncing failed GlusterFS bricks, sync_brick.sh:

#!/usr/bin/env bash
# borrowed / adapted from: https://wiki.ncsa.illinois.edu/display/~wglick/Parallel+Rsync

# RSYNC SETUP
RSYNC_PROG=/usr/bin/rsync
# note the important use of --relative to use relative paths so we don't have to specify the exact path on dest
RSYNC_OPTS="-aAXv --numeric-ids --progress --human-readable --delete --exclude=.glusterfs --relative"
export RSYNC_RSH="ssh -T -c arcfour -o Compression=no -x"

# ENV SETUP
SRCDIR=/path/to/good/brick
DESTDIR=/path/to/bad/brick
# Recommend to match # of CPUs
THREADS=4
BAD_NODE=server1

cd $SRCDIR

# COPY
# note the combination of -print0 and -0!
find . -mindepth 1 -maxdepth 1 -print0 | \
    xargs -0 -n1 -P$THREADS -I% \
        $RSYNC_PROG $RSYNC_OPTS "%" $BAD_NODE:$DESTDIR

Pay attention to the source/destination paths, the number of THREADS, and the BAD_NODE name, then you should be ready to roll.

The Magic, Explained

It’s a bit of magic, but here are the important parts:

  • The -aAXv options to rsync tell it to archive, preserve ACLs, and preserve eXtended attributes. Extended attributes are critically important in GlusterFS >= 3.3, and also if you’re using SELinux.
  • The --exclude=.glusterfs option to rsync tells it to ignore this directory at the root of the brick, as the self-heal daemon (glustershd) will rebuild it based on the files’ extended attributes once we restart the glusterd service.
  • The --relative option to rsync is so we don’t have to bother constructing the destination path, as rsync will imply the path is relative to our destination’s top.
  • The RSYNC_RSH options influence rsync‘s use of SSH, basically telling it to use very weak encryption and disable any unnecessary features for non-interactive sessions (tty, X11, etc).
  • Using find with -mindepth 1 and -maxdepth 1 just means we concentrate on files/directories 1 level below each directory in our immediate hierarchy.
  • Using xargs with -n1 and -P tells it to use 1 argument per command line, and to launch $THREADS number of processes at a time.

Migrating to Amazon Linux 2

AWS also announced that Amazon Linux 2018.03 is the last release for the current generation of Amazon Linux and will be supported until June 30, 2020. Therefore, you have to come up with a migration plan.

Amazon Linux 2 comes with the same benefits as Amazon Linux, but it adds some new capabilities:

  • long-term support: Amazon Linux 2 supports each LTS release for five years
  • on-premises support: virtual machine images for on-premises development and testing are available
  • systemd: replaces SysVinit
  • extras library: provides up-to-date versions of software bundles such as nginx

Let’s dive into some of the changes in more detail. At the end of the post, I will also outline some pitfalls I encountered when migrating our Free Templates for AWS CloudFormation to Amazon Linux 2.

Further reading: Release Notes, FAQs, AWS Blog Post, Announcement

Long-term support

Amazon Linux delivers a continuous flow of updates that lets you roll from one version of the Amazon Linux AMI to the most recent. A yum update always moves your system to the latest Amazon Linux version. There were no fixed versions of Amazon Linux, only snapshots.

Amazon Linux 2 changes this. You will have Amazon Linux 2 versions that are supplied with updates for five years. Once a new Amazon Linux 2 LTS release becomes available, no breaking changes will be introduced by AWS for this release.

systemd

Amazon Linux uses SysVinit to bootstrap the Linux user space and to manage system processes after booting. This procedure is usually called init. One of the major drawbacks of SysVinit is that it starts tasks serially, waiting for each to finish loading before moving on to the next. This can result in long delays during boot.

Amazon Linux 2 uses systemd as the init system. systemd executes elements of its startup sequence in parallel, which is faster than the traditional serial approach from SysVinit. systemd can also ensure that a service is running (e.g., it restarts a service if it crashed).

systemd is not just the name of the init system daemon but also refers to the entire software bundle around it, which includes:

  • journald: responsible for event logging (replaces syslog)
  • udevd: device manager for the Linux kernel, which handles the /dev directory and all user space actions when adding/removing devices
  • logind: manages user logins and seats in various ways.

I will not cover udevd and logind in this post; as a normal user you will rarely interact with them directly. Keep in mind that network configuration is not controlled by networkd (also part of the systemd software bundle). Instead, it is controlled by cloud-init, which is triggered by systemd several times during boot. cloud-init handles the early initialization of an EC2 instance (and also works with other vendors).

Further reading: systemd man page

Reading logs from journald

To read all system logs (the journal, in journald terminology), starting with the oldest entry, run journalctl. The output is paged through less by default, which means you can scroll one line down/up with the DOWN/UP arrow keys, or a full page with the SPACE/b keys. Press the q key to quit. To reverse the order, run journalctl -r.

To show only the most recent journal entries, and continuously print new entries, run journalctl -f (like a tail -f).

There are many ways to filter the output. Based on priority, run journalctl -p err to get levels alert, crit, and err (using syslog log levels). Based on the unit, run journalctl -u sshd to get all entries for sshd. Check the further reading links for more information.
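
Collected as a quick reference, these are the same commands described above:

journalctl            # all journal entries, oldest first
journalctl -r         # newest entries first
journalctl -f         # follow new entries, like tail -f
journalctl -p err     # only alert, crit, and err priorities
journalctl -u sshd    # only entries for the sshd unit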

Keep in mind that some applications still write logs to /var/log. Journald also forwards logs to rsyslog which is configured (/etc/rsyslog.conf) to write some of them to files in /var/log.

Further reading: journalctl man page

Controlling systemd services

To start a service (unit in systemd terminology), you run:

systemctl start awslogsd.service

To make sure a service (unit in systemd terminology) is started during boot/reboot, you run:

systemctl enable awslogsd.service

There are many other commands. E.g., you can also reboot the system:

systemctl reboot
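
Two more standard systemctl subcommands that are often handy when checking a unit (plain systemd, nothing Amazon Linux 2 specific):

systemctl status awslogsd.service      # current state plus the most recent log lines
systemctl is-enabled awslogsd.service  # whether the unit is set to start at boot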

Further reading: systemctl man page

Extras Library

The Extras Library (aka Amazon Linux Extras Repository or Extras mechanism) provides a way to install up-to-date software bundles (topics in Amazon Linux 2 terminology) without impacting the stability of the rest of the operating system.

Extras Library is not covered by LTS!

To get a list of available topics, run:

$ amazon-linux-extras list
  0  ansible2                 available  [ =2.4.2 ]
  1  emacs                    available  [ =25.3 ]
  2  memcached1.5             available  [ =1.5.1 ]
  3  nginx1.12                available  [ =1.12.2 ]
  4  postgresql9.6            available  [ =9.6.6  =9.6.8 ]
  5  python3                  available  [ =3.6.2 ]
  6  redis4.0                 available  [ =4.0.5 ]
  7  R3.4                     available  [ =3.4.3 ]
  8  rust1                    available  [ =1.22.1  =1.26.0 ]
  9  vim                      available  [ =8.0 ]
 10  golang1.9                available  [ =1.9.2 ]
 11  ruby2.4                  available  [ =2.4.2  =2.4.4 ]
 12  nano                     available  [ =2.9.1 ]
 13  php7.2                   available  [ =7.2.0  =7.2.4  =7.2.5 ]
 14  lamp-mariadb10.2-php7.2  available  [ =10.2.10_7.2.0  =10.2.10_7.2.4  =10.2.10_7.2.5 ]
 15  libreoffice              available  [ =5.0.6.2_15 ]
 16  gimp                     available  [ =2.8.22 ]
 17  docker=latest            enabled    [ =17.12.1  =18.03.1 ]
 18  mate-desktop1.x          available  [ =1.19.0  =1.20.0 ]
 19  GraphicsMagick1.3        available  [ =1.3.29 ]
 20  tomcat8.5                available  [ =8.5.31 ]

To install a topic, run amazon-linux-extras install <topic> (e.g., amazon-linux-extras install ruby2.4).

If you install (or only enable) a topic, a new repository (plus two more for sources and debuginfo) is configured in /etc/yum.repos.d/amzn2-extras.repo.
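
As a sketch of the enable-then-install flow (the topic name is taken from the nginx example later in this post; the yum clean step may not be strictly required):

amazon-linux-extras enable nginx1.12
yum clean metadata    # refresh yum's view of the newly enabled repository
yum install -y nginx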

Pitfalls

I migrated Free Templates for AWS CloudFormation to Amazon Linux 2. In the following, I will outline the problems I was faced with and how I worked around them.

The awslogs agent was renamed

The awslogs agent was renamed to awslogsd but you still install it via yum install awslogs.

You can start (activate in systemd terminology) awslogs with systemctl start awslogsd.service (shortcut: systemctl start awslogsd).

The awslogs agent does not support journald

The awslogs agent cannot read logs directly from the journal. journald forwards all logs to rsyslog, which is configured (/etc/rsyslog.conf) to write some of them to files in /var/log, from where the awslogs agent can pick them up.

Where are the log files?

/var/log does not contain all system logs anymore.

If in doubt, you can access all system logs with journalctl.

Ruby is missing

Ruby is no longer installed by default. This breaks cfn-init if you want to install RubyGems.

You can install Ruby 2.0 with yum install ruby or Ruby 2.4 with amazon-linux-extras install ruby2.4.

netcat is missing

netcat (or nc) is no longer installed by default.

You can install ncat with yum install nmap-ncat, but this installs the nmap-based ncat, which behaves differently (e.g., there is no -z flag anymore).

Nginx package not available by default

nginx is no longer part of the default repository.

$ yum install nginx
Failed to set locale, defaulting to C
Loaded plugins: langpacks, update-motd
No package nginx available.
Error: Nothing to do

To install nginx, use the new Amazon Linux Extras Repository amazon-linux-extras install nginx1.12.

EPEL repository is missing

The EPEL repository (Extra Packages for Enterprise Linux) is no longer installed by default or available to install. The Extras Library replaces the EPEL repository but contains only a fraction of the packages, which may cause trouble during your migration.

NAT and ECS optimized AMIs are missing

NAT and ECS optimized AMIs are not available. You can replace your NAT instances with NAT Gateways to get around this problem, but for ECS workloads there is no easy workaround. I advise waiting for news from AWS regarding the ECS optimized AMI.

cfn-init is not integrated with the Extras Library

You cannot easily install packages from the Extras Library with the package mechanism in cfn-init. cfn-init is the way to install software onto EC2 instances managed by CloudFormation.

You can either run amazon-linux-extras enable <topic> before running cfn-init, which can then install the package using the package mechanism. Or you can use two config sets: the first config set uses the command mechanism to enable the topic, and the second uses the package mechanism to install the now-available package. You have to use two config sets because commands run after package installation within a single config set. Here is an example:

AutoScalingGroup:
  Type: 'AWS::AutoScaling::AutoScalingGroup'
  Properties:
    # [...]
LaunchConfiguration:
  Type: 'AWS::AutoScaling::LaunchConfiguration'
  Metadata:
    'AWS::CloudFormation::Init':
      configSets:
        default: [extras, config]
      extras:
        commands:
          a_enable_nginx:
            command: 'amazon-linux-extras enable nginx1.12'
            test: "! grep -Fxq '[amzn2extra-nginx1.12]' /etc/yum.repos.d/amzn2-extras.repo"
      config:
        packages:
          yum:
            nginx: [] # will install nginx1.12
  Properties:
    # [...]
    UserData:
      'Fn::Base64': !Sub |
        #!/bin/bash -x
        /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource LaunchConfiguration --region ${AWS::Region}
        /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}

Summary

Amazon Linux 2 is the new default for running Linux workloads on AWS. Amazon Linux 2 benefits from systemd, LTS, and the new Extras Library. There are a few pain points when migrating, most notably the missing EPEL repository. Besides that, you should spend some time understanding how systemd works, because it is central to modern Linux operating systems.

It’s time to plan your migration from Amazon Linux now!