{"id":2407,"date":"2013-07-24T11:17:31","date_gmt":"2013-07-24T03:17:31","guid":{"rendered":"http:\/\/rmohan.com\/?p=2407"},"modified":"2013-07-24T11:18:42","modified_gmt":"2013-07-24T03:18:42","slug":"how-to-sync-files-to-amazon-s3-on-linux","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=2407","title":{"rendered":"How to Sync Files to Amazon S3 on Linux"},"content":{"rendered":"<p>Amazon&#8217;s Simple Storage Service (S3) has a lot to like. It&#8217;s cheap, can be used for storing a little bit of data or as much as you want, and it can be used for distributing files publicly or just storing your private data. Let&#8217;s look at how you can take advantage of Amazon S3 on Linux.<\/p>\n<p>Amazon S3 isn&#8217;t what you&#8217;d want to use for storing just a little bit of personal data. For that, you might want to use Dropbox, SpiderOak, ownCloud, or SparkleShare. Which one depends on how much data, your tolerance for non-free software, and which features you prefer. For my work files, I use Dropbox \u2013 in large part because of its LAN sync feature.<\/p>\n<p>But S3 is really good if you need to make backups of a large amount of data, or smaller amounts but you need an offsite backup. It&#8217;s also good if you want to use S3 to host files for public distribution and don&#8217;t have a server or need to offload data sharing because of capacity issues. Maybe you just want to\u00a0<a href=\"http:\/\/www.ianwootten.co.uk\/2011\/09\/09\/hosting-an-octopress-blog-on-amazon-s3\">use it to host a blog, cheaply<\/a>. S3 also has some nifty features for content distribution and data storage from multiple regions, which we&#8217;ll get into another time.<\/p>\n<h2>Getting the Tools<\/h2>\n<p>You can use S3 in a number of ways on Linux, depending on how you&#8217;d like to manage your backups. 
If you look around, you&#8217;ll find a bunch of tools that support S3, including:<\/p>\n<ul>\n<li><a href=\"http:\/\/s3tools.org\/\">S3 Tools<\/a><\/li>\n<li><a href=\"http:\/\/duplicity.nongnu.org\/index.html\">Duplicity<\/a><\/li>\n<li><a href=\"http:\/\/live.gnome.org\/DejaDup\">Deja Dup<\/a><\/li>\n<li><a href=\"http:\/\/www.dragondisk.com\/\">DragonDisk<\/a><\/li>\n<\/ul>\n<p>S3 Tools and Duplicity are command line utilities that support S3. S3 Tools, as the name implies, focuses on Amazon S3. Duplicity has S3 support, but also supports several other methods of transferring files. Deja Dup is a fairly simple GNOME app for backups, which has S3 support thanks to Duplicity. DragonDisk is a freeware (but not free software) utility that provides more fine-grained control of backups to S3. It also supports Google Cloud Storage and other cloud storage services.<\/p>\n<p>For the purposes of this article, I&#8217;m going to focus on S3 Tools. If you&#8217;re a GNOME user, it should take very little effort to set up Deja Dup for S3. We&#8217;ll tackle Duplicity and DragonDisk another time.<\/p>\n<h2>S3 Tools<\/h2>\n<p>You might find S3 Tools in your distribution&#8217;s repositories. If not, the S3 Tools folks have\u00a0<a href=\"http:\/\/s3tools.org\/repositories\">package repositories<\/a>\u00a0with packages for several versions of Red Hat, CentOS, Fedora, openSUSE, SUSE Linux Enterprise, Debian, and Ubuntu. You&#8217;ll also find instructions on adding the tools on the package repositories page.<\/p>\n<p>Once you have S3 Tools installed, you need to configure it with your Amazon S3 credentials. If you haven&#8217;t signed up for them yet, hit the\u00a0<strong>Sign Up<\/strong>\u00a0button at the top of\u00a0<a href=\"http:\/\/aws.amazon.com\/s3\/\">the S3 overview page<\/a>. 
You&#8217;ll also want to look at the\u00a0<a href=\"http:\/\/aws.amazon.com\/s3\/#pricing\">pricing<\/a>, which starts at $0.125 per GB per month.<\/p>\n<p>The\u00a0<a href=\"http:\/\/calculator.s3.amazonaws.com\/calc5.html\">pricing calculator<\/a>\u00a0can help you get an idea of how much it would cost to store your data in S3. For example, if you&#8217;re storing 100GB in S3, it would run about $12.50 per month &#8211; before any costs for data transfer\u00a0<em>out<\/em>\u00a0of S3. Transfer\u00a0<em>in<\/em>\u00a0to S3 is free. Amazon also charges for get\/put requests and so forth &#8211; so if you&#8217;re using S3 to serve up content, then the pricing is going to be higher.<\/p>\n<p>Back to the tools. You need to configure s3cmd (the command line utility from the S3 Tools project) like so:<\/p>\n<p><code>s3cmd --configure<\/code><\/p>\n<p>It will walk you through adding your Amazon credentials, plus GPG information if you want your files encrypted while stored on S3. Amazon&#8217;s storage is supposed to be private, but you should always assume that data stored on remote servers is potentially visible to others. Since I&#8217;m storing information that has no real need for privacy (WordPress backups, MP3s, photos that I&#8217;d happily publish online anyway) I don&#8217;t worry overmuch about encrypting for storage on S3.<\/p>\n<p>There&#8217;s another advantage to forgoing GPG encryption: s3cmd can use an rsync-like algorithm for syncing files instead of just re-copying everything.<\/p>\n<p>Now to copy files and use s3cmd sync. You&#8217;ll find that the s3cmd syntax mimics standard *nix commands. Want to see what is being stored in your S3 account? Use\u00a0<code>s3cmd ls<\/code>\u00a0to show all buckets. (Amazon calls &#8217;em buckets instead of directories.)<\/p>\n<p>Want to copy between buckets? Use\u00a0<code>s3cmd cp\u00a0<em>bucket1<\/em>\u00a0<em>bucket2<\/em><\/code>. 
Note that buckets are specified by the syntax\u00a0<em>s3:\/\/bucketname<\/em>.<\/p>\n<p>To put files in a bucket, use\u00a0<code>s3cmd put\u00a0<em>filename<\/em>\u00a0<em>s3:\/\/bucket<\/em><\/code>. To get files, use\u00a0<code>s3cmd get\u00a0<em>s3:\/\/bucket\/filename<\/em>\u00a0<em>localfile<\/em><\/code>. To upload directories, you need to use the\u00a0<code>--recursive<\/code>\u00a0option.<\/p>\n<p>But if you want to sync files and save yourself some trouble down the road, there&#8217;s the\u00a0<code>sync<\/code>\u00a0command. It&#8217;s dead simple to use:<\/p>\n<p><code>s3cmd sync\u00a0<em>directory<\/em>\u00a0<em>s3:\/\/bucket\/<\/em><\/code><\/p>\n<p>The first time, it will copy up all files. After that, it will only copy up files that don&#8217;t already exist on Amazon S3. If you also want to get rid of files on S3 that you have removed locally, use the\u00a0<code>--delete-removed<\/code>\u00a0option &#8211; but test it with the\u00a0<code>--dry-run<\/code>\u00a0option first, because it&#8217;s easy to accidentally delete files you meant to keep.<\/p>\n<p>It&#8217;s pretty simple to use\u00a0<code>s3cmd<\/code>, and you should look at its man page as well. It even has some support for the CloudFront CDN service if you need that. 
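<\/p>\n<p>As a safety net, you can preview what a destructive sync would do before running it for real. A quick sketch &#8211; the bucket name and local path here are just placeholders:<\/p>\n<p><code># show what would be uploaded and deleted, without touching anything<br \/>\ns3cmd sync --dry-run --delete-removed \/home\/user\/photos\/ s3:\/\/my-bucket\/photos\/<br \/>\n# once the output looks right, run it for real<br \/>\ns3cmd sync --delete-removed \/home\/user\/photos\/ s3:\/\/my-bucket\/photos\/<\/code><\/p>\n<p>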
Happy syncing!<\/p>\n<h3><a href=\"http:\/\/s3tools.org\/download\">Download<\/a><\/h3>\n<div>S3cmd source code and packages for major Linux distributions can be downloaded from the\u00a0<a href=\"http:\/\/s3tools.org\/download\">Download page<\/a>.<\/div>\n<div>Here are the currently supported distributions:<\/div>\n<div>\n<table>\n<tbody>\n<tr>\n<th>Repository<\/th>\n<th>repo file<\/th>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/RHEL_5\/\">RHEL\u00a05 &amp; CentOS 5<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/RHEL_5\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/RHEL_6\/\">RHEL\u00a06 &amp; CentOS 6<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/RHEL_6\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_12\/\">Fedora 12<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_12\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_13\/\">Fedora 13<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_13\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_14\/\">Fedora 14<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/Fedora_14\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.1\/\">openSUSE 11.1<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.1\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.2\/\">openSUSE 11.2<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.2\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.3\/\">openSUSE 11.3<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_11.3\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_Factory\/\">openSUSE Factory<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/openSUSE_Factory\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/SLE_10\/\">SLES\u00a010<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/SLE_10\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/SLE_11\/\">SLES\u00a011<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/SLE_11\/s3tools.repo\">s3tools.repo<\/a><\/td>\n<\/tr>\n<tr>\n<td><a href=\"http:\/\/s3tools.org\/repo\/deb-all\/stable\/\">Debian &amp; Ubuntu<\/a><\/td>\n<td><a href=\"http:\/\/s3tools.org\/repo\/deb-all\/stable\/s3tools.list\">s3tools.list<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div>Steps to install s3cmd:<\/div>\n<ol>\n<li>Log in as the superuser (root) and launch a terminal.<\/li>\n<li>Change to\u00a0<ins>\/etc\/yum.repos.d<\/ins>\u00a0(you can use ftp or wget to fetch the repo file).<\/li>\n<li>Download the\u00a0<strong>s3tools.repo<\/strong>\u00a0file for your distribution. For example, run\u00a0<ins>wget http:\/\/s3tools.org\/repo\/CentOS_5\/s3tools.repo<\/ins>\u00a0if you\u2019re on\u00a0<strong>CentOS 5<\/strong>.<\/li>\n<li>Run\u00a0<ins>yum install s3cmd<\/ins>\u00a0if you don\u2019t have the s3cmd package installed yet.<\/li>\n<li>Run\u00a0<ins>yum upgrade s3cmd<\/ins>\u00a0if you already have it installed and want a newer version.<\/li>\n<li>You will be asked to accept a new\u00a0GPG\u00a0key \u2013 answer\u00a0<strong>yes<\/strong>\u00a0(twice).<\/li>\n<li>That\u2019s it. 
From then on, each time you run\u00a0<ins>yum upgrade<\/ins>\u00a0you\u2019ll automatically get the latest\u00a0<strong>s3cmd<\/strong>\u00a0for your system.<\/li>\n<\/ol>\n<h3>s3cmd<\/h3>\n<p><a href=\"http:\/\/s3tools.org\/s3cmd\" target=\"_blank\">s3cmd<\/a>\u00a0is a free Linux command line tool for uploading and downloading data to and from your Amazon S3 account.<\/p>\n<p><a href=\"http:\/\/s3tools.org\/download\" target=\"_blank\">Download and install s3tools manually<\/a>\u00a0or do what I did and\u00a0<a href=\"http:\/\/s3tools.org\/repositories\" target=\"_blank\">add their package repository to your package manager<\/a>\u00a0for a much easier install.<\/p>\n<p>After installing s3cmd, configure it by running the following command:<br \/>\n<code># s3cmd --configure<\/code><br \/>\nEnter your Access Key ID and Secret Access Key, and use the default settings for the rest of the options unless you know otherwise.<\/p>\n<p>If you haven\u2019t already created a bucket you can do that now with s3cmd:<br \/>\n<code># s3cmd mb s3:\/\/unique-bucket-name<\/code><br \/>\nList your current buckets to make sure you successfully created one:<br \/>\n<code># s3cmd ls<br \/>\n2010-10-30 02:15 s3:\/\/your-bucket-name<\/code><br \/>\nYou can now upload, list, and download content:<br \/>\n<code># s3cmd put somefile.txt s3:\/\/your-bucket-name\/somefile.txt<br \/>\nsomefile.txt -&gt; s3:\/\/your-bucket-name\/somefile.txt [1 of 1]<br \/>\n17835 of 17835 100% in 0s 35.79 kB\/s done<br \/>\n# s3cmd ls s3:\/\/your-bucket-name<br \/>\n2010-10-30 02:20 17835 s3:\/\/your-bucket-name\/somefile.txt<br \/>\n# s3cmd get s3:\/\/your-bucket-name\/somefile.txt somefile-2.txt<br \/>\ns3:\/\/your-bucket-name\/somefile.txt -&gt; somefile-2.txt [1 of 1]<br \/>\n17835 of 17835 100% in 0s 39.77 kB\/s done<br \/>\n<\/code><br \/>\nA much better and more advanced method of backing up your data is to use \u2018sync\u2019 instead of \u2018put\u2019 or \u2018get\u2019. 
Read more about how I use sync in the next section.<\/p>\n<h3>Automate backup with a shell script and cron job<\/h3>\n<p>Below is a sample of the shell script I wrote to back up one of my servers:<br \/>\n<code>#!\/bin\/sh<br \/>\n# Synchronize \/root with S3<br \/>\ns3cmd sync --recursive \/root\/ s3:\/\/my-bucket-name\/root\/<br \/>\n# Synchronize \/home with S3<br \/>\ns3cmd sync --recursive \/home\/ s3:\/\/my-bucket-name\/home\/<br \/>\n# Synchronize crontabs with S3<br \/>\ns3cmd sync \/var\/spool\/cron\/ s3:\/\/my-bucket-name\/cron\/<br \/>\n# Synchronize \/var\/www\/vhosts with S3<br \/>\ns3cmd sync --exclude 'mydomain.com\/some-directory\/*.jpg' --recursive \/var\/www\/vhosts\/ s3:\/\/my-bucket-name\/vhosts\/<br \/>\n# Dump MySQL databases and upload the dump to S3<br \/>\nmysqldump -u root --password=mysqlpassword --all-databases --result-file=\/root\/all-databases.sql<br \/>\ns3cmd put \/root\/all-databases.sql s3:\/\/my-bucket-name\/mysql\/<br \/>\nrm -f \/root\/all-databases.sql<\/code><br \/>\nI use \u2018s3cmd sync --recursive \/root\/ s3:\/\/my-bucket-name\/root\/\u2019 and \u2018s3cmd sync --recursive \/home\/ s3:\/\/my-bucket-name\/home\/\u2019 to synchronize all data in the local \/root and \/home directories, including their subdirectories, with S3. I use \u2018sync\u2019 instead of \u2018put\u2019 because I do not always know exactly what files are stored in these folders. 
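<\/p>\n<p>To run a script like this on a schedule, one crontab line is enough. Assuming the script is saved as \/root\/s3-backup.sh (a path I made up for this example) and marked executable, an entry like this runs it nightly at 2am:<\/p>\n<p><code># minute hour day-of-month month day-of-week command<br \/>\n0 2 * * * \/root\/s3-backup.sh<\/code><\/p>\n<p>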
I want everything backed up, including any new files created in the future.<\/p>\n<p>With \u2018s3cmd sync \/var\/spool\/cron\/ s3:\/\/my-bucket-name\/cron\/\u2019 I omit \u2018--recursive\u2019 because I do not care about any subdirectories (there aren\u2019t any).<\/p>\n<p>With \u201cs3cmd sync --exclude \u2018mydomain.com\/some-directory\/*.jpg\u2019 --recursive \/var\/www\/vhosts\/ s3:\/\/my-bucket-name\/vhosts\/\u201d I synchronize \/var\/www\/vhosts but exclude all jpg files inside a particular directory, because they are replaced very frequently by new versions and are unimportant to me once they are a few minutes old.<\/p>\n<p>Using\u00a0<a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.5\/en\/mysqldump.html\" target=\"_blank\">mysqldump<\/a>\u00a0I export all databases to a text file that can easily be used to recreate them if needed. I upload the newly created file using \u2018s3cmd put \/root\/all-databases.sql s3:\/\/my-bucket-name\/mysql\/\u2019.<\/p>\n<p>To read more about sync and its options, such as \u2018--dry-run\u2019, \u2018--skip-existing\u2019, and \u2018--delete-removed\u2019, see\u00a0<a href=\"http:\/\/s3tools.org\/s3cmd-sync\" target=\"_blank\">http:\/\/s3tools.org\/s3cmd-sync<\/a>.<\/p>\n<p>Create a cron job to execute your shell script as often as you like. Now you can worry less about losing your important data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon&#8217;s Simple Storage Service (S3) has a lot to like. It&#8217;s cheap, can be used for storing a little bit of data or as much as you want, and it can be used for distributing files publicly or just storing your private data. 
Let&#8217;s look at how you can take advantage of Amazon S3 on [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/2407"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2407"}],"version-history":[{"count":3,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/2407\/revisions"}],"predecessor-version":[{"id":2409,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/2407\/revisions\/2409"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2407"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2407"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2407"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}