rsbak3(8)                                                    10 Sep 2003

rsbak3 - Rsync Backup (Version 3)

rsbak3 -v config-file config-name

Introduction

The rsbak3 command backs up one host to another host's hard disk using rsync. In addition to using rsync to keep two filesystems in sync, rsbak3 makes incremental backups and keeps a multi-generational history.

When rsync updates a file, it re-creates the file under a second name and moves it over the original name, which "unlinks" any hardlinks to that file. rsbak3 takes advantage of this to make incremental backups: the old backup is "link-copied" to a new location (see "cp -al") and this copy of the backup tree is then rsynced. All files which have been modified get unlinked and re-created, and all files which have been removed get unlinked in the new tree, but everything which is unmodified stays linked to the original backup and so needs no additional disk space.

As a consequence, files which are hardlinked to each other on the original filesystem are backed up as separate files on the backup machine.

rsbak3 needs to run as root if all file permissions are to be backed up correctly. In this case I recommend making the directory tree in which the backup is stored accessible to root only, to avoid trouble caused by different uid/gid schemes on the backup host and the backed-up host.

It is recommended to store the backups made with rsbak3 on a filesystem which supports tail-merging small directories, since the directories themselves can't be hardlinked but need to be copied for every history version. Reiserfs is such a filesystem; ext2 isn't.

Making backups using rsbak3 is rather easy: a configuration file contains the definitions (rsync URLs, exclude/include rules, etc.) for one or more filesystems which should be backed up. rsbak3 takes the name of this configuration file as its first parameter. The second and all further parameters (which are optional) select which of the filesystems in the configuration file should be backed up.
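The link-copy idea described in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not rsbak3's actual implementation: the helper names link_copy, replace_file and demo are hypothetical, and replace_file merely imitates the "write under a second name, then rename" behaviour of rsync.

```python
import os
import tempfile


def link_copy(old_tree, new_tree):
    """Recreate the directories of old_tree under new_tree and hard-link
    every file into place (roughly what "cp -al" does)."""
    for dirpath, _dirnames, filenames in os.walk(old_tree):
        rel = os.path.relpath(dirpath, old_tree)
        target_dir = os.path.normpath(os.path.join(new_tree, rel))
        # Directories cannot be hard-linked, so they are created anew.
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def replace_file(path, data):
    """Write data under a second name and rename it over path.
    The rename breaks the hard link at path; other links keep the old data."""
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        handle.write(data)
    os.replace(tmp, path)


def demo():
    """Build a tiny backup, link-copy it, modify one file in the copy."""
    with tempfile.TemporaryDirectory() as base:
        old = os.path.join(base, "old.bak")
        new = os.path.join(base, "new.bak")
        os.makedirs(old)
        for name, text in (("same.txt", "unchanged"), ("edit.txt", "v1")):
            with open(os.path.join(old, name), "w") as handle:
                handle.write(text)
        link_copy(old, new)
        replace_file(os.path.join(new, "edit.txt"), "v2")
        # The untouched file is still one inode shared by both trees.
        shared = os.path.samefile(
            os.path.join(old, "same.txt"), os.path.join(new, "same.txt")
        )
        with open(os.path.join(old, "edit.txt")) as handle:
            old_text = handle.read()
        with open(os.path.join(new, "edit.txt")) as handle:
            new_text = handle.read()
        return shared, old_text, new_text
```

Calling demo() returns (True, "v1", "v2"): the unmodified file still shares its storage between both trees, while the modified file exists in two independent versions.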
Usually rsbak3 is run from the system crontab and not executed manually.

Configuration File

An rsbak3 configuration file contains one or more sections describing filesystems which should be backed up. Each section starts with a line containing a backup-name in brackets (spaces between the name and the brackets are mandatory). A section can match multiple backup-names using the * wildcard (e.g. host1-*). See this example /etc/rsbak3.conf file:

    [ * ]
    backup-dir /mnt/rsbak3
    generations 10:7 10:4 12:12

    [ host1-* ]
    password secret
    compress
    bwlimit 50
    exclude *~

    [ host1-system ]
    master backup@host1.example.com::backup/
    system-exclude
    exclude home/**

    [ host1-home ]
    master backup@host1.example.com::backup/home/
    exclude *.mp3
    exclude *.ogg
    exclude *.mpg
    exclude *.mpeg
    exclude *.avi
    exclude *.wmv
    exclude *.iso

    [ host2 ]
    master backup@host2.example.com::backup/
    password secret
    whole-file
    include-tree /home/alice/**
    include-tree /home/bob/**
    include-tree /usr/local/**
    include-tree /etc/**
    include-tree /var/lib/www/**
    exclude *

This example configuration describes 3 backups: host1-system, host1-home and host2. The first section is used for all backups, the 2nd section is used for host1-system and host1-home, and the last three sections are each used for one specific backup only. For each backup, at least the following tags must be specified in the configuration file: master, backup-dir and generations.

Configuration File Tags

master
    This is the source for the backup. It can be any valid rsync remote location. See rsync(1) for details.

backup-dir
    This is the local directory the backup is stored in. rsbak3 creates sub-directories here, named after the backup-names in the configuration file, and creates the backups themselves within those sub-directories.

generations
    This defines how many generations should be created, how many backups each of them holds and how often a backup moves on to the next generation. See the section about generations for details.

password
    The password which should be used when connecting using the rsync protocol. This is not used when connecting using rsh, ssh or any other remote shell. This tag sets the RSYNC_PASSWORD environment variable for the rsync process; see rsync(1) for details. If you use this configuration tag, the password is stored in the rsbak3 configuration file. This is a potential security risk. Make your config file readable by root only if you are using this tag!

password-file
    This option allows you to provide a password in a file for accessing a remote rsync server. Note that this option is only useful when accessing an rsync server using the built-in transport, not when using a remote shell as the transport. The file must not be world readable. It should contain just the password as a single line.

exclude
    Don't back up files matching this exclude pattern. See 'EXCLUDE PATTERNS' in rsync(1) for details.

exclude-from
    Don't back up files matching the exclude patterns in the file. See 'EXCLUDE PATTERNS' in rsync(1) for details.

include
    Back up files matching the include pattern even if a later exclude pattern would prevent the file from being backed up. See 'EXCLUDE PATTERNS' in rsync(1) for details.

include-from
    Back up files matching the include patterns in the file even if a later exclude pattern would prevent the file from being backed up. See 'EXCLUDE PATTERNS' in rsync(1) for details.

include-tree
    Automatically create include patterns for each element of the given path. E.g. "include-tree /home/alice/project/webpage" would produce include patterns for "/home", "/home/alice", "/home/alice/project" and "/home/alice/project/webpage". See 'EXCLUDE PATTERNS' in rsync(1) for why this is useful.

cvs-exclude
    This is a useful shorthand for excluding a broad range of files that you often don't want to transfer between systems. It uses the same algorithm that CVS uses to determine if a file should be ignored. Be careful with this option.
    It would prevent things like CVS checkouts from being backed up correctly.

system-exclude
    This is a useful shorthand for excluding the contents of /tmp, /dev, /proc, /sys and every lost+found/ directory, as well as every .journal file.

compress
    Compress the network traffic. See rsync(1) for details.

bwlimit
    Limit the bandwidth to the given number of kilobytes per second. See rsync(1) for details.

whole-file
    Transfer whole files instead of using the incremental rsync algorithm. See rsync(1) for details.

rsh-command
    Use the given command instead of rsh when connecting to the remote host using a remote shell protocol. Usually you want to set this to "ssh".

rsync-option
    Append the specified option(s) to the rsync call. Use this with care! See rsync(1) for details.

Backup Generations

Backup generations are "layers" of backups. E.g. many people make daily, weekly and monthly backups. This would be three backup generations. The "generations" tag in the backup configuration specifies how many backup generations are created. E.g.

    generations 10:7 5:4 12:0

will create three backup generations. The first generation can hold no more than 10 incremental backups. When the 11th backup is created, the oldest backup expires and is either removed or moved to the next generation. In this case every 7th backup which expires is moved to the next backup generation. So if you make daily backups, the 2nd backup generation will contain weekly snapshots and the 3rd generation will contain snapshots at 4-week intervals (which is almost a month). Note that the 2nd number for the last backup generation is ignored, since there is no higher generation to which backups could be moved.
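The rotation described above can be tried out with a small simulation. This is a sketch under assumptions, not rsbak3's own code: the function names are hypothetical, backups are represented by their running number, and it is assumed that the 7th, 14th, ... expiring backup is the one that is promoted (the text only says "every 7th").

```python
def parse_generations(spec):
    """Turn a string like "10:7 5:4 12:0" into [(10, 7), (5, 4), (12, 0)]."""
    return [tuple(int(n) for n in part.split(":")) for part in spec.split()]


def simulate(spec, backup_count):
    """Return the backups held by each generation after backup_count runs."""
    gens = parse_generations(spec)
    held = [[] for _ in gens]      # backups currently kept, oldest first
    expired = [0] * len(gens)      # how many backups have expired per level

    def add(level, backup):
        held[level].append(backup)
        capacity, every = gens[level]
        while len(held[level]) > capacity:
            oldest = held[level].pop(0)
            expired[level] += 1
            is_last = level == len(gens) - 1
            # Assumption: every "every"-th expired backup moves up one
            # generation; all others are removed. The second number of
            # the last generation is ignored.
            if not is_last and every > 0 and expired[level] % every == 0:
                add(level + 1, oldest)

    for number in range(1, backup_count + 1):
        add(0, number)
    return held
```

With simulate("10:7 5:4 12:0", 30), i.e. thirty daily backups, the first generation holds backups 21 to 30, the second generation holds backups 7 and 14 (the weekly snapshots), and the third generation is still empty.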

Setting up rsync on the backed-up host

Create an /etc/rsyncd.conf file containing a section for the backup:

    [ backup ]
    path = /
    read only = yes
    list = no
    uid = root
    gid = root
    auth users = backup
    secrets file = /etc/rsyncd.secrets
    hosts allow = backup.example.com

Then add the user 'backup' and its password to the /etc/rsyncd.secrets file (and make sure that only root can read that file):

    backup:secret

Finally add rsyncd to your /etc/inetd.conf:

    rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon

See rsync(1) and rsyncd.conf(5) for details on how to set up an rsync server.

Secure Backups using SSH

If you are making backups over the wild wild web (the public internet) or any other insecure link (like 99% of all office networks), you might want to use an encrypted connection. One possible solution is running rsync over ssh and authenticating using an ssh-key.

First you need to create an ssh-key on the backup host. This key will later be used to authenticate the backup host on the backed-up host:

    ssh-keygen -t dsa -N '' -f /etc/rsbak3_key

Next you need to tell rsbak3 to use ssh for the connection and to use the key you've just created for authentication. Add the following to the rsbak3 configuration

    rsh-command ssh -o BatchMode=yes -i /etc/rsbak3_key

and make sure that you are not using the rsync protocol for connecting to the backed-up host (one colon between host and directory, no rsync://). See rsync(1) for details on the rsync path syntax.

Now the backed-up host. Instead of creating an /etc/rsyncd.conf and adding rsync to the inetd configuration, add the public key you've created to the ~/.ssh/authorized_keys2 file of the user of your choice and add the following options (see sshd(8) for details):

    from="backup.example.com",command="/usr/local/sbin/rsb3swr",no-pty,no-port-forwarding public-key-from-rsbak3_key.pub

Note that in some cases you need to run rsyncd as root in order to read all files on the filesystem - e.g. when doing a full backup of the operating system.
Always verify that the key can only be used to run rsyncd via 'ssh -i /etc/rsbak3_key'. Logging in to a normal shell without a password, or transferring data to the host (i.e. in the wrong direction) using rsync, must not be possible!

Backing up Databases

rsbak3 can also be used to back up databases using database dumps and xdelta. This requires xdelta and a gzip with support for the --rsyncable option to be installed on the host being backed up. An additional small shell script (rsbak3dump) is used on the host being backed up to create and maintain a directory containing a current database dump and xdeltas to earlier dumps. A cron job must be set up to do this, something like:

    0 2 * * * mysql rsbak3dump 'mysqldump test' /var/backups/testdb 15

The first parameter to rsbak3dump is the command which writes the database dump to its output. The 2nd parameter is the directory which should contain the dumped data and the 3rd parameter is the number of xdeltas which should be kept in the directory.

Take care that rsbak3dump has finished before rsbak3 backs up the directory! In case rsbak3 does run simultaneously with rsbak3dump, it is recommended to exclude *.tmp in the rsbak3 configuration for the dump directory.

The trick with rsbak3dump is that the directory being backed up already provides some kind of 'history information' through the xdeltas stored there. So there is no need to keep multiple 1st generation backups on the backup server. Instead, the first generation keeps only one backup at a time, which contains the current dump and the xdeltas. So the generations entry on the server for the dumps would look like:

    generations 1:7 4:0

Since each 'snapshot' of the backup directory contains one compressed database dump and xdeltas, it's possible to restore every state of the database in the last month by taking one dump and applying the xdeltas as necessary.
Since there are forward as well as backward deltas, it is always possible to restore every state from two different dumps.

Backup Filesystem Structure

rsbak3 creates a subdirectory for each backup, using the name specified in the configuration file as the directory name. Those directories contain the subdirectories "generation_0" to "generation_9" (they are created on demand, so they might not all be present), which hold the backups, as well as a symbolic link "latest" which points to the latest backup directory. The backup directories are named after the exact time at which they were created, using the format "YYYYMMDD-HHMMSS.bak". An example tree would be:

    /mnt/rsbak3
    `- host1-home
    |  `- generation_0
    |  |  `- 20030909-030012.bak
    |  |  `- 20030910-030011.bak
    |  |  `- 20030911-030018.bak
    |  |  `- 20030913-030009.bak
    |  |  `- 20030914-030010.bak
    |  |  `- 20030915-030012.bak
    |  |  `- 20030916-030014.bak
    |  |  `- 20030917-030008.bak
    |  |  `- 20030918-030010.bak
    |  |
    |  `- generation_1
    |  |  `- 20030901-030007.bak
    |  |  `- 20030908-030011.bak
    |  |
    |  `- latest -> generation_0/20030918-030010.bak
    |
    `- host1-system
    |  `- ...
    |
    `- host2
       `- ...

Analysing Backup Directories

The additional tool rsbak3diff can be used to list the files which have been modified, added or removed between two backups. This can be very useful for finding out which part of the backup eats up all your disk space.

Known Bugs

By design, rsbak3 cannot back up to filesystems which do not support hardlinks.

Also by design, rsbak3 is not very efficient (in terms of disk space) for backing up huge files which are often modified slightly, like database files, log files or mbox files, since rsbak3 creates a separate copy of those files for every backup. Check if your database has built-in support for incremental backups and if so, use it and exclude the database files from your rsbak3 backup. Exclude logfiles from the rsbak3 backup and archive the old logs when they "roll over".
And think about switching from mbox to maildir for your mailboxes in order to keep the disk usage overhead in your backup as small as possible.

Whether rsbak3 is compatible with "rsync -H" still needs testing.

Author

Written by Clifford Wolf <clifford@clifford.at> at LINBIT Information Technologies. The original rsbak and rsbak2 were written by Philipp Richter <philipp.richter@linbit.com>.

Reporting Bugs

Report bugs to opensource@linbit.com.

Copyright

Copyright (c) 2003 Clifford Wolf. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See Also

rsync(1), ssh(1), mkreiserfs(8)