rsbak3(8), 10 Sep 2003
Name
rsbak3 - Rsync Backup (Version 3)
Synopsis
rsbak3 [-v] config-file [config-name ...]
Introduction
The rsbak3 command backs up one host to another host's hard disk using
rsync. Beyond just using rsync to keep two filesystems in sync, rsbak3
makes incremental backups and keeps multi-generational history data.
When rsync updates a file, it re-creates the file under a second name and
then moves it over the original name, so it "unlinks" hardlinks. rsbak3
takes advantage of this to make incremental backups: the old backup is
"link-copied" to a new location (see "cp -al") and then this copy of the
backup tree is rsynced. All files which have been modified get unlinked
and re-created, and all files which have been removed get unlinked in the
new tree, but everything which is left unmodified stays linked to the
original backup and so doesn't need any additional disk space.
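The link-copy idea can be tried out on a toy directory. The following is
a minimal Python sketch, not rsbak3's actual code: the function names and
the file names are made up for illustration, and the "write to a second
name, then rename" step only imitates what rsync does when it updates a
file.

```python
import os
import tempfile


def link_copy(src, dst):
    """Recreate the directory tree of src under dst, hard-linking every file
    (roughly what "cp -al" does; directories themselves are real copies)."""
    for root, _dirs, files in os.walk(src):
        target = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target, name))


def replace_file(path, data):
    """Write data under a second name, then rename it over path.
    This breaks the hard link instead of changing the shared file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        handle.write(data)
    os.replace(tmp, path)


def demo_link_copy():
    """Return (unchanged file still shared, modified file still shared,
    content of the modified file in the OLD backup)."""
    base = tempfile.mkdtemp()
    old = os.path.join(base, "old.bak")
    new = os.path.join(base, "new.bak")
    os.makedirs(old)
    for name in ("unchanged.txt", "modified.txt"):
        with open(os.path.join(old, name), "w") as handle:
            handle.write("version 1")

    link_copy(old, new)
    replace_file(os.path.join(new, "modified.txt"), "version 2")

    def shared(name):
        return os.path.samefile(os.path.join(old, name),
                                os.path.join(new, name))

    with open(os.path.join(old, "modified.txt")) as handle:
        old_content = handle.read()
    return shared("unchanged.txt"), shared("modified.txt"), old_content
```

The unchanged file shares one inode between both trees, while the
modified file has a new inode in the new tree and keeps its old content
in the old tree.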
Of course this backs up hardlinks on the original filesystem as separate
files on the backup machine. rsbak3 needs to run as root if all the file
permissions are to be backed up correctly. In this case I recommend making
the directory tree in which the backup is stored accessible to root only,
to avoid trouble caused by different uid/gid schemes on the backup host
and the backed-up host.
It's recommended to store the backups made with rsbak3 on a filesystem
which supports tail-merging small directories, since the directories
themselves can't be hardlinked but need to be copied for every history
version. Reiserfs is such a filesystem; ext2 isn't.
Making backups using rsbak3 is rather easy: a configuration file contains
the definitions (rsync URLs, exclude/include rules, etc.) for one or more
filesystems which should be backed up. rsbak3 takes the name of this
configuration file as its first parameter. The second and all further
parameters (which are optional) can be used to specify which of the
filesystems in the configuration file should be backed up. Usually
rsbak3 is run from the system crontab and not executed manually.
Configuration File
An rsbak3 configuration file contains one or more sections about
filesystems which should be backed up. Each section starts with a line
containing a backup-name in brackets (spaces between the name and the
brackets are mandatory). A section can match multiple backup-names using
the * wildcard (e.g. host1-*). See this example /etc/rsbak3.conf file:
[ * ]
backup-dir /mnt/rsbak3
generations 10:7 10:4 12:12
[ host1-* ]
password secret
compress
bwlimit 50
exclude *~
[ host1-system ]
master backup@host1.example.com::backup/
system-exclude
exclude home/**
[ host1-home ]
master backup@host1.example.com::backup/home/
exclude *.mp3
exclude *.ogg
exclude *.mpg
exclude *.mpeg
exclude *.avi
exclude *.wmv
exclude *.iso
[ host2 ]
master backup@host2.example.com::backup/
password secret
whole-file
include-tree /home/alice/**
include-tree /home/bob/**
include-tree /usr/local/**
include-tree /etc/**
include-tree /var/lib/www/**
exclude *
This example configuration describes three backups: host1-system,
host1-home and host2. The first section is used for all backups,
the second section is used for host1-system and host1-home, and the
last three sections are each used for one specific backup only.
For each backup, at least the following tags must be specified
in the configuration file: master, backup-dir and generations.
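How the sections of the example combine for one backup can be sketched
in a few lines of Python. This is only an illustration under the
assumption that the * wildcard behaves like a shell-style glob; the
actual matching code of rsbak3 is not shown in this manual, and the
function names are made up.

```python
from fnmatch import fnmatchcase

# Section headers of the example /etc/rsbak3.conf, in file order.
SECTIONS = ["*", "host1-*", "host1-system", "host1-home", "host2"]


def backup_names(sections=SECTIONS):
    """Section names without a wildcard name the backups themselves."""
    return [name for name in sections if "*" not in name]


def matching_sections(backup_name, sections=SECTIONS):
    """Return the sections whose name pattern matches backup_name."""
    return [pattern for pattern in sections
            if fnmatchcase(backup_name, pattern)]
```

With this sketch, matching_sections("host1-home") yields the sections
"*", "host1-*" and "host1-home", while matching_sections("host2") yields
only "*" and "host2".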
Configuration File Tags
master
This is the source for the backup. It can be any valid rsync
remote location. See rsync(1) for details.
backup-dir
This is the local directory the backup is stored in. rsbak3 creates
sub-directories here, named after the backup-names in the configuration
file, and creates the backups themselves within those subdirectories.
generations
This defines how many generations are created, how many backups each
of them holds and how often a backup moves on to the next generation.
See the section about generations for details.
password
The password which should be used when connecting using the rsync
protocol. This is not used when connecting using rsh, ssh or any other
remote shell. This tag sets the RSYNC_PASSWORD environment variable
for the rsync process; see rsync(1) for details.
If you are using this configuration tag, the password will be stored
in the rsbak3 configuration file. This is a potential security risk.
Make your config file only readable by root if you are using this tag!
password-file
This option allows you to provide a password in a file for
accessing a remote rsync server. Note that this option
is only useful when accessing an rsync server using the
built-in transport, not when using a remote shell as the
transport. The file must not be world-readable. It should
contain just the password as a single line.
exclude
Don't back up files matching this exclude pattern.
See 'EXCLUDE PATTERNS' in rsync(1) for details.
exclude-from
Don't back up files matching the exclude patterns in the file.
See 'EXCLUDE PATTERNS' in rsync(1) for details.
include
Back up files matching the include pattern even if a later exclude
pattern would prevent the file from being backed up.
See 'EXCLUDE PATTERNS' in rsync(1) for details.
include-from
Back up files matching the include patterns in the file even if a
later exclude pattern would prevent the file from being backed up.
See 'EXCLUDE PATTERNS' in rsync(1) for details.
include-tree
Automatically create include patterns for each element of the given
path. E.g. "include-tree /home/alice/project/webpage" would produce
include patterns for "/home", "/home/alice", "/home/alice/project"
and "/home/alice/project/webpage".
See 'EXCLUDE PATTERNS' in rsync(1) for why this is useful.
cvs-exclude
This is a useful shorthand for excluding a broad range
of files that you often don't want to transfer between
systems. It uses the same algorithm that CVS uses to
determine if a file should be ignored.
Be careful with this option. It would prevent things like CVS
checkouts from being backed up correctly.
system-exclude
This is a useful shorthand for excluding the contents of the /tmp,
/dev, /proc, /sys and every lost+found/ directory as well as
excluding every .journal file.
compress
Compress the network traffic.
See rsync(1) for details.
bwlimit
Limit the bandwidth to the given number of kilobytes per second.
See rsync(1) for details.
whole-file
Transfer whole files instead of using the incremental rsync algorithm.
See rsync(1) for details.
rsh-command
Use the given command instead of rsh when connecting to the remote host
using a remote shell protocol. Usually you want to set this to "ssh".
rsync-option
Append the specified option(s) to the rsync call. Use this with care!
See rsync(1) for details.
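The path expansion described for the include-tree tag above can be
sketched as follows. This is a minimal Python illustration of the
documented behaviour only; the function name is made up and the real
rsbak3 code may differ in details such as trailing slashes.

```python
def include_tree_patterns(path):
    """Return one include pattern per leading element of path."""
    parts = [part for part in path.split("/") if part]
    return ["/" + "/".join(parts[:count])
            for count in range(1, len(parts) + 1)]
```

For "/home/alice/project/webpage" this returns the four patterns listed
in the include-tree description. The parent directories need their own
include patterns because a final "exclude *" would otherwise already
exclude "/home" and rsync would never descend into it.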
Backup Generations
Backup Generations are "layers" of backups. E.g. many people make daily,
weekly and monthly backups. This would be three backup generations. The
"generations" tag in the backup configuration is used to specify how many
backup generations will be created. E.g.
generations 10:7 5:4 12:0
will create three backup generations. The first generation can hold no
more than 10 incremental backups. When the 11th backup is created, the
oldest backup expires and is either removed or moved to the next
generation. In this case every 7th backup which expires is moved to the
next backup generation. So if you make daily backups, the second
generation will contain weekly snapshots and the third generation will
contain snapshots at 4-week intervals (which is almost a month).
Note that the second number of the last backup generation is ignored,
since there is no higher generation to which backups could be moved.
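The rotation rules can be played through with a small simulation. The
following Python sketch is an illustration, not rsbak3's code: backups
are just numbered 1, 2, 3, ..., and it is an assumption that the 7th,
14th, ... expiring backup is the one that gets promoted (the manual only
says "every 7th").

```python
def parse_generations(spec):
    """Turn "10:7 5:4 12:0" into [(10, 7), (5, 4), (12, 0)]."""
    return [tuple(int(number) for number in item.split(":"))
            for item in spec.split()]


def simulate(spec, runs):
    """Run `runs` backups and return the backups kept per generation."""
    rules = parse_generations(spec)
    kept = [[] for _ in rules]
    expired = [0] * len(rules)
    for backup in range(1, runs + 1):
        incoming = backup
        for level, (capacity, every) in enumerate(rules):
            kept[level].append(incoming)
            if len(kept[level]) <= capacity:
                break                      # still room, nothing expires
            oldest = kept[level].pop(0)    # the oldest backup expires
            expired[level] += 1
            is_last = level == len(rules) - 1
            if is_last or every == 0 or expired[level] % every != 0:
                break                      # expired backup is removed
            incoming = oldest              # expired backup is promoted
    return kept
```

With "10:7 5:4 12:0" and daily backups, the first generation fills up
during the first 10 days, and on day 17 the seventh expiring backup
(the one from day 7) becomes the first entry of the second generation.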
Setting up rsync on the backed-up host
Create an /etc/rsyncd.conf file containing a section for the backup:
[ backup ]
path = /
read only = yes
list = no
uid = root
gid = root
auth users = backup
secrets file = /etc/rsyncd.secrets
hosts allow = backup.example.com
Then add the user 'backup' and its password to the /etc/rsyncd.secrets
file (and make sure that only root can read that file):
backup:secret
Finally, add rsyncd to your /etc/inetd.conf:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
See rsync(1) for details on how to set up an rsync server.
Secure Backups using SSH
If you are making backups over the wild wild web (the public internet) or
any other insecure link (like 99% of all office networks), you might
want to use an encrypted connection. One possible solution is running
rsync over ssh and authenticating using an ssh key.
First you need to create an ssh key on the backup host. This key will
later be used to authenticate the backup host on the backed-up host:
ssh-keygen -t dsa -N '' -f /etc/rsbak3_key
Next you need to tell rsbak3 that it should use ssh for the connection
and use the key you've just created for authentication. Add the following
to the rsbak3 configuration:
rsh-command ssh -o BatchMode=yes -i /etc/rsbak3_key
and make sure that you are not using the rsync protocol for connecting to
the backed-up host (one colon between host and directory, no rsync://).
See rsync(1) for details on the rsync path syntax.
Now the backed-up host. Instead of creating an /etc/rsyncd.conf and
adding rsync to the inetd configuration, add the public key you've
created to the ~/.ssh/authorized_keys2 file of the user of your choice and
add the following options (see sshd(8) for details):
from="backup.example.com",command="/usr/local/sbin/rsb3swr",no-pty,no-port-forwarding public-key-from-rsbak3_key.pub
Note that in some cases you need to run rsyncd as root in order to read
all files on the filesystem - e.g. when doing a full backup of the
operating system.
Always test that 'ssh -i /etc/rsbak3_key' can be used for nothing but
running rsync for the backup. E.g. logging in to a normal shell without
a password or transferring data to the host (i.e. in the wrong direction)
using rsync must not be possible!
Backing up Databases
rsbak3 can also be used to back up databases using database dumps and
xdelta. This requires xdelta and a gzip with support for the --rsyncable
option to be installed on the host being backed up. An additional small
shell script (rsbak3dump) is used on the host being backed up to create
and maintain a directory containing a current database dump and xdeltas
to earlier dumps. A cron job must be set up to do this, something like:
0 2 * * * mysql rsbak3dump 'mysqldump test' /var/backups/testdb 15
The first parameter to rsbak3dump is the command which writes the
database dump to its output. The second parameter is the directory which
should contain the dumped data and the third parameter is the number of
xdeltas which should be kept in the directory. Take care that rsbak3dump
has finished before rsbak3 backs up the directory! Just in case rsbak3
does run simultaneously with rsbak3dump, it is recommended to
exclude *.tmp in the rsbak3 configuration for the dump directory.
The trick with rsbak3dump is that the directory which is being backed up
already provides some kind of 'history information' through the xdeltas
stored there. So there is no need to keep multiple first-generation
backups on the backup server. Instead, the first generation keeps only a
single backup at a time, which contains the current dump and the xdeltas.
So the generations entry on the server for the dumps would look like:
generations 1:7 4:0
Since each 'snapshot' of the backup directory contains one compressed
database dump and xdeltas, it's possible to restore every state of the
database in the last month by taking one dump and applying the xdeltas
as necessary. Since there are forward- as well as backward-deltas, it
is always possible to restore every state from two different dumps.
Backup Filesystem Structure
rsbak3 creates subdirectories for all backups using the name specified
in the configuration file as directory name. Those directories contain
subdirectories "generation_0" to "generation_9" (they are created on
demand - so they might not all be present) containing the backups as well
as a symbolic link "latest" which points to the latest backup directory.
The backup directories are named after the exact time at which they were
created, using the format "YYYYMMDD-HHMMSS.bak". An example tree would be:
/mnt/rsbak3
  `- host1-home
  |    `- generation_0
  |    |    `- 20030909-030012.bak
  |    |    `- 20030910-030011.bak
  |    |    `- 20030911-030018.bak
  |    |    `- 20030913-030009.bak
  |    |    `- 20030914-030010.bak
  |    |    `- 20030915-030012.bak
  |    |    `- 20030916-030014.bak
  |    |    `- 20030917-030008.bak
  |    |    `- 20030918-030010.bak
  |    |
  |    `- generation_1
  |    |    `- 20030901-030007.bak
  |    |    `- 20030908-030011.bak
  |    |
  |    `- latest -> generation_0/20030918-030010.bak
  |
  `- host1-system
  |    `- ...
  |
  `- host2
       `- ...
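Because the directory names encode the creation time in a fixed-width
format, they can be generated, parsed and sorted with standard tools.
A minimal Python sketch, assuming nothing beyond the "YYYYMMDD-HHMMSS.bak"
format described above (the function names are made up, and whether
rsbak3 uses local time or UTC is not stated in this manual):

```python
from datetime import datetime

# strftime/strptime equivalent of "YYYYMMDD-HHMMSS.bak"
NAME_FORMAT = "%Y%m%d-%H%M%S.bak"


def backup_dir_name(moment):
    """Format a datetime as a backup directory name."""
    return moment.strftime(NAME_FORMAT)


def backup_time(name):
    """Parse a backup directory name back into a datetime."""
    return datetime.strptime(name, NAME_FORMAT)


def newest(names):
    """Return the most recent backup directory name from a list."""
    return max(names, key=backup_time)
```

For example, backup_dir_name(datetime(2003, 9, 18, 3, 0, 10)) gives
"20030918-030010.bak", and newest() picks the name with the latest
timestamp from a directory listing.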
Analysing Backup Directories
The additional tool rsbak3diff can be used to list the files which
have been modified, added or removed between two backups. This can be
very useful for finding out which part of the backup eats up all your
disk space.
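The idea behind such a comparison can be sketched in Python. This is
not the rsbak3diff implementation, whose options and output are not
described here; it is an illustration which assumes that unmodified
files are hard links and therefore share an inode between two backups.
All function and file names are made up.

```python
import os
import tempfile


def index_tree(root):
    """Map each relative file path below root to its (device, inode)."""
    index = {}
    for current, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(current, name)
            info = os.lstat(full)
            index[os.path.relpath(full, root)] = (info.st_dev, info.st_ino)
    return index


def diff_backups(old_root, new_root):
    """List files added, removed or re-created between two backups."""
    old, new = index_tree(old_root), index_tree(new_root)
    common = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(path for path in common
                           if old[path] != new[path]),
    }


def demo_diff():
    """Build two tiny backup trees and compare them."""
    base = tempfile.mkdtemp()
    old = os.path.join(base, "old.bak")
    new = os.path.join(base, "new.bak")
    os.makedirs(old)
    os.makedirs(new)

    def write(root, name, data):
        with open(os.path.join(root, name), "w") as handle:
            handle.write(data)

    write(old, "kept.txt", "same")
    os.link(os.path.join(old, "kept.txt"), os.path.join(new, "kept.txt"))
    write(old, "gone.txt", "old only")
    write(old, "changed.txt", "v1")
    write(new, "changed.txt", "v2")      # separate inode, as after rsync
    write(new, "fresh.txt", "new only")
    return diff_backups(old, new)
```

Files reported as "modified" or "added" are the ones that occupy
additional disk space in the newer backup.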
Known Bugs
By design, rsbak3 cannot back up to filesystems which do not support
hard-links.
Also by design, rsbak3 is not very efficient (in terms of disk space)
for backing up huge files which are often modified slightly, like
database files, log files or mbox files, since rsbak3 creates a separate
copy of those files for every backup. Check if your database has built-in
support for incremental backups and if so, use it and exclude the database
files from your rsbak3 backup. Exclude logfiles from the rsbak3 backup and
archive the old logs when they "roll over". And think about switching from
mbox to maildir for your mailboxes in order to keep the disk usage
overhead in your backup as small as possible.
It still needs to be tested whether rsbak3 is compatible with "rsync -H".
Author
Written by Clifford Wolf <clifford@clifford.at> at
LINBIT Information Technologies.
The original rsbak and rsbak2 were written by Philipp Richter
<philipp.richter@linbit.com>.
Reporting Bugs
Report bugs to opensource@linbit.com.
Copyright
Copyright (c) 2003 Clifford Wolf. This is free software;
see the source for copying conditions. There is NO warranty;
not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See Also
rsync(1), ssh(1), mkreiserfs(8)