Date: February 25, 2002
AIX Tip of the Week: Distributing and Synchronizing Data Files
The following table lists several techniques to distribute and synchronize data files across multiple hosts. I've ignored the many application- and database-specific techniques to keep this list manageable.
In general, the choice of technique depends on:
· Data type (text, binary, database)
· Network protocol (NetBIOS, TCP/IP, SNA, …)
· Server mix (PC, Unix, Mainframe)
· Propagation delay
Editorial: Designing a data synchronization scheme is tougher than it looks. It looks easy because the initial data transfer is trivial. However, things get complicated when you consider problem recovery (failed transfers, data corruption, servers down, backout, etc.). I've seen many projects fail when attempting elaborate data replication schemes, which almost invariably become complex, run over schedule and over budget, and are expensive to maintain. Consequently, my "rule of thumb" is to centralize data whenever possible. When I need to distribute data, I choose the simplest technique that meets the minimum business requirement, and no more. KISS - Keep it Simple
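To make the recovery point concrete, here is a minimal sketch of a "copy, verify, retry" wrapper. It is only an illustration under assumptions: the function name verified_copy is made up, and plain cp stands in for the real transfer (rcp, ftp, rsync, ...) so the sketch runs on one host. With a real remote copy, the checksum of the destination would have to be computed on the remote side.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST, verify with cksum, retry on failure.
# Usage: verified_copy SRC DST [MAX_TRIES]
verified_copy() {
    src=$1
    dst=$2
    max_tries=${3:-3}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Copy to a temporary name so a partial file never replaces good data.
        # cp is a stand-in for the real transfer command (assumption).
        if cp "$src" "$dst.tmp"; then
            src_sum=$(cksum < "$src")
            dst_sum=$(cksum < "$dst.tmp")
            if [ "$src_sum" = "$dst_sum" ]; then
                # Checksums match: move the verified copy into place.
                mv "$dst.tmp" "$dst"
                return 0
            fi
        fi
        # Failed or corrupted transfer: clean up and try again.
        rm -f "$dst.tmp"
        try=$((try + 1))
    done
    return 1
}
```

Even this small wrapper has to deal with partial files, cleanup, and a retry limit, which is a hint of how quickly a "simple" copy grows once recovery is considered.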
Category | Batch Copy | Realtime Copy
Hardware | · Tape, CD, DVD, Portable Disk · SAN (Shark FlashCopy, EMC SRDF) | · SAN (Shark PPRC) · NAS - NFS file server in "hardware"
Base AIX Operating System | · uucp - copy over dial-up connections · ftp, rcp - network · rdist - distributes files to multiple hosts · NIM - Network Installation Manager · LVM - Logical Volume Manager | · NFS - Network File System
GNU (Shareware) | · rsync - similar to rdist, no .rhosts file, bi-directional transfer | · Samba - NT file/print server
Middleware | · MQ Series - queues file transfers, guaranteed any-to-any delivery · Mainframe: NDM - network distribution mgr; LU6.2 - SNA file transfer; IND$FILE - 3270 PC file transfer | · GeoRM - remote disk mirror · AIX Connections - NT file/print server
AIX Connections - Optional LPP from IBM that provides NT file/print server functionality.
Email - an often-overlooked option. Email can be used to distribute uuencoded data files. A client program on the remote server can then extract the attached data from the mail.
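A rough sketch of the email approach, assuming the standard uuencode and uudecode commands are installed on both ends. The function names, file names, and mail address are made up for illustration.

```shell
#!/bin/sh
# Sender side: encode a data file and write a mail-ready body to stdout.
# $1 = local file, $2 = name the file should get on the receiving host
encode_for_mail() {
    uuencode "$1" "$2"
}

# Receiver side: extract the encoded file from a saved mail message.
# uudecode looks for the "begin" line and writes the file named there
# into the current directory, so change into the target directory first.
# $1 = saved message, $2 = directory to unpack into
decode_from_mail() {
    ( cd "$2" && uudecode ) < "$1"
}

# Typical use (hypothetical names, not run here):
#   encode_for_mail prices.dat prices.dat | mail -s "prices.dat" user@remotehost
#   decode_from_mail /tmp/saved_message /data/incoming
```

The receiving side still needs something to save the message and call the decode step, for example a small script run from the mail user's account.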
GeoRM - part of the AIX HACMP family of high availability software that provides a remote mirror of local disks. Updates are dynamic and can be done synchronously (write to remote disk first) or asynchronously (write to remote disks last).
LVM - functionally similar to EMC's TimeFinder, AIX's Logical Volume Manager can mirror data to separate disks, then split off the disks into a new volume group, which can then be imported on another server on the same SSA loop or SAN. The advantage is that this procedure can copy large data sets with minimal downtime and without congesting the network.
MQ Series - An IBM middleware product that stores and forwards data between a heterogeneous mix of servers (NT, Unix, Mainframe, …). Offers guaranteed delivery (if the remote server is down, the data is queued locally till it comes back up) and logging of all transfers.
NIM - Network Installation Manager. AIX specific. Distributes software, fixes, and data files.
rdist - Distributes identical copies of files to multiple hosts. The rdist command is one of the more popular ways of distributing files between Unix servers. Requires a ".rhosts" file on the target hosts. (Consider "rsync" as an alternative if your security policy forbids the use of ".rhosts" files.)
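As an illustration, rdist is driven by a small control file (a "Distfile") that names the files and the target hosts. The sketch below writes one out. The host names (hosta, hostb), the file paths, and the helper name write_distfile are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a sample Distfile to the path given in $1.
# The quoted 'EOF' keeps the shell from expanding ${FILES} and ${HOSTS};
# those are rdist variables, not shell variables.
write_distfile() {
    cat > "$1" <<'EOF'
HOSTS = ( hosta hostb )
FILES = ( /data/app/prices.dat /data/app/rates.dat )

${FILES} -> ${HOSTS}
	install ;
	notify root ;
EOF
}

# Typical use (not run here):
#   write_distfile ./Distfile
#   rdist -n -f ./Distfile    # print what would be done without doing it
#   rdist -f ./Distfile       # distribute the files
```

Running with -n first is a cheap way to check the Distfile before anything is overwritten on the target hosts.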
rsync - an open-source (GPL-licensed) utility with functionality similar to rdist, except that it can run without a ".rhosts" file and can synchronize directories in either direction (push to or pull from a remote host). The rsync utility can be found on the AIX-Linux Bonus Pack CD, or downloaded from
http://www-1.ibm.com/servers/aix/products/aixos/linux/download.html
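A minimal rsync sketch, assuming rsync and ssh are installed on both hosts (ssh is my assumption for the transport; it is what lets rsync run without a ".rhosts" file). The function name, host name, and paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: synchronize the contents of one directory to another.
# Either argument may be local (/path) or remote (host:/path), so the same
# call can push to or pull from a remote host.
# $1 = source directory, $2 = destination directory
sync_dir() {
    # -a  preserve permissions, ownership, and timestamps; recurse
    # -z  compress data in transit
    # -e ssh  run the transfer over ssh instead of rsh
    # The trailing slashes copy the directory contents, not the directory itself.
    rsync -az -e ssh "$1/" "$2/"
}

# Typical use (not run here):
#   sync_dir /data/app hostb:/data/app     # push local changes to hostb
#   sync_dir hostb:/data/app /data/app     # pull hostb's copy back
```

Each call is still a one-way copy. Running it in both directions does not merge conflicting changes, so decide which side is the master for a given directory.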
Split Mirror Copy - If you have multiple hosts on a SSA loop or SAN, you can copy large data sets by
uucp - Unix to Unix CoPy. Automates file transfer over simple dial-up connections.
Bruce Spencer
baspence@us.ibm.com
2/25/02