What I did to restore data from Japan's automated DFS backups. 6/7/2002

Something happened to the /dfs/imgdata/EPA2____.idx file on 6-6-2002 @ 3:57 PM
Japan time, so I wanted to restore the previous file and see what it looked
like. The Japanese back up their DFS stuff nightly. Starting at root's crontab
on ips03i (their DSM server), you'll see

    0 21 * * 1-6 /usr/tivoli/tsm/sh/backup_dfs.sh 1>/dev/null 2>/dev/null

/usr/tivoli/tsm/sh/backup_dfs.sh is a shell script that contains

    #! /bin/sh
    # masato-s
    # 2001_03_30
    #
    # 2001_04_04 masato-s :added delbuta command
    # 2001_05_01 masato-s :added check out buta running
    # 2001_09_10 masato-s :added backup family for USimages,WOimages.
    # 2001_10_25 masato-s :modification of delbuta section
    #
    # bak lsftf: as of Mon Sep 10 11:43:36 JST 2001
    # Fileset family family1:
    #     Entry 1: server .*, aggregate .*, filesets: convdata
    #     Entry 2: server .*, aggregate .*, filesets: fixdist
    #     Entry 3: server .*, aggregate .*, filesets: imgdata
    #     Entry 4: server .*, aggregate .*, filesets: prod
    #     Entry 5: server .*, aggregate .*, filesets: tools
    #     Entry 6: server .*, aggregate .*, filesets: root\.dfs\.readonly
    #     Entry 7: server .*, aggregate .*, filesets: u\..*
    #
    # Fileset family family2:
    #     Entry 1: server .*, aggregate .*, filesets: images\.US\..*
    #
    # Fileset family family3:
    #     Entry 1: server .*, aggregate .*, filesets: images\.WO\..*
    #
    # Fileset family family4:
    #     Entry 1: server .*, aggregate .*, filesets: cache\..*
    #

    DAYOFTHEWEEK=`date|awk '{print$1}'`
    command='/var/dce/dfs/buta/buta.sh'
    command_sun='bak dump -family family1 -level /sunday -tcid 1'
    #command_wee='bak dump -family family1 -level /sunday/weekday -noaction'
    command_wee='bak dump -family family1 -level /sunday/weekday -tcid 1'
    command_USimages='bak dump -family family2 -level /sunday -tcid 1'
    command_WOimages='bak dump -family family3 -level /sunday -tcid 1'

    ## check out buta running
    if ps -ef | egrep -v grep | egrep -q buta
    then
        continue
    else
        dce_login bkadmin ips0 -exec $command
        sleep 30
    fi

    case $DAYOFTHEWEEK in
        Mon|Tue|Wed|Thu|Fri)
            dce_login bkadmin ips0 -exec $command_wee
            ;;
        Sat)
            dce_login bkadmin ips0 -exec $command_sun
            dce_login bkadmin ips0 -exec $command_USimages
            ;;
        *)
            exit 0
            ;;
    esac

    # delbuta section:
    # You can get more information about delbuta. Please refer to below !
    #   /var/dce/dfs/buta/README.DFS.DELBUTA
    #   http://pucc.princeton.edu/~gretchen/adsm/adsmafs/cp5ecf47.htm
    delbdir="/var/dce/dfs/buta"
    dmplist="$delbdir/dumpinfo"
    dmplist1="$delbdir/dumpinfo1"
    delbuta="$delbdir/delbuta -f $dmplist1 -a 5"
    dumpinfo="/usr/bin/bak dumpinfo -ndumps 50"

    if [ $DAYOFTHEWEEK = 'Mon' ];then
        if [ -r $dmplist ];then
            if [ -r $dmplist1 ];then
                dce_login bkadmin ips0 -exec $delbuta
                mv $dmplist1 $dmplist1.old
                mv $dmplist $dmplist1
                dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            else
                mv $dmplist $dmplist1
                dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            fi
        else
            dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            exit 0
        fi
    fi

You can read through the help text for the DFS bak command via "bak help".
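The help text can be pulled up per-subcommand too. Only plain "bak help" is
confirmed above; the subcommand form below follows the usual convention of the
DCE DFS admin commands, so treat it as an assumption:

    bak help                # one-line summary of every bak subcommand
    bak help dumpinfo       # detailed syntax for a single subcommand, e.g. dumpinfo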
As root@ips03i, I did a dce_login bkadmin ips0 and bak dumpinfo (/usr/bin/bak
-> /usr/lpp/dce/bin/bak, which is from the dce.dfs_server.rte fileset) and saw

    DumpID     parentID   lvl created          nt nfsets dump_name
    ---------------------------------------------------------------
    1022673606 1022328009 1   05/29/2002 21:00 1  23     family1.weekday
    1022760005 0          0   05/30/2002 21:00 1  23     family1.weekday
    1022846405 0          0   05/31/2002 21:00 1  23     family1.weekday
    1022932808 0          0   06/01/2002 21:00 1  23     family1.sunday
    1022968084 0          0   06/02/2002 06:48 1  100    family2.sunday
    1023105605 1022932808 1   06/03/2002 21:00 1  23     family1.weekday
    1023192005 1022932808 1   06/04/2002 21:00 1  23     family1.weekday
    1023278406 1022932808 1   06/05/2002 21:00 1  23     family1.weekday
    1023364806 1022932808 1   06/06/2002 21:00 1  23     family1.weekday
    1023451206 1022932808 1   06/07/2002 21:00 1  23     family1.weekday

The dump I wanted was 1023278406, because it started at 06/05/2002 21:00,
before the incident. A bak dumpinfo -id 1023278406 command confirmed that the
imgdata fileset was in there.

    Dump: id 1023278406, level 1, filesets 23, created: Wed Jun 5 21:00:06 2002
    Tape: name family1.weekday.1
    nFilesets 23, created 06/05/2002 21:00
     Pos       Clone time   Nbytes Fileset
       0 06/05/2002 21:00     2369 u.ipcc
       0 06/05/2002 21:00      704 u.edward
       1 06/05/2002 21:00     3470 u.ibmse
       2 06/05/2002 21:00     6065 u.ayako
       3 06/05/2002 21:00    79948 fixdist
       4 06/05/2002 21:00    18075 tools
       5 06/05/2002 21:00    48824 imgdata
       6 06/05/2002 21:00  1311317 prod
       7 06/05/2002 21:00     3004 u.chisa
       8 06/05/2002 21:00      608 u.commerce
       9 06/05/2002 21:00      608 u.dbsys03
      10 06/05/2002 21:00      608 u.dbsys01
      11 06/05/2002 21:00     2314 u.kevin
      12 06/05/2002 21:00      608 u.eric
      13 06/05/2002 21:00      608 u.rebecca
      14 06/05/2002 21:00      608 u.tom
      15 06/05/2002 21:00     1295 u.cht
      16 06/05/2002 21:00      608 u.bruce
      17 06/05/2002 21:00     1352 u.sander
      18 06/05/2002 21:00    40690 u.jasper
      19 03/02/2002 05:40    14361 root.dfs.readonly
      20 06/05/2002 21:03      736 u.azam
      21 06/05/2002 21:03 33413607 convdata

An fts lsfldb imgdata command told me the imgdata fileset was on ips07i's
dfs7c aggregate. To keep things separate, I restored the imgdata fileset to
ips01i's dfs1c aggregate. I first checked things out using the -noaction
option, a la

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02 -noaction

    Starting restore
    Restore fileset imgdata(ID 56) from tape family1.sunday.1, dump id 1022932808, position 5 as imgdataraj.
    Restore fileset imgdata(ID 56) from tape family1.weekday.1, dump id 1023278406, position 5 as imgdataraj.

and it seemed to be doing the right thing. So to restore, I tried

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02

but got

    bak: No butc connection handles available (dfs / bak) ; Unable to connect to tape coordinator at port 0

I needed the -tcid 1 option, which is used in the backup script above.

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02 -tcid 1

Note the raj extension, so the new fileset will be imgdataraj. 15 minutes
later, I got

    Job 1: Restore finished

and an fts lsfldb imgdataraj command showed me my new/old fileset.

    imgdataraj
        readWrite ID 0,,1808 valid
        readOnly  ID 0,,1809 invalid
        backup    ID 0,,1810 invalid
    number of sites: 1
        server  flags  aggr   siteAge  principal     owner
        ips01i  RW     dfs1c  0:00:00  hosts/ips01i

A simple fts crm /dfs/.rw/imgdataraj imgdataraj command allowed me to see the
data at /dfs/.rw/imgdataraj.
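Once the needed data has been copied out of /dfs/.rw/imgdataraj, the temporary
fileset and its mount point can presumably be cleaned up. A sketch only
(hedged: I didn't run this, and the fts delmount/delete syntax is from
memory):

    cp -p /dfs/.rw/imgdataraj/EPA2____.idx /tmp/EPA2____.idx.recovered
    fts delmount /dfs/.rw/imgdataraj
    fts delete -fileset imgdataraj -server ips01i -aggregate dfs1c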
=====================================================================================
Steps I Did to Get Dale's DFS Backup Stuff Implemented 1/21/2000

1) Get TCL out there.

   On Almaden side,
        cd /dfs/rs_aix42/local
        tar cvf ~/Dales_usr_local.tar .
   Got the bin & lib directories, about 3 MB worth.

   On Patent Server side,
        dce_login jasper
        cd /dfs/.rw/admin/bin
        mkdir backup

   On Almaden side again,
        scp -p Dales_usr_local.tar root@ar0071e0:/dfs/.rw/admin/bin/backup

   On Patent Server side, for each DFS server,
        mkdir /usr/local
        cd /usr/local
        tar xvf /dfs/.rw/admin/bin/backup/Dales_usr_local.tar
        chown -R jasper:www *

   To test,
        export TCL_LIBRARY=/usr/local/lib/tcl
        export TCLX_LIBRARY=/usr/local/lib/tcl
        /usr/local/bin/tclxsh
   should get you into the TCL environment. Type exit to leave.

2) Create /dumpfs directory structure.

   Dale keeps files in these "status" directories as the dump/archive process
   runs. As each fileset is processed, its name moves through these
   directories (sketched after step 3 below):

        ./to-do                 Fileset names to be dumped.
        ./stat-failed
        ./dumping               Fileset names in process of being dumped.
        ./dump-failed           If the dump fails for some reason.
        ./dumped                Fileset names ready to be archived.
        ./archiving             Fileset names which are being archived right now.
        ./archive-failed        Fileset names which have failed.
        ./archive_checking      Fileset names whose archives are being checked.
        ./archive-check-failed  If the above archive_checking step finds a problem.
        ./archive_checked       Fileset names which have been verified as archived.
        ./archived              Fileset names which have been archived.

   The other directories used are

        ./mmddyyyy              Console & log files (e.g. dsmerror.log).
        ./data                  Data from the dumped filesets. This
                                directory/file system needs to be as big as
                                your biggest fileset, times the number of
                                dumps you may be running at one time.

   There are also these files,

        ./etc/config.tcl        Configuration file (tcl syntax only).
        ./status                Status file to indicate dumps are running.
                                Contains PID of main controlling process.

   and another log file at /var/spool/logs/backup.log.

   From Almaden side,
        scp ~dale/dumpfs.tar backup@ar0071e0:/dfs/.rw/admin/bin/backup/Dales_dumpfs.tar

   On each DFS server,
        a) Create small /dumpfs file system. Made it 200 MB.
                mount /dumpfs
        b) Create big /dumpfs/data file system. Made it 1024 MB.
                mount /dumpfs/data
        c) Create directory structure.
                cd /dumpfs
                tar xvf /dfs/.rw/admin/bin/backup/Dales_dumpfs.tar
        d) mkdir /var/spool/logs

3) Get backup TCL scripts.

   On Patent Server side,
        dce_login jasper
        cd /dfs/.rw/admin/bin/backup
        ln -s Abort.tcl Pause.tcl

   On Almaden side,
        cd /dfs/projects/afsadmin/backup
        scp -p Abort.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p retry_backup.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup

        cd /dfs/projects/dceadmin/backup/bin
        scp -p errno.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p backups-daily jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p backups-friday jasper@ar0071e0:/dfs/.rw/admin/bin/backup

        cd /dfs/projects/dceadmin/lib
        scp -p foxGetOpt.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p foxTypedOpts.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
   Dale "sources" these two things in his TCL scripts in order to parse
   command line arguments and options.

        cd /dfs/projects/afsadmin/vsc
        scp -p tools.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
   Dale "sources" this file from the Abort.tcl script, in order to get his
   nifty exec_command procedure. It has a lot of other junk that we don't
   need, but I just left it in there for now.
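As promised in step 2, here is roughly how one fileset's name travels through
the /dumpfs status directories. This is a hypothetical illustration only; the
real moves are done inside Dale's TCL code, and "imgdata" is just an example
name:

        mv /dumpfs/to-do/imgdata   /dumpfs/dumping/imgdata    # dump begins
        mv /dumpfs/dumping/imgdata /dumpfs/dumped/imgdata     # dump OK (on failure -> ./dump-failed)
        mv /dumpfs/dumped/imgdata  /dumpfs/archiving/imgdata  # ADSM archive begins
        mv /dumpfs/archiving/imgdata /dumpfs/archive_checking/imgdata
                                                # archive done, now verify it
        mv /dumpfs/archive_checking/imgdata /dumpfs/archive_checked/imgdata
                                                # verified (on failure -> ./archive-check-failed)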
4) Tweak them for the Patent server setup.

   On Patent server side,
        cd /dfs/.rw/admin/bin/backup

   a) In Abort.tcl, change the source line on line 8 from
            source /:/projects/afsadmin/vsc/tools.tcl
      to
            source /:/admin/bin/backup/tools.tcl

   b) Make the same change to the 3 source lines in retry_backup.tcl around
      lines 1077-1079.

   c) Change backups-daily & backups-friday to call retry_backup.tcl from
      /dfs/projects/afsadmin/backup. They were calling
      /:/projects/dceadmin/backup/bin/all_backup.tcl, which was a link to
      /:/projects/afsadmin/backup/retry_backup.tcl (pretty convoluted).

      Note: these two scripts are just front-ends to retry_backup.tcl. They
      export a couple of TCL library environment variables, then call
      retry_backup.tcl with different options. The options to the
      retry_backup.tcl script are

            -l      dump level, either 0 or 1.
            -d      debug level, 1 or 2.
            -c      whether or not to do the DFS clonesys command.
            -r      permit restart.
            -m      ADSM management class.
            -s      system type, DFS or AFS (DFS is the default).

      The weekly (friday) one says to do a level 0 dump; the daily one, a
      level 1 dump.

   d) Change the following lines in /dumpfs/etc/config.tcl,
        - ndumps from 3 to 1 (these machines have single processors).
        - someone_who_cares to jasper, instead of dale.
        - paging_program to /dfs/.rw/admin/bin/backup/gscpage-substitute,
          which I created to contain simply

                #!/bin/ksh
                echo "$(/usr/bin/date +"%D %H:%M:%S"): $*" >> /tmp/dfs-backups.log

          It was /usr/local/bin/gscpage, but it turns out that gscpage uses a
          different TCP port (8800) to get to the gscserver than gscsend
          does, and that port wasn't open through patgate. Sigh. I may
          address this further if I see a need to send GSC events. If I do,
          it looks like I might use the DCEBackupsFailure & DCEBackupsOK
          events, already defined to GSC, and all I need do is modify the
          logit_exit & tell_Dale subroutines in
          /dfs/projects/afsadmin/backup/retry_backup.tcl.
   e) mkdir /var/spool/logs

5) Define a userid which will run these scripts.

   Dale has dfsback defined on the Almaden side, which is in the
   subsys/dce/dfs-admin DCE group. Dale also has a keytab file with an entry
   set up for dfsback on each DFS server.

   a) First determine a free uid. A simple script like

            for i in $(dcecp -c princ cat | sed 's?/.../patent.ibm.com/??')
            do
                echo "Userid $i is uid # $(dcecp -c principal show $i | grep '{uid' | sed 's?.*{uid \([0-9]*\)}?\1?')"
            done

      will return the list of userids and corresponding uid's for the cell.
      I decided to use userid=dfsback (same as Dale) and uid=1505.

   b) dcecp
            dcecp> principal create dfsback -uid 1505
            dcecp> group add none -member dfsback
            dcecp> organization add none -member dfsback
            dcecp> account create dfsback -group none -organization none -password temp4now
            dcecp> account modify dfsback -description {Userid used for DFS Backups}
            dcecp> group add subsys/dce/dfs-admin -member dfsback
            dcecp> quit

      I was then able to dce_login as dfsback, change the password, etc.

   c) On each DFS server, create a key table entry for dfsback using the new
      password.

            rgy_edit

      and once under there, use these subcommands,

            ktlist          This will tell you what keys exist now in your
                            default key table, which is /krb5/v5srvtab.
            ktadd -p dfsback
                            Answer the password prompt with dfsback's DCE
                            password.
            quit            To exit the program.
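To sanity-check the new account afterwards, the same dcecp subcommands used
above can be turned around for verification (the exact output layout is from
memory, so treat the details as approximate):

            dce_login dfsback                          # should accept the new password
            dcecp -c principal show dfsback            # confirm uid 1505 and other attributes
            dcecp -c group list subsys/dce/dfs-admin   # dfsback should appear among the members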