What I did to restore data from Japan's automated DFS backups. 6/7/2002

Something happened to the /dfs/imgdata/EPA2____.idx file on 6-6-2002 @ 3:57 PM
Japan time, so I wanted to restore the previous file and see what it looked
like. The Japanese back up their DFS stuff nightly. Starting at root's crontab
on ips03i (their DSM server), you'll see

    0 21 * * 1-6 /usr/tivoli/tsm/sh/backup_dfs.sh 1>/dev/null 2>/dev/null

/usr/tivoli/tsm/sh/backup_dfs.sh is a shell script that contains

    #! /bin/sh
    # masato-s
    # 2001_03_30
    #
    # 2001_04_04 masato-s :added delbuta command
    # 2001_05_01 masato-s :added check out buta running
    # 2001_09_10 masato-s :added backup family for USimages,WOimages.
    # 2001_10_25 masato-s :modification of delbuta section
    #
    # bak lsftf: as of Mon Sep 10 11:43:36 JST 2001
    # Fileset family family1:
    #     Entry 1: server .*, aggregate .*, filesets: convdata
    #     Entry 2: server .*, aggregate .*, filesets: fixdist
    #     Entry 3: server .*, aggregate .*, filesets: imgdata
    #     Entry 4: server .*, aggregate .*, filesets: prod
    #     Entry 5: server .*, aggregate .*, filesets: tools
    #     Entry 6: server .*, aggregate .*, filesets: root\.dfs\.readonly
    #     Entry 7: server .*, aggregate .*, filesets: u\..*
    #
    # Fileset family family2:
    #     Entry 1: server .*, aggregate .*, filesets: images\.US\..*
    #
    # Fileset family family3:
    #     Entry 1: server .*, aggregate .*, filesets: images\.WO\..*
    #
    # Fileset family family4:
    #     Entry 1: server .*, aggregate .*, filesets: cache\..*
    #

    DAYOFTHEWEEK=`date|awk '{print$1}'`
    command='/var/dce/dfs/buta/buta.sh'
    command_sun='bak dump -family family1 -level /sunday -tcid 1'
    #command_wee='bak dump -family family1 -level /sunday/weekday -noaction'
    command_wee='bak dump -family family1 -level /sunday/weekday -tcid 1'
    command_USimages='bak dump -family family2 -level /sunday -tcid 1'
    command_WOimages='bak dump -family family3 -level /sunday -tcid 1'

    ## check out buta running
    if ps -ef | egrep -v grep | egrep -q buta
    then
        continue
    else
        dce_login bkadmin ips0 -exec $command
        sleep 30
    fi

    case $DAYOFTHEWEEK in
        Mon|Tue|Wed|Thu|Fri)
            dce_login bkadmin ips0 -exec $command_wee
            ;;
        Sat)
            dce_login bkadmin ips0 -exec $command_sun
            dce_login bkadmin ips0 -exec $command_USimages
            ;;
        *)
            exit 0
            ;;
    esac

    # delbuta section:
    # You can get more information about delbuta. Please refer to below !
    #   /var/dce/dfs/buta/README.DFS.DELBUTA
    #   http://pucc.princeton.edu/~gretchen/adsm/adsmafs/cp5ecf47.htm
    delbdir="/var/dce/dfs/buta"
    dmplist="$delbdir/dumpinfo"
    dmplist1="$delbdir/dumpinfo1"
    delbuta="$delbdir/delbuta -f $dmplist1 -a 5"
    dumpinfo="/usr/bin/bak dumpinfo -ndumps 50"

    if [ $DAYOFTHEWEEK = 'Mon' ];then
        if [ -r $dmplist ];then
            if [ -r $dmplist1 ];then
                dce_login bkadmin ips0 -exec $delbuta
                mv $dmplist1 $dmplist1.old
                mv $dmplist $dmplist1
                dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            else
                mv $dmplist $dmplist1
                dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            fi
        else
            dce_login bkadmin ips0 -exec $dumpinfo > $dmplist
            exit 0
        fi
    fi

You can read through the help text for the DFS bak command via "bak help".
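The help text can be pulled up per-subcommand too. Only plain "bak help" is
confirmed above; the subcommand form below follows the usual convention of the
DCE DFS admin commands, so treat it as an assumption:

    bak help                # one-line summary of every bak subcommand
    bak help dumpinfo       # detailed syntax for a single subcommand, e.g. dumpinfo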
As root@ips03i, I did a dce_login bkadmin ips0 and bak dumpinfo (/usr/bin/bak
-> /usr/lpp/dce/bin/bak, which is from the dce.dfs_server.rte fileset) and saw

    DumpID     parentID   lvl created          nt nfsets dump_name
    ---------------------------------------------------------------
    1022673606 1022328009 1   05/29/2002 21:00 1  23     family1.weekday
    1022760005 0          0   05/30/2002 21:00 1  23     family1.weekday
    1022846405 0          0   05/31/2002 21:00 1  23     family1.weekday
    1022932808 0          0   06/01/2002 21:00 1  23     family1.sunday
    1022968084 0          0   06/02/2002 06:48 1  100    family2.sunday
    1023105605 1022932808 1   06/03/2002 21:00 1  23     family1.weekday
    1023192005 1022932808 1   06/04/2002 21:00 1  23     family1.weekday
    1023278406 1022932808 1   06/05/2002 21:00 1  23     family1.weekday
    1023364806 1022932808 1   06/06/2002 21:00 1  23     family1.weekday
    1023451206 1022932808 1   06/07/2002 21:00 1  23     family1.weekday

The dump I wanted was 1023278406, because it started at 06/05/2002 21:00,
before the incident. A bak dumpinfo -id 1023278406 command confirmed that the
imgdata fileset was in there.

    Dump: id 1023278406, level 1, filesets 23, created: Wed Jun 5 21:00:06 2002
    Tape: name family1.weekday.1
    nFilesets 23, created 06/05/2002 21:00
     Pos       Clone time   Nbytes Fileset
       0 06/05/2002 21:00     2369 u.ipcc
       0 06/05/2002 21:00      704 u.edward
       1 06/05/2002 21:00     3470 u.ibmse
       2 06/05/2002 21:00     6065 u.ayako
       3 06/05/2002 21:00    79948 fixdist
       4 06/05/2002 21:00    18075 tools
       5 06/05/2002 21:00    48824 imgdata
       6 06/05/2002 21:00  1311317 prod
       7 06/05/2002 21:00     3004 u.chisa
       8 06/05/2002 21:00      608 u.commerce
       9 06/05/2002 21:00      608 u.dbsys03
      10 06/05/2002 21:00      608 u.dbsys01
      11 06/05/2002 21:00     2314 u.kevin
      12 06/05/2002 21:00      608 u.eric
      13 06/05/2002 21:00      608 u.rebecca
      14 06/05/2002 21:00      608 u.tom
      15 06/05/2002 21:00     1295 u.cht
      16 06/05/2002 21:00      608 u.bruce
      17 06/05/2002 21:00     1352 u.sander
      18 06/05/2002 21:00    40690 u.jasper
      19 03/02/2002 05:40    14361 root.dfs.readonly
      20 06/05/2002 21:03      736 u.azam
      21 06/05/2002 21:03 33413607 convdata

An fts lsfldb imgdata command told me the imgdata fileset was on ips07i's
dfs7c aggregate. To keep things separate, I restored the imgdata fileset to
ips01i's dfs1c aggregate. I first checked things out using the -noaction
option, a la

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02 -noaction

    Starting restore
    Restore fileset imgdata(ID 56) from tape family1.sunday.1, dump id 1022932808, position 5 as imgdataraj.
    Restore fileset imgdata(ID 56) from tape family1.weekday.1, dump id 1023278406, position 5 as imgdataraj.

and it seemed to be doing the right thing. So to restore, I tried

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02

but got

    bak: No butc connection handles available (dfs / bak) ; Unable to connect to tape coordinator at port 0

I needed the -tcid 1 option, which is used in the backup script above.

    bak restoreft -server ips01i -aggregate dfs1c -fileset imgdata -extension raj -date 06/06/02 -tcid 1

Note the raj extension, so the new fileset will be imgdataraj. 15 minutes
later, I got

    Job 1: Restore finished

and an fts lsfldb imgdataraj command showed me my new/old fileset.

    imgdataraj
        readWrite ID 0,,1808 valid
        readOnly  ID 0,,1809 invalid
        backup    ID 0,,1810 invalid
    number of sites: 1
        server  flags  aggr   siteAge  principal     owner
        ips01i  RW     dfs1c  0:00:00  hosts/ips01i

A simple fts crm /dfs/.rw/imgdataraj imgdataraj command allowed me to see the
data at /dfs/.rw/imgdataraj.
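Once the needed data has been copied out of /dfs/.rw/imgdataraj, the temporary
fileset and its mount point can presumably be cleaned up. A sketch only
(hedged: I didn't run this, and the fts delmount/delete syntax is from
memory):

    cp -p /dfs/.rw/imgdataraj/EPA2____.idx /tmp/EPA2____.idx.recovered
    fts delmount /dfs/.rw/imgdataraj
    fts delete -fileset imgdataraj -server ips01i -aggregate dfs1c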
=====================================================================================
Steps I Did to Get Dale's DFS Backup Stuff Implemented 1/21/2000

1) Get TCL out there.

   On Almaden side,
        cd /dfs/rs_aix42/local
        tar cvf ~/Dales_usr_local.tar .
   Got the bin & lib directories, about 3 MB worth.

   On Patent Server side,
        dce_login jasper
        cd /dfs/.rw/admin/bin
        mkdir backup

   On Almaden side again,
        scp -p Dales_usr_local.tar root@ar0071e0:/dfs/.rw/admin/bin/backup

   On Patent Server side, for each DFS server,
        mkdir /usr/local
        cd /usr/local
        tar xvf /dfs/.rw/admin/bin/backup/Dales_usr_local.tar
        chown -R jasper:www *

   To test,
        export TCL_LIBRARY=/usr/local/lib/tcl
        export TCLX_LIBRARY=/usr/local/lib/tcl
        /usr/local/bin/tclxsh
   should get you into the TCL environment. Type exit to leave.

2) Create /dumpfs directory structure.

   Dale keeps files in these "status" directories as the dump/archive process
   runs. As each fileset is processed, its name moves through these
   directories (sketched after step 3 below):

        ./to-do                 Fileset names to be dumped.
        ./stat-failed
        ./dumping               Fileset names in process of being dumped.
        ./dump-failed           If the dump fails for some reason.
        ./dumped                Fileset names ready to be archived.
        ./archiving             Fileset names which are being archived right now.
        ./archive-failed        Fileset names which have failed.
        ./archive_checking      Fileset names whose archives are being checked.
        ./archive-check-failed  If the above archive_checking step finds a problem.
        ./archive_checked       Fileset names which have been verified as archived.
        ./archived              Fileset names which have been archived.

   The other directories used are

        ./mmddyyyy              Console & log files (e.g. dsmerror.log).
        ./data                  Data from the dumped filesets. This
                                directory/file system needs to be as big as
                                your biggest fileset, times the number of
                                dumps you may be running at one time.

   There are also these files,

        ./etc/config.tcl        Configuration file (tcl syntax only).
        ./status                Status file to indicate dumps are running.
                                Contains PID of main controlling process.

   and another log file at /var/spool/logs/backup.log.

   From Almaden side,
        scp ~dale/dumpfs.tar backup@ar0071e0:/dfs/.rw/admin/bin/backup/Dales_dumpfs.tar

   On each DFS server,
        a) Create small /dumpfs file system. Made it 200 MB.
                mount /dumpfs
        b) Create big /dumpfs/data file system. Made it 1024 MB.
                mount /dumpfs/data
        c) Create directory structure.
                cd /dumpfs
                tar xvf /dfs/.rw/admin/bin/backup/Dales_dumpfs.tar
        d) mkdir /var/spool/logs

3) Get backup TCL scripts.

   On Patent Server side,
        dce_login jasper
        cd /dfs/.rw/admin/bin/backup
        ln -s Abort.tcl Pause.tcl

   On Almaden side,
        cd /dfs/projects/afsadmin/backup
        scp -p Abort.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p retry_backup.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup

        cd /dfs/projects/dceadmin/backup/bin
        scp -p errno.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p backups-daily jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p backups-friday jasper@ar0071e0:/dfs/.rw/admin/bin/backup

        cd /dfs/projects/dceadmin/lib
        scp -p foxGetOpt.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
        scp -p foxTypedOpts.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
   Dale "sources" these two things in his TCL scripts in order to parse
   command line arguments and options.

        cd /dfs/projects/afsadmin/vsc
        scp -p tools.tcl jasper@ar0071e0:/dfs/.rw/admin/bin/backup
   Dale "sources" this file from the Abort.tcl script, in order to get his
   nifty exec_command procedure. It has a lot of other junk that we don't
   need, but I just left it in there for now.
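As promised in step 2, here is roughly how one fileset's name travels through
the /dumpfs status directories. This is a hypothetical illustration only; the
real moves are done inside Dale's TCL code, and "imgdata" is just an example
name:

        mv /dumpfs/to-do/imgdata   /dumpfs/dumping/imgdata    # dump begins
        mv /dumpfs/dumping/imgdata /dumpfs/dumped/imgdata     # dump OK (on failure -> ./dump-failed)
        mv /dumpfs/dumped/imgdata  /dumpfs/archiving/imgdata  # ADSM archive begins
        mv /dumpfs/archiving/imgdata /dumpfs/archive_checking/imgdata
                                                # archive done, now verify it
        mv /dumpfs/archive_checking/imgdata /dumpfs/archive_checked/imgdata
                                                # verified (on failure -> ./archive-check-failed)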
4) Tweak them for the Patent server setup.

   On Patent server side,
        cd /dfs/.rw/admin/bin/backup

   a) In Abort.tcl, change the source line on line 8 from
            source /:/projects/afsadmin/vsc/tools.tcl
      to
            source /:/admin/bin/backup/tools.tcl

   b) Make the same change to the 3 source lines in retry_backup.tcl around
      lines 1077-1079.

   c) Change backups-daily & backups-friday to call retry_backup.tcl from
      /dfs/projects/afsadmin/backup. They were calling
      /:/projects/dceadmin/backup/bin/all_backup.tcl, which was a link to
      /:/projects/afsadmin/backup/retry_backup.tcl (pretty convoluted).

      Note: these two scripts are just front-ends to retry_backup.tcl. They
      export a couple of TCL library environment variables, then call
      retry_backup.tcl with different options. The options to the
      retry_backup.tcl script are

            -l      dump level, either 0 or 1.
            -d      debug level, 1 or 2.
            -c      whether or not to do the DFS clonesys command.
            -r      permit restart.
            -m      ADSM management class.
            -s      system type, DFS or AFS (DFS is the default).

      The weekly (friday) one says to do a level 0 dump; the daily one, a
      level 1 dump.

   d) Change the following lines in /dumpfs/etc/config.tcl,
        - ndumps from 3 to 1 (these machines have single processors).
        - someone_who_cares to jasper, instead of dale.
        - paging_program to /dfs/.rw/admin/bin/backup/gscpage-substitute,
          which I created to contain simply

                #!/bin/ksh
                echo "$(/usr/bin/date +"%D %H:%M:%S"): $*" >> /tmp/dfs-backups.log

          It was /usr/local/bin/gscpage, but it turns out that gscpage uses a
          different TCP port (8800) to get to the gscserver than gscsend
          does, and that port wasn't open through patgate. Sigh. I may
          address this further if I see a need to send GSC events. If I do,
          it looks like I might use the DCEBackupsFailure & DCEBackupsOK
          events, already defined to GSC, and all I need do is modify the
          logit_exit & tell_Dale subroutines in
          /dfs/projects/afsadmin/backup/retry_backup.tcl.
   e) mkdir /var/spool/logs

5) Define a userid which will run these scripts.

   Dale has dfsback defined on the Almaden side, which is in the
   subsys/dce/dfs-admin DCE group. Dale also has a keytab file with an entry
   set up for dfsback on each DFS server.

   a) First determine a free uid. A simple script like

            for i in $(dcecp -c princ cat | sed 's?/.../patent.ibm.com/??')
            do
                echo "Userid $i is uid # $(dcecp -c principal show $i | grep '{uid' | sed 's?.*{uid \([0-9]*\)}?\1?')"
            done

      will return the list of userids and corresponding uid's for the cell.
      I decided to use userid=dfsback (same as Dale) and uid=1505.

   b) dcecp
            dcecp> principal create dfsback -uid 1505
            dcecp> group add none -member dfsback
            dcecp> organization add none -member dfsback
            dcecp> account create dfsback -group none -organization none -password temp4now
            dcecp> account modify dfsback -description {Userid used for DFS Backups}
            dcecp> group add subsys/dce/dfs-admin -member dfsback
            dcecp> quit

      I was then able to dce_login as dfsback, change the password, etc.

   c) On each DFS server, create a key table entry for dfsback using the new
      password.

            rgy_edit

      and once under there, use these subcommands,

            ktlist          This will tell you what keys exist now in your
                            default key table, which is /krb5/v5srvtab.
            ktadd -p dfsback
                            Answer the password prompt with dfsback's DCE
                            password.
            quit            To exit the program.
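To sanity-check the new account afterwards, the same dcecp subcommands used
above can be turned around for verification (the exact output layout is from
memory, so treat the details as approximate):

            dce_login dfsback                          # should accept the new password
            dcecp -c principal show dfsback            # confirm uid 1505 and other attributes
            dcecp -c group list subsys/dce/dfs-admin   # dfsback should appear among the members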