ITEM: DN2428L

DCE: cannot replicate fileset to new DFS server


 PROBLEM: Customer has his core server and original DFS server
(sp-f1n1e.catia.ftl.com).  He is trying to configure another DFS
server (alfdfs1) in the same domain.  alfdfs1 is also a slave
security server.

When they attempt to replicate root.dfs from sp-f1n1e to alfdfs1,
the replog claims to complete, but "fts lsheader" on alfdfs1 does
not show any filesets present.
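For reference, the release-and-verify sequence he was using looks roughly
like this (a sketch; the fileset, server, and option names come from this
item and may vary by DFS level, and the commands are echoed rather than
executed since they need a live DCE/DFS cell):

```shell
# Dry-run sketch: 'run' echoes each DFS command instead of executing it.
run() { echo "+ $*"; }

# Push the current RW contents of root.dfs out to its replica sites.
run fts release -fileset root.dfs

# Then confirm the replica actually shows up on the new server.
run fts lsheader -server /.:/hosts/alfdfs1.catia.ftl.com
```

In this case the release itself reported success; it was the lsheader
step that exposed the missing replica.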

He has also found the following error in the FtLog:

SFTSERVER_Restore: exception while restore-terminating:
        (rpc_x_ss_pipe_comm_error) exception raised

*ACTION TAKEN: I could not find any hits on this.  I had him try to
add a fileset to the local aggregate on alf.  This was taking a large
amount of time.  Customer then realized that preference was being
given to the fldb and file server on the other side of the WAN.  He
used "cm setpref" to change both of these.
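A sketch of what that preference change looks like (the rank value and the
long-form option spellings here are assumptions; check "cm help
setpreferences" on the DFS level in use, and the commands are echoed
rather than executed):

```shell
# Dry-run sketch: 'run' echoes each command instead of executing it.
run() { echo "+ $*"; }

# Give the local file server a low (preferred) rank so the client stops
# favoring the machine across the WAN (host name and rank are assumed).
run cm setpreferences -servers alfdfs1.catia.ftl.com 5000

# Do the same for the FLDB server preferences.
run cm setpreferences -fldb -servers alfdfs1.catia.ftl.com 5000
```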

He then added the new fileset to alfdfs1's aggregate.  This was still
taking an excessive amount of time, so I had him run:

dfstrace setset -set ftserver -cdsentry /.:/hosts/<hostname>/ftserver \
        -active

He also noticed that dce_login was slow while trying to get into
cell_admin for the above.  Around this time, the fileset # was returned,
as well as where it is.  He saw a message about "dfs lost contact
with ??" but the smit command returned OK.  He did find that the fileset
was created, and he was able to access it through dfs space (create
files, read them).

I had him activate the cm trace as well.  I then had him initiate
a release of the root.dfs, and then dump the logs on alf (kernel
and the ftserver logs) to a file.
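Dumping those traces to files looks roughly like this (a sketch; the set
names mirror the setset call above, the -file option spelling is an
assumption, and the commands are echoed rather than executed):

```shell
# Dry-run sketch: 'run' echoes each dfstrace command instead of running it.
run() { echo "+ $*"; }

# Dump the client (cm) kernel trace and the ftserver trace to files.
run dfstrace dump -set cm -file /tmp/cm.trace
run dfstrace dump -set ftserver -file /tmp/ftserver.trace
```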

Response:

This came back with "could not lock fldb entry, release terminated".
He thought another colleague might be doing something.  Turns out
he was just adding an account.

*ACTION TAKEN: I noticed that the lsheader for alfdfs1 still does
not show an entry for the root.dfs.  "fts syncserv" did not
correct this and gave an error of:

Could not process FLDB entry for fileset root.dfs (572825647)
Error: no such fileset (dfs / fts)
Server alfdfs1.catia.ftl.com synchronized with FLDB

He also tried "fts rmsite" and "fts addsite" to try to get that
system to have a replica.  The commands gave no errors, but did
not correct the discrepancy.
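The remove/re-add cycle he tried corresponds to roughly this sequence (a
sketch; the aggregate name dfs_ag01 comes from the lsreplicas output later
in this item, option spellings may vary by DFS level, and the commands are
echoed rather than executed):

```shell
# Dry-run sketch: 'run' echoes each command instead of executing it.
run() { echo "+ $*"; }

# Drop the replica site registration for this server from the FLDB...
run fts rmsite -fileset root.dfs -server /.:/hosts/alfdfs1.catia.ftl.com -aggregate dfs_ag01
# ...re-register it...
run fts addsite -fileset root.dfs -server /.:/hosts/alfdfs1.catia.ftl.com -aggregate dfs_ag01
# ...then push the fileset contents out again.
run fts release -fileset root.dfs
```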

We continue to have problems with security services.  It seems that
they are aggravated by having DEBUG_SEC=1 set, but problems persist
when it is unset.  I also noticed that alfdfs1 resolves its own
hostname to an alias.

$ host `hostname`
alfcat.catia.ftl.com is 170.2.13.198,  
        Aliases:   alfdfs1.ftl.com, alfdfs1.catia.ftl.com
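A quick way to spot this kind of alias-first resolution on any host (a
runnable sketch; it uses getent, so it is Linux-flavored where the
transcript above used the AIX host command):

```shell
# Compare this host's own name against the canonical name its resolver
# returns.  getent prints: ADDRESS CANONICAL-NAME [ALIASES...]
name=$(hostname)
entry=$(getent hosts "$name" 2>/dev/null)
canonical=$(echo "$entry" | awk '{print $2}')
echo "hostname : $name"
echo "canonical: ${canonical:-<unresolved>}"
if [ -n "$canonical" ] && [ "$canonical" != "$name" ]; then
    echo "note: $name comes back under canonical name $canonical (alias ordering in /etc/hosts?)"
fi
```

Here alfdfs1's own name came back only as an alias of alfcat, which the
customer later corrected in /etc/hosts.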

*ACTION PLAN: continue after meeting.

Response:

Electronic Response by Customer Kevin Turner

  After seeing your last entry in the item, I went ahead and fixed
the /etc/hosts file.  It should now correctly alias alfdfs1.catia.ftl.com
to alfdfs1. 

Kevin Turner

Response:

While on the system, I got:

dfs: lost contact with server 170.2.14.12 in cell: ftl.com

during a dce_login.  It was eventually successful, and I began
looking at the setup:

# fts lsreplicas -fileset root.dfs -all

On engr600.catia.ftl.com:

On alfdfs1.catia.ftl.com:
root.dfs, cell 268458746,,1879965916: src 0,,2 (dfs_ag01) (on sp-f1n1e.catia.ftl.com) => alfdfs1.catia.ftl.com 0,,2 (dfs_ag01)
   flags 0xc0, volstates 0.  NumKAs 0; lastKA sweep=Wed Dec 31 16:00:00 1969
   srcVV: 0,,0; curVV: 0,,0; WVT ID = 0,,0
   Lost token 113 ago; token expires 1258 hence; new version published 872721920 ago
   vvCurr 864937442.418705 (7784478 ago); vvPingCurr 872721807.010752 (113 ago)
   Last update attempt 872721378.038714 (542 ago); next scheduled attempt 872722232.010752 (312 hence)
   Status msg: LoseWVT: Lost WVT at 872721807: types 0 remain

# fts lsreplicas -fileset root.dfs -all

On engr600.catia.ftl.com:
root.dfs, cell 268458746,,1879965916: src 0,,2 (dfs_ag01) (on sp-f1n1e.catia.ftl.com) => engr600.catia.ftl.com 0,,2 (dfs_ag01)
   flags 0xc8, volstates 0x10423106.  NumKAs 1; lastKA sweep=Wed Aug 27 15:36:00 1997
   srcVV: 0,,1173; curVV: 0,,1173; WVT ID = 0,,0
   Lost token 325 ago; token expires 1485 hence; new version published 872722267 ago
   vvCurr 872722192.154227 (75 ago); vvPingCurr 872721952.014405 (315 ago)
   Last update attempt 0.000000 (872722267 ago); next scheduled attempt 872721952.014271 (-315 hence)
   Status msg: StartImporting: Got VV ok

On alfdfs1.catia.ftl.com:
root.dfs, cell 268458746,,1879965916: src 0,,2 (dfs_ag01) (on sp-f1n1e.catia.ftl.com) => alfdfs1.catia.ftl.com 0,,2 (dfs_ag01)
   flags 0xc0, volstates 0.  NumKAs 0; lastKA sweep=Wed Dec 31 16:00:00 1969
   srcVV: 0,,0; curVV: 0,,0; WVT ID = 0,,0
   Lost token 461 ago; token expires 1764 hence; new version published 872722268 ago
   vvCurr 864937442.418705 (7784826 ago); vvPingCurr 872722232.006085 (36 ago)
   Last update attempt 872722232.032377 (36 ago); next scheduled attempt 0.010752 (-872722268 hence)
   Status msg: Calling FTSERVER_Forward() on primary

$ fts lsreplicas -fileset root.dfs -server /.:/hosts/alfdfs1.catia.ftl.com

root.dfs, cell 268458746,,1879965916: src 0,,2 (dfs_ag01) (on sp-f1n1e.catia.ftl.com) => alfdfs1.catia.ftl.com 0,,2 (dfs_ag01)
   flags 0xc0, volstates 0.  NumKAs 0; lastKA sweep=Wed Dec 31 16:00:00 1969
   srcVV: 0,,0; curVV: 0,,0; WVT ID = 0,,0
   Lost token 750 ago; token expires 1475 hence; new version published 872722557 ago
   vvCurr 864937442.418705 (7785115 ago); vvPingCurr 872722232.006085 (325 ago)
   Last update attempt 872722232.032377 (325 ago); next scheduled attempt 872722661.290611 (104 hence)
   Status msg: Finished dump for root.dfs; status is 382312470

$ fts lsheader -server /.:/hosts/alfdfs1.catia.ftl.com
Total filesets on server /.:/hosts/alfdfs1.catia.ftl.com aggregate dfs_ag01 (id 1): 1
test.kmt                 0,,2077 RW      9 K alloc      9 K quota On-line
Total filesets on-line 1; total off-line 0; total busy 0

Total number of filesets on server /.:/hosts/alfdfs1.catia.ftl.com: 1

Response:

$ fts lsreplicas -fileset root.dfs -server /.:/hosts/alfdfs1.catia.ftl.com
root.dfs, cell 268458746,,1879965916: src 0,,2 (dfs_ag01) (on sp-f1n1e.catia.ftl.com) => alfdfs1.catia.ftl.com 0,,2 (dfs_ag01)
   flags 0xc0, volstates 0.  NumKAs 0; lastKA sweep=Wed Dec 31 16:00:00 1969
   srcVV: 0,,0; curVV: 0,,0; WVT ID = 0,,0
   Lost token 68 ago; token expires 1303 hence; new version published 872722729 ago
   vvCurr 864937442.418705 (7785287 ago); vvPingCurr 872722661.010725 (68 ago)
   Last update attempt 872722232.032377 (497 ago); next scheduled attempt 872723167.010725 (438 hence)
   Status msg: LoseWVT: Lost WVT at 872722661: types 0 remain

*ACTION TAKEN: Customer decided to bring in a brand new system, also
AIX 4.2.1.  He installed DCE, and configured it with the same
components as alfdfs1.  He used a different system name, and ip
address.  When he tried to replicate root.dfs to this new system,
he got the same problem.  "fts lsreplicas" shows the same lack of
update to that system as to alfdfs1.

He also created a new fileset, and then tried to replicate that
fileset to both alfdfs1 and the new system.  These both failed in
the same manner.  He was able to replicate this new fileset to the
engr600 system.  He noted that engr600 is an AIX 4.1.5 system.


Response:

The customer rebooted the dfs file server machine,
alfdfs1.catia.ftl.com, that was having a problem with an
expired self cred.  On reboot, dce/dfs came up fine.
Kevin was still unable to replicate the root.dfs fileset
to this machine.
Replication to other nodes is working fine.

The /var/dce/dfs/RepLog showed the following error:

97-Sep-02 13:06:33 0,,2:
  Starting full dump for root.dfs
97-Sep-02 13:10:42 0,,2:
  Finished dump for root.dfs; status is 382312470
97-Sep-02 13:11:13 0,,2:
  CheckVLDBRelationship: Replica is made moribund.
97-Sep-02 13:11:13 0,,2:
  Destroying replica

The /var/dce/dfs/FtLog showed the following error:

1997-Sep-02 13:10:39 SFTSERVER_Restore:
  Exception while restore-terminating:
  (rpc_x_ss_pipe_comm_error) exception raised

We found that we could not ping the dfs file server
machine that holds the RW fileset for root.dfs with
a large packet size.  This could lead to the errors
in the logs.
From alfdfs1.catia.ftl.com:

Normal ping, which sends 56 data bytes, works:

# ping sp-f1n2e.catia.ftl.com
PING sp-f1n2e.catia.ftl.com: (170.2.14.12): 56 data bytes
64 bytes from 170.2.14.12: icmp_seq=0 ttl=254 time=3 ms
64 bytes from 170.2.14.12: icmp_seq=1 ttl=254 time=4 ms
64 bytes from 170.2.14.12: icmp_seq=2 ttl=254 time=3 ms
64 bytes from 170.2.14.12: icmp_seq=3 ttl=254 time=2 ms
^C
----sp-f1n2e.catia.ftl.com PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 2/3/4 ms

But sending pings with a packet size of 2047 does not work:

# ping sp-f1n2e.catia.ftl.com 2047
PING sp-f1n2e.catia.ftl.com: (170.2.14.12): 2047 data bytes
^C
----sp-f1n2e.catia.ftl.com PING Statistics----
188 packets transmitted, 0 packets received, 100% packet loss

(We waited a while and finally Ctrl-C'd out of the ping.)
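A sweep over packet sizes narrows down where the path starts dropping (a
runnable sketch; HOST defaults to loopback here and should be pointed at
the server holding the RW fileset, and the Linux-style "ping -s SIZE"
flag is used where the AIX ping above takes the size as a trailing
argument):

```shell
# Probe increasing ICMP payload sizes to bracket an MTU/fragmentation
# problem like the one seen above.  HOST is an assumption.
HOST=${HOST:-127.0.0.1}
for size in 56 512 1024 1472 2047; do
    if ping -c 1 -s "$size" "$HOST" >/dev/null 2>&1; then
        echo "size $size: ok"
    else
        echo "size $size: DROPPED"
    fi
done
```

A cutoff just under a round number (e.g. everything above 1472 bytes
dropped) typically points at an MTU limit or a router refusing fragments,
which matches the router problem found below.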
Kevin is off trying to find a network support person
at his location to look into this.  He suspects a router
problem and will let Claudia know when it is resolved.
We will then try the replication of root.dfs to this
machine again.

Customer found that their router was the crux of the problem.
cwca.


Support Line: DCE: cannot replicate fileset to new DFS server ITEM: DN2428L
Dated: September 1997 Category: N/A