QUESTION:
My customer is trying to use a PC running Sun's PC-NFS as a
workstation to his RS/6000s. He has encountered a problem wherein
PC-NFS appears to 'time out'. The specific scenario is as follows:
Before leaving in the evening, he reboots the PC, and verifies
that he can ping the RS/6000s and vice versa. Some time later
(often the next morning), he finds that he is unable to ping
from the PC or vice versa. The only way to recover is to reboot.
His PC-NFS configuration is very simple: no NFS files, no
use of print services, only a TCP/IP connection.
Customer has called Sun technical support and they have told him
that this is because PC-NFS does not support keepalive packets.
While I have read a number of items on keepalive, I am not
very familiar with TCP/IP at this level. My questions are:
1. Given that the keepalive and keepidle are associated with
sockets, does keepalive come into play in the scenario above,
where there are no applications (e.g. telnet, ftp, etc)
running?
2. What do these keepalive packets look like on an iptrace/ipreport?
Your comments are appreciated.
---------- ---------- ---------- --------- ---------- ----------
A: I will address your questions in the order you asked them.
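1) Your understanding is correct: keep-alive does not come into play
here. The tcp_keepintvl and tcp_keepidle parameters of the "no"
command modify the timing of TCP keep-alive packets. If there
is no TCP connection between the PC and the RISC, then
keep-alive packets are not issued by the RISC to the PC. (Ping
uses ICMP rather than TCP, so it does not create such a connection.)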
2) TCP keep-alive packets are defined in section 4.2.3.6 of RFC 1122
as follows:
"(The) keep-alive mechanism...confirm(s) that an idle
connection is still active...(by) send(ing) a probe segment
designed to elicit a response from the peer TCP. Such a
segment generally contains SEG.SEQ = SND.NXT-1 and may or may
not contain one garbage octet of data. Note that on a quiet
connection SND.NXT = RCV.NXT, so that this SEG.SEQ will be
outside the window. Therefore, the probe causes the receiver
to return an acknowledgment segment, confirming that the
connection is still live. If the peer has dropped the
connection due to a network partition or a crash, it will
respond with a RST instead of an acknowledgment segment."
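In other words, in an iptrace/ipreport, the keep-alive packet
will normally contain no data (or at most one garbage octet), and
its sequence number will be one less than the next sequence number
the peer expects, that is, one less than the th_ack value in the
peer's most recent packet on that TCP connection.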
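To make the RFC 1122 rule (SEG.SEQ = SND.NXT-1) concrete, here is a
minimal Python sketch of how one might recognize a keep-alive probe
when reading a trace by hand. The function names are hypothetical and
are not part of iptrace or ipreport. The check is only a heuristic for
a quiet connection, and the numbers are the th_seq and th_ack values
that appear in the trace discussed in this item.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def keepalive_probe_seq(snd_nxt):
    """Return SEG.SEQ for a keep-alive probe: SND.NXT - 1, modulo 2**32."""
    return (snd_nxt - 1) % SEQ_MODULUS


def looks_like_keepalive(th_seq, data_len, peer_th_ack):
    """Heuristic check for a keep-alive probe in a trace (hypothetical helper).

    th_seq      -- sequence number of the suspect segment
    data_len    -- bytes of TCP payload carried by that segment
    peer_th_ack -- next sequence number the peer has said it expects
    """
    # RFC 1122 allows zero or one (garbage) octet of data in the probe.
    return data_len <= 1 and th_seq == keepalive_probe_seq(peer_th_ack)


# u2e says it expects 1737cc59; festere probes with 1737cc58.
print(hex(keepalive_probe_seq(0x1737CC59)))             # 0x1737cc58
print(looks_like_keepalive(0x1737CC58, 0, 0x1737CC59))  # True
```

The modulo keeps the arithmetic correct when SND.NXT happens to be 0
and the probe sequence number has to wrap to ffffffff.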
I have appended below a portion of an iptrace I ran which shows
several TCP keep-alive packets. On host "festere", I set the
tcp_keepidle to 120 half-seconds (60 seconds). I then telneted
from host "u2e" to "festere". Because reading an iptrace is
tedious, allow me to describe the trace as though it were a
dialog. (You may want to print out this item so that you can
compare the dialog to the trace more easily.) I added packet
numbers to the trace for easier reference.
(We join our heroes soon after a user logged into festere from u2e.)
u2e: "What is the time, please?" (Request packets not shown.)
festere: "Today is Thursday, 24 February 1994, and the time is
16:07:41 CST." (packet 1)
u2e: "Thank you. My sequence number is th_seq=ef97926a. I expect
your next message to have sequence number th_ack=1737cc59."
(packet 2)
(One minute later, at 16:08:41...)
festere: "Hello u2e. My sequence number is th_seq=1737cc58. Are you
still there?" (packet 3)
u2e: "Yes, I am still here (th_seq=ef97926a). Didn't you already send
me a message with sequence number 1737cc58?" (packet 4)
(One minute later, at 16:09:41...)
festere: "Hi u2e. Just wanted to check up on you (th_seq=1737cc58).
Please let me know you are OK (th_ack=ef97926a)." (packet 5)
u2e: "I'm not going anywhere (th_seq=ef97926a) until you send me the
right sequence number (th_ack=1737cc59)." (packet 6)
(One minute later, at 16:10:41...)
festere: "Yo, u2e (th_seq=1737cc58)! Wassup (th_ack=ef97926a)?"
(packet 7)
u2e: "Hey mon (th_seq=ef97926a), I'm like the bunny. I keep going,
and going...unlike you (th_ack=1737cc59)." (packet 8)
(Ad infinitum...)
Note that if u2e never responded (for example, if I had shut down
the Ethernet interface), festere would stop transmitting packets
after 8 tries. Each try would be separated by the value of the
tcp_keepintvl parameter.
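The timing described above can be sketched in a few lines of Python.
This is an illustration only, not how the kernel is implemented. It
assumes that tcp_keepintvl, like tcp_keepidle, is expressed in
half-seconds, and the value 150 used for tcp_keepintvl is an assumed
example, not a value taken from the trace. The function name is
hypothetical.

```python
HALF_SECOND = 0.5  # assumed unit for both "no" parameters
MAX_PROBES = 8     # number of tries stated in the text above


def probe_times(tcp_keepidle, tcp_keepintvl, max_probes=MAX_PROBES):
    """Return the idle time, in seconds, at which each unanswered probe is sent."""
    first = tcp_keepidle * HALF_SECOND  # idle time before the first probe
    gap = tcp_keepintvl * HALF_SECOND   # spacing between later probes
    return [first + i * gap for i in range(max_probes)]


# tcp_keepidle=120 as on festere; tcp_keepintvl=150 is an assumed value.
print(probe_times(120, 150))
```

With these inputs the first probe goes out after 60 seconds of idle
time, matching the one-minute gap before packet 3 in the dialog.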
*** PACKET 1 ***
=====( packet transmitted on interface en0 )=====Thu Feb 24 16:07:41 1994
ETHERNET packet: . 02:60:8c:2f:31:4e -> 02:60:8c:2e:bb:11 . type 800 (IP)
IP header breakdown:
< SRC = 9.3.6.35 > (festere.austin.ibm.com)
< DST = 9.3.6.32 > (u2e.austin.ibm.com)
ip_v=4, ip_hl=20, ip_tos=0, ip_len=72, ip_id=42177, ip_off=0
ip_ttl=60, ip_sum=bba6, ip_p = 6 (TCP)
TCP header breakdown: