System Management Guide: Communications and Networks
TCP/IP Problem Determination
This section contains information about diagnosing
common problems in a Transmission Control Protocol/Internet Protocol (TCP/IP)
network environment.
The netstat command is a good
tool for determining which area of the network has a problem. Once you have
isolated the problem to an area, you can use more sophisticated tools to proceed.
For example, you might use the netstat -i and netstat -v commands to determine whether you have a problem with a particular hardware
interface, and then run diagnostics to further isolate the problem. Or, if
the netstat -s command shows that there are protocol
errors, you could then use the trpt or iptrace commands.
Communication Problems
If you cannot communicate with a host on your network:
- Try to contact the host, using the ping command.
Run the ping command on the local host to verify that
the local interface to the network is up and running.
- Try to resolve the name of the host, using the host
command. If the name does not resolve, you have a name resolution problem.
See "Name Resolution Problems" for more information.
If the name resolves and you are trying to contact
a host on another network, you may have a routing problem. See "Routing Problems" for more information.
- If your network is a token-ring network, check to see if the target host
is on another ring. If so, the allcast field is probably set incorrectly.
Use the Web-based System Manager, wsm, or the System Management
Interface Tool (SMIT) fast path smit chinet to access
the Network Interfaces menu. Then, set the Confine Broadcast to Local Ring
field to no in the token-ring dialog.
- If there are a large number of Address Resolution Protocol (ARP) packets
on your network, verify that your subnet mask is set correctly. This condition
is known as a broadcast storm and can affect your system performance.
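One way to sanity-check a subnet mask is to compute the network address it yields for the hosts that are supposed to share a subnet. The following is a minimal sketch, not part of the original procedure; the function names and the second sample address are hypothetical, and octets are assumed to be written without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the network address (dotted quad) for an address and a mask.
network_of() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# Succeed if two addresses are in the same subnet under the given mask.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}
```

For example, `same_subnet 192.100.61.210 192.100.61.7 255.255.255.0` succeeds, while the same call with `192.100.62.7` as the second address fails. Hosts that should be neighbors but fail this check point to a wrong mask or address.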
Name Resolution Problems
Resolver routines on hosts running TCP/IP attempt to
resolve names, using the following sources in the order listed:
- DOMAIN name server (named)
- Network Information Service (NIS)
- Local /etc/hosts file
When NIS+ is installed, lookup preferences are set using the irs.conf file. For more information, see AIX 5L Version 5.2 Network Information Services (NIS and NIS+) Guide.
Client Host
If you cannot get a host name resolved, and you are
using flat name resolution (using the /etc/hosts file),
verify that the host name and correct Internet Protocol (IP) address information
is in the /etc/hosts file.
If you cannot get a host name resolved, and you are
using a name server:
- Verify that you have a resolv.conf file specifying
the domain name and Internet address of a name server.
- Verify that the local name server is up by issuing the ping command with the IP address of the name server (found in the local resolv.conf file).
- If the local name server is up, verify that the named daemon on your local name server is active by issuing the lssrc -s named command on the name server.
- If you are running the syslogd daemon, check for logged
messages. The output for these messages is defined in the /etc/syslog.conf file.
If these steps do not identify the problem, check the
name server host.
Name Server Host
If you cannot get a host name resolved:
- Verify that the named daemon is active by issuing
the following command:
lssrc -s named
- Verify that the address of the target host exists and is correct in the
name server database. Send a SIGINT signal to the named daemon to dump the database and cache to the file /var/tmp/named_dump.db. Verify that the address you are
trying to resolve is there and is correct.
Add or correct
name-to-address resolution information in the named
hosts data file for the master name server of the domain. Then issue the following SRC command to reread the data files:
refresh -s named
- Verify that the name resolution requests are being processed. To do this,
start the named daemon from the command line and specify
a debugging level. Valid debug levels are 1 through 9. The higher the level,
the more information the debug mechanism logs.
startsrc -s named -a "-d DebugLevel"
- Check for configuration problems in the named data
files. For more information, see "Configuring Name
Servers", the "DOMAIN Data File Format," "DOMAIN Reverse Data File Format," "DOMAIN Cache File Format," and the "DOMAIN Local
Data File Format" in the AIX 5L Version 5.2 Files Reference.
Note: A common error is the incorrect use of the . (period) and the @
(at sign) in the DOMAIN data files.
If external users cannot reach your domains:
- Make sure that all your non-master name servers (slave, hint) have equal
time-to-live (TTL) information in the DOMAIN data files.
If external resolvers query your servers constantly:
- Make sure your servers are distributing DOMAIN data files with reasonable
TTL values. If the TTL is zero or another small value, the data you transfer
times out very quickly. Set the minimum value in your start of authority (SOA)
records to a week or more to solve this problem.
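Assuming the TTL values in your data files are expressed in seconds, the one-week figure mentioned above works out as follows. The variable name is illustrative only.

```shell
# Seconds in one week, the suggested lower bound for the SOA minimum value.
week_seconds=$(( 7 * 24 * 60 * 60 ))
echo "$week_seconds"   # prints 604800
```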
Routing Problems
If you cannot reach a destination host, consider the
following situations:
- If you receive a Network Unreachable error
message, make sure that a route to the gateway host has been defined and is
correct. Check this by using the netstat -r command
to list kernel routing tables.
- If you receive a No route to host error message,
verify that the local network interface is up by issuing the ifconfig interface_name command. The output indicates whether or not
the interface is up. Use the ping command to try to
reach another host on your network.
- If you receive a Connection timed out error
message:
- Verify that the local gateway is up using the ping
command with the name or Internet address of the gateway.
- Make sure that a route to the gateway host has been defined and is correct.
Check this by using the netstat -r command to list kernel
routing tables.
- Make sure the host you want to communicate with has a routing table entry
back to your machine.
- If you are using static routing, make sure that a route to the target
host and gateway host has been defined. Check this by using the netstat -r command to list kernel routing tables.
Note: Make sure the host you want to communicate with has
a routing table entry to your machine.
- If you are using dynamic routing, verify that the gateway is listed and
correct in the kernel routing tables by issuing the netstat
-r command.
- If the gateway host is using the Routing Information
Protocol (RIP) with the routed daemon, make sure
that a static route to the target host is set up in the /etc/gateways file.
Note:
You need to do this only if the routing daemon cannot identify the route to
a distant host through queries to other gateways.
- If the gateway host is using the RIP with the gated daemon, make sure that a static route to the target
host is set up in the gated.conf file.
- If you are using dynamic routing with the routed
daemon:
- If you are using dynamic routing with the gated
daemon:
- Verify that the /etc/gated.conf file is configured
correctly and that you are running the correct protocols.
- Make sure the gateway on the source network is using the same protocol
as the gateway on the destination network.
- Make sure that the machine with which you are trying to communicate has
a route back to your host machine.
- Verify that the gateway names in the gated.conf
file correspond to the gateway names listed in the /etc/networks file.
- If you are using the RIP or HELLO protocols, and routes to the destination cannot be identified through
routing queries, check the gated.conf file to verify
that a route to the target host is defined. Set static routes under the following
conditions:
- The destination host is not running the same protocol as the source host
and therefore cannot exchange routing information.
- The host must be reached by a distant gateway (a gateway that is on a
different autonomous system than the source host). The RIP can be used only among hosts on the same autonomous system.
Other Possibilities
If all else fails, you might want to turn on tracing
for your routing daemon (either routed or gated). Use the SRC traceson command from the command
line, or send a signal to the daemon to specify different levels of tracing.
See the gated daemon or the routed daemon for specifics on sending signals to these daemons.
Problems with SRC Support
- If changes to the /etc/inetd.conf file do not take
effect:
Update the inetd daemon by
issuing the refresh -s inetd command or the kill -1 InetdPID command.
- If the startsrc -s [subsystem
name] returns the following error message:
0513-00 The System Resource Controller is not active.
The System Resource Controller subsystem has not been activated. Issue
the srcmstr & command to start SRC, then reissue
the startsrc command.
You might
also want to try starting the daemon from the command line without SRC support.
- If the refresh -s [subsystem
name] or lssrc -ls [subsystem
name] returns the following error message:
[subsystem name] does not support this option.
The subsystem does not support the SRC option issued. Check the subsystem
documentation to verify options the subsystem supports.
- If the following message is displayed:
SRC was not found, continuing without SRC support.
A daemon was invoked directly from the command line instead of using
the startsrc command. This is not a problem. However,
SRC commands, such as stopsrc and refresh, will not manipulate a subsystem that is invoked directly.
If the inetd daemon is up and
running correctly and the appropriate service seems to be correct but you
still cannot connect, try running the inetd daemon processes
through a debugger.
- Stop the inetd daemon temporarily:
stopsrc -s inetd
The stopsrc command stops subsystems like the inetd daemon.
- Edit the syslog.conf file to add a debugging line
at the bottom. For example:
vi /etc/syslog.conf
- Add the line "*.debug /tmp/myfile" at the
bottom of the file and exit.
- The file that you specify must exist (/tmp/myfile in this example). You can use the touch command to make sure that the file exists.
- Refresh the syslogd daemon so that it rereads the file:
refresh -s syslogd
- Start the inetd daemon back up with debugging enabled:
startsrc -s inetd -a "-d"
The -d flag enables debugging.
- Try to make a connection to log errors in the /tmp/myfile debugging file. For example:
tn bastet
Trying...
connected to bastet
login:>
Connection closed
- See if anything shows up as a problem in the debugging file. For example:
tail -f /tmp/myfile
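The syslog.conf edit and the touch step above can be scripted rather than done by hand in vi. The following is a minimal sketch; the function name add_debug_rule is hypothetical, and both paths are parameters so that you can try it on scratch files before touching /etc/syslog.conf.

```shell
# Append a "*.debug" rule to a syslog configuration file and make sure the
# target log file exists.
add_debug_rule() {
    conf=$1
    log=$2
    # Add the rule only if it is not already present.
    if ! grep -F -q "*.debug $log" "$conf" 2>/dev/null; then
        printf '%s\n' "*.debug $log" >> "$conf"
    fi
    # The log file must exist before messages can be written to it.
    touch "$log"
}
```

On a real system you would run `add_debug_rule /etc/syslog.conf /tmp/myfile` as root and then continue with the remaining steps. Calling the function twice leaves only one copy of the rule in the file.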
telnet or rlogin Problems
The following explanations can be useful in solving
problems with the telnet or rlogin command.
Screen Distortion
If you are having trouble with screen distortion in
full-screen applications:
- Check the TERM environment variable by issuing one
of the following commands:
env
echo $TERM
- Verify that the TERM variable is set to a value
that matches the type of terminal display you are using.
telnet Debugging
telnet subcommands that can help
in debugging problems include:
display           Displays set and toggle values.
toggle            Toggles the display of all network data in hex.
toggle options    Toggles the display of internal telnet process options.
telnetd Daemon Debugging
If the inetd daemon could execute
the telnet service but you still cannot connect using
the telnet command, there may be something wrong with
the telnet interface.
- Verify that telnet is using the correct terminal
type.
- Check the $TERM variable on your machine:
echo $TERM
- Log in to the machine to which you are trying to attach and check the $TERM variable:
echo $TERM
- Use the telnet interface's debugging capabilities
by entering the telnet command without flags.
telnet
tn>
- Enter open host
where host is the name of the machine.
- Enter Ctrl-T to get to the tn> prompt.
- At the tn> prompt, enter debug for debugging mode.
- Try to connect to another machine using the telnet
interface:
telnet bastet
Trying...
Connected to bastet
Escape character is '^T'.
Watch the display as the various commands
scroll up the screen. For example:
SENT do ECHO
SENT do SUPPRESS GO AHEAD
SENT will TERMINAL TYPE (reply)
SENT do SUPPORT SAK
SENT will SUPPORT SAK (reply)
RCVD do TERMINAL TYPE (don't reply)
RCVD will ECHO (don't reply)
RCVD will SUPPRESS GO AHEAD (don't reply)
RCVD wont SUPPORT SAK (reply)
SENT dont SUPPORT SAK (reply)
RCVD do SUPPORT SAK (don't reply)
SENT suboption TELOPT_NAWS Width 80, Height 25
RCVD suboption TELOPT_TTYPE SEND
RCVD suboption TELOPT_TTYPE aixterm
...
- Check /etc/termcap or /usr/lib/terminfo for the aixterm definition. For example:
ls -a /usr/lib/terminfo
- If the aixterm definition is missing, add
it by building the ibm.ti file. For example:
tic ibm.ti
The tic command is a terminal information compiler.
Programs Using Extended Curses
Problems with function and arrow keys can arise when
using the rlogin and telnet commands
with programs using extended curses. Function and arrow keys generate escape
sequences, which are split if too little time is allotted for the entire key
sequence. Curses waits a specific amount of time to decide whether an Esc
indicates the escape key only or the start of a multibyte escape sequence
generated by other keys, such as cursor keys, the action key, and function
keys.
If no data, or data that is not valid, follows the
Esc in the allotted amount of time, curses decides that the Esc is the escape
key, and the key sequence is split. The delay resulting from the rlogin or telnet command is network dependent.
Sometimes arrow and function keys work and sometimes they do not, depending
on the speed of the network to which you are connecting. Setting the ESCDELAY environment variable to a large value (1000 to 1500) effectively
solves this problem.
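As a sketch, the variable can be set in the shell that starts the curses program, for example in the profile of the remote login session. The value 1500 is simply the upper end of the range given above.

```shell
# Give curses more time to assemble multibyte key sequences.
ESCDELAY=1500
export ESCDELAY
```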
Configuration Problems
Network interfaces are automatically configured during
the first system startup after the adapter card is installed. However, you
still need to set some initial values for TCP/IP including the host name,
the Internet address, and the subnet mask. To do this, you can use the Web-based System Manager, wsm, or you can use the SMIT interface in the following
ways:
- Use the smit mktcpip fast path to set the initial
values for the host name, the Internet address, and the subnet mask.
- Use the smit mktcpip fast path to specify a name
server to provide name resolution service. (Note that smit
mktcpip configures one network interface only.)
- Use the smit chinet fast path to set other network
attributes.
You may also want to set up any static routes the host
needs for transmitting information, such as a route to the local gateway.
Use the Web-based System Manager, wsm, or the SMIT fast path, smit mkroute, to set these up permanently in the configuration
database.
If you are having other problems with your configuration,
see the "Configuring a TCP/IP Network Checklist"
for more information.
Common Problems with Network Interfaces
Network interfaces are configured automatically during
the first system startup after the adapter card is installed. However, there
are certain values that must be set in order for TCP/IP to start. These include
the host name and Internet address and can be set using the Web-based System Manager, wsm, or the SMIT fast path, smit mktcpip.
If you choose the SMIT method, use the smit mktcpip
fast path to set these values permanently in the configuration database. Use
the smit chinet and smit hostname
fast paths to change them in a running system. The smit mktcpip fast path minimally configures TCP/IP. To add adapters, use the Further
Configuration menu, which can be reached with the smit tcpip fast path.
If you have already checked these to verify accuracy and you are still
having trouble sending and receiving information, check the following:
- Verify that your network adapter has a network interface by executing
the netstat -i command. The output should list an interface,
such as tr0, in the Name
column. If it does not, create a network interface through Web-based System Manager
or by entering the SMIT fast path smit mkinet.
- Verify that the IP address for the interface is correct by executing the netstat -i command. The output should list the IP address
in the Network column. If it is incorrect, set the IP address through Web-based System Manager
or by entering the SMIT fast path smit chinet.
- Use the arp command to make sure you have the complete
IP address for the target machine. For example:
arp -a
The arp
command looks for the physical adapter address. This command might show an
incomplete address. For example:
? (192.100.61.210) at (incomplete)
This could be due to an unplugged machine, a stray address with
no machine at that particular address, or a hardware problem (such as a machine
that connects and receives packets but is not able to send packets back).
- Look for errors on the adapter card. For example:
netstat -v
The netstat -v command shows
statistics for the Ethernet, Token Ring, X.25, and 802.3 adapter device drivers.
The command also shows network and error logging data for all device drivers
active on an interface including: No Mbufs Errors, No Mbuf Extension Errors, and Packets Transmitted and Adapter Errors Detected.
- Check the error log by running the errpt command
to ensure that there are no adapter problems.
- Verify that the adapter card is good by running diagnostics. Use the Web-based System Manager
Devices application, the smit diag fast path, or the diag command.
If these steps do not identify the problem, see "Problems with a SLIP Network Interface", "Problems with an Ethernet Network Interface", or "Problems with a Token-Ring Network Interface".
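When the ARP table is long, incomplete entries such as the one shown in the arp example above are easy to miss. The following sketch filters them out of arp -a style output. The function name is hypothetical, and the pattern assumes that every entry follows the "name (address) at hardware-address" layout of that example.

```shell
# Read arp -a style output on standard input and print the IP address of
# every entry whose hardware address is reported as "(incomplete)".
incomplete_arp_entries() {
    sed -n 's/^[^(]*(\([0-9.]*\)) at (incomplete).*$/\1/p'
}
```

A typical use would be `arp -a | incomplete_arp_entries`, which prints one address per line, or nothing if every entry is complete.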
Problems with a SLIP Network Interface
In general, the most effective method for debugging
problems with a Serial Line Internet Protocol (SLIP) interface is to retrace
your configuration, verifying each step. However, you can also check the following:
If the modem is not functioning correctly:
- Make sure that the modem was installed correctly. See the modem installation
manual.
- Verify that any flow control the modem does is turned off.
If the tty is not functioning correctly, verify that
the tty baud rate and modem characteristics are set correctly in the configuration
database by entering the smit tty fast path.
Problems with an Ethernet Network Interface
If the network interface has been initialized, the
addresses correctly specified, and you have verified that the adapter card
is good:
- Verify that you are using a T-connector plugged directly into the inboard/outboard
transceiver.
- Make sure you are using an Ethernet cable. (Ethernet cable is 50 ohm.)
- Make sure you are using Ethernet terminators. (Ethernet terminators are
50 ohm.)
- Ethernet adapters can be used with either the transceiver that is on the
card or with an external transceiver. There is a jumper on the adapter to
specify which you are using. Verify that your jumper is set correctly (see
your adapter manual for instructions).
- Verify that you are using the correct Ethernet connector type (thin is
BNC; thick is DIX). If you change this connector type, use the Web-based System Manager, wsm, or the SMIT fast path, smit chgenet, to set the Apply Change to Database Only field. (Check the field in Web-based System Manager
or set to yes in SMIT.) Restart the machine to apply
the configuration change. (See "Configuring and
Managing Adapters".)
Problems with a Token-Ring Network Interface
If you cannot communicate with some of the machines
on your network although the network interface has been initialized, the addresses
correctly specified, and you have verified that the adapter card is good:
- Check to see if the hosts with which you cannot communicate are on a different
ring. If they are, use the Web-based System Manager, wsm, or the
SMIT fast path smit chinet to check the Confine BROADCAST
to Local Token-Ring field. Do not check the field
in Web-based System Manager or set to no in SMIT.
- Check to see whether the token-ring adapter is configured to run at the
correct ring speed. If it is configured incorrectly, use the Web-based System Manager Network
application or SMIT to change the adapter ring speed attribute (see "Configuring a High-Performance Token-Ring Adapter"). When TCP/IP is restarted,
the token-ring adapter has the same ring speed as the rest of the network.
Problems with a Token-Ring/Ethernet Bridge
If you cannot communicate between a token-ring and
an Ethernet network, using a bridge, and you have verified that the bridge
is functioning correctly, the Ethernet adapter might be dropping packets.
A machine drops packets if the incoming packet (including headers) is greater
than the network adapter maximum transmission unit (MTU) value. For instance,
a 1500-byte packet sent by a token-ring adapter over the bridge collects an
8-byte logical link control (LLC) header, making the total packet size 1508.
If the receiving Ethernet adapter MTU is set to 1500, the packet is dropped.
Check the MTU values of both network adapters. To allow
for the eight-byte LLC header that the token-ring adapter attaches to outgoing
packets, set the MTU value for the token-ring adapter at least eight bytes
lower than the MTU value for the Ethernet adapter. For example, set the MTU
for a token-ring adapter to 1492 to communicate with an Ethernet adapter with
an MTU of 1500.
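The arithmetic behind this rule is small enough to sketch in the shell. The function and variable names here are illustrative only.

```shell
# The LLC header adds 8 bytes to each packet that crosses the bridge.
LLC_HEADER_BYTES=8

# Print the largest token-ring MTU that still fits the given Ethernet MTU.
token_ring_mtu_for() {
    echo $(( $1 - LLC_HEADER_BYTES ))
}

token_ring_mtu_for 1500   # prints 1492
```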
Problems with a Token-Ring/Token-Ring Bridge
When operating through a bridge, change the default
value of 1500 for the maximum transmission unit (MTU) to a value that is eight
less than the maximum information field (maximum I-frame) advertised by the
bridge in the routing control field.
To find the routing control field value, use the iptrace daemon to look at
incoming packets. Bits 1, 2, and 3 of Byte 1 are the Largest Frame Bits, which
specify the maximum information field that can be transmitted between two
communicating stations on a specific route. See the following for the format
of the routing control field:
Figure 26. Routing Control Field. This illustration shows byte
0 and byte 1 of a routing control field. The eight bits of byte 0 are B,
B, B, B, L, L, L, L. The eight bits of byte 1 are D, F, F, F, r, r, r, r.
Values for the Largest Frame Bits are as follows:
000    Specifies a maximum of 516 bytes in the information field.
001    Specifies a maximum of 1500 bytes in the information field.
010    Specifies a maximum of 2052 bytes in the information field.
011    Specifies a maximum of 4472 bytes in the information field.
100    Specifies a maximum of 8144 bytes in the information field.
101    Reserved.
110    Reserved.
111    Used in all-routes broadcast frames.
For example, if the maximum I-frame value is 2052 in
the routing control field, the MTU size should be set to 2044. This is for
token-ring network interfaces only.
Note: When using iptrace,
the output file must not be on a Network File System
(NFS).
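Once you have read byte 1 of the routing control field from a trace, the lookup and subtraction described in this section can be sketched as follows. The function names are hypothetical, and the sketch assumes that bit 0 is the most significant bit of the byte, matching the D, F, F, F, r, r, r, r order given for Figure 26.

```shell
# Print the maximum information field size encoded by the Largest Frame
# Bits (bits 1 to 3, counting from the most significant bit) of byte 1.
max_info_field() {
    bits=$(( ($1 >> 4) & 7 ))
    case $bits in
        0) echo 516 ;;
        1) echo 1500 ;;
        2) echo 2052 ;;
        3) echo 4472 ;;
        4) echo 8144 ;;
        *) return 1 ;;   # reserved values and all-routes broadcast
    esac
}

# Print the suggested token-ring MTU: eight less than the maximum I-frame.
bridge_mtu() {
    max=$(max_info_field "$1") || return 1
    echo $(( max - 8 ))
}

bridge_mtu 0x20   # bits 010, maximum I-frame 2052, prints 2044
```

The functions return a nonzero status for the reserved and all-routes broadcast values, since no MTU can be derived from them.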
Problems with Packet Delivery
Communicating with a Remote Host
If you cannot communicate with a remote host, try the
following:
- Run the ping command on the local
host to verify that the local interface to the network is up and running.
- Use the ping command for hosts and gateways that are progressively more hops
from the local host to determine the point at which communication fails.
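The progressive ping check can be sketched as a small loop. The function name first_unreachable and the PING override variable are hypothetical; by default the ping command is run with a count of one packet per host.

```shell
# Ping each host in order, nearest first, and print the first one that does
# not answer. Set PING to substitute another command for ping.
first_unreachable() {
    for host in "$@"; do
        if ! ${PING:-ping} -c 1 "$host" >/dev/null 2>&1; then
            echo "$host"
            return 0
        fi
    done
    return 1   # every host answered
}
```

List the local host first, then each gateway along the path, then the remote host. The name that is printed marks the point at which communication fails. A nonzero exit status with no output means that every host answered.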
If you are having trouble with packet loss or are experiencing
delays in packet delivery, try the following:
- Use the trpt command to trace packets at the socket level.
- Use the iptrace command to trace all protocol layers.
If you cannot communicate between a token-ring and
an Ethernet network using a bridge, and you have verified that the bridge
is good:
- Check the MTU values of both adapters. The MTU values
must be compatible to allow communication. A machine drops packets if the
incoming packet (including headers) is greater than the adapter's MTU value.
For instance, a 1500-byte packet sent over the bridge collects an 8-byte LLC
header, making the total packet size 1508. If the receiving machine MTU is
set to 1500, a packet of 1508 bytes is dropped.
snmpd Response to Queries
If snmpd is not responding to
queries and there are no log messages received, the packet might be too large
for the kernel User Datagram Protocol (UDP) packet handler. If this is the
case, increase the kernel variables udp_sendspace and udp_recvspace by issuing the following commands:
no -o udp_sendspace=64000
no -o udp_recvspace=64000
The maximum size for a UDP packet is 64K. If your query
is larger than 64K, it will be rejected. Split the packet into smaller packets
to avoid this problem.
Problems with Dynamic Host Configuration Protocol (DHCP)
If you cannot get an IP address or other configuration
parameters:
- Check to see that you have specified an interface
to be configured. This can be done through the Web-based System Manager Network
application, by editing the /etc/dhcpcd.ini file, or
by using the SMIT fast path smit dhcp.
- Check to see that there is a server on the local
network or a relay agent configured to get your requests off the local network.
- Check to see that the dhcpcd
program is running. If it is not, use the startsrc -s dhcpcd command.