12/16/94

Tips for Network Communication Failure (Cannot "ping")

SPECIAL NOTICES

Information in this document is correct to the best of our knowledge
at the time of this writing. Please send feedback by fax to "AIXServ
Information" at (512) 823-4009.

Please use this information with care. IBM will not be responsible
for damages of any kind resulting from its use. The use of this
information is the sole responsibility of the customer and depends on
the customer's ability to evaluate and integrate this information
into the customer's operational environment.

ABOUT THIS DOCUMENT

This document contains problem determination tips for situations in
which you cannot "ping", "telnet", "rlogin", or use other forms of
network communication. This document applies to AIX 3.2 and 4.1.

OVERVIEW

Communication on a network (using "ping", "telnet", "rlogin", etc.)
requires the configuration of hardware and software to work in
concert. If you have either a hardware failure or a software
communication problem, you may see anything from a slowdown in
communication to 100% packet loss (no communication).

The "ping" command is often used to check network configuration.
"ping" is a two-part application, one part on the sending machine and
the other on the receiving machine. When you send a ping to another
host, the software dispatches packets. The packets activate a process
at the receiving machine that responds by sending the packets back.

If Machine A sends the packet, but the packet never reaches Machine
B, you will see 100% packet loss from Machine A. If Machine B
receives the packet, sends it back by a different route, and the
packet gets lost, you will also see 100% packet loss on Machine A.

STRATEGY FOR PROBLEM DETERMINATION

The following strategy for problem determination is divided as
follows:

    Checking the Hardware
    Checking the Environment
    Checking the Configuration

Checking the Hardware

1.
    Ensure that all plugs are secured and screwed down on the
    adapters.

2.  View the status of existing adapters and interfaces (the adapter
    is the physical hardware; the interface is the software that
    enables communication on that hardware):

    Execute the following to check the adapters and interfaces:

        lsdev -C | pg

    The following adapters may be listed:

        ent#    Standard Ethernet Adapter or High-Performance
                Ethernet Adapter
        tok#    Token-Ring High-Performance Adapter

    Verify that the adapter you are using is "Available". The term
    "Available" indicates that the RISC System/6000 recognized that
    this adapter was ready for use. If the adapter is "Defined", then
    you need to verify that your hardware is installed correctly. The
    term "Defined" indicates that the RISC System/6000 at one time
    knew it had available hardware in that slot but currently cannot
    identify that it has the hardware.

    The following interfaces may be listed:

        en#     Standard Ethernet Network Interface
        et#     IEEE 802.3 Ethernet Network Interface
        tr#     Token Ring Network Interface

    Verify that the interface you are using is "Available". If it is
    listed as "Defined", then you do not have your interface
    configured.

    The Standard Ethernet Adapter and the High-Performance Adapter
    can utilize either the en# or et# interface. (These designate
    which protocols are available on the Ethernet style adapters.)

3.  Check the error report by executing the following:

        errpt -a | pg

    Look at the Date/Time line. The error log is in LIFO order (last
    in, first out) so the last error logged will be the first one
    displayed. If the date is not today's date, then you may not have
    a hardware error. If it is the current date, check the ERROR
    LABEL field for errors such as:

        Ethernet     Token Ring
        --------     ----------
        ENT_ERR2     TOK_ESERR
        ENT_ERR4     TOK_RCVRY_ENTER
        ENT_ERR6     TOK_RCVRY_EXIT

    The above errors will generally mention that the error is
    hardware related.
    Re-verify that all plugs are secured and screwed down on the
    adapters. You may want to re-seat the adapters in their slots
    (proceed with caution) and then ping again and see if any more
    errors are reported. If errors continue, you should seek
    assistance from hardware support (1-800-IBM-SERV).

Checking the Environment

1.  Execute the following to check the network statistics:

        netstat -m

    For AIX 3.2, the output will look similar to:

        285 mbufs in use:
                17 mbufs allocated to data
                8 mbufs allocated to packet headers
                94 mbufs allocated to socket structures
                126 mbufs allocated to protocol control blocks
                22 mbufs allocated to routing table entries
                16 mbufs allocated to socket names and addresses
                2 mbufs allocated to interface addresses
                300 mbufs allocated to
        116/314 mapped pages in use
        1327 Kbytes allocated to network (40% in use)
        0 requests for memory denied
        0 requests for memory delayed
        0 calls to protocol drain routines

    If the last three lines have something other than "0" then your
    system may be exhibiting an "mbufs full" problem. Refer to IBM
    AIX Version 3.2/6000 Performance Monitoring and Tuning Guide
    (SC23-2365).

    For AIX 4.1, the first part of the output will look similar to:

        26 mbufs in use:
        16 mbuf cluster pages in use
        70 Kbytes allocated to mbufs
        0 requests for mbufs denied
        0 calls to protocol drain routines

        Kernel malloc statistics:
        . . .

    If the "requests for mbufs denied" line has something other than
    "0", your system may be exhibiting an "mbufs full" problem. Refer
    to the IBM AIX Performance Monitoring and Tuning Guide
    (SC23-2365-03).

2.  Determine which machine is having the communication failure:

    From Machine A, ping Machine B.
    On Machine B, execute the following:

        arp -a

    The output will look similar to:

    ------------------------------------------------------------------
    ausvm3.austin.ibm.com (129.35.26.21) at 10:0:5a:ac:22:71
        [token ring] rt=a40:22a1:c211:bb11:d3a0
    cia.austin.ibm.com (129.35.22.192) at 10:0:5a:a8:e1:9d
        [token ring]
    risc.austin.ibm.com (129.35.28.168) at 10:0:5a:9:2c:b1
        [token ring] rt=830:22a1:c211:2270
    ausname1.austin.ibm.com (129.35.17.2) at 10:0:5a:a8:2b:92
        [token ring] rt=a40:22a1:c211:bb11:cff0
    ------------------------------------------------------------------

    Check the listing for Machine A's hostname and IP address.

    If Machine A is NOT in the list, then packets never get from
    Machine A to Machine B. Either Machine A is the problem or
    something between Machine A and Machine B is the problem.

    If Machine B DOES have Machine A in the list, then either Machine
    B is the problem, or the return path to Machine A is a problem.
    Go back to the beginning of this fax and begin to work through
    the steps with Machine B.

3.  If NIS is running, it may interfere with pinging by hostname. You
    may want to disable this option until ping and telnet are working
    to simplify problem determination. Then, once you can ping,
    enable NIS and see if you have ping problems. If you do, your NIS
    configuration needs to be reviewed for correctness.

    To disable NIS, start smit with "smit communications" and choose
    the following:

        NFS
        Network Information Service (NIS)
        Start / Stop Configured NIS Daemons

    Then choose the appropriate stop items from those displayed:

        Stop the Server Daemon, ypserv
        Stop the Client Daemon, ypbind
        Stop the yppasswdd Daemon
        Stop the ypupdated Daemon

4.  Verify that your netmask is correct. (A full discussion of a
    netmask is outside the scope of this document.)
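    To make the netmask check in this step concrete, the following is
    a minimal sketch of how an address and a netmask combine to decide
    whether another machine can be reached without additional routing.
    The function names network_of and same_network are hypothetical
    helpers written for this illustration; they are not AIX commands.
    The sketch assumes a POSIX-style shell that supports $(( ))
    arithmetic, and it uses the sample address from this step.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): print the network address
# obtained by ANDing each octet of an IP address with the netmask.
network_of() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted-decimal values into eight positional parameters.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Hypothetical helper: report whether two addresses fall in the same
# network under a given netmask.
same_network() {
    if [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]; then
        echo "direct"
    else
        echo "needs route"
    fi
}

# Usage: local address, remote address, netmask.
same_network 110.120.130.140 110.120.131.7 255.255.255.0
same_network 110.120.130.140 110.120.131.7 255.255.0.0
```

    With a netmask of 255.255.255.0 the two sample addresses fall in
    different networks, so the first call reports that a route is
    needed. With 255.255.0.0 they share the network 110.120.0.0, so
    the second call reports that the machine is directly reachable.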
                                        You can access machines with
    If your            If your          IP addresses listed below without
    address is         netmask is       additional routing information
    ------------------------------------------------------------------
    110.120.130.140    255.255.255.0    110.120.130.*
    110.120.130.140    255.255.0.0      110.120.*.*
    110.120.130.140    255.0.0.0        110.*.*.*

Checking the Configuration

1.  To verify that the hostname is still the correct hostname for
    this machine, execute the following:

        hostname

    The string returned should be the hostname of this machine. If
    the name returned was not what was expected, run "smit tcpip" and
    choose the following to set the hostname:

        Further Configuration
        Hostname

2.  Verify that the IP address is what is expected by executing the
    following:

        host your_hostname

    The output should be similar to:

        zcomm1.austin.ibm.com is 129.35.31.99

    If the output is not what was expected, you need to correctly
    configure the IP address for this adapter or check the name
    resolution (see steps below).

3.  Check to see if you are running Domain Name Service (DNS):

    If /etc/resolv.conf exists, then you are using DNS. Disable DNS
    by renaming this file to some other filename:

        mv /etc/resolv.conf /etc/resolv.conf.hold

    If you can now ping, then something is wrong with the DNS
    configuration. (In the /etc/hosts file, you may have to add the
    IP address and hostname of the machine you are trying to ping.)

4.  Examine the /etc/hosts file. Verify that your hostname is in the
    file only once and that there is no corruption in the file. If
    your hostname is listed with two IP addresses, the IP address of
    the first matching entry in the file is the one that is used.
    Also, check for a duplicate IP address.

5.  Execute the following to ensure that software is loaded
    correctly:

        lppchk -v

    This will execute for a while and then come to a prompt.
    If any error messages are displayed, they indicate a possible
    install or update problem; correct the error and then try
    pinging.

6.  Ping by hostname, then by IP address. Both should respond in the
    same manner. If they don't, check the /etc/hosts file again for
    duplicates.

7.  Ping other machines, routers, etc. If only one machine is failing
    on the ping, your machine could have one of the following:

        a gateway problem
        a route problem

8.  The following steps illustrate the procedure you will need to use
    to verify the adapter configuration:

        netstat -i

    The above command should produce output similar to the following:

    ----------------------------------------------------------------------
    Name Mtu  Network   Address         Ipkts   Ierrs Opkts  Oerrs Coll
    lo0  1536                           149827  0     149827 0     0
    lo0  1536 127       localhost.xxxxx 149827  0     149827 0     0
    tr0  1492                           5603085 48642 89675  0     0
    tr0  1492 129.35.16 xxxxxx.xxxxxx.x 5603085 48642 89675  0     0
    ----------------------------------------------------------------------

    Some fields and values you may see in the above output are:

        tr0            Represents token ring interface
        en0            Represents standard Ethernet interface
        et0            Represents IEEE 802.3 Ethernet interface
        lo0            Represents the loopback mechanism
        Ierrs/Oerrs    Shows errors for incoming and outgoing packets

    If you see only lo0, or if there is an "*" next to your
    interface, you need to configure the interface again.

    Oerrs indicate a problem and may point to a hardware error. Ierrs
    generally indicate that your interface is receiving packets whose
    format it does not recognize and is discarding them.

    If you have checked everything and the ping is still not working
    and you are running Ethernet, try reversing protocols (en0 to et0
    and vice versa).

    As a final try, you can remove the interface and adapter and try
    starting again.
    You can do this from the command line, where your_interface is
    the interface name (for example, en0) and your_adapter is the
    adapter name (for example, ent0):

        ifconfig your_interface detach
        rmdev -d -l your_interface
        rmdev -d -l your_adapter

    Then you will need to reconfigure the adapters and interfaces.
    You can do that in any of these ways:

    o   Reboot, or

    o   In smit, choose:

            Devices
            Configure Devices Added After IPL

        or

    o   From the command line, execute:

            cfgmgr

    The above procedures will configure interfaces in a defined state
    and adapters in an available state. Use normal procedures to
    customize the configuration for your system.

If You Need Further Assistance

If you followed all of the above steps and the system still cannot
ping, you may want to pursue further problem determination assistance
from one of the following:

o   local branch office
o   your point of sale
o   1-800-CALL-AIX (to register for fee-based services)

All of the above avenues for assistance may be billable.

READER'S COMMENTS

Please fax this form to (512) 823-4009, attention "AIXServ
Information". You may also e-mail comments to:
elizabet@austin.ibm.com. These comments should include the same
customer information requested below.

Use this form to tell us what you think about this document. If you
have found errors in it, or if you want to express your opinion about
it (such as organization, subject matter, appearance) or make
suggestions for improvement, this is the form to use.

If you need technical assistance, contact your local branch office,
point of sale, or 1-800-CALL-AIX (for information about support
offerings). These services may be billable. Faxes on a variety of
subjects may be ordered free of charge from 1-800-IBM-4FAX. Outside
the U.S. call 415-855-4329 using a fax machine phone.

When you send comments to IBM, you grant IBM a nonexclusive right to
use or distribute your comments in any way it believes appropriate
without incurring any obligation to you.

NOTE: If you have a problem report or item number, supplying that
number may help us determine why a procedure did or did not work in
your specific situation.

Problem Report or Item #:

Branch Office or Customer #:

Be sure to print your name and fax number below if you would like a
reply:

______________________________________________________________________

END OF DOCUMENT (cannot.ping.tcp, 4FAX# 1301)