PSF/AIX: Cause of 0421-049 & what is the timer period?
ITEM: RTA000150611
Q:
Abstract: Cause of 0421-049 & what is the timer period?
Using PSF/AIX 2.1 to print to Ethernet attached IP/4000s. We have seen
some 0421-049 messages about lost communication with the printer & then
apparently PSF terminates. My question is what causes the 0421-049 &
how long does PSF retry/wait until it determines communication is lost?
I have not seen any timer value externalized in PSF which relates
to this. It causes lots of problems for PSF to terminate in this
way & we need to try to figure out why it is happening and how it can
be avoided.
A:
In looking through RETAIN and the AIX Support Center records, some
other customers are having similar problems. One possible cause
would be that the device intervention timer is set too low, and that
operators are not able to respond to the intervention within the
specified time. However, in your customer's case, I reviewed all the
timer settings in PSF/AIX, and the device intervention timer for all
PSF queues (lp00-lp70) is set to 9999 (never time out), so that timer
is not the problem in your case.
Your customer is somewhat backlevel on their PSF/AIX PTFs (the last one
applied is from 10/97), so I'd recommend installing the latest PSF PTFs
first. There have been five additional PTFs since then, fixing a number
of APARs. I'd also verify that this customer has APAR IX69926 installed,
since they're running 4-way F50s. I think I took care of that while I
was there, but it doesn't hurt to verify, especially since I think that
the system had to be rebuilt later. (The AIX command to check for the
APAR is instfix -ik IX69926.)
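If you want to script that check, here is a minimal sketch. The function name check_apar is mine, not part of AIX, and I'm assuming instfix returns a nonzero exit status when the APAR's filesets are not all found, so confirm that on your level of AIX before relying on it.

```shell
# Sketch only: check_apar is a hypothetical helper, not an AIX command.
# Assumption: "instfix -ik APAR" exits 0 only when all filesets are found.
check_apar() {
    apar="$1"
    if ! command -v instfix >/dev/null 2>&1; then
        echo "instfix not found (is this an AIX host?)"
        return 2
    fi
    if instfix -ik "$apar" >/dev/null 2>&1; then
        echo "$apar: installed"
    else
        echo "$apar: NOT installed (or only partially installed)"
        return 1
    fi
}
```

Run it as check_apar IX69926 on each RS/6000 that drives the printers.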
If you continue to experience this problem, I'd suggest that you open
a PMR and work with Boulder in the event that they know of recommended
AIX fixes or if they need to run any specific diagnostic traces. I
know that there's now an AIX fix that alleviates problems that used to
occur with qdaemon if someone updated /etc/qconfig while printing was
active; it certainly wouldn't hurt to put that one on either. I don't
know its number, but that's where the AIX Support Center and Level 2
could help.
Thanks for using ViewBlue.
Q:
Thanks for that information & we'll check on the PTFs. On my original
question though I would still like to know what the timer/interval is,
apparently in the bowels of PSF/AIX, for when PSF decides it has lost
communication with the printer & then apparently terminates. I want to
see if this in any way maps to some of the performance measurements we
have taken with IOSTAT & VMSTAT. Is 0421-049 an ambiguous catch-all
message, or does it give us some direction to look in?
A:
Well, as I mentioned above, according to the items I can find, the
most common cause for that message seems to be the Device Intervention
timer popping. However, since your customer has it set to 9999 (never
time out), that shouldn't be the case there.
Let me review all the different external timers for PSF/AIX here, and
I'll mention what I know about the settings at your customer. I did
not have a chance to check with the change team for other possible
factors (before leaving on vacation), so I'll assign this to my backup
to follow up on non-externalized timers next week. Bottom line is that
I don't think it's one of the externalized timers causing you problems,
and that's why I recommend opening the PMR so L2 can dig inside. The
only one I can (very!) remotely imagine might be the TCP KEEPALIVE
parameters, but they only come into play if PSF has not received any
TCP response from the printer in the time allotted (currently set to
four minutes at your customer, I believe). I would think that even in
your customer's heavily loaded environment, particularly if EasySpooler
requires the ACK to be set to 1, that there's no way that four minutes
would elapse without some sort of TCP response. I could be wrong....
BASIC PSF FOR AIX TIMERS:
=========================
CONNECTION TIMEOUT (Device Options): This setting allows you to specify
the number of seconds PSF for AIX initially waits for the printer to
become available. After waiting the specified amount of time, PSF for
AIX will no longer attempt to connect to the printer. You would
modify this value if you're sharing the printer with another PSF.
Valid values are 0-9999, with a default of 30 seconds. 0 means that
PSF will continue to attempt to connect to the printer and never
time out. (Your customer's printers are set to the default of 30 seconds.)
NUMBER OF SECONDS BEFORE NPRO (Processing Options): This is used
for continuous forms printers only. It lets you specify the number of
seconds that a continuous-forms printer should wait for another print
job before stacking the final pages of the previous print job. This
delay is called a non-process runout (NPRO). Valid values are 0-9999.
0 means that an NPRO will not be performed (no blank pages will be
stacked). A value of 9999 means that an NPRO is performed if no new
print jobs come in after 9999 seconds. (Your customer has varying
values by printer, some 30, some 60.)
DEVICE INTERVENTION TIMER (Tuning Options): This allows you to specify
how many seconds PSF for AIX will wait before it treats an intervention
required condition (such as a paper jam) as a permanent printer error
and marks the print queue down. Valid values are 1-9999. 9999 is the
default and means that PSF for AIX will never time out. If a queue is
marked down by this timer, you must manually restart the queue.
(Your customer has this set to 9999, never time out.)
JOB INTERVAL SHUTDOWN TIMER (Tuning Options): This lets you specify
how many seconds PSF for AIX waits after completing the jobs in its
queue before relinquishing control of the attached printer. Valid
values are 1-9999. The default is 9999, which means that PSF for AIX
will never relinquish control of the printer. You should set a value
lower than 9999 when sharing a printer with another instance of PSF.
For continuous forms printers, you should specify a larger value for
Job Interval Shutdown than for NPRO.
(Your customer has this set to 9999.)
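To sanity-check a set of values against the ranges described above, something like the following sketch could be used. The function names and argument order are purely illustrative (PSF for AIX does not ship such a tool); the ranges and the NPRO rule are taken from the descriptions above.

```shell
# Hypothetical helpers: names and argument order are illustrative only.
in_range() {   # usage: in_range VALUE MIN MAX
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# usage: check_psf_timers CONNECT NPRO INTERVENTION JOB_INTERVAL
check_psf_timers() {
    in_range "$1" 0 9999 || { echo "connection timeout out of range"; return 1; }
    in_range "$2" 0 9999 || { echo "NPRO out of range"; return 1; }
    in_range "$3" 1 9999 || { echo "device intervention timer out of range"; return 1; }
    in_range "$4" 1 9999 || { echo "job interval shutdown timer out of range"; return 1; }
    # Continuous-forms guidance: Job Interval Shutdown should exceed NPRO.
    # An NPRO of 0 means no NPRO is performed, so the rule is skipped.
    if [ "$2" -gt 0 ] && [ "$4" -le "$2" ]; then
        echo "job interval shutdown ($4) should be larger than NPRO ($2)"
        return 1
    fi
    echo "ok"
}
```

For example, check_psf_timers 30 60 9999 9999 corresponds to one of your customer's continuous-forms queues with a 60-second NPRO.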
TIMERS FOR TCP/IP-ATTACHED IPDS PRINTERS:
=========================================
TCP KEEPALIVE: PSF for AIX uses the TCP protocol and relies on TCP to
detect when a connection with an i-data 7913 or TCP/IP-attached printer
is no longer usable. PSF for AIX directs TCP to poll its connection
partner periodically when no other data is exchanged between PSF for
AIX and its connection partner. These periodic polls, called KEEPALIVE
transmissions, enable TCP to discover when a connection is no longer
usable even if the connection partner is abruptly powered off or is no
longer accessible through the network. If PSF for AIX does not receive
a response from the printer during one of these periodic polls, PSF
will mark the queue for that printer down. Once the error condition
that caused the loss of connection is corrected, the print operator
must bring the queue back up.
Although PSF for AIX directs TCP to send KEEPALIVE transmissions, the
frequency of these transmissions is controlled by system-wide TCP/IP
configuration parameters. On AIX, the default frequency is after
roughly two hours of inactivity. However, AIX allows the frequency
of KEEPALIVE transmissions to be adjusted. The frequency applies to all
TCP applications that direct TCP to send KEEPALIVE transmissions.
Information on how to set these parameters and recommendations for
their values may be found in "PSF for AIX Print Administration" in the
chapter on "Installing a TCP/IP-Attached IPDS Printer" (S544-3817-03,
page 107) or in "IBM InfoPrint Manager for AIX: IBM InfoPrint Control
Diagnostics Guide" in the chapter on "Diagnostic Tools for Problem
Solving" under "Configuring the TCP KEEPALIVE Frequencies for TCP/IP-
attached Printers" (G544-5472-00, Oct 1997, page 75).
(While I was at the customer, I configured these on macserv1 to the
recommended values in the book: tcp_keepidle=480 and tcp_keepintvl=80,
which are in half-second units, thus 4 minutes and 40 seconds,
respectively. Original defaults are 14400 (two hours) and 150 (75
seconds) respectively.)
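Since the half-second units are easy to trip over, here is a small sketch of the arithmetic for the four values just mentioned. The helper name is mine; the values are the ones quoted above.

```shell
# Convert AIX keepalive settings from half-second units to seconds.
# half_seconds_to_seconds is an illustrative helper, not an AIX command.
half_seconds_to_seconds() {
    echo $(( $1 / 2 ))
}

# Configured values first, then the original defaults.
for setting in tcp_keepidle=480 tcp_keepintvl=80 \
               tcp_keepidle=14400 tcp_keepintvl=150
do
    name=${setting%%=*}     # text before the "="
    value=${setting#*=}     # text after the "="
    echo "$name=$value -> $(half_seconds_to_seconds "$value") seconds"
done
```

On the RS/6000 itself, the current values can be displayed with the AIX no command (for example, no -o tcp_keepidle); the manuals cited above cover how to change them.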
The following don't apply for your customer, but since I was writing
up timers anyway...
PSF DIRECT HOST RECEIVER TIMERS (smit psfdirect):
=================================================
INACTIVITY LIMIT: Allows you to specify the number of seconds the
PSF Direct host receiver waits for data from the host PSF after the
communications session has been established. When this limit is
exceeded, the PSF Direct receiver ends the communications session
with the host PSF. You would use this value to share printers
between PSF Direct host receivers and other PSF for AIX processes.
Valid values are 1-9999. 9999 is the default and means that the
PSF Direct host receiver will wait indefinitely. If you do not plan
to share the printer used by this host receiver, you don't need to
change the default Inactivity Limit. If you plan to share the printer,
you should set this value to other than 9999.
PRINTER BUSY LIMIT: Allows you to specify the number of seconds PSF
Direct will try to connect to a printer. A printer is "busy" if jobs
are being printed from another PSF for AIX process or by another PSF
Direct receiver. PSF Direct keeps trying to connect to the printer
for the specified number of seconds. If this limit is exceeded, PSF
Direct notifies the host and will no longer attempt to connect to the
device. You use this limit to manage printer sharing between PSF
Direct receivers and other PSF for AIX processes. If you plan to share
the printer used by this PSF Direct host receiver, you may want to
increase the Printer Busy Limit. Valid values are 1-9999; the default
is 120 seconds. A value of 9999 means that PSF Direct will continue
to try to connect to the printer indefinitely.
A:
I did some additional checking with my change team contact, who
determined which module issues that message, and the owner of that
module. According to them, message 0421-049 is issued by module
ain3dtcp, which is the secondary process that manages communications
with TCP/IP-attached IPDS printers. This message would be issued in the
event of either an unrecoverable communications error with the printer
(could be a problem with the printer or in the network between the
RS/6000 and the printer) or an IPDS protocol error. Or, as mentioned
previously, another cause of this error message can be an IPDS NACK
issued when the intervention-required timer expires.
To get any additional information would require a trace to determine
values of variables that are set when the failure occurs. Boulder has
recommended that the next step would be to open a PMR with Level 2; you
should be sure to remind L2 that the customer also uses the EasyPSF
product so they can determine the best approach for diagnosis.
I'm sorry for the delay in getting the information to you. I hope
this helps.
Search-keywords:
psf/6000 psf/aix psf aix timer disconnect interval timeout 0421-049
intervention keepalive direct host receiver connection NACK ain3dtcp
WWQA: ITEM: RTA000150611
Dated: 06/1998 Category: XPSF6000
This HTML file was generated 99/06/24~12:43:38