PSF/AIX: Cause of 0421-049 & what is the timer period?

ITEM: RTA000150611



Q:                                                                              
Abstract: Cause of 0421-049 & what is the timer period?                         
                                                                                
Using PSF/AIX 2.1 to print to Ethernet attached IP/4000s.  We have seen         
some 0421-049 messages about lost communication with the printer & then         
apparently PSF terminates.  My question is what causes the 0421-049 &           
how long does PSF retry/wait until it determines communication is lost?         
I have not seen any timer value externalized in PSF which relates               
to this.  It causes lots of problems for PSF to terminate in this               
way & we need to try to figure out why it is happening and how it can           
be avoided.                                                                     
                                                                                
A:                                                                             
In looking through RETAIN and the AIX Support Center records, some              
other customers are having similar problems.  One possible cause                
would be that the device intervention timer is set too low, and that
operators are not able to respond to the intervention within the
specified time.  However, in your customer's case, I reviewed all the
timer settings in PSF/AIX, and the device intervention timer for all
PSF queues (lp00-lp70) is set to 9999 (never time out), so that timer
is not the problem in your case.
                                                                                
Your customer is somewhat backlevel on their PSF/AIX PTFs (the last one
applied is from 10/97), so I'd recommend installing the latest PSF PTFs
first.  There have been five additional PTFs since then, fixing a number
of APARs.  I'd also verify that this customer has APAR IX69926 installed,
since they're running 4-way F50s.  I think I took care of that while I
was there, but it doesn't hurt to verify, especially since I think that
the system had to be rebuilt later.  (The AIX command to check for the
APAR is instfix -ik IX69926.)
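
If you want to script that check, here is a minimal sketch.  The
function name check_apar is made up for illustration, and the sketch
assumes that instfix returns a non-zero exit status when the APAR is
missing or only partially installed; please confirm that on your level
of AIX before relying on it.

```shell
#!/bin/sh
# check_apar APAR: report whether an APAR is installed, using "instfix -ik".
# check_apar is a hypothetical helper name, not part of AIX or PSF.
# Assumption: instfix exits non-zero when the fix is not fully installed.
check_apar() {
    apar=$1
    if ! command -v instfix >/dev/null 2>&1; then
        echo "instfix not found (is this an AIX system?)"
        return 2
    fi
    if instfix -ik "$apar" >/dev/null 2>&1; then
        echo "$apar: installed"
    else
        echo "$apar: NOT installed (or only partially installed)"
        return 1
    fi
}

# The APAR recommended above for the 4-way F50s.
check_apar IX69926 || echo "Follow up on IX69926 before going further."
```

Running it again after any system rebuild would confirm that the fix
is still in place.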
                                                                                
If you continue to experience this problem, I'd suggest that you open           
a PMR and work with Boulder in the event that they know of recommended          
AIX fixes or if they need to run any specific diagnostic traces.  I             
know that there's now an AIX fix that alleviates problems that used to          
occur with qdaemon if someone updated /etc/qconfig while printing was           
active; it certainly wouldn't hurt to put that one on either.  I don't          
know its number, but that's where the AIX Support Center and Level 2            
could help.                                                                     
                                                                                
Thanks for using ViewBlue.                                                      
                                                                                
Q:                                                                             
Thanks for that information & we'll check on the PTFs.  On my original          
question though I would still like to know what the timer/interval is,          
apparently in the bowels of PSF/AIX, for when PSF decides it has lost           
communication with the printer & then apparently terminates.  I want to         
see if this in any way maps to some of the performance measurements we          
have taken with IOSTAT & VMSTAT.  Is 0421-049 an ambiguous catch all            
message or does it give us some direction to look at?                           
                                                                                
A:                                                                              
Well, as I mentioned above, according to the items I can find, the              
most common cause for that message seems to be the Device Intervention          
timer popping. However, since your customer has it set to 9999 (never           
time out), that shouldn't be the case there.                                    
                                                                                
Let me review all the different external timers for PSF/AIX here, and          
I'll mention what I know about the settings at your customer.  I did            
not have a chance to check with the change team for other possible              
factors (before leaving on vacation), so I'll assign this to my backup          
to follow up on non-externalized timers next week.  Bottom line is that         
I don't think it's one of the externalized timers causing you problems,         
and that's why I recommend opening the PMR so L2 can dig inside.  The           
only one I can (very!) remotely imagine might be the TCP KEEPALIVE
parameters, but they only come into play if PSF has not received any            
TCP response from the printer in the time allotted (currently set to            
four minutes at your customer, I believe).  I would think that even in          
your customer's heavily loaded environment, particularly if EasySpooler         
requires the ACK to be set to 1, that there's no way that four minutes          
would elapse without some sort of TCP response.  I could be wrong....           
                                                                                
                                                                               
BASIC PSF FOR AIX TIMERS:                                                       
=========================                                                       
                                                                                
CONNECTION TIMEOUT (Device Options):  This setting allows you to specify        
the number of seconds PSF for AIX initially waits for the printer to            
become available.  After waiting the specified amount of time, PSF for          
AIX will no longer attempt to connect to the printer.  You would                
modify this value if you're sharing the printer with another PSF.               
Valid values are 0-9999, with a default of 30 seconds.  0 means that
PSF will continue to attempt to connect to the printer and never
time out.  (Your customer's queues are set to the default of 30 seconds.)
                                                                                
NUMBER OF SECONDS BEFORE NPRO (Processing Options):  This is used               
for continuous forms printers only.  It lets you specify the number of          
seconds that a continuous-forms printer should wait for another print          
job before stacking the final pages of the previous print job.  This            
delay is called a non-process runout (NPRO). Valid values are 0-9999.           
0 means that an NPRO will not be performed (no blank pages will be
stacked).  A value of 9999 means that an NPRO is performed if no new
print jobs come in after 9999 seconds. (Your customer has varying               
values by printer, some 30, some 60.)                                           
                                                                                
DEVICE INTERVENTION TIMER (Tuning Options):  This allows you to specify         
how many seconds PSF for AIX will wait before it treats an intervention         
required condition (such as a paper jam) as a permanent printer error           
and marks the print queue down.  Valid values are 1-9999.  9999 is the          
default and means that PSF for AIX will never time out.  If a queue is          
marked down by this timer, you must manually restart the queue.                 
(Your customer has this set to 9999, never time out.)                           
                                                                               
JOB INTERVAL SHUTDOWN TIMER (Tuning Options):  This lets you specify            
how many seconds PSF for AIX waits after completing the jobs in its             
queue before relinquishing control of the attached printer.  Valid              
values are 1-9999.  The default is 9999, which means that PSF for AIX           
will never relinquish control of the printer.  You should set a value           
lower than 9999 when sharing a printer with another instance of PSF.            
For continuous forms printers, you should specify a larger value for            
Job Interval Shutdown than for NPRO.                                            
(Your customer has this set to 9999.)                                           
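
The value ranges and the "Job Interval Shutdown larger than NPRO" rule
for continuous forms printers can be sanity-checked with a few lines of
shell.  This is only a sketch: check_timers is a made-up helper, and it
works on numbers you type in, since I'm not assuming any particular way
of reading the values back out of the PSF queue definitions.

```shell
#!/bin/sh
# check_timers NPRO_SECONDS SHUTDOWN_SECONDS
# Hypothetical helper: validates the two values against the ranges and
# the ordering rule described above.  It does not read any PSF config.
check_timers() {
    npro=$1
    shutdown=$2
    if [ "$npro" -lt 0 ] || [ "$npro" -gt 9999 ]; then
        echo "NPRO must be 0-9999"
        return 1
    fi
    if [ "$shutdown" -lt 1 ] || [ "$shutdown" -gt 9999 ]; then
        echo "Job Interval Shutdown must be 1-9999"
        return 1
    fi
    # For continuous forms printers, shutdown should exceed NPRO.
    if [ "$shutdown" -le "$npro" ]; then
        echo "Job Interval Shutdown ($shutdown) should be larger than NPRO ($npro)"
        return 1
    fi
    echo "ok"
}

check_timers 60 9999                          # the customer's kind of settings
check_timers 60 30 || echo "(adjust one of the two values)"
```

With NPRO values of 30 or 60 and Job Interval Shutdown at 9999, the
customer's settings pass this check.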
                                                                                
                                                                                
TIMERS FOR TCP/IP-ATTACHED IPDS PRINTERS:                                       
=========================================                                       
                                                                                
TCP KEEPALIVE:  PSF for AIX uses the TCP protocol and relies on TCP to         
detect when a connection with an i-data 7913 or TCP/IP-attached printer         
is no longer usable.  PSF for AIX directs TCP to poll its connection            
partner periodically when no other data is exchanged between PSF for            
AIX and its connection partner.  These periodic polls, called KEEPALIVE         
transmissions, enable TCP to discover when a connection is no longer            
usable even if the connection partner is abruptly powered off or is no          
longer accessible through the network.  If PSF for AIX does not receive         
a response from the printer during one of these periodic polls, PSF             
will mark the queue for that printer down.  Once the error condition            
that caused the loss of connection is corrected, the print operator             
must bring the queue back up.                                                   
                                                                                
Although PSF for AIX directs TCP to send KEEPALIVE transmissions, the           
frequency of these transmissions is controlled by system-wide TCP/IP            
configuration parameters.  On AIX, the default frequency is after              
roughly two hours of inactivity.  However, AIX allows the frequency             
of KEEPALIVE transmissions to be adjusted.  The frequency applies to all        
TCP applications that direct TCP to send KEEPALIVE transmissions.               
Information on how to set these parameters and recommendations for              
their values may be found in "PSF for AIX Print Administration" in the          
chapter on "Installing a TCP/IP-Attached IPDS Printer" (S544-3817-03,           
page 107) or in "IBM InfoPrint Manager for AIX: IBM InfoPrint Control           
Diagnostics Guide" in the chapter on "Diagnostic Tools for Problem              
Solving" under "Configuring the TCP KEEPALIVE Frequencies for TCP/IP-           
attached Printers" (G544-5472-00, Oct 1997, page 75).                           
(While I was at the customer, I configured these on macserv1 to the             
recommended values in the book: tcp_keepidle=480 and tcp_keepintvl=80,          
which are in half-second units, thus 4 minutes and 40 seconds,                  
respectively.  Original defaults are 14400 (two hours) and 150 (75              
seconds) respectively.)                                                        
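
As a quick check of the half-second arithmetic, here is a small sketch.
The helper name half_sec_to_sec is made up for illustration.  The no
commands in the comments are the usual AIX way to display and set these
network options; they are left as comments on purpose, since changing
them needs root and affects every TCP application on the system that
uses KEEPALIVE.

```shell
#!/bin/sh
# Convert a keepalive value from AIX half-second units to seconds.
# half_sec_to_sec is a hypothetical helper, not an AIX command.
half_sec_to_sec() {
    echo $(( $1 / 2 ))
}

for setting in tcp_keepidle=480 tcp_keepintvl=80 \
               tcp_keepidle=14400 tcp_keepintvl=150; do
    name=${setting%%=*}     # text before the "="
    value=${setting#*=}     # text after the "="
    echo "$name=$value -> $(half_sec_to_sec "$value") seconds"
done

# On the AIX system itself (as root), the values can be displayed with:
#   no -o tcp_keepidle
#   no -o tcp_keepintvl
# and set with:
#   no -o tcp_keepidle=480
#   no -o tcp_keepintvl=80
# Assumption to verify on your AIX level: settings made with "no" do not
# survive a reboot unless the commands are also added to /etc/rc.net.
```

The first two lines of output correspond to the values configured on
macserv1 (240 and 40 seconds), the last two to the original defaults
(7200 and 75 seconds).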
                                                                                
The following don't apply for your customer, but since I was writing            
up timers anyway...                                                             
                                                                                
PSF DIRECT HOST RECEIVER TIMERS (smit psfdirect):                               
=================================================                               
                                                                                
INACTIVITY LIMIT:  Allows you to specify the number of seconds the              
PSF Direct host receiver waits for data from the host PSF after the             
communications session has been established.  When this limit is                
exceeded, the PSF Direct receiver ends the communications session               
with the host PSF.  You would use this value to share printers                  
between PSF Direct host receivers and other PSF for AIX processes.              
Valid values are 1-9999.  9999 is the default and means that the                
PSF Direct host receiver will wait indefinitely.  If you do not plan           
to share the printer used by this host receiver, you don't need to              
change the default Inactivity Limit.  If you plan to share the printer,         
you should set this value to other than 9999.                                   
                                                                                
PRINTER BUSY LIMIT:  Allows you to specify the number of seconds PSF            
Direct will try to connect to a printer.  A printer is "busy" if jobs           
are being printed from another PSF for AIX process or by another PSF            
Direct receiver.  PSF Direct keeps trying to connect to the printer             
for the specified number of seconds.  If this limit is exceeded, PSF            
Direct notifies the host and will no longer attempt to connect to the           
device.  You use this limit to manage printer sharing between PSF               
Direct receivers and other PSF for AIX processes.  If you plan to share         
the printer used by this PSF Direct host receiver, you may want to              
increase the Printer Busy Limit.  Valid values are 1-9999; the default          
is 120 seconds.  A value of 9999 means that PSF Direct will continue           
to try to connect to the printer indefinitely.                                  
                                                                                
A:                                                                              
I did some additional checking with my change team contact, who                 
determined which module issues that message, and the owner of that              
module. According to them, message 0421-049 is issued by module                 
ain3dtcp, which is the secondary process that manages communications            
with TCP/IP-attached IPDS printers.  This message would be issued in the        
event of either an unrecoverable communications error with the printer          
(could be a problem with the printer or in the network between the              
RS/6000 and the printer) or an IPDS protocol error.  Or, as mentioned           
previously, another cause of this error message can be an IPDS NACK             
issued when the intervention-required timer expires.                            
                                                                                
To get any additional information would require a trace to determine           
values of variables that are set when the failure occurs.  Boulder has          
recommended that the next step would be to open a PMR with Level 2; you
should be sure to remind L2 that the customer also uses the EasyPSF
product so they can determine the best approach for diagnosis.                  
                                                                                
I'm sorry for the delay in getting the information to you.  I hope              
this helps.                                                                     
                                                                                
Search keywords:
psf/6000 psf/aix psf aix timer disconnect interval timeout 0421-049             
intervention keepalive direct host receiver connection NACK ain3dtcp            
                                                                                
                                                                                
                                                                                
                                                                               


Dated: 06/1998 Category: XPSF6000