Article 125326 of comp.os.vms:
Path: nntpd.lkg.dec.com!crl.dec.com!crl.dec.com!caen!zip.eecs.umich.edu!panix!news.mathworks.com!news.kei.com!simtel!news.sprintlink.net!howland.reston.ans.net!vixen.cso.uiuc.edu!uchinews!ncar!newshost.lanl.gov!news.ttu.edu!news
From: csjwp@msmail.ttuhsc.edu (Joe Pizzi)
Newsgroups: vmsnet.alpha,comp.os.vms
Subject: Re: Process hangs in LEF on Alpha
Date: Wed, 19 Jul 1995 18:42:56 GMT
Organization: Texas Tech University Health Sciences Center
Lines: 144
Message-ID: <3ujju7$6gl@hydra.acs.ttu.edu>
References: <3u3rkd$ij5@fnnews.fnal.gov> <3u67n5$ngf@nntpd.lkg.dec.com>
NNTP-Posting-Host: cs010.lubb.ttuhsc.edu
X-Newsreader: Forte Free Agent 1.0.82
Xref: nntpd.lkg.dec.com vmsnet.alpha:3041 comp.os.vms:125326

hoffman@xdelta.enet.dec.com (Stephen Hoffman) wrote:

>In article <3u3rkd$ij5@fnnews.fnal.gov>, mako@fnalv1.fnal.gov (Makoto Shimojima, Univ of Tsukuba/CDF) writes:
>:We have been experiencing a strange problem with a VMScluster consisting
>:of three AlphaServer 2100 4/200s (all running OVMS V6.1-1H2 on EISA RAID
>:disks). Two nodes run four CPUs each and the other one two CPUs, all with
>:lots of memory to waste (640MBx2 + 128MB). The CPU utilisation is typically
>:10 x 100%.
>:
>:A process (interactive or batch) suddenly hangs with no apparent reason,
>:or so I am told. Issuing Control-T works but control-Y does not interrupt
>:the process. The process is in LEF; when someone attempts to STOP it, it
>:changes to RWAST and stays there for hours if not longer.
>:
>:So far I was notified four times in the last two or three weeks. We have
>:had no problems before that --- we have them running since Jan/95, btw.
>:The local DEC field engineers or TSC (in Japan) cannot give me anything
>:helpful even after I crashed the node.

I may be off the mark here, but this sounds an awful lot like the
problem addressed by ECO AXPSYS06_061. I missed the start of the
thread, but read on...

ECO NUMBER:       AXPSYS06_061
-----------
PRODUCT:          OPENVMS AXP OPERATING SYSTEM 6.1
--------
UPDATED PRODUCT:  OPENVMS AXP OPERATING SYSTEM 6.1
----------------
APPRX BLCK SIZE:  810
----------------

                         COVER LETTER

1  KIT NAME:

   AXPSYS06_061

2  KIT DESCRIPTION:

   2.1  Version(s) of OpenVMS to which this kit may be applied:

        OpenVMS AXP V6.1, V6.1-1H1, V6.1-1H2

   2.2  Kits superseded by this kit:

        AXPSYS04_061

   2.3  Files patched or replaced:

        o  [SYS$LDR]PROCESS_MANAGEMENT.EXE (new image)
        o  [SYSLIB]SYS$SSISHR.EXE (new image)
        o  [SYSLIB]SYS$PUBLIC_VECTORS.EXE (new image)

3  PROBLEMS ADDRESSED IN AXPSYS06_061 KIT

   o  The PROCESS_MANAGEMENT.EXE image included in remedial kit
      AXPSYS04_061 did not fix the problems listed.

4  PROBLEMS ADDRESSED IN AXPSYS04_061 KIT FOR OPENVMS ALPHA V6.1

   o  A process is hung in LEF state, with an outstanding I/O
      operation and a kernel mode AST queued to the process.
      Kernel mode ASTs are disabled.

5  PROBLEMS ADDRESSED IN AXPSYS02_061 KIT FOR OPENVMS ALPHA V6.1

   o  Applications that call the wait form of system services
      cause a hang. The cause is that the service has finished
      asynchronously but the application is still waiting in LEF.

-- COVER LETTER --                  Page 2          31 January 1995

      The problem has been found to be generally applicable to
      all system services that have a wait form counterpart.

6  PROBLEMS ADDRESSED IN AXPSYS01_061 KIT FOR OPENVMS ALPHA V6.1

   o  On systems running OpenVMS AXP 6.1, when a process with a
      non-zero CPU limit SPAWNs one or more subprocesses, each
      SPAWN halves the CPU limit (as expected), but on returning
      from the SPAWN the unused CPU time isn't credited back to
      the parent process. After some number of successful SPAWNs
      the process will receive a %SYSTEM-F-EXCPUTIME error and
      will not allow the user to continue. The user will have to
      log out and log back in to start another process.
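
The arithmetic behind item 6 is easy to see with a toy model. The Python sketch below is only an illustration of the halving described above, not the actual OpenVMS quota accounting: the function name, the time units, and the assumption that every subprocess uses the same small amount of CPU are all hypothetical.

```python
def remaining_after_spawns(limit, spawns, child_used, credit_back):
    """Toy model of a parent's remaining CPU allowance after several SPAWNs.

    limit       -- parent's CPU limit in arbitrary units (hypothetical)
    spawns      -- number of subprocesses created one after another
    child_used  -- CPU units each subprocess consumes (assumed constant)
    credit_back -- True: unused time is returned on subprocess exit;
                   False: nothing is returned (the behaviour item 6 describes)
    """
    remaining = limit
    for _ in range(spawns):
        # Each SPAWN hands half of the parent's remaining allowance to the child.
        child_share = remaining // 2
        remaining -= child_share
        if credit_back:
            # Return whatever the child did not use when it exits.
            used = min(child_used, child_share)
            remaining += child_share - used
    return remaining


if __name__ == "__main__":
    print(remaining_after_spawns(1000, 5, 1, credit_back=True))   # 995
    print(remaining_after_spawns(1000, 5, 1, credit_back=False))  # 32
```

With the unused time credited back, five cheap SPAWNs cost the parent only what the subprocesses actually used. Without the credit, the allowance shrinks geometrically, which is consistent with the eventual %SYSTEM-F-EXCPUTIME error described in item 6.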

7  INSTALLATION INSTRUCTIONS:

   Install this kit with the VMSINSTAL utility by logging into the
   SYSTEM account, and typing the following at the DCL prompt:

        @SYS$UPDATE:VMSINSTAL AXPSYS06_061 [location of the saveset]

   The saveset location may be a tape drive, or a disk directory
   that contains the kit saveset.

   System should be rebooted after successful installation of the
   kit. If you have other nodes in your VMScluster, they should
   also be rebooted in order to make use of the new image(s).

Copyright Digital Equipment Corporation 1995. All Rights reserved.

This software is proprietary to and embodies the confidential
technology of Digital Equipment Corporation. Possession, use, or
copying of this software and media is authorized only pursuant to a
valid written license from Digital or an authorized sublicensor.

This ECO has not been through an exhaustive field test process. Due
to the experimental stage of this ECO/workaround, Digital makes no
representations regarding its use or performance. The customer shall
have the sole responsibility for adequate protection and back-up data
used in conjunction with this ECO/workaround.

Joe Pizzi
--Just because you're paranoid doesn't mean they're *not* out to get you.