From:   HENRY::IN%"GKN%sdsc-sds.arpa%sri-kl.ARPA%relay.cs.net@rca.com" 15-MAR-1987 15:27
To:     STEINBERGER@sri-kl.arpa
Subj:   Re: Processes in a RWAST state

        From: Richard Steinberger
        Subject: Processes in a RWAST state
        Date: Sat 14 Mar 87 10:33:01-PST

        ... Is rebooting the only option to getting rid of a process
        in the RWAST state?  If so, why does DEC allow VMS to create
        processes that it can't delete?

Sigh.  This is probably one of the things that causes the most
frustration to people who do not have extensive backgrounds in VMS
internals.

To answer your question, yes, rebooting is sometimes the only way to
get rid of a process in RWAST state.

[allow me to apologize in advance for the length of this one...]

Now that that's over with... a little more information on RWAST, and
resource waits (what the RW stands for) in general.

Resource wait is an involuntary wait state that VMS will place a
process in while waiting for a resource to become available.  This
resource can usually be identified by the rest of the name (RWxxx)
displayed by SHOW SYSTEM.  Resource waits are a subset of the
scheduler state SCH$C_MWAIT, which is shorthand for miscellaneous
resource/MUTEX wait.  A MUTEX is a mutual-exclusion semaphore, which
is something VMS uses internally to protect various data structures,
such as the I/O database, logical name tables, and a few other odds
and ends which really aren't important in this discussion.

You can tell the difference between a MUTEX wait and a resource wait
by examining the event flag wait mask (PCB$L_EFWM in the PCB,
JPI$_EFWM) if the process' scheduler state is SCH$C_MWAIT.  If
PCB$L_EFWM is negative, then it is the system-space address of the
MUTEX which the process is attempting to gain access to.  If it
isn't, it's a small positive integer which indicates which resource
wait state the process is in.

These are the known resource waits (as of VMS V4.x) and a little bit
about what they mean:

         1  RWAST   Waiting for an AST (see below).
         2  RWMBX   Mailbox full.  A process attempted to write more
                    data into a mailbox than the buffer quota for
                    that mailbox allows.
         3  RWNPG   Waiting for non-paged pool.
         4  RWPFF   Page file full.  The page file to which this
                    process is assigned is full.
         5  RWPAG   Waiting for paged pool.
         6  RWBRK   $BRKTHRU wait.
         7  RWIMG   Image activator interlock.
         8  RWQUO   A pooled quota has been exceeded.  Use SDA to
                    figure out which one.
         9  RWLCK   Lock ID database is full?  (Can anybody fill me
                    in?)
        10  RWSWP   Swap file full.
        11  RWMPE   Waiting for the modified page writer to empty
                    the modified page list.
        12  RWMPB   Waiting for the modified page writer (to do
                    something, but I'm not sure what.  Can anybody
                    fill me in?)
        13  RWSCS   Waiting for a systems communications services
                    (cluster) event.
        14  RWCLU   Waiting for a cluster transition.

RWAST is sort of the catch-all resource wait state.  Routines inside
VMS will place a process into RWAST hoping that the next kernel mode
AST queued to the process will call SCH$RAVAIL to report resource
availability for the resource in question.

By far the most popular reason to get placed in RWAST is a lack of a
non-pooled quota, typically BYTLM, BIOLM, DIOLM or ASTLM (this is not
an exhaustive list, but you get the idea).  The BYTLM quota as it
comes "out of the box" from DEC (4096 bytes) in the default account
is so pitifully small that you can't even use DECnet effectively.
For normal users I grant a BYTLM quota of 24000 bytes.  You can tell
if this is happening by using SDA on the running system to examine
the PCB (the SDA SHOW PROCESS/INDEX=nn/PCB command) for the process
that's stuck in RWAST.  If some of the non-pooled quotas have zeroes
for the "count" portion (really, the amount remaining) then there's
your problem.

Another popular reason for processes getting stuck in RWAST is a
hangup in last-channel deassign.
You can tell this is happening if you look at the process in question
with SDA and see EXE$DASSGN+6D or so floating near the top of that
process' stack (use the SDA SHOW STACK/INDEX=nnn command).  In this
case R6 points to a data structure called a CCB (channel control
block) which will generally have outstanding I/O that VMS is waiting
to have complete before it completely deassigns the I/O channel.  If
a process is stuck for this reason you can sometimes unstick it by
jostling whatever I/O device is involved (you can figure out which by
deciphering the CCB, or using the SDA SHOW PROCESS/CHANNELS/INDEX=nnn
command).

This is not an exhaustive list of why processes get placed in RWAST;
these are just two cases I see a fair amount.  There are probably
dozens of other scenarios.

Why can't you kill a process in RWAST, you ask?  Well, VMS put the
process in that state to wait for a given resource.  To delete that
process before the resource becomes available could leave the system
in an inconsistent state, or cause system data structures to be
corrupted.  The actual mechanics behind it are such that the special
kernel mode AST to delete the process remains queued until some
process in VMS calls SCH$RAVAIL with the appropriate arguments to
knock the process out of resource wait, at which time the process
will be deleted.

It is possible to break a process out of a quota-related resource
wait by writing a little program to go patch the PCB and quota in
question and call SCH$RAVAIL, but it's definitely not for the novice
(and probably there are real reasons why this shouldn't be done,
either, but I've successfully done it).

Could someone from DEC (or anyplace else, for that matter) please
comment on the interpretations of RWxxx above if I'm completely off
base on some of them?  This information could really be helpful if
more people had it.

gkn
--------------------------------------
Arpa:   GKN@SDSC.ARPA
Bitnet: GKN@SDSC
Span:   SDSC::GKN (5.600)
USPS:   Gerard K. Newman
        San Diego Supercomputer Center
        P.O. Box 85608
        San Diego, CA 92138
AT&T:   619.534.5076
-------
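
As an illustration of the MUTEX versus resource wait test described
in the message (a negative PCB$L_EFWM is a system-space MUTEX
address, a small positive value is a resource wait number), here is a
minimal sketch in Python.  It is not VMS code: the helper name
classify_mwait and the idea of handing it a raw 32-bit longword
already read from the PCB are assumptions made for the example, and
the table is simply the V4.x list given in the message.

```python
# Resource wait numbers and names, copied from the V4.x list in the post.
RESOURCE_WAITS = {
    1: "RWAST", 2: "RWMBX", 3: "RWNPG", 4: "RWPFF", 5: "RWPAG",
    6: "RWBRK", 7: "RWIMG", 8: "RWQUO", 9: "RWLCK", 10: "RWSWP",
    11: "RWMPE", 12: "RWMPB", 13: "RWSCS", 14: "RWCLU",
}


def classify_mwait(efwm):
    """Classify a PCB$L_EFWM value for a process in SCH$C_MWAIT.

    Hypothetical helper: `efwm` is the raw longword, given either as an
    unsigned 32-bit value or as a signed Python int.
    Returns a (kind, detail) tuple.
    """
    efwm &= 0xFFFFFFFF  # normalize to an unsigned 32-bit longword
    if efwm & 0x80000000:
        # Sign bit set: negative as a signed longword, so per the post
        # this is the system-space address of the MUTEX being waited on.
        return ("MUTEX", "%08X" % efwm)
    # Otherwise it is a small positive resource wait number.
    return ("RESOURCE", RESOURCE_WAITS.get(efwm, "unknown (%d)" % efwm))
```

For example, classify_mwait(1) returns ("RESOURCE", "RWAST"), while
classify_mwait(0x80002A10) returns ("MUTEX", "80002A10"); the address
here is made up purely for illustration.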