MWAIT.MAR

Author        : Nick de Smith        NICK@NCDLAB.ULCC.AC.UK
Creation date : 09-Dec-91

Edit  Edit date  By    Why
06    07-Feb-92  NMdS  More cosmetic changes. Modify PC resolution for RWAST
                       processes to include all loaded device drivers.
05    14-Jan-92  NMdS  Add in lots of detail on process and AST status,
                       especially for JIB related waits. Some minor
                       corrections and additions to the text. Make code work
                       for all versions of VMS V5.x.
04    09-Jan-92  NMdS  Add in finer detail for RWAST messages to save time
                       with SDA.
03    08-Jan-92  NMdS  Add in RWMBX example, messages and documentation. Add
                       in more detail on general resource waits.
02    23-Dec-91  NMdS  Add in extra detail to reflect changes in main code.
                       Enhance messaging considerably. Update examples.
01    09-Dec-91  NMdS  New. Generated from the VMS listings CDs and *much*
                       other research. Also input from the Internals and
                       Data Structures (V5.2) manual and from various DEC
                       CSC/DECtel articles.

This version of MWAIT (1.06) uses no DEC copyright code.

MWAIT is a utility that attempts to determine the reason for a process
going into a wait state. Detail is given about LEF/CEF wait states,
including the names of common event blocks being used. The length of time
that the process has been in a wait state is also displayed, as is a
detailed breakdown of MWAIT reasons, in particular RWAST, RWMBX and job
quota waits.

You will need VMS V5.x and the CMEXEC privilege to run MWAIT. MWAIT is
written in MACRO32 and requires no layered products.

MWAIT is an organic piece of code, and is growing as new methods of
examining this problem are found. Any comments regarding the usefulness or
otherwise of MWAIT would be appreciated. Comments regarding accuracy of
reporting and suggestions for enhancements are especially welcome.

MWAIT's documentation details all the known VMS wait states and in many
cases suggests methods to correct the problem (if, indeed, there is a
problem).

If a process is in the MWAIT state, MWAIT checks the reason in PCB$L_EFWM.
If this is a standard (non-mutex) resource wait state, the reason is given.
If it's a mutex wait, the name of the mutex is determined and displayed. If
the wait is job quota related, the quotas are checked for exhaustion. If
the detail of the MWAIT state cannot be determined, "MWAIT?" is displayed
and the wait mask will contain the address of the unknown mutex.

If the state is RWAST, then MWAIT attempts to determine the system routine
that caused the wait, and displays all the associated causes. If the
process header of the target process is resident, the registers are
displayed and MWAIT will attempt to resolve the PC as a system routine and
will display a message to enable you to precisely identify the cause of the
problem. You may need to use SDA to get extra fine detail, but MWAIT does
all the donkey work.

MWAIT's PC resolution attempts to locate executive code threads from system
service and base image offsets by using the change mode vectors. This
method is dependent on the VMS version, as the two offsets used may change.
As a result, you should check with each release of VMS that MWAIT is giving
sensible results. If this is a problem for you, contact me and ask. Note
that MWAIT checks that the data it is using 'looks like' the correct
vectors, and will report if there is a mismatch.

If the state is RWMBX, then MWAIT will attempt to determine the device
causing the problem, together with the size of the write request that is
blocking the process and the device's buffer quotas and channel.

If the process is in a job pooled quota related wait, MWAIT will determine
the quota that is causing the problem.

MWAIT does not use $GETJPI etc. to return information on processes - it
goes straight to the executive data structures without using any locking.
This method ensures that information can be returned whatever the state of
the target process, that it is 99.9% accurate, and that it is gathered in a
non-intrusive manner.
This also means that there is a finite (albeit very small) possibility of
MWAIT getting an ACCVIO due to a data structure or pointer changing while
being inspected. This is *not* a problem - it will not harm your system so
just re-run MWAIT.

Usage
-----

    $ @BUILD                        ! Build MWAIT from sources
    $ MWAIT == "$dev:[dir]MWAIT"
    $ MWAIT pid
or
    $ MWAIT
    _Target PID: pid
    ...
    $

If the "pid" is specified as 0 or omitted, all processes are displayed.

Typical MWAIT output from a normal process
------------------------------------------

$ MWAIT 20A0008E
Pid      User name    Process name    Prior State Port
20A0008E SYSTEM       IPCACP          10/8  Hib   -Det-
    Wait mask: 7FFFFFFF, Cluster: 0, Time: 0 00:00:04.00
    0: E0000001, 1: 00000000
    2: 00000000 ([SYSTEM]IPCACP_FLAGS)
    Mode: Current: KERNEL, Previous: USER
    AST status: Active: (none), Enabled: KESU
$!

Field         Use
------------- ---------------
Pid           Extended Process ID of the process being inspected
User name     VMS username of process
Process name  VMS process name of process
Prior         Current/Base priority of process
State         Process state - see below
Port          Control device for process (LTAn:/Det/Sub/Net/Bat etc)
Wait mask     If the process state is LEF, LEFO or CEF, the wait mask has
              one bit SET for each event flag being waited for. If the
              text "(all EFNs)" appears, then ALL the EFNs in the mask must
              be set for the wait condition to be satisfied, else the
              setting of ANY of the EFNs will satisfy the condition.
              If the process is in a miscellaneous or JIB related wait
              state, then the wait mask contains detail of the state.
              The only case (excluding a LEF or CEF state) where the wait
              mask is useful is when the text "MWAIT?" is given for the
              state. In this case the wait mask contains the address of an
              unrecognised system mutex.
Cluster       The event flag cluster number that the "wait mask" refers to.
              A value of "0" refers to event flags in the range 0..31, "1"
              refers to 32..63, "2" to 64..95 and "3" to 96..127. EFNs in
              the range 64..127 are found in "Common event blocks" - see
              below.
Time          VMS delta time since the process last went into a wait state.
              This time is reset after every process state transition.
0: xxxxxxxx   Contents of event flag cluster 0 (EFNs 0..31)
1: xxxxxxxx   Contents of event flag cluster 1 (EFNs 32..63)
2: xxxxxxxx ([uic]name)
              Optional, only displayed when valid.
              Contents of event flag cluster 2 (EFNs 64..95)
3: xxxxxxxx ([uic]name)
              Optional, only displayed when valid.
              Contents of event flag cluster 3 (EFNs 96..127)

              The "([uic]name)" field is only displayed for EFN clusters 2
              and 3, and it reflects the name of the associated common
              event flag block, together with its group qualification.
Mode          The current and previous processor modes of the process.
AST status    A list of those AST levels that are enabled, and which are
              currently active. If at least one AST is waiting for
              delivery, the text "AST(s) waiting" is displayed. For
              example, from this you could determine if a process was
              locked in LEF at AST level, blocking other activity.

Typical output from a process in RWAST
--------------------------------------

$ MWAIT 20A00452
Pid      User name    Process name    Prior State Port
20A00452 NICK         NICK_1          6/4   RWAST -Sub-
    Wait mask: 00000001, Cluster: 0, Time: 0 00:00:00.08
    0: E0000001, 1: 00000000
    Mode: Current: KERNEL, Previous: USER
    AST status: Active: (none), Enabled: KESU
    Process resource wait is ENABLED.
    Process has no BIO left (BIOCNT zero).
    Process has 100. outstanding BIO operations (BIOCNT: 0., BIOLM: 100.)
    Process has 2. outstanding ASTS active (ASTCNT: 98. ASTLM: 100.)
    Process registers:
    R0:  00000001 R1:  00000001 R2:  80408F5A R3:  00000010
    R4:  80408F20 R5:  8037C470 R6:  7FFC9F90 R7:  00000030
    R8:  80138148 R9:  FFA00001 R10: 00000030 R11: 00000003
    AP:  7FF15928 FP:  7FFE77E4 PC:  801DE68E PSL: 00C00001
    Quota: R1 = quantity, R2 = address, R3 = width(bits).
           (Requested: 1. unit, Available: 0. units)
    Quota wait on: Buffered I/Os remaining count (BIOCNT: 0.)
$

This example shows that the process is in RWAST because it has run out of
quota for concurrent buffered I/O operations.
The PC (801DE68E) has been identified by MWAIT as lying in a VMS executive
thread that checks for quota availability. The text says that R1 (1 unit)
is the quantity required, R2 points to the location where the available
quota is stored, and R3 is the width in bits of the quota field (16 bits in
this example). MWAIT has identified the address in R2 as being the
PCB$W_BIOCNT field for the process under inspection, confirming that
exhausted BIOCNT was the cause of the RWAST. If no "Quota wait on..." text
were displayed, we would have to use SDA to determine the name of the
quota.

This example was generated by TEST0.C.

Typical output from a process in RWMBX
--------------------------------------

$ MWAIT 20200534
Pid      User name    Process name    Prior State Port
20200534 NICK         NICK_1          6/4   RWMBX -Sub-
    Wait mask: 00000002, Cluster: 0, Time: 0 00:00:00.43
    0: E0000000, 1: 00000000
    Mode: Current: USER, Previous: USER
    AST status: Active: U, Enabled: KESU, AST(s) waiting
    Process resource wait is ENABLED.
    Attempting to write 8. bytes to MBA2454: on channel 0060 (hex)
    Messages: 6., Quota (bytes): Initial = 50., Remaining = 2.

This process is in RSN$_MAILBOX (value "2", also in the "wait mask" field).
The process was placed into RWMBX because it tried to write 8. bytes to
mailbox MBA2454: which has 2. bytes of buffer space left. The mailbox's
initial buffer quota was 50. bytes. There are currently 6 messages already
in the mailbox waiting to be read (excluding the one this process is trying
to write). The write is being carried out on channel 60 (hex) in a user
mode AST routine (there is a user mode AST active) and at least one AST is
waiting for delivery (and is currently blocked).

This example was generated by TEST4.C.
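
The arithmetic behind the RWMBX example can be sketched in a few lines of
Python. This is only an illustration and not part of MWAIT: the function
name rwmbx_summary and both patterns are assumptions, modelled on the two
message lines exactly as they appear in the example output above (the
wording may differ between MWAIT versions). It pulls out the request size
and remaining buffer quota and reports the shortfall.

```python
import re

# Hypothetical patterns, copied from the shape of the example output above.
WRITE_RE = re.compile(
    r"Attempting to write (\d+)\. bytes to (\S+): on channel ([0-9A-Fa-f]+)"
)
QUOTA_RE = re.compile(
    r"Messages: (\d+)\., Quota \(bytes\): Initial = (\d+)\., Remaining = (\d+)\."
)


def rwmbx_summary(text):
    """Return a dict describing an RWMBX wait, or None if no match."""
    write = WRITE_RE.search(text)
    quota = QUOTA_RE.search(text)
    if write is None or quota is None:
        return None
    requested = int(write.group(1))
    remaining = int(quota.group(3))
    return {
        "device": write.group(2) + ":",
        "channel": int(write.group(3), 16),   # channel is shown in hex
        "requested": requested,               # bytes in the blocked write
        "messages": int(quota.group(1)),      # messages already queued
        "initial": int(quota.group(2)),       # initial buffer quota
        "remaining": remaining,               # buffer quota left
        # Bytes that must be freed before the write can proceed.
        "shortfall": max(0, requested - remaining),
    }


if __name__ == "__main__":
    sample = (
        "Attempting to write 8. bytes to MBA2454: on channel 0060 (hex)\n"
        "Messages: 6., Quota (bytes): Initial = 50., Remaining = 2.\n"
    )
    print(rwmbx_summary(sample))
```

For the example above, the write of 8. bytes against 2. bytes remaining
gives a shortfall of 6 bytes, so at least one queued message has to be
read before the process can leave RWMBX.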

Typical output from processes in a Job Quota wait
-------------------------------------------------

$ MWAIT 202000CD
Pid      User name    Process name    Prior State Port
202000CD NICK         NICK_1          6/4   JITQE -Sub-
    Wait mask: 8097EB10, Cluster: 0, Time: 0 00:00:00.35
    0: E0000000, 1: 00000000
    Mode: Current: USER, Previous: USER
    AST status: Active: (none), Enabled: KESU
    Job timer count exhausted (TQCNT: 0., TQLM: 200.)
$

This process is in a job pooled quota wait for timer queue entries.
Increase the UAF TQELM parameter or modify the code to prevent this
happening. Note that running out of a job pooled quota can severely affect
the performance of all other jobs in the job tree. The wait mask contains
the address of the JIB for the process under inspection.

This example was generated by TEST1.C.

$ MWAIT 202000E2
Pid      User name    Process name    Prior State Port
202000E2 NICK         NICK_1          4/4   JIBYT -Sub-
    Wait mask: 80D78190, Cluster: 0, Time: 0 00:01:49.44
    0: E0000000, 1: 00000000
    Mode: Current: KERNEL, Previous: KERNEL
    AST status: Active: (none), Enabled: KESU
    Job buffered I/O count exhausted
    (Requested: 2080. bytes, Available: 1632. bytes, Quota: 39424.)

The process is waiting for buffered I/O byte count quota. The wait mask
contains the address of the JIB for the process. The process has requested
2080. bytes of quota when there are only 1632. bytes left. The initial
quota was 39424. bytes. The process will remain in this state until enough
quota is available, or the process is deleted.

This example was generated by TEST5.C.

Nick de Smith        NICK@NCDLAB.ULCC.AC.UK

1. Table of standard wait states
--------------------------------

Text    Description
------- -----------
?????   Unrecognised process state. Should never happen.
Colpg   Collided page wait. More than 1 process has referenced the same
        page not in physical memory. The first such process goes into PFW
        (see below) and subsequent processes go into COLPG.
Mwait   Mutual exclusion wait.
        The process is waiting for a depleted resource, a job quota, or a
        locked mutex. The program decodes (to the best of its ability) the
        reason for the MWAIT. See elsewhere in this document.
Cef     Common event flag wait. The process is waiting for event flags in
        clusters 2 (64-95) or 3 (96-127). The program displays the name of
        any associated common event flag clusters, and their values. See
        LEF below.
Pfw     Page fault wait. Process has referenced a page not in physical
        memory, and must wait until the page has been read in. Process
        deletion, AST delivery or a successful read of the page will place
        the process into COM or COMO.
Lef     Local event flag wait. The process is waiting for one or more local
        event flags to be set. The program displays the event flag cluster
        number - 0 (0-31) or 1 (32-63). If more than one EFN is being
        waited for, the process is waiting for any of the EFNs to be set,
        unless the "All EFNs" text appears, in which case all the specified
        EFNs must be set for the wait to complete.
Lefo    Local event flag wait (outswapped).
Hib     Hibernating. The process has used $HIBER. Use of $WAKE or $SCHDWK
        will restart the process.
Hibo    Hibernating (outswapped).
Susp    Suspended. The process has used, or been subject to, $SUSPND. Use
        of $RESUME will restart the process.
Suspo   Suspended (outswapped).
Fpg     Free page wait. Process has requested a physical page to be added
        to its working set, and no pages are on the free page list. When a
        page is available, the process becomes COM or COMO.
Com     Compute wait. The process wants to use a CPU, but none is currently
        available.
Como    Compute wait (outswapped).
Cur     The current process.

2. Table of system resource wait states
---------------------------------------

Text    Description
------- -----------
RW???   Unknown resource wait state. Should never occur.
RWAST   Any number of things! See below.
        Note that some RWAST states, particularly those related to devices
        such as TK50s, can be caused by hardware/microcode faults on the
        drive or controller, and may only be cleared by a system
        re-boot/power off or by calling an engineer. There is NO magic
        bullet for all RWAST situations.
RWMBX   Mailbox full. The process has tried to write to a mailbox that is
        full or has insufficient buffer space. See below for messages that
        detail the cause of this wait state.
RWNPG   Non-paged dynamic memory. Process was unsuccessful in allocating
        some non-paged pool (NPP) space. Should be a very rare wait.
RWPAG   Paged dynamic memory. Process was unsuccessful in allocating some
        paged pool. This can be generated by XQP. Use the SDA command
        "EXAMINE @AP+4" to return the number of bytes being requested. DEC
        recommend that you keep 40% of paged dynamic memory free. The
        SYSGEN parameter PAGEDYN (amongst others) controls this pool.
RWMPE   Modified page list empty. Process is waiting for the modified page
        writer to signal that it has flushed the modified page list.
        OPCRASH does this to wait prior to stopping the system.
RWMPB   Modified page writer busy. The process has faulted a modified page
        out of its working set, and either
          a) The modified page list already contains more than
             MPW_WAITLIMIT pages, or
          b) The modified page list contains more than MPW_LOWLIMIT pages
             and the modified page writer is busy.
        Processes should not remain in RWMPB very long. If they do, it may
        be that a page file has become full, or that the paging disk is
        extremely busy or has gone into mount verification.
RWSCS   Distributed lock manager wait. The process is waiting for a
        response from a remote cluster node that has information about a
        particular lock.
RWCLU   Cluster transition. The process has requested a lock on a node that
        is in transition (being added or removed from the cluster). The
        process will remain in this state until the cluster stabilises.
RWCAP   CPU capability wait.
        The process is computable, and has requested specific CPU
        capabilities that its current CPU cannot offer. The process is
        rescheduled to run on a CPU that has the right SMP characteristics.
        This is used when cluster quorum has been lost.
RWCSV   Cluster server process. The limit of outstanding requests from one
        cluster member to another's server process has been reached, and
        this process has also requested a service of that node. It may also
        be that the CLUSTER_SERVER process has problems. This process
        should be in a HIB state - if it isn't then that may be the cause
        of the problem. If the CLUSTER_SERVER process is not running, use
        the command "@SYS$SYSTEM:STARTUP CSP" to (re)start the server.
SNPFU   Snapshot rollout/rollin. Should never happen on a normal VMS
        system.
RWPFF   Page file full (*).
RWBRK   Breakthrough (*).
RWIMG   Image activation lock (*).
RWQUO   Job Pooled quota (*).
RWLCK   Lock identifier (*).
RWSWP   Swap file space (*).

(*) Means "Not currently used".

2.1 RWAST causes
----------------

When an RWAST condition is detected, MWAIT attempts to determine the cause.
There are only a few threads in VMS that can cause an RWAST. One or more of
the following messages may be displayed. If the process header is resident,
then a register dump is given. Subsequent messages may refer to these
register values.

Note: When a process is in RWAST it can be very unwise to use SDA to try to
examine memory in that process's address space as the process running SDA
may also hang. For this reason, MWAIT does not read any data from the
target process's P0 or P1 space - only executive data is used. Be
particularly careful where MWAIT gives messages relating to channels - try
not to use SDA SHOW PROCESS/CHANNELS or SDA FORMAT CCB etc on a process in
RWAST.

a.  "Process resource wait is DISABLED"
    "Process resource wait is ENABLED"
    "Process is marked for deletion"

    These are information only messages about the state of the process.
    The setting of the process's resource wait state does not normally
    affect an RWAST condition, except in a very few cases (see 'j' and
    'k').

b.  "Process has no DIO left (DIOCNT zero)"

    The process needs to perform a direct I/O operation, and no quota is
    left for it. The process may remain in RWAST until one of the
    outstanding direct I/O operations completes. Increase the UAF parameter
    DIOLM and/or the SYSGEN parameter PQL_DDIOLM.

c.  "Process has no BIO left (BIOCNT zero)"

    The process needs to perform a buffered I/O operation, and no quota is
    left for it. The process may remain in RWAST until one of the
    outstanding buffered I/O operations completes. Increase the UAF
    parameter BIOLM and/or the SYSGEN parameter PQL_DBIOLM.

d.  "Process has 'a'. outstanding DIO operations (DIOCNT: 'c'., DIOLM: 'l'.)"
    "Process has 'a'. outstanding BIO operations (BIOCNT: 'c'., BIOLM: 'l'.)"

    An operation has been requested that requires that all outstanding I/O
    is completed first. The process may remain in RWAST until all I/O has
    completed. There are 'a' operations of the specified type still
    outstanding. A maximum of 'c' others could be made active at the same
    time up to a total of 'l'.

e.  "Process has 'l'. live sub-processes"

    Process deletion cannot be completed until all sub-processes owned by
    this process have successfully terminated.

f.  "Process has 'a'. outstanding XQP events"

    Process deletion and suspension require that all outstanding XQP
    activity is completed first. PCB$B_DPC contains a count of events that
    can stop a delete or suspension. Currently it's only used by XQP and is
    incremented at the start and decremented at the end of an operation.

g.  "Process has no AST entries left (ASTCNT zero)"

    The process needs an AST, and no quota is available. Increase the UAF
    parameter ASTLM and/or the SYSGEN parameter PQL_DASTLM.

h.  "Process has 'n'. outstanding ASTS active (ASTCNT: 'c'. ASTLM: 'l'.)"

    The process has 'n' outstanding ASTs waiting for delivery.
    A total of 'c' ASTs can still be allocated and the maximum allowed
    active at any one time is 'l'.

i.  "$DASSGN while channel busy: R5 = channel, R6 = CCB"
    (Channel: 'c')

    A channel deassign cannot complete until operations on that channel
    have completed. If the device is network related, ie. is an RTAn: or
    NETu: etc. device, you can often use NCP SHOW KNOWN LINKS to identify
    the DECnet link associated with the device, and then use NCP DISCONNECT
    LINK 'n' to delete the link and thus the errant device. This often also
    clears the RWAST condition.

    Activity on a channel is indicated by CCB$W_IOC being non-zero. You can
    use SDA on the process to identify the detail on the CCB pointed to by
    R6 (or use the channel number in R5 or from "c") using the command
    "SDA> SHOW PROCESS/IDENT=pid/CHANNEL". The number of outstanding
    operations is given by the SDA commands:

        SDA> SET PROCESS /IDENT=pid
        SDA> READ SYS$SYSTEM:SYSDEF
        SDA> EXAMINE @R6+CCB$W_IOC

    The count is then the low 16 bits of the displayed value. Note the
    caveat above about using SDA on processes in RWAST.
    The PC is in EXE$DASSGN.

j.  "Quota: R1 = quantity, R2 = address, R3 = width(bits)
    (Requested: 'r'. units, Available: 'c'. units)"
    [other text]

    The process has run out of a quota. R1 contains the number of units of
    quota that are required, R2 points to the quota field in system memory,
    and R3 contains the field width in bits. The process will remain in
    RWAST until the quota request is satisfied. A previous message will
    detail which quota is exhausted, and you can use SDA to confirm the
    cause (see the example above). If previous messages imply that more
    than one quota has been exhausted and no "[other text]" message was
    displayed identifying the major culprit, use the SDA commands:

        SDA> SET PROCESS/IDENT=pid
        SDA> READ SYS$SYSTEM:SYSDEF
        SDA> SHOW PROCESS         ! Note the PCB, PHD and JIB addresses
        SDA> FORMAT pcb_address   ! ...and jib_address and phd_address

    Note which of the format commands displays an address that matches the
    contents of R2 (from the registers display), and that will be the
    exhausted quota that is causing the wait. The RWAST has been caused by
    a request for 'r' units of quota when the process only has 'c' units
    left. Using "SET PROCESS/NORESOURCE_WAIT" will prevent this RWAST
    state, and cause an error to be returned to the process instead.
    The PC is in EXE$SNGLEQUOTA or EXE$MULTIQUOTA.

k.  "$CANCEL needs BIOCNT: R5 = UCB, R6 = CCB, R7 = channel
    (Device: devu:, Channel: 'c')"

    A cancel operation on a channel needs a buffered I/O operation, and
    that quota is exhausted. The channel is given by "c" and the device
    name by "devu:". Using "SET PROCESS/NORESOURCE_WAIT" will prevent this
    RWAST state, and cause an error to be returned to the process instead.
    The PC is in EXE$CANCEL.

l.  "$DELPRC subprocesses active: R4 = PCB"

    To delete a process, all its subprocesses must also be deleted. The
    process will remain in RWAST until all its subprocesses have
    terminated. The count of subprocesses is in the field PCB$W_PRCCNT. You
    can use SDA to inspect this field using the PCB address from R4;
    however, this value is also displayed by the message in 'e' above.
    The PC is in EXE$DELPRC.
    Use SDA or a DCL procedure to locate the processes owned by the hung
    process (the PCB$L_OWNER field should contain the EPID of the hung
    process).

m.  "$DELPRC XQP still active: R4 = PCB"

    To delete a process, all XQP activity must have ceased. The process
    will remain in RWAST until PCB$B_DPC becomes zero. See 'f' above.
    The PC is in EXE$DELPRC.

n.  "$SUSPND XQP still active: R4 = PCB"

    A process cannot be suspended until all XQP activity has completed.
    See 'f' above.
    The PC is in EXE$SUSPND.

o.  "EXE$QIO access/deaccess pending: R4 = PCB, R5 = UCB, R7 = CCB
    (Device: devu:)"

    A $QIO request has been made to a channel that has a pending access or
    deaccess request to the device called "devu:".
    This must be allowed to complete before retrying the I/O operation.
    The PC is in EXE$QIO.

p.  "$PROCESS_SCAN context busy"

    The context block required by EXE$PSCAN_LOCKCTX is busy. The process
    will wait until it's free.

q.  "DIO active when PFN mapped page deleted: R2 = VA, R3 = SAVPTE, R4 = PCB"

    Some DIO was outstanding when deleting a page mapped by PFN. The
    process will wait until all direct I/O has completed.
    The PC is in MMG$DELPAG.

r.  "Process header is not resident"

    As the process header is not resident in physical memory, the process's
    registers and AST quota are not available. Therefore no analysis of the
    executive thread causing the RWAST is possible.

s.  "PC is at nnDRIVER+xxxx"

    The PC does not lie in a VMS loadable executive image, but in the named
    driver at the specified offset. Certain drivers, such as LTDRIVER, use
    the ASTWAIT state to stall processes while events complete. It is
    beyond the scope of this program to document every ASTWAIT occurrence
    in every driver as some drivers on your system may not be supplied by
    DEC. You must use your judgement in this case.

t.  "Can't calculate reason from PC"

    This message is displayed when either the change mode vectors could not
    be located (see below) and the PC was in one of the recognised threads,
    or when the PC is in a totally unrecognised location. I suggest you use
    SDA to locate the PC in these cases, or contact DEC if MWAIT has not
    already provided enough information.

u.  "%MWAIT-W, CHMK vector not found"

    The change mode to kernel vector was not found, and therefore no PC
    resolution on executive threads that lie in system services that
    operate in kernel mode can be carried out. The message implies that you
    are using incompatible versions of VMS and MWAIT. Contact the author.

v.  "%MWAIT-W, CHME vector not found"

    The change mode to executive vector was not found, and therefore no PC
    resolution on executive threads that lie in system services that
    operate in executive mode can be carried out.
    The message implies that you are using incompatible versions of VMS and
    MWAIT. Contact the author.

w.  "%MWAIT-W, Failed to locate image EXCEPTION.EXE"

    This should never happen. It means that the executive loadable image
    "EXCEPTION.EXE" could not be located in the list of loaded images. If
    this happens, there is something really strange with your system.

x.  "%MWAIT-W, Illegal vector (not known format) xxxxxxxx: 'text'"

    A vector was not in one of the known formats, and therefore its real PC
    (in the executive) could not be resolved. The 'text' string will
    identify the vector entry that was invalid, and the hexadecimal number
    'xxxxxxxx' is the contents of the unrecognised entry. Contact the
    author.

2.2 RWMBX causes
----------------

RWMBX is generally affected by the setting of the process resource wait
mode. Disabling resource wait mode will cause $QIOs to return an error
message rather than going into RWMBX.

a.  "Process resource wait is DISABLED"
    "Process resource wait is ENABLED"
    "Process is marked for deletion"

    These are information only messages about the state of the process.

b.  "Attempt to write w. bytes to devu: on channel c
    Messages: m., Quota (bytes): Initial: q., Remaining: r."

    The process has gone into RWMBX due to an attempt to write "w" bytes to
    the device "devu:" on channel number "c". The device already has "m"
    messages in it waiting to be read (excluding this write attempt). The
    device was created with an initial buffer quota of "q" bytes, of which
    "r" bytes remain. "r" should be less than "w" for this wait state to
    occur. The process will remain in this state until sufficient space is
    made in the device (at least "w" bytes). Note that the device need not
    be a mailbox - it could be any device that "looks like" a mailbox to
    VMS.

3. Table of JIB wait states
---------------------------

A JIB wait state is represented by the process state being MWAIT, and the
EFN wait mask containing the address of the process's JIB.
The location JIB$B_FLAGS then contains a number of bits representing
different reasons for the wait.

JIB waits are generally affected by the setting of the process resource
wait mode. Disabling resource wait mode will cause $QIOs to return an error
message rather than going into an MWAIT state.

Text    Description
------- -----------
J????   Unknown JIB wait state
JIBYT   At least one process in the job tree is waiting for BYTCNT quota
JITQE   At least one process in the job tree is waiting for TQELM quota
JBYTQ   Both BYTCNT and TQELM quotas are exhausted. At least one process in
        the job tree is waiting for each quota.

A number of messages may be displayed if the process is in a JIB resource
related wait in an attempt to further isolate the problem. JIB waits are
for resources that are shared by ALL processes in a job tree.

a.  "Process resource wait is DISABLED"
    "Process resource wait is ENABLED"
    "Process is marked for deletion"

    These are information only messages about the state of the process.

b.  "Job buffered I/O count exhausted
    (Requested: 'r'. bytes, Available: 'c'. bytes, Quota: 'q'.)"

    A request has been made for 'r' bytes of buffered byte count quota.
    There is a byte count quota of 'c' bytes remaining (BYTCNT) from an
    initial quota value of 'q' (BYTLM). The value 'c' is not enough to
    satisfy the request being made, and the process has been placed into a
    JIB wait until more quota is available. In some cases the amount of
    quota requested, 'r', may not be determinable and thus will not be
    displayed. The UAF parameter BYTLM and/or the SYSGEN parameters
    PQL_DBYTLM/PQL_MBYTLM need to be increased.

c.  "Job open file count exhausted (FILCNT zero)"

    The UAF parameter FILLM and/or the SYSGEN parameters
    PQL_DFILLM/PQL_MFILLM need to be increased.

d.  "Job timer count exhausted (TQCNT: 'c'., TQLM: 'q'.)"

    There are 'c' timers left for this job (should be 0 here) from an
    initial quota of 'q'. The UAF parameter TQELM and/or the SYSGEN
    parameters PQL_DTQELM/PQL_MTQELM need to be increased.

e.  "Job page file space exhausted (PGFLCNT zero)"

    The UAF parameter PGFLQUOTA and/or the SYSGEN parameters
    PQL_DPGFLQUOTA/PQL_MPGFLQUOTA need to be increased.

4. Table of known system MUTEXs.
-------------------------------

A system MUTEX is represented by the process state being RSN$_MWAIT, and
the EFN wait mask containing the address of the mutex. MWAIT will identify
the mutex from the following table, which is taken from table 8.3 in the
V5.2 Internals and Data Structures Manual (IDSM) and other sources. You
will need to use the IDSM to determine the precise cause of any of these
wait states.

Text    Mutex address     Description
------- -------------     -----------
MLogNm  LNM$AL_MUTEX      Shared logical name structures.
MIOdb   IOC$GL_MUTEX      I/O database access.
MCEB    EXE$GL_CEBMTX     Common event block list. Used by the CEB handling
                          code to synchronise access to the CEB listhead,
                          SCH$GQ_CEBHD and SCH$GW_CEBCNT.
MSMP    SMP$GL_CPU_MUTEX  SMP access.
MPgDyn  EXE$GL_PGDYNMTX   Paged dynamic memory list.
MGSD    EXE$GL_GSDMTX     Global section descriptor list.
MACL    EXE$GL_ACLMTX     (*)
MEnq    EXE$GL_ENQMTX     (*)
MShMGS  EXE$GL_SHMGSMTX   Shared memory global section descriptor table.
MShMMb  EXE$GL_SHMMBMTX   Shared memory mailbox descriptor table.
MCIA    CIA$GL_MUTEX      System intruder lists.
MBsVMS  EXE$GL_BASIMGMTX  Loadable executive image data structures. Used by
                          LDR$LOAD_IMAGE to synchronise access to the
                          executive loadable image lists for loading,
                          unloading and initialisation.
MQMan   QMAN$GL_MUTEX     Queue manager. I can't find anywhere that uses
                          this mutex.
MWAIT?                    Unknown MWAIT state. The address of the MUTEX is
                          given in the "Wait mask" field.

(*) Means "Not currently used".

[end]