Ask the Wizard Questions

how NOT to propagate a lock to another node?

The Question is:

Sir / Madam                                              10 July 1996

This is the SECOND time I have submitted this question; the first was
about four weeks ago. If you replied, I never received your e-mail.

How can you persuade the lock manager NOT to propagate a lock to another
node in the cluster?

Why? In theory, I would expect to see a performance increase, maybe tiny,
but nevertheless an increase, and with many of our processes doing tens
of thousands of lock conversions on a system that is already slow at
times, any increase would be welcome.

I know that some VMS processes take out a sublock of the SYS$SYS_IDnnnn
resource, where nnnn is the SCSSYSTEMID for the node; however, the parent
lock is at EXECUTIVE mode. Would a duplicate lock at user mode produce
the same result? If not, what is it that the lock manager
recognises/realises when it creates this lock?

Is there any other way to tell the lock manager NOT to propagate my locks?

thanks

P.S. Any unsupported way would do as well.

The Answer is:

Are you talking about a specific lock, or all locks?

Lock migration is controlled by the LOCKDIRWT and PE1 SYSGEN parameters.
The intention is to ensure that a lock will migrate to cluster nodes
which are using the lock the most. In general this should improve
application performance. You can prevent migration of all locks, prevent
them from migrating to specific nodes, or prevent the migration of lock
trees over a specific size. Normally the only reason you would want to
suppress remastering is to prevent large lock trees from "thrashing" from
node to node in a cluster where the workload moves from node to node.

I'm not aware of a method, supported or not, which will mark a specific
resource as "do not migrate" (apart from the obvious hack of setting PE1
to a reasonably high value and deliberately loading a lock tree with
sufficient dummy locks to exceed that size).
Moving all processing of a particular resource to a single node will
generally ensure that the locks stay put.

If you suspect your application code is I suggest you examine your locks
using SDA to determine where they are mastered. Also use MONITOR DLOCK to
determine if there really is a problem with lock traffic.

Appended are DSNlink articles which describe lock remastering and tracing
locks.

[OpenVMS] Explanation Of Dynamic Lock Re-Mastering And PE1 Parameter

COPYRIGHT (c) 1988, 1993 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.

Copyright (c) Digital Equipment Corporation 1993. All rights reserved.

OP/SYS:     OpenVMS VAX, Versions 5.5 through 6.1
            OpenVMS Alpha, Version 1.5, 6.1

COMPONENT:  Lock Manager

SOURCE:     Digital Equipment Corporation

OVERVIEW:

This article provides a basic explanation of the dynamic re-mastering of
locks that occurs within clusters running OpenVMS VAX V5.5 or greater.
There is also a discussion of the PE1 system parameter. This parameter
was activated with OpenVMS VAX V5.5-2; however, installation of
CSCPAT_1011 will also activate the PE1 parameter.

DYNAMIC RE-MASTERING:

Dynamic re-mastering allows lock trees to be mastered by the node which
is doing the most lock activity. This provides that node with the
quickest response time. The rules for dynamic re-mastering are as
follows:

1.  A lock tree will never move to a node with a smaller LOCKDIRWT than
    the current master.

2.  If a lock is mastered on a node and other nodes which have locks on
    the tree have a greater or equal LOCKDIRWT, then the following
    algorithm is used to determine if the tree should be moved:

    Each local node records its lock activity. The total activity over
    the last second is accumulated every second.
    The accumulation formula is:

        act = (8 * act) - 8 (( act - new) / 8 )

        act = previously computed activity
        new = new activity over the last second

    The above formula, while confusing, basically takes the previous
    activity into account.

    When a distributed lock request is made to a lock master node, the
    accumulated activity of the local node is sent to the master. The
    master compares this value with its local activity. If the node
    sending the lock request has an activity counter which is 10 or more
    greater, then the tree is flagged to be moved. The re-mastering
    should start within the next second, unless numerous re-mastering
    operations are already occurring. During normal operations, there is
    a limit of 5 concurrent re-mastering operations.

For V6.0 of OpenVMS VAX and V1.5 of OpenVMS Alpha, the above algorithm is
changing to make it more conservative. The changes are as follows:

    o  Activity accumulation is every 8 seconds.
    o  The local node needs to have done 80 or more operations.

These changes mean that for a lock tree to be moved, there needs to be a
longer sustained activity rate on another node, and it takes longer to
decide if a tree should be moved.

NOTE:  For nodes to use the 8 second scanning algorithm, ALL NODES in the
       cluster must be able to do the 8 second scan! For VAXclusters,
       this means that all nodes must be at V6.0 for the 8 second scan to
       be used. For Alpha only clusters, the 8 second scan will be used;
       for Mixed-Architecture clusters with OpenVMS Alpha V1.5 and
       OpenVMS VAX V5.5-2, the old 1 second scan rate will be used.

PE1 PARAMETER:

Since the moving of large resource trees can take some time and all lock
activity on the tree is stalled until the move completes, the SYSGEN
parameter PE1 was made into a tree movement throttle. Any tree with more
locks than PE1 will not be moved. By setting PE1 to 500, trees with fewer
than 500 locks will continue to migrate as normal. Large trees with
greater than 500 locks will never move.
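The rules in this article can be put together in a small sketch. This is only an illustration of the text, not the lock manager's code: the function name and its arguments are hypothetical, the way the three checks are combined is an assumption, and the zero and negative PE1 cases follow the description given in the next paragraph. The threshold is a parameter because the article gives different figures for different versions.

```python
def may_remaster(requester_lockdirwt, master_lockdirwt,
                 requester_activity, master_activity,
                 lock_count, pe1, threshold=10):
    """Hypothetical sketch: would a lock tree be flagged to move?

    All names here are illustrative; none of them are OpenVMS interfaces.
    """
    # Rule 1: never move to a node with a smaller LOCKDIRWT than the master.
    if requester_lockdirwt < master_lockdirwt:
        return False
    # PE1 throttle: a negative value disables re-mastering altogether.
    if pe1 < 0:
        return False
    # PE1 throttle: zero means any size; otherwise trees with more locks
    # than PE1 do not move.
    if pe1 > 0 and lock_count > pe1:
        return False
    # Rule 2: the requesting node's accumulated activity must exceed the
    # master's by at least the threshold (10 in the original algorithm).
    return requester_activity - master_activity >= threshold


if __name__ == "__main__":
    print(may_remaster(1, 1, 30, 10, lock_count=100, pe1=0))
    print(may_remaster(1, 1, 30, 10, lock_count=501, pe1=500))
```

With PE1 set to 500, the second call shows a busy requester still failing to attract a tree of 501 locks, which is the throttle behaviour described in this section.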
A value of zero in PE1 means that any size tree can move. A negative
value disables re-mastering.

The PE1 parameter usage was added as of OpenVMS VAX V5.5-2 and is also in
OpenVMS Alpha. Installation of CSCPAT_1011 also enables the PE1 parameter
and establishes the 8 second frequency for activity calculations.

Use the following SYSGEN command to set PE1:

    SYSGEN> SET PE1 %XFFFFFFFF

NOTE:  Using -1 fails with an error message being returned.

---------------------------------------------------

[OpenVMS] How To Trace A Hung Lock Request On A Clustered System

COPYRIGHT (c) 1988, 1993 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.

Copyright (c) Digital Equipment Corporation 1988, 1994. All rights
reserved.

PRODUCT:    VMScluster Software for OpenVMS AXP
            VAXcluster Software for OpenVMS VAX

COMPONENT:  Lock Manager

SOURCE:     Digital Customer Support Center

OVERVIEW:

This article explains how to use the SDA Utility to trace hung lock
requests on systems in a VAXcluster environment. If you have a single
node environment, you may want to reference another article in the
database on how to trace a hung lock request on a non-clustered system,
as the procedure is simplified on a single node.

BACKGROUND:

A process may hang in LEF or RWAST waiting to complete a lock request
(ENQ / ENQW). That lock request may be blocked by another process
currently holding the lock with an incompatible mode. The following
procedure illustrates how to locate the process holding the lock with an
incompatible mode. When the process is found you can either delete the
competing process, or work on isolating the coordination problem and
rewriting the programs to utilize other synchronization techniques, such
as Blocking ASTs.

More information on locks can be found in the "VAX/VMS System Services
Reference Manual" and the "VAX/VMS Internals And Data Structures" Manual.
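Whether a request is "blocked by an incompatible mode" comes down to the lock manager's mode compatibility table. As a minimal sketch, the six lock modes (NL, CR, CW, PR, PW, EX) and their standard compatibility can be tabulated as below. The table and function names are hypothetical, and the sketch only compares modes; it ignores everything else the lock manager takes into account, such as requests already queued on the resource.

```python
# For each requested mode, the set of already-granted modes it can coexist with.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def is_blocked(requested, granted_modes):
    """Return True if any granted mode is incompatible with the request."""
    return any(g not in COMPATIBLE[requested] for g in granted_modes)


if __name__ == "__main__":
    # The situation traced in this article: an EX request against a lock
    # that is already granted at EX.
    print(is_blocked("EX", ["EX"]))
    # A null lock never blocks anything.
    print(is_blocked("EX", ["NL"]))
```

The first call models the example traced in this article, where an EX request waits behind a lock granted at EX.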
PROCEDURE SUMMARY:

The following steps and example should guide you through the isolation of
the process that is blocking a lock request. A summary of the steps
appears first:

   o Identify the hung process from SDA.                    [Steps 1,2,3]

   o Locate the hung process's Lock Block.                  [4]

   o From the Lock Block locate the Resource Block;         [5]
     to do this check to see if the lock is a:
        'Process copy', go to Step [6]     or a
        'Local copy',   go to Step [7]

   o 'Process copy' locks mean the Master Resource          [6]
     Block is located on another node and must be
     viewed from that node.
       a. Record the 'remote' Lock ID
       b. Identify and log into the node in question
       c. Display the Lock Block on this node and
          verify it is the one in question.

   o Use the Lock ID to show the Master Resource Block.     [7]

   o From the Resource Block locate the Blocking Lock.      [8]
     If the Blocking Lock is a:
        'Master copy of lock', go to Step [9]    or a
        'Local copy',          go to Step [10]

   o 'Master copy of lock' indicates this lock is           [9]
     actually on another node. Identify the remote
     Lock ID and the node. Log into that node. Display
     and verify the Lock information on that node.

   o Display the Lock ID on the node the Blocking Lock      [10]
     is on and verify the lock Resource Name and Mode.
     From the Blocking Lock locate the blocking process.

DETAILED PROCEDURE:

1)  Identify which process is hung waiting for a lock request. In this
    example, assume that process M_MORREN, in system SALES, is hung
    waiting for a lock request.

2)  Invoke SDA on the node with the hung process and locate that
    process's INDEX number:

     Sales$ SET PROCESS/PRIVILEGE=CMKRNL   ! needed privilege to run SDA
     Sales$ ANALYZE/SYSTEM                 ! invoke SDA to look around

     VAX/VMS System analyzer

     SDA> SHOW SUMMARY                     ! print summary output

     Current process summary
     -----------------------
     Extended  Indx Process name    Username    State   Pri   PCB      PHD
     -- PID -- ---- --------------- ----------- ------- --- -------- -------
     20400101  0001 SWAPPER                     HIB      16 80197F98 80197E0
     20400106  0006 ERRFMT          SYSTEM      HIB       7 803F6440 80BE4E0
     20400107  0007 CACHE_SERVER    SYSTEM      HIB      16 803FCF50 80D2E80
        .
        .
        .
     20400666  0066 BRUCE_1         B_ARMER     HIB       4 80467150 81631E0
+--->20400767  0067 M_MORREN        M_MORREN    LEF       7 80479710 81B5860
|    20400768  0068 SMITH           SMITH       LEF       4 80479EA0 81C3420
|
+--------This is the hung process in this example; its process index,
         from the second column, is 0067.

    If you use the command SHOW SUMMARY/IMAGE, you also see the name of
    the image the process is executing.

3)  Set your process index to the blocked process in SDA and VERIFY you
    have the correct process, either by process name or by what image the
    process is running (as shown in SHOW PROC/CHANNEL).

     SDA> SHOW PROCESS/CHANNEL/IND=67  ! set up process index and view image

     Process index: 0067   Name: M_MORREN   Extended PID: 20400767
     -------------------------------------------------------------

     Process active channels
     -----------------------
     Channel  Window     Status   Device/file accessed
     -------  ------     ------   --------------------
       0010  00000000             GREAT$DUA10:
+----> 0020  8080E920             GREAT$DUA10:[M_MORREN]LOCK_C.EXE
|      0030  808C7420             GREAT$DUS100:[SYSE.SYSCOMMON.SYSL
|      0040  00000000             RTA2:
|      0050  00000000             RTA2:
|      0060  808C7AE0             GREAT$DUS100:[SYSE.SYSCOMMON.SYSL
|
+--------- The image the process is executing is usually one of the
           first few channels. In this example, it is running the
           image LOCK_C.EXE.

4)  Look at what locks that process currently has outstanding. This is
    done with the "SHOW PROC/LOCK" command. The locks a process has are
    ordered such that the locks the process is waiting for are near the
    end of the lock list, so you may have to go through many locks to get
    to the locks that are blocked by another process.
    Any lock that says 'Granted at' you can ignore, as that lock request
    has already been completed. If no locks are waiting then the process
    may not be waiting on a lock, or the lock it was waiting on has since
    been granted. A blocked lock will say either 'Waiting for' or
    'Converting to'. In this example, the lock is 'Waiting for' a new
    ENQW request.

    NOTE:  This example is from a pre-OpenVMS VAX V5.4 system. The lock
           display on post-OpenVMS VAX V5.4 systems has been modified and
           the status text, e.g. 'Waiting for' or 'Converting to', is
           located on its own individual line in the middle of the
           display.

     SDA> SHOW PROCESS/LOCK

     Process index: 0067   Name: M_MORREN   Extended PID: 20400767
     -------------------------------------------------------------
     Lock data:

     Lock id:  005107E3   PID:     00070067   Flags:
     Par. id:  00000000   Waiting for EX          <--- the blocked request
     Sublocks:        0
     LKB:      80867F00

     Resource:      414A5F45 4C505041  APPLE_JA    Status:  ASYNC
      Length   30   20202020 20204B43  CK
      User mode     20202020 20202020
      System        00002020 20202020        ..

     Process copy of lock 01CD0145 on system 00010003   <--- see Step 5

5)  You have now identified the Blocked Lock. The next step is to
    identify the Resource Block for that lock. The resource block could
    exist on this node or another node in the cluster. To tell where the
    Master Resource Block is located, look at the text at the end of the
    lock displayed in Step 4.

    If the text says 'Process copy of lock xxx on system yyy' then the
    Master Resource Block is located on another system and you must go to
    that system to get more information. If this is the case, go to
    Step 6.

    If the text says 'Local copy', then this is the system with the lock
    and you can use this Lock Id on this node for Step 7.

6)  'Process copy of lock xxx on system yyy' indicates the lock exists on
    another node in the cluster. Take note of the following fields:

       a. Lock Id on the remote node - in this example it is 01CD0145
       b. System Id - in this example it is 00010003
       c. Resource Name - in this example it is APPLE_JACK

    To identify the node the lock is on with the System Id you can enter
    "SHOW CLUSTER" and examine both the Node name and the CSID number:

     SDA> SHOW CLUSTER

     VAXcluster data structures
     --------------------------

                      --- VAXcluster Summary ---

        Quorum   Votes   Quorum Disk Votes   Status Summary
        ------   -----   -----------------   --------------
           2       3           6553          quorum

                      --- CSB list ---

     Address   Node    CSID      Votes   State   Status
     -------   ----    ----      -----   -----   ------
     807DFF40  FRANK   00010007    0     open    member,qf_noaccess
     8071E640  SAM     00010001    1     open    member,qf_noaccess
     8071D9F0  HAL     00010003    1     local   member,qf_same       <---
     8071E700  SALES   00010002    1     open    member,qf_noaccess

    In this example HAL is node 00010003 and is thus the node with the
    Master Resource Block.

    NOTE:  To go from CSID to cluster member display in SDA, do a

             $ SHOW CLUSTER/CSID=

           followed by the CSID. This saves time looking for the correct
           node on a large cluster. You can also abbreviate the CSID to
           just the low order word index (similar to how SET and SHOW
           PROCESS use the index of the PID).

    You must now log onto that node and enter SDA to examine the Lock on
    the node with the Master Resource Block.

     Hal$ SET PROCESS/PRIVILEGE=CMKRNL
     Hal$ ANALYZE/SYSTEM

     VAX/VMS System analyzer

     SDA> SHOW LOCK 1CD0145                ! check the lock out

     Lock database
     -------------
     Lock id:  01CD0145   PID:     00000000   Flags:
     Par. id:  00000000   Waiting for EX
     Sublocks:        0
     LKB:      80F34200

     Resource:      414A5F45 4C505041  APPLE_JA    Status:  ASYNC   MSTCPY
      Length   30   20202020 20204B43  CK
      User mode     20202020 20202020
      Group 022     00002020 20202020        ..

     Master copy of lock 005107E3 on system 00010002

    You should now verify that you have the correct lock:

       a. Lock Id is 01CD0145, which matches
       b. Resource Name is APPLE_JACK, which matches
       c. Lock Mode is still 'Waiting for EX', which matches
       d. The last line says 'Master copy of lock' and shows the correct
          Lock Id and System Id of the node you just came from.

    Now that you know you have the correct lock you can continue with
    Step 7.

7)  The next step is to print out the Master Resource Block and identify
    the Blocking Lock. To do this enter "SHOW RESOURCE/LOCK=" followed by
    the lock ID from SDA, taking the lock ID from the 'Lock id:' field in
    the lock display. Once the resource block is displayed, the blocking
    lock will be found in the 'Granted Queue' and will have a lock mode
    incompatible with the lock mode we are requesting. In the following
    example the lock ID is taken from the display in Step 6, 01CD0145.

     SDA> SHOW RESOURCE/LOCK=01CD0145

     Resource database
     -----------------
     Address of RSB:  80BF2810   Group grant mode:       EX
     Parent RSB:      00000000   Conversion grant mode:  EX
     Sub-RSB count:          0   BLKAST count:            0
     Value block:     00000000 00000000 00000000 00000000  Seq. #: 00000000

     Resource:      414A5F45 4C505041  APPLE_JA
      Length   30   20202020 20204B43  CK          CSID:  00000000
      User mode     20202020 20202020              Directory entry
      Group 022     00002020 20202020        ..

     Granted queue (Lock ID / Gr mode):
 1--->  05210A82  EX       <--- This is the blocking lock request and
                                its 'Lock Id' is 05210A82.
     Conversion queue (Lock ID / Gr/Rq mode):
 2--->  032500B4  NL/EX

     Waiting queue (Lock ID / Rq mode):
 3--->  01CD0145  EX

    NOTE 1:  This is the lock that is blocking our request. This lock is
             granted at EXclusive access and there is another lock also
             waiting to get access to the resource (Note 2).

    NOTE 2:  Another process doing a conversion request from NL (null) to
             EX (exclusive) is also blocked.

    NOTE 3:  This is our lock request from node SALES in the 'Waiting
             Queue'.

8)  The Blocking Lock has been identified; now you must see if the
    process owning that lock is on this node or another node in the
    cluster. Display the lock information using the SDA SHOW LOCK
    command:

     SDA> SHOW LOCK 5210A82

     Lock database
     -------------
     Lock id:  05210A82   PID:     00000000   Flags:
     Par. id:  00000000   Granted at EX
     Sublocks:        0
     LKB:      80FC7CE0

     Resource:      414A5F45 4C505041  APPLE_JA    Status:  MSTCPY
      Length   30   20202020 20204B43  CK
      User mode     20202020 20202020
      Group 022     00002020 20202020        ..

     Master copy of lock 028605D7 on system 00010001

    If the last line of text says it is a 'Local Copy' then you are on
    the correct node to get the process owning the lock and can go
    directly to Step 10.

    If the last line of text says it is a 'Master copy of lock' then the
    lock exists on another node and you must get onto that node to locate
    the blocking process. If this is the case, go to Step 9.

9)  'Master copy of lock xxx on system yyy' indicates that the blocking
    lock exists on another node in the cluster. Take note of the
    following fields:

       a. Lock Id on the remote node, in this example 28605D7
       b. System Id of the remote node, in this case 00010001
       c. Resource Name, in this case APPLE_JACK

    To identify the node the lock is on with the System Id you can enter
    "SHOW CLUSTER" and examine both the Node name and the CSID number.

     SDA> SHOW CLUSTER

     VAXcluster data structures
     --------------------------

                      --- VAXcluster Summary ---

        Quorum   Votes   Quorum Disk Votes   Status Summary
        ------   -----   -----------------   --------------
           2       3           6553          quorum

                      --- CSB list ---

     Address   Node    CSID      Votes   State   Status
     -------   ----    ----      -----   -----   ------
     807DFF40  FRANK   00010007    0     open    member,qf_noaccess
     8071E640  SAM     00010001    1     open    member,qf_noaccess   <---
     8071D9F0  HAL     00010003    1     local   member,qf_same
     8071E700  SALES   00010002    1     open    member,qf_noaccess

    The node holding the lock in this case is SAM, as its CSID (System
    Id) matches that of the lock.

    You must now log onto that node and enter SDA to examine the lock
    there.

     Sam$ SET PROCESS/PRIVILEGE=CMKRNL
     Sam$ ANALYZE/SYSTEM

     VAX/VMS System analyzer

     SDA> SHOW LOCK 28605D7                ! check the lock out

     Process index: 005C   Name: M_MORREN   Extended PID: 202002DC
     -------------------------------------------------------------
     Lock data:

     Lock id:  028605D7   PID:     0005005C   Flags:
     Par. id:  00000000   Granted at EX
     Sublocks:        0
     LKB:      808748C0

     Resource:      414A5F45 4C505041  APPLE_JA    Status:
      Length   30   20202020 20204B43  CK
      User mode     20202020 20202020
      System        00002020 20202020        ..

     Process copy of lock 05210A82 on system 00010003

    You should now verify that you have the correct lock:

       a. Lock Id is 028605D7, which matches
       b. Resource Name is APPLE_JACK, which matches
       c. Lock Mode is 'Granted at EX'; this is the blocking lock
       d. The last line says 'Process copy of lock' and shows the Lock Id
          and System Id from the 'Mastering' node.

    Now that you have the correct lock, you can continue to Step 10.

10) To see the process holding the incompatible lock, you use the PID
    field as an index number on the system on which the process exists.
    In this example, we will use the Lock Block from Step 9, with the
    Lock Id of 028605D7 and PID of 005C (we only use the lower four hex
    digits). Hopefully either the name of the process holding the lock,
    or the image it is executing, will give some clue as to why the
    process has kept the lock. In this case LOCK_A.EXE has taken out an
    EXclusive lock on the resource APPLE_JACK and is now waiting for
    terminal input.

     SDA> SHOW PROCESS/CHANNEL/INDEX=5C

     Process index: 005C   Name: M_MORREN   Extended PID: 202002DC
     -------------------------------------------------------------

     Process active channels
     -----------------------
     Channel  Window     Status   Device/file accessed
     -------  ------     ------   --------------------
       0010  00000000             GREAT$DUA10:
       0020  80879000             GREAT$DUA10:[M_MORREN]LOCK_A.EX
       0030  8081B5E0             GREAT$DUS100:[SYSE.SYSCOMMON.SY
       0040  00000000    Busy     RTA1:
       0050  00000000             RTA1:
       0060  8081BFA0             GREAT$DUS100:[SYSE.SYSCOMMON.SY
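
The whole trace can be replayed as a small sketch over the values shown in the displays above. Everything about the data model here is an assumption made for illustration: the dictionaries stand in for what you read off the SDA screens on each node, the names are hypothetical, and nothing in it calls SDA or any OpenVMS interface. It also takes the first lock in the granted queue as the blocker, where the real procedure asks you to pick the one with an incompatible mode.

```python
# CSID -> node name, as read from SHOW CLUSTER.
NODES = {0x00010001: "SAM", 0x00010002: "SALES",
         0x00010003: "HAL", 0x00010007: "FRANK"}

# Per-node lock data as read from SHOW LOCK / SHOW PROCESS/LOCK.
# "copy" is (kind, remote lock id, remote CSID), taken from the last line
# of each display; a 'Local copy' would be modelled as None.
LOCKS = {
    "SALES": {0x005107E3: {"pid": 0x00070067,
                           "copy": ("process", 0x01CD0145, 0x00010003)}},
    "HAL": {0x01CD0145: {"pid": 0x00000000,
                         "copy": ("master", 0x005107E3, 0x00010002)},
            0x05210A82: {"pid": 0x00000000,
                         "copy": ("master", 0x028605D7, 0x00010001)}},
    "SAM": {0x028605D7: {"pid": 0x0005005C,
                         "copy": ("process", 0x05210A82, 0x00010003)}},
}

# Granted queue of the resource, keyed by a lock on it (SHOW RESOURCE/LOCK=).
GRANTED = {"HAL": {0x01CD0145: [0x05210A82]}}


def trace_blocker(node, lock_id):
    """Follow Steps 5-10 and return (node name, process index) of the blocker."""
    lock = LOCKS[node][lock_id]
    # Steps 5-6: a 'Process copy' means the resource is mastered elsewhere.
    if lock["copy"] and lock["copy"][0] == "process":
        _, lock_id, csid = lock["copy"]
        node = NODES[csid]
    # Steps 7-8: the blocking lock sits in the granted queue (simplified:
    # the first entry is taken without checking its mode).
    blocker_id = GRANTED[node][lock_id][0]
    blocker = LOCKS[node][blocker_id]
    # Step 9: a 'Master copy' means the owning process is on another node.
    if blocker["copy"] and blocker["copy"][0] == "master":
        _, blocker_id, csid = blocker["copy"]
        node = NODES[csid]
        blocker = LOCKS[node][blocker_id]
    # Step 10: the process index is the lower four hex digits of the PID.
    return node, blocker["pid"] & 0xFFFF


if __name__ == "__main__":
    node, index = trace_blocker("SALES", 0x005107E3)
    print(node, format(index, "04X"))
```

Starting from the hung lock 005107E3 on SALES, the sketch ends at process index 005C on SAM, the same process the manual procedure arrives at in Step 10.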