From: SMTP%"RELAY-INFO-VAX@CRVAX.SRI.COM" 12-JUN-1993 11:53:44.34
To: EVERHART
CC:
Subj: Re: Difference between VMSclusters and NFS

X-Newsgroups: comp.os.vms
From: zrepachol@cc.curtin.edu.au (Paul Repacholi)
Subject: Re: Difference between VMSclusters and NFS
Message-Id: <1993Jun8.055749.1@cc.curtin.edu.au>
Lines: 59
Sender: news@cujo.curtin.edu.au (News Manager)
Organization: Curtin University of Technology
Date: Mon, 7 Jun 1993 20:57:49 GMT
To: Info-VAX@kl.sri.com
X-Gateway-Source-Info: USENET

In article <1671@se.alcbel.be>, mvbr@se.alcbel.be (Marc Verbruggen) writes:
> Now, my question : In what sense do these 3 technologies differ ? Is the lock
> manager concept in clusters crucial ? Is there no such thing within Unix or the
> Distributed Services ?

The distributed lock manager is one of the key bits of a cluster. The REAL workers, though, are MSCP, SCS and the connection manager. A cluster is very 'real' in that its members co-operate to keep it intact over time. Hands up all those who have forgotten about a diskless node in some forgotten corner, and had to do a cluster re-boot again ;-(

SCS and the connection manager provide a set of communication services that MSCP and the DLM can rely on. With SCS and MSCP you can do nearly anything NFS can do, including screw up mightily! The problem is, of course, co-ordination of the use of resources.

So the connection manager maintains the integrity of the connections, and identifies alternate paths to each device. This is essential, even if you don't want the redundancy, as you must know if your ultimate target is the same device, or a different one with a similar name. This is why the cluster manuals rave on about allocation classes and the like. When you have each device identified, you can then control its use.

Note that clusters use 'discretionary locking', not enforced locking.
You CAN step around the lock manager and access something without any synchronization with the rest of the cluster, or issue a lock request for a key resource name and hang the cluster as everyone else waits for you to free it. (This is the common cause of cluster hangs. Something in one machine gets the lock on SYSUAF or the system volume and gets stuck. Then all the other machines hang, waiting for the lock to be freed.)

Note, it is not EASY to bypass locking, just possible. For instance, RMS does all the record and index locking in files. By not using RMS, and going straight to the XQP with QIOs, you can open the file and access the data. You just have to do ALL of the data access yourself.

The lock manager can also be used to communicate across the cluster by using the 'lock value block' in the ENQ/DEQ calls. This is the key (along with the SCS services) to things like OPCOM, the queue manager, license management, etc. It is also VERY useful for running a critical application over a cluster. One copy can create a resource name and put an exclusive lock on it. All other copies (on the other nodes) hang waiting for the lock. If the active node fails, *ONE* other node will get the lock and can proceed.

In fact the DLM had an unexpected plus in non-clustered systems. Pre VMS v4, all volume management was done by one process running F11BACP. This process was responsible for all file level locking and synchronization. One effect of this was that it was nearly single threaded for all the system. For v4, it HAD to use the DLM to sync with other processes in other cluster members, so it could also use the DLM for local sync. Now, as much of the global state is in the lock manager, there is no reason not to let EACH PROCESS independently manage the file structure, and use the DLM to sync against ALL other processes in the cluster. Hence file processing is now multi-threaded, and only has to pause to sync if there is an access conflict.
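
To make the "critical application" trick above a bit more concrete, here is a rough sketch in Python. It is NOT the VMS lock manager API: `Resource`, `enq`, `deq` and `simulate` are made-up names, threads stand in for cluster nodes, and an ordinary in-process lock stands in for an exclusive-mode lock on a named resource. It also models only a clean release by the active copy, not what a real DLM does when a node actually crashes.

```python
import queue
import threading


class Resource:
    """Toy stand-in for one DLM resource: an exclusive lock plus a value block."""

    def __init__(self):
        self._lock = threading.Lock()   # models exclusive mode only
        self.value_block = b""          # loosely analogous to a lock value block

    def enq(self):
        """Block until the exclusive lock is granted (hypothetical analogue of ENQ)."""
        self._lock.acquire()

    def deq(self, new_value=None):
        """Release the lock, optionally leaving a note in the value block."""
        if new_value is not None:
            self.value_block = new_value
        self._lock.release()


def simulate(nodes=("A", "B", "C")):
    """Run one application copy per node; fail the active copy each round.

    Returns a list of (node, value_block_seen_on_takeover) in takeover order.
    """
    res = Resource()
    takeovers = queue.Queue()
    fail = {n: threading.Event() for n in nodes}

    def copy(node):
        res.enq()                                # standby copies hang here
        takeovers.put((node, res.value_block))   # this copy is now the active one
        fail[node].wait()                        # "run" until told to fail
        res.deq(new_value=node.encode())         # hand over, leaving our name behind

    threads = [threading.Thread(target=copy, args=(n,), daemon=True) for n in nodes]
    for t in threads:
        t.start()

    log = []
    for _ in nodes:
        node, seen = takeovers.get(timeout=5)    # wait for exactly one takeover
        log.append((node, seen))
        fail[node].set()                         # simulate the active copy going away
    for t in threads:
        t.join(timeout=5)
    return log
```

Calling `simulate()` returns the order in which the copies became active. Which copy wins each round is up to the thread scheduler, but each successor sees the name its predecessor left in the value block, and only one copy is ever active at a time.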
Using the DLM this way also enables each node to pre-allocate things and cache them so as to speed things up. If the node goes down, the information in the LVBs (lock value blocks) can be used to recover and clean up.

~Paul