From: MERC::"uunet!tron!clevax.dnet!jon" 24-AUG-1992 23:18:12.03
To:   "everhart@raxco.com"
CC:   JON
Subj: More CDDRIVER ramblings

Glenn,

I've made a few more modifications to the CDDRIVER. I wanted to see what other requests are being sent from the MSCP server. I put some additional conditional code in place so you can record either just read and write requests or everything. I also put conditionals around some event counters I added to the driver when I first started playing around with MSCP serving the device that was being cached.

As far as using locks to provide cache coherency, I don't know how to do that from driver context. The XQP and RMS both have process context to play with. I suppose it may be possible to do some things with "system owned locks" and JSB directly to the exec locking routines, but I would be afraid of what would happen to IPL/spinlocks behind the scenes. I don't see a straightforward way to do it. I would think this would normally be done by an ACP or something that could use the lock manager directly.

If there is a way to use the distributed lock manager from driver context, then what would seem to make the most sense would be a distributed cache where LBN ranges would be owned by a CPU. That way hot spots that were accessed most frequently by a single CPU could be served to the cluster. The V5.5 lock manager POSIX enhancements would work well to coordinate the cache. The POSIX enhancements allow "byte range locking", where a range of bytes in a file is locked. Instead of a range of bytes, you could represent a range of LBNs that were locked. A blocking AST could notify the owning process of someone's intent to change the value of a block within the range.

I am not going to worry about making the cache distributed right now. I am going to be serving everything from a single VAXstation anyway, and the performance of the system is not my main concern.
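
The LBN-range ownership idea above can be sketched in ordinary user-mode C. This is only a model of the bookkeeping, not driver code and not the lock manager interface: the names lbn_range, ranges_overlap and find_owner are hypothetical, and the table stands in for whatever the byte-range locks would actually record. It shows how a request for a block range would find the node that owns an overlapping range, which is the node whose blocking AST would need to be delivered before a write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one entry per LBN range "owned" by a cluster node.
 * None of these names exist in VMS; they only illustrate the idea. */
typedef struct {
    uint32_t first_lbn;   /* first logical block in the range (inclusive) */
    uint32_t last_lbn;    /* last logical block in the range (inclusive) */
    int      owner_node;  /* node that caches and serves this range */
} lbn_range;

/* Two inclusive ranges overlap if neither ends before the other starts. */
static int ranges_overlap(uint32_t a_first, uint32_t a_last,
                          uint32_t b_first, uint32_t b_last)
{
    return a_first <= b_last && b_first <= a_last;
}

/* Return the owner of the first table entry overlapping [first, last],
 * or -1 if no node owns any part of it. A writer that finds an owner
 * would, in the scheme described above, cause that owner's blocking AST
 * to fire so it can invalidate or flush the cached blocks. */
static int find_owner(const lbn_range *table, size_t count,
                      uint32_t first, uint32_t last)
{
    assert(first <= last);
    for (size_t i = 0; i < count; i++) {
        if (ranges_overlap(table[i].first_lbn, table[i].last_lbn, first, last))
            return table[i].owner_node;
    }
    return -1;
}
```

With a table of { 0..99 owned by node 1, 100..199 owned by node 2 }, a request for LBNs 50..60 resolves to node 1 and a request for LBNs 200..300 finds no owner. A request that straddles two ranges reports only the first owner found, so a real scheme would have to notify every overlapping owner.
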
Also, since I will be using it with CD-ROM disks, it is safe even if I have local caches on every node, since the contents of the disk aren't going to change. Besides, V6 is supposed to have some caching built in, if I understood correctly, so I'll wait and see if it's even worth doing.

As far as 2P devices go, my (very limited) understanding is that the secondary path is only used when the primary fails. I don't think there is any load sharing among paths. The load balancing stuff just goes out and changes the primary path from a TQE routine of some kind. I don't know if all the nodes in the cluster will have the same idea of the primary server or not. For example, in the hardware setup shown below, A and D are satellite nodes, and B and C are boot nodes serving disks 0 and 1. All I/O from B or C will be handled by the DSSI port driver on the same node (i.e. it won't be served by the other boot node). However, the drives can be served by either B or C to nodes A and D. What I am not sure about is whether A's primary path to disk 0 will always be the same as D's primary path. I think that it usually will be; however, the recent changes to help in load balancing may make it possible for A's primary path to drive 0 to be through B's MSCP server, while D's primary path to drive 0 is through C's MSCP server.

AlloClass 1
<--------------o---------------------o---------o-----------o---->
               |                     |         |           |
             +-+-+                 +-+-+     +-+-+       +-+-+
             | A |                 | B |     | C |       | D |
             +-+-+                 +-+-+     +-+-+       +-+-+
                                     |         |
                                     +-[0]-[1]-+  DSSI

I also don't understand how writing a port driver to use with the disk class driver would help. I thought the port driver was just a way of insulating the disk class driver from the interface hardware specifics. Perhaps you know something I don't, or you have an idea that I don't understand.

As far as making a device look like ONE device to the rest of VMS, I think all you have to do is make sure that the device name and allocation class are the same, and set the "cluster available" bit, DEV$V_CLU, in UCB$L_DEVCHAR2. If you want to have this on lots of systems that have different allocation classes, I don't know of a supported way to do it. It appears that each DDB could theoretically have a different allocation class, so maybe you could set up an allocation class for the "cluster stripset" and have the driver explicitly set itself to that allocation class. (You could use a USERx sysgen parameter for the clusterwide allocation class, or something else may be better.) Of course that doesn't solve the problem of multiple paths, but it is what makes VMS treat it as the "same" device as far as allocate, assign, etc. are concerned. Then it's up to you to make sure the device really is the same.

Also, there were some changes to the way MSCP sees DSSI units in VMS V5.4-2 (or around there) to allow VAXA::DIB0 and VAXB::DIA0 to both look like $x$DIA0: to VMS. This allows crossed DSSI buses (i.e. BUS 1 on VAXA is connected to BUS 0 on VAXB, and unit 0 happens to be on that bus. VAXA boots from DIB0/R5=10000000 and VAXB boots from DIA0/R5=0.) Whether that affects non-DSSI devices, I don't know; we don't have any CI equipment any more.

Well, back to CDDRIVER. I was looking at the SETMODE routine in CDDRIVER, which is called to configure the cache for the disk being cached. It appears to me that there may be some synchronization problems with some of the UCB fields. Paul accesses the cache setsize fields at IPL$_ASTDEL in startup, and at fork IPL everywhere else (except in the unit init routine, which executes at IPL$_POWER). I haven't seen a problem, but it appears the possibility is there if you expanded the cache while you were doing a backup on that device.

The reason I was looking at SETMODE is that it is where I plan to plant the hook into a "CRB" that represents the single physical drive in the DRM-600. The reason I want to use a CRB is that it would simplify the handling of the timeouts that I want to implement. EXE$TIMEOUT has a CRB timeout routine for CRBs that are placed on the IOC$GL_CRBTMOUT listhead. It takes care of synchronizing with the forklock specified in the CRB$B_FLCK location, checks CRB$L_DUETIME (offset from EXE$GL_ABSTIM), and if it is due, resets the timeout to infinity (-1) and then JSBs to the routine CRB$L_TOUTROUT. The downside is that it has one-second granularity, but for my application, that is fine.

You mentioned DISM32. What is the latest version? I have the latest that was on a DECUS tape. It claims to be DISM32 V4.3 (that is what it prints in the comments at the beginning of the disassembly). If you have a later version, I would like to get it when you receive the 8mm tape.

Well, I need to find a tape to send.

Talk to you later,

Jon
jon@clevax.wec.com
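
The CRB timeout scan described in the message can be modelled in user-mode C. This is a sketch under stated assumptions, not the EXE$TIMEOUT code: sim_crb, scan_crb_timeouts, arm_timeout and note_timeout are hypothetical names, the structure is not the real CRB layout, the fork lock named by CRB$B_FLCK is left out entirely, and treating a timeout as due when the due time is less than or equal to the current time is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMEOUT UINT32_MAX   /* models the "infinity (-1)" due time */

/* Hypothetical stand-in for a CRB. Comments name the VMS fields each
 * member imitates; this is not the real CRB layout. */
typedef struct sim_crb {
    struct sim_crb *next;                       /* link on the timeout list (cf. IOC$GL_CRBTMOUT) */
    uint32_t due_time;                          /* cf. CRB$L_DUETIME, in seconds */
    void (*timeout_routine)(struct sim_crb *);  /* cf. CRB$L_TOUTROUT */
    int fired;                                  /* illustration only: timeouts delivered */
} sim_crb;

/* One pass of the once-per-second scan; 'abstim' models EXE$GL_ABSTIM.
 * Fork lock synchronization is omitted. The "<=" comparison is an
 * assumption. Returns the number of timeout routines called. */
static int scan_crb_timeouts(sim_crb *head, uint32_t abstim)
{
    int called = 0;
    for (sim_crb *crb = head; crb != NULL; crb = crb->next) {
        if (crb->due_time != NO_TIMEOUT && crb->due_time <= abstim) {
            crb->due_time = NO_TIMEOUT;  /* disarm first, so it fires only once */
            crb->timeout_routine(crb);
            called++;
        }
    }
    return called;
}

/* Example timeout routine: just records that it ran. */
static void note_timeout(sim_crb *crb)
{
    crb->fired++;
}

/* Arm a timeout 'seconds' after the current time 'abstim'. */
static void arm_timeout(sim_crb *crb, uint32_t abstim, uint32_t seconds)
{
    crb->due_time = abstim + seconds;
}
```

Arming a five-second timeout at time 100 and scanning at 104 calls nothing. The scan at 105 calls the routine once and leaves the due time at NO_TIMEOUT, so later scans skip the entry until it is armed again. Because the scan runs once per second, a timeout can be up to a second late, which is the granularity limit mentioned in the message.
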