From:   MERC::"uunet!tron!clevax.dnet!jon" 26-AUG-1992 03:17:27.20
To:     uunet!"everhart@raxco.com"
CC:     uunet!JON
Subj:   cluster devices and MSCP serving -> complications

Glenn,

Sorry for the delay; I've been busy with other things here.

It appears that things probably aren't as simple as I implied. When we
had two 11/780s and a TU78 with a dual-port switch, there was an A/B
position on the switch where the drive was available to both MASSBUS
adapters. A quick hack to the ZDEC (zero device error count) program to
make it set the dev$v_clu bit in the UCBs of the MFA0 devices on each
system allowed us to use the devices as if they were off an HSC-50. If
you allocated the device from one system, it couldn't be allocated on
the other system. Since tape devices weren't served, there was no
problem with the device being there on behalf of the MSCP server and
the configure process.

To get VDA0 units to be the same, some way is needed to prevent
configure from creating the UCBs on nodes in the same allocation
class. Either that, or somehow stealing the UCB from DUDRIVER and
fixing up the MSCP server stuff. One way this may be possible is to
make the UCB device type something that won't get served, then fix it
up in asnvd, and then set the device /served. It seems that all client
UCBs for served disks use DUDRIVER on the client system, no matter what
drive they are using locally.

>Jon -
>thanks for your thoughts. I hadn't thought of it, but if all I need
>to do is get name and allocation class alike to get a set of
>drivers accessing the same physical media to look like "one"
>device, then making distributed stripesets etc. may be quite
>simple. THAT would be real nice...I'll have to try it when I get
>to where there's a cluster to try it on.

Well, I may have spoken too soon. I just loaded and connected VDA0
units on two VAX-4300's. They have MSCP_SERVE_ALL set to 1 so they will
serve everything they can to the cluster.
As soon as I connect VDA0: on one system, it shows up on the other as
an MSCP served device using DUDRIVER. Connecting a VDA0 unit on the
second system then resulted in two $2$VDA0: units on the second system.

On the first system (TORUS) things looked something like this:

$2$VDA0:  (TORUS)  offline   ! Also served to cluster

Show Device VD on the second system (OMEGA) looked something like this:

$2$VDA0:  (TORUS)  online    ! this was MSCP served from TORUS
$2$VDA0:  (OMEGA)  offline   ! this was local

I've rebooted since then so I can't say exactly what they looked like.
The second VDA0: unit was not duplicated on TORUS.

One thing I haven't tried is setting MSCP_LOAD to 1 and MSCP_SERVE_ALL
to 0 so nothing is automatically served. Then I can connect two VDA0
units, set them /dual_port (to set the cluster available bit; it also
sets the dual_path bit, but I don't believe that is used by anything
but DRDRIVER), and then set them /served. I'll have to wait for some
batch jobs to complete before I can try this.

I know this used to work on RP07's with the dual port option. We had it
working on a cluster of two VAX 780's with CI adapters. This was under
VMS 4.1, I believe. I can't remember what the MSCP parameters were set
to; we were serving some RA81's off a UDA-50 (we didn't have an HSC-50
until we got an 8600 to replace one of the 780's).

The other thing I could try is to disable autoconfig and manually put
the conn vda0: in sys$manager:syconfig.com. It appears that the served
vda0: units have DUDRIVER as the local driver instead of VDDRIVER
(perhaps because VDDRIVER isn't loaded when the MSCP served devices are
created).

Here's another off-the-wall idea. Perhaps VDDRIVER could be made to
work like CDDRIVER and intercept the startio routine for a DU unit.
Somehow you would have to clone a DU device UCB and keep track of the
file name, LBN offset, etc. in the VD UCB.
I'm not sure what advantages/disadvantages there would be to doing
something like that; it just occurred to me that it wouldn't be too
hard starting from something like CDDRIVER.

> The DISM32 I have is V4.6 which is the latest. I'll send it
>if you want (on the tape, that is...a bit large for email).

I would appreciate that. I'll include a note with the tape to remind
you.

> As for locks, I believe those need to be manipulated directly.
>The mapping between LBNs and files exists in window blocks which
>are in kernel space, and it is possible to just invalidate a map
>when a file gets opened for write somewhere but I believe some
>process level code would be needed to get the ASTs and so on and
>to handle the locks. I'm not familiar with how one manipulates
>locks from kernel mode.

I hadn't thought of that. For the application that CDDRIVER was written
for, it probably wouldn't be optimal. If you look at the CDDRIVER
documentation, you will see that Paul was not trying to replace RMS
buffering. In their application, they were using qios with READLBLK and
WRITELBLK, so RMS buffering was not useful to them. As long as you
weren't doing file sharing, it would probably be a good way to
coordinate the cache.

> As for the CDDRIVER problem, I suggest just taking the device
>lock out. That lock can be taken from fork level or from ASTDEL
>at FDT time and relinquished without losing the process context,
>and it will ensure that cache size setting is done correctly.
>(I say this without seeing the code; if something that needs to be
>called can't work at device IPL you may have to can the idea.)

Yes, I think it would be best to just raise everything to the driver
fork level (the whole of CDDRIVER runs at the target disk's fork level;
it is up to the target startio routine to raise to device IPL and
lock), so I think it should be safe just to use the fork lock.

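To make the locking idea concrete, here is a minimal sketch in portable
C, not VMS driver code, of serializing a cache-size change against the
I/O path with one lock. Everything in it is an assumption made for
illustration: the struct cache layout, the function names, and the
test-and-set spin lock that stands in for the fork lock. CDDRIVER's
real data structures are not shown in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache descriptor; NOT CDDRIVER's actual layout. */
struct cache {
    atomic_bool busy;     /* true while a critical section is active */
    size_t      nblocks;  /* current cache size, in blocks */
    size_t      lookups;  /* lookups performed, for illustration only */
};

static void cache_init(struct cache *c, size_t nblocks)
{
    assert(c != NULL && nblocks > 0);
    atomic_init(&c->busy, false);
    c->nblocks = nblocks;
    c->lookups = 0;
}

/* Spin until we are the one who changed the flag from false to true.
 * This stands in for taking the fork lock in a real driver. */
static void cache_lock(struct cache *c)
{
    while (atomic_exchange_explicit(&c->busy, true, memory_order_acquire)) {
        /* another thread holds the lock; keep spinning */
    }
}

static void cache_unlock(struct cache *c)
{
    atomic_store_explicit(&c->busy, false, memory_order_release);
}

/* I/O path: map a block number to a cache slot while holding the lock,
 * so the size cannot change in the middle of the computation. */
static size_t cache_slot(struct cache *c, size_t lbn)
{
    cache_lock(c);
    size_t slot = lbn % c->nblocks;
    c->lookups++;
    cache_unlock(c);
    return slot;
}

/* Resize path: takes the same lock as the I/O path.
 * Returns 0 on success, -1 if the requested size is rejected. */
static int cache_resize(struct cache *c, size_t nblocks)
{
    if (nblocks == 0)
        return -1;
    cache_lock(c);
    c->nblocks = nblocks;
    cache_unlock(c);
    return 0;
}
```

The point of the sketch is only that the resize path and the I/O path
take the same lock, so a lookup never sees the size change underneath
it. Which lock a real driver should use, and at what IPL, is the open
question discussed here.
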
> Unfortunately once you fork, you aren't guaranteed to be in
>the process, and you need to fire a special kernel AST back
>to the process to guarantee you are again (conveniently, at
>ASTDEL). FDT routines can be continued from a special kernel AST,
>but it is a bit tricky and involves finding some way to keep
>mainline and other AST code from running meanwhile...I used
>RWAST state for that plus explicitly fiddling with AST enable
>flags.

Where did you use this technique? I am interested in what you did. I
thought only a single level of AST code would run concurrently, and
that the sp kast would preempt other AST threads. In fact, I thought
the only way to block sp kast delivery was to never drop below
ipl$astdel. Maybe I don't see the synchronization problem you are
describing.

> Perhaps some simpler signalling scheme controlled by
>flags in critical sections protected by devicelock would work
>and allow the size changes without risk to ongoing I/O. Interlocked
>instructions could also be used; just sticking a bbssi loop
>in the path for the lower IPL, for instance...?

I haven't figured out what I will do; first I need to go through the
code and see what IPLs and locks are being used where. I started
looking at things after reading a section of Jamie Hanrahan and Lee
Leahy's "VMS Advanced Device Driver Techniques" on using tokens in
comments that are more expressive than the ; ;; ;;; technique.

I am still trying to find a tape... I really need to buy some more.
I'll probably end up copying some savesets from an earlier DECUS tape
to another. 8mm has plenty of room for more than a single symposium on
one tape.

Jon Pinkley
jon@clevax.wec.com