From:	MERC::"uunet!tron!clevax.dnet!jon" 26-AUG-1992 03:17:27.20
To:	uunet!"everhart@raxco.com"
CC:	uunet!JON
Subj:	cluster devices and MSCP serving -> complications

Glenn, 

Sorry for the delay, I've been busy with other things here. 

It appears that things probably aren't as simple as I implied.  When
we had two 11/780s and a TU78 with a dual-port switch, there was a
position on the switch that was A/B where it was available to both
MASSBUS adapters.  A quick hack to the ZDEC (zero device error
count) program to make it set the dev$v_clu bit in the UCB's of the
MFA0 devices on each system allowed us to use the devices as if they
were off an HSC-50.  If you allocated the device from one
system, it couldn't be allocated on the other system.  Since tape
devices weren't served, there was no problem with the device being there
on behalf of the MSCP server and the configure process.  To get VDA0 
units to be the same, some way is needed to prevent configure from
creating the UCB's on nodes in the same allocation class.  Either 
that, or somehow stealing the UCB from DUDRIVER and fixing up the 
MSCP server stuff.  One way that may be possible to do this is to 
make the UCB device type something that won't get served, and then 
fix it up in asnvd then set the device/served.  It seems that all 
client UCBs for served disk use DUDRIVER on the client system, no 
matter what drive they are using locally.

>Jon -
>thanks for your thoughts. I hadn't thought of it, but if all I need
>to do is get name and allocation class alike to get a set of
>drivers accessing the same physical media to look like "one"
>device, then making distributed stripesets etc. may be quite
>simple. THAT would be real nice...I'll have to try it when I get
>to where there's a cluster to try it on.

Well, I may have spoken too soon.  I just loaded and connected VDA0
units on two VAX-4300's.  They have MSCP_SERVE_ALL set to 1 so they
will serve everything they can to the cluster.  As soon as I connect
VDA0: on one system, it shows up on the other as an MSCP served
device using DUDRIVER.  Connecting a VDA0 unit on the second system
then resulted in two $2$VDA0: units on the second system.  

On the first system (TORUS) things looked something like this:

$2$VDA0: (TORUS)  offline	! Also served to cluster 

Show Device VD on the second system (OMEGA) looked something like this:

$2$VDA0: (TORUS)  online	! this was MSCP served from TORUS
$2$VDA0: (OMEGA)  offline       ! this was local

I've rebooted since then so I can't say exactly what they looked 
like.

The second VDA0: unit was not duplicated on TORUS.  One thing I
haven't tried is setting MSCP_LOAD to 1 and MSCP_SERVE_ALL to 0 so
nothing is automatically served.  Then I can connect two VDA0 units,
and after setting them /dual port (to set the cluster available bit;
it also sets the dual_path bit, but I don't believe that is used
by anything but DRDRIVER) then set them /served.  I'll have to wait
for some batch jobs to complete before I can try this.  I know this
used to work on RP07's with the dual port option.  We had it working
on a cluster of two VAX 780's with CI adapters.  This was under VMS
4.1 I believe.  I can't remember what the MSCP parameters were set
to; we were serving some RA81's off a UDA-50 (we didn't have an
HSC-50 until we got an 8600 to replace one of the 780's).

The other thing I could try is to disable autoconfig and manually 
put the conn vda0: in sys$manager:syconfig.com.  It appears that the 
served vda0: units have DUDRIVER as the local driver, instead of 
VDDRIVER (perhaps since VDDRIVER isn't loaded when the MSCP served 
devices are created.)

Here's another off the wall idea.  Perhaps VDDRIVER could be made to 
work like CDDRIVER and intercept the startio routine for a DU unit.  
Somehow you would have to clone a DU device UCB and keep track of 
the file name, lbn offset etc. in the VD ucb.  I'm not sure what
advantages/disadvantages there would be to doing something like 
that; it just occurred to me that it wouldn't be too hard starting 
from something like CDDRIVER.
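
To make the interception idea concrete, here is a rough sketch in plain C
rather than MACRO-32.  Everything in it is hypothetical: the unit, vd_ext
and io_request structures are stand-ins and not the real UCB, DDT or IRP
layouts, and fake_du_startio only records what the "DU" side would have
been asked to do.  It assumes the VD unit is a clone of the DU unit whose
startio pointer is swapped for a routine that adds the container file's
LBN offset and then calls the saved DU routine.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real VMS IRP/UCB/DDT layouts. */
typedef struct io_request { uint32_t lbn; uint32_t count; } io_request;
typedef struct unit unit;
typedef int (*startio_fn)(unit *u, io_request *req);
struct unit { startio_fn startio; void *ext; };

/* Per-VD bookkeeping kept alongside the cloned unit. */
typedef struct vd_ext {
    unit      *target;      /* the real DU unit                  */
    startio_fn saved;       /* DU's original startio routine     */
    uint32_t   lbn_offset;  /* where the container starts on DU  */
    uint32_t   nblocks;     /* size of the virtual disk          */
} vd_ext;

/* Replacement startio: bounds-check, remap the LBN, pass it on. */
static int vd_startio(unit *u, io_request *req)
{
    vd_ext *x = u->ext;
    if (req->lbn >= x->nblocks || req->count > x->nblocks - req->lbn)
        return -1;                  /* request falls outside the VD */
    io_request remapped = *req;     /* leave the caller's copy alone */
    remapped.lbn += x->lbn_offset;
    return x->saved(x->target, &remapped);
}

/* Clone the DU unit and hook the clone's startio pointer. */
static void vd_attach(unit *clone, unit *du, vd_ext *x,
                      uint32_t lbn_offset, uint32_t nblocks)
{
    *clone = *du;
    x->target = du;
    x->saved = du->startio;
    x->lbn_offset = lbn_offset;
    x->nblocks = nblocks;
    clone->startio = vd_startio;
    clone->ext = x;
}

/* Demo stand-in for DU's startio: just remember the LBN it was given. */
static uint32_t last_lbn;
static int fake_du_startio(unit *u, io_request *req)
{
    (void)u;
    last_lbn = req->lbn;
    return 0;
}
```

With a clone attached at offset 1000 and a size of 50 blocks, a request
for LBN 5 reaches the DU routine as LBN 1005, and a request that runs past
block 50 is rejected before it gets there.  The sketch says nothing about
synchronization or about keeping the file mapping valid.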

>   The DISM32 I have is V4.6 which is the latest. I'll send it
>if you want (on the tape, that is...a bit large for email).

I would appreciate that.  I'll include a note with the tape to 
remind you.

>   As for locks, I believe those need to be manipulated directly.
>The mapping between LBNs and files exists in window blocks which
>are in kernel space, and it is possible to just invalidate a map
>when a file gets opened for write somewhere but I believe some
>process level code would be needed to get the ASTs and so on and
>to handle the locks. I'm not familiar with how one manipulates
>locks from kernel mode.

I hadn't thought of that.  For the application that CDDRIVER was
written for, it probably wouldn't be optimal.  If you look at the
CDDRIVER documentation, you will see that Paul was not trying to
replace RMS buffering.  In their application, they were using qios
with READLBLK and WRITELBLK so RMS buffering was not useful to them.
As long as you weren't doing file sharing, it would probably be a
good way to coordinate the cache.  

>   As for the CDDRIVER problem, I suggest just taking the device
>lock out. That lock can be taken from fork level or from ASTDEL
>at FDT time and relinquished without losing the process context, 
>and it will ensure that cache size setting is done correctly.
>(I say this without seeing the code; if something that needs to be
>called can't work at device IPL you may have to can the idea.)

Yes, I think it would be best to just raise everything to the driver
fork level.  The whole of CDDRIVER runs at the target disk's fork
level (it is up to the target startio routine to raise to device IPL
and lock), so I think it should be safe just to use the fork lock.

>   Unfortunately once you fork, you aren't guaranteed to be in
>the process, and you need to fire a special kernel AST back
>to the process to guarantee you are again (conveniently, at
>ASTDEL). FDT routines can be continued from a special kernel AST,
>but it is a bit tricky and involves finding some way to keep
>mainline and other AST code from running meanwhile...I used
>RWAST state for that plus explicitly fiddling with AST enable
>flags. 

Where did you use this technique?  I am interested in what you did.

I thought only a single level of AST code would run concurrently, and 
that the sp kast would preempt other AST threads.  In fact I thought 
the only way to block sp kast delivery was to never drop below 
ipl$astdel.  Maybe I don't see the synchronization problem you are
describing.

>         Perhaps some simpler signalling scheme controlled by
>flags in critical sections protected by devicelock would work
>and allow the size changes without risk to ongoing I/O. Interlocked
>instructions could also be used; just sticking a bbssi loop
>in the path for the lower IPL, for instance...?

I haven't figured out what I will do; first I need to go through the 
code and see what IPL/LOCKS are being used where.  I started looking 
at things after reading a section of Jamie Hanrahan and Lee Leahy's 
"VMS Advanced Device Driver Techniques" on using tokens in comments 
that are more expressive than the ; ;; ;;;  technique.
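
For reference, the bbssi loop suggested in the quoted text amounts to an
interlocked test-and-set spin on a single bit.  The sketch below shows the
same pattern in portable C11 rather than MACRO-32, using atomic_flag as a
rough analog of a BBSSI/BBCCI pair.  The cache_ctl structure and all the
function names are made up for illustration; this is not taken from the
CDDRIVER source, and it assumes the only thing being protected is a cache
size field.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical control block: one interlocked bit guarding the size. */
typedef struct cache_ctl {
    atomic_flag busy;          /* plays the role of the BBSSI target bit */
    unsigned    cache_blocks;  /* value that must change atomically      */
} cache_ctl;

static void cache_ctl_init(cache_ctl *c, unsigned blocks)
{
    atomic_flag_clear(&c->busy);
    c->cache_blocks = blocks;
}

/* One attempt: test-and-set the bit, report whether we now own it. */
static bool cache_ctl_try_acquire(cache_ctl *c)
{
    return !atomic_flag_test_and_set_explicit(&c->busy,
                                              memory_order_acquire);
}

/* Spin until the bit was found clear (the "bbssi loop"). */
static void cache_ctl_acquire(cache_ctl *c)
{
    while (!cache_ctl_try_acquire(c))
        ;   /* busy-wait; the holder is expected to release quickly */
}

/* Clear the bit again (the BBCCI side). */
static void cache_ctl_release(cache_ctl *c)
{
    atomic_flag_clear_explicit(&c->busy, memory_order_release);
}

/* Change the size only while holding the bit. */
static void cache_ctl_resize(cache_ctl *c, unsigned new_blocks)
{
    cache_ctl_acquire(c);
    c->cache_blocks = new_blocks;
    cache_ctl_release(c);
}
```

Only the lower-IPL path should ever spin like this.  If higher-IPL code
spins on a bit held by lower-IPL code it has interrupted on the same CPU,
the holder never gets to run and release it.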

I am still trying to find a tape...  I really need to buy some more.
I'll probably end up copying some savesets from an earlier DECUS 
tape to another.  8mm has plenty of room for more than a single 
symposium on one tape.

Jon Pinkley
jon@clevax.wec.com