From:	MERC::"uunet!tron!clevax.dnet!jon" 24-AUG-1992 23:18:12.03
To:	"everhart@raxco.com"
CC:	JON
Subj:	More CDDRIVER ramblings

Glenn,

I've made a few more modifications to the CDDRIVER. I wanted to see what
other requests are being sent from the MSCP server.  I put some
additional conditional code in place so you can record just read and
write requests or everything.  I also put in some conditionals around
some event counters I put in the driver when I first started playing
around with MSCP serving the device that was being cached. 
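
A rough C sketch of that kind of compile-time trace filter follows. It is not the driver's own code. The request types, the TRACE_RW_ONLY switch, and the ring-buffer layout are all hypothetical. They are there only to show the choice between recording just reads and writes and recording every request.

```c
#include <stddef.h>

/* Hypothetical request types for illustration; NOT the real IO$_ codes. */
enum io_func { FN_READ, FN_WRITE, FN_SENSE, FN_SETMODE, FN_PACKACK };

/* Compile-time switch, like the driver's conditional code:
   1 = record only reads and writes, 0 = record every request. */
#ifndef TRACE_RW_ONLY
#define TRACE_RW_ONLY 1
#endif

#define TRACE_SLOTS 8             /* ring size; oldest entries get overwritten */

struct trace_entry {
    enum io_func func;            /* which request arrived */
    unsigned long lbn;            /* starting LBN of the request */
};

static struct trace_entry trace_ring[TRACE_SLOTS];
static size_t trace_next;         /* slot that will be filled next */
static unsigned long trace_count; /* total requests recorded */

/* Record a request if the filter allows it.
   Returns 1 if it was recorded, 0 if it was filtered out. */
int trace_request(enum io_func func, unsigned long lbn)
{
#if TRACE_RW_ONLY
    if (func != FN_READ && func != FN_WRITE)
        return 0;
#endif
    trace_ring[trace_next].func = func;
    trace_ring[trace_next].lbn = lbn;
    trace_next = (trace_next + 1) % TRACE_SLOTS;
    trace_count++;
    return 1;
}
```

Building with -DTRACE_RW_ONLY=0 records everything instead.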

As for using locks to provide cache coherency, I don't know how to do
that from driver context.  The XQP and RMS both have process context to
play with.  I suppose it may be possible to do some things with "system
owned locks" and JSB directly to the exec locking routines, but I would
be afraid of what would happen to IPL/spinlocks behind the scenes.
I don't see a straightforward way to do it.  I would think this would
normally be done by an ACP or something else that could use the lock
manager directly.

If there is a way to use the distributed lock manager from driver
context, then what would seem to make the most sense is a distributed
cache where each LBN range is owned by one CPU.  That way, hot spots
accessed most frequently by a single CPU could be cached on that CPU
and served to the rest of the cluster.  The V5.5 lock manager POSIX
enhancements would work well to coordinate the cache.  They allow "byte
range locking", where a range of bytes in a file is locked.  Instead of
a range of bytes, a lock could represent a range of LBNs.  A blocking
AST could then notify the owning process that someone intends to change
the value of a block within the range.
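
To make the LBN-range idea concrete, here is a minimal C sketch that models only the bookkeeping. It does not model the lock manager itself. The ownership table, the node numbers, and the function names are hypothetical. In a real implementation the "notify" step would be a blocking AST delivered by the lock manager, not a list returned to the caller.

```c
#include <stddef.h>

/* An inclusive LBN range owned (cached) by one node; it plays the role
   of the byte range in a POSIX-style range lock. */
struct lbn_range {
    unsigned long first, last;
    int owner_node;
};

/* Hypothetical ownership table. */
static const struct lbn_range owners[] = {
    { 0,     9999,  1 },   /* node 1 caches LBNs 0..9999 */
    { 10000, 19999, 2 },   /* node 2 caches LBNs 10000..19999 */
};

#define N_OWNERS (sizeof owners / sizeof owners[0])

/* Two inclusive ranges overlap unless one ends before the other starts. */
int ranges_overlap(unsigned long a_first, unsigned long a_last,
                   unsigned long b_first, unsigned long b_last)
{
    return a_first <= b_last && b_first <= a_last;
}

/* Return the node that owns an LBN, or -1 if no node caches it. */
int owner_of(unsigned long lbn)
{
    for (size_t i = 0; i < N_OWNERS; i++)
        if (lbn >= owners[i].first && lbn <= owners[i].last)
            return owners[i].owner_node;
    return -1;
}

/* Collect the owners that must be told (blocking-AST style) before a
   write to LBNs first..last.  Returns how many ids were stored in out. */
size_t owners_to_notify(unsigned long first, unsigned long last,
                        int *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < N_OWNERS && n < max; i++)
        if (ranges_overlap(first, last, owners[i].first, owners[i].last))
            out[n++] = owners[i].owner_node;
    return n;
}
```

A write that straddles LBN 10000 would have to notify both nodes before it could proceed.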

I am not going to worry about making the cache distributed right now.  I 
am going to be serving everything from a single VAXstation anyway, and 
the performance of the system is not my main concern.  Also, since I 
will be using it with CD-ROM disks, it is safe even if I have local 
caches on every node, since the contents of the disk aren't going to 
change.  Besides, V6 is supposed to have some caching built in, if I 
understood correctly, so I'll wait and see if it's even worth doing.

As for 2P devices, my (very limited) understanding is that the
secondary path is only used when the primary fails.  I don't think there
is any load sharing among paths.  The load balancing code just goes out
and changes the primary path from a TQE routine of some kind.  I don't
know whether all the nodes in the cluster will have the same idea of the
primary server.  For example, suppose the hardware setup is as shown
below: A and D are satellite nodes, and B and C are boot nodes serving
disks 0 and 1.  All I/O from B or C will be handled by the DSSI port
driver on the same node (i.e. it won't be served by the other boot
node).  However, the drives can be served by either B or C to nodes A
and D.  What I am not sure about is whether A's primary path to disk 0
will always be the same as D's primary path.  I think that it usually
will be; however, the recent load balancing changes may make it possible
for A's primary path to disk 0 to be through B's MSCP server, while D's
primary path to disk 0 is through C's MSCP server.

                                         AlloClass 1
    <--------------o---------------------o---------o-----------o---->
                   |                     |         |           |
                 +-+-+                 +-+-+     +-+-+       +-+-+
                 | A |                 | B |     | C |       | D |
                 +-+-+                 +-+-+     +-+-+       +-+-+
                                         |         |
                                         +-[0]-[1]-+
                                             DSSI
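
Below is a small C model of the failover-only behavior described above. It assumes each client node keeps its own primary/secondary choice for a disk, and that the load balancing code simply swaps that choice locally. The struct and function names are hypothetical. The model shows how A and D could end up with different primary servers for disk 0.

```c
/* Per-client view of one dual-served disk (hypothetical layout). */
struct path_state {
    char primary;     /* serving node used for normal I/O */
    char secondary;   /* used only when the primary has failed */
};

/* Choose the server for the next I/O.  There is no load sharing:
   the secondary is used only when the primary has failed. */
char select_server(const struct path_state *p, int primary_failed)
{
    return primary_failed ? p->secondary : p->primary;
}

/* Hypothetical load balancing step: swaps this client's primary and
   secondary locally, with no coordination with other clients. */
void rebalance(struct path_state *p)
{
    char tmp = p->primary;
    p->primary = p->secondary;
    p->secondary = tmp;
}
```

Suppose A and D both start with B as primary and C as secondary for disk 0, and only A's balancing pass runs. A would then send disk 0 I/O through C while D still goes through B.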

I also don't understand how writing a port driver to use with the disk 
class driver would help.  I thought the port driver was just a way of 
insulating the disk class driver from the interface hardware specifics.
Perhaps you know something I don't, or you have an idea that I don't 
understand.  

As for making a device look like ONE device to the rest of VMS, I
think all you have to do is make sure that the device name and
allocation class are the same, and set the "cluster available" bit,
DEV$V_CLU, in UCB$L_DEVCHAR2.  If you want to have this on lots of
systems that have different allocation classes, I don't know of a
supported way to do it.  It appears that each DDB could theoretically
have a different allocation class, so maybe you could set up an
allocation class for the "cluster stripset" and have the driver
explicitly set itself to that allocation class.  (You could use a USERx
sysgen parameter for the clusterwide allocation class, or something else
may be better.) 
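
The C sketch below shows how the allocation class becomes the clusterwide device name. It assumes the usual VMS convention: a nonzero allocation class gives $class$unit, and a zero class gives node$unit. The function names are hypothetical. The sketch covers only the name. It does not set DEV$V_CLU or touch the DDB. The alloclass argument could come from something like the USERx SYSGEN parameter suggested above.

```c
#include <stdio.h>
#include <string.h>

/* Build the clusterwide name of a disk unit.  Nonzero allocation
   class: "$class$unit"; zero: "node$unit".  Returns 0 on success,
   -1 if the buffer is too small. */
int cluster_device_name(char *buf, size_t len, unsigned int alloclass,
                        const char *node, const char *unit)
{
    int n;
    if (alloclass != 0)
        n = snprintf(buf, len, "$%u$%s", alloclass, unit);
    else
        n = snprintf(buf, len, "%s$%s", node, unit);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Two units look like the "same" device exactly when their
   clusterwide names match. */
int same_cluster_device(unsigned int a_class, const char *a_node,
                        const char *a_unit, unsigned int b_class,
                        const char *b_node, const char *b_unit)
{
    char a[64], b[64];
    if (cluster_device_name(a, sizeof a, a_class, a_node, a_unit) != 0 ||
        cluster_device_name(b, sizeof b, b_class, b_node, b_unit) != 0)
        return 0;
    return strcmp(a, b) == 0;
}
```

With allocation class 1 on both nodes, VAXA's DIA0 and VAXB's DIA0 both become $1$DIA0. With class 0 they stay distinct as VAXA$DIA0 and VAXB$DIA0.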

Of course, that doesn't solve the problem of multiple paths, but it is
what makes VMS treat it as the "same" device as far as ALLOCATE, ASSIGN,
etc. are concerned.  Then it's up to you to make sure the device really
is the same.  Also, there were some changes to the way MSCP sees DSSI
units in VMS V5.4-2 (or around there) to allow VAXA::DIB0 and
VAXB::DIA0 to both look like $x$DIA0: to VMS.  This allows crossed DSSI
buses (i.e. bus 1 on VAXA is connected to bus 0 on VAXB, and unit 0
happens to be on that bus; VAXA boots from DIB0/R5=10000000 and VAXB
boots from DIA0/R5=0).  Whether that affects non-DSSI devices, I don't
know; we don't have any CI equipment any more. 

Well, back to CDDRIVER.  I was looking at the SETMODE routine in
CDDRIVER, which is called to configure the cache for the disk being
cached.  It appears to me that there may be some synchronization
problems with some of the UCB fields.  Paul accesses the cache setsize
fields at IPL$_ASTDEL in startup, and at fork IPL everywhere else
(except in the unit init routine, which executes at IPL$_POWER).  I
haven't seen a problem, but the possibility appears to be there if you
expanded the cache while you were doing a backup on that device. 
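
One way to close that window is to make every reader and writer of the size fields use the same synchronization. The C sketch below uses a POSIX mutex as a stand-in for the device fork lock. The struct layout, the tag encoding, and the function names are hypothetical and are not Paul's code. The point is that the set count and the array it describes are always changed together and read together.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical slice of the UCB cache fields. */
struct cache_ucb {
    pthread_mutex_t fork_lock;  /* stand-in for the device fork lock */
    size_t setsize;             /* number of cache sets */
    unsigned long *tags;        /* one tag per set: LBN + 1, 0 = empty */
};

/* Swap in a new tag array and set count under the lock, so no reader can
   pair the new size with the old array.  The new cache starts empty. */
int cache_resize(struct cache_ucb *u, size_t new_size)
{
    unsigned long *fresh, *old;

    if (new_size == 0)
        return -1;
    fresh = calloc(new_size, sizeof *fresh);
    if (fresh == NULL)
        return -1;
    pthread_mutex_lock(&u->fork_lock);
    old = u->tags;
    u->tags = fresh;
    u->setsize = new_size;
    pthread_mutex_unlock(&u->fork_lock);
    free(old);                  /* readers never keep the pointer past the lock */
    return 0;
}

/* Record that an LBN is cached (direct-mapped by LBN mod setsize). */
void cache_fill(struct cache_ucb *u, unsigned long lbn)
{
    pthread_mutex_lock(&u->fork_lock);
    if (u->setsize != 0)
        u->tags[lbn % u->setsize] = lbn + 1;
    pthread_mutex_unlock(&u->fork_lock);
}

/* Returns 1 if the LBN is cached, 0 otherwise. */
int cache_lookup(struct cache_ucb *u, unsigned long lbn)
{
    int hit = 0;
    pthread_mutex_lock(&u->fork_lock);
    if (u->setsize != 0)
        hit = (u->tags[lbn % u->setsize] == lbn + 1);
    pthread_mutex_unlock(&u->fork_lock);
    return hit;
}
```

In the driver, the equivalent would be to make the startup path take the same fork lock instead of relying on IPL$_ASTDEL.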

I was looking at SETMODE because that is where I plan to plant the hook
into a "CRB" that represents the single physical drive in the DRM-600.
I want to use a CRB because it would simplify the handling of the
timeouts that I want to implement.  EXE$TIMEOUT has a CRB timeout
routine for CRBs that are placed on the IOC$GL_CRBTMOUT listhead.  It
takes care of synchronizing with the fork lock specified in the
CRB$B_FLCK location and checks CRB$L_DUETIME (set as an offset from
EXE$GL_ABSTIM).  If the CRB is due, it resets the timeout to infinity
(-1) and then JSBs to the routine in CRB$L_TOUTROUT.  The downside is
that it has one-second granularity, but for my application that is fine. 
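
The EXE$TIMEOUT behavior described above can be modeled roughly in C as shown below. This is a user-mode simulation under stated assumptions and not the exec routine. The struct fields only mirror CRB$L_DUETIME and CRB$L_TOUTROUT. Fork lock acquisition is reduced to a comment. The hypothetical abstim variable stands in for EXE$GL_ABSTIM and advances once per simulated second.

```c
#include <stddef.h>

#define DUE_NEVER 0xFFFFFFFFu   /* "infinity": -1 as an unsigned longword */

/* Hypothetical C mirror of the CRB fields used by the timeout scan. */
struct crb {
    struct crb *tmout_link;          /* next CRB on the timeout list */
    unsigned int duetime;            /* like CRB$L_DUETIME: absolute seconds */
    void (*toutrout)(struct crb *);  /* like CRB$L_TOUTROUT */
};

static unsigned int abstim;          /* like EXE$GL_ABSTIM: seconds since boot */

/* Arm a timeout 'seconds' from now (one-second granularity). */
void crb_arm(struct crb *c, unsigned int seconds)
{
    c->duetime = abstim + seconds;
}

/* One pass over the timeout list, as the once-a-second scan would do. */
void crb_timeout_scan(struct crb *head)
{
    for (struct crb *c = head; c != NULL; c = c->tmout_link) {
        /* The real routine synchronizes with the fork lock named in
           CRB$B_FLCK here; this single-threaded model omits it. */
        if (c->duetime <= abstim) {
            c->duetime = DUE_NEVER;  /* disarm before calling out */
            c->toutrout(c);
        }
    }
}

/* Advance the simulated clock by one second and scan the list. */
void clock_tick(struct crb *head)
{
    abstim++;
    crb_timeout_scan(head);
}

/* Example timeout routine: counts how often it fired. */
static unsigned int timeouts_fired;
static void count_timeout(struct crb *c)
{
    (void)c;
    timeouts_fired++;
}
```

A CRB armed for 2 seconds fires on the second tick, is disarmed to DUE_NEVER, and stays quiet until a driver routine re-arms it.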

You mentioned DISM32.  What is the latest version?  I have the latest
one that was on a DECUS tape.  It claims to be DISM32 V4.3 (that is what
it prints in the comments at the beginning of the disassembly).  If you
have a later version, I would like to get it when you receive the 8mm
tape. 

Well I need to find a tape to send.

Talk to you later,

Jon  
jon@clevax.wec.com