From:	MERC::"uunet!CRVAX.SRI.COM!RELAY-INFO-VAX" 25-AUG-1992 19:47:06.40
To:	info-vax@kl.sri.com
CC:
Subj:	Re: RE: Clustered Global Section

In article <9208190446.AA26060@uu3.psi.com>, leichter@lrw.com (Jerry Leichter) writes:

> I'm just about to cluster a couple of VAXs.  On one VAX an application
> uses global sections to communicate between processes.  After
> clustering we want this application to be available across the
> cluster.
>
> Is there a problem?
>
> Yes, I know that you can't share memory across a cluster, but since
> global sections are mapped to files (but are all sections mapped to
> files?), and assuming that the files are available clusterwide then
> will it work?
>
> Well, it'll WORK, but it won't do what you seem to think!
>
> I guess it would be somewhat (or a lot) slower since it'd be disk I/O
> rather than memory access.
>
> When you share a global section among processes on a single CPU, they are
> accessing the same memory locations.  A change made by one process is
> visible to all the other processes within a few cycles.  (In theory, at
> the end of the writing instruction, but in fact only after a
> synchronization point is reached is this guaranteed.  For a VAX,
> synchronization points include all the interlocked instructions, and a
> couple of other things that don't matter for non-privileged code.)
>
> If you have a global section open on the same file in multiple members of
> a cluster, each member of the cluster is accessing local memory.  If one
> makes a change to its own local memory, that change will not be visible
> until the processor making the change pages that page out, and the
> process looking for the change pages it in.  VMS will NOT force this to
> happen in any way that would be useful for this kind of programming.
> (It's possible to force half of it to happen yourself by doing a $UPDSEC
> on the writing processor, which will write the new data to disk.  There's
> no simple way to force the page to be read back in from the disk.
> Performance would be terrible in any case.)
>
> For all practical purposes, global sections are limited to single nodes
> (with multiple CPU's perhaps) in a cluster.
>
> Sorry.
>							-- Jerry
> --

Jerry got it right.  However, this being one of those rare times I can add
something to what Jerry writes, I will do so:  I've implemented an
application which works just as you seek.

There are basically two solutions, which are very similar.  However,
neither is for the faint of heart, and if you are not very comfortable with
the following description you should probably pass on these solutions.
Further, if your application is not properly designed, these solutions will
offer poor performance (reducing memory-speed accesses to 2*disk-time).
Jerry is (of course) correct that performance might be terrible, but for a
properly-designed application, this is not necessarily the case.

As further proof that this solution works, I note that Rdb global buffering
works in essentially the same fashion.  I think that Oracle parallel
servers work the same way, too, although I am not sure that they use a
section file as opposed to direct IOs from disk.

The basic idea is this:  Set up a lock to coordinate access to the global
section between cooperating processes.  When an application updates data
that it wants to make available to other processes (on any node), it simply
uses the $UPDSEC system service to write the appropriate addresses to
backing store on disk.  This synchs the disk file with memory, at the cost
of one disk IO.
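In case a concrete fragment helps, here is roughly what that flush step
looks like in C.  Treat it as a sketch only: flush_pages is a name of my
own invention, the event-flag choice is simplified, and real code wants
better error handling.

	#include <ssdef.h>
	#include <starlet.h>

	/* Sketch: queue the modified pages in [start..end] of a mapped
	 * global section to the section file with $UPDSEC, then wait
	 * for the disk write to finish.                               */
	int flush_pages(void *start, void *end)
	{
	    void *inadr[2];                /* address range to write   */
	    void *retadr[2];               /* range actually queued    */
	    unsigned short iosb[4] = {0};  /* I/O status block         */
	    int status;

	    inadr[0] = start;
	    inadr[1] = end;

	    /* updflg 0 = write only pages actually modified; efn 0
	     * and the IOSB let us wait for completion below.          */
	    status = sys$updsec((void *)inadr, (void *)retadr, 0, 0, 0,
	                        (void *)iosb, 0, 0);

	    if (status == SS$_NOTMODIFIED) /* nothing dirty: done      */
	        return SS$_NORMAL;
	    if (!(status & 1))             /* the service itself failed */
	        return status;

	    /* $UPDSEC only queues the I/O; wait until it completes.   */
	    status = sys$synch(0, (void *)iosb);
	    if (!(status & 1))
	        return status;
	    return iosb[0];                /* final status of the write */
	}

The updater would call something like this on the range it just wrote,
before firing the notification lock described next.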
Next, the updater takes out a lock that uses blocking ASTs to notify the
other (interested) processes that an update has occurred.  The lock value
block includes the address range of the interesting updates.  The
"interested" processes are blocking that lock and therefore have their
blocking ASTs fired.  If they don't care about the update, they simply do
nothing and re-queue their "interested" lock.  If they care, then one
process on each node needs to take out another lock (all try, but only one
succeeds) and update memory from the section file.  The cost of this is
another disk IO.

(There are some rather nasty synchronization issues here, and not a few
race conditions, but it is definitely do-able with the lock manager and
some creative instantiation of lock names and parent-child stuff.)

Another solution is to dedicate one process on each node as the one
responsible for keeping memory synched, possibly even making that a
separate process from user processes.  But this is not strictly necessary.

The really tricky part is that seemingly innocuous phrase "update memory
from the section file".  There is no system service to do this, for
architectural reasons dating back to VMS V1.0.  (This comes direct from a
long discussion at one DECUS with Larry Kenah.)  The easiest way to do it
is to (gulp) post a QIO from the correct addresses in the section file
directly to the correct addresses in memory.  Except for a small offset at
the beginning of the section file (four bytes, as I recall) the addresses
are a direct map.  When this QIO has completed, memory on that node will
reflect the contents of the section file, which, as you recall, was
recently updated to reflect the contents of memory on the updater node.
There are some possible gotchas here with respect to privileges, depending
on the nature of your section file, but nothing that can't be handled
straightforwardly.

Obviously, such a method will work only when the updates can be
coordinated between processes explicitly, not when arbitrary updates are
happening randomly and without explicit "knowledge" of the updater.  That
is, the updater needs to have relatively few places where updates happen,
and updates cannot be happening "all the time".  If your application
spends all its time writing all over the section file, then this method
isn't for you and you should stick to a single-node solution.  (As an
aside, should the previous sentence describe your application, then you
will likely have some difficulties moving this application to Alpha from
VAX, so you may want to look at a re-design of your application anyway.)

As I said, I've done exactly the above on two occasions.  With good
application design, you can in fact use section files and two disk IOs to
keep memory on different nodes in synch.

Good luck!

Phil

_________________________________________________________________________
Philip A. Naecker
Consulting Software Engineer           Internet:  pan@propress.com
1010 East Union Street, Suite 101                 uunet!prowest!pan
Pasadena, CA 91106-1756                Voice:     +1 818 577 4820
                                       FAX:       +1 818 577 0073
Also:  Technology Editor, DEC Professional Magazine
       VAX Professional Magazine Review Board Member
_________________________________________________________________________
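P.S.  For the brave, here is a skeleton, in C, of the dedicated per-node
synch-process variant described above.  I stress that this is a sketch,
not working code: the lock name, the use of the value block to carry a
VBN range, and the direct VBN-to-address arithmetic are all inventions of
mine for illustration, the $MGBLSC that maps the section is omitted, and
none of the race conditions I mentioned are handled.

	#include <descrip.h>
	#include <iodef.h>
	#include <lckdef.h>
	#include <rms.h>
	#include <ssdef.h>
	#include <starlet.h>

	static $DESCRIPTOR(lcknam, "MYAPP$UPDATE"); /* hypothetical name */

	/* Lock status block: status, reserved, lock id, then the 16-byte
	 * lock value block, in which the updater names the blocks he
	 * just rewrote (layout assumed for this sketch).                */
	static struct {
	    unsigned short status, reserved;
	    unsigned int   lockid;
	    unsigned int   vbn, blocks, fill[2];
	} lksb;

	static unsigned short chan;  /* channel to the section file      */
	static char *secbase;        /* address the section is mapped at */

	static void blocking_ast(void *astprm);
	static void regrant_ast(void *astprm);

	/* Open the section file "user file open" style: RMS does the
	 * open but hands back a bare channel (in fab$l_stv) suitable
	 * for virtual-block QIOs.                                       */
	static int open_section_file(char *name, int namelen)
	{
	    struct FAB fab;
	    int status;
	    fab = cc$rms_fab;
	    fab.fab$l_fna = name;
	    fab.fab$b_fns = namelen;
	    fab.fab$b_fac = FAB$M_GET | FAB$M_BIO;
	    fab.fab$b_shr = FAB$M_SHRGET | FAB$M_SHRPUT | FAB$M_SHRUPD;
	    fab.fab$l_fop = FAB$M_UFO;
	    status = sys$open(&fab);
	    chan = fab.fab$l_stv;
	    return status;
	}

	/* Take out the "interested" lock in PR mode with a blocking
	 * AST; the updater converting his lock to EX rings the bell.   */
	static int take_interest_lock(void)
	{
	    return sys$enqw(0, LCK$K_PRMODE, (void *)&lksb, LCK$M_VALBLK,
	                    &lcknam, 0, 0, 0, blocking_ast, 0, 0);
	}

	/* Blocking AST: the updater wants EX.  Down-convert to NL
	 * (granted immediately, so the wait form is safe here) to let
	 * him in, then queue a convert back to PR.  That convert
	 * completes only after the updater has $UPDSEC'd the data and
	 * down-converted, writing his value block.                     */
	static void blocking_ast(void *astprm)
	{
	    sys$enqw(0, LCK$K_NLMODE, (void *)&lksb, LCK$M_CONVERT,
	             0, 0, 0, 0, 0, 0, 0);
	    sys$enq(0, LCK$K_PRMODE, (void *)&lksb,
	            LCK$M_CONVERT | LCK$M_VALBLK,
	            0, 0, regrant_ast, 0, blocking_ast, 0, 0);
	}

	/* PR regranted: the value block names the virtual blocks that
	 * changed, so QIO them from the file straight over the mapped
	 * addresses.  (Real code must allow for the small offset at
	 * the front of the section file noted above.)                  */
	static void regrant_ast(void *astprm)
	{
	    unsigned short iosb[4];
	    sys$qiow(0, chan, IO$_READVBLK, (void *)iosb, 0, 0,
	             secbase + (lksb.vbn - 1) * 512, /* p1: memory     */
	             lksb.blocks * 512,              /* p2: byte count */
	             lksb.vbn,                       /* p3: start VBN  */
	             0, 0, 0);
	}

The updater's side is the mirror image: convert its lock NL->EX (which
fires the blocking ASTs), $UPDSEC the range, put the VBN range in the
value block, and convert back down to NL with LCK$M_VALBLK so the
readers' PR converts complete carrying the new value block.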