From: Paul Sture [p_sture@elias.decus.ch]
Sent: Monday, June 02, 2003 10:37 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: NT: son of VMS? (was Re: Portents of VMS death)

In article , "Bill Todd" writes:
>
> "Paul Sture" wrote in message
> news:H4lj5Ghx3MEB@elias.decus.ch...
>> In article , "Bill Todd" writes:
>>
>> > NT probably does have a somewhat closer relationship to VMS than what
>> > I'd write for a record manager today would have to RMS,
>>
>> Out of curiosity, what features would you put into a new RMS, and is
>> there anything in particular you would drop? As this is a theoretical
>> question, I'm not especially concerned with backwards compatibility.
>
> Where to begin? At least in OSs where crossing the application/system
> interface is not unreasonably expensive, a record manager lends itself
> better to a Unix-style system where each record operation is handled by
> the OS using OS buffers, so VMS would not be the ideal platform on which
> to create it.

I _was_ thinking in the context of VMS, but it's all interesting stuff
anyway.

> Once you have that, you have protection for the internal structure of
> the files (users can't modify it directly; copy operations are done by
> system functions, which is safer than having applications do them
> anyway); you can also perform (and protect) updates using a transaction
> log, which allows record operations to be made persistent in a single
> log write (even multiple operations can be captured in a single log
> write if synchronous operation isn't requested and the applications use
> Flush-like mechanisms to establish consistency points) and things like
> index updating can safely be batched and performed lazily in the
> background (so can on-line reorganization) - you can even use
> pseudo-log-structured storage to batch multiple bucket updates into a
> single disk access (which makes the presence of a background on-line
> reorg mechanism more important, and also allows bucket compaction - or
> even full-fledged compression, though that's more expensive - such that
> partially-filled buckets occupy no more space on disk than their data
> requires), though this starts tying into underlying file-system changes
> that I'm working on. The presence of the transaction log also
> facilitates user-level transaction support, though if *long*
> transactions are to be supported things can get considerably more
> complex.
>

That last bit can be a gotcha. I once recommended that a project under
development be changed to use the transaction tools that were available.
The project leader reported back to us that the programming changes were
pretty trivial for 95% of the programs, but the remaining 5% were
absolute pigs.

(I've had a go at sketching the log-write batching and the lazy index
maintenance in C; see the two sketches below.)

> I think I'd get rid of RFA mechanisms in indexed files: the permanent
> indirection and maintenance overheads don't seem justifiable.

I didn't realize they were such an overhead.

> RMS-32 may even have implemented no-RFA single-key indexed files at one
> point - I know I discussed it with them.

I don't recall it escaping into the wild.

Back in the days when I was actively developing applications using RMS,
for the most part I used RFA access only within the scope of a program,
due to the way RFAs could change across a CONVERT operation. I used RFAs
mainly for things like navigating backwards through screens of data
records, particularly where duplicate keys were involved.
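To make the "multiple operations in a single log write" point concrete,
here is a rough C sketch. It has nothing to do with the real RMS or RU
journalling internals - all names, sizes and structures are my own
invention - but it shows the shape of the idea: operations accumulate in
a memory buffer, and one physical write (the consistency point) makes
them all durable together.

    /* Hypothetical sketch: batching several record operations into one
       physical log write.  Not the real RMS/journalling interface. */
    #include <stdio.h>
    #include <string.h>

    #define LOG_BUF_SIZE 8192

    static char   log_buf[LOG_BUF_SIZE];
    static size_t log_used = 0;

    /* Queue one record operation in the in-memory log buffer. */
    static int log_append(const void *op, size_t len)
    {
        if (log_used + len > LOG_BUF_SIZE)
            return -1;                    /* caller must flush first */
        memcpy(log_buf + log_used, op, len);
        log_used += len;
        return 0;
    }

    /* One write makes every queued operation durable together: this is
       the "consistency point" a Flush-like call would establish.  A
       real implementation would also sync the file to disk. */
    static int log_flush(FILE *log)
    {
        if (log_used == 0)
            return 0;
        if (fwrite(log_buf, 1, log_used, log) != log_used)
            return -1;
        if (fflush(log) != 0)
            return -1;
        log_used = 0;
        return 0;
    }

    int main(void)
    {
        FILE *log = fopen("txn.log", "ab");
        if (log == NULL)
            return 1;
        log_append("PUT rec1;", 9);       /* several logical operations */
        log_append("DEL rec2;", 9);
        log_flush(log);                   /* ...one physical log write  */
        fclose(log);
        return 0;
    }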
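And a companion sketch of the lazy index maintenance: the data-level
change is durable as soon as it is logged, while the alternate-index
entries just queue up and get applied in batches later (a background
thread in real life; an explicit call here). Again, everything below is
invented for illustration.

    /* Hypothetical sketch: deferred (lazy) alternate-index maintenance. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_PENDING 64

    struct pending_ix {
        char key[32];     /* alternate key value              */
        long rec_id;      /* record the entry should point at */
    };

    static struct pending_ix queue[MAX_PENDING];
    static int n_pending = 0;

    /* Called at record-update time: just remember the index work. */
    static void index_defer(const char *key, long rec_id)
    {
        if (n_pending < MAX_PENDING) {
            strncpy(queue[n_pending].key, key, sizeof queue[0].key - 1);
            queue[n_pending].key[sizeof queue[0].key - 1] = '\0';
            queue[n_pending].rec_id = rec_id;
            n_pending++;
        }
    }

    static int by_key(const void *a, const void *b)
    {
        return strcmp(((const struct pending_ix *)a)->key,
                      ((const struct pending_ix *)b)->key);
    }

    /* A background thread would call this when idle.  Sorting the batch
       groups entries destined for the same index bucket, so each bucket
       is read and rewritten once per batch, not once per entry. */
    static void index_drain(void)
    {
        qsort(queue, (size_t)n_pending, sizeof queue[0], by_key);
        for (int i = 0; i < n_pending; i++)
            printf("apply index entry %s -> %ld\n",
                   queue[i].key, queue[i].rec_id);
        n_pending = 0;
    }

    int main(void)
    {
        index_defer("SMITH", 1001);
        index_defer("JONES", 1002);
        index_defer("SMYTHE", 1003);
        index_drain();
        return 0;
    }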
> As a substitute, it's possible to define a flexible *unordered*
> container structure supporting permanent RFAs that never need get
> indirected, so multi-key files are still feasible, they just don't have
> a primary index but only secondary indexes.

Which would mean that my existing code would work with secondary keys,
but not primary ones. Not as pretty, but I could have lived with that,
if the performance gains were sufficient.

It is probably worth noting here that we used to have a rule that all
primary keys were unique. We did this for transaction and pre/post image
recovery reasons, but I found it to be a good rule of thumb in general
(I also have memories of resolving some appalling performance problems
by changing duplicate primary keys to unique values, though that was on
RT-11, not using RMS).

> Sequential access in physical order is still fast, and reorganization
> into a preferred physical order is possible as long as the RFAs are
> only used internally (by the alternate indexes) rather than by
> applications. Another option is to use (unique, or if necessary
> 'uniquified') primary key values as permanent record identifiers (as I
> think NonStop SQL does) that alternate indexes can use for access.
>

That fits the scenarios I'm thinking of above. (There's a small C sketch
of that alternate-index arrangement below.)

> Some kind of relatively simple support for 1:many parent:child
> relationships (supporting hierarchical structures) would be nice,
> likely implemented by physical sequencing within a primary data level
> to provide good access clustering. If you don't want the parent
> clustered with its children, then you can just implement the structure
> within the application using key-based pointers into separate data sets
> and suitable enclosing transactions for updates, though it might be
> nicer for the system to support that transparently within a single
> 'file' containing multiple record types in multiple data sets and
> managing the linkages among them. Record types should be able to evolve
> on line, with newer versions able to coexist with the old ones (which
> would be updated if/when their new fields got used). Some of this stuff
> starts encroaching on database territory, but only selectively: mostly,
> it offers a simple navigational alternative to the overheads and
> inefficiencies of a full-blown relational model upon which some
> higher-level constructs (like *simple* view mechanisms) could be
> layered.

I rather like that idea. Back in the 1980s I always wanted to develop
what I used to call a "Poor man's Datatrieve". Indeed, we got part of
the way there with a data library and a file library. Unlike CDD, each
data item could be defined once and only once, no matter how many files
used it. Basic data validation routines, string pattern matching, etc.
could also be specified there. The general idea was that if you changed
a file layout you could simply recompile and go, without touching the
code itself. (I've sketched the clustering and data-dictionary ideas in
C below as well.)

>
> There's a ton of complexity in RMS aimed at '70s-era speed and space
> optimizations that just aren't important any more, so out they go.
> 64 KB is too small a limit on bucket size, and bucket overflow
> mechanisms should remove all limits on record size (large records
> should be piece-wise-accessible like files are; limiting key size to
> 1 KB or so remains reasonable and useful, though).
>

Agreed. I also picked up on a comment of Hein's a few weeks ago about
the FDL editor choosing bucket sizes which are not appropriate for
today's cluster sizes, but I'll have to check Google for details of
that. (A tiny sketch of piece-wise record access follows the others
below.)
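Here's the alternate-index arrangement I mentioned above, reduced to a
toy C sketch (structures invented; nothing like NonStop SQL's actual
format): the alternate index stores the record's unique primary key
rather than its physical address, so reorganization can move records
around without touching the alternate indexes - at the cost of a second
index probe on access.

    /* Hypothetical sketch: alternate-index entries carry the unique
       primary key, not a physical record address (RFA). */
    #include <stdio.h>
    #include <string.h>

    struct alt_entry {
        char alt_key[16];   /* e.g. a surname           */
        char pri_key[16];   /* unique primary key value */
    };

    /* Access becomes two probes: alternate index -> primary key value,
       then primary index -> record.  A linear scan stands in for the
       real index search here. */
    static const char *find_pri_key(const struct alt_entry *ix, int n,
                                    const char *alt_key)
    {
        for (int i = 0; i < n; i++)
            if (strcmp(ix[i].alt_key, alt_key) == 0)
                return ix[i].pri_key;
        return NULL;
    }

    int main(void)
    {
        struct alt_entry ix[] = {
            { "STURE", "CUST-0042" },
            { "TODD",  "CUST-0007" },
        };
        const char *pk = find_pri_key(ix, 2, "STURE");
        printf("alternate key STURE -> primary key %s\n",
               pk != NULL ? pk : "(not found)");
        return 0;
    }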
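The parent:child clustering by physical sequencing can be shown with an
equally small sketch: give each child a composite key of (parent id,
child sequence), keep the primary data level in key order, and a parent
plus all its children come back in one sequential sweep. Again, these
are my own invented structures, not a real implementation.

    /* Hypothetical sketch: 1:many clustering via a composite key. */
    #include <stdio.h>
    #include <stdlib.h>

    struct rec {
        int parent_id;
        int child_seq;    /* 0 = the parent record itself */
    };

    /* Ordering the primary data level by (parent, child) keeps each
       family physically adjacent. */
    static int by_cluster_key(const void *a, const void *b)
    {
        const struct rec *x = a, *y = b;
        if (x->parent_id != y->parent_id)
            return x->parent_id - y->parent_id;
        return x->child_seq - y->child_seq;
    }

    int main(void)
    {
        struct rec recs[] = {
            { 2, 1 }, { 1, 0 }, { 2, 0 }, { 1, 2 }, { 1, 1 },
        };
        size_t n = sizeof recs / sizeof recs[0];
        qsort(recs, n, sizeof recs[0], by_cluster_key);
        /* One sequential sweep now reads whole families together. */
        for (size_t i = 0; i < n; i++)
            printf("parent %d  seq %d\n",
                   recs[i].parent_id, recs[i].child_seq);
        return 0;
    }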
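And the "Poor man's Datatrieve" idea boils down to something like this
in C: each data item defined exactly once, carrying its position, size
and validation routine, so a layout change means editing the dictionary
and recompiling rather than touching application code. The field names
and validators here are invented examples, of course.

    /* Hypothetical sketch: a shared data dictionary where every item is
       defined once, no matter how many file layouts use it. */
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    typedef int (*validator)(const char *value);

    static int all_digits(const char *v)
    {
        if (*v == '\0')
            return 0;
        for (; *v; v++)
            if (!isdigit((unsigned char)*v))
                return 0;
        return 1;
    }

    static int non_empty(const char *v) { return v[0] != '\0'; }

    struct data_item {
        const char *name;
        int offset, size;   /* position within the record layout */
        validator  check;   /* basic validation / pattern match  */
    };

    /* The one and only definition of these items, shared by every
       file layout and program that uses them. */
    static const struct data_item dictionary[] = {
        { "CUST-NO",   0, 8,  all_digits },
        { "CUST-NAME", 8, 30, non_empty  },
    };

    int main(void)
    {
        const char *v = "00012345";
        printf("%s valid for %s? %s\n", v, dictionary[0].name,
               dictionary[0].check(v) ? "yes" : "no");
        return 0;
    }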
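Lastly, on removing record-size limits: "piece-wise-accessible like
files are" suggests a byte-range read interface over a chain of overflow
buckets, something like this invented sketch.

    /* Hypothetical sketch: (offset, length) access into a record stored
       as a chain of fixed-size overflow segments ("buckets"). */
    #include <stdio.h>
    #include <string.h>

    #define SEG_SIZE 8      /* absurdly small, for illustration */

    struct segment {
        char data[SEG_SIZE];
        struct segment *next;   /* overflow chain */
    };

    /* Copy up to len bytes starting at offset within the record. */
    static size_t record_read(const struct segment *s, size_t offset,
                              char *buf, size_t len)
    {
        size_t copied = 0;
        while (s != NULL && copied < len) {
            if (offset < SEG_SIZE) {
                size_t n = SEG_SIZE - offset;
                if (n > len - copied)
                    n = len - copied;
                memcpy(buf + copied, s->data + offset, n);
                copied += n;
                offset = 0;
            } else {
                offset -= SEG_SIZE;
            }
            s = s->next;
        }
        return copied;
    }

    int main(void)
    {
        struct segment b = { .next = NULL };
        struct segment a = { .next = &b };
        memcpy(a.data, "segment1", SEG_SIZE);
        memcpy(b.data, "segment2", SEG_SIZE);

        char buf[9] = { 0 };
        record_read(&a, 4, buf, 8);   /* read across a bucket boundary */
        printf("%s\n", buf);          /* prints "ent1segm" */
        return 0;
    }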
> That's what comes to mind off the top of my head, anyway. I'm sure
> there's a lot more beneath the surface. Feel free to offer suggestions
> or observations.
>

Not bad for an "off the top of your head" :-) Thanks, you've reminded me
how much I used to enjoy doing file-related stuff.

--
Paul Sture