The notion here is that

* Managing huge farms of file structures is a pain
* Finding things in same is even more of a pain
* A large server is useless unless you can find things you need
* Getting a few orders of magnitude faster access to same than EVERYONE else's servers (including Microsoft's)

The scheme outlined below is for a layer between RMS and an XQP or ACP, basically requiring no mods to either, which allows your "current directory" to mean not just somewhere in a directory tree, but to mean a way of selecting what files you're interested in looking at, based on criteria of path, date, contents, keywords, size, or most anything else you can think of. Some periodic index pass with an AltaVista type indexer (make it increasingly able to find stuff inside funny or compressed formats too) could get extra data to allow retrieval with.

It should be possible to construct examples easily with this where VMS access with the proper default finds some bit of data 1000 times faster than a competitor, who has to search thru large numbers of files. (You have some fake "directory" names available to set up the query, e.g. setting to examine files containing strings X or Y and string Z, so clients of your server need no additional work to be able to use this.)

Do NOT discuss this with anyone who might talk to Microsoft!! My understanding of the MS object file system is that it is not precisely this kind of thing; however I'd like to know for sure if you know.

BTW I consider this a relatively modest amount of work provided the relational dbms can be picked up from somewhere. It need only run in user mode...no need to build one that can run in kernel. Since underlying filesystems would be untouched & valid, this scheme allows sensible backup policies too...

Glenn Everhart

From: NORLMN::EVERHART 10-JAN-1997 12:40:24.50
To: STAR::ZALEWSKI,STAR::MASON,STAR::SZUBOWICZ
CC: EVERHART
Subj: Let's do something Microsoft will find it hard to do...

From: Glenn C. Everhart, PhD.
Date: 10-Jan-1997

Folks:

There's an idea I've had for some years (actually first wrote it down c. 1992) that might be hard for Microsoft to copy if we act on it. The idea is the following:

One key bottleneck in handling information is finding things when you're looking for them. Right now, this is dealt with by using ever-longer filenames to try to encode enough information in a name to be able to tell where the stuff you want is. That's a crock, and is becoming ever more obviously one. But Microsoft seems to still be in that mold...as is VMS.

Now consider a system like this:

When anything tries to open a file, intercept ahead of the open and gather the name, DID, FID, etc. that are passed. Shoot the request off to a daemon that runs a real honest-to-God relational DBMS and let that return you the device, DID, and file ID that you want to open. By resetting the user channel to the device (and the IRP also) and intercepting close to put the user channel back, you get a blindingly fast way to open a file, and have the files actually accessed thru ONE directory, but with in fact valid file structures on as many separate devices as you want...and need NOT touch the underlying file system to get it. In fact, the underlying name can even be stored shorter if you want, and extra attributes might be able to be stored in the database too...a structure needs to be around till close to reset the user channel and it can have a little more stuff in it...

I've implemented this kind of redirection ... it's blindingly fast... and had it working by 1/1995 already on VAX or Alpha. (Yes, I can let you have a demo.)

On create, you pass the info to your daemon, which creates the file on some device and fills in the DBMS; the intercept, on return, changes the user open to open-existing-file and lets it get at the real file, so all runs clean.

I have code that tries to do this sort of thing but uses a normal file system as the holder of the data btw, but it lacks flexibility.
However, it uses a first-fill kind of scheme (which is good on jukeboxes and the like), not a use-emptiest-disk scheme like volume sets (which is terrible on jukeboxes).

The advantages of this are that you get one directory...ONE directory... structure for zillions of volumes, yet each volume is a valid file structure and it's not too hard to add new disks to a master structure or to remove them, and disks can be backed up and otherwise treated separately. (You'd have some, perhaps process level, control to allow direct underlying access without forcing access to use and maintain the common directory.) Files in large directories get split across lots of underlying disks, and maybe lots of underlying directories, so underlying directory perf. doesn't get too bad.

But the REAL win is this:

We currently have SET DEFAULT that sets a default path. Add to it. Suppose you have SET DEFAULT "show me only files less than a month old", or one that says "show me files with keyword 'payroll' attached" or "show me files of size between X and Y" or ... These can look like pseudodirectories. They can even be set up as one-shot setups (i.e., for the next thing that runs only), and/or include things written by the current app or not yet having all content-based info filled in by default.

The database can be used to hold, and efficiently store, lots more information about stuff stored on your disk than a filesystem would normally have. Good databases scale well as they grow. And this would mean that a user of OVMS would be able to access his files based on information that was far more comprehensive than other OSs allow. Run an AltaVista searcher and indexer periodically to get keywords out of the text maybe. Use the scheme for even files NOT on your system...and make it all look transparent. (My stuff can inswap first easily enough, from wherever.) Use this to get at old data off tape or disk archives as if dealing with the current filesystem too.
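
To make the pseudodirectory idea concrete, here is a minimal sketch of how a "default" made of selection criteria could turn into a database query. It is only an illustration under stated assumptions: SQLite stands in for whatever relational DBMS would really be used, and the table layout, the function names (build_catalog, add_file, list_default) and the criteria names are all made up for this example, not taken from any existing code.

```python
import sqlite3

def build_catalog():
    """Create a toy in-memory catalog (hypothetical schema, for illustration)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, device TEXT,
                            size INTEGER, mtime INTEGER);
        CREATE TABLE keywords (file_id INTEGER, word TEXT);
    """)
    return db

def add_file(db, name, device, size, mtime, words=()):
    """Record one file and the keywords an indexer pass might have attached."""
    cur = db.execute(
        "INSERT INTO files (name, device, size, mtime) VALUES (?, ?, ?, ?)",
        (name, device, size, mtime))
    db.executemany("INSERT INTO keywords VALUES (?, ?)",
                   [(cur.lastrowid, w) for w in words])

def list_default(db, newer_than=None, keyword=None, size_range=None):
    """List (name, device) pairs visible under the current query 'default'.

    Each criterion plays the role of one pseudodirectory component; leaving
    all of them unset shows every file in the catalog.
    """
    clauses, params = [], []
    if newer_than is not None:
        clauses.append("f.mtime >= ?")
        params.append(newer_than)
    if keyword is not None:
        clauses.append("EXISTS (SELECT 1 FROM keywords k "
                       "WHERE k.file_id = f.id AND k.word = ?)")
        params.append(keyword)
    if size_range is not None:
        clauses.append("f.size BETWEEN ? AND ?")
        params.extend(size_range)
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT f.name, f.device FROM files f WHERE {where} ORDER BY f.name"
    return db.execute(sql, params).fetchall()
```

Calling list_default(db, keyword="payroll", newer_than=some_timestamp) would then behave like a directory listing under a default of "files with keyword payroll, newer than the given time", while the files themselves stay on whatever devices hold them.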
As you see it should scale exceedingly well, and represents a new twist in OS technology. Get some protection for some of the notions and maybe VMS can keep other competitors from doing exactly the same thing, and MAKE IT HARD FOR MICROSOFT TO DO IT.

Then when people start wanting to access info on the net, say, or on their own corporate databases, will they want to use some OS that gives them remote filenames as if local, or will they want the OS that lets them

    SET DEFAULT/keywordlist=(a,b,c,...)/domainlist=(...)-
        /datemask=(start:date1,end:date2) ... etc.

so their searches find what they really want and make it local everywhere?

BTW, when you stage a file locally, my scheme is to tag it so that if it's not actually used it can be cleaned up by a garbage collector after a period of time. Think AFS (Andrew File System) or DCE DFS here...

I think it'd be neat to see VMS do this sort of thing, a market win, and I gather the NT interfaces aren't really clean enough to insert something like this so easily. The VMS ones are, however; this is almost (but alas not quite) quick enough for spare time...

OTOH, we don't want to delay long getting something going with a scheme like this. Once someone else does it, the great advantage is much less. If the someone else is Microsoft, game may be over. At any rate it's not something you'd want to compete against...

Glenn Everhart

Other notes:

1. This would not be revealed generally; nobody needs to know how it is done and the longer a lead you have till someone does it the better off you are.

2. The notion of directory generalizes here. If the layer doing the dbms calls (the dbms must be ok in clusters!) "knows" some pseudodirectories are really commands to add to any query, it can strip them and return a DID flagged as perhaps an intermediate one. It would return directory info possibly of the original files, and use a few extra FID bits perhaps to encode that this is an intermediate dir of a level of pseudodirs.
(I'd reuse some high seq # bits for this so's not to mess up RMS or anything above, or better still reuse some RVN bits since you wouldn't use volume sets with this thing...it does all they do, better, pretty much.)

(A bit of support in mapvblk could span files across disks, but initially I wouldn't do that...just keep them separate...and let users spec. how much free space must be guaranteed for a file if possible on create to allow huge files to be created on pretty empty disks...do this with a pseudodir too. Doing it would not be all that hard though so long as it gets treated a lot like a window turn...send msg to acp interface, have the new piece accessed or created, pass back pointer info....)

3. The pseudo-dir scheme would allow (but not require) new queries to maybe prune irrelevant trees at each level, but need not alter what folks see till they list after the last part of a query mod, which some syntactic tag would flag. All this sugar gets stripped in the intermediate layer betw. RMS and XQP.

4. A virtual device with its own FDT processing ahead of normal stuff could be used.

5. Underlying disks can have ODS-2 but still appear to have long filenames, extra attributes, etc.

6. The RDBMS to be used should sit behind a somewhat generic interface. Supply a small cheap dbms (or maybe just dir stuff in Spiralog) but let them hook in big expensive RDB if they have it. Or Ingres or Oracle or Sybase or whatever...

7. If you do queries in an rdbms, open might stall till you're done with that. Has to happen now anyway, though, but you want to prefer an rdbms that can have many threads active...

8. Backending arbitrary other storage systems is possible too with a little more pseudo-dir syntactic sugar thrown in.

9. Obviously RMS uses the read-dir primitives it needs for Spiralog anyhow, and still needs to be able to pass longer names with more characters in them. The intercept has to generate what readdir would create.
Also needs to be able to handle any ACP level requests for file attributes that cover new attributes needed.

10. The device must be everywhere on a cluster and must synch all dbms access etc., though a user mode piece might do this; recall, we're basically dealing with open/close, not so much with r/w... so delays are normal. I suggest that queueing requests to the server part with a mailbox queue or the like could be suitable...just wait & retry if it hangs, or allow error return after too many tries.
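
The wait-and-retry handling in note 10 can be sketched in user-mode terms as follows. This is a rough illustration only: a Python queue and thread stand in for the mailbox and the DBMS daemon, a plain dict stands in for the database, and every name here (server_loop, redirect_open, LookupFailed, the timeout and retry parameters) is an assumption made for the example rather than part of any real VMS interface.

```python
import queue
import threading

class LookupFailed(Exception):
    """Raised when the lookup server did not answer within the retry budget."""

def server_loop(mailbox, catalog):
    """Stand-in for the daemon: answer each request from a dict 'database'."""
    while True:
        msg = mailbox.get()
        if msg is None:          # sentinel: shut the server down
            break
        name, reply_box = msg
        # Reply with (device, DID, FID), or None if the name is unknown.
        reply_box.put(catalog.get(name))

def redirect_open(mailbox, name, timeout=0.5, max_tries=3):
    """Ask the server where 'name' really lives, retrying if it seems hung."""
    for _ in range(max_tries):
        reply_box = queue.Queue(maxsize=1)   # fresh reply slot per attempt
        mailbox.put((name, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            continue                         # no answer in time: try again
    raise LookupFailed(f"no reply for {name!r} after {max_tries} tries")

def start_server(catalog):
    """Start the stand-in server thread; returns (mailbox, thread)."""
    mailbox = queue.Queue()
    worker = threading.Thread(target=server_loop, args=(mailbox, catalog),
                              daemon=True)
    worker.start()
    return mailbox, worker
```

Since only open and close go through this path, a short stall while waiting for the reply is tolerable; a reply that arrives after its attempt has been abandoned is simply dropped along with that attempt's reply slot.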