From: Bill Todd [billtodd@foo.mv.com]
Sent: Sunday, March 12, 2000 1:33 PM
To: Info-VAX@Mvb.Saic.Com
Subject: a common, sharable file system for Compaq

Prologue

Back in the mid-'80s, I tried (and failed) to convince DEC that it should develop a common file (and record) system across VMS, Unix, and MS-DOS/Windows to help integrate its offerings and position VMS as the premier file/data server to the emerging commoditized world. More recently, I've suggested to the VMS community an extension of this idea that includes heterogeneous concurrent disk access as a means of broadening VMS's appeal via concurrent data-sharing with environments that support applications and popular interfaces that VMS does not, but little interest has been evident.

It strikes me that I may just be talking to the wrong people: VMS is sometimes a bit insular, sufficiently different from the rest of the world that real coexistence may seem an elusive goal at best, perhaps sensitive (with some justification) to the idea that coexistence may simply be the first step toward 'migration', and probably painfully aware of the fact that it doesn't drive corporate decisions in anything like the way it did in the old days. But while these could explain a lack of enthusiasm in that quarter, they do not change the fact that heterogeneous shared data is going to happen and that Compaq has an opportunity to lead it.

Why now?

Because others in the industry are not waiting for standardization in this area. IBM just acquired Mercury Computer's SANergy product, a heterogeneous NT/Unix shared-disk file system that uses an NT system to coordinate the file system meta-data, and has announced that it plans to support heterogeneous Netfinity/RS6000 SP clustering via the SP switch that both products use (though to make the two products play together, IBM would have to either modify SANergy to use the RS/6000's 'virtual shared disk' mechanism or modify the architecture to support direct SAN access to storage devices).
Over about the past three years, several other SAN-style shared-disk heterogeneous file systems have sprouted up to serve the particular needs of on-line non-linear video editing markets: from MountainGate (recently acquired by ADIC), Transoft Networks (just acquired by HP), EMC (may not be heterogeneous, but certainly a SAN file system), and Avid (file system purchased from Polybus), plus likely others that don't come immediately to mind. And vendors such as Veritas and Sun have SAN/cluster products in the works, at least according to their road maps - though not necessarily heterogeneous ones.

Why Compaq?

Because Compaq, perhaps more than any other vendor, offers multiple operating systems that could all benefit from the synergy of being able to complement each other's strengths in an integrated data-sharing environment: VMS has its rock-solid performance and scalability, Tru64 and Linux their 'open' character, and NT its commodity strengths - and all potentially overlap in market segments from the low-end on up.

Because Compaq, unlike IBM, Sun, and HP, sells only little-endian systems in the low-end-to-upper-mid-range market (the big-endian Tandem offerings tend to be both higher-end and more niche-oriented, but still could be easily included if desired): while the common file system itself can (and likely should) be built to function in a mixed-endian environment, applications tend to have a much easier time cooperating in a homogeneous-endian environment.

Because Compaq has unique experience with shared-disk cluster/cluster-like environments, plus breadth of file system technology from which to draw proven approaches and algorithms: ODS-2/5, AdvFS, and the shelved VMS/NT combined cluster file system work from Scotland - which otherwise was unrelated to Spiralog, though that too might contribute.
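The endianness point above is easy to demonstrate. A minimal sketch (Python purely for illustration; the record layout and field names are invented, not from any actual Compaq format): applications sharing a raw binary record across machines must agree on byte order, which is trivial when every participant is little-endian but needs explicit conversion otherwise.

```python
import struct

# Hypothetical on-disk record shared by applications on multiple nodes:
# a 4-byte record id and an 8-byte offset.  '<' pins a little-endian
# layout; '>' pins big-endian.
LE_RECORD = struct.Struct('<IQ')   # what little-endian hosts write natively
BE_RECORD = struct.Struct('>IQ')   # what a big-endian host would write natively

raw = LE_RECORD.pack(7, 4096)      # the record as a little-endian node stores it

# A big-endian reader that naively applies its native layout misreads it...
wrong_id, wrong_off = BE_RECORD.unpack(raw)
# ...while one honoring the agreed little-endian layout recovers the values.
right_id, right_off = LE_RECORD.unpack(raw)

assert (right_id, right_off) == (7, 4096)
assert (wrong_id, wrong_off) != (7, 4096)
```

In an all-little-endian shop, every participant's native layout already agrees, so the conversion step (and the bugs that come from forgetting it) simply never arises.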
Because Compaq already plans to offer VMS/Tru64/Linux mixed-system Wildfire/Marvel environments - and since 64-bit Win2K is still being (or at least until very recently has been) actively developed on Alpha, there's clearly no technical reason that it could not be included as well should Compaq and Microsoft reconsider their decision to drop it. If there's justification for that kind of integration on the hardware side, many of the same arguments would seem likely to apply to heterogeneous data integration as well - and such shared data-access support across the platforms could well eliminate any demand for actual heterogeneous *clustering* (a far tougher nut to crack).

Because Compaq reportedly feels that *some* new file system is required for VMS in order to side-step certain 32-bit inherited scaling limitations in ODS-2: if VMS developers are going to put in the effort to build one (even if they leverage existing code from the Scotland effort) and VMS users are going to be asked to suffer the inevitable pain of migrating to new on-disk storage structures (even if the application interface is strictly upward-compatible), it would be nice if the result were a product of more significance.

Because Compaq is going to need something to make its NT offerings stand out from the crowd: IBM, as mentioned above, is moving forcefully toward proprietary NT differentiators in its On-Forever, X-Architecture, Netfinity Cluster Extensions, and NT/RS6000 cluster integration hardware and software efforts, while Dell seems unlikely to relent in its pursuit of commodity NT sales.
Offering features such as one-touch Internet access seems unlikely to cut the mustard in this arena, but a highly-scalable, high-performance, high-availability shared data-access environment might, especially if it included data-sharing with a major Unix and a major high-end OS (neither of which seems to be in Microsoft's plans: they apparently don't appreciate shared-disk architectures, and have never been inclined to play any better with others than suits their own objectives).

Because, unless Compaq sees its future more and more as simply providing services, it also needs something to kick Tru64 into the forefront of Unix contention and to jolt VMS enough to turn it around: good as it is, Alpha has never been sufficient to do this on its own.

Because one only has to look at Oracle to realize the central position held by those who set the standards in data storage: as SANs take hold, *someone* (quite possibly Veritas) is going to occupy that position if Compaq does not, and Compaq will not be the better for it (especially if the solution offered does not include VMS and Tru64).

What advantages would Compaq's common file system have over the competition?

The problem with partitioned file systems (what most Unix cluster implementations offer) is that they scale neither as gracefully as shared-disk file systems (physically repartitioning data to accommodate increased access intensity is a real pain) nor as flexibly (since access to fixed portions of the data is controlled by single processing nodes). In a shared-disk file system, as many nodes as wish to can directly access - and cache - any piece of data, and storage can be added as needed without hard-wiring it to new or existing nodes.
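The scaling difference between the two designs comes down to request routing. A toy model (illustrative only; node and block names are invented, and this is not any actual Compaq or cluster implementation) of the contrast just described:

```python
# Toy model of the two access patterns.  In a partitioned file system each
# block has a fixed owning node; in a shared-disk file system every node
# reaches every block directly over the SAN.

BLOCKS = [f"blk{i}" for i in range(8)]
NODES = ["A", "B", "C", "D"]

# Partitioned: access to each fixed portion of the data is controlled by a
# single node, so remote requests take an extra hop through the owner.
owner = {blk: NODES[i % len(NODES)] for i, blk in enumerate(BLOCKS)}

def partitioned_path(requesting_node, blk):
    """Nodes a request touches: direct if local, else via the fixed owner."""
    o = owner[blk]
    return [requesting_node] if o == requesting_node else [requesting_node, o]

# Shared-disk: any node can directly access (and cache) any block.
def shared_disk_path(requesting_node, blk):
    return [requesting_node]

# Node "A" reading a block owned by "C" needs an extra hop when partitioned,
# but not when shared-disk:
assert partitioned_path("A", "blk2") == ["A", "C"]
assert shared_disk_path("A", "blk2") == ["A"]
```

The model also shows why repartitioning hurts: relieving a hot owner in the partitioned case means rewriting the `owner` map (i.e., physically moving data), whereas the shared-disk case has no such map to rebalance.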
The direct access to all data is particularly beneficial in workgroup-style and application-server clusters, where the user or application instance executing at a particular node need not access data through some other node that 'serves' it - yet application-server environments still have the option of letting particular nodes 'specialize' in different portions of the shared data to minimize inter-cache traffic and distributed locking activity.

The distinction between fully-shared access with distributed shared meta-data management (VMS and S/390 Parallel Sysplex are the only examples of this I know of) and systems such as SANergy and the non-linear video editing products mentioned above, where file data is accessed directly but meta-data is managed centrally, is less dramatic but still matters: centrally-managed meta-data suffers from potential server bottlenecking and from cluster-wide temporary paralysis while meta-data server context is rebuilt after a server failure, whereas with distributed meta-data management no such central bottleneck exists and a failure need freeze only the data being actively managed by the failed node rather than the entire file system.

Compaq thus has the opportunity to build a product that is simply better (more scalable, more available, better-performing) than any existing (or, to the best of my knowledge, proposed) competition, and then extend it to support integrated distributed record/object mechanisms if it wants to (since the architecture makes such high-performance extensions possible).

How does this fit in with other industry efforts?

Industry standardization at the SAN file system level seems to be stuck at the starting gate.
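The failure-impact difference between centralized and distributed meta-data management drawn above can be sketched concretely. A toy model (file and node names invented; no relation to any real implementation): with a central meta-data server, its failure freezes the whole file system until server context is rebuilt, while with distributed management only the failed node's actively-managed files freeze.

```python
# Toy sketch of failure impact under the two meta-data management schemes.

FILES = ["f1", "f2", "f3", "f4", "f5", "f6"]

def frozen_on_failure_central(failed, files, server="meta-server"):
    """Centralized meta-data: losing the server paralyzes the entire file
    system until its context is rebuilt; losing anything else freezes nothing
    (but every operation still funnels through the one server)."""
    return set(files) if failed == server else set()

def frozen_on_failure_distributed(failed, manager):
    """Distributed meta-data: only files actively managed by the failed node
    freeze; the rest of the file system stays available."""
    return {f for f, node in manager.items() if node == failed}

# Meta-data management responsibility spread across three nodes:
manager = {"f1": "A", "f2": "A", "f3": "B", "f4": "B", "f5": "C", "f6": "C"}

# Central server failure freezes everything; a distributed node failure
# freezes only that node's share of the meta-data.
assert frozen_on_failure_central("meta-server", FILES) == set(FILES)
assert frozen_on_failure_distributed("B", manager) == {"f3", "f4"}
```

The same model shows the bottleneck point: in the central scheme every meta-data operation hits one node regardless of load, while the distributed scheme spreads that work across the `manager` map.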
Standardization of SAN storage management, however, is on the move: while the once-sensible NASD (Network Attached Secure Disk) design effort at CMU has wandered into a Never-Never Land where major file system functionality will, supposedly, migrate into intelligent disks (which, unfortunately, will then have to interact with each other at a level that would not be necessary with cleaner functional layering), start-ups like DataCore are forging ahead with actual products that serve flexible and extendible logical-volume facilities across a SAN while leaving higher-level coordination to the higher levels that need to perform it anyway.

Thus the host-resident SAN file system and SAN logical volume management facilities of the common file system will dovetail cleanly with the low-level extendible logical-volume facilities provided both by new entrants like DataCore and by existing SAN-friendly storage products from StorageWorks, EMC, IBM, StorageTek, and other vendors, and can be developed without waiting for consensus to solidify (or not) around the arguably inferior and certainly far-less-standardized NASD architecture.

What are potential problems?

One sentiment expressed while the combined NT/VMS file system work was being considered was that no one in their right mind would introduce a relatively flaky NT system into an otherwise-dependable VMS cluster. While in some situations this stance could probably be considered unnecessarily extreme, it is worth noting that sharing certain volumes (and the file systems on them) between systems is a considerably less intimate relationship than clustering: an unreliable system can at worst destroy the data on those volumes that you're sharing with it, rather than compromise your system in its entirety.
For multiple reasons it seems unlikely that the common system would *replace* ODS-2/5 in the VMS environment (and indeed it would likely not provide boot services in *any* environment, at least initially), and it's also reasonable to configure multiple common file system volumes, not all of which are shared with all participants.

Compatibility is always an issue in cross-system products, but Network Appliance Corp. has shown that NT and Unix clients can each use their own byte-range locking idioms to share data concurrently on its servers without apparent problems (they've sought patent protection for this mechanism, but it may be unnecessarily complex anyway, which would make that a non-issue), and OSF DCE DFS Access Control Lists (ACLs) manage to function across a wide range of environments, indicating potential for successful resolution of mixed-system security issues.

File names are increasingly compatible across systems, and one possible approach is simply to allow virtually anything and then advise people which characters to avoid if they want cross-system availability. But version numbers aren't as pliable. Given that

a) 99+% of the world either has never heard of version numbers or considers them a real pain, while

b) the remaining 1% or less (mostly VMS users) considers them a way of life but is also familiar with environments that lack them,

c) VMS itself has enough networking contact with non-versioning systems that many/most utilities and applications make at least some effort to deal with file systems that lack them, and

d) VMS will almost certainly need to retain ODS-2/5 facilities anyway,

it seems fairly obvious that the common file system should not support version numbers.

[The VMS contingent may howl bloody murder at this, but they've lost their opportunity to impose their views on the rest of the world. During the decades of DEC's greatest success, its official motto was "Do the right thing" - but a corollary seemed to be a "Have it your way!" attitude toward customers. VMS needs to come out of its shell and re-learn these principles if it wants to survive - at least in my opinion.]

I suppose I could keep polishing this proposal forever, but the above covers the high points. High-performance, high-availability, extremely scalable data storage ought to be a real product-differentiator, especially as the world turns to web services, and the ability to share it across product lines ought to be a real vendor-differentiator - nothing to sneeze at in these times of increasing single-vendor corporate preference.

So tell me what's wrong with it.

- bill