<<< EVMS::DOCD$:[NOTES$LIBRARY]SCSI_ARCHITECTURE.NOTE;1 >>>
-< SCSI ARCHITECTURE >-
================================================================================
Note 54.0    Problem Statement candidate for SCSI "get well"    5 replies
EVMS::EVERHART    44 lines    20-OCT-1995 14:04:14.85
--------------------------------------------------------------------------------
Here is an initial candidate problem statement. Please comment and make suggestions if you have any. Also remember that if you can start early on thinking about investigation report type issues, that would help with the schedule...

Glenn

------------------------------

SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand, or extend and needs to be made simpler in all these categories. Basic flaws which now exist have the following causes:

1. No written high level design (architecture) nor component designs exist save the code sources themselves. Some oral design tradition exists, but essentially all key individuals who originated this tradition have left, so that even the oral design tradition is weak and partial. Much of the subsystem was implemented with no available overall picture of the subsystem at the top level.

2. The code base has had a long history of changes in spite of this, which have not been implemented consistently. Not surprisingly, component interfaces are not clean or consistent, and information passed by side effects of data manipulations is common.

3. The difficulty of understanding the environment of changes has made them slow to implement and fragile when implemented. This has led to schedule pressure which has at times made it necessary to perform partial, not complete, fixes for problems.

As a result, the SCSI group has difficulty making fixes or enhancements to the code base.
People new to the SCSI group must undergo a long process of code reading to get up to speed, which limits what resources can be applied to designs or to reviews.

Customers see many desired features slowly or not at all. Most notably, support for wide SCSI, extra SCSI LUNs, added features on disks and tapes, new device types, and commodity SCSI devices can be added only slowly at best. Also, it has been difficult to provide information needed for third parties to write new class drivers for VMS SCSI or to be as quick as we should be to handle the customer problem report backlog.

The problems are thus visible both internally and to customers.
================================================================================
Note 54.1    Problem Statement candidate for SCSI "get well"    1 of 5
STAR::S_SOMMER    25 lines    23-OCT-1995 10:08:17.23
-< Some afterthoughts >-
--------------------------------------------------------------------------------
I think the problem statement in .0 does a good job of capturing what we talked about on Thursday. As luck would have it :-), I've been mulling over some of my own thoughts about this ever since Thursday, and I can't resist adding a couple of comments:

1. I wonder if we want to modify the description of fragility and unmaintainability we have created. As it stands, we portray the SCSI system as across-the-board difficult to modify. My sense is that there are a few areas that are especially delicate (bus reset, queue manager, busy-bit issues) but that, overall, changes can be made to the system with reasonable confidence.

2. Also I realize that we've focused perhaps too much on what the developer wish list is, and maybe not enough on the customer wish list. If we ask customers what would improve SCSI, I don't think they would be primarily complaining about stability (V6.2 SCSI clusters, for example, have received no CLDs, partly due to SCSI-2's robustness, according to Tom Coughlan).
Instead, I think we'd be getting answers such as: we need wide support, more SCSI cluster enabling features (target mode, low profile bus reset, device failover), tape performance features such as density support and skipfile. Our current problem statement suggests we haven't done these things so far because it is hard to add features to such a fragile system; my sense of why we haven't added these is more simply because they haven't ever made it into a project plan.
================================================================================
Note 54.2    Problem Statement candidate for SCSI "get well"    2 of 5
STAR::YURYAN    3 lines    23-OCT-1995 13:06:04.85
-< more on customer comments >-
--------------------------------------------------------------------------------
To add to Sue's comments in 54.1 item #2 - see note 21.3 and .4 for customer comments and wish list...
================================================================================
Note 54.3    Problem Statement candidate for SCSI "get well"    3 of 5
EVMS::RLORD "Rick Lord"    26 lines    23-OCT-1995 15:11:24.45
-< Comments on base note and reply #1 >-
--------------------------------------------------------------------------------
I don't want to jump the gun and go from problem statement to proposed solution, but I'm afraid that one point in the base note and one in the first reply suggest easier solutions to the problem than I think are realistic.

Re: .0 A missing point is that not only is there no written high-level design, there is no comprehensive high level design - written or otherwise. This is an important point if the first item in .0 implies that just writing down the design as it exists today would resolve it. It wouldn't. I'm not saying that this is what Glenn meant, by the way, just that it could be interpreted that way.

Re: .1 Lack of documentation does make the SCSI drivers seem fragile, but there's also some code in there that really is fragile - and knowing that taints the rest of the code.
I know that even when I'm making what I suspect to be a straightforward change I tend to poke around and see what else runs the code I'm changing, how many names the data I'm changing is known by, what else references that data by any of its names, etc. It takes longer than it should. I just don't want to underestimate the end effect of having fragile code in there.
================================================================================
Note 54.4    Problem Statement candidate for SCSI "get well"    4 of 5
EVMS::EVERHART    40 lines    23-OCT-1995 16:18:45.24
-< OK, another try; not real different from .0 but some mods. >-
--------------------------------------------------------------------------------
SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand, or extend and needs to be made simpler in all these categories. Basic flaws which now exist have the following causes:

1. No comprehensive written high level design (architecture) nor component designs exist save the code sources themselves. Some oral design tradition exists, but essentially all key individuals who originated this tradition have left, so that even the oral design tradition is weak and partial. Much of the subsystem was implemented with no available overall picture of the subsystem at the top level. Thus no comprehensive design exists for VMS SCSI now at all.

2. The code base has had a long history of changes in spite of this, which have not been implemented consistently. Not surprisingly, component interfaces are not clean or consistent, and information passed by side effects of data manipulations is common.

3. The difficulty of understanding the environment of changes has made them slow to implement and often fragile when implemented. Some areas remain maintainable, but the difficulty of maintenance is growing, and the learning curve for the code base is steep.
This has led to schedule pressure which has at times made it necessary to perform partial, not complete, fixes for problems.

As a result, the SCSI group has difficulty making fixes or enhancements to the code base. People new to the SCSI group must undergo a long process of code reading to get up to speed, which limits what resources can be applied to designs or to reviews.

Partly due to these problems, customers see many desired features slowly or not at all. Most notably, support for wide SCSI, extra SCSI LUNs, added features on disks and tapes, new device types, and commodity SCSI devices can be added only slowly. Also, it has been difficult to provide information needed for third parties to write new class drivers for VMS SCSI or to be as quick as we should be to handle the customer problem report backlog.

The problems are thus visible both internally and to customers.
================================================================================
Note 54.5    Problem Statement candidate for SCSI "get well"    5 of 5
EVMS::EVERHART    40 lines    24-OCT-1995 14:41:22.20
-< Problem statement after wordsmithing. >-
--------------------------------------------------------------------------------
SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand, and extend. These problem areas, which are visible to internal users and to customers, need to be simplified and improved. Specifically, existing SCSI subsystem flaws include the following:

1. No comprehensive written high-level OpenVMS SCSI design or individual component designs exist, except the code sources themselves. Although some oral design tradition is available, it is weak and partial---mostly because all the key individuals who originated the oral tradition have left. Most of the SCSI components were implemented with no overall picture of the subsystem at the top level. Therefore, no comprehensive OpenVMS SCSI design exists at all.

2.
Without a project design, the SCSI code base has had a long history of changes that have not been implemented consistently. Not surprisingly, component interfaces are not clean or consistent, and information passed by side effects of data manipulations is common.

3. Understanding the complex SCSI code environment makes the process of implementing changes slow, and changes are often fragile when implemented. Some areas of code remain maintainable, but the difficulty of maintenance is growing. These change implementation and maintenance problems sometimes create schedule pressures that result in partial and incomplete fixes for important problems.

As a result of these issues, the SCSI group has difficulty making fixes or enhancements to the code base. People new to the SCSI group undergo a long process of code-reading to get up to speed. This steep learning curve sharply reduces the amount of resources available to design, review, or implement code, which means that OpenVMS customers see desired SCSI enhancements slowly or not at all. Most notably, support is delayed for features such as wide SCSI, extra SCSI LUNs, enhancements on disks and tapes, new device types, and commodity SCSI devices. It is also difficult to distribute necessary information to third parties writing new OpenVMS SCSI class drivers or to provide quick responses to the customer problem report backlog.
================================================================================
Note 55.0    Extrema of plan #1: keep all the old stuff    No replies
EVMS::EVERHART    11 lines    24-OCT-1995 13:10:28.50
--------------------------------------------------------------------------------
This is (near as my notes allow) the first extreme position possible in "SCSI get-well" options:

Keep all the old code, but document it.

-pro
-con
-How does it address the problem stmt?
-What does it involve?
Replies can address these or other questions
================================================================================
Note 56.0    Extreme position #x: clean sweep    1 reply
EVMS::EVERHART    7 lines    24-OCT-1995 13:12:33.05
--------------------------------------------------------------------------------
This is the notion of all new code from a completely new design, to be implemented and released all at once "someday".

-pro
-con
-How does it address problem?
-What does it involve
================================================================================
Note 56.1    Extreme position #x: clean sweep    1 of 1
EVMS::TGOODWIN    25 lines    24-OCT-1995 14:33:49.68
-< Details of the clean sweep option >-
--------------------------------------------------------------------------------
This option starts with the development of a complete and mature SCSI architecture and then a complete set of design and interface documents. Once all of these documents are complete and reviewed, then a complete set of new SCSI drivers would be written. Under this option only the highest priority CLDs/QARs would be fixed in the old code while the new design and code were under development.

Advantages
----------
- New drivers would be 100% compliant with the architecture and design
- A complete set of SCSI documents would be created
- Customer and third party developer impact would occur only once

Disadvantages
-------------
- No benefit to customers for a long time. No new functionality or improvements until all code is released. (No changes in the next few releases.)
- No immediate benefit to maintainability for the following reasons:
    Old code still must be maintained for a while longer
    New code, when released, will require some time to shake out bugs
- The impact of doing it all in a single release will necessitate a large and extended external field test
- Impact to customers and third party developers could be sizeable.
================================================================================
Note 57.0    Investigative report option #3: New arch & design; Incremental code updates    3 replies
STAR::TGOODWIN    4 lines    24-OCT-1995 14:39:25.57
--------------------------------------------------------------------------------
This is a place holder for option #3 until I flesh out the details.

Tune in tomorrow. Same bat time. Same bat station.
================================================================================
Note 57.1    Investigative report option #3: New arch & design; Incremental code updates    1 of 3
STAR::TGOODWIN    43 lines    25-OCT-1995 09:20:21.18
-< IR Option #3: Details >-
--------------------------------------------------------------------------------
This option would start with the development of a complete SCSI architecture and a high level design document. These documents would represent what we believe to be the best way to implement SCSI under OpenVMS and would not be constrained by the current implementation.

Once these documents were in place, a few key areas of the current implementation would be targeted for reimplementation for each release. Areas which are high-maintenance in the current implementation will be given priority along with areas which are prerequisites for others.

When an area is reworked it would entail generating a detailed design from the high level design and then modifying or completely rewriting the code to match the new design.
Advantages
----------
- Some areas would be reworked for each new release (post-Gryphon)
- Some new functionality can be made available for each release
- All work would include a top down design
- Customer and third party developer impact for each release will be localized to the areas of change
- Extensibility and maintainability will improve gradually

Disadvantages
-------------
- The code will not match the documents for the next few years and may never fully match the design
- Implementation of the entire design will take longer than the clean sweep approach due to multiple integration phases
- Customers and third party developers will be impacted multiple times
- The complexity problem of the SCSI code base (see numbered paragraph 3 of the problem statement) will continue to exist as we make our initial changes
================================================================================
Note 57.2    Investigative report option #3: New arch & design; Incremental code updates    2 of 3
STAR::S_SOMMER    8 lines    25-OCT-1995 11:57:22.53
-< Time estimate question >-
--------------------------------------------------------------------------------
Tom,

I had a question about this one regarding the time frame. You mentioned that the project would start with a complete SCSI architecture and a high level design document. Did you have a ballpark estimate on how long these would take to write?

-Sue
================================================================================
Note 57.3    Investigative report option #3: New arch & design; Incremental code updates    3 of 3
STAR::TGOODWIN    12 lines    25-OCT-1995 15:37:37.53
-< Time Frame and Initial Projects >-
--------------------------------------------------------------------------------
In response to Sue's questions, I think that work on a complete SCSI architecture would probably run into at least January of next year.
The projects for the 7.2 release would be the high level design document, interface design documents and data structure definitions needed to support the new architecture. The only coding changes for the 7.2 release would be restructuring of the data structures to conform to the design and to enforce data access rules.

I feel this would also leave some SCSI developers available to implement some business critical new functionalities.
================================================================================
Note 58.0    Approach #4, base a new design on selected elements of the current design    No replies
EVMS::RLORD "Rick Lord"    79 lines    24-OCT-1995 15:40:50.39
--------------------------------------------------------------------------------
24-Oct-95 Tuesday

It is possible to highlight the problem areas of the current driver, to clean them up, document them nicely and come away with an improved code base. That's probably the quickest approach, but I don't think it's a good long-term solution, and it doesn't address the extensibility issue of our problem statement at all.

It would also be possible to scratch the current design completely and come up with an entirely new architecture. That's a good long-term solution, but it's probably not very cost-effective, and it would take a long time to realize any benefit from it.

As is pointed out in the problem statement, one of the major problems with the current code base is that there is no comprehensive, top-level design which considers all of the major blocks of functionality that go into providing SCSI access. I think that's where we have to start.

However ugly and unmaintainable it might be, though, the current code base somehow seems to work pretty well for most people. I think that there are some pretty good concepts in it, well worth keeping. I'll mention (yes, again) the notion of the SCDT, STDT and SPDT hierarchy. It works. It parallels the SCSI standard nicely. It's understandable.
I'll bet that as people implemented major functions - say, Buzzy with Bus Reset or Sue with Mode Sense - they probably not only learned more about that functionality than even a careful reader would get from the standard, but they identified specific shortcomings in the current design.

One approach that I think we should consider is creating a new design which we know right from the start is going to encompass a lot of the current design. We'd still start with a clean sheet of paper, but we'd begin by adding to the paper those things about the current code that were basically good and correcting their shortcomings along the way.

For example, the data structures mentioned above: the hierarchy makes sense, but what about it doesn't work well? Simple little things like inconsistency in naming fields have always driven me nuts - why are some things DEV and others DEVICE? It makes every field a special case. And it's not documented anywhere which bits are status bits and which are control bits. Who has read access to which fields? Write access? Are there unnecessary fields? Are there fields which are missing or in the wrong structure? Are logically-related fields grouped so they're close together when you look at them from SDA?

Another one: there's nothing wrong with having a queue manager - it's just not necessary for adapters which don't support TCQ or for adapters which handle queuing themselves. So why not say we'll move the queue manager to the new design, but also provide a way for ports that don't need it to just bypass it completely?

As things were moved over to the new design they'd be integrated with whatever was already there, so we'd obviously want to deal with the most important, low level things first. The design would be documented as it took shape, and when it was complete enough to address the needs of each type of adapter we could work out an implementation plan.
Because it would include quite a bit of the current design, it may be possible to incrementally replace the current code with the new code.

It's important to note that the new design would not be committed to salvaging everything from the current code, nor would the current code be the only source it could draw from - new ideas would be included as appropriate, and between the architecture notes file and the wish list of things that people would like implemented we've got a boatload of them. All of them could be tested against the new design ahead of time so we wouldn't have to retrofit any hacks.

The criteria I'd use are:

1) Identify the issue (Data Structures, Queue Manager, Bus Reset, etc.)
2) How does it work now? (implementation details, not "OK I guess")
3) Is it implemented by the current code?
4) If so, is it worth salvaging or is it just a complete hack?
5) If it's worth salvaging, how could it be improved?
6) If it isn't implemented, what would allow a clean implementation?
================================================================================
Note 59.0    Position #(pi) re alternative approaches to dealing with SCSI get-well    1 reply
EVMS::EVERHART    59 lines    24-OCT-1995 16:00:11.13
--------------------------------------------------------------------------------
Alternative #pi

This plan for meeting the problem stmt is that we proceed in 2 steps.

1. Create a top level design document which describes the framework for specific SCSI subsystem mods. It should give broad rules of thumb and include at least the port-class interface and some statements about data structures as well as the more generic rules of thumb about design principles. It should not be constrained to describing the top level design of the Zeta implementation only, but should describe a design which is implementable incrementally from Zeta.

2. Create a series of modules which will replace pieces of the Zeta SCSI implementation incrementally.
As part of this creation, LOP statements of need and design would be needed and it is left till those investigation reports to decide whether functions (e.g. flow control) or code modules (e.g. MKdriver) get replaced. The design documents for these modules will become later chapters in the ultimate SCSI design handbook (or whatever it gets called).

Advantages:

1. A framework document exists early in the cycle. (Indeed, we can and should crib it from the architecture document & studies that are now wholly or partly done with a few additions to fill in port-class detail and maybe some more words about data structures.)
2. The principle of getting to an incremental solution is preserved.
3. The possibility of including new functions or improving lower level details is wide open subject to the one constraint that really matters.
4. Some ("obsolescent") parts of the SCSI code base may be carried pretty much "as is" indefinitely.

Disadvantages:

1. Any really revolutionary mods may be excluded.
2. More changes may be included than would happen if one stuck with the Zeta code, as more ideas may be included. The ideas may be good, but the changes will be larger and deeper.

How it addresses problems:

This provides documentation for the SCSI subsystem, and allows for growth of the documents so that design detail from "real world" coding experience will be preserved in the documents. The design constraint provides a path forward.

What does it involve:

The umbrella document part can be achieved most simply if the entire group gets involved in a wordsmithing pass over the existing document and locates areas to be added. Some words exist for most of the topics in Rick's note (53.1) in there, and it can be used to ensure that rules of thumb about each of them exist. The detail documents need LOP for each, and perhaps it would be wisest to start discussing whether functions or entire components should be replaced.
I have not too many prejudices except that ultimately all of a component will need to conform, not 50-60%, so that a pass over components will eventually be necessary for at least those components which are intended to be developed further.
================================================================================
Note 59.1    Position #(pi) re alternative approaches to dealing with SCSI get-well    1 of 1
EVMS::EVERHART    5 lines    26-OCT-1995 08:25:49.00
-< Doneness criterion >-
--------------------------------------------------------------------------------
The top level document described in .0 would I think be considered "done" (though subject to correction & modification) once it had externally visible interfaces described. That'd mean certainly the port/class interface and at least general features of data structures and could optionally include some internal interfaces.
================================================================================
Note 60.0    #2: Initial document + incremental changes    2 replies
STAR::S_SOMMER    3 lines    24-OCT-1995 17:02:26.52
--------------------------------------------------------------------------------
The next note will describe the approach involving an initial "umbrella" document, plus incremental code changes and incremental expansion of the initial document (approach #2 we outlined today). More to come...
================================================================================
Note 60.1    #2: Initial document + incremental changes    1 of 2
STAR::S_SOMMER    43 lines    24-OCT-1995 20:55:24.17
-< Details of this approach >-
--------------------------------------------------------------------------------
An incremental approach, comprised of the following:

1. Produce a high-level design document which:
   a.) is comprehensive in its breadth rather than in its depth,
   b.) serves as a description of the desired future functionality of the SCSI subsystem, and
   c.) specifies its own doneness criteria.
That is, it contains a list of design-oriented projects which, when complete, would adequately solve the original problem as stated in our Problem Statement.

2. For each major release of OpenVMS, target a reasonable number of projects from the above document. In addition to these design-related projects, it is to be expected there will be a list of projects based on new features/functionality requests. A fair mix of these two kinds of projects will need to be chosen; the latter has a more visible and direct impact in the area of customer perception; the former is more subtle and indirect, but is ultimately of equal importance.

3. As each such project is completed, it should be accompanied by a detailed design spec. For projects which grow out of the original high-level design document, this spec should be suitable for inclusion as an additional chapter to the original document. In this way, the document will expand slowly over the next several major releases of OpenVMS. For projects which grow out of requests for new functionality, their accompanying detailed spec might either be included as a design spec chapter, or else kept in some separate area, depending on the project's relevance to design issues outlined in the original high-level document.

Consequently, the deliverables for V7.2 would become threefold: a high-level document, a selection of specific projects, and a corresponding selection of design/functional specs to be appended to the high-level document (or elsewhere collected). The high-level document is a prerequisite for all ensuing project work, so most of it would have to be completed several months earlier (around the February 1996 time frame, for example) than the other V7.2 deliverables.
One might hope that the highest priority proposed projects could be specified early in the development of the document, so that LOP work on those specific projects could start even before the entire high-level document is complete (say, no later than January 1996 for the start of LOP work).
================================================================================
Note 60.2    #2: Initial document + incremental changes    2 of 2
STAR::S_SOMMER    29 lines    26-OCT-1995 07:12:13.74
-< Pros and cons >-
--------------------------------------------------------------------------------
(Pros and cons)

Pros:

1. Produces documentation for a whole system.
2. Allows for coding project deliverables in V7.2.
3. Improves maintainability and extensibility.
4. Allows for new functionality projects in addition to design rework.

Cons:

1. In early stages, documents only the future, not the present. (In a real world, I don't think we can have both though.)
2. Allows the initial design to be non-detailed. (I think a full detailed design would take at least a year. Not only would that preclude any V7.2 coding deliverables, I'm not convinced how much more it would actually benefit us.)
3. The final expanded document would be uneven in its coverage, given that it would be an overview plus perhaps a dozen detailed specs. If it appears that important areas might be left undocumented, maybe this could be remedied by including one or two document-only projects per release. It would be nice to have detailed design specs for our top 4 or 5 drivers (say, DK/MK/GK/PKQ/PKZ), while declaring others end-of-life for documentation purposes; the good news in this area is that some of these specs partly or wholly exist already and just need a commitment to be updated.