<<< EVMS::DOCD$:[NOTES$LIBRARY]SCSI_ARCHITECTURE.NOTE;1 >>>
                             -< SCSI ARCHITECTURE >-
================================================================================
Note 54.0        Problem Statement candidate for SCSI "get well"       5 replies
EVMS::EVERHART                                 44 lines  20-OCT-1995 14:04:14.85
--------------------------------------------------------------------------------
Here is an initial candidate problem statement. Please comment and make
suggestions if you have any. Also remember that if you can start early on
thinking about investigation report type issues, that would help with the
schedule...

Glenn 
------------------------------
SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand,
or extend and needs to be made simpler in all these categories.

Basic flaws which now exist have the following causes:

1. No written high level design (architecture) nor component designs
exist save the code sources themselves. Some oral design tradition
exists, but essentially all key individuals who originated this tradition
have left, so that even the oral design tradition is weak and partial.
Much of the subsystem was implemented with no available overall picture
of the subsystem at the top level.

2. The code base has had a long history of changes in spite of this,
which have not been implemented consistently. Not surprisingly, component
interfaces are not clean or consistent, and information passed by side
effects of data manipulations is common.

3. The difficulty of understanding the environment of changes has made
them slow to implement and fragile when implemented. This has led to
schedule pressure which has at times made it necessary to perform
partial, not complete, fixes for problems.

As a result, the SCSI group has difficulty making fixes or enhancements
to the code base. People new to the SCSI group must undergo a long
process of code reading to get up to speed, which limits what resources
can be applied to designs or to reviews. Customers see many desired
features slowly or not at all. Most notably, support for wide SCSI, extra
SCSI LUNs, added features on disks and tapes, new device types, and
commodity SCSI devices can be added only slowly at best. Also, it has
been difficult to provide information needed for third parties to write
new class drivers for VMS SCSI or to be as quick as we should be to
handle the customer problem report backlog. The problems are thus visible
both internally and to customers.
================================================================================
Note 54.1        Problem Statement candidate for SCSI "get well"          1 of 5
STAR::S_SOMMER                                 25 lines  23-OCT-1995 10:08:17.23
                            -< Some afterthoughts >-
--------------------------------------------------------------------------------
    I think the problem statement in .0 does a good job of capturing
    what we talked about on Thursday.  As luck would have it :-), I've been
    mulling over some of my own thoughts about this ever since Thursday,
    and I can't resist adding a couple of comments:
    
    1.  I wonder if we want to modify the description of fragility and
    unmaintainability we have created.  As it stands, we portray the SCSI system
    as across-the-board difficult to modify.  My sense is that there are a
    few areas that are especially delicate (bus reset, queue manager,
    busy-bit issues) but that, overall, changes can be made to the system
    with reasonable confidence.
    
    2.  Also I realize that we've focused perhaps too much on what the
    developer wish list is, and maybe not enough on the customer wish list.
    If we ask customers what would improve SCSI, I don't think they would
    be primarily complaining about stability (V6.2 SCSI clusters, for
    example, have received no CLDs, partly due to SCSI-2's robustness,
    according to Tom Coughlan).  Instead, I think we'd be getting answers
    such as:  we need wide support, more SCSI cluster enabling features
    (target mode, low profile bus reset, device failover), tape performance
    features such as density support and skipfile.  Our current problem
    statement suggests we haven't done these things so far because it
    is hard to add features to such a fragile system;  my sense of why
    we haven't added these is more simply because they haven't ever made
    it into a project plan. 
================================================================================
Note 54.2        Problem Statement candidate for SCSI "get well"          2 of 5
STAR::YURYAN                                    3 lines  23-OCT-1995 13:06:04.85
                         -< more on customer comments >-
--------------------------------------------------------------------------------
    To add to Sue's comments in 54.1 item #2 - see note 21.3 and .4 for 
    customer comments and wish list... 
    
================================================================================
Note 54.3        Problem Statement candidate for SCSI "get well"          3 of 5
EVMS::RLORD "Rick Lord"                        26 lines  23-OCT-1995 15:11:24.45
                    -< Comments on base note and reply #1 >-
--------------------------------------------------------------------------------

	I don't want to jump the gun and go from problem statement to proposed
	solution, but I'm afraid that one point in the base note and one in the
	first reply suggest easier solutions to the problem than I think are
	realistic.

	Re: .0

	A missing point is that not only is there no written high-level design,
	there is no comprehensive high level design - written or otherwise.
	This is an important point if the first item in .0 implies that just
	writing down the design as it exists today would resolve it. It 
	wouldn't. I'm not saying that this is what Glenn meant, by the way,
	just that it could be interpreted that way.

	Re: .1

	Lack of documentation does make the SCSI drivers seem fragile, but
	there's also some code in there that really is fragile - and knowing
	that taints the rest of the code. I know that even when I'm making
	what I suspect to be a straightforward change I tend to poke around and
	see what else runs the code I'm changing, how many names the data I'm
	changing is known by, what else references that data by any of its
	names, etc. It takes longer than it should. I just don't want to
	underestimate the end effect of having fragile code in there.

================================================================================
Note 54.4        Problem Statement candidate for SCSI "get well"          4 of 5
EVMS::EVERHART                                 40 lines  23-OCT-1995 16:18:45.24
        -< OK, another try; not real different from .0 but some mods. >-
--------------------------------------------------------------------------------
SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand,
or extend and needs to be made simpler in all these categories.

Basic flaws which now exist have the following causes:

1. No comprehensive written high level design (architecture) nor component
designs exist save the code sources themselves. Some oral design tradition
exists, but essentially all key individuals who originated this tradition
have left, so that even the oral design tradition is weak and partial.  Much
of the subsystem was implemented with no available overall picture of the
subsystem at the top level. Thus no comprehensive design exists for VMS SCSI
now at all.

2. The code base has had a long history of changes in spite of this,
which have not been implemented consistently. Not surprisingly, component
interfaces are not clean or consistent, and information passed by side
effects of data manipulations is common.

3. The difficulty of understanding the environment of changes has made them
slow to implement and often fragile when implemented. Some areas remain
maintainable, but the difficulty of maintenance is growing, and the learning
curve for the code base is steep. This has led to schedule pressure which
has at times made it necessary to perform partial, not complete, fixes for
problems. 

As a result, the SCSI group has difficulty making fixes or enhancements to
the code base. People new to the SCSI group must undergo a long process of
code reading to get up to speed, which limits what resources can be applied
to designs or to reviews. Partly due to these problems, customers see many
desired features slowly or not at all. Most notably, support for wide SCSI,
extra SCSI LUNs, added features on disks and tapes, new device types, and
commodity SCSI devices can be added only slowly. Also, it has been
difficult to provide information needed for third parties to write new class
drivers for VMS SCSI or to be as quick as we should be to handle the
customer problem report backlog. The problems are thus visible both
internally and to customers.
================================================================================
Note 54.5        Problem Statement candidate for SCSI "get well"          5 of 5
EVMS::EVERHART                                 40 lines  24-OCT-1995 14:41:22.20
                   -< Problem statement after wordsmithing. >-
--------------------------------------------------------------------------------
SCSI Subsystem Proactive Maintenance

PROBLEM STATEMENT

The OpenVMS SCSI subsystem has become difficult to maintain, understand,
and extend. These problem areas, which are visible to internal users and
to customers, need to be simplified and improved. 

Specifically, existing SCSI subsystem flaws include the following:

1. No comprehensive written high-level OpenVMS SCSI design or individual 
   component designs exist, except the code sources themselves. Although some 
   oral design tradition is available, it is weak and partial---mostly 
   because all the key individuals who originated the oral tradition have left. 
   Most of the SCSI components were implemented with no overall picture
   of the subsystem at the top level. Therefore, no comprehensive 
   OpenVMS SCSI design exists at all.

2. Without a project design, the SCSI code base has had a long history 
   of changes that have not been implemented consistently. Not surprisingly, 
   component interfaces are not clean or consistent, and information passed 
   by side effects of data manipulations is common.

3. The difficulty of understanding the complex SCSI code environment makes
   implementing changes slow, and changes are often fragile when implemented. 
   Some areas of code remain maintainable, but the difficulty of maintenance 
   is growing. These change implementation and maintenance problems
   sometimes create schedule pressures that result in partial and incomplete 
   fixes for important problems. 

As a result of these issues, the SCSI group has difficulty making fixes or 
enhancements to the code base. People new to the SCSI group undergo a long 
process of code-reading to get up to speed. This steep learning curve sharply 
reduces the amount of resources available to design, review, or implement 
code, which means that OpenVMS customers see desired SCSI enhancements slowly
or not at all. Most notably, support is delayed for features such as wide SCSI, 
extra SCSI LUNs, enhancements on disks and tapes, new device types, and 
commodity SCSI devices. It is also difficult to distribute necessary 
information to third parties writing new OpenVMS SCSI class drivers or to 
provide quick responses to the customer problem report backlog. 

================================================================================
Note 55.0          Extrema of plan #1: keep all the old stuff         No replies
EVMS::EVERHART                                 11 lines  24-OCT-1995 13:10:28.50
--------------------------------------------------------------------------------
This is (near as my notes allow) the first extreme position possible
in "SCSI get-well" options:

Keep all the old code, but document it.
-pro
-con
-How does it address the problem stmt?
-What does it involve?

Replies can address these or other questions


================================================================================
Note 56.0               Extreme position #x: clean sweep                 1 reply
EVMS::EVERHART                                  7 lines  24-OCT-1995 13:12:33.05
--------------------------------------------------------------------------------
This is the notion of all new code from a completely new design, to be
implemented and released all at once "someday".

-pro
-con
-How does it address problem?
-What does it involve?
================================================================================
Note 56.1               Extreme position #x: clean sweep                  1 of 1
EVMS::TGOODWIN                                 25 lines  24-OCT-1995 14:33:49.68
                     -< Details of the clean sweep option >-
--------------------------------------------------------------------------------
	This option starts with the development of a complete and
mature SCSI architecture and then a complete set of design and interface
documents.  Once all of these documents are complete and reviewed, then
a complete set of new SCSI drivers would be written.

	Under this option only the highest priority CLDs/QARs would be
fixed in the old code while the new design and code were under development.

Advantages
----------
- New drivers would be 100% compliant with the architecture and design
- A complete set of SCSI documents would be created
- Customer and third party developer impact would occur only once

Disadvantages
-------------
- No benefit to customers for a long time.  No new functionality or
	improvements until all code is released.  ( No changes in the
	next few releases).
- No immediate benefit to maintainability for the following reasons:
	Old code still must be maintained for a while longer
	New code, when released, will require some time to shake out bugs
- The impact of doing it all in a single release will necessitate a large
	and extended external field test
- Impact to customers and third party developers could be sizeable.

================================================================================
Note 57.0 Investigative report option #3: New arch & design; Incremental code updates 3 replies
STAR::TGOODWIN                                  4 lines  24-OCT-1995 14:39:25.57
--------------------------------------------------------------------------------
	This is a place holder for option #3 until I flesh out
	the details.

	Tune in tomorrow. Same bat time. Same bat station.
================================================================================
Note 57.1 Investigative report option #3: New arch & design; Incremental code updates 1 of 3
STAR::TGOODWIN                                 43 lines  25-OCT-1995 09:20:21.18
                           -< IR Option #3: Details >-
--------------------------------------------------------------------------------

	This option would start with the development of a
complete SCSI architecture and a high level design document.
These documents would represent what we believe to be the
best way to implement SCSI under OpenVMS and would not be
constrained by the current implementation.

	Once these documents were in place, a few key areas
of the current implementation would be targeted for
reimplementation for each release.  Areas which are high
maintenance in the current implementation will be given
priority along with areas which are prerequisites for
others.  When an area is reworked it would entail generating
a detailed design from the high level design and then 
modifying or completely rewriting the code to match the
new design.

Advantages
----------
- Some areas would be reworked for each new 
	release (post-Gryphon)
- Some new functionality can be made available for each
	release
- All work would include a top down design
- Customer and third party developer impact for each
	release will be localized to the areas of change
- Extensibility and maintainability will improve gradually


Disadvantages
-------------
- The code will not match the documents for the
	next few years and may never fully match
	the design
- Implementation of the entire design will take longer
	than the clean sweep approach due to multiple
	integration phases
- Customers and third party developers will be impacted
	multiple times
- The complexity problem of the SCSI code base (see
	numbered paragraph 3 of the problem statement)
	will continue to exist as we make our initial
	changes 
================================================================================
Note 57.2 Investigative report option #3: New arch & design; Incremental code updates 2 of 3
STAR::S_SOMMER                                  8 lines  25-OCT-1995 11:57:22.53
                          -< Time estimate question >-
--------------------------------------------------------------------------------
    Tom,
    
    I had a question about this one regarding the time frame.  You
    mentioned that the project would start with a complete SCSI
    architecture and a high level design document.  Did you have
    a ballpark estimate on how long these would take to write?
    
    -Sue
================================================================================
Note 57.3 Investigative report option #3: New arch & design; Incremental code updates 3 of 3
STAR::TGOODWIN                                 12 lines  25-OCT-1995 15:37:37.53
                      -< Time Frame and Initial Projects >-
--------------------------------------------------------------------------------

	In response to Sue's questions, I think that work on a 
	complete SCSI architecture would probably run into at least
	January of next year.  The projects for the 7.2 release
	would be the high level design document, interface design
	documents and data structure definitions needed to support
	the new architecture.  The only coding changes for the 7.2
	release would be restructuring of the data structures to
	conform to the design and to enforce data access rules.

	I feel this would also leave some SCSI developers available
	to implement some business critical new functionalities.

================================================================================
Note 58.0 Approach #4, base a new design on selected elements of the current design No replies
EVMS::RLORD "Rick Lord"                        79 lines  24-OCT-1995 15:40:50.39
--------------------------------------------------------------------------------

								24-Oct-95
								Tuesday

        It is possible to highlight the problem areas of the current driver, to
	clean them up, document them nicely and come away with an improved code
	base. That's probably the quickest approach, but I don't think it's a
	good long-term solution, and it doesn't address the extensibility
	issue of our problem statement at all.

        It would also be possible to scratch the current design completely and
	come up with an entirely new architecture. That's a good long-term 
	solution, but it's probably not very cost-effective, and it would take
	a long time to realize any benefit from it.

	As is pointed out in the problem statement, one of the major problems
	with the current code base is that there is no comprehensive, top-level
	design which considers all of the major blocks of functionality that
	go into providing SCSI access. I think that's where we have to start.

	However ugly and unmaintainable it might be, though, the current code
	base somehow seems to work pretty well for most people. I think that
	there are some pretty good concepts in it, well worth keeping. I'll
	mention (yes, again) the notion of the SCDT, STDT and SPDT hierarchy.
	It works. It parallels the SCSI standard nicely. It's understandable.
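
	A minimal sketch of the three-level hierarchy mentioned above, for
	readers who have not seen it. The note does not spell out the
	acronyms; the reading used here (SPDT as a per-port table, STDT as a
	per-target table, SCDT as a per-connection table) is an assumption,
	and every class and field name below is hypothetical rather than
	taken from the driver sources.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Connection:
    """Stands in for an SCDT: one entry per target/LUN pairing (assumed)."""
    lun: int


@dataclass
class Target:
    """Stands in for an STDT: one entry per SCSI ID on a bus (assumed)."""
    scsi_id: int
    connections: Dict[int, Connection] = field(default_factory=dict)

    def connect(self, lun: int) -> Connection:
        # Reuse the existing connection for this LUN, or create one.
        return self.connections.setdefault(lun, Connection(lun))


@dataclass
class Port:
    """Stands in for an SPDT: one entry per adapter port (assumed)."""
    name: str
    targets: Dict[int, Target] = field(default_factory=dict)

    def target(self, scsi_id: int) -> Target:
        # Reuse the existing target for this SCSI ID, or create one.
        return self.targets.setdefault(scsi_id, Target(scsi_id))


# Hypothetical port name; walking port -> target -> connection
# mirrors the bus -> SCSI ID -> LUN addressing of the standard.
port = Port("PKA")
conn = port.target(3).connect(0)
```

	Each level owns only the level directly below it, which is the
	property the note credits with making the hierarchy understandable.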

	I'll bet that as people implemented major functions - say, Buzzy with
	Bus Reset or Sue with Mode Sense - they probably not only learned more
	about that functionality than even a careful reader would get from the
	standard, but they identified specific shortcomings in the current
	design. 

	One approach that I think we should consider is creating a new design
	which we know right from the start is going to encompass a lot of the
	current design. We'd still start with a clean sheet of paper, but we'd
	begin by adding to the paper those things about the current code that
	were basically good and correcting their shortcomings along the way.

	For example, the data structures mentioned above: the hierarchy makes
	sense, but what about it doesn't work well? Simple little things like
	inconsistency in naming fields have always driven me nuts - why are some
	things DEV and others DEVICE? It makes every field a special case. And
	it's not documented anywhere which bits are status bits and which are
	control bits. Who has read access to which fields? Write access? Are
	there unnecessary fields? Are there fields which are missing or in the
	wrong structure? Are logically-related fields grouped so they're close
	together when you look at them from SDA?
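
	One way the questions above could be answered in a checkable form is
	a small field registry. This is only a sketch under stated
	assumptions: the field names, the "port" and "class" component
	labels, and the status/control split are all hypothetical and are not
	taken from the existing structures.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    STATUS = "status"    # reports state; written by the owning component
    CONTROL = "control"  # requests behaviour; written by a client


@dataclass(frozen=True)
class FieldDoc:
    name: str
    kind: Kind
    writers: frozenset   # components allowed to write the field


# Hypothetical entries; note the deliberate DEV / DEVICE mismatch.
FIELDS = [
    FieldDoc("DEV_TYPE", Kind.STATUS, frozenset({"port"})),
    FieldDoc("DEVICE_FLAGS", Kind.CONTROL, frozenset({"class"})),
]


def may_write(component, field_name):
    """True if the registry lists the component as a writer of the field."""
    doc = next(f for f in FIELDS if f.name == field_name)
    return component in doc.writers


def mixed_spellings(fields, short="DEV", long="DEVICE"):
    """True if both spellings of the same word appear in field names."""
    words = {w for f in fields for w in f.name.split("_")}
    return short in words and long in words
```

	With a table like this, "who has write access?" becomes a lookup and
	the DEV versus DEVICE inconsistency becomes something a script can
	flag before a new field is added.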

	Another one: there's nothing wrong with having a queue manager - it's
	just not necessary for adapters which don't support TCQ or for adapters 
	which handle queuing themselves. So why not say we'll move the queue
	manager to the new design, but also provide a way for ports that don't
	need it to just bypass it completely.
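
	The bypass idea can be sketched as follows. This assumes a queue
	manager whose only job is to cap the number of outstanding requests;
	the real queue manager's behaviour is not described in this note, and
	all names here (Port, QueueManager, start_io, io_done, the depth
	parameter) are hypothetical.

```python
from collections import deque


class QueueManager:
    """Holds requests back so that at most `depth` are outstanding."""

    def __init__(self, send, depth):
        self._send = send
        self._depth = depth
        self._outstanding = 0
        self._waiting = deque()

    def submit(self, request):
        self._waiting.append(request)
        self._release()

    def complete(self):
        # One outstanding request finished; let the next one go.
        self._outstanding -= 1
        self._release()

    def _release(self):
        while self._waiting and self._outstanding < self._depth:
            self._outstanding += 1
            self._send(self._waiting.popleft())


class Port:
    """A port either routes I/O through a queue manager or bypasses it."""

    def __init__(self, send, use_queue_manager, depth=1):
        self._send = send
        self._qman = QueueManager(send, depth) if use_queue_manager else None

    def start_io(self, request):
        if self._qman is None:
            self._send(request)          # bypass: straight to the adapter
        else:
            self._qman.submit(request)

    def io_done(self):
        if self._qman is not None:
            self._qman.complete()


# Two ports given the same requests: one bypasses, one queues.
direct_log, queued_log = [], []
direct = Port(direct_log.append, use_queue_manager=False)
queued = Port(queued_log.append, use_queue_manager=True, depth=1)
for req in ("read-1", "read-2"):
    direct.start_io(req)
    queued.start_io(req)
```

	The choice is made once, when the port is set up, so a port that does
	not need the queue manager never touches its data or code paths.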

	As things were moved over to the new design they'd be integrated with
	whatever was already there, so we'd obviously want to deal with the
	most important, low level things first. The design would be documented
	as it took shape, and when it was complete enough to address the needs
	of each type of adapter we could work out an implementation plan. 
	Because it would include quite a bit of the current design, it may be
	possible to incrementally replace the current code with the new code.

	It's important to note that the new design would not be committed to
	salvaging everything from the current code, nor would the current code
	be the only source it could draw from - new ideas would be included as
	appropriate, and between the architecture notes file and the wish list
	of things that people would like implemented we've got a boatload of
	them. All of them could be tested against the new design ahead of time
	so we wouldn't have to retrofit any hacks.

	The criteria I'd use are:

	1) Identify the issue (Data Structures, Queue Manager, Bus Reset, etc.)
	2) How does it work now? (implementation details, not "OK I guess")
	3) Is it implemented by the current code? 
	4) If so, is it worth salvaging or is it just a complete hack?
	5) If it's worth salvaging, how could it be improved?
	6) If it isn't implemented, what would allow a clean implementation?


================================================================================
Note 59.0 Position #(pi)  re alternative approaches to dealing with SCSI get-well 1 reply
EVMS::EVERHART                                 59 lines  24-OCT-1995 16:00:11.13
--------------------------------------------------------------------------------
Alternative #pi

This plan for meeting the problem stmt is that we proceed in 2 steps.
1. Create a top level design document which describes the framework
for specific SCSI subsystem mods. It should give broad rules of thumb
and include at least the port-class interface and some statements
about data structures as well as the more generic rules of thumb about
design principles. It should not be constrained to describing the top
level design of the Zeta implementation only, but should describe a
design which is implementable incrementally from Zeta.

2. Create a series of modules which will replace pieces of the Zeta SCSI
implementation incrementally. As part of this creation, LOP statements
of need and design would be needed and it is left till those investigation
reports to decide whether functions (e.g. flow control) or code modules
(e.g. MKdriver) get replaced. The design documents for these modules
will become later chapters in the ultimate SCSI design handbook (or whatever
it gets called)

Advantages:
1. A framework document exists early in the cycle. (Indeed, we can and
should crib it from the architecture document & studies that are now
wholly or partly done with a few additions to fill in port-class detail
and maybe some more words about data structures.)

2. The principle of getting to an incremental solution is preserved.

3. The possibility of including new functions or improving lower level
details is wide open subject to the one constraint that really matters.

4. Some ("obsolescent") parts of the SCSI code base may be carried pretty
much "as is" indefinitely.

Disadvantages:
1. Any really revolutionary mods may be excluded.

2. More changes may be included than would happen if one stuck with the
Zeta code, as more ideas may be included. The ideas may be good, but the
changes will be larger and deeper.

How it addresses problems: This provides documentation for the SCSI
subsystem, and allows for growth of the documents so that design detail
from "real world" coding experience will be preserved in the documents.
The design constraint provides a path forward.

What does it involve:
	The umbrella document part can be achieved most simply if the
entire group gets involved in a wordsmithing pass over the existing
document and locates areas to be added. Some words exist for most of the
topics in Rick's note (53.1) in there, and it can be used to ensure that
rules of thumb about each of them exist. 
	The detail documents need LOP for each, and perhaps it would be
wisest to start discussing whether functions or entire components should
be replaced. I have not too many prejudices except that ultimately all
of a component will need to conform, not 50-60%, so that a pass over
components will eventually be necessary for at least those components
which are intended to be developed further.


================================================================================
Note 59.1 Position #(pi)  re alternative approaches to dealing with SCSI get-well 1 of 1
EVMS::EVERHART                                  5 lines  26-OCT-1995 08:25:49.00
                            -< Doneness criterion >-
--------------------------------------------------------------------------------
The top level document described in .0 would, I think, be considered "done"
(though subject to correction & modification) once it had externally
visible interfaces described. That'd mean certainly the port/class interface
and at least general features of data structures and could optionally
include some internal interfaces. 

================================================================================
Note 60.0          #2: Initial document + incremental changes          2 replies
STAR::S_SOMMER                                  3 lines  24-OCT-1995 17:02:26.52
--------------------------------------------------------------------------------
    The next note will describe the approach involving an initial "umbrella"
    document, plus incremental code changes and incremental expansion of the
    initial document (approach #2 we outlined today).  More to come...
================================================================================
Note 60.1          #2: Initial document + incremental changes             1 of 2
STAR::S_SOMMER                                 43 lines  24-OCT-1995 20:55:24.17
                         -< Details of this approach >-
--------------------------------------------------------------------------------
    An incremental approach, comprised of the following:

    1.  Produce a high-level design document which:
	a.)  is comprehensive in its breadth rather than in its depth,
	b.)  serves as a description of the desired future functionality 
	     of the SCSI subsystem, and
	c.)  specifies its own doneness criteria.  That is, it contains a list
	     of design-oriented projects which, when complete, would adequately
	     solve the original problem as stated in our Problem Statement.

    2.  For each major release of OpenVMS, target a reasonable number of
	projects from the above document.  In addition to these design-
	related projects, it is to be expected there will be a list of
	projects based on new features/functionality requests.  A fair mix
	of these two kinds of projects will need to be chosen;  the latter
	has a more visible and direct impact in the area of customer perception;
	the former is more subtle and indirect, but is ultimately of
        equal importance. 

    3.  As each such project is completed, it should be accompanied by a
        detailed design spec.  For projects which grow out of the original
	high-level design document, this spec should be suitable for
        inclusion as an additional chapter to the original document.  
	In this way, the document will expand slowly over the next several 
	major releases of OpenVMS.  For projects which grow out of requests
	for new functionality, their accompanying detailed spec might either
	be included as a design spec chapter, or else kept in some separate
	area, depending on the project's relevance to design issues 
        outlined in the original high-level document.
	

    Consequently, the deliverables for V7.2 would become threefold:
    a high-level document, a selection of specific projects, and a corresponding
    selection of design/functional specs to be appended to the high-level
    document (or elsewhere collected).  The high-level document is a 
    prerequisite for all ensuing project work, so most of it would have to be 
    completed several months earlier (around the February 1996 time frame, for 
    example) than the other V7.2 deliverables.  One might hope that the
    highest priority proposed projects could be specified early in the develop-
    ment of the document, so that LOP work on those specific projects could
    start even before the entire high-level document is complete (say, no later 
    than January 1996 for the start of LOP work).
 
================================================================================
Note 60.2          #2: Initial document + incremental changes             2 of 2
STAR::S_SOMMER                                 29 lines  26-OCT-1995 07:12:13.74
                               -< Pros and cons >-
--------------------------------------------------------------------------------
	(Pros and cons)
    
    	Pros:
    
    	1.  Produces documentation for a whole system.
    	2.  Allows for coding project deliverables in V7.2.
        3.  Improves maintainability and extensibility.
    	4.  Allows for new functionality projects in addition to design
    	    rework.
    	
        Cons:
        
    	1.  In early stages, documents only the future, not the present.
    	    (In a real world, I don't think we can have both though.)
    	2.  Allows the initial design to be non-detailed.  (I think a full
    	    detailed design would take at least a year.  Not only would
    	    that preclude any V7.2 coding deliverables, I'm not convinced
    	    how much more it would actually benefit us.)
    	3.  The final expanded document would be uneven in its coverage,
    	    given that it would be an overview plus perhaps a dozen
    	    detailed specs.  If it appears that important areas might be
    	    left undocumented, maybe this could be remedied by including
    	    one or two document-only projects per release.  It would be
    	    nice to have detailed design specs for our top 4 or 5 drivers
            (say, DK/MK/GK/PKQ/PKZ), while declaring others end-of-life
    	    for documentation purposes;  the good news in this area is that
    	    some of these specs partly or wholly exist already and just
    	    need a commitment to be updated.