MSCP Failover Scenario
Glenn C. Everhart
10-Apr-1996

(The following represents an idea about how to handle MSCP failover, partly using techniques being developed for use with HSZ failover.)

Problem:

When a new device is added to a shared SCSI bus, a system can detect it once the device is configured in. At that time, the system makes the device available locally, and may also MSCP serve it to the cluster. The system(s) at the other end of the shared SCSI bus may then see the MSCP served path to the disk appear before they can configure a direct path, and thus wind up using the disk via MSCP serving rather than directly. At boot, configuration can check busses first, avoiding this, but when a new device is added later, this can create problems. Moreover, if the SCSI connection fails, it is desirable to be able to continue device access via the MSCP served path.

Discussion:

The first part of the problem would be tractable if locally the MSCP server could determine that the device was served on a shared bus to which it had access. This could mean that a port allocation class was known to be shared and some devices of that class existed, but best would be if the port (the $alloclass$dkC combination, where C is the port character) could be determined to be a local shared bus. If the SCSI port drivers recorded the fact that they responded to a target mode inquiry, this might be a way to pass the information around.

A second way might be to create a lock of the form alloclass$letter and pass it around the cluster once a shared bus was detected. Each node, when it found the lock, would check the alloclass and the port number (held in the lock value block; in binary there is room for both, each up to a full longword and more) and flag the bus as shared in some structure (the SPDT?). A new MSCP disk appearing could then be tested for port number and alloclass. If the local port descriptors (if any) matched these values, the MSCP server's image of the disk would not be enabled, but would be treated as a non-disk.
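
The lock-based check described above can be sketched as follows. This is a minimal illustration in Python, not OpenVMS kernel code. The layout of the lock value block (two little-endian longwords, alloclass then port number, padded to 16 bytes), the mapping of the port character to a number (A=0, B=1, ...), the device-name pattern, and every function and class name here are assumptions made for the example; the memo does not specify them.

```python
import re
import struct

LVB_SIZE = 16  # bytes in the lock value block assumed for this sketch

def pack_lvb(alloclass, port):
    """Pack alloclass and port number as two longwords (assumed layout)."""
    return struct.pack("<II", alloclass, port).ljust(LVB_SIZE, b"\0")

def unpack_lvb(lvb):
    """Recover (alloclass, port) from a value block built by pack_lvb."""
    return struct.unpack_from("<II", lvb)

# Hypothetical name pattern: $alloclass$DKC<unit>, where C is the port character.
_NAME = re.compile(r"^\$(\d+)\$DK([A-Z])\d*:?$", re.IGNORECASE)

def parse_device(name):
    """Return (alloclass, port) for a matching device name, else None."""
    m = _NAME.match(name)
    if m is None:
        return None
    return int(m.group(1)), ord(m.group(2).upper()) - ord("A")

class SharedBusTable:
    """Stand-in for the per-node 'bus is shared' flags (the memo suggests the SPDT)."""

    def __init__(self):
        self._shared = set()

    def note_lock(self, lvb, local_ports):
        """Flag the bus as shared if the lock names a port this node also has."""
        key = unpack_lvb(lvb)
        if key in local_ports:
            self._shared.add(key)

    def treat_as_nondisk(self, mscp_name):
        """True if a newly seen MSCP disk sits on a locally attached shared bus."""
        key = parse_device(mscp_name)
        return key is not None and key in self._shared
```

A node with local ports (2, A) and (2, B) that finds a lock for alloclass 2, port B would call `note_lock(pack_lvb(2, 1), {(2, 0), (2, 1)})`; afterwards `treat_as_nondisk("$2$DKB300:")` is true, while disks on other ports or allocation classes are left enabled.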

Once a second node has configured the new shared device, perhaps the same server that handles HSZ failover could be notified of the device name and could attach a SWDRIVER unit to switch between the direct path (the primary) and the MSCP served path (the alternate). The MSCP served path would not be lost or destroyed, just made unused by setting the device to be a non-disk device and renaming it locally (only!), so that the SWDRIVER switching code could still access and use it. Meanwhile, the direct path would be used.

The server could delay briefly while awaiting the MSCP path if needed. The switch-over would be done when, or if, the active path enters mount verify and fails; at that point the path would be switched. Flushing out I/O would be enforced by the logic that enters mount verify, which tries to idle all outstanding I/O.

The HSZ failover server could host the locking logic here also, so that additional kernel logic could be avoided. Instead, the failover server could also scan devices on startup and communicate with anything else visible. All that would be needed would be some global structure and code for DUdriver to search; reserving a bit of space in the SPDT may be enough. The DUdriver init path would have to do the checking mentioned above.
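
The switch-over behaviour described above (direct path as primary, MSCP served path as alternate, switching only when the active path fails mount verify) can be modelled with a small sketch. It is a toy model in Python under stated assumptions, not SWDRIVER code: the class name, its methods, the path names, and the explicit `outstanding_io` counter are all hypothetical, and the idle-I/O check simply stands in for the guarantee the memo expects from mount verify.

```python
class SwitchedUnit:
    """Hypothetical model of one switching unit fronting two paths to a disk."""

    def __init__(self, direct):
        self.direct = direct        # primary: direct SCSI path
        self.served = None          # alternate: MSCP served path, may arrive later
        self.active = direct        # the direct path is used first
        self.failed = set()         # paths that have failed mount verify
        self.outstanding_io = 0     # mount verify is expected to drive this to 0

    def attach_served(self, served):
        """Record the locally renamed MSCP path once it becomes visible."""
        self.served = served

    def mount_verify_failed(self):
        """Active path entered mount verify and failed: try the other path.

        Returns True if a usable path was selected, False if none remains.
        """
        if self.outstanding_io:
            raise RuntimeError("I/O must be idle before switching paths")
        self.failed.add(self.active)
        for path in (self.direct, self.served):
            if path is not None and path not in self.failed:
                self.active = path
                return True
        return False
```

For example, a unit created with `SwitchedUnit("DKB300")` and given an alternate via `attach_served("MSCP_DKB300")` keeps using `DKB300` until `mount_verify_failed()` is called, at which point `active` becomes `MSCP_DKB300`. A second failure returns False because no path is left.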