From:   SMTP%"dhayes@hoasys.isd1.tafensw.edu.au" 22-OCT-1994 13:02:02.51
To:     EVERHART
CC:
Subj:   Re: VAXcluster FDDI problem (LONG!)

X-Newsgroups: comp.os.vms
Subject: Re: VAXcluster FDDI problem (LONG!)
Message-Id: <1994Oct21.152114.5842@hoasys>
From: dhayes@hoasys.isd1.tafensw.edu.au
Date: 21 Oct 94 15:21:14 +1000
Organization: TAFE NSW Network
Lines: 381
To: Info-VAX@Mvb.Saic.Com
X-Gateway-Source-Info: USENET

In article <32142289@MVB.SAIC.COM>, ivax@meng.ucl.ac.uk (Mark Iline - Info-VAX account) writes:
>> We have a mixed-interconnect cluster onto which we're trying to integrate
>> an FDDI ring. There are 4 main CI VAXes, 2 in one computer room and
>> another 2 in another computer room. The cluster star coupler resides in a
>> 3rd room half way between the computer rooms.
>>
>> A dual FDDI ring has been installed and tested between the computer rooms.
>> The FDDI fibres come into patch panels which in turn connect to Cabletron
>> HUBs. The HUBs are MMAC-5 chassis with the following modules installed:-
>>      MMAC-5PSM     power supply module
>>      FDMMIM-04     FDDI concentrator
>>      TPRMIMM-20    10BaseT adapter
>>      EMME-6        Ethernet module
>>
>> A (very) rough of the setup is as follows (NB: CI not shown).
>>
>>           10BaseT                          10BaseT
>>      +-+    +-+                              +-+     +-+
>>      | |----| |  +---+                +---+  | |-----| | Satellite
>>      +-+    | |  |   |                |   |  | |     +-+    +-+
>>   Satellite | |--| H |                | H |--| |------------| |Satellite
>>             +-+  |   |   +--------+   |   |  +-+            +-+
>>                  | U |---|  FDDI  |---| U |
>>                  |   |   +--------+   |   |--------------+
>>         +-----+  | B |                | B |  +-----+     |
>>         |Conc.|--|   |                |   |--|Conc.|     |
>>         +-----+  +---+                +---+  +-----+    +++
>>         |     |                              |     |    | | FDDI/Ethernet
>>         |     |                              |     |    +++ bridge
>>       +---+ +---+                          +---+ +---+   |
>>       |VAX| |VAX|                          |VAX| |VAX|   |
>>       |   | |   |                          | * | |   |   |
>>       +---+ +---+                          +---+ +---+   |
>>         |     |             +-+              |     |     |
>> --------+-----+-------------| |--------------+-----+-----+-------
>>  Ethernet                   +-+                    Ethernet
>>                            bridge
>>
>>
>> The main VAXes have DEMFA FDDI cards and run with LRPSIZE=4541 as per DEC
>> recommendations.
>> The satellites connect to the 10BaseT modules and have LRPSIZE=default,
>> again as per DEC recommendations. We're running VMS V5.5-2 (fully
>> patched!).
>>
>> The main purpose of the above is to pilot FDDI (ie. show that it works and
>> is stable) and use it to offload LAVc protocol from our Ethernet. Later on
>> we will look to transferring DECnet, LAT & TCP/IP from Ethernet to FDDI.
>>

Check what protocols are running over the FDDI and Ethernet adapters.
Although you make the comment that you are only testing FDDI for LAVC at
the moment, are you sure that nothing else is using the FDDI?

For example, we found (if I remember rightly) that under V5.5-1 LAT
defaulted to Ethernet, but under V5.5-2 it defaulted to FDDI. With the
LATmaster patch, LAT will use the DECnet ethernet address, EVEN IF DECnet
is not running on that device. In such a case, you may actually find that
LAT is talking via the FDDI adapter using the DECnet address AND DECnet is
using the Ethernet adapter with that same DECnet ethernet address. This
will REALLY SCREW UP your bridges' forwarding tables.

To check which protocols are being used:

        $ anal/sys
        SDA> SHOW LAN

and to see the actual physical ethernet address being used by each
protocol use:

        SDA> SHOW LAN/FULL

There is one other important difference between ethernet and FDDI. On
ethernet, the LAVC protocol is the first to initialize the ethernet adapter
and starts out using the hardware address of 08-00-... LAVC sets a flag
indicating that it permits the address to be changed, which is what happens
when DECnet starts. Ethernet adapters apparently only support a single
physical address, so the LAVC protocol switches to using the AA-00-...
address. An FDDI adapter, by contrast, can apparently support multiple
simultaneous physical addresses, and even after DECnet is initialized on
the FDDI adapter, the LAVC protocol continues to use the hardware address
of 08-00-... This can be seen from the "SDA> SHOW LAN/FULL" command.
Hence, to make sure that everything works in a bridged network, you must be
very careful which protocol you put on which adapter.

Attached to the end of this posting is a copy of a posting I did just
yesterday for another person having trouble with LAT and FDDI. It contains
a list of patches that we applied when we were at the BLEEDING edge of FDDI
over a year ago.

>> In the above configuration the FDDI/Ethernet bridge is clever enough to
>> realise that it doesn't need to transfer any LAVc packets - all cluster
>> nodes can effectively see each other via FDDI and the HUBs.
>
> Unless you've deliberately set this up like this, this isn't right. All LAN
> adapters will listen to the multicasts from all other LAN adapters. This
> is what gives you the multiple channels.
>
> -------------------------------------------- FDDI
>   |                  |                   |
> VAX A       FDDI/ethernet bridge       VAX B
>   |                  |                   |
> -------------------------------------------- Ethernet
>
> There is one Virtual Circuit between VAX A & VAX B - the LAN circuit,
> between ports PEA0 & PEA0. However, there are 4 channels that support this
> circuit. FDDI on A to FDDI on B; FDDI on A to ethernet on B; Ethernet on A
> to FDDI on B; Ethernet on A to Ethernet on B. Both PEA0s will listen on all
> these channels, but pick a single preferred channel to transmit on. You
> would expect it to pick the FDDI to FDDI channels in both directions.
>
> Hence, VAXcluster (SCS) traffic should be flowing over the bridge, even if
> it's only the multicasts.
>

Agree. LAVC sends multicasts out over all virtual circuits every 3 seconds.
The other cluster members take note of which circuit they received the
packet on first, which implies it is the fastest, and use this for
communicating with that node.

>>
>> The configuration remains stable a short while... then the satellites have
>> communication problems with the CI node marked "*".
>> The satellites lose CNXMAN connection with the offending VAX and
>> re-establish them via the Ethernet instead (with the unfortunate result
>> that the entire LAVc traffic is back on the Ethernet again and we're back
>> to square one). Although the problem appears isolated to mainly one CI VAX
>> it has been seen occasionally on the other CI VAXes.
>
> There's an inconsistency here. The only way the satellites can talk to the
> ethernet adapters on VAX *, is through the FDDI/Ethernet bridge. If this
> bridge is, as you say, not forwarding LAVC packets, this can't happen.
>
> Also, just because the satellites are talking to VAX * via its ethernet
> port, it doesn't follow that they are talking to the other VAXes over the
> FDDI/ethernet bridge. The selection of which channel to transmit on is made
> on a per virtual circuit basis.
>
> In fact, looking at your diagram, if everything were functioning correctly
> except the FDDI port on VAX *, you shouldn't necessarily see a performance
> problem, because the ethernet joining the large VAXes would be carrying
> less traffic than the 10baseT ethernet that both satellites are connected
> to.
>
>>
>> We can use LAVC$STOP_BUS to disable LAVc on the main VAX Ethernet cards but
>> the configuration isn't stable - all satellites continually get CNXMAN lost
>> connection and re-established connection; the configuration only stabilises
>> when the CI VAXes can talk LAVc on their Ethernets again.
>>
>> Can anyone help? All ideas, comments, suggestions, flaming, whatever is
>> welcome! Please post either on the newsgroup or (preferably) to myself
>> direct on the internet at green@grey.sps.mot.com.
>
> What I'd do is to determine which channels the various nodes are talking to
> each other over. If it's only VAX * that is having problems with FDDI, it
> may be that it has a duff adaptor/cabling.
>
> Here's part of a posting by Chris Lishka that may be useful.
>
>
>    There are four channels from the local node to the remote
>    node:
>            Local Adapter       Remote Adapter
>            -------------       --------------
>        (1) FDDI            ->  FDDI
>        (2) FDDI            ->  Ethernet
>        (3) Ethernet        ->  Ethernet
>        (4) Ethernet        ->  FDDI
>
>    The virtual circuit from the local node to remote node
>    includes all four channels, and load balances between them.
>    One channel is marked as "preferred"; others are "alive" or
>    "dead". As long as one channel in the VC is alive, the VC
>    will work. If all four channels are dead, then the VC is
>    dead and the local node will lose contact with the remote
>    node.
>
>    The method to verify which channels are working:
>
>
>    * Run ANALYZE/SYSTEM. This will run the System Dump
>      Analyzer (SDA) on the running system.
>
>    * First issue the command "SHOW PORT". For some reason this
>      step is necessary.
>
>    * Next, issue the command "SHOW PORT/VC=VC_XXXXXX" where
>      XXXXXX is the name of the >remote< node. The first page
>      of output will show characteristics and counters related
>      to the virtual circuit.
>
>    * Press RETURN to see the second page, which shows a summary
>      of >all< channels in the virtual circuit. It also lists
>      the status of each channel (preferred, alive, or
>      dead). The first column of this output is important, as
>      it lists the channel address associated with each channel
>      (which you will use below).
>
>    * To see statistics on each channel, issue the command "SHOW
>      PORT/CHANNEL=address", where "address" is a channel
>      address from above. This will show characteristics of the
>      particular channel, including the adapter being used on
>      the local node and remote node. Look at the fields "Lcl
>      Device", "Rmt Device", and "Rmt Name" to glean this
>      information. By using "SHOW PORT/CHANNEL=..." on each
>      channel you can map out which channels are alive, dead,
>      and which is preferred.
>
> I have used this method to watch the channels in VCs between our AXPs
> with FDDI+ethernet and our VAX 9000.
> Typically (but not always) the FDDI->FDDI channel is preferred.
>
>
>
> One thing this does suggest to me, is that if you have multiple functional
> channels, you shouldn't be seeing the CNXMAN message about losing
> connection. I would expect the failover from one channel to another not to
> disrupt the VC. I may be wrong, though.
>
> Anyway, look at your preferred channels, and see if it gives you any
> pointers to what's going on.
>
>
> Mark Iline                      system@meng.ucl.ac.uk
> Dept Mech Eng, University College, London. UK
>
> Read at your own risk.
>
>

Again, check that your bridges aren't getting confused by seeing the same
ethernet address on 2 different LAN segments.

Below is a copy of the posting I referred to regarding LAT & FDDI problems
that I made yesterday.

Hope this helps in some way.

David.

-----------------------------------------------------------------------------
David Hayes                             david.hayes@tafensw.edu.au
TAFE-NSW                                Wrk: +61 2 950 1679
Australia
Opinions expressed are my own, and not those of my employer.
----------------------------------------------------------------------

X-NEWS: hoasys comp.os.vms: 13624
Path: hoasys!dhayes
From: dhayes@hoasys.isd1.tafensw.edu.au
Newsgroups: comp.os.vms,comp.sys.dec
Subject: Re: LAT Problem
Message-ID: <1994Oct20.093219.5828@hoasys>
Date: 20 Oct 94 09:32:18 +1000
References: <1994Oct14.165458.1@uncvx1.oit.unc.edu>
Organization: TAFE NSW Network
Lines: 116
Xref: hoasys comp.os.vms:13624 comp.sys.dec:7704

In article <1994Oct14.165458.1@uncvx1.oit.unc.edu>, murrell@uncvx1.oit.unc.edu writes:
>
> Problem:    LAT comes up on our FDDI controller and allows periodic
>             connections that are usually aborted/dropped. Although
>             we have an Ethernet controller as well, we are unable
>             to delete LAT$LINK and create it again using /DEVICE=ETA0:
>             LATCP says "no such device available."
>
> Equipment:  VAX 6620 running VMS 5.5-2 with controllers --- ethernet
>             (BNA-0) and FDDI (MFA-0) running thru two independent
>             Cisco routers to the campus network.
>
> History: Before there were problems, we were running VMS 5.5 off
> ethernet only and all communication (DECNET, LAT, TCP/IP) performed
> flawlessly. We added the FDDI controller to do experimenting on a local
> loop as part of planning for Campus-wide fiber. TCP/IP and DECNET were
> both happy but we were unable to force LAT over the FDDI. This wasn't
> a real problem since all of our LAT connections could be made via our
> ethernet controller anyway. A few days ago we upgraded VMS from VMS 5.5
> to VMS 5.5-2 in order to meet the minimal level to install motif for
> a large application. Both Decnet and Tcp/Ip are still happy but we
> now have a problem with LAT on BOTH controllers.
>
> What We've Tried: When we examine known circuits from ncp we see the
> following:
>    Circuit          State                   Loopback     Adjacent
>                                               Name      Routing Node
>    BNA-0            on                                   39.470 (OITKID)
>    MFA-0            on                                   39.471
>    MFA-0                                                 39.470 (OITKID)
>
> where 39.471 is the address of the "other" Cisco box which is set up as a
> router. However, when we issue an ANAL/SYSTEM followed by SHOW LAN, we see
> ONLY the FDDI device! At the same time, "Show Device E" shows me ETA0 and
> "Show Device F" shows me FXA0!
> We called DEC and swapped the ethernet controller without impacting the
> problem. We also upgraded the console microcode to 1.06 (VMS 5.5-2 requires
> a minimum of V 1.01) to no avail.
>
> Question: Has anyone ever seen or heard of anything like this or have any
> clue as to what is happening?

We installed FDDI on our cluster over 18 months ago and had numerous
problems. It was not pleasant being on the bleeding edge of technology,
particularly since the only nodes we had on FDDI were our VAX 9000
production cluster!

My memory is a bit vague but I will try my best...
We received numerous patches of FXDRIVER, LAT & LAST to attempt to rectify
our problems. Some of these fixes were incorporated into VMS V5.5-2, and
some were not.

1. Under VMS V5.5, when LAT is started on an ethernet controller that does
   not have DECnet running on it, it continues to use the hardware address
   (08-00...). Under VMS V5.5-2, they changed LAT so that it uses the
   DECnet address (AA-00...), even if DECnet is not started. Therefore, if
   you try to run DECnet over FDDI and LAT over ethernet, your
   bridges/routers start to get very confused because they see the same
   ethernet address on two different LAN segments. While this feature was
   not documented in the VMS V5.5-2 release notes, it IS documented in the
   VMS V6.0 Release Notes, section 2.4.11.3, which states categorically
   that "YOU MUST RUN LAT & DECNET OVER THE SAME CONTROLLER IF SCSSYSTEMID
   IS NOT 0" if the Ethernet and FDDI segments are bridged.

2. To force LAT to work over FDDI and ignore ethernet, simply define
   LAT$DEVICE to be FXA in SYLOGICALS. We also define LAST$DEVICE the same
   way.

        $ DEFINE/SYS/EXEC LAT$DEVICE FXA
        $ DEFINE/SYS/EXEC LAST$DEVICE FXA

   (Note that we had problems with LAST (I think) if you defined it to
   point to FXA0.)

3. We received numerous patches for FXDRIVER. The version we are running
   today is X-24. This was a special copy of the image as it wasn't part
   of any CSCPAT kit when we got it. It probably is by now but I don't
   know the kit number.

4. We also have CSCPAT_0296015 installed with multiple LAST fixes.
   Although this kit says it applies to V5.5-1, it needed to be re-applied
   to V5.5-2. Again, there is probably a more recent kit. (LAST had nasty
   habits of crashing the machine without this patch!)

5. I assume you already have CSCPAT_0511 V3.4 or later as this has PAGES
   of LAT fixes, and MUST be re-applied after you upgrade to V5.5-2.

6. We also upgraded the DEMFA FIRMWARE from V1.3 to V1.4, but since yours
   is most likely a recent purchase you probably already have V1.4 or even
   V2.0.

7. I notice in your notes above that you have both MFA-0 and BNA-0
   circuits enabled. As mentioned in point 1 above, having 2 adapters both
   using the same DECnet ethernet address can cause problems. Even though
   it may appear to be working, close examination may find that some
   packets are being forwarded to your ethernet segment and some to the
   FDDI segment, as the bridge/router switches its forwarding depending
   upon which segment it last received a packet from that address. Only
   run DECnet over a single device.

        NCP> CLEAR CIRCUIT BNA-0 ALL
        NCP> PURGE CIRCUIT BNA-0 ALL
        NCP> CLEAR LINE BNA-0 ALL
        NCP> PURGE LINE BNA-0 ALL

I hope this helps.

Regards, (and Good Luck!)

David Hayes

-----------------------------------------------------------------------------
David Hayes                             david.hayes@tafensw.edu.au
TAFE-NSW                                Wrk: +61 2 950 1679
Australia
Opinions expressed are my own, and not those of my employer.
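
The forwarding flip-flop described in point 7 can be sketched with a toy
learning bridge. The Python below is only a rough illustration and is not
part of the original posting: LearningBridge, decnet_phase_iv_mac and the
port names "fddi" and "ethernet" are hypothetical, and a real bridge also
ages its entries and floods frames for unknown destinations. It assumes the
usual DECnet Phase IV convention of AA-00-04-00 followed by the 16-bit value
area*1024+node, low byte first, and uses 39.470 from the quoted NCP listing
purely as sample input.

```python
"""Toy model of a learning bridge that sees one MAC address on two segments.

Illustrative only: LearningBridge and decnet_phase_iv_mac are hypothetical
names, and the port labels are made up for this sketch.
"""


def decnet_phase_iv_mac(area: int, node: int) -> str:
    """Return the AA-00-04-00-xx-yy address derived from a DECnet area.node."""
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1..63 and node 1..1023")
    value = area * 1024 + node              # 16-bit DECnet node address
    low, high = value & 0xFF, value >> 8
    return "AA-00-04-00-%02X-%02X" % (low, high)   # low-order byte first


class LearningBridge:
    """Remembers, per source address, the port a frame last arrived on."""

    def __init__(self) -> None:
        self.table = {}   # MAC address -> port name
        self.moves = 0    # how often a known address changed port

    def learn(self, src_mac: str, port: str) -> None:
        previous = self.table.get(src_mac)
        if previous is not None and previous != port:
            self.moves += 1
        self.table[src_mac] = port

    def port_for(self, dst_mac: str):
        """Port a frame for dst_mac is sent to (None means not learned yet)."""
        return self.table.get(dst_mac)


if __name__ == "__main__":
    mac = decnet_phase_iv_mac(39, 470)   # sample address from the NCP listing
    bridge = LearningBridge()
    # LAT on one adapter and DECnet on the other, both sending from one address.
    for port in ["fddi", "ethernet", "fddi", "ethernet"]:
        bridge.learn(mac, port)
        print(mac, "->", bridge.port_for(mac))
    print("address moved", bridge.moves, "times")
```

In this model every frame from the "other" adapter moves the table entry,
so traffic for that address follows whichever adapter spoke last. With both
protocols on one device the entry stays on a single port.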