INFO-VAX Wed, 10 Sep 2008 Volume 2008 : Issue 497 Contents: Can't read unzipped Monitor files Re: Did Windows just cry "Uncle"? How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? Re: How do I diagnose a server that crashes every night? 
Re: Intermittent RWSCS state Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: Loose Cannon-dian Re: OT: Message to Mr VAXman Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: OT: The end of the world in roughly 3 hours Re: Pipe search of command procedure log file containing pipe search command. co Re: Pipe search of command procedure log file containing pipe search command. co Pipe search of command procedure log file containing pipe search command. Re: Security alarm msg Spinning down an old disk array Re: Spinning down an old disk array Re: Spinning down an old disk array Re: Spinning down an old disk array ---------------------------------------------------------------------- Date: Wed, 10 Sep 2008 10:04:17 -0700 (PDT) From: "James J. 
O'Shea"
Subject: Can't read unzipped Monitor files
Message-ID: <456249.34286.qm@web83906.mail.sp1.yahoo.com>

I am not able to read Monitor files after unzipping them:

$ MON/NODISPLAY SYSTEM/ALL/SUM=X.SUM/INPUT=MONITOR-DATA-NONPRIME-2008-08-12.DAT
%MONITOR-E-CLASMISS, requested class record missing from /INPUT file

I've tried changing the attributes and using FDL files but I'm not able
to find the right combination.

The original file has RFM:Var, MRS:32765, LRL:32760.
After zipping, then unzipping, the file has RFM:STMLF, MRS:0, LRL:0.

I'm running OpenVMS 8.3 on an ES45; Info-Zip Zip v2.3, Info-Zip Unzip v5.52.

Has anyone run into this problem?

Thanks,
Jim O'Shea
Chicago, IL

------------------------------

Date: Wed, 10 Sep 2008 07:24:42 -0700 (PDT)
From: DaveG
Subject: Re: Did Windows just cry "Uncle"?
Message-ID: <4ec652bc-256b-4c10-bf36-989b5e374b3f@c58g2000hsc.googlegroups.com>

On Sep 8, 6:56 pm, VAXman- @SendSpamHere.ORG wrote:
> In article <0ea19636-ef44-4d9f-bc02-c10375be6...@d77g2000hsb.googlegroups.com>, AEF writes:
>
> >On Sep 8, 6:13 pm, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
> >remove CLOTHES to reply) wrote:
> >> In article
> >> ,
>
> >> yyyc186 writes:
> >> > LONDON (Reuters) - The London Stock Exchange (LSE:LSE.L - News)
> >> > suffered its worst systems failure in eight years on Monday, forcing
> >> > the world's third largest share market to suspend trading for about
> >> > seven hours and infuriating its users.
>
> >> They probably lost more revenue due to that outage than the move from
> >> VMS to Windows "saved" them.  It's not just immediate revenue which was
> >> lost, but people remembering this when deciding to do business with the
> >> LSE or one of their competitors who run VMS.  (They might not know they
> >> run VMS, but they will know if there were any comparable outages in the
> >> last few years.)
>
> >> > Weren't there a whole bunch of adds a while back about how London when
> >> > with Windows and that worthless Oracle product for their new trading
> >> > engine?
>
> >> Indeed.  I think it looked scary from the inside.
>
> >What's an "add"? Did you mean "advertisements"? That would be "ad",
> >not "add".
>
> He used M$ spell checker! ;)
>
> --
> VAXman- A Bored Certified VMS Kernel Mode Hacker      VAXman(at)TMESIS(dot)COM
>
> ... pejorative statements of opinion are entitled to constitutional protection
> no matter how extreme, vituperous, or vigorously expressed they may be. (NJSC)
>
> Copr. 2008 Brian Schenkenberger.  Publication of _this_ usenet article outside
> of usenet _must_ include its contents in its entirety including this copyright
> notice, disclaimer and quotations.

My opinion on this - won't matter much.  Windows continues to make
progress and flourish, warts and all, with Linux (free is good) in
close pursuit.  Reminds me of a quote from P.T. Barnum: "I don't care
what they say about me, as long as they spell my name right."

------------------------------

Date: Wed, 10 Sep 2008 01:52:37 -0700 (PDT)
From: StraightEight
Subject: How do I diagnose a server that crashes every night?
Message-ID: <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>

Hi,

I have very little VMS experience but we have inherited a nice shiny
new alphaserver 250 to support (OK, it's not very shiny or new!) which
is located in the middle of the sea.

EVERY night without fail, this server is crashing and restarting
itself. I'd really like to get to the bottom of this as I am being
called every morning at 3am to log in and start some services which
don't seem to launch at startup despite being in the startup file ("ahh
it's always been that way...")

Below is the FATAL BUGCHECK which I suspect is causing the machine to
reboot.
The process which appears to be crashing is key to this
server's functionality. This is a relatively new problem as this box
has run itself for the past 10 years.

I have no idea where to even begin determining the source of the
problem from this. Is anyone able to give me any pointers as to what I
should be looking for, what information I will need, and how to make
sense of it all? My VMS knowledge, as I say, is extremely limited, so
any commands would be useful and appreciated also. Nice to see if
anyone can help!

Thanks
str8

******************************* ENTRY 435. *******************************
ERROR SEQUENCE 432.                 LOGGED ON:  CPU_TYPE 00000006
DATE/TIME 10-SEP-2008 05:27:25.87               SYS_TYPE 0000000D
SYSTEM UPTIME: 1 DAYS 00:55:22
SCS NODE: PIN01                                 OpenVMS AXP V6.2-1H3

HW_MODEL: 00000000 Hardware Model = 0.

FATAL BUGCHECK                      AlphaStation 250 4/266

    MACHINECHK, Machine check while in kernel mode

    PROCESS NAME    BLYSEM_I1
    PROCESS ID      0001001F

    ERROR PC        FFFFFFFF 800485F8

    Process Status = 20000000 00001F04, SW = 00, Previous Mode = KERNEL
    System State = 01, Current Mode = KERNEL
    VMM = 00 IPL = 31, SP Alignment = 32

STACK POINTERS

KSP 00000000 7FF91EE0  ESP 00000000 7FF96000  SSP 00000000 7FF9C100
USP 00000000 7EE7D390

GENERAL REGISTERS

R0  00000000 00000002  R1  00000000 0000940A  R2  FFFFFFFF 80C2DB50
R3  FFFFFFFF 80C04D98  R4  00000000 00000048  R5  00000000 00001F04
R6  00000000 00000000  R7  00000000 00000001  R8  00000000 7FF9C1F8
R9  00000000 7FF9C400  R10 00000000 7FF9D228  R11 00000000 7FFBE3E0
R12 00000000 00000000  R13 FFFFFFFF 8326B910  R14 00000000 00000000
R15 00000000 7EE7D498  R16 00000000 00000215  R17 00000000 00000001
R18 00000000 00000001  R19 00000000 00000000  R20 FFFFFFFF FFFFFFF8
R21 00000000 00000017  R22 00000000 00000100  R23 FFFFFFFF 80E08368
R24 FFFFFFFF 80E08000  R25 00000000 00000003  R26 00000000 00000210
R27 FFFFFFFF 80C34D60  R28 FFFFFFFF 8003B9C4  FP  00000000 7FF91EE0
SP  00000000 7FF91EE0  PC  FFFFFFFF 800485F8  PS  20000000 00001F04

SYSTEM REGISTERS

PTBR         00000000 00001F19  Page Table Base Register
PCBB         00000000 0414A080  Privileged Context Block Base
PRBR         FFFFFFFF 80E0A000  Processor Base Register
VPTB         00000002 00000000  Virtual Page Table Base Register
SCBB         00000000 000001A1  System Control Block Base
SISR         00000000 00000000  Software Interrupt Summary Register
ASN          00000000 0000003B  Address Space Number
ASTSR_ASTEN  00000000 0000000F  AST Summary/AST Enable
FEN          00000000 00000001  Floating-Point Enable
IPL          00000000 0000001F  Interrupt Priority Level
MCES         00000000 00000008  Machine Check Error Summary

------------------------------

Date: Wed, 10 Sep 2008 09:13:42 +0000 (UTC)
From: gartmann@nonsense.immunbio.mpg.de (Christoph Gartmann)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

In article <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>, StraightEight writes:
>I have very little VMS experience but we have inherited a nice shiny
>new alphaserver 250 to support (ok its not very shiny or new!) which
>is located in the middle of the sea.
>
>EVERY night without fail, this server is crashing and restarting
>itself. I'd really like to get to the bottom of this as I am being
>called every morning at 3am to log in and start some services which
>don't seem launch at startup despite being in the startup file ("ahh
>it's always been that way...")
>
>Below is the FATAL BUGCHECK which I suspect is causing the machine to
>reboot.

Correct, a FATAL BUGCHECK results in a crash of the system.

>The process which appears to be crashing is key to this
>servers functionality. This is a relatively new problem as this box
>has run itself for the past 10 years.

So the first question is: was anything changed on this system?

>I have no idea where to even begin determining the source of the
>problem from this. Is anyone able to give me any pointers as to what I
>should be looking for, what information I will need, and how to make
>sense of it all?

Have a look in sys$common:[syserr] for files named CLUE$*.LIS.
In addition see the online help for "ANALYZE/ERROR". Also, is there
anything in sys$manager:operator.log?

Regards,
   Christoph Gartmann

--
 Max-Planck-Institut fuer      Phone   : +49-761-5108-464   Fax: -80464
 Immunbiologie
 Postfach 1169                 Internet: gartmann@immunbio dot mpg dot de
 D-79011  Freiburg, Germany
          http://www.immunbio.mpg.de/home/menue.html

------------------------------

Date: Wed, 10 Sep 2008 05:46:54 -0400
From: JF Mezei
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <48c79805$0$1550$c3e8da3@news.astraweb.com>

StraightEight wrote:

> EVERY night without fail, this server is crashing and restarting
> itself. I'd really like to get to the bottom of this as I am being
> called every morning at 3am to log in and start some services which

Does it crash at exactly the same time every night? Or does it vary?
Any relationship with actual operations being done related to that
machine? Or does it crash when some link goes down and the code just
doesn't handle this properly?

> I have no idea where to even begin determining the source of the
> problem from this.

It would help to provide more background on what the application is. Is
it some COBOL app that just prints an accounting report, or is it some
real time application that controls a drilling rig?
What sort of stuff is connected to that machine using what sort of
protocol?

In terms of services not starting when it boots and needing to be
started manually, you would need to look at the SYSTARTUP_VMS.COM file
in the SYS$MANAGER directory and take a careful look at it. The output
normally just goes on the operator console, so if you are not on site,
you will have a hard time seeing error messages.

However, if you start a service by submitting a batch job, then there
should be a log file that contains some information on why the service
didn't start.

If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
can add /OUTPUT=logfile.log to the command

eg:
@disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log

Then, you could consult the log file later on to find out why the
application didn't start.

Remember that some services take some time to become available, so on a
faster machine, you might be trying to start your app before TCPIP is
fully available for instance, and the app would fail. But later on when
you log in to fix the problem, TCPIP would be available and the app
would start properly.

------------------------------

Date: Wed, 10 Sep 2008 02:48:49 -0700 (PDT)
From: StraightEight
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <973f56c2-f6a1-4a09-8d79-37ae2645ad00@e53g2000hsa.googlegroups.com>

Thanks for a quick reply. Here are my findings.

> So the first question is: was anything changed on this system?

No, as far as I am aware, this server has been running unchanged
for several years.

> Have a look in sys$common:[syserr] for files named CLUE$*.LIS

If I look at these files there are pages of information, but I am
unsure just what I need to be looking at!

> In addition see the online help for "ANALYZE/ERROR". In addition, is there
> anything in sys$manager:operator.log?

This mostly just contains our Telnet requests. Here's one I
spotted... do you know what this means? Sometimes when we try to
connect by telnet to the server we see "No License for the Active
Product" (or something along those lines). Could it be something as
simple as a licensing problem, or is this a red herring?

%%%%%%%%%%%  OPCOM   9-SEP-2008 04:32:26.73  %%%%%%%%%%%
Message from user SYSTEM on PIN01
%LICENSE-E-TERM, C ALL-IL-1997NOV26-2136 License has terminated

Thanks!

------------------------------

Date: Wed, 10 Sep 2008 06:03:55 -0400
From: JF Mezei
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <48c79bd7$0$9666$c3e8da3@news.astraweb.com>

StraightEight wrote:

> %%%%%%%%%%%  OPCOM   9-SEP-2008 04:32:26.73  %%%%%%%%%%%
> Message from user SYSTEM on PIN01
> %LICENSE-E-TERM, C ALL-IL-1997NOV26-2136 License has terminated

This is the C compiler licence.

The command:

SHOW LICENSE will give you a list of active licences on that node.
LICENSE LIST will give you a list of registered licences.

(SHOW LICENSE will exclude expired licences or licences that aren't
valid for this node.)

It is possible that you have 2 C licences, the "real" one and some
temporary one which has expired.

Not having the C compiler licence would not cause problems running
programs. It would only affect the invocation of the C compiler (CC
command). Programs compiled with this compiler will run fine without
the licence.

------------------------------

Date: Wed, 10 Sep 2008 03:18:31 -0700 (PDT)
From: StraightEight
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <00b3b7e9-af02-40ec-b9bf-0e063ac036b5@e53g2000hsa.googlegroups.com>

On 10 Sep, 10:46, JF Mezei wrote:

> Does it crash at exactly the same time every night? Or does it vary?
> Any relationship with actual operations being done related to that
> machine? Or does it crash when some link goes down and the code just
> doesn't handle this properly?

There doesn't really seem to be any pattern: some nights it restarts
just once, some nights it can happen up to 4 times.

> It would help to provide more background on what the application is. Is
> it some COBOL app that just prints an accounting report, or is it some
> real time application that controls a drilling rig?
> What sort of stuff is connected to that machine using what sort of
> protocol?

The file BLYSEM, I'm sure, is a software interface to a Bailey INFI900
DCS (so real time data acquisition on a rig, as guessed!) for OSI PI
software. The volume of data this handles has probably increased over
the years... could a capacity problem knock a service over?

> In terms of services not starting when it boots and needing to be
> started manually, you would need to look at the SYSTARTUP_VMS.COM file
> in the SYS$MANAGER directory and take a careful look at it.  The output
> normally just goes on the operator console, so if you are not on site,
> you will have a hard time seeing error messages.
>
> However, if you start a service by submitting a batch job, then there
> should be a log file that contains some information on why the service
> didn't start.
>
> If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
> can add /OUTPUT=logfile.log to the command
>
> eg:
> @disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log
>
> Then, you could consult the log file later on to find out why the
> application didn't start.
>
> Remember that some services take some time to become available, so on a
> faster machine, you might be trying to start your app before TCPIP is
> fully available for instance, and the app would fail. But later on when
> you log in to fix the problem, TCPIP would be available and the app
> would start properly.

Thanks for the tips, I think I will try the output switch and see what
is logged. It's a good hunch at the end... the call to the service is
at the very end of the startup file. Would each line in the startup
file wait until it is executed before moving to the next, or does it
just fire off all the commands at once? Now I think about it, the last
time we caught the error very early there was still a batch job
running... maybe we should call it at the end of this batch job?

Thanks for your response!

------------------------------

Date: Wed, 10 Sep 2008 11:47:21 +0100
From: "Richard Brodie"
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

"StraightEight" wrote in message
news:6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com...

> I have no idea where to even begin determining the source of the
> problem from this.

MACHINECHK, Machine check while in kernel mode suggests hardware.

There may be other entries in the error log as well as the bugcheck,
which may give more detail. Looking at the CLUE files in sys$errorlog,
particularly the _collect.dat, may help nail down common features.

------------------------------

Date: Wed, 10 Sep 2008 03:52:51 -0700 (PDT)
From: Bob Gezelter
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <2799177a-4711-497a-81aa-782372e23021@f63g2000hsf.googlegroups.com>

On Sep 10, 6:18 am, StraightEight wrote:
> On 10 Sep, 10:46, JF Mezei wrote:
>
> > Does it crash at exactly the same time every night? Or does it vary?
> > Any relationship with actual operations being done related to that
> > machine? Or does it crash when some link goes down and the code just
> > doesn't handle this properly?
>
> Doesn't really seem to be any pattern, some nights it restarts just
> once, some nights it can happen up to 4 times.
>
> > It would help to provide more background on what the application is. Is
> > it some COBOL app that just prints an accounting report, or is it some
> > real time application that controls a drilling rig?
> > What sort of stuff is connected to that machine using what sort of
> > protocol?
>
> The file BLYSEM i'm sure is a software interface to a Bailey INFI900
> DCS (so real time data aquisition on a rig as guessed!) for OSI PI
> software. The volume of data this handles has probably increased over
> the years...could a capacity problem knock a service over?
>
> > In terms of services not starting when it boots and needing to be
> > started manually, you would need to look at the SYSTARTUP_VMS.COM file
> > in the SYS$MANAGER directory and take a careful look at it.  The output
> > normally just goes on the operator console, so if you are not on site,
> > you will have a hard time seeing error messages.
>
> > However, if you start a service by submitting a batch job, then there
> > should be a log file that contains some information on why the service
> > didn't start.
>
> > If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
> > can add /OUTPUT=logfile.log to the command
>
> > eg:
> > @disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log
>
> > Then, you could consult the log file later on to find out why the
> > application didn't start.
>
> > Remember that some services take some time to become available, so on a
> > faster machine, you might be trying to start your app before TCPIP is
> > fully available for instance, and the app would fail. But later on when
> > you log in to fix the problem, TCPIP would be available and the app
> > would start properly.
>
> Thanks for the tips, I think I will try the output switch and see what
> is logged. It's a good hunch at the end....the call to the service is
> at the very end of the startup file, would each line in the startup
> file wait until it is executed before moving to the next, or does it
> just fire off all the commands at once? Now I think about it, the last
> time we caught the error very early there was still a batch job
> running...maybe we should call it at the end of this batch job?
>
> Thanks for your response!

str8,

I do not have an Alpha CPU manual handy, so I will restrict this set
of comments to the other issues raised. However, one important
question is whether the machine check error information is the same on
every crash.

Being responsible for a system with little or no documentation can be
a significant challenge. I have seen this kind of situation often when
getting called into a site which has been without a good system
manager for a while. It is common to find things "broken" that, in
effect, were never working correctly.
Not having failed noticeably does not
mean that there was no issue; the issue may simply never have been
severe enough to be noticed.

While the overall STARTUP process is capable of parallel operation,
each individual command file is executed sequentially (using the
parallel execution features can speed restarts substantially, as I
noted in my presentation "SYSMAN for Improved Restart Performance" at
the Fall 1999 US DECUS symposium; slides available via
http://www.rlgsc.com/decus/usf99/index.html ). Most likely, the
parallel execution features were not used in this case.

If processes that are supposed to start during a restart do not in
fact start, and the requests to start them are in the system startup
file, the most common reason for the failure is a small typographical
error made when editing the startup file. If there is an error, the
startup file will exit, with only a transiently visible message on the
console.

Typically, there are two ways to resolve this:

1) extremely close inspection of the startup file (generally
   SYS$MANAGER:SYSTARTUP_VMS.COM), or
2) enable logging of the startup sequence using the SYSMAN STARTUP
   OPTIONS/OUTPUT=FILE command [the latter creates the file
   SYS$SPECIFIC:[SYSEXE]STARTUP.LOG].

Reviewing the log file generated often clarifies precisely what
messages were scrolled rapidly off the screen. I often leave
unattended systems in the FILE setting so that problems can be
resolved after the fact.

One important recommendation is to make sure that there is a good
backup of the system disk, and that a log is kept of any changes to
any of the files.

It goes without saying that at some point, it may be wise to retain
outside experienced assistance to examine the problem [Disclosure: our
firm does provide consulting services in this area].

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 04:05:04 -0700 (PDT)
From: Bob Gezelter
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <9074c354-2557-4638-961e-7860857e3485@s50g2000hsb.googlegroups.com>

str8,

I should note that it is also possible that the Machine Check and the
active process are, in effect, not related.

The last client system that was having machine checks turned out to be
caused by an erratic power supply. The power supply worked well when
it was working, but it was apparently having problems. The fact that
the system in question would appear to be in a somewhat industrial
setting raises the question of whether there is an external power or
grounding event that is the underlying cause of the Machine Check.

If there is a UPS involved, there could also be a problem there.

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: 10 Sep 2008 06:49:24 -0500
From: clubley@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

In article <9074c354-2557-4638-961e-7860857e3485@s50g2000hsb.googlegroups.com>, Bob Gezelter writes:
> str8,
>
> I should note that it is also possible that the Machine Check and the
> active process are, in effect, not related.
>
> The last client system that was having machine checks turned out to be
> caused by an erratic power supply. The power supply worked well when
> it was working, but it was apparently having problems. The fact that
> the system in question would appear to be in a somewhat industrial
> setting raises the question of whether there is an external power or
> grounding event that is the underlying cause of the Machine Check.
>
> If there is a UPS involved, there could also be a problem there.
>

The OP should also be aware that although a machine check is usually a
hardware issue, it can be caused by a faulty device driver as well.

Personal experience here: I have caused VMS to issue machine checks while
I have been developing VMS device drivers in the past.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980's technology to a 21st century world

------------------------------

Date: Wed, 10 Sep 2008 05:34:58 -0700 (PDT)
From: StraightEight
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

> The last client system that was having machine checks turned out to be
> caused by an erratic power supply. The power supply worked well when
> it was working, but it was apparently having problems. The fact that
> the system in question would appear to be in a somewhat industrial
> setting raises the question of whether there is an external power or
> grounding event that is the underlying cause of the Machine Check.
>
> If there is a UPS involved, there could also be a problem there.

I think you could be onto something here... funnily enough we _had_
two VMS servers, and one came back to be repaired (power supply
problem!). Now mentioning UPS gets me wondering: if there is indeed a
UPS (I'll need to check), I would imagine both servers would come off
the same UPS... maybe machine 1 never had problems with its power
supply after all!

Definitely something to rule out (and perhaps, in light of recent
experiences, something I should have considered straight away!)

Many thanks.

------------------------------

Date: 10 Sep 2008 07:52:17 -0500
From: koehler@eisner.nospam.encompasserve.org (Bob Koehler)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

In article <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>, StraightEight writes:
> Hi,
>
> I have very little VMS experience but we have inherited a nice shiny
> new alphaserver 250 to support (ok its not very shiny or new!) which
> is located in the middle of the sea.

   I would think _very_ seriously about contracting a consultant who
   knows VMS.

------------------------------

Date: Wed, 10 Sep 2008 07:15:22 -0700 (PDT)
From: DaveG
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <3360773d-860d-4145-8509-21752f00e75a@m73g2000hsh.googlegroups.com>

On Sep 10, 7:52 am, koeh...@eisner.nospam.encompasserve.org (Bob
Koehler) wrote:
> In article <6df72036-9e6f-4a79-96cf-a841020f7...@l64g2000hse.googlegroups.com>, StraightEight writes:
>
> > Hi,
>
> > I have very little VMS experience but we have inherited a nice shiny
> > new alphaserver 250 to support (ok its not very shiny or new!) which
> > is located in the middle of the sea.
>
>    I would think _very_ seriously about contracting a consultant who
>    knows VMS.

With the OP mentioning that the system was located out to sea
somewhere, I wonder what might be happening to power and/or other
environmental stuff during non-daylight hours?

------------------------------

Date: Wed, 10 Sep 2008 10:59:02 -0400
From: "Richard B. Gilbert"
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID:

StraightEight wrote:
> Hi,
>
> I have very little VMS experience but we have inherited a nice shiny
> new alphaserver 250 to support (ok its not very shiny or new!) which
> is located in the middle of the sea.
>
> EVERY night without fail, this server is crashing and restarting
> itself. I'd really like to get to the bottom of this as I am being
> called every morning at 3am to log in and start some services which
> don't seem launch at startup despite being in the startup file ("ahh
> it's always been that way...")
>
> Below is the FATAL BUGCHECK which I suspect is causing the machine to
> reboot. The process which appears to be crashing is key to this
> servers functionality. This is a relatively new problem as this box
> has run itself for the past 10 years.
>
> I have no idea where to even begin determining the source of the
> problem from this. Is anyone able to give me any pointers as to what I
> should be looking for, what information I will need, and how to make
> sense of it all? My VMS knowledge as I say is extremely limited, so
> any commands would be useful and appreciated also. Nice to see if
> anyone can help!
>
> Thanks
> str8
>
> ******************************* ENTRY 435. *******************************
> ERROR SEQUENCE 432.                 LOGGED ON:  CPU_TYPE 00000006
> DATE/TIME 10-SEP-2008 05:27:25.87               SYS_TYPE 0000000D
> SYSTEM UPTIME: 1 DAYS 00:55:22
> SCS NODE: PIN01                                 OpenVMS AXP V6.2-1H3
>
> HW_MODEL: 00000000 Hardware Model = 0.
>
> FATAL BUGCHECK                      AlphaStation 250 4/266
>
>     MACHINECHK, Machine check while in kernel mode
>
>     PROCESS NAME    BLYSEM_I1
>     PROCESS ID      0001001F
>
>     ERROR PC        FFFFFFFF 800485F8
>
>     Process Status = 20000000 00001F04, SW = 00, Previous Mode = KERNEL
>     System State = 01, Current Mode = KERNEL
>     VMM = 00 IPL = 31, SP Alignment = 32
>
> STACK POINTERS
>
> KSP 00000000 7FF91EE0  ESP 00000000 7FF96000  SSP 00000000 7FF9C100
> USP 00000000 7EE7D390
>
> GENERAL REGISTERS
>
> R0  00000000 00000002  R1  00000000 0000940A  R2  FFFFFFFF 80C2DB50
> R3  FFFFFFFF 80C04D98  R4  00000000 00000048  R5  00000000 00001F04
> R6  00000000 00000000  R7  00000000 00000001  R8  00000000 7FF9C1F8
> R9  00000000 7FF9C400  R10 00000000 7FF9D228  R11 00000000 7FFBE3E0
> R12 00000000 00000000  R13 FFFFFFFF 8326B910  R14 00000000 00000000
> R15 00000000 7EE7D498  R16 00000000 00000215  R17 00000000 00000001
> R18 00000000 00000001  R19 00000000 00000000  R20 FFFFFFFF FFFFFFF8
> R21 00000000 00000017  R22 00000000 00000100  R23 FFFFFFFF 80E08368
> R24 FFFFFFFF 80E08000  R25 00000000 00000003  R26 00000000 00000210
> R27 FFFFFFFF 80C34D60  R28 FFFFFFFF 8003B9C4  FP  00000000 7FF91EE0
> SP  00000000 7FF91EE0  PC  FFFFFFFF 800485F8  PS  20000000 00001F04
>
> SYSTEM REGISTERS
>
> PTBR         00000000 00001F19  Page Table Base Register
> PCBB         00000000 0414A080  Privileged Context Block Base
> PRBR         FFFFFFFF 80E0A000  Processor Base Register
> VPTB         00000002 00000000  Virtual Page Table Base Register
> SCBB         00000000 000001A1  System Control Block Base
> SISR         00000000 00000000  Software Interrupt Summary Register
> ASN          00000000 0000003B  Address Space Number
> ASTSR_ASTEN  00000000 0000000F  AST Summary/AST Enable
> FEN          00000000 00000001  Floating-Point Enable
> IPL          00000000 0000001F  Interrupt Priority Level
> MCES         00000000 00000008  Machine Check Error Summary

Well, it says "Machine Check" and that generally means a hardware
problem of some sort. If you have a service contract, just pick up the
phone and call for help. If not, get prior approval from whoever pays
the bills and then pick up the phone and call for help!

It might also help to try to find out what else happens every morning
at 3:00 AM. The fact that the timing is consistent suggests that it's
something happening in the environment that triggers the machine check.

------------------------------

Date: Wed, 10 Sep 2008 08:03:28 -0700 (PDT)
From: Volker Halle
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <4b119f0b-2778-4e0a-b9e4-014583a6cc4a@79g2000hsk.googlegroups.com>

If you see MACHINECHK crashes, think of hardware problems first.

There should be errlog entries immediately preceding the system
crash. Find those and analyze them.

$ ANAL/ERR/SINCE=<1-minute-before-system-crash>

or look at the errors in the dump:

$ ANAL/CRASH SYS$SYSTEM
SDA> CLUE ERRLOG
SDA> EXIT

You may need to install DECevent V3.4 ( $ DIAGNOSE command ) to
translate those errors to meaningful text.

---
Volker Halle, Invenate GmbH, OpenVMS Support
An OpenVMS crashdump analysis a day makes the Windows headaches go
away.

------------------------------

Date: Wed, 10 Sep 2008 08:05:51 -0700 (PDT)
From: Bob Gezelter
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <88aba1d6-9b21-4db0-8774-a9606c9cdc5c@m45g2000hsb.googlegroups.com>

On Sep 10, 6:49 am, clubley@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley) wrote:
> In article <9074c354-2557-4638-961e-7860857e3...@s50g2000hsb.googlegroups.com>, Bob Gezelter writes:
>
> > str8,
>
> > I should note that it is also possible that the Machine Check and the
> > active process are, in effect, not related.
>
> > The last client system that was having machine checks turned out to be
> > caused by an erratic power supply. The power supply worked well when
> > it was working, but it was apparently having problems. The fact that
> > the system in question would appear to be in a somewhat industrial
> > setting raises the question of whether there is an external power or
> > grounding event that is the underlying cause of the Machine Check.
>
> > If there is a UPS involved, there could also be a problem there.
>
> The OP should also be aware that although a machine check is usually a
> hardware issue, it can be caused by a faulty device driver as well.
>
> Personal experience here: I have caused VMS to issue machine checks while
> I have been developing VMS device drivers in the past.
>
> Simon.
>
> --
> Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
> Microsoft: Bringing you 1980's technology to a 21st century world

Simon,

Indeed. When one is not careful running in Kernel mode, particularly at interrupt level, all kinds of strange results can ensue, for all kinds of reasons.

My favorite was a problem on an early version of a third-party J-11-based product: there was no RESET control; it was presumed that PowerFail could do it. I pointed out that there were many situations in which PowerFail would cause a problem if there was not a valid kernel stack pointer. Oops.
- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 11:21:59 -0400
From: "Jilly"
Subject: Re: Intermittent RWSCS state
Message-ID: <48c7e624$0$21316$ec3e2dad@unlimited.usenetmonster.com>

You really need to look at the credit waits from the viewpoint of all the nodes. But from NODE_A POV there is an overload in talking to VAX_C. You can look at the SYSGEN parameter CLUSTER_CREDITS and set it to the max of 128 (not sure what version or platform this is available on). Additionally review the Int. Stack usage on VAX_C, as all lock requests get serviced on the Int. Stack of the CPU handling the SCS interface. Review the MONITOR DLOCK output and see if VAX_C is doing more incoming lock requests than the other nodes. For recent vintage Alpha VMS versions, you can change the interrupt CPU on a multi-CPU system.

Review which systems 1st create the resources and review LOCKDIRWT and PE1 to see if you are limiting the movement of lock mastership. Also when rebooting systems in a cluster remember to reboot the desired lock master 1st and then the other systems after. If the lock master node has been rebooted and the other nodes have not, it is likely that a number of resources will be mastered on non-optimal nodes. When booting a cluster, boot the desired lock master 1st and the other nodes after.

You could also be hitting the FDDI bandwidth ceiling if there is enough SCS traffic, but that is unlikely in this cluster.

RWSCS means that there is a delay for this process's lock request, which has to be serviced by another node. As has been said, occasional RWSCS states are normal and expected. Persistent RWSCS states point to a delay in the locking path, and that includes the physical SCS medium (FDDI in your case), speed & load on the involved nodes (Int. Stack etc.) and whether lock mastership is being handled by the 'ideal' node for the resource involved.
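
[Editor's sketch, not part of the original post.] The checks suggested above could be run as a short DCL session along these lines. It is a hedged sketch: it assumes the parameter and field names below exist on the VMS version in use (the post itself notes that CLUSTER_CREDITS is not available everywhere), and that the CR_WAITS field of the SHOW CLUSTER CONNECTIONS class is the credit-wait counter being referred to. Run it on every cluster member and compare the results node by node.

```dcl
$ ! Sketch only: display the parameters and counters named in the post.
$ ! Nothing here changes a setting; parameter changes belong in MODPARAMS.DAT/AUTOGEN.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW CLUSTER_CREDITS
SYSGEN> SHOW LOCKDIRWT
SYSGEN> SHOW PE1
SYSGEN> EXIT
$ !
$ ! Distributed lock traffic: compare incoming vs. outgoing rates across nodes.
$ MONITOR DLOCK
$ !
$ ! Time spent in interrupt state/stack on the node under suspicion.
$ MONITOR MODES
$ !
$ ! Per-connection credit waits (assumed field name: CR_WAITS).
$ SHOW CLUSTER/CONTINUOUS
Command> ADD CONNECTIONS
Command> ADD CR_WAITS
```

A credit-wait count that keeps climbing on the connection to one node, together with high interrupt-state time on that node, would be consistent with the overload described above; a single snapshot is not enough to tell.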
------------------------------ Date: 10 Sep 2008 13:08:49 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6ipv70Frs1niU2@mid.individual.net> In article , "Tom Linden" writes: > On Tue, 09 Sep 2008 09:53:33 -0700, Bill Gunshannon > wrote: > >> It's a poor workman who blames his tools. > > It is a diletante that uses inferior tools Tell that to all the people using Java. :-) bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: 10 Sep 2008 13:14:49 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6ipvi9Frs1niU3@mid.individual.net> In article , "Tom Linden" writes: > On Tue, 09 Sep 2008 09:54:53 -0700, Bill Gunshannon > wrote: > >> In article , >> "Tom Linden" writes: >>> On Tue, 09 Sep 2008 06:55:43 -0700, Bob Koehler >>> wrote: >>> >>>> In article , "Tom >>>> Linden" >>>> writes: >>>>> On Tue, 09 Sep 2008 05:17:28 -0700, wrote: >>>>> >>>>>> While there is much in what you say, your case is not helped by >>>>>> demonstrably dubious claims such as "There is nothing to show that >>>>>> "security" was the underlying principle in everything VMS did any >>>>>> more >>>>>> than that Unix didn't consider it at all". There's plenty of evidence >>>>>> if you look with open eyes. Native VMS code's widespread use of >>>>>> descriptors for varying-length items encourages careful programming >>>>>> and has no equivalent in Windows or any Unix I've seen (since V7, Sys >>>>>> V, and BSD4.1, I've seen a few). >>>>> >>>>> Descriptors are not part of the OS but a feature of the compilers, and >>>>> the >>>>> concept really came out of languages like PL/I and Algol, we call them >>>>> dope vectors. >>>> >>>> The use of descriptors for many of the OS APIs is part of the OS. 
>>>> >>> Don't wish to nitpick, but it is the selection of compilers supporting >>> such >>> constructs that is part of the OS. Languages deficient in such >>> constructs >>> were enhanced to provide that capability. OS's like Multics, Primos, >>> VOS, >>> MVS-z/os Burroughs were written in languages in which such constructs >>> are >>> an integral part of the language. >>> >> Primos? A bunch of that was written in Fortran IV. :-) > That is true, but from 18 on forward it moistly all PLP Not really. I maintained Rev 19 systems and we still had FTN and PMA. I used to have a copy but it is long gone now. I never got to work with Rev 20 so I can't say if they redid all of it by that point. But I was being very toungue-in-cheek. It was mostly PL/I code (PLP and PL/I Subset G) and a lot of fun to work with. It is as much a shame that Primos didn't really survive as it will be when VMS fades away. I expect both to experience the same fate. That is, just as Primos is still in use this long after its demise, so will VMS continue to be used long after its owners have given up the ghost. One can only hope that there will be someone to pick up the ball and run with it when that time comes as was the case with Primos. bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: 10 Sep 2008 13:17:50 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6ipvnuFrs1niU4@mid.individual.net> In article , Michael Kraemer writes: > Bob Koehler schrieb: >> In article , Michael Kraemer writes: >> >>>johnwallace4@yahoo.co.uk schrieb: >>> >>> >>>>If you want to compare OSes not in common use then maybe comparing an >>>>SELinux setup with a VMS setup is appropriate, but that still leaves >>>>VMS mostly ahead (others may obviously disagree). 
>>> >>>AFAIK: >>>Ordinary VMS has C2 security. SEVMS (sp ?) has B1. >>>Ordinary Unices have C2. Their "Trusted" variants have B1. >>> >>>So where's the difference ? The difference should be obvious. More people prefer Unix. :-) >> >> >> Where C2 and B1 don't go. >> > > That's pretty much nowhere land. > Are there widely accepted certifications beyond > orange book ? The rainbow books are being replaced by things like Common Criteria. Checkout sites like NIST and DISA for information on modern security requirements. DISA is a very good source as they even have papers and scripts to make securing systems, even Windows, very doable. bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: 10 Sep 2008 13:19:05 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6ipvq9Frs1niU5@mid.individual.net> In article , Michael Kraemer writes: > Tom Linden schrieb: > >> Yes, the Common Criteria E1 thru E6 > > And where on that scale is VMS ? Unless things have changed, VMS's owners have made no attempt to get rated according to Common Criteria. bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: 10 Sep 2008 13:23:40 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6iq02sFrs1niU6@mid.individual.net> In article , koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes: > In article , Michael Kraemer writes: >> >> That's pretty much nowhere land. >> Are there widely accepted certifications beyond >> orange book ? > > Nowhere? C2, B1, ..., all were written by some folks based on thier > limited knowledge and thier specific needs. 
There are a lot of other > legitimate security concerns. > > For example, Windows got a C2 rating at one time, based on > limitations like no network, no floppies, ... > > So what good is a system if you can't enter or retrive data? Those ratings are for operational systems. What need is there for a network connection or floppies on a system running a power plant? One can take the system offline, connect a floppy, load and install needed upgrades and then remove the floppy, recertify and return to production as a C2 system. When one looks at things in terms of IS's instead of just a Windows box this stuff makes a lot more sense. But then, when you are so totally biased against MS, you become blind to reality. bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: 10 Sep 2008 13:41:01 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6iq13cFs3tqnU1@mid.individual.net> In article , "Tom Linden" writes: > On Wed, 10 Sep 2008 06:14:49 -0700, Bill Gunshannon > wrote: > >> In article , >> "Tom Linden" writes: >>> On Tue, 09 Sep 2008 09:54:53 -0700, Bill Gunshannon >>> >>> wrote: >>> >>>> In article , >>>> "Tom Linden" writes: >>>>> On Tue, 09 Sep 2008 06:55:43 -0700, Bob Koehler >>>>> wrote: >>>>> >>>>>> In article , "Tom >>>>>> Linden" >>>>>> writes: >>>>>>> On Tue, 09 Sep 2008 05:17:28 -0700, >>>>>>> wrote: >>>>>>> >>>>>>>> While there is much in what you say, your case is not helped by >>>>>>>> demonstrably dubious claims such as "There is nothing to show that >>>>>>>> "security" was the underlying principle in everything VMS did any >>>>>>>> more >>>>>>>> than that Unix didn't consider it at all". There's plenty of >>>>>>>> evidence >>>>>>>> if you look with open eyes. 
Native VMS code's widespread use of >>>>>>>> descriptors for varying-length items encourages careful programming >>>>>>>> and has no equivalent in Windows or any Unix I've seen (since V7, >>>>>>>> Sys >>>>>>>> V, and BSD4.1, I've seen a few). >>>>>>> >>>>>>> Descriptors are not part of the OS but a feature of the compilers, >>>>>>> and >>>>>>> the >>>>>>> concept really came out of languages like PL/I and Algol, we call >>>>>>> them >>>>>>> dope vectors. >>>>>> >>>>>> The use of descriptors for many of the OS APIs is part of the OS. >>>>>> >>>>> Don't wish to nitpick, but it is the selection of compilers supporting >>>>> such >>>>> constructs that is part of the OS. Languages deficient in such >>>>> constructs >>>>> were enhanced to provide that capability. OS's like Multics, Primos, >>>>> VOS, >>>>> MVS-z/os Burroughs were written in languages in which such constructs >>>>> are >>>>> an integral part of the language. >>>>> >>>> Primos? A bunch of that was written in Fortran IV. :-) >>> That is true, but from 18 on forward it moistly all PLP >> >> Not really. I maintained Rev 19 systems and we still had FTN and PMA. >> I used to have a copy but it is long gone now. I never got to work with >> Rev 20 so I can't say if they redid all of it by that point. >> >> But I was being very toungue-in-cheek. It was mostly PL/I code (PLP and >> PL/I Subset G) and a lot of fun to work with. It is as much a shame that >> Primos didn't really survive as it will be when VMS fades away. I expect >> both to experience the same fate. That is, just as Primos is still in >> use >> this long after its demise, so will VMS continue to be used long after >> its owners have given up the ghost. One can only hope that there will >> be someone to pick up the ball and run with it when that time comes as >> was the case with Primos. > > Does anyone maintain it? Yes. As a matter of fact, I donated my home Prime system to one of the people who is still licensed to maintain Primos. 
That was several years ago and he drove out here from Ohio to get it. I still keep in touch with a handful of the Prime people. It was a very nice machine although a little strange sometimes. bill -- Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves billg999@cs.scranton.edu | and a sheep voting on what's for dinner. University of Scranton | Scranton, Pennsylvania | #include ------------------------------ Date: Wed, 10 Sep 2008 07:36:53 -0700 (PDT) From: johnwallace4@yahoo.co.uk Subject: Re: Loose Cannon-dian Message-ID: On Sep 10, 2:23 pm, billg...@cs.uofs.edu (Bill Gunshannon) wrote: > In article , > koeh...@eisner.nospam.encompasserve.org (Bob Koehler) writes: > > > In article , Michael Kraemer writes: > > >> That's pretty much nowhere land. > >> Are there widely accepted certifications beyond > >> orange book ? > > > Nowhere? C2, B1, ..., all were written by some folks based on thier > > limited knowledge and thier specific needs. There are a lot of other > > legitimate security concerns. > > > For example, Windows got a C2 rating at one time, based on > > limitations like no network, no floppies, ... > > > So what good is a system if you can't enter or retrive data? > > Those ratings are for operational systems. What need is there for a > network connection or floppies on a system running a power plant? > > One can take the system offline, connect a floppy, load and install > needed upgrades and then remove the floppy, recertify and return to > production as a C2 system. > > When one looks at things in terms of IS's instead of just a Windows > box this stuff makes a lot more sense. But then, when you are so > totally biased against MS, you become blind to reality. > > bill > > -- > Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves > billg...@cs.scranton.edu | and a sheep voting on what's for dinner. 
> University of Scranton | > Scranton, Pennsylvania | #include Power plants are more networked than you seem to think, in order to (for example) automate the process of matching electricity generation against electricity demand in something approaching real time (this kind of thing used to be done by phone but the PHBs prefer things like this to be automated). And then there's also the wandering contractor with a potentially-infected laptop connected to the (maybe isolated) plant network on one side, and (maybe) via a 3G phone to the Internerd on the other side. Depending on the technologies used, this can make them more vulnerable than you seem to think, and almost certainly more vulnerable than they were prior to Windows monoculture. If the plant network is designed to be isolated when operational, it will likely still have essential Window boxes on it in places, so where will those boxes get their daily AV updates, monthly Windows updates, occasional application updates? A network connection or a removable media sneakernet, perhaps? Isolated but out of date (and requiring downtime for each update), or up to date and vulnerable. Take your pick. Perhaps you missed the GAO report in May this year which had 92 specific suggestions for IT/SCADA security improvements at the Tennessee Valley Authority (you've heard of them?) and recommendations for "best practice" elsewhere? 
GAO report: http://www.gao.gov/new.items/d08526.pdf Sample "IT" media coverage: http://www.theregister.co.uk/2008/05/22/electrical_grid_vulnerable/ ------------------------------ Date: 10 Sep 2008 15:47:08 GMT From: billg999@cs.uofs.edu (Bill Gunshannon) Subject: Re: Loose Cannon-dian Message-ID: <6iq8frF5ldU1@mid.individual.net> In article , johnwallace4@yahoo.co.uk writes: > On Sep 10, 2:23 pm, billg...@cs.uofs.edu (Bill Gunshannon) wrote: >> In article , >> koeh...@eisner.nospam.encompasserve.org (Bob Koehler) writes: >> >> > In article , Michael Kraemer writes: >> >> >> That's pretty much nowhere land. >> >> Are there widely accepted certifications beyond >> >> orange book ? >> >> > Nowhere? C2, B1, ..., all were written by some folks based on thier >> > limited knowledge and thier specific needs. There are a lot of other >> > legitimate security concerns. >> >> > For example, Windows got a C2 rating at one time, based on >> > limitations like no network, no floppies, ... >> >> > So what good is a system if you can't enter or retrive data? >> >> Those ratings are for operational systems. What need is there for a >> network connection or floppies on a system running a power plant? >> >> One can take the system offline, connect a floppy, load and install >> needed upgrades and then remove the floppy, recertify and return to >> production as a C2 system. >> >> When one looks at things in terms of IS's instead of just a Windows >> box this stuff makes a lot more sense. But then, when you are so >> totally biased against MS, you become blind to reality. > > Power plants are more networked than you seem to think, in order to > (for example) automate the process of matching electricity generation > against electricity demand in something approaching real time (this > kind of thing used to be done by phone but the PHBs prefer things like > this to be automated). I just used that as an example as it is one that shows up here. 
If, as you say, networking is required then obviously it either wouldn't be C2 or wouldn't be Windows. I was just trying to show that not having those things in production did not mean they could not be available in a C2 rated IS.

> And then there's also the wandering contractor
> with a potentially-infected laptop connected to the (maybe isolated)
> plant network on one side,

The statement was C2 + Windows = "no network" so, not a problem. Obviously, a lot more goes into maintaining C2 systems than your home PC but it is done every day.

> and (maybe) via a 3G phone to the Internerd
> on the other side.
>
> Depending on the technologies used, this can make them more vulnerable
> than you seem to think, and almost certainly more vulnerable than they
> were prior to Windows monoculture. If the plant network is designed to
> be isolated when operational, it will likely still have essential
> Window boxes on it in places, so where will those boxes get their
> daily AV updates, monthly Windows updates, occasional application
> updates?

You missed the most important point. "No Network". Obviously, C2 rated systems do not get "daily AV updates, monthly Windows updates, occasional application updates" in the same manner as your home PC. Tell me something: can you get to any of the PCs currently being used by the military in Iraq? Do you think they are not running Windows? Do you think they don't get kept up to date for things like AV and Windows Updates?

> A network connection or a removable media sneakernet,
> perhaps? Isolated but out of date (and requiring downtime for each
> update), or up to date and vulnerable. Take your pick.

If it is not connected to the outside world in any way and it only runs one task, vulnerable to what? You guys really need to change your mindset and accept that there are secure Windows Systems running all over the world. I know, I just had to go back to school (again) to have my skills refreshed on how this is being done.
>
> Perhaps you missed the GAO report in May this year which had 92
> specific suggestions for IT/SCADA security improvements at the
> Tennessee Valley Authority (you've heard of them?) and recommendations
> for "best practice" elsewhere?

Don't know anything about TVA but I doubt C2 is one of their requirements for an IS. And that was what was being discussed.

>
> GAO report: http://www.gao.gov/new.items/d08526.pdf
> Sample "IT" media coverage: http://www.theregister.co.uk/2008/05/22/electrical_grid_vulnerable/

bill

--
Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves
billg999@cs.scranton.edu | and a sheep voting on what's for dinner.
University of Scranton |
Scranton, Pennsylvania | #include

------------------------------

Date: Wed, 10 Sep 2008 03:53:30 -0400
From: JF Mezei
Subject: Re: OT: Message to Mr VAXman
Message-ID: <48c77d70$0$1537$c3e8da3@news.astraweb.com>

Another article about Scientology for Mr VAXman:

(Scientology using the DMCA to force YouTube to take down videos that are critical of their business/sect/whatever)

http://arstechnica.com/news.ars/post/20080908-scientology-fights-critics-with-4000-dmca-takedown-notices.html

------------------------------

Date: Wed, 10 Sep 2008 09:10:53 +0200
From: Michael Kraemer
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

JF Mezei schrieb:
> ... and lets hope that
> they doN't rely on Windows to run it.

I wouldn't hold my breath.

------------------------------

Date: Wed, 10 Sep 2008 09:57:42 +0200
From: Joseph Huber
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

Michael Kraemer wrote:
> JF Mezei schrieb:
>
>> ... and lets hope that
>> they doN't rely on Windows to run it.
>
> I wouldn't hold my breath.
>
>

Well yes, they do, and we do: see my snapshot of a small part of our liquid argon calorimeter detector control system
http://wwwvms.mppmu.mpg.de/~huber/atlas_lar_ready_for_beam.jpg
waiting for beam ...
-- Joseph Huber - http://www.huber-joseph.de ------------------------------ Date: Wed, 10 Sep 2008 10:07:42 +0200 From: Michael Kraemer Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: Joseph Huber schrieb: > Michael Kraemer wrote: > >> JF Mezei schrieb: >> >>> ... and lets hope that >>> they doN't rely on Windows to run it. >> >> >> I wouldn't hold my breath. >> >> > > Well yes, they do, and we do: > see my snapshot of a small part of our liquid argon calorimeter > detector control system > http://wwwvms.mppmu.mpg.de/~huber/atlas_lar_ready_for_beam.jpg > waiting for beam ... Well, that's just a snapshot of a particular experiment, CERN is large, and not too long ago, the mantra of their IT bosses was that WindozeNT will take over everything. What are the accelerator controls running on ? ------------------------------ Date: Wed, 10 Sep 2008 01:12:54 -0700 (PDT) From: IanMiller Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: <131f1dfd-249d-4783-ac9e-4a46d9c049f0@d45g2000hsc.googlegroups.com> Follow the excitement at http://www.bbc.co.uk/radio4/bigbang/ ------------------------------ Date: Wed, 10 Sep 2008 10:34:00 +0200 From: Joseph Huber Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: Michael Kraemer wrote: > Well, that's just a snapshot of a particular experiment, CERN is large, > and not too long ago, the mantra of their IT bosses was that > WindozeNT will take over everything. > What are the accelerator controls running on ? > Well I'm not near enough to know. The hard core is certainly running on hard realtime systems, the controls levels above are a mixture of Windows and Linux systems. There is a common SCADA system in use both for LHC and detector control , which runs on Windows and Linux. Windows has not taken over everything, but in admin and engineering almost. 
--
Joseph Huber - http://www.huber-joseph.de

------------------------------

Date: Wed, 10 Sep 2008 02:09:17 -0700 (PDT)
From: johnwallace4@yahoo.co.uk
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <8095b927-dfc6-466e-9601-3fccce13018e@k30g2000hse.googlegroups.com>

On Sep 10, 9:34 am, Joseph Huber wrote:
> Michael Kraemer wrote:
> > Well, that's just a snapshot of a particular experiment, CERN is large,
> > and not too long ago, the mantra of their IT bosses was that
> > WindozeNT will take over everything.
> > What are the accelerator controls running on ?
>
> Well I'm not near enough to know. The hard core is certainly running on
> hard realtime systems, the controls levels above are a mixture of
> Windows and Linux systems.
> There is a common SCADA system in use both for LHC and detector control
> , which runs on Windows and Linux.
>
> Windows has not taken over everything, but in admin and engineering almost.
>
> --
>
> Joseph Huber -http://www.huber-joseph.de

"Common SCADA system" = National Instruments Labview, right ? As per http://sine.ni.com/cs/app/doc/p/id/cs-10795 ?

There are lots of folks who'd consider themselves in the SCADA industry who wouldn't class Labview as a SCADA package, but...

------------------------------

Date: Wed, 10 Sep 2008 05:15:03 -0400
From: JF Mezei
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <48c79086$0$12394$c3e8da3@news.astraweb.com>

AN UPDATE:

The universe still exists today because all they are doing is getting a few particles to move in one direction. No collisions planned in short term.

They'll next try to get a few particles to go in the opposite direction.

Collisions will happen much later when they get particles to flow in opposite directions at the same time (and they have to fine-tune their guidance system so that particles flowing in opposite directions will hit each other).
So what they have done today is what I used to do at the montreal velodrome (before it was savagely destroyed by politicians): go round and round hopefully without a collision... BTW: $ curl -I http://www.cern.ch HTTP/1.1 302 Found Date: Wed, 10 Sep 2008 09:11:05 GMT Server: Microsoft-IIS/6.0 X-Powered-By: ASP.NET X-AspNet-Version: 1.1.4322 Location: http://public.web.cern.ch/public/ Cache-Control: private Content-Type: text/html; charset=utf-8 Content-Length: 150 And I thought CERN was populated by intelligent and educated people who would know not to use microsoft products. ------------------------------ Date: Wed, 10 Sep 2008 11:53:39 +0200 From: Michael Kraemer Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: JF Mezei schrieb: > BTW: > > $ curl -I http://www.cern.ch > HTTP/1.1 302 Found > Date: Wed, 10 Sep 2008 09:11:05 GMT > Server: Microsoft-IIS/6.0 > X-Powered-By: ASP.NET > X-AspNet-Version: 1.1.4322 > Location: http://public.web.cern.ch/public/ > Cache-Control: private > Content-Type: text/html; charset=utf-8 > Content-Length: 150 > > > And I thought CERN was populated by intelligent and educated people who > would know not to use microsoft products. well, being cynical, one could ask: And these are the same people telling us there would be absolutely no problems with black holes ? :-) ------------------------------ Date: Wed, 10 Sep 2008 12:01:17 +0200 From: Joseph Huber Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: johnwallace4@yahoo.co.uk wrote: > "Common SCADA system" = National Instruments Labview, right ? As per > http://sine.ni.com/cs/app/doc/p/id/cs-10795 ? > > There are lots of folks who'd consider themselves in the SCADA > industry who wouldn't class Labview as a SCADA package, but... No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/). 
-- Joseph Huber - http://www.huber-joseph.de ------------------------------ Date: Wed, 10 Sep 2008 06:13:16 -0400 From: JF Mezei Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: <48c79e2a$0$4544$c3e8da3@news.astraweb.com> Joseph Huber wrote: > No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/). > This product is said to work on Windows, Linux and Solaris. I can see commonality between Linux and SOlaris, they both use X and various graphical toolkits are available on both. How do they also support Windows ? Isn't this a major headache to support both Unix and Windows for some serious application when you consider how different Windows is from the rest of the world ? ------------------------------ Date: Wed, 10 Sep 2008 13:37:44 +0200 From: Joseph Huber Subject: Re: OT: The end of the world in roughly 3 hours Message-ID: JF Mezei wrote: > Joseph Huber wrote: > >> No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/). >> > > This product is said to work on Windows, Linux and Solaris. > > I can see commonality between Linux and SOlaris, they both use X and > various graphical toolkits are available on both. > > How do they also support Windows ? Isn't this a major headache to > support both Unix and Windows for some serious application when you > consider how different Windows is from the rest of the world ? The first years we used it, the graphical interface was quite different between linux and Windows. They now have an intermediate graphic layer ( Qt ?), so the graphics part now has a rather common software base on both system worlds. And for datahandling and communication, if there are well defined protocols behind (ODBC,OPC,..), then I think one doesn't have to maintain too diverse code streams. PVSS consists of a rather small set of programs (called "managers"), and the application work is plugged in as "scripts" in form of (extended-) C code, which is interpreted by the control managers. 
There are mainly device drivers tied to the different systems. And here is also one reason why some subsystems are bound to Windows. The subproject I'm working on, e.g., has some CANbus driver and OPC interfaces not available for Linux.

--
Joseph Huber - http://www.huber-joseph.de

------------------------------

Date: 10 Sep 2008 07:49:48 -0500
From: koehler@eisner.nospam.encompasserve.org (Bob Koehler)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

In article <48c74eaa$0$12359$c3e8da3@news.astraweb.com>, JF Mezei writes:
> The Large Hadron Collider will first be activated this morning at 09:00
> Central European Time (GMT + 2).
>
> It is expected that a new universe will be created inside the LHC
> (lasting an eternity for the people in it, but mere millionth of a
> second for us) and it is possible that it will also create a black hole
> that will suck up the earth (like the vaccuum cleaner that sucks the
> pink panther and then sucks itself out of existance)

They didn't even turn on the opposing beam yet. It's really hard to get collisions when all the protons are running in the same direction.

Creating a new universe is an extreme and misleading description of what is expected to happen when they do get collisions.

And the possibility of destroying the Earth is just fodder for headline-seeking "journalists".

------------------------------

Date: Wed, 10 Sep 2008 07:03:12 -0700 (PDT)
From: DaveG
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <4a6998f0-e341-4178-a7ae-eacd7d54b8f5@8g2000hse.googlegroups.com>

On Sep 10, 7:49 am, koeh...@eisner.nospam.encompasserve.org (Bob Koehler) wrote:
> In article <48c74eaa$0$12359$c3e8...@news.astraweb.com>, JF Mezei writes:
>
> > The Large Hadron Collider will first be activated this morning at 09:00
> > Central European Time (GMT + 2).
>
> > It is expected that a new universe will be created inside the LHC
> > (lasting an eternity for the people in it, but a mere millionth of a
> > second for us) and it is possible that it will also create a black hole
> > that will suck up the earth (like the vacuum cleaner that sucks the
> > pink panther and then sucks itself out of existence)
>
>    They didn't even turn on the opposing beam yet. It's really hard
>    to get collisions when all the protons are running in the same
>    direction.
>
>    Creating a new universe is an extreme and misleading description
>    of what is expected to happen when they do get collisions.
>
>    And the possibility of destroying the Earth is just fodder for
>    headline-seeking "journalists".

Can't comment on the end of the world or Windows/Linux stuff but I will
pass this on.

At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
brief discussion of the LHC took place. A guy from Fermi Lab said that
the unit must be shut down for 3 months of the year during the heating
season. Electrons are used to heat homes, etc., and with the LHC using
the Euro grid to power their new machine, there won't be enough
electrons to both run the collider and keep people warm during the
heating season.

------------------------------

Date: Wed, 10 Sep 2008 10:40:34 -0400
From: "Richard B. Gilbert"
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <5LmdnRAOyepDQVrVnZ2dnUVZ_h-dnZ2d@comcast.com>

JF Mezei wrote:
> The Large Hadron Collider will first be activated this morning at 09:00
> Central European Time (GMT + 2).
>
> It is expected that a new universe will be created inside the LHC
> (lasting an eternity for the people in it, but a mere millionth of a
> second for us) and it is possible that it will also create a black hole
> that will suck up the earth (like the vacuum cleaner that sucks the
> pink panther and then sucks itself out of existence)
>
> It is unclear what effect the LHC experiment may have on the connection
> between my universe and the one where most of comp.os.vms lives in.
>
> BBC said that they won't have it at full power today and it will take a
> year before they risk running at full power.
>
>
> www.cern.ch is the official website. As in any modern event, they are to
> have a live webcast. It is not clear what we'll see in it. It is not
> clear if we will actually hear a "big bang".
>
> Good luck to all those who worked on that project, and let's hope that
> they don't rely on Windows to run it. And remember that, as in any
> science fiction movie, all the lights in the world will dim when they
> turn the power on to the collider :-) :-) :-) :-)

Well, the world has clearly survived. Unless, of course, I'm
hallucinating all this!

------------------------------

Date: Wed, 10 Sep 2008 15:52:55 +0000 (UTC)
From: helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

In article <48c79086$0$12394$c3e8da3@news.astraweb.com>, JF Mezei writes:

> The universe still exists today because all they are doing is getting a
> few particles to move in one direction.

I didn't reply at first since it was off-topic, but since so many others
have....

There are a few folks who really think the world is coming to an end,
since the LHC might produce black holes. It might, but they won't grow
by sucking in the Earth, since they will quickly decay via Hawking
radiation (the smaller the black hole, the FASTER it decays).
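
The parenthetical remark can be made quantitative. What follows is a
rough sketch in Python, assuming the standard semi-classical estimate
for the evaporation time of a non-rotating, uncharged black hole,
t = 5120 pi G^2 M^3 / (hbar c^4). The function name and the example
masses are illustrative assumptions, and the formula is not expected to
be accurate at particle-collision mass scales; it only shows that the
lifetime scales with the cube of the mass.

```python
import math

# Physical constants in SI units (rounded values).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.055e-34     # reduced Planck constant, J s
C = 2.998e8          # speed of light, m/s


def evaporation_time(mass_kg: float) -> float:
    """Semi-classical Hawking evaporation time in seconds (illustrative)."""
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)


if __name__ == "__main__":
    # Hypothetical example masses in kg, chosen only to show the scaling.
    for mass in (1.0e-23, 1.0, 2.0e30):
        print(f"{mass:9.1e} kg -> {evaporation_time(mass):.2e} s")
```

Doubling the mass multiplies the lifetime by eight, so a lighter black
hole is gone correspondingly sooner.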

The best way to see that there is no danger is to understand that the
LHC wasn't built to produce something which has never existed (on
Earth) before, although the media often claim this, but rather to study
it in detail. Cosmic rays regularly reach energies well in excess of
anything the LHC is capable of. Some elementary particles were first
detected by studying the aftermath of the collisions of cosmic rays
with the atmosphere.

(Q: Why not use cosmic rays instead of artificial collisions for the
current experiments? A: Because these detectors, which weigh as much as
the Eiffel tower, won't fit in a hot-air balloon.)

------------------------------

Date: Wed, 10 Sep 2008 15:55:59 +0000 (UTC)
From: helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

In article
<4a6998f0-e341-4178-a7ae-eacd7d54b8f5@8g2000hse.googlegroups.com>, DaveG
writes:

> At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
> brief discussion of the LHC took place. Guy from Fermi Lab said that
> the unit must be shut down for 3 months of the year during the heating
> season. Electrons are used to heat homes, etc and with the LHC using
> the Euro grid to power their new machine, there won't be enough
> electrons to both run the collider and keep people warm during the
> heating season.

I don't know if this is true, but if so, the bottleneck is not
electrons, but rather power. I think at one point it was debated
whether CERN should have its own power plant. At DESY in Hamburg, which
is a similar institution, the average power consumption amounts to 2% of
that of the city of Hamburg (where DESY is located), which has over a
million inhabitants.

------------------------------

Date: Wed, 10 Sep 2008 09:05:51 -0700 (PDT)
From: DaveG
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID:

On Sep 10, 10:55 am, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
remove CLOTHES to reply) wrote:
> In article
> <4a6998f0-e341-4178-a7ae-eacd7d54b...@8g2000hse.googlegroups.com>, DaveG
> writes:
>
> > At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
> > brief discussion of the LHC took place. Guy from Fermi Lab said that
> > the unit must be shut down for 3 months of the year during the heating
> > season. Electrons are used to heat homes, etc and with the LHC using
> > the Euro grid to power their new machine, there won't be enough
> > electrons to both run the collider and keep people warm during the
> > heating season.
>
> I don't know if this is true, but if so, the bottleneck is not
> electrons, but rather power. I think at one point it was debated
> whether CERN should have its own power plant. At DESY in Hamburg, which
> is a similar institution, the average power consumption amounts to 2% of
> the city of Hamburg (where DESY is located), which has over a million
> inhabitants.

power == electrons in my note. Sorry for the confusion.

------------------------------

Date: Wed, 10 Sep 2008 10:11:58 -0700 (PDT)
From: "winston19842005@yahoo.com"
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <95f6dfea-f2ff-47e5-9fc5-d37979d99efe@k7g2000hsd.googlegroups.com>

On Sep 10, 10:40 am, "Richard B. Gilbert" wrote:
> JF Mezei wrote:
> > The Large Hadron Collider will first be activated this morning at 09:00
> > Central European Time (GMT + 2).
>
> > It is expected that a new universe will be created inside the LHC
> > (lasting an eternity for the people in it, but a mere millionth of a
> > second for us) and it is possible that it will also create a black hole
> > that will suck up the earth (like the vacuum cleaner that sucks the
> > pink panther and then sucks itself out of existence)
>
> > It is unclear what effect the LHC experiment may have on the connection
> > between my universe and the one where most of comp.os.vms lives in.
>
> > BBC said that they won't have it at full power today and it will take a
> > year before they risk running at full power.
>
> > www.cern.ch is the official website. As in any modern event, they are to
> > have a live webcast. It is not clear what we'll see in it. It is not
> > clear if we will actually hear a "big bang".
>
> > Good luck to all those who worked on that project, and let's hope that
> > they don't rely on Windows to run it. And remember that, as in any
> > science fiction movie, all the lights in the world will dim when they
> > turn the power on to the collider :-) :-) :-) :-)
>
> Well, the world has clearly survived. Unless, of course, I'm
> hallucinating all this!

No, actually I exist, and the rest of you are figments of my
imagination.

Or at least that is what my wife is telling me, as she studies
philosophy.

Or at least that is what my mind is telling me, that I have a wife that
is studying philosophy who is telling me that I don't exist, yet I do
and no one else does, they are only projections of my mind...

------------------------------

Date: Wed, 10 Sep 2008 11:44:50 -0400
From: norm.raphael@metso.com
Subject: Re: Pipe search of command procedure log file containing pipe search command. co
Message-ID:



FrankS <sapienza@noesys.com> wrote on 09/10/2008 10:25:54 AM:

> On Sep 10, 9:40 am, norm.raph...@metso.com wrote:
> > Is there a better way to do this?  
>
> Why not just check the status condition after each CONVERT completes?


When the IF statement finds a dup, it takes all the collected error
lines and e-mails them.  Constructing something to store all of that
would be far more complex and/or would require a temporary output file,
which is exactly what the pipe search is meant to eliminate.  I don't
just want to know whether there were dups; I want to e-mail them.

The case shown here was somewhat simplified to keep it uncomplicated.


Nothing yet explains why it works sometimes and fails other times.

>
> $ SET NoON
> $ CVT_DUP = %X<put whatever code CONVERT-I-DUP is here>
> $ DUP_OCCURS = 0
> $
> $ CONVERT <do your convert here>
> $ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
> $ IF (.NOT. $STATUS) THEN GOTO KABOOM
> $
> $ CONVERT <do your convert here>
> $ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
> $ IF (.NOT. $STATUS) THEN GOTO KABOOM
> $
> .
> .  Repeat however many $CONVERTs you need
> .
> $
> $ IF (DUP_OCCURS .NE. 0)
> $ THEN
> $      <do whatever you need in the event of duplicates here>
> $ ENDIF
> $ EXIT
> $
> $ KABOOM:
> $ <put error condition handling here>
> $ EXIT
> $
> $ RECORD_OCCURANCE:
> $ DUP_OCCURS = DUP_OCCURS + 1
> $ RETURN
> $
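
To make the requirement above concrete: the goal is not a count of
duplicates but the matching log lines themselves, gathered without a
temporary file. The following is a minimal sketch of that filtering
logic in Python rather than DCL, so it is only a model of what the
pipeline is meant to do. The function name and the sample log text are
hypothetical, and the message text after "%CONVERT-I-DUP," is a
placeholder, not real CONVERT output.

```python
def collect_dup_lines(log_text: str) -> list[str]:
    """Return log lines reporting CONVERT duplicates, minus command echoes."""
    wanted = "%CONVERT-I-DUP,"
    # The echoed search command also contains `wanted`, so it is excluded.
    echo_marker = "pipe search"
    return [
        line.strip()
        for line in log_text.splitlines()
        if wanted in line and echo_marker not in line.lower()
    ]


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample_log = (
        "$ CONVERT A.DAT B.DAT\n"
        "%CONVERT-I-DUP, (placeholder message text)\n"
        '$ pipe search X.log; "%CONVERT-I-DUP," /mat=or | -\n'
    )
    for line in collect_dup_lines(sample_log):
        print(line)
```

The returned list can be joined into the body of the e-mail directly,
which is the part a simple duplicate counter would not provide.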

------------------------------

Date: Wed, 10 Sep 2008 07:25:54 -0700 (PDT)
From: FrankS
Subject: Re: Pipe search of command procedure log file containing pipe search command. co
Message-ID:

On Sep 10, 9:40 am, norm.raph...@metso.com wrote:
> Is there a better way to do this?

Why not just check the status condition after each CONVERT completes?

$ SET NoON
$ CVT_DUP = %X<put whatever code CONVERT-I-DUP is here>
$ DUP_OCCURS = 0
$
$ CONVERT <do your convert here>
$ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
$ IF (.NOT. $STATUS) THEN GOTO KABOOM
$
$ CONVERT <do your convert here>
$ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
$ IF (.NOT. $STATUS) THEN GOTO KABOOM
$
.
.  Repeat however many $CONVERTs you need
.
$
$ IF (DUP_OCCURS .NE. 0)
$ THEN
$      <do whatever you need in the event of duplicates here>
$ ENDIF
$ EXIT
$
$ KABOOM:
$ <put error condition handling here>
$ EXIT
$
$ RECORD_OCCURANCE:
$ DUP_OCCURS = DUP_OCCURS + 1
$ RETURN
$

------------------------------

Date: Wed, 10 Sep 2008 09:40:04 -0400
From: norm.raphael@metso.com
Subject: Pipe search of command procedure log file containing pipe search command.
Message-ID:

Here is a code fragment designed to search the log file of the running
command procedure to see if any of the converts in the log got duplicate
error messages.  The second line eliminates matches of the pipe search
command itself.
======
$ proc = f$environment("procedure")
$ proc_name = f$parse(proc,,,"name")
$! [snip]
$ pipe search 'proc_name'.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | -
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
======
Here is the log file from a normal run.
The symbol EMAIL_REC contains the expected result of the search when there
are no duplicate error messages.
======
$ pipe search GET_ORDERS_AM.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | -
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
  EMAIL_REC = "%SEARCH-I-NOMATCHES, no strings matched"
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
$! [snip]
  USER1     job terminated at 10-SEP-2008 07:17:21.69
<CR><LF>  Accounting information:
  Buffered I/O count:               8816      Peak working set size:      22240
  Direct I/O count:                 3107      Peak virtual size:         234528
  Page faults:                     11238      Mounted volumes:                0
  Charged CPU time:        0 00:00:06.07      Elapsed time:       0 01:17:21.69
======
Here is the log file from a failed run.
The symbol EMAIL_REC here contains a null-string even though there
are no duplicate error messages.
======
$ pipe search CONVERT_FILES.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | -
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
  EMAIL_REC = ""
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
$! [snip]
  USER1     job terminated at  8-SEP-2008 07:18:56.44
<CR><LF>  Accounting information:
  Buffered I/O count:              13624      Peak working set size:      22224
  Direct I/O count:                 9226      Peak virtual size:         234544
  Page faults:                     25189      Mounted volumes:                0
  Charged CPU time:        0 00:00:14.57      Elapsed time:       0 01:18:56.45
======
Is this a race condition?  Can it be fixed to provide expected results every run?
Is there a better way to do this?  
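
One detail worth making explicit is how the final IF treats the two
outcomes shown above. F$LOCATE returns the length of the string when the
substring is not found, so the comparison is true whenever "NO STRING"
is absent, and that includes the empty EMAIL_REC from the failed run.
The following is a small Python model of that test, assuming the THEN
branch (snipped in the fragment) is the one that treats the result as
duplicates. The helper names are hypothetical stand-ins, not DCL.

```python
def f_locate(substring: str, string: str) -> int:
    """Mimic DCL F$LOCATE: offset of substring, or len(string) if absent."""
    offset = string.find(substring)
    return len(string) if offset < 0 else offset


def looks_like_duplicates(email_rec: str) -> bool:
    """Model of: f$locate("NO STRING", f$edit(rec,"UPCASE")) .eq. f$length(rec)."""
    return f_locate("NO STRING", email_rec.upper()) == len(email_rec)


if __name__ == "__main__":
    normal_run = "%SEARCH-I-NOMATCHES, no strings matched"
    failed_run = ""
    print(looks_like_duplicates(normal_run))  # False: "NO STRING" was found
    print(looks_like_duplicates(failed_run))  # True: empty string passes the test
```

Under that assumption, an empty EMAIL_REC is indistinguishable from a
run that really did log duplicates, so a separate check for the null
string before the comparison may be worth considering.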


 

------------------------------

Date: Wed, 10 Sep 2008 09:41:11 +0100
From: "Richard Brodie"
Subject: Re: Security alarm msg
Message-ID:

"Tom Linden" wrote in message
news:op.ug2boszghv4qyg@murphus.hsd1.ca.comcast.net...
>I noted following on opcon.  Why is the remote node id in decimal format?

> Remote node id:           998090410

It's stored as an integer in the binary log and the formatter in
ANALYZE/AUDIT only understands DECnet addresses.

------------------------------

Date: Wed, 10 Sep 2008 07:25:27 -0700 (PDT)
From: Peter Weaver
Subject: Spinning down an old disk array
Message-ID:

A customer is planning on doing some maintenance at their data centre.
As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
have their power cut off. Since most of these disks have been
constantly spinning for the past 8 or 9 years the customer is
concerned about the disks spinning again after power is restored.

Most disks are DR-RZ1FC-VW and some are RZ29.

Some people here feel that as long as the power is off for only 10 or
15 minutes that the disks should spin up again after power is
restored. Some people here feel that even if the power is off for a
few seconds that we risk having disks not spin again.

Does anyone have any experience with turning off the power on disks
that have been running for years? What percentage of disks should we
expect to have fail after;
- a few seconds
- a few minutes
- 10 minutes
- 15 minutes

Peter

------------------------------

Date: Wed, 10 Sep 2008 07:30:23 -0700 (PDT)
From: FrankS
Subject: Re: Spinning down an old disk array
Message-ID: <41398b6b-9d11-4f83-adf4-cbba80725c29@e39g2000hsf.googlegroups.com>

On Sep 10, 10:25 am, Peter Weaver wrote:
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to have fail after;
...

Yes to the first part.
I wouldn't say frequently, but certainly I have turned off complete
disk arrays for maintenance and then powered them right back up again
without incident.

Too random on the second part. In my experience: none failed. In
fact, I'd say I've had better experience with the older 5400rpm drives
than newer 10k or 15k drives.

------------------------------

Date: Wed, 10 Sep 2008 08:01:45 -0700 (PDT)
From: Bob Gezelter
Subject: Re: Spinning down an old disk array
Message-ID:

On Sep 10, 9:25 am, Peter Weaver wrote:
> A customer is planning on doing some maintenance at their data centre.
> As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
> have their power cut off. Since most of these disks have been
> constantly spinning for the past 8 or 9 years the customer is
> concerned about the disks spinning again after power is restored.
>
> Most disks are DR-RZ1FC-VW and some are RZ29.
>
> Some people here feel that as long as the power is off for only 10 or
> 15 minutes that the disks should spin up again after power is
> restored. Some people here feel that even if the power is off for a
> few seconds that we risk having disks not spin again.
>
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to have fail after;
> - a few seconds
> - a few minutes
> - 10 minutes
> - 15 minutes
>
> Peter

Peter,

The original post does not indicate how many of these drives are in
stripes, mirrors, and other flavors of RAID.

For certain, particularly because the term "maintenance" includes much
real estate (including power and water), I would recommend that
backups be up-to-date and off-site during the "maintenance".

That said, I have not seen particularly bad experiences caused by a
single power down. In my experience, most of the interesting problems
come on sites where power-up/power-down is a chronic cycle, and the
cumulative wear and tear does cause failures.

It also has a tendency to uncover out-of-date batteries in various
devices. Perhaps one of the more overlooked checklist items is making
sure that systems and controllers have up to date NVRAM and other
batteries. Spare batteries would not be a bad idea, as is using the
opportunity to change batteries for fresh ones while the systems are
powered down.

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 11:24:41 -0400
From: "Richard B. Gilbert"
Subject: Re: Spinning down an old disk array
Message-ID:

Peter Weaver wrote:
> A customer is planning on doing some maintenance at their data centre.
> As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
> have their power cut off. Since most of these disks have been
> constantly spinning for the past 8 or 9 years the customer is
> concerned about the disks spinning again after power is restored.
>
> Most disks are DR-RZ1FC-VW and some are RZ29.
>
> Some people here feel that as long as the power is off for only 10 or
> 15 minutes that the disks should spin up again after power is
> restored. Some people here feel that even if the power is off for a
> few seconds that we risk having disks not spin again.
>
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to have fail after;
> - a few seconds
> - a few minutes
> - 10 minutes
> - 15 minutes
>
>
> Peter

Sooner or later EVERY disk will fail! People use BACKUP to ensure that
no data is lost. Various forms of RAID are used to ensure that access
to data is not lost.

Ideally, you should have spares on hand for each make and model of disk
drive in use. It's easy if all your disks are StorageWorks; just pop a
failed drive out of its socket and plug in a new one.

If anything fails at power on, I would expect it to happen within the
first sixty seconds or less.

------------------------------

End of INFO-VAX 2008.497
************************