INFO-VAX	Wed, 10 Sep 2008	Volume 2008 : Issue 497

   Contents:
Can't read unzipped Monitor files
Re: Did Windows just cry "Uncle"?
How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: How do I diagnose a server that crashes every night?
Re: Intermittent RWSCS state
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: Loose Cannon-dian
Re: OT: Message to Mr VAXman
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: OT: The end of the world in roughly 3 hours
Re: Pipe search of command procedure log file containing pipe search	command.	co
Re: Pipe search of command procedure log file containing pipe search	command.	co
Pipe search of command procedure log file containing pipe search command.
Re: Security alarm msg
Spinning down an old disk array
Re: Spinning down an old disk array
Re: Spinning down an old disk array
Re: Spinning down an old disk array

----------------------------------------------------------------------

Date: Wed, 10 Sep 2008 10:04:17 -0700 (PDT)
From: "James J. O'Shea" <seamas_ose@ameritech.net>
Subject: Can't read unzipped Monitor files
Message-ID: <456249.34286.qm@web83906.mail.sp1.yahoo.com>

I am not able to read Monitor files after unzipping them:

$ MON/NODISPLAY SYSTEM/ALL/SUM=X.SUM/INPUT=MONITOR-DATA-NONPRIME-2008-08-12.DAT
%MONITOR-E-CLASMISS, requested class record missing from /INPUT file

I've tried changing the attributes and using FDL files, but I'm not able to find the right combination.


The original file has RFM:Var, MRS:32765, LRL:32760.

After zipping, then unzipping, the file has RFM:STMLF, MRS:0, LRL:0.

I'm running OpenVMS 8.3 on an ES45;  Info-Zip Zip v2.3, Info-Zip Unzip v5.52.

Has anyone run into this problem?


Thanks,
Jim O'Shea
Chicago, IL 

------------------------------

Date: Wed, 10 Sep 2008 07:24:42 -0700 (PDT)
From: DaveG <david.gudewicz@abbott.com>
Subject: Re: Did Windows just cry "Uncle"?
Message-ID: <4ec652bc-256b-4c10-bf36-989b5e374b3f@c58g2000hsc.googlegroups.com>

On Sep 8, 6:56 pm, VAXman-  @SendSpamHere.ORG wrote:
> In article <0ea19636-ef44-4d9f-bc02-c10375be6...@d77g2000hsb.googlegroups.com>, AEF <spamsink2...@yahoo.com> writes:
>
> >On Sep 8, 6:13 pm, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
> >remove CLOTHES to reply) wrote:
> >> In article
> >> <e3f254ed-e420-4163-a8da-34a5dfa9c...@c58g2000hsc.googlegroups.com>,
>
> >> yyyc186 <yyyc...@hughes.net> writes:
> >> > LONDON (Reuters) - The London Stock Exchange (LSE:LSE.L - News)
> >> > suffered its worst systems failure in eight years on Monday, forcing
> >> > the world's third largest share market to suspend trading for about
> >> > seven hours and infuriating its users.
>
> >> They probably lost more revenue due to that outage than the move from
> >> VMS to Windows "saved" them.  It's not just immediate revenue which was
> >> lost, but people remembering this when deciding to do business with the
> >> LSE or one of their competitors who run VMS.  (They might not know they
> >> run VMS, but they will know if there were any comparable outages in the
> >> last few years.)
>
> >> > Weren't there a whole bunch of adds a while back about how London when
> >> > with Windows and that worthless Oracle product for their new trading
> >> > engine?
>
> >> Indeed.  I think it looked scary from the inside.
>
> >What's an "add"? Did you mean "advertisements"? That would be "ad",
> >not "add".
>
> He used M$ spell checker! ;)
>
> --
> VAXman- A Bored Certified VMS Kernel Mode Hacker      VAXman(at)TMESIS(dot)COM
>
> ... pejorative statements of opinion are entitled to constitutional protection
> no matter how extreme, vituperous, or vigorously expressed they may be. (NJSC)
>
> Copr. 2008 Brian Schenkenberger.  Publication of _this_ usenet article outside
> of usenet _must_ include its contents in its entirety including this copyright
> notice, disclaimer and quotations.

My opinion on this - won't matter much.  Windows continues to make
progress and flourish, warts and all, with Linux (free is good) in
close pursuit.

Reminds me of a quote from P.T. Barnum: "I don't care what they say
about me, as long as they spell my name right."

------------------------------

Date: Wed, 10 Sep 2008 01:52:37 -0700 (PDT)
From: StraightEight <straighteight@gmail.com>
Subject: How do I diagnose a server that crashes every night?
Message-ID: <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>

Hi,

I have very little VMS experience but we have inherited a nice shiny
new AlphaStation 250 to support (ok, it's not very shiny or new!) which
is located in the middle of the sea.

EVERY night without fail, this server is crashing and restarting
itself. I'd really like to get to the bottom of this as I am being
called every morning at 3am to log in and start some services which
don't seem to launch at startup despite being in the startup file ("ahh
it's always been that way...")

Below is the FATAL BUGCHECK which I suspect is causing the machine to
reboot. The process which appears to be crashing is key to this
server's functionality. This is a relatively new problem as this box
has run itself for the past 10 years.

I have no idea where to even begin determining the source of the
problem from this. Is anyone able to give me any pointers as to what I
should be looking for, what information I will need, and how to make
sense of it all? My VMS knowledge as I say is extremely limited, so
any commands would be useful and appreciated also. Nice to see if
anyone can help!

Thanks
str8


******************************* ENTRY     435. *******************************
 ERROR SEQUENCE 432.                             LOGGED ON:  CPU_TYPE 00000006
 DATE/TIME 10-SEP-2008 05:27:25.87                            SYS_TYPE 0000000D
 SYSTEM UPTIME: 1 DAYS 00:55:22
 SCS NODE: PIN01                                            OpenVMS AXP V6.2-1H3

 HW_MODEL: 00000000 Hardware Model = 0.

 FATAL BUGCHECK AlphaStation 250 4/266

 MACHINECHK, Machine check while in kernel mode

       PROCESS NAME    BLYSEM_I1
       PROCESS ID      0001001F

       ERROR PC        FFFFFFFF 800485F8

    Process Status = 20000000 00001F04, SW = 00, Previous Mode = KERNEL
    System State = 01, Current Mode = KERNEL
    VMM = 00 IPL = 31, SP Alignment = 32

 STACK POINTERS

 KSP 00000000 7FF91EE0  ESP 00000000 7FF96000  SSP 00000000 7FF9C100
 USP 00000000 7EE7D390

 GENERAL REGISTERS

 R0  00000000 00000002  R1  00000000 0000940A  R2  FFFFFFFF 80C2DB50
 R3  FFFFFFFF 80C04D98  R4  00000000 00000048  R5  00000000 00001F04
 R6  00000000 00000000  R7  00000000 00000001  R8  00000000 7FF9C1F8
 R9  00000000 7FF9C400  R10 00000000 7FF9D228  R11 00000000 7FFBE3E0
 R12 00000000 00000000  R13 FFFFFFFF 8326B910  R14 00000000 00000000
 R15 00000000 7EE7D498  R16 00000000 00000215  R17 00000000 00000001
 R18 00000000 00000001  R19 00000000 00000000  R20 FFFFFFFF FFFFFFF8
 R21 00000000 00000017  R22 00000000 00000100  R23 FFFFFFFF 80E08368
 R24 FFFFFFFF 80E08000  R25 00000000 00000003  R26 00000000 00000210
 R27 FFFFFFFF 80C34D60  R28 FFFFFFFF 8003B9C4  FP  00000000 7FF91EE0
 SP  00000000 7FF91EE0  PC  FFFFFFFF 800485F8  PS  20000000 00001F04

 SYSTEM REGISTERS

       PTBR            00000000 00001F19
                                       Page Table Base Register
       PCBB            00000000 0414A080
                                       Privileged Context Block Base
       PRBR            FFFFFFFF 80E0A000
                                       Processor Base Register
       VPTB            00000002 00000000
                                       Virtual Page Table Base Register
       SCBB            00000000 000001A1
                                       System Control Block Base
       SISR            00000000 00000000
                                       Software Interrupt Summary Register
       ASN             00000000 0000003B
                                       Address Space Number
       ASTSR_ASTEN     00000000 0000000F
                                       AST Summary/AST Enable
       FEN             00000000 00000001
                                       Floating-Point Enable
       IPL             00000000 0000001F
                                       Interrupt Priority Level
       MCES            00000000 00000008
                                       Machine Check Error Summary

------------------------------

Date: Wed, 10 Sep 2008 09:13:42 +0000 (UTC)
From: gartmann@nonsense.immunbio.mpg.de (Christoph Gartmann)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <ga8346$ah5$1@news.belwue.de>

In article <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>, StraightEight <straighteight@gmail.com> writes:
>I have very little VMS experience but we have inherited a nice shiny
>new alphaserver 250 to support (ok its not very shiny or new!) which
>is located in the middle of the sea.
>
>EVERY night without fail, this server is crashing and restarting
>itself. I'd really like to get to the bottom of this as I am being
>called every morning at 3am to log in and start some services which
>don't seem launch at startup despite being in the startup file ("ahh
>it's always been that way...")
>
>Below is the FATAL BUGCHECK which I suspect is causing the machine to
>reboot.

Correct, a FATAL BUGCHECK results in a crash of the system.

>The process which appears to be crashing is key to this
>servers functionality. This is a relatively new problem as this box
>has run itself for the past 10 years.

So the first question is: was anything changed on this system?

>I have no idea where to even begin determining the source of the
>problem from this. Is anyone able to give me any pointers as to what I
>should be looking for, what information I will need, and how to make
>sense of it all?

Have a look in sys$common:[syserr] for files named CLUE$*.LIS.
Also see the online help for "ANALYZE/ERROR", and check whether there is
anything in sys$manager:operator.log.
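
A minimal sketch of those three checks in DCL, assuming the default file
locations (SYS$ERRORLOG is the logical name that normally points at the
[SYSERR] directory) and the ANALYZE/ERROR_LOG utility as shipped with
V6.2. The /INCLUDE keywords and the output file name ERR_SUMMARY.LIS are
illustrative; it would be wise to confirm the qualifiers with
HELP ANALYZE /ERROR_LOG on the system itself before relying on them.

```dcl
$ ! List the CLUE crash summaries together with their dates
$ DIRECTORY/DATE SYS$ERRORLOG:CLUE$*.LIS
$ ! Format only the bugcheck and machine-check entries logged since yesterday
$ ! (ERR_SUMMARY.LIS is a hypothetical output name)
$ ANALYZE/ERROR_LOG/INCLUDE=(BUGCHECKS,MACHINE_CHECKS)/SINCE=YESTERDAY -
        /OUTPUT=SYS$LOGIN:ERR_SUMMARY.LIS SYS$ERRORLOG:ERRLOG.SYS
$ ! Pull error (-E-) and fatal (-F-) messages out of the operator log
$ SEARCH SYS$MANAGER:OPERATOR.LOG "-E-","-F-"
```

Comparing the ERROR PC and MCES values across several crashes should show
whether the machine check is the same each time.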

Regards,
   Christoph Gartmann

-- 
 Max-Planck-Institut fuer      Phone   : +49-761-5108-464   Fax: -80464
 Immunbiologie
 Postfach 1169                 Internet: gartmann@immunbio dot mpg dot de
 D-79011  Freiburg, Germany
               http://www.immunbio.mpg.de/home/menue.html

------------------------------

Date: Wed, 10 Sep 2008 05:46:54 -0400
From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <48c79805$0$1550$c3e8da3@news.astraweb.com>

StraightEight wrote:

> EVERY night without fail, this server is crashing and restarting
> itself. I'd really like to get to the bottom of this as I am being
> called every morning at 3am to log in and start some services which

Does it crash at exactly the same time every night? Or does it vary?
Any relationship with actual operations being done related to that
machine? Or does it crash when some link goes down and the code just
doesn't handle this properly?


> I have no idea where to even begin determining the source of the
> problem from this.

It would help to provide more background on what the application is. Is
it some COBOL app that just prints an accounting report, or is it some
real-time application that controls a drilling rig?

What sort of stuff is connected to that machine using what sort of
protocol?


In terms of services not starting when it boots and needing to be
started manually, you would need to take a careful look at the
SYSTARTUP_VMS.COM file in the SYS$MANAGER directory.  The output
normally just goes to the operator console, so if you are not on site,
you will have a hard time seeing error messages.

However, if you start a service by submitting a batch job, then there
should be a log file that contains some information on why the service
didn't start.

If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
can add /OUTPUT=logfile.log to the command

eg:
@disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log

Then, you could consult the log file later on to find out why the
application didn't start.

Remember that some services take some time to become available, so on a
faster machine, you might be trying to start your app before TCPIP is
fully available, for instance, and the app would fail. But later on when
you log in to fix the problem, TCPIP would be available and the app
would start properly.
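
One way to guard against that race, sketched in DCL under some loud
assumptions: that the TCP/IP stack creates the template device BG0: once
it has started (true of UCX and TCP/IP Services as far as I know, but
other stacks differ), and that the presence of the device is a good
enough sign that the network is usable. The retry count and interval are
arbitrary, and the startup procedure name is the placeholder used above.

```dcl
$ ! Wait up to about 5 minutes for the TCP/IP device before starting the app
$ count = 0
$wait_loop:
$ ! F$GETDVI(...,"EXISTS") returns "TRUE" or "FALSE" without signalling an error
$ IF F$GETDVI("BG0:","EXISTS") THEN GOTO net_ready
$ count = count + 1
$ IF count .GE. 30 THEN GOTO give_up
$ WAIT 00:00:10
$ GOTO wait_loop
$net_ready:
$ @disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log
$ EXIT
$give_up:
$ WRITE SYS$OUTPUT "BG0: still not present after 30 tries, application not started"
$ EXIT
```

If the application depends on a specific network service rather than the
stack as a whole, the test in the loop would need to check for that
service instead.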

------------------------------

Date: Wed, 10 Sep 2008 02:48:49 -0700 (PDT)
From: StraightEight <straighteight@gmail.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <973f56c2-f6a1-4a09-8d79-37ae2645ad00@e53g2000hsa.googlegroups.com>

Thanks for a quick reply. Here are my findings.

> So the first question is: was anything changed on this system?
No, as far as I am aware, this server has been running
unchanged for several years.

> Have a look in sys$common:[syserr] for files named CLUE$*.LIS

If I look at these files there are pages of information, but I am
unsure just what I need to be looking at!

> In addition see the online help for "ANALYZE/ERROR". In addition, is there
> anything in sys$manager:operator.log?

This mostly just contains our Telnet requests. Here's one I
spotted... do you know what this means? Sometimes when we try to
connect by telnet to the server we see "No License for the Active
Product" (or something along those lines). Could it be something as
simple as a licensing problem, or is this a red herring?

%%%%%%%%%%%  OPCOM   9-SEP-2008 04:32:26.73  %%%%%%%%%%%
Message from user SYSTEM on PIN01
%LICENSE-E-TERM, C ALL-IL-1997NOV26-2136 License has terminated

Thanks!

------------------------------

Date: Wed, 10 Sep 2008 06:03:55 -0400
From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <48c79bd7$0$9666$c3e8da3@news.astraweb.com>

StraightEight wrote:

> %%%%%%%%%%%  OPCOM   9-SEP-2008 04:32:26.73  %%%%%%%%%%%
> Message from user SYSTEM on PIN01
> %LICENSE-E-TERM, C ALL-IL-1997NOV26-2136 License has terminated

This is the C compiler licence.

The command:

SHOW LICENSE will give you a list of active licences on that node.

LICENSE LIST will give you a list of registered licences. (this will
exclude expired licences or licences that aren't valid for this node).

It is possible that you have 2 C licences, the "real" one and some
temporary one which has expired.


Not having the C compiler licence would not cause problems running
programs. It would only affect the invocation of the C compiler (CC
command). Programs compiled with this compiler will run fine without
the licence.

------------------------------

Date: Wed, 10 Sep 2008 03:18:31 -0700 (PDT)
From: StraightEight <straighteight@gmail.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <00b3b7e9-af02-40ec-b9bf-0e063ac036b5@e53g2000hsa.googlegroups.com>

On 10 Sep, 10:46, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

> Does it crash at exactly the same time every night ? Or does it vary ?
> Any relationship with actual operations being done related to that
> machine ? Or does it crash when some link goes down and the code just
> doesn't handle this properly ?

Doesn't really seem to be any pattern, some nights it restarts just
once, some nights it can happen up to 4 times.

> It would help to provide more background on what the application is. Is
> it some COBOL app that just prints an accounting report, or it is some
> real time applictaion that controls a drilling rig ?
> What sort of stuff is connected to that machine using what sort of
> protocol ?

The file BLYSEM, I'm sure, is a software interface to a Bailey INFI900
DCS (so real-time data acquisition on a rig, as guessed!) for OSI PI
software. The volume of data this handles has probably increased over
the years... could a capacity problem knock a service over?

> In terms of services not starting when it boots and needing to be
> started manually, you would need to look at  the SYSTARTUP_VMS.COM file
> in the SYS$MANAGER directory and take a careful look at it.  The output
> normally just goes on the operator console, so if you are not on site,
> you have hard time seeing error messages.
>
> However, if you start a service by submitting a batch job, then there
> should be a log file that contains some information on why the service
> didn't start.
>
> If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
> can add /OUTPUT=logfile.log to the command
>
> eg:
> @disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log
>
> Then, you could consult the log file later on to find out why the
> application didn't start.
>
> Remember that some services take some time to become available, so on a
> faster machine, you might be trying to start your app become TCPIP is
> fully available for instance, and the app would fail. But later on when
> you log in to fix the problem, TCPIP would be available and the app
> would start properly.

Thanks for the tips, I think I will try the output switch and see what
is logged. It's a good hunch at the end....the call to the service is
at the very end of the startup file, would each line in the startup
file wait until it is executed before moving to the next, or does it
just fire off all the commands at once? Now I think about it, the last
time we caught the error very early there was still a batch job
running...maybe we should call it at the end of this batch job?

Thanks for your response!

------------------------------

Date: Wed, 10 Sep 2008 11:47:21 +0100
From: "Richard Brodie" <R.Brodie@rl.ac.uk>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <ga88jp$aek$1@south.jnrs.ja.net>

"StraightEight" <straighteight@gmail.com> wrote in message 
news:6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com...

> I have no idea where to even begin determining the source of the
> problem from this.

MACHINECHK, Machine check while in kernel mode suggests
hardware. There may be other entries in the error log as well as
the bugcheck, which may give more detail.

Looking at the CLUE files in sys$errorlog, particularly the
_collect.dat may help nail down common features. 

------------------------------

Date: Wed, 10 Sep 2008 03:52:51 -0700 (PDT)
From: Bob Gezelter <gezelter@rlgsc.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <2799177a-4711-497a-81aa-782372e23021@f63g2000hsf.googlegroups.com>

On Sep 10, 6:18 am, StraightEight <straightei...@gmail.com> wrote:
> On 10 Sep, 10:46, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
>
> > Does it crash at exactly the same time every night ? Or does it vary ?
> > Any relationship with actual operations being done related to that
> > machine ? Or does it crash when some link goes down and the code just
> > doesn't handle this properly ?
>
> Doesn't really seem to be any pattern, some nights it restarts just
> once, some nights it can happen up to 4 times.
>
> > It would help to provide more background on what the application is. Is
> > it some COBOL app that just prints an accounting report, or it is some
> > real time applictaion that controls a drilling rig ?
> > What sort of stuff is connected to that machine using what sort of
> > protocol ?
>
> The file BLYSEM i'm sure is a software interface to a Bailey INFI900
> DCS (so real time data aquisition on a rig as guessed!) for OSI PI
> software. The volume of data this handles has probably increased over
> the years...could a capacity problem knock a service over?
>
>
>
> > In terms of services not starting when it boots and needing to be
> > started manually, you would need to look at  the SYSTARTUP_VMS.COM file
> > in the SYS$MANAGER directory and take a careful look at it.  The output
> > normally just goes on the operator console, so if you are not on site,
> > you have hard time seeing error messages.
>
> > However, if you start a service by submitting a batch job, then there
> > should be a log file that contains some information on why the service
> > didn't start.
>
> > If SYSTARTUP_VMS.COM calls a command procedure to start a service, you
> > can add /OUTPUT=logfile.log to the command
>
> > eg:
> > @disk:[directory]myapplication_startup.com/output=sys$manager:myapplication_startup.log
>
> > Then, you could consult the log file later on to find out why the
> > application didn't start.
>
> > Remember that some services take some time to become available, so on a
> > faster machine, you might be trying to start your app become TCPIP is
> > fully available for instance, and the app would fail. But later on when
> > you log in to fix the problem, TCPIP would be available and the app
> > would start properly.
>
> Thanks for the tips, I think I will try the output switch and see what
> is logged. It's a good hunch at the end....the call to the service is
> at the very end of the startup file, would each line in the startup
> file wait until it is executed before moving to the next, or does it
> just fire off all the commands at once? Now I think about it, the last
> time we caught the error very early there was still a batch job
> running...maybe we should call it at the end of this batch job?
>
> Thanks for your response!

str8,

I do not have an Alpha CPU manual handy, so I will restrict this set
of comments to the other issues raised. However, one important
question is whether the machine check error information is the same on
every crash.

Being responsible for a system with little or no documentation can be
a significant challenge. I have seen this kind of situation often when
getting called into a site which has been without a good system
manager for a while. It is common to find things "broken" that, in
effect, were never working correctly. Not having failed noticeably
does not mean that there was no issue; it may simply never have become
severe enough to be noticed.

While the overall STARTUP process is capable of parallel operation,
each individual command file is executed sequentially. (Using the
parallel execution features can speed restarts substantially, as I
noted in my presentation "SYSMAN for Improved Restart Performance" at
the Fall 1999 US DECUS symposium; slides are available via
http://www.rlgsc.com/decus/usf99/index.html .)

Most likely, the parallel execution features were not used in this
case. If processes that are supposed to start during a restart do not
in fact start, and the requests to start them are in the system
startup file, the most common reason for the failure is a small
typographical error made when editing the startup file. If there is an
error, the startup file will exit, with only a transiently visible
message on the console.

Typically, there are two ways to resolve this: 1) extremely close
inspection of the startup file (generally
SYS$MANAGER:SYSTARTUP_VMS.COM), or 2) enable logging of the startup
sequence using the SYSMAN STARTUP SET OPTIONS/OUTPUT=FILE command [the
latter creates the file SYS$SPECIFIC:[SYSEXE]STARTUP.LOG]. Reviewing
the log file generated often clarifies precisely what messages were
scrolled rapidly off the screen. I often leave unattended systems in
the FILE setting so that problems can be diagnosed after the fact.
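
A minimal sketch of option 2, assuming a suitably privileged account and
that the SYSMAN on this version supports these qualifiers (as far as I
know the full syntax is STARTUP SET OPTIONS; /VERIFY=FULL is an optional
extra that echoes each startup command into the log, and HELP inside
SYSMAN should confirm what is available):

```dcl
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> STARTUP SHOW OPTIONS
SYSMAN> STARTUP SET OPTIONS/VERIFY=FULL/OUTPUT=FILE
SYSMAN> EXIT
$ ! After the next reboot, review what the startup sequence actually did
$ TYPE/PAGE SYS$SPECIFIC:[SYSEXE]STARTUP.LOG
```

The setting persists across reboots, so it only needs to be made once;
STARTUP SHOW OPTIONS displays the current state before and after.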

One important recommendation is to make sure that there is a good
backup of the system disk, and a log kept of any changes to any of the
files.

It goes without saying that at some point, it may be wise to retain
outside experienced assistance to examine the problem [Disclosure: our
firm does provide consulting services in this area].

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 04:05:04 -0700 (PDT)
From: Bob Gezelter <gezelter@rlgsc.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <9074c354-2557-4638-961e-7860857e3485@s50g2000hsb.googlegroups.com>

str8,

I should note that it is also possible that the Machine Check and the
active process are, in effect, not related.

The last client system that was having machine checks turned out to be
caused by an erratic power supply. The power supply worked well when
it was working, but it was apparently having problems. The fact that
the system in question would appear to be in a somewhat industrial
setting raises the question of whether there is an external power or
grounding event that is the underlying cause of the Machine Check.

If there is a UPS involved, there could also be a problem there.

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: 10 Sep 2008 06:49:24 -0500
From: clubley@remove_me.eisner.decus.org-Earth.UFP (Simon Clubley)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <hL7OrHcwWQPI@eisner.encompasserve.org>

In article <9074c354-2557-4638-961e-7860857e3485@s50g2000hsb.googlegroups.com>, Bob Gezelter <gezelter@rlgsc.com> writes:
> str8,
> 
> I should note that it is also possible that the Machine Check and the
> active process are, in effect, not related.
> 
> The last client system that was having machine checks turned out to be
> caused by an erratic power supply. The power supply worked well when
> it was working, but it was apparently having problems. The fact that
> the system in question would appear to be in a somewhat industrial
> setting raises the question of whether there is an external power or
> grounding event that is the underlying cause of the Machine Check.
> 
> If there is a UPS involved, there could also be a problem there.
> 

The OP should also be aware that although a machine check is usually a
hardware issue, it can be caused by a faulty device driver as well.

Personal experience here: I have caused VMS to issue machine checks while
I have been developing VMS device drivers in the past.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980's technology to a 21st century world

------------------------------

Date: Wed, 10 Sep 2008 05:34:58 -0700 (PDT)
From: StraightEight <straighteight@gmail.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <d24b75b5-10ff-4b82-9f9b-3019b890804a@p25g2000hsf.googlegroups.com>

> The last client system that was having machine checks turned out to be
> caused by an erratic power supply. The power supply worked well when
> it was working, but it was apparently having problems. The fact that
> the system in question would appear to be in a somewhat industrial
> setting raises the question of whether there is an external power or
> grounding event that is the underlying cause of the Machine Check.
>
> If there is a UPS involved, there could also be a problem there.

I think you could be onto something here...as funnily enough we _had_
two VMS servers, one came back to be repaired (power supply problem!)
Now mentioning UPS gets me wondering, if there is indeed a UPS (I'll
need to check) I would imagine both servers would come off the same
UPS...maybe machine 1 never had problems with its power supply after
all! Definitely something to rule out (and perhaps in light of recent
experiences something I should have considered straight away!) Many
thanks.

------------------------------

Date: 10 Sep 2008 07:52:17 -0500
From: koehler@eisner.nospam.encompasserve.org (Bob Koehler)
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <ALUpm+AEOV6g@eisner.encompasserve.org>

In article <6df72036-9e6f-4a79-96cf-a841020f7b26@l64g2000hse.googlegroups.com>, StraightEight <straighteight@gmail.com> writes:
> Hi,
> 
> I have very little VMS experience but we have inherited a nice shiny
> new alphaserver 250 to support (ok its not very shiny or new!) which
> is located in the middle of the sea.

   I would think _very_ seriously about contracting a consultant who
   knows VMS.

------------------------------

Date: Wed, 10 Sep 2008 07:15:22 -0700 (PDT)
From: DaveG <david.gudewicz@abbott.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <3360773d-860d-4145-8509-21752f00e75a@m73g2000hsh.googlegroups.com>

On Sep 10, 7:52 am, koeh...@eisner.nospam.encompasserve.org (Bob
Koehler) wrote:
> In article <6df72036-9e6f-4a79-96cf-a841020f7...@l64g2000hse.googlegroups.com>, StraightEight <straightei...@gmail.com> writes:
>
> > Hi,
>
> > I have very little VMS experience but we have inherited a nice shiny
> > new alphaserver 250 to support (ok its not very shiny or new!) which
> > is located in the middle of the sea.
>
>    I would think _very_ seriously about contracting a consultant who
>    knows VMS.

With the OP mentioning that the system was located out to sea
somewhere, I wonder what might be happening to power and/or other
environmental stuff during non-daylight hours?

------------------------------

Date: Wed, 10 Sep 2008 10:59:02 -0400
From: "Richard B. Gilbert" <rgilbert88@comcast.net>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <Ovydnd9z17evfFrVnZ2dnUVZ_szinZ2d@comcast.com>

StraightEight wrote:
> Hi,
> 
> I have very little VMS experience but we have inherited a nice shiny
> new alphaserver 250 to support (ok its not very shiny or new!) which
> is located in the middle of the sea.
> 
> EVERY night without fail, this server is crashing and restarting
> itself. I'd really like to get to the bottom of this as I am being
> called every morning at 3am to log in and start some services which
> don't seem launch at startup despite being in the startup file ("ahh
> it's always been that way...")
> 
> Below is the FATAL BUGCHECK which I suspect is causing the machine to
> reboot. The process which appears to be crashing is key to this
> servers functionality. This is a relatively new problem as this box
> has run itself for the past 10 years.
> 
> I have no idea where to even begin determining the source of the
> problem from this. Is anyone able to give me any pointers as to what I
> should be looking for, what information I will need, and how to make
> sense of it all? My VMS knowledge as I say is extremely limited, so
> any commands would be useful and appreciated also. Nice to see if
> anyone can help!
> 
> Thanks
> str8
> 
> 
> 
> ******************************* ENTRY     435.
> *******************************
>  ERROR SEQUENCE 432.                             LOGGED ON:  CPU_TYPE
> 00000006
>  DATE/TIME 10-SEP-2008 05:27:25.87                            SYS_TYPE
> 0000000D
>  SYSTEM UPTIME: 1 DAYS 00:55:22
>  SCS NODE: PIN01                                            OpenVMS
> AXP V6.2-1H3
> 
>  HW_MODEL: 00000000 Hardware Model = 0.
> 
>  FATAL BUGCHECK AlphaStation 250 4/266
> 
>  MACHINECHK, Machine check while in kernel mode
> 
>        PROCESS NAME    BLYSEM_I1
>        PROCESS ID      0001001F
> 
>        ERROR PC        FFFFFFFF 800485F8
> 
>     Process Status = 20000000 00001F04, SW = 00, Previous Mode =
> KERNEL
>     System State = 01, Current Mode = KERNEL
>     VMM = 00 IPL = 31, SP Alignment = 32
> 
>  STACK POINTERS
> 
>  KSP 00000000 7FF91EE0  ESP 00000000 7FF96000  SSP 00000000 7FF9C100
>  USP 00000000 7EE7D390
> 
>  GENERAL REGISTERS
> 
>  R0  00000000 00000002  R1  00000000 0000940A  R2  FFFFFFFF 80C2DB50
>  R3  FFFFFFFF 80C04D98  R4  00000000 00000048  R5  00000000 00001F04
>  R6  00000000 00000000  R7  00000000 00000001  R8  00000000 7FF9C1F8
>  R9  00000000 7FF9C400  R10 00000000 7FF9D228  R11 00000000 7FFBE3E0
>  R12 00000000 00000000  R13 FFFFFFFF 8326B910  R14 00000000 00000000
>  R15 00000000 7EE7D498  R16 00000000 00000215  R17 00000000 00000001
>  R18 00000000 00000001  R19 00000000 00000000  R20 FFFFFFFF FFFFFFF8
>  R21 00000000 00000017  R22 00000000 00000100  R23 FFFFFFFF 80E08368
>  R24 FFFFFFFF 80E08000  R25 00000000 00000003  R26 00000000 00000210
>  R27 FFFFFFFF 80C34D60  R28 FFFFFFFF 8003B9C4  FP  00000000 7FF91EE0
>  SP  00000000 7FF91EE0  PC  FFFFFFFF 800485F8  PS  20000000 00001F04
> 
>  SYSTEM REGISTERS
> 
>        PTBR            00000000 00001F19
>                                        Page Table Base Register
>        PCBB            00000000 0414A080
>                                        Privileged Context Block Base
>        PRBR            FFFFFFFF 80E0A000
>                                        Processor Base Register
>        VPTB            00000002 00000000
>                                        Virtual Page Table Base
> Register
>        SCBB            00000000 000001A1
>                                        System Control Block Base
>        SISR            00000000 00000000
>                                        Software Interrupt Summary
> Register
>        ASN             00000000 0000003B
>                                        Address Space Number
>        ASTSR_ASTEN     00000000 0000000F
>                                        AST Summary/AST Enable
>        FEN             00000000 00000001
>                                        Floating-Point Enable
>        IPL             00000000 0000001F
>                                        Interrupt Priority Level
>        MCES            00000000 00000008
>                                        Machine Check Error Summary

Well, it says "Machine Check" and that generally means a hardware 
problem of some sort.  If you have a service contract, just pick up the 
phone and call for help.  If not, get prior approval from whoever pays 
the bills and then pick up the phone and call for help!

It might also help to try to find out what else happens every morning at
3:00 AM.  The fact that the timing is consistent suggests that it's 
something happening in the environment that triggers the machine check.
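
To help line the crash up with whatever else runs at night, the previous boot time can be worked out from the DATE/TIME and SYSTEM UPTIME fields in the quoted error log entry. This is only a minimal Python sketch of that arithmetic; the variable names are illustrative and the hundredths of a second are dropped.

```python
from datetime import datetime, timedelta

# Values copied from the quoted error log entry (hundredths of a second dropped).
crash_time = datetime(2008, 9, 10, 5, 27, 25)                 # DATE/TIME 10-SEP-2008 05:27:25.87
uptime = timedelta(days=1, hours=0, minutes=55, seconds=22)   # SYSTEM UPTIME: 1 DAYS 00:55:22

# The system must have booted one uptime interval before the crash was logged.
previous_boot = crash_time - uptime
print(previous_boot)  # 2008-09-09 04:32:03
```

If the boot times keep landing in the same small window each night, that would fit the idea that something scheduled or environmental is the trigger.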

------------------------------

Date: Wed, 10 Sep 2008 08:03:28 -0700 (PDT)
From: Volker Halle <volker_halle@hotmail.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <4b119f0b-2778-4e0a-b9e4-014583a6cc4a@79g2000hsk.googlegroups.com>

If you see MACHINECHK crashes, think of hardware problems first. There
should be errlog entries immediately preceding the system crash. Find
those and analyze them.

$ ANAL/ERR/SINCE=<1-minute-before-system-crash>

or look at the errors in the dump:

$ ANAL/CRASH SYS$SYSTEM
SDA> CLUE ERRLOG
SDA> EXIT

You may need to install DECevent V3.4 ( $ DIAGNOSE command ) to
translate those errors into meaningful text.
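
The /SINCE value for the ANAL/ERR command can be derived from the DATE/TIME field of the bugcheck entry. The following is a rough Python sketch, assuming the usual DCL absolute time form dd-mmm-yyyy:hh:mm:ss; the helper names parse_errlog_time and since_value are made up for illustration.

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def parse_errlog_time(text):
    """Parse an error log timestamp such as '10-SEP-2008 05:27:25.87'."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hours, minutes, seconds = time_part.split(":")
    whole_seconds = seconds.split(".")[0]  # hundredths are not needed here
    return datetime(int(year), MONTHS.index(month.upper()) + 1, int(day),
                    int(hours), int(minutes), int(whole_seconds))

def since_value(crash_text, margin_seconds=60):
    """Return a dd-mmm-yyyy:hh:mm:ss string margin_seconds before the crash."""
    start = parse_errlog_time(crash_text) - timedelta(seconds=margin_seconds)
    return "%d-%s-%d:%02d:%02d:%02d" % (start.day, MONTHS[start.month - 1],
                                        start.year, start.hour,
                                        start.minute, start.second)

# Crash time taken from the bugcheck entry quoted earlier in the thread.
print("$ ANAL/ERR/SINCE=" + since_value("10-SEP-2008 05:27:25.87"))
# prints: $ ANAL/ERR/SINCE=10-SEP-2008:05:26:25
```

The one-minute margin is just the figure suggested above; a wider window may be needed if the hardware errors were logged earlier.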

---
Volker Halle, Invenate GmbH, OpenVMS Support

An OpenVMS crashdump analysis a day
makes the Windows headaches go away.

------------------------------

Date: Wed, 10 Sep 2008 08:05:51 -0700 (PDT)
From: Bob Gezelter <gezelter@rlgsc.com>
Subject: Re: How do I diagnose a server that crashes every night?
Message-ID: <88aba1d6-9b21-4db0-8774-a9606c9cdc5c@m45g2000hsb.googlegroups.com>

On Sep 10, 6:49 am, clubley@remove_me.eisner.decus.org-Earth.UFP
(Simon Clubley) wrote:
> In article <9074c354-2557-4638-961e-7860857e3...@s50g2000hsb.googlegroups.com>, Bob Gezelter <gezel...@rlgsc.com> writes:
>
> > str8,
>
> > I should note that it is also possible that the Machine Check and the
> > active process are, in effect, not related.
>
> > The last client system that was having machine checks turned out to be
> > caused by an erratic power supply. The power supply worked well when
> > it was working, but it was apparently having problems. The fact that
> > the system in question would appear to be in a somewhat industrial
> > setting raises the question of whether there is an external power or
> > grounding event that is the underlying cause of the Machine Check.
>
> > If there is a UPS involved, there could also be a problem there.
>
> The OP should also be aware that although a machine check is usually a
> hardware issue, it can be caused by a faulty device driver as well.
>
> Personal experience here: I have caused VMS to issue machine checks while
> I have been developing VMS device drivers in the past.
>
> Simon.
>
> --
> Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
> Microsoft: Bringing you 1980's technology to a 21st century world


Simon,

Indeed. When one is not careful running in Kernel mode, particularly
at interrupt level, all kinds of strange results can ensue, for all
kinds of reasons.

My favorite was a problem on an early version of a third-party
J-11-based product: there was no RESET control; it was presumed that
PowerFail could do it. I pointed out that there were many situations
in which PowerFail would cause a problem if there was not a valid
kernel stack pointer. Oops.

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 11:21:59 -0400
From: "Jilly" <jilly@stny.rr.com>
Subject: Re: Intermittent RWSCS state
Message-ID: <48c7e624$0$21316$ec3e2dad@unlimited.usenetmonster.com>

You really need to look at the credit waits from the viewpoint of all the
nodes.  But from NODE_A's point of view there is an overload in talking to
VAX_C.  You can look at the SYSGEN parameter CLUSTER_CREDITS and set it to
the max of 128 (not sure what version or platform this is available on).
Additionally, review the interrupt stack usage on VAX_C, as all lock
requests get serviced on the interrupt stack of the CPU handling the SCS
interface.  Review the MONITOR DLOCK output and see if VAX_C is handling
more incoming lock requests than the other nodes.  For recent vintage Alpha
VMS versions, you can change the interrupt CPU on a multi-CPU system.
Review which systems first create the resources, and review LOCKDIRWT and
PE1 to see if you are limiting the movement of lock mastership.  Also, when
booting or rebooting systems in a cluster, remember to bring up the desired
lock master first and then the other systems after.  If the lock master
node has been rebooted and the other nodes have not, it is likely that a
number of resources will be mastered on non-optimal nodes.  You could also
be hitting the FDDI bandwidth ceiling if there is enough SCS traffic, but
that is unlikely in this cluster.

RWSCS means that this process's lock request is delayed because it has to
be serviced by another node.  As has been said, occasional RWSCS states
are normal and expected.  Persistent RWSCS states point to a delay in the
locking path, which includes the physical SCS medium (FDDI in your case),
the speed of and load on the involved nodes (interrupt stack etc.), and
whether lock mastership is being handled by the 'ideal' node for the
resource involved.

------------------------------

Date: 10 Sep 2008 13:08:49 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6ipv70Frs1niU2@mid.individual.net>

In article <op.ug8uscxphv4qyg@murphus.hsd1.ca.comcast.net>,
	"Tom Linden" <tom@kednos.company> writes:
> On Tue, 09 Sep 2008 09:53:33 -0700, Bill Gunshannon <billg999@cs.uofs.edu>  
> wrote:
> 
>> It's a poor workman who blames his tools.
> 
> It is a dilettante that uses inferior tools

Tell that to all the people using Java.  :-)

bill
 

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: 10 Sep 2008 13:14:49 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6ipvi9Frs1niU3@mid.individual.net>

In article <op.ug8upesihv4qyg@murphus.hsd1.ca.comcast.net>,
	"Tom Linden" <tom@kednos.company> writes:
> On Tue, 09 Sep 2008 09:54:53 -0700, Bill Gunshannon <billg999@cs.uofs.edu>  
> wrote:
> 
>> In article <op.ug72p1cghv4qyg@murphus.hsd1.ca.comcast.net>,
>> 	"Tom Linden" <tom@kednos.company> writes:
>>> On Tue, 09 Sep 2008 06:55:43 -0700, Bob Koehler
>>> <koehler@eisner.nospam.encompasserve.org> wrote:
>>>
>>>> In article <op.ug7ul6wehv4qyg@murphus.hsd1.ca.comcast.net>, "Tom  
>>>> Linden"
>>>> <tom@kednos.company> writes:
>>>>> On Tue, 09 Sep 2008 05:17:28 -0700, <johnwallace4@yahoo.co.uk> wrote:
>>>>>
>>>>>> While there is much in what you say, your case is not helped by
>>>>>> demonstrably dubious claims such as "There is nothing to show that
>>>>>> "security" was the underlying principle in everything VMS did any  
>>>>>> more
>>>>>> than that Unix didn't consider it at all". There's plenty of evidence
>>>>>> if you look with open eyes. Native VMS code's widespread use of
>>>>>> descriptors for varying-length items encourages careful programming
>>>>>> and has no equivalent in Windows or any Unix I've seen (since V7, Sys
>>>>>> V, and BSD4.1, I've seen a few).
>>>>>
>>>>> Descriptors are not part of the OS but a feature of the compilers, and
>>>>> the
>>>>> concept really came out of languages like PL/I and Algol, we call them
>>>>> dope vectors.
>>>>
>>>>    The use of descriptors for many of the OS APIs is part of the OS.
>>>>
>>> Don't wish to nitpick, but it is the selection of compilers supporting  
>>> such
>>> constructs that is part of the OS.  Languages deficient in such  
>>> constructs
>>> were enhanced to provide that capability.  OS's like Multics, Primos,  
>>> VOS,
>>> MVS-z/os Burroughs were written in languages in which such constructs  
>>> are
>>> an integral part of the language.
>>>
>> Primos?  A bunch of that was written in Fortran IV.  :-)
> That is true, but from 18 on forward it was mostly all PLP

Not really.  I maintained Rev 19 systems and we still had FTN and PMA.
I used to have a copy but it is long gone now.  I never got to work with
Rev 20 so I can't say if they redid all of it by that point.

But I was being very tongue-in-cheek.  It was mostly PL/I code (PLP and
PL/I Subset G) and a lot of fun to work with.  It is as much a shame that
Primos didn't really survive as it will be when VMS fades away.  I expect
both to experience the same fate.  That is, just as Primos is still in use
this long after its demise, so will VMS continue to be used long after
its owners have given up the ghost.  One can only hope that there will
be someone to pick up the ball and run with it when that time comes as
was the case with Primos.

bill

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: 10 Sep 2008 13:17:50 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6ipvnuFrs1niU4@mid.individual.net>

In article <ga6u7j$vnc$03$1@news.t-online.com>,
	Michael Kraemer <M.Kraemer@gsi.de> writes:
> Bob Koehler schrieb:
>> In article <ga6c0p$37r$00$1@news.t-online.com>, Michael Kraemer <M.Kraemer@gsi.de> writes:
>> 
>>>johnwallace4@yahoo.co.uk schrieb:
>>>
>>>
>>>>If you want to compare OSes not in common use then maybe comparing an
>>>>SELinux setup with a VMS setup is appropriate, but that still leaves
>>>>VMS mostly ahead (others may obviously disagree).
>>>
>>>AFAIK:
>>>Ordinary VMS has C2 security. SEVMS (sp ?) has B1.
>>>Ordinary Unices have C2. Their "Trusted" variants have B1.
>>>
>>>So where's the difference ?

The difference should be obvious.  More people prefer Unix.  :-)

>> 
>> 
>>    Where C2 and B1 don't go.
>>  
> 
> That's pretty much nowhere land.
> Are there widely accepted certifications beyond
> orange book ?

The rainbow books are being replaced by things like Common Criteria.
Check out sites like NIST and DISA for information on modern security
requirements.  DISA is a very good source as they even have papers
and scripts to make securing systems, even Windows, very doable.


bill 

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: 10 Sep 2008 13:19:05 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6ipvq9Frs1niU5@mid.individual.net>

In article <ga7rmd$b0o$01$1@news.t-online.com>,
	Michael Kraemer <M.Kraemer@gsi.de> writes:
> Tom Linden schrieb:
> 
>> Yes, the Common Criteria E1 thru E6
> 
> And where on that scale is VMS ?

Unless things have changed, VMS's owners have made no attempt to get
rated according to Common Criteria.

bill
 

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: 10 Sep 2008 13:23:40 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6iq02sFrs1niU6@mid.individual.net>

In article <eblGRJGgrgz6@eisner.encompasserve.org>,
	koehler@eisner.nospam.encompasserve.org (Bob Koehler) writes:
> In article <ga6u7j$vnc$03$1@news.t-online.com>, Michael Kraemer <M.Kraemer@gsi.de> writes:
>> 
>> That's pretty much nowhere land.
>> Are there widely accepted certifications beyond
>> orange book ?
> 
>    Nowhere?  C2, B1, ..., all were written by some folks based on their
>    limited knowledge and their specific needs.  There are a lot of other
>    legitimate security concerns.
> 
>    For example, Windows got a C2 rating at one time, based on
>    limitations like no network, no floppies, ...
> 
>    So what good is a system if you can't enter or retrieve data?

Those ratings are for operational systems.  What need is there for a
network connection or floppies on a system running a power plant?

One can take the system offline, connect a floppy, load and install
needed upgrades and then remove the floppy, recertify and return to
production as a C2 system.

When one looks at things in terms of IS's instead of just a Windows
box this stuff makes a lot more sense.  But then, when you are so
totally biased against MS, you become blind to reality.

bill
 

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: 10 Sep 2008 13:41:01 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6iq13cFs3tqnU1@mid.individual.net>

In article <op.ug9oznzvhv4qyg@murphus.hsd1.ca.comcast.net>,
	"Tom Linden" <tom@kednos.company> writes:
> On Wed, 10 Sep 2008 06:14:49 -0700, Bill Gunshannon <billg999@cs.uofs.edu>  
> wrote:
> 
>> In article <op.ug8upesihv4qyg@murphus.hsd1.ca.comcast.net>,
>> 	"Tom Linden" <tom@kednos.company> writes:
>>> On Tue, 09 Sep 2008 09:54:53 -0700, Bill Gunshannon  
>>> <billg999@cs.uofs.edu>
>>> wrote:
>>>
>>>> In article <op.ug72p1cghv4qyg@murphus.hsd1.ca.comcast.net>,
>>>> 	"Tom Linden" <tom@kednos.company> writes:
>>>>> On Tue, 09 Sep 2008 06:55:43 -0700, Bob Koehler
>>>>> <koehler@eisner.nospam.encompasserve.org> wrote:
>>>>>
>>>>>> In article <op.ug7ul6wehv4qyg@murphus.hsd1.ca.comcast.net>, "Tom
>>>>>> Linden"
>>>>>> <tom@kednos.company> writes:
>>>>>>> On Tue, 09 Sep 2008 05:17:28 -0700, <johnwallace4@yahoo.co.uk>  
>>>>>>> wrote:
>>>>>>>
>>>>>>>> While there is much in what you say, your case is not helped by
>>>>>>>> demonstrably dubious claims such as "There is nothing to show that
>>>>>>>> "security" was the underlying principle in everything VMS did any
>>>>>>>> more
>>>>>>>> than that Unix didn't consider it at all". There's plenty of  
>>>>>>>> evidence
>>>>>>>> if you look with open eyes. Native VMS code's widespread use of
>>>>>>>> descriptors for varying-length items encourages careful programming
>>>>>>>> and has no equivalent in Windows or any Unix I've seen (since V7,  
>>>>>>>> Sys
>>>>>>>> V, and BSD4.1, I've seen a few).
>>>>>>>
>>>>>>> Descriptors are not part of the OS but a feature of the compilers,  
>>>>>>> and
>>>>>>> the
>>>>>>> concept really came out of languages like PL/I and Algol, we call  
>>>>>>> them
>>>>>>> dope vectors.
>>>>>>
>>>>>>    The use of descriptors for many of the OS APIs is part of the OS.
>>>>>>
>>>>> Don't wish to nitpick, but it is the selection of compilers supporting
>>>>> such
>>>>> constructs that is part of the OS.  Languages deficient in such
>>>>> constructs
>>>>> were enhanced to provide that capability.  OS's like Multics, Primos,
>>>>> VOS,
>>>>> MVS-z/os Burroughs were written in languages in which such constructs
>>>>> are
>>>>> an integral part of the language.
>>>>>
>>>> Primos?  A bunch of that was written in Fortran IV.  :-)
>>> That is true, but from 18 on forward it was mostly all PLP
>>
>> Not really.  I maintained Rev 19 systems and we still had FTN and PMA.
>> I used to have a copy but it is long gone now.  I never got to work with
>> Rev 20 so I can't say if they redid all of it by that point.
>>
>> But I was being very tongue-in-cheek.  It was mostly PL/I code (PLP and
>> PL/I Subset G) and a lot of fun to work with.  It is as much a shame that
>> Primos didn't really survive as it will be when VMS fades away.  I expect
>> both to experience the same fate.  That is, just as Primos is still in  
>> use
>> this long after its demise, so will VMS continue to be used long after
>> its owners have given up the ghost.  One can only hope that there will
>> be someone to pick up the ball and run with it when that time comes as
>> was the case with Primos.
> 
> Does anyone maintain it?

Yes.  As a matter of fact, I donated my home Prime system to one of the
people who is still licensed to maintain Primos.  That was several years
ago and he drove out here from Ohio to get it.  I still keep in touch
with a handful of the Prime people.  It was a very nice machine although
a little strange sometimes.

bill

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: Wed, 10 Sep 2008 07:36:53 -0700 (PDT)
From: johnwallace4@yahoo.co.uk
Subject: Re: Loose Cannon-dian
Message-ID: <fb1eedec-c950-4c4e-874f-f70e46faaf4c@x41g2000hsb.googlegroups.com>

On Sep 10, 2:23 pm, billg...@cs.uofs.edu (Bill Gunshannon) wrote:
> In article <eblGRJGgr...@eisner.encompasserve.org>,
>         koeh...@eisner.nospam.encompasserve.org (Bob Koehler) writes:
>
> > In article <ga6u7j$vnc$0...@news.t-online.com>, Michael Kraemer <M.Krae...@gsi.de> writes:
>
> >> That's pretty much nowhere land.
> >> Are there widely accepted certifications beyond
> >> orange book ?
>
> >    Nowhere?  C2, B1, ..., all were written by some folks based on their
> >    limited knowledge and their specific needs.  There are a lot of other
> >    legitimate security concerns.
>
> >    For example, Windows got a C2 rating at one time, based on
> >    limitations like no network, no floppies, ...
>
> >    So what good is a system if you can't enter or retrieve data?
>
> Those ratings are for operational systems.  What need is there for a
> network connection or floppies on a system running a power plant?
>
> One can take the system offline, connect a floppy, load and install
> needed upgrades and then remove the floppy, recertify and return to
> production as a C2 system.
>
> When one looks at things in terms of IS's instead of just a Windows
> box this stuff makes a lot more sense.  But then, when you are so
> totally biased against MS, you become blind to reality.
>
> bill
>
> --
> Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
> billg...@cs.scranton.edu |  and a sheep voting on what's for dinner.
> University of Scranton   |
> Scranton, Pennsylvania   |         #include <std.disclaimer.h>

Power plants are more networked than you seem to think, in order to
(for example) automate the process of matching electricity generation
against electricity demand in something approaching real time (this
kind of thing used to be done by phone but the PHBs prefer things like
this to be automated). And then there's also the wandering contractor
with a potentially-infected laptop connected to the (maybe isolated)
plant network on one side, and (maybe) via a 3G phone to the Internerd
on the other side.

Depending on the technologies used, this can make them more vulnerable
than you seem to think, and almost certainly more vulnerable than they
were prior to Windows monoculture. If the plant network is designed to
be isolated when operational, it will likely still have essential
Windows boxes on it in places, so where will those boxes get their
daily AV updates, monthly Windows updates, occasional application
updates? A network connection or a removable media sneakernet,
perhaps? Isolated but out of date (and requiring downtime for each
update), or up to date and vulnerable. Take your pick.

Perhaps you missed the GAO report in May this year which had 92
specific suggestions for IT/SCADA security improvements at the
Tennessee Valley Authority (you've heard of them?) and recommendations
for "best practice" elsewhere?

GAO report: http://www.gao.gov/new.items/d08526.pdf
Sample "IT" media coverage: http://www.theregister.co.uk/2008/05/22/electrical_grid_vulnerable/

------------------------------

Date: 10 Sep 2008 15:47:08 GMT
From: billg999@cs.uofs.edu (Bill Gunshannon)
Subject: Re: Loose Cannon-dian
Message-ID: <6iq8frF5ldU1@mid.individual.net>

In article <fb1eedec-c950-4c4e-874f-f70e46faaf4c@x41g2000hsb.googlegroups.com>,
	johnwallace4@yahoo.co.uk writes:
> On Sep 10, 2:23 pm, billg...@cs.uofs.edu (Bill Gunshannon) wrote:
>> In article <eblGRJGgr...@eisner.encompasserve.org>,
>>         koeh...@eisner.nospam.encompasserve.org (Bob Koehler) writes:
>>
>> > In article <ga6u7j$vnc$0...@news.t-online.com>, Michael Kraemer <M.Krae...@gsi.de> writes:
>>
>> >> That's pretty much nowhere land.
>> >> Are there widely accepted certifications beyond
>> >> orange book ?
>>
>> >    Nowhere?  C2, B1, ..., all were written by some folks based on their
>> >    limited knowledge and their specific needs.  There are a lot of other
>> >    legitimate security concerns.
>>
>> >    For example, Windows got a C2 rating at one time, based on
>> >    limitations like no network, no floppies, ...
>>
>> >    So what good is a system if you can't enter or retrieve data?
>>
>> Those ratings are for operational systems.  What need is there for a
>> network connection or floppies on a system running a power plant?
>>
>> One can take the system offline, connect a floppy, load and install
>> needed upgrades and then remove the floppy, recertify and return to
>> production as a C2 system.
>>
>> When one looks at things in terms of IS's instead of just a Windows
>> box this stuff makes a lot more sense.  But then, when you are so
>> totally biased against MS, you become blind to reality.
> 
> Power plants are more networked than you seem to think, in order to
> (for example) automate the process of matching electricity generation
> against electricity demand in something approaching real time (this
> kind of thing used to be done by phone but the PHBs prefer things like
> this to be automated). 

I just used that as an example as it is one that shows up here.  If,
as you say, networking is required then obviously it either wouldn't
be C2 or wouldn't be Windows.  I was just trying to show that not having
those things in production did not mean they could not be available in
a C2 rated IS.


>                        And then there's also the wandering contractor
> with a potentially-infected laptop connected to the (maybe isolated)
> plant network on one side, 

The statement was C2 + Windows = "no network" so, not a problem.  Obviously,
a lot more goes into maintaining C2 systems than your home PC but it is done
every day.

>                             and (maybe) via a 3G phone to the Internerd
> on the other side.
> 
> Depending on the technologies used, this can make them more vulnerable
> than you seem to think, and almost certainly more vulnerable than they
> were prior to Windows monoculture. If the plant network is designed to
> be isolated when operational, it will likely still have essential
> Windows boxes on it in places, so where will those boxes get their
> daily AV updates, monthly Windows updates, occasional application
> updates? 

You missed the most important point.  "No Network".  Obviously, C2 rated
systems do not get "daily AV updates, monthly Windows updates, occasional
application updates" in the same manner as your home PC.  Tell me something?
Can you get to any of the PCs currently being used by the military in Iraq?
Do you think they are not running Windows?  Do you think they don't get kept
up to date for things like AV and Windows Updates?

>           A network connection or a removable media sneakernet,
> perhaps? Isolated but out of date (and requiring downtime for each
> update), or up to date and vulnerable. Take your pick.

If it is not connected to the outside world in any way and it only runs
one task, vulnerable to what?  You guys really need to change your mindset
and accept that there are secure Windows Systems running all over the world.
I know, I just had to go back to school (again) to have my skills refreshed
on how this is being done.

> 
> Perhaps you missed the GAO report in May this year which had 92
> specific suggestions for IT/SCADA security improvements at the
> Tennessee Valley Authority (you've heard of them?) and recommendations
> for "best practice" elsewhere?

Don't know anything about TVA but I doubt C2 is one of their requirements
for an IS.  And that was what was being discussed.

> 
> GAO report: http://www.gao.gov/new.items/d08526.pdf
> Sample "IT" media coverage: http://www.theregister.co.uk/2008/05/22/electrical_grid_vulnerable/

bill

-- 
Bill Gunshannon          |  de-moc-ra-cy (di mok' ra see) n.  Three wolves
billg999@cs.scranton.edu |  and a sheep voting on what's for dinner.
University of Scranton   |
Scranton, Pennsylvania   |         #include <std.disclaimer.h>   

------------------------------

Date: Wed, 10 Sep 2008 03:53:30 -0400
From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Subject: Re: OT: Message to Mr VAXman
Message-ID: <48c77d70$0$1537$c3e8da3@news.astraweb.com>

Another article about Scientology for Mr VAXman:

(Scientology using the DMCA to force YouTube to take down videos that are
critical of their business/sect/whatever)

http://arstechnica.com/news.ars/post/20080908-scientology-fights-critics-with-4000-dmca-takedown-notices.html

------------------------------

Date: Wed, 10 Sep 2008 09:10:53 +0200
From: Michael Kraemer <M.Kraemer@gsi.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga7rsn$b0o$01$2@news.t-online.com>

JF Mezei schrieb:

> ... and let's hope that
> they don't rely on Windows to run it.

I wouldn't hold my breath.

------------------------------

Date: Wed, 10 Sep 2008 09:57:42 +0200
From: Joseph Huber <joseph.huber@NOSPAM.web.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga7ulm$1110$1@gwdu112.gwdg.de>

Michael Kraemer wrote:
> JF Mezei schrieb:
> 
>> ... and let's hope that
>> they don't rely on Windows to run it.
> 
> I wouldn't hold my breath.
> 
> 

Well yes, they do, and we do:
  see my snapshot of a small part of our liquid argon calorimeter 
detector control system
  http://wwwvms.mppmu.mpg.de/~huber/atlas_lar_ready_for_beam.jpg
waiting for beam ...

-- 

  Joseph Huber   - http://www.huber-joseph.de

------------------------------

Date: Wed, 10 Sep 2008 10:07:42 +0200
From: Michael Kraemer <M.Kraemer@gsi.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga7v79$fkk$02$1@news.t-online.com>

Joseph Huber schrieb:
> Michael Kraemer wrote:
> 
>> JF Mezei schrieb:
>>
>>> ... and let's hope that
>>> they don't rely on Windows to run it.
>>
>>
>> I wouldn't hold my breath.
>>
>>
> 
> Well yes, they do, and we do:
>  see my snapshot of a small part of our liquid argon calorimeter 
> detector control system
>  http://wwwvms.mppmu.mpg.de/~huber/atlas_lar_ready_for_beam.jpg
> waiting for beam ...

Well, that's just a snapshot of a particular experiment, CERN is large,
and not too long ago, the mantra of their IT bosses was that
WindozeNT will take over everything.
What are the accelerator controls running on ?

------------------------------

Date: Wed, 10 Sep 2008 01:12:54 -0700 (PDT)
From: IanMiller <gxys@uk2.net>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <131f1dfd-249d-4783-ac9e-4a46d9c049f0@d45g2000hsc.googlegroups.com>

Follow the excitement at

http://www.bbc.co.uk/radio4/bigbang/

------------------------------

Date: Wed, 10 Sep 2008 10:34:00 +0200
From: Joseph Huber <joseph.huber@NOSPAM.web.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga80pp$12mt$1@gwdu112.gwdg.de>

Michael Kraemer wrote:
> Well, that's just a snapshot of a particular experiment, CERN is large,
> and not too long ago, the mantra of their IT bosses was that
> WindozeNT will take over everything.
> What are the accelerator controls running on ?
> 
Well I'm not near enough to know. The hard core is certainly running on
hard realtime systems; the control levels above are a mixture of
Windows and Linux systems.
There is a common SCADA system in use both for LHC and detector control,
which runs on Windows and Linux.

Windows has not taken over everything, but in admin and engineering it almost has.

-- 

  Joseph Huber   - http://www.huber-joseph.de

------------------------------

Date: Wed, 10 Sep 2008 02:09:17 -0700 (PDT)
From: johnwallace4@yahoo.co.uk
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <8095b927-dfc6-466e-9601-3fccce13018e@k30g2000hse.googlegroups.com>

On Sep 10, 9:34 am, Joseph Huber <joseph.hu...@NOSPAM.web.de> wrote:
> Michael Kraemer wrote:
> > Well, that's just a snapshot of a particular experiment, CERN is large,
> > and not too long ago, the mantra of their IT bosses was that
> > WindozeNT will take over everything.
> > What are the accelerator controls running on ?
>
> Well I'm not near enough to know. The hard core is certainly running on
> hard realtime systems, the controls levels above are a mixture of
> Windows and Linux systems.
> There is a common SCADA system in use both for LHC and detector control
> , which runs on Windows and Linux.
>
> Windows has not taken over everything, but in admin and engineering almost.
>
> --
>
>   Joseph Huber   -http://www.huber-joseph.de

"Common SCADA system" = National Instruments Labview, right ? As per
http://sine.ni.com/cs/app/doc/p/id/cs-10795 ?

There are lots of folks who'd consider themselves in the SCADA
industry who wouldn't class Labview as a SCADA package, but...

------------------------------

Date: Wed, 10 Sep 2008 05:15:03 -0400
From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <48c79086$0$12394$c3e8da3@news.astraweb.com>

AN UPDATE:

The universe still exists today because all they are doing is getting a
few particles to move in one direction. No collisions planned in short
term. They'll next try to get a few particles to go in the opposite
direction. Collisions will happen much later when they get particles to
flow in opposite directions at the same time (and they have to
fine-tune their guidance system so that particles flowing in opposite
directions will hit each other).

So what they have done today is what I used to do at the Montreal
velodrome (before it was savagely destroyed by politicians): go round
and round hopefully without a collision...


BTW:

$ curl -I http://www.cern.ch
HTTP/1.1 302 Found
Date: Wed, 10 Sep 2008 09:11:05 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: http://public.web.cern.ch/public/
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 150


And I thought CERN was populated by intelligent and educated people who
would know not to use Microsoft products.

------------------------------

Date: Wed, 10 Sep 2008 11:53:39 +0200
From: Michael Kraemer <M.Kraemer@gsi.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga85du$ad9$03$1@news.t-online.com>

JF Mezei schrieb:
> BTW:
> 
> $ curl -I http://www.cern.ch
> HTTP/1.1 302 Found
> Date: Wed, 10 Sep 2008 09:11:05 GMT
> Server: Microsoft-IIS/6.0
> X-Powered-By: ASP.NET
> X-AspNet-Version: 1.1.4322
> Location: http://public.web.cern.ch/public/
> Cache-Control: private
> Content-Type: text/html; charset=utf-8
> Content-Length: 150
> 
> 
> And I thought CERN was populated by intelligent and educated people who
> would know not to use microsoft products.

well, being cynical, one could ask:
And these are the same people telling us
there would be absolutely no problems with
black holes ?
:-)

------------------------------

Date: Wed, 10 Sep 2008 12:01:17 +0200
From: Joseph Huber <joseph.huber@NOSPAM.web.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga85td$16bt$1@gwdu112.gwdg.de>

johnwallace4@yahoo.co.uk wrote:

> "Common SCADA system" = National Instruments Labview, right ? As per
> http://sine.ni.com/cs/app/doc/p/id/cs-10795 ?
> 
> There are lots of folks who'd consider themselves in the SCADA
> industry who wouldn't class Labview as a SCADA package, but...

No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/).

-- 

  Joseph Huber   - http://www.huber-joseph.de

------------------------------

Date: Wed, 10 Sep 2008 06:13:16 -0400
From: JF Mezei <jfmezei.spamnot@vaxination.ca>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <48c79e2a$0$4544$c3e8da3@news.astraweb.com>

Joseph Huber wrote:

> No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/).
> 

This product is said to work on Windows, Linux and Solaris.

I can see commonality between Linux and Solaris, they both use X and
various graphical toolkits are available on both.

How do they also support Windows ? Isn't this a major headache to
support both Unix and Windows for some serious application when you
consider how different Windows is from the rest of the world ?

------------------------------

Date: Wed, 10 Sep 2008 13:37:44 +0200
From: Joseph Huber <joseph.huber@NOSPAM.web.de>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga8bi8$1a42$1@gwdu112.gwdg.de>

JF Mezei wrote:
> Joseph Huber wrote:
> 
>> No, definitely not Labview, it is PVSS from ETM (http://www.etm.at/).
>>
> 
> This product is said to work on Windows, Linux and Solaris.
> 
> I can see commonality between Linux and SOlaris, they both use X and
> various graphical toolkits are available on both.
> 
> How do they also support Windows ? Isn't this a major headache to
> support both Unix and Windows for some serious application when you
> consider how different Windows is from the rest of the world ?

The first years we used it, the graphical interface was quite different
between Linux and Windows. They now have an intermediate graphics layer
(Qt ?), so the graphics part now has a rather common software base on
both system worlds.
And for data handling and communication, if there are well-defined
protocols behind it (ODBC, OPC, ...), then I think one doesn't have to
maintain too diverse code streams.

PVSS consists of a rather small set of programs (called "managers"), and
the application work is plugged in as "scripts" in the form of (extended)
C code, which is interpreted by the control managers.

It is mainly device drivers that are tied to the different systems. And
here is also one of the reasons why some subsystems are bound to Windows.
The subproject I'm working on, e.g., has some CANbus driver and OPC
interfaces not available for Linux.

-- 

  Joseph Huber   - http://www.huber-joseph.de

------------------------------

Date: 10 Sep 2008 07:49:48 -0500
From: koehler@eisner.nospam.encompasserve.org (Bob Koehler)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <MqH4rfLH4jTY@eisner.encompasserve.org>

In article <48c74eaa$0$12359$c3e8da3@news.astraweb.com>, JF Mezei <jfmezei.spamnot@vaxination.ca> writes:
> The Large Hadron Collider will first be activated this morning at 09:00
> Central European Time (GMT + 2).
> 
> It is expected that a new universe will be created inside the LHC
> (lasting an eternity for the people in it, but mere millionth of a
> second for us) and it is possible that it will also create a black hole
> that will suck up the earth (like the vaccuum cleaner that sucks  the
> pink panther and then sucks itself out of existance)

   They didn't even turn on the opposing beam yet.  It's really hard
   to get collisions when all the protons are running in the same
   direction.

   Creating a new universe is an extreme and misleading description
   of what is expected to happen when they do get collisions.

   And the possibility of destroying the Earth is just fodder for
   headline seeking "journalists".

------------------------------

Date: Wed, 10 Sep 2008 07:03:12 -0700 (PDT)
From: DaveG <david.gudewicz@abbott.com>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <4a6998f0-e341-4178-a7ae-eacd7d54b8f5@8g2000hse.googlegroups.com>

On Sep 10, 7:49 am, koeh...@eisner.nospam.encompasserve.org (Bob
Koehler) wrote:
> In article <48c74eaa$0$12359$c3e8...@news.astraweb.com>, JF Mezei <jfmezei.spam...@vaxination.ca> writes:
>
> > The Large Hadron Collider will first be activated this morning at 09:00
> > Central European Time (GMT + 2).
>
> > It is expected that a new universe will be created inside the LHC
> > (lasting an eternity for the people in it, but mere millionth of a
> > second for us) and it is possible that it will also create a black hole
> > that will suck up the earth (like the vaccuum cleaner that sucks  the
> > pink panther and then sucks itself out of existance)
>
>    They didn't even turn on the opposing beam yet.  It's really hard
>    to get collisions when all the protons are running in the same
>    direction.
>
>    Creating a new universe is an extreem and misleading description
>    of what is expected to happen when they do get collisions.
>
>    And the possibility of destroying the Earth is just fodder for
>    headline seeking "journalists".

Can't comment on the end of the world or Windows/Linux stuff but I
will pass this on.

At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
brief discussion of the LHC took place.  Guy from Fermi Lab said that
the unit must be shut down for 3 months of the year during the heating
season.  Electrons are used to heat homes, etc and with the LHC using
the Euro grid to power their new machine, there won't be enough
electrons to both run the collider and keep people warm during the
heating season.

------------------------------

Date: Wed, 10 Sep 2008 10:40:34 -0400
From: "Richard B. Gilbert" <rgilbert88@comcast.net>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <5LmdnRAOyepDQVrVnZ2dnUVZ_h-dnZ2d@comcast.com>

JF Mezei wrote:
> The Large Hadron Collider will first be activated this morning at 09:00
> Central European Time (GMT + 2).
> 
> It is expected that a new universe will be created inside the LHC
> (lasting an eternity for the people in it, but mere millionth of a
> second for us) and it is possible that it will also create a black hole
> that will suck up the earth (like the vaccuum cleaner that sucks  the
> pink panther and then sucks itself out of existance)
> 
> It is unclear what effect the LHC experiment may have on the connection
> between my universe and the one where most of comp.os.vms lives in.
> 
> BBC said that they won't have it at full power today and it will take a
> year before they risk running at at full power.
> 
> 
> www.cern.ch is the official website. As in any modern event, they are to
> have a live webcast. It is not clear what's we'll see in it. It is not
> clear if we will actually hear a "big bang".
> 
> Good luck to all those who worked on that project, and lets hope that
> they doN't rely on Windows to run it. And remember that, as in any
> science fiction movie, all the lights in the world will dim when they
> turn the power on to the collider :-) :-) :-) :-)

Well, the world has clearly survived.  Unless, of course, I'm 
hallucinating all this!

------------------------------

Date: Wed, 10 Sep 2008 15:52:55 +0000 (UTC)
From: helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga8qgn$j9m$1@online.de>

In article <48c79086$0$12394$c3e8da3@news.astraweb.com>, JF Mezei
<jfmezei.spamnot@vaxination.ca> writes: 

> The universe still exists today because all they are doing is getting a
> few particles to move in one direction. 

I didn't reply at first since it was off-topic, but since so many others 
have....

There are a few folks who really think the world is coming to an end,
since the LHC might produce black holes.  It might, but they won't grow
by sucking in the Earth, since they will quickly decay via Hawking
radiation (the smaller the black hole, the FASTER it decays).  The best
demonstration to realise that there is no danger is to understand that
the LHC wasn't built to produce something which has never existed (on
Earth) before---although this is often claimed by the media---, but
rather to study it in detail.  Cosmic rays regularly reach energies well
in excess of anything the LHC is capable of.  Some elementary particles
were first detected by studying the aftermath of the collisions of
cosmic rays with the atmosphere.  (Q: Why not use cosmic rays instead of
artificial collisions for the current experiments?  A: Because these
detectors which weigh as much as the Eiffel tower won't fit in a
hot-air balloon.) 
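
The "smaller means faster" point can be made concrete with the standard
semiclassical estimate for the evaporation time of a black hole,
t = 5120 * pi * G^2 * M^3 / (hbar * c^4). The sketch below is only an
illustration of the M^3 scaling: the function name and the sample masses are
made up for the example, the constants are rounded, and the formula is not
expected to hold all the way down to collider-scale masses.

```python
import math

# Rounded physical constants (SI units).
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.0546e-34   # reduced Planck constant, J s
C = 2.998e8         # speed of light, m/s


def evaporation_time_s(mass_kg):
    """Semiclassical Hawking evaporation time in seconds.

    Illustrative only: assumes the textbook estimate
    t = 5120 * pi * G^2 * M^3 / (hbar * c^4).
    """
    return 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)


# Arbitrary sample masses, chosen only to show the cubic scaling.
for mass in (1.0, 1.0e3, 1.0e6):
    print(f"{mass:10.1e} kg -> {evaporation_time_s(mass):.3e} s")
```

Because the time goes as the cube of the mass, halving the mass cuts the
lifetime by a factor of eight.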

------------------------------

Date: Wed, 10 Sep 2008 15:55:59 +0000 (UTC)
From: helbig@astro.multiCLOTHESvax.de (Phillip Helbig---remove CLOTHES to reply)
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ga8qmf$j9m$2@online.de>

In article
<4a6998f0-e341-4178-a7ae-eacd7d54b8f5@8g2000hse.googlegroups.com>, DaveG
<david.gudewicz@abbott.com> writes: 

> At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
> brief discussion of the LHC took place.  Guy from Fermi Lab said that
> the unit must be shut down for 3 months of the year during the heating
> season.  Electrons are used to heat homes, etc and with the LHC using
> the Euro grid to power their new machine, there won't be enough
> electrons to both run the collider and keep people warm during the
> heating season.

I don't know if this is true, but if so, the bottleneck is not
electrons, but rather power.  I think at one point it was debated 
whether CERN should have its own power plant.  At DESY in Hamburg, which 
is a similar institution, the average power consumption amounts to 2% of 
the city of Hamburg (where DESY is located), which has over a million 
inhabitants.

------------------------------

Date: Wed, 10 Sep 2008 09:05:51 -0700 (PDT)
From: DaveG <david.gudewicz@abbott.com>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <ed45d6f9-0f40-4562-874a-4d6b28201c5a@m44g2000hsc.googlegroups.com>

On Sep 10, 10:55 am, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
remove CLOTHES to reply) wrote:
> In article
> <4a6998f0-e341-4178-a7ae-eacd7d54b...@8g2000hse.googlegroups.com>, DaveG
> <david.gudew...@abbott.com> writes:
> > At a recent DECUS-->Encompass-->Connect LUG (now chapter) meeting a
> > brief discussion of the LHC took place.  Guy from Fermi Lab said that
> > the unit must be shut down for 3 months of the year during the heating
> > season.  Electrons are used to heat homes, etc and with the LHC using
> > the Euro grid to power their new machine, there won't be enough
> > electrons to both run the collider and keep people warm during the
> > heating season.
>
> I don't know if this is true, but if so, the bottleneck is not
> electrons, but rather power.  I think at one point it was debated
> whether CERN should have its own power plant.  At DESY in Hamburg, which
> is a similar institution, the average power consumption amounts to 2% of
> the city of Hamburg (where DESY is located), which has over a million
> inhabitants.

power == electrons in my note.  Sorry for the confusion.

------------------------------

Date: Wed, 10 Sep 2008 10:11:58 -0700 (PDT)
From: "winston19842005@yahoo.com" <winston19842005@yahoo.com>
Subject: Re: OT: The end of the world in roughly 3 hours
Message-ID: <95f6dfea-f2ff-47e5-9fc5-d37979d99efe@k7g2000hsd.googlegroups.com>

On Sep 10, 10:40 am, "Richard B. Gilbert" <rgilber...@comcast.net>
wrote:
> JF Mezei wrote:
> > The Large Hadron Collider will first be activated this morning at 09:00
> > Central European Time (GMT + 2).
>
> > It is expected that a new universe will be created inside the LHC
> > (lasting an eternity for the people in it, but mere millionth of a
> > second for us) and it is possible that it will also create a black hole
> > that will suck up the earth (like the vaccuum cleaner that sucks  the
> > pink panther and then sucks itself out of existance)
>
> > It is unclear what effect the LHC experiment may have on the connection
> > between my universe and the one where most of comp.os.vms lives in.
>
> > BBC said that they won't have it at full power today and it will take a
> > year before they risk running at at full power.
>
> > www.cern.ch is the official website. As in any modern event, they are to
> > have a live webcast. It is not clear what's we'll see in it. It is not
> > clear if we will actually hear a "big bang".
>
> > Good luck to all those who worked on that project, and lets hope that
> > they doN't rely on Windows to run it. And remember that, as in any
> > science fiction movie, all the lights in the world will dim when they
> > turn the power on to the collider :-) :-) :-) :-)
>
> Well, the world has clearly survived.  Unless, of course, I'm
> hallucinating all this!

No, actually I exist, and the rest of you are figments of my
imagination. Or at least that is what my wife is telling me, as she
studies philosophy.

Or at least that is what my mind is telling me, that I have a wife
that is studying philosophy who is telling me that I don't exist, yet
I do and no one else does, they are only projections of my mind...

------------------------------

Date: Wed, 10 Sep 2008 11:44:50 -0400
From: norm.raphael@metso.com
Subject: Re: Pipe search of command procedure log file containing pipe search	command.	co
Message-ID: <OFD56AAA04.A940495C-ON852574C0.00560213-852574C0.00567F37@metso.com>

This is a multipart message in MIME format.
--=_alternative 00567F36852574C0_=
Content-Type: text/plain; charset="US-ASCII"

FrankS <sapienza@noesys.com> wrote on 09/10/2008 10:25:54 AM:

> On Sep 10, 9:40 am, norm.raph...@metso.com wrote:
> > Is there a better way to do this?  
> 
> Why not just check the status condition after each CONVERT completes?

When the IF-STATEMENT finds a dup, it takes all the collected error 
lines and e-mails them.  Constructing something to store all that 
would be far more complex and/or require an output temporary file, 
which is what the pipe search is employed to eliminate.  I don't want 
just to know if there were dups, but to e-mail them.
 
The case here is/was somewhat simplified to uncomplicate it.

Nothing yet explains why it works sometimes and fails other times.

> 
> $ SET NoON
> $ CVT_DUP = %X<put whatever code CONVERT-I-DUP is here>
> $ DUP_OCCURS = 0
> $
> $ CONVERT <do your convert here>
> $ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
> $ IF (.NOT. $STATUS) THEN GOTO KABOOM
> $
> $ CONVERT <do your convert here>
> $ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
> $ IF (.NOT. $STATUS) THEN GOTO KABOOM
> $
> .
> .  Repeat however many $CONVERTs you need
> .
> $
> $ IF (DUP_OCCURS .NE. 0)
> $ THEN
> $      <do whatever you need in the event of duplicates here>
> $ ENDIF
> $ EXIT
> $
> $ KABOOM:
> $ <put error condition handling here>
> $ EXIT
> $
> $ RECORD_OCCURANCE:
> $ DUP_OCCURS = DUP_OCCURS + 1
> $ RETURN
> $

--=_alternative 00567F36852574C0_=--

------------------------------

Date: Wed, 10 Sep 2008 07:25:54 -0700 (PDT)
From: FrankS <sapienza@noesys.com>
Subject: Re: Pipe search of command procedure log file containing pipe search	command.	co
Message-ID: <d4c3e502-b9e6-4358-9936-9da1383a0723@26g2000hsk.googlegroups.com>

On Sep 10, 9:40 am, norm.raph...@metso.com wrote:
> Is there a better way to do this?

Why not just check the status condition after each CONVERT completes?

$ SET NoON
$ CVT_DUP = %X<put whatever code CONVERT-I-DUP is here>
$ DUP_OCCURS = 0
$
$ CONVERT <do your convert here>
$ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
$ IF (.NOT. $STATUS) THEN GOTO KABOOM
$
$ CONVERT <do your convert here>
$ IF ($STATUS .EQ. CVT_DUP) THEN GOSUB RECORD_OCCURANCE
$ IF (.NOT. $STATUS) THEN GOTO KABOOM
$
.
.  Repeat however many $CONVERTs you need
.
$
$ IF (DUP_OCCURS .NE. 0)
$ THEN
$      <do whatever you need in the event of duplicates here>
$ ENDIF
$ EXIT
$
$ KABOOM:
$ <put error condition handling here>
$ EXIT
$
$ RECORD_OCCURANCE:
$ DUP_OCCURS = DUP_OCCURS + 1
$ RETURN
$

------------------------------

Date: Wed, 10 Sep 2008 09:40:04 -0400
From: norm.raphael@metso.com
Subject: Pipe search of command procedure log file containing pipe search command.
Message-ID: <OF89ECCD3B.90494BF1-ON852574C0.0048BAB1-852574C0.004B1250@metso.com>

This is a multipart message in MIME format.
--=_alternative 004B124F852574C0_=
Content-Type: text/plain; charset="US-ASCII"

Here is a code fragment designed to search the log of the running command
procedure to see if any of the converts in the log got duplicate error
messages.  The second line eliminates matches of the pipe search command
itself.
======
$ proc = f$environment("procedure")
$ proc_name = f$parse(proc,,,"name")
$! [snip]
$ pipe search 'proc_name'.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | - 
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
======
Here is the log file from a normal run.
The symbol EMAIL_REC contains the expected result of the search when there
are no duplicate error messages.
======
$ pipe search GET_ORDERS_AM.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | - 
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
  EMAIL_REC = "%SEARCH-I-NOMATCHES, no strings matched"
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
$! [snip]
  USER1     job terminated at 10-SEP-2008 07:17:21.69
<CR><LF>  Accounting information:
  Buffered I/O count:               8816      Peak working set size: 22240
  Direct I/O count:                 3107      Peak virtual size: 234528
  Page faults:                     11238      Mounted volumes:    0
  Charged CPU time:        0 00:00:06.07      Elapsed time:       0 01:17:21.69
======
Here is the log file from a failed run.
The symbol EMAIL_REC here contains a null-string even though there 
are no duplicate error messages.
======
$ pipe search CONVERT_FILES.log;  "%CONVERT-I-DUP," /mat=or | -
  sear sys$input "pipe search" /match=nor | - 
( read sys$pipe ema ; DEFINE /JOB /nolog email_rec &ema )
$ email_rec=-
  f$edit(f$trnlnm("EMAIL_REC","LNM$JOB"),"compress,trim")
$ sho sym email_rec
  EMAIL_REC = ""
$ Deassign/job email_rec
$if f$locate("NO STRING",f$edit(email_rec,"UPCASE")) .eq. f$length(email_rec)
$! [snip]
  USER1     job terminated at  8-SEP-2008 07:18:56.44
<CR><LF>  Accounting information:
  Buffered I/O count:              13624      Peak working set size: 22224
  Direct I/O count:                 9226      Peak virtual size: 234544
  Page faults:                     25189      Mounted volumes:    0
  Charged CPU time:        0 00:00:14.57      Elapsed time:       0 01:18:56.45
======
Is this a race condition?  Can it be fixed to provide expected results 
every run?
Is there a better way to do this?   
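
For readers less familiar with chained SEARCH commands, the intent of the
pipeline above (keep lines that mention the CONVERT duplicate message, drop
the lines that are just the echoed pipe command) can be sketched in Python.
This is only an illustration of the filtering logic, not a replacement for
the DCL and not a diagnosis of the intermittent empty result; the function
name and the sample log text are made up for the example.

```python
def duplicate_lines(log_text, marker="%CONVERT-I-DUP,", exclude="pipe search"):
    """Return log lines that contain the marker but not the excluded text.

    Matching ignores case, as SEARCH does by default.
    """
    marker = marker.lower()
    exclude = exclude.lower()
    return [
        line
        for line in log_text.splitlines()
        if marker in line.lower() and exclude not in line.lower()
    ]


# Hypothetical log excerpt; the text after the message code is a placeholder.
sample_log = "\n".join([
    '$ pipe search CONVERT_FILES.log;  "%CONVERT-I-DUP," /mat=or | -',
    "%CONVERT-I-DUP, placeholder message text",
    "$ sho sym email_rec",
])

print(duplicate_lines(sample_log))
```

The echoed command line contains both strings, so it is filtered out; only
the genuine message line is returned.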


--=_alternative 004B124F852574C0_=--

------------------------------

Date: Wed, 10 Sep 2008 09:41:11 +0100
From: "Richard Brodie" <R.Brodie@rl.ac.uk>
Subject: Re: Security alarm msg
Message-ID: <ga8178$89k$1@south.jnrs.ja.net>

"Tom Linden" <tom@kednos.company> wrote in message 
news:op.ug2boszghv4qyg@murphus.hsd1.ca.comcast.net...
>I noted following on opcon.  Why is the remote node id in decimal format?
> Remote node id:           998090410

It's stored as an integer in the binary log and the formatter in ANALYZE/AUDIT
only understands DECnet addresses.
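
If the logged value is in fact a 32-bit IPv4 address (an assumption; the
thread does not say so explicitly, nor which byte order the log uses), it can
be turned back into dotted-quad form by hand. A minimal Python sketch, with a
made-up function name, that shows both byte orders:

```python
import ipaddress


def node_id_to_ipv4(node_id):
    """Render a 32-bit integer as a dotted quad in both byte orders.

    Assumes the integer is an IPv4 address; which byte order applies
    to the audit log is not stated in the thread, so both are returned.
    """
    if not 0 <= node_id <= 0xFFFFFFFF:
        raise ValueError("node id does not fit in 32 bits")
    big = ipaddress.IPv4Address(node_id.to_bytes(4, "big"))
    little = ipaddress.IPv4Address(node_id.to_bytes(4, "little"))
    return {"big_endian": str(big), "little_endian": str(little)}


print(node_id_to_ipv4(998090410))
```

For the value quoted above this gives 59.125.166.170 (most significant byte
first) or 170.166.125.59 (least significant byte first).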
 

------------------------------

Date: Wed, 10 Sep 2008 07:25:27 -0700 (PDT)
From: Peter Weaver <info-vax@weaverconsulting.ca>
Subject: Spinning down an old disk array
Message-ID: <e97191b9-77c7-4620-8f85-276279907ba5@34g2000hsh.googlegroups.com>

A customer is planning on doing some maintenance at their data centre.
As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
have their power cut off. Since most of these disks have been
constantly spinning for the past 8 or 9 years the customer is
concerned about the disks spinning again after power is restored.

Most disks are DR-RZ1FC-VW and some are RZ29.

Some people here feel that as long as the power is off for only 10 or
15 minutes, the disks should spin up again after power is
restored. Some people here feel that even if the power is off for a
few seconds, we risk having disks not spin again.

Does anyone have any experience with turning off the power on disks
that have been running for years? What percentage of disks should we
expect to fail after:
     - a few seconds
     - a few minutes
     - 10 minutes
     - 15 minutes


Peter

------------------------------

Date: Wed, 10 Sep 2008 07:30:23 -0700 (PDT)
From: FrankS <sapienza@noesys.com>
Subject: Re: Spinning down an old disk array
Message-ID: <41398b6b-9d11-4f83-adf4-cbba80725c29@e39g2000hsf.googlegroups.com>

On Sep 10, 10:25 am, Peter Weaver <info-...@weaverconsulting.ca>
wrote:
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to fail after: ...

Yes to the first part.  I wouldn't say frequently, but certainly I
have turned off complete disk arrays for maintenance and then powered
them right back up again without incident.

Too random on the second part.  In my experience: none failed.  In
fact, I'd say I've had better experience with the older 5400rpm drives
than newer 10k or 15k drives.

------------------------------

Date: Wed, 10 Sep 2008 08:01:45 -0700 (PDT)
From: Bob Gezelter <gezelter@rlgsc.com>
Subject: Re: Spinning down an old disk array
Message-ID: <dabf7169-3b18-4a3b-bd67-569a02110751@c65g2000hsa.googlegroups.com>

On Sep 10, 9:25 am, Peter Weaver <info-...@weaverconsulting.ca> wrote:
> A customer is planning on doing some maintenance at their data centre.
> As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
> have their power cut off. Since most of these disks have been
> constantly spinning for the past 8 or 9 years the customer is
> concerned about the disks spinning again after power is restored.
>
> Most disks are DR-RZ1FC-VW and some are RZ29.
>
> Some people here feel that as long as the power is off for only 10 or
> 15 minutes that the disks should spin up again after power is
> restored. Some people here feel that even if the power is off for a
> few seconds that we risk having disks not spin again.
>
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to fail after:
>      - a few seconds
>      - a few minutes
>      - 10 minutes
>      - 15 minutes
>
> Peter

Peter,

The original post does not indicate how many of these drives are in
stripes, mirrors, and other flavors of RAID. In any case, particularly
because the term "maintenance" covers a lot of ground (including
power and water), I would recommend that backups be up to date and
off-site during the "maintenance".

That said, I have not seen particularly bad experiences caused by a
single power down. In my experience, most of the interesting problems
come on sites where power-up/power-down is a chronic cycle, and the
cumulative wear and tear does cause failures.

A power-down also has a tendency to uncover out-of-date batteries in
various devices. Perhaps one of the more overlooked checklist items is
making sure that systems and controllers have up-to-date NVRAM and other
batteries. Keeping spare batteries on hand would not be a bad idea, nor
would using the opportunity to swap in fresh ones while the systems are
powered down.

- Bob Gezelter, http://www.rlgsc.com

------------------------------

Date: Wed, 10 Sep 2008 11:24:41 -0400
From: "Richard B. Gilbert" <rgilbert88@comcast.net>
Subject: Re: Spinning down an old disk array
Message-ID: <rKOdnQeEg9CqelrVnZ2dnUVZ_ovinZ2d@comcast.com>

Peter Weaver wrote:
> A customer is planning on doing some maintenance at their data centre.
> As part of the maintenance their HSZ80 and HSJ50 disk sub-systems will
> have their power cut off. Since most of these disks have been
> constantly spinning for the past 8 or 9 years the customer is
> concerned about the disks spinning again after power is restored.
> 
> Most disks are DR-RZ1FC-VW and some are RZ29.
> 
> Some people here feel that as long as the power is off for only 10 or
> 15 minutes that the disks should spin up again after power is
> restored. Some people here feel that even if the power is off for a
> few seconds that we risk having disks not spin again.
> 
> Does anyone have any experience with turning off the power on disks
> that have been running for years? What percentage of disks should we
> expect to fail after:
>      - a few seconds
>      - a few minutes
>      - 10 minutes
>      - 15 minutes
> 
> 
> Peter

Sooner or later EVERY disk will fail!  People use BACKUP to ensure that 
no data is lost.  Various forms of RAID are used to ensure that access 
to data is not lost.

Ideally, you should have spares on hand for each make and model of disk 
drive in use.  It's easy if all your disks are StorageWorks; just pop a 
failed drive out of its socket and plug in a new one.

If anything fails at power on, I would expect it to happen within the 
first sixty seconds.

------------------------------

End of INFO-VAX 2008.497
************************