From:   SMTP%"BRIVAN@spire.com" 10-MAY-1996 18:06:53.42
To:     EVERHART@star.zko.dec.com
CC:
Subj:   WATCHMAN_README.TXT

HISTORY of WatchMAN:

WatchMAN was originally written for myself while I was a Systems
Manager/Programmer. I got tired of showing up to work and finding that
some batch jobs had died because a disk was full. This would cause the
company a loss of about $1000.00 a minute. So... I wrote WATCHDOG. It
was great. The pager would wake me up all the time (that's not the great
part) telling me I had a CPU crash or a DISK low on space. I would be
able to have the problem fixed before anyone got to work in the morning.
True, the programs would be a little late. But, better late...

Anyway, I was on the phone with a salesman selling some software to me,
when he heard my pager go off. I explained that I was testing a program
I wrote. He loved it and we marketed it as PagerVSM for a company called
AVTECH. AVTECH and I went our own ways for various reasons (no
hostility, I assure you. I enjoyed working with AVTECH and Michael
Sigourney).

Since AVTECH and I went our separate ways, I had to call the program
something other than WatchDOG (already taken) and PagerVSM (contract
stuff), so WatchMAN seems to work for now.

Quick start for WatchMAN is on the last page of this document.

The first thing you will need to do is type "$ watchman/menu". The
screen will look like this:

------------------------------------------------------------------
WatchMan              VandeComp Software
                          Main Menu

          A    General Parameters
          B    Pager Addresses
          C    Mail Schedule
          D    Components Setup
          Z    Exit Configure

          Enter your option
-------------------------------------------------------------------

Select Option A to set up the modem. You could also type
watch/edit=parameters to get to the "General Parameters" screen.
You will see a screen that looks roughly like this:

-------------------------------------------------------------------
WatchMan              VandeComp Software
                      General Parameters

Startup Delay:                  1
Monitor Interval:               15
Timestamp Cycle Count:          0
Block Limit For Logfile:        1000
Default Modem Port:             TTA2:       Baud Rate: 9600
Secondary Modem Port:                       Baud Rate: 2400
Mail Command:                   MAIL
Default Mail List:              @PagerVSM_DIR:PagerVSM
Mail List for Modem Failure:    BRIVAN
Total Pause for Dialing Pager:  30
                                                            V2.0
-------------------------------------------------------------------

Startup Delay:
    Indicates the number of minutes to wait before WatchMan will start
    monitoring. This is mostly for clusters. Say you are booting a 10
    node cluster: without a delay, WatchMan will start telling you that
    some of your nodes have crashed, but only because they have not had
    enough time to boot. This value will give the other nodes or disk
    drives time to start up before WatchMan indicates they are missing.

Monitor Interval:
    Indicates the number of minutes between checks of the components.
    The screen above indicates that the system will be monitored every
    15 minutes.

Timestamp Cycle Count:
    This value will tell WatchMan to display a timestamp every x
    Monitor Intervals. If the Timestamp Cycle Count is 2 and the
    Monitor Interval is 15, then the timestamp will display on the
    console every 30 minutes.

Block Limit For Logfile:
    Indicates the number of blocks the logfile is allowed to grow to
    before it is renamed for archive purposes. WatchMan does not
    perform the archive. It will merely close the logfile it is using
    and rename it so that it may be archived by the system
    administrator.

Default Modem Port:
    This field will tell WatchMan where the modem is located. This is
    the first of two modems that WatchMan will attempt to use.

Baud Rate:
    This will tell WatchMan the baud rate of the Default Modem.

Secondary Modem Port:
    This field will tell WatchMan where the Secondary Modem is located.
    If the first modem is not available for any reason, WatchMan will
    use this modem as a fail-over.

Baud Rate:
    This will tell WatchMan the baud rate of the Secondary Modem.

Mail Command:
    Used to indicate the MAIL command to use to mail messages to the
    specified user. If you are using a mail package other than VMS
    MAIL, then you will need to enter its MAIL command here.

Default Mail List:
    Enter the VMS MAIL addresses of those users you want notified if
    both the Default and Secondary modems are not available.

Mail List for Modem Failure:
    If the Default and Secondary modems are available but not
    responding correctly, or errors were returned, then WatchMan will
    send a message to the users in this field.

Total Pause for Dialing Pager:
    This field indicates the total number of seconds WatchMan needs to
    wait before releasing the modem. Keep in mind that the pager does
    not send back completion codes to the modem. That means that the
    communications are ONE WAY... from the modem to the pager. So, the
    best way to find the number of seconds WatchMan needs to wait is to
    get a watch that will show the seconds. Have WatchMan start a
    paging sequence. As soon as you hear the modem start dialing, start
    counting the seconds it takes for the modem to finish sending
    commands. Put that number of seconds in this field. You might need
    to adjust it a bit once in a while for the best performance.

The next option from the menu is (duh) "B Pager Addresses". You could
also start up that screen by typing "$ Watch/edit=addr".

The Address screen looks mostly like this (the [] are actually reversed text and the [] are not there.
The [] were added for this document only):

------------------------------------------------------------------
WatchMan              VandeComp Software
                       Pager Addresses

Address:    PBX:   Phone Number:         Suffix:
[          ][    ] [                   ] [            ]
[          ][    ] [                   ] [            ]
[          ][    ] [                   ] [            ]
[          ][    ] [                   ] [            ]
[          ][    ] [                   ] [            ]
[          ][    ] [                   ] [            ]
                                                            V2.0
------------------------------------------------------------------

Address:
    This field is used to associate a "Name" with a Phone Number. This
    simplifies paging. You can notify a pager with the command:

        $ Watch/address=BRIAN/alarm=01*02*03

    WatchMan will send the alarm message to BRIAN's pager.

PBX:
    If your phone system needs to use a "9" or some other number to
    "get out of the building", enter that number here.

Phone Number:
    Enter the phone number that is to be associated with the Address.
    This phone number is the actual number of the pager that needs to
    be dialed.

Suffix:
    Enter a Suffix Number. This number is typically used as a PIN.

Option "C Mail Schedule" from the main menu will display the following
screen:

-------------------------------------------------------------------
WatchMan              VandeComp Software
                        Mail Schedule

             Mon     Tue     Wed     Thu     Fri     Sat     Sun
Start Time  [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
End Time    [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
Start Time  [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
End Time    [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
Start Time  [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
End Time    [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
Start Time  [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
End Time    [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ] [  :  ]
                                                            V2.0
-------------------------------------------------------------------

This screen is used to tell WatchMan when to send VMS MAIL as opposed
to a message to a PAGER. For example, if you are at your desk, you
might want a mail message rather than a pager alert.

Start Time:
    Using this field with the End Time field will tell WatchMan to send
    a MAIL message instead of PAGING users.
    This field tells WatchMan the starting time for sending MAIL.

End Time:
    Using this field with the Start Time field will tell WatchMan to
    send a MAIL message instead of PAGING users. This field tells
    WatchMan the ending time for sending MAIL.

    EXAMPLE: If you enter a starting time of 08:00 on Monday and an
    ending time of 10:00 on Monday, then any problems that happen on
    the system between 08:00 and 10:00 that day will be sent to you via
    MAIL rather than a PAGER.

Option "D Components Setup" from the main menu will set up the
components on the computer system to be monitored. The screen will look
like this:

-----------------------------------------------------------------------------------
WatchMan              VandeComp Software
                       Components Setup

Component Type:                      [ ]
Enable Checking for this Component:  [ ]
Component Name:                      [ ]
Component Limit:                     [ ]
Weekdays:                            [ ]
Start Time:                          [ : ]
Working Minutes:                     [ ]
Variance:                            [ ]
Alarm Code:                          [ ]
Always Notify Only by MAIL:          [ ]
Mail Users:                          [ ]
Always Notify Only by PAGER:         [ ]
Pager Address #1: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
Pager Address #2: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
Pager Address #3: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
Pager Address #4: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
Pager Address #5: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
Pager Address #6: [ ]   # Times to Try: [ ]   # Times Tried: [ ]
                                                            V2.0
-----------------------------------------------------------------------------------

Component Type:
    This field is used to tell WatchMan the type of component to
    monitor. There are 5 system components that can be monitored:

        1. CPU
        2. DISK
        3. Batch Job
        4. Missing Process
        5. Existing Process

    10 thru 99 are used to execute user written command files. Enter
    the command file to be executed in the Component Name field.

Enable Checking for this Component:
    Used to enable or disable checking of the currently displayed
    component.

Component Name:
    This field is used to define the name of the component to monitor.
    Valid components are defined by the COMPONENT TYPE field. Valid
    values are listed as follows:

        Enter...                      If COMPONENT TYPE is:
        SCSNODE NAME                  1
        Name of a DISK                2
        Name of a BATCH JOB           3
        Name of a DETACHED Process    4 or 5

    Values 10 to 99 are user defined. Enter the name of the command
    file to be executed.

Component Limit:
    This field is used to define the limit of the component. Valid
    limits depend on the COMPONENT. Valid values are listed as follows:

        Limit                         If COMPONENT is...
        Free Space threshold          A Disk Drive
        SCSNODE Name                  A Detached Process

    This field is used only to define limits used by Disk Drives and
    Detached Processes. This field is not used by any other component.

Weekdays:
    This field is used to indicate the weekday a Batch Job or Detached
    Process is to be monitored.

Start Time:
    Enter a time for WatchMan to start monitoring this component. The
    time must be entered using the 24 hour clock format, such as 22:00
    to mean 10:00pm.

Working Minutes:
    Enter the number of working minutes that WatchMan should monitor
    this component.

Variance:
    Enter the number of minutes by which this component may finish
    before or after the time the working minutes specify.

    EXAMPLE: If the Start Time is specified as 03:00 and the working
    minutes are 120, the component is expected to finish at 05:00. A
    variance of 15 would then mean the component could finish between
    04:45 and 05:15 without WatchMan sending an alarm.

    Valid values are: 0 to 9999 minutes. This field is used for Batch
    Jobs and Detached Processes only.

Alarm Code:
    Enter the alarm code that is to be displayed in the pager view
    screen. It is recommended that you structure the code so that it
    indicates the component with a problem.

    EXAMPLE: Using 2*03 as the alarm code could mean disk drive number
    3 has a problem. 2-03 will show in the pager window.

Always Notify Only by MAIL:
    This field is used to instruct WatchMan to always send mail if Yes
    is entered. If No is entered, then WatchMan will check the SCHEDULE
    file for the time to send mail.

Always Notify Only by PAGER:
    This field is used to instruct WatchMan to always send a Page if
    Yes is entered. If No is entered, then WatchMan will check the
    SCHEDULE file for the time to send mail. If a time has NOT been
    specified when the problem is encountered, then a page will be
    issued.

Mail Users:
    If a mail message is to be sent, then WatchMan will notify these
    users.

Pager Address #1-6:
    Enter the address of the person that should be paged for this
    component should a problem be found.
    **NOTE** these entries must be entered into the ADDRESS data file.

# Times to Try:
    Enter the number of times to page the user specified before moving
    to the next user to be paged.

# Times Tried:
    This field specifies the number of times WatchMan tried to page the
    specified user. This value is entered by WatchMan. However, you may
    change this value if you need to.

QUICK START GUIDE:

1.  The first thing you need to do (after you install it that is...) is
    set up the General Parameters screen so that WatchMan can find a
    modem to use. Do this by typing at the prompt (usually a "$"):

        $ WatchMan/edit=par
            - or -
        $ WatchMan/menu     (then select option "A")

    Fill out the entire screen and press the keypad enter to save the
    information.

2.  Edit the address screen so that WatchMan has a user to send a
    message to. Do this by typing at the prompt:

        $ WatchMan/edit=add     (Or option "B" on the menu)

    Enter an address and a phone number.

3.  Now start up WatchMan by typing "$ WatchMan/start". This will start
    a detached process to monitor the components and a message
    dispatcher. Since there are no components to be monitored (unless
    you jumped ahead of the game and entered some) the detached process
    will run REAL fast. #^)

4.  After WatchMan starts, you will be able to type a command like
    this:

        $ Watchman/address=brian/alarm=01*22*05

    Substitute the address you put in the address screen for the name
    BRIAN (obviously). The message "01-22-05" should appear in the
    pager window if everything works properly.
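
Both alarm examples in this document (2*03 showing as 2-03, and
01*22*05 showing as 01-22-05) suggest that each "*" in an alarm code
shows up as a "-" in the pager window. The little Python sketch below
only illustrates that mapping so you can preview a code before using
it. The function name pager_display is made up for this example, and
the one-for-one substitution is an assumption drawn from the two
examples, not a description of how WatchMan or any particular pager
actually works.

```python
def pager_display(alarm_code: str) -> str:
    """Preview how an alarm code might look in the pager window.

    Assumption: every '*' is shown as '-', as in the README examples
    (2*03 -> 2-03). This is an illustration, not WatchMan's own code.
    """
    return alarm_code.replace("*", "-")


if __name__ == "__main__":
    # The two alarm codes used as examples in this document.
    for code in ("2*03", "01*22*05"):
        print(code, "->", pager_display(code))
```

If your pager shows something different, trust the pager and not this
sketch.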

If everything does not work properly, then I would suggest tweaking the
number of seconds to pause the program (found in the last field of the
General Parameters screen). You might also have to adjust the pager
number to be dialed. It may not be waiting long enough after it dials
the pager number, so add a comma or two. Most of the problems can be
resolved by standing next to the modem and listening to what is
happening.

As of today, May 10, 1996, my e-mail address is brivan@spire.com

================== RFC 822 Headers ==================
Received: from spire.com by mail11.digital.com (8.7.5/UNX 1.2/1.0/WV)
        id RAA26709; Fri, 10 May 1996 17:48:43 -0400 (EDT)
Received: from mailman.spire.com ([120.110.4.11]) by twiki.spire.com
        with SMTP id <6165>; Tue, 7 May 1996 10:49:34 -0600
Received: from praxis.spire.com by mailman.spire.com with SMTP;
        Fri, 10 May 1996 15:06:24 -0600 (MDT)
Date: Fri, 10 May 1996 15:05:39 -0600
From: "Been There, Done That!"
To: EVERHART@star.zko.dec.com
Message-Id: <960510150539.204001de@spire.com>
Subject: WATCHMAN_README.TXT