Article 138855 of comp.os.vms:
Newsgroups: comp.os.vms
From: tillman_brian@si.com (Brian Tillman)
Subject: Queue Manager Secrets (was: uilization of file...)
Organization: Smiths Industries
References: <764019@MVB.SAIC.COM>
Date: Fri, 26 Jan 1996 17:11:49 GMT

In an earlier article, tillman_brian@si.com says...
>
>There are a number of "secret" Queue Manager commands that are very
>interesting. DECUServe contains an article describing them. I could post
>it, if anyone's interested.

I've received several requests, so here is the information. I obtained this
from DECUServe, where it was posted by Dale Coy (coy@eisner.decus.org). Hope
you find it interesting.

          <<< EISNER::$2$DIA7:[NOTES$HIVOL]VMS.NOTE;1 >>>
                     -< VMS and bundled utilities >-
================================================================================
Note 2068.0   Care and Feeding of the New Queue Manager (V5.5+++)    29 replies
EISNER::COY "Dale E. Coy (DECUServe MoS)"           22 lines  22-APR-1993 23:07
--------------------------------------------------------------------------------

I've gathered some information about the "New" Queue Manager (Introduced
with VMS V5.5). Refer to Topics 1318, 1679, 1725, 1779, and undoubtedly
other places, for previous discussion.

I am indebted to Kim and Pete (of the CSC) for a lot of the following
insight. However - please assume that *opinions* are mine. *facts* are
theirs.

The new queue manager looks like an excellent attempt to rationalize the
queue operations on VMS. Although it has suffered from lots of growing
pains, with VMS V5.5-2 and the latest patch it is relatively stable.
It appears that one design criterion was "ease of backup" - but that the
designers _assumed_ that it was only really important to easily preserve
the queues and forms - and that few people really needed to preserve the
ENTRIES for jobs on the queues. That was IMO a wrong assumption (more
later), but it was certainly an understandable assumption.

================================================================================
Note 2068.1   Care and Feeding of the New Queue Manager (V5.5+++)     1 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           37 lines  22-APR-1993 23:21
          -< Overview of the 3 files, and preserving 2 of them >-
--------------------------------------------------------------------------------

Three files are needed to totally describe the Queue structure. All are
(by default - they can be moved) located in SYS$SYSTEM

QMAN$MASTER.DAT
    This file contains _all_ of the information about your FORMS, and a
    bit of information about all of the QUEUES.

SYS$QUEUE_MANAGER.QMAN$QUEUES
    This file contains the rest of the information about the QUEUES
    themselves (characteristics, etc.).

SYS$QUEUE_MANAGER.QMAN$JOURNAL;1
    This file contains all of the information about the ENTRIES on all of
    the queues. It's rather dynamic, may not be totally up to date (data
    not flushed to the file), etc. More on this file later. Note that I
    gave a version number above.

==================

The easy part of preserving the queues (in case of disaster or corruption)
is to keep recent copies of the first two files above.

If you have good QMAN$MASTER.DAT and SYS$QUEUE_MANAGER.QMAN$QUEUES file
copies, that's all you need to restore your queues and forms. Just delete
the JOURNAL file, replace the other two with good copies, and start the
queue manager. Presto - all of the queues and forms are recovered. No
executing jobs (entries), though.

The even-better news is that these two files never seem to be "locked".
There is an article that recommends doing CONVERT/SHARE to make copies of
the files - or you could use BACKUP/IGNORE=INTERLOCK. My personal
preference is for CONVERT/SHARE, and I have never seen it fail to work.
[Don't use a method where _you_ might lock the file - I would hate to
confuse the queue manager]

The structure of these two files strongly implies that, even if a file is
being "updated" as you convert/share it, the copy you get will be
"rational" and "usable" and not corrupt.

================================================================================
Note 2068.2   Care and Feeding of the New Queue Manager (V5.5+++)     2 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           26 lines  22-APR-1993 23:33
                     -< Details of the JOURNAL file >-
--------------------------------------------------------------------------------

That brings us to SYS$QUEUE_MANAGER.QMAN$JOURNAL;1

As previously stated, this is where all of the ENTRIES (jobs) live. It's
the source of several problems and changes.

This file is a "coded" file, with pointers to lots of things and with lots
of links. It can have "old" data, with an "execution pointer" that points
after the old data. The physical order of things in this file is probably
the same order in which you NOW see the queue entries in show/queue
displays.

The _correct_ copy of this file is maintained IN MEMORY on the node that is
executing the queue manager. In the first versions of the new queue
manager, this data was seldom flushed to the file - probably only when the
node shut down, and/or the queue manager was shut down. It is my belief
that this file is the source of almost ALL instances where people saw
"queue corruption".

The latest version of CSCPAT_1012 (V1.6 at least) changes the behavior so
that the in-memory structure is flushed to the journal file fairly
frequently. The interval appears to be at MOST every hour, and for very
active queue situations every few minutes.
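
An aside on the two stable files from reply .1 (not the journal): a
minimal sketch of the CONVERT/SHARE copies might look like the following.
The logical name QMAN_SAVE is hypothetical and stands for whatever
directory you keep the copies in; the sketch also assumes the files have
not been moved out of SYS$SYSTEM.

```dcl
$! Sketch only. QMAN_SAVE is a hypothetical logical name for the
$! directory that holds the saved copies; it must already exist.
$! Assumes the queue files are in their default SYS$SYSTEM location.
$ CONVERT/SHARE SYS$SYSTEM:QMAN$MASTER.DAT -
        QMAN_SAVE:QMAN$MASTER.DAT
$ CONVERT/SHARE SYS$SYSTEM:SYS$QUEUE_MANAGER.QMAN$QUEUES -
        QMAN_SAVE:SYS$QUEUE_MANAGER.QMAN$QUEUES
```

Each run creates a new version in the save directory, so a version limit
or an occasional PURGE there keeps the copies from piling up.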
The size of this file tends to be "around 1000 blocks" (with a large
variation, of course).

================================================================================
Note 2068.3   Care and Feeding of the New Queue Manager (V5.5+++)     3 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           40 lines  22-APR-1993 23:46
                     -< Backing up the Journal file >-
--------------------------------------------------------------------------------

So - get the CSCPAT_1012 at some version (I would get at least 1.6).

Note that the "patch" means that any entry older than the flush interval
(e.g., any entry older than an hour) will appear in the file. Newer
entries _may_ not be flushed yet.

OK, so you really want to back up the journal file, to preserve a snapshot
of the queue entries (like you did with FIXQUE - see topic 1318 - in
earlier versions).

I'll describe several methods - depending on your aversion to risk.

================

METHOD 1:

    BACKUP/IGNORE=INTERLOCK SYS$QUEUE_MANAGER.QMAN$JOURNAL whatever

Convert/share won't work (the file is open). Backup will complain, but
make a "fair" copy. The only apparent risk is if you (unluckily) do the
backup at the exact same time that a "flush" operation is being done. You
could get an inconsistent file. But, IMO, if you kept two versions, the
probability of a bad result would be vanishingly small.

====================

METHOD 2

    STOP/QUEUE/MANAGER/CLUSTER

and then do a backup (or convert/share) of the file when the flush is done
and the file is released. [Of course, if you aren't in a cluster, you can
omit that qualifier]

I didn't extensively test this - but it is the SAFE METHOD RECOMMENDED BY
THE CSC. In my testing, I had trouble getting the queue manager to release
the file - but perhaps I didn't wait long enough or something.

In addition to the disadvantage of having your queues not running for the
duration of the backup, note that stop/queue/manager will "crash" any job
that is in EXECUTING state.
This may not be a problem for you - but it was for me.

But I agree with the CSC. It's unconditionally safe.

================================================================================
Note 2068.4   Care and Feeding of the New Queue Manager (V5.5+++)     4 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           70 lines  23-APR-1993 00:09
         -< The real way to save SYS$QUEUE_MANAGER.QMAN$JOURNAL >-
--------------------------------------------------------------------------------

METHOD 3 - pure unsupported magic.

There is an undocumented and "unsupported" method (or two).

In the VMS 5.5 kit, in Saveset B, there is a program JBC$UPGRADE.EXE.
Install left it on my system, but if you don't have it, you can just get it
from that saveset. Put it in sys$system.

This is apparently the program that was/is used to convert from old to new
styles. But it has other uses!!!

With privs CMKRNL (and maybe others), RUN SYS$SYSTEM:JBC$UPGRADE (may need
to be on the "execution" node). You get a prompt:

    JBC$UPGRADE>

(More info in next reply, but...) If you type

    JBC$UPGRADE> DIAG 0 2          (numbers)
    %JBCUPGRAD-E-SHOWOUTPUT, the output from this command is:
    Log for playback = 0
    Save old Journal files = 1
    Log all requests = 0
    Dump on error = 0

...........

That's it. Wait a while. You will see a new file in sys$system, named

    SYS$QUEUE_MANAGER.QMAN$JOURNAL_OBSOLETE

What happens is that, _when_ the "flush" is done, NORMALLY the old file is
renamed, then a new JOURNAL file is created, and the new file is renamed to
.QMAN$JOURNAL;1, and the old file is deleted. With "diag 0 2", the old
file is KEPT.

*NOTE 1* - the first time you do this, SET A VERSION LIMIT on the _OBSOLETE
           file. Otherwise, you'll get one more every hour or so. I use a
           version limit of 2.

*NOTE 2* - If you ever have to use it, name it with VERSION NUMBER 1 -
           otherwise the queue manager won't touch it, I'm told. This is
           a CRITICAL POINT. In the heat of battle (recovery), it would be
           very easy to forget.
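
NOTE 1 and NOTE 2 might translate into DCL roughly as follows. This is a
sketch under assumptions, not a procedure taken from the notes: it assumes
the files are still in SYS$SYSTEM, and for the recovery step it assumes
the queue manager has been stopped and the damaged journal has already
been moved out of the way (COPY will refuse to overwrite an existing ;1).

```dcl
$! NOTE 1: one-time version limit on the kept journal copies.
$! With no version given, SET FILE acts on the highest version, and
$! later versions of the file pick up the same limit.
$ SET FILE/VERSION_LIMIT=2 -
        SYS$SYSTEM:SYS$QUEUE_MANAGER.QMAN$JOURNAL_OBSOLETE
$!
$! NOTE 2 (recovery only): put the newest kept copy back as version 1.
$! ASSUMPTION: the queue manager is stopped and no QMAN$JOURNAL;1
$! exists at this point; neither step is spelled out in the notes.
$ COPY SYS$SYSTEM:SYS$QUEUE_MANAGER.QMAN$JOURNAL_OBSOLETE; -
        SYS$SYSTEM:SYS$QUEUE_MANAGER.QMAN$JOURNAL;1
```

COPY rather than RENAME is used here so that the _OBSOLETE copy survives
if the recovery attempt has to be repeated.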
*NOTE 3* - I have been assured that the need/desire for this to be a
           _supported_ capability _will_ be communicated to engineering.

Now - some other technical details:

The "switching on" is done for the executing copy.

1. I _know_ that it doesn't survive a cluster reboot. My startup file now
   does

       $ S_Q_J := $sys$system:jbc$upgrade
       $ S_Q_J diag 0 2

   for _all_ nodes.

2. Unknown: what happens when the queue manager fails over to another
   node. I _suspect_ that the "_obsolete" behavior is maintained.

3. Strong suspicion that stop/queue/manager/cluster and then
   start/queue/manager would cancel the effect of diag. But I don't know
   that for sure either.

   [Anybody want to run some good tests for 2 and 3?]

4. If you want to explicitly turn off the behavior, using "diag 0" does
   that, without any shutting down or whatever.

================================================================================
Note 2068.5   Care and Feeding of the New Queue Manager (V5.5+++)     5 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           28 lines  23-APR-1993 00:16
                             -< Diag options >-
--------------------------------------------------------------------------------

And now for the "really neat/fuzzy/dangerous" stuff.

The format of the command line to JBC$UPGRADE> is

    Keyword Options

For keyword DIAG, we used options 0 and 2 in that order (this is apparently
just a list of things to do). Option 0 apparently says "cancel everything"
and then option 2 says "Save old Journal files". You can string together
as many commands as you want (1-6)

The output after the command was:

> Log for playback = 0
> Save old Journal files = 1
> Log all requests = 0
> Dump on error = 0

I was told that the other options are:

1 - "input playback" (playback journal commands)
3 - "log ALL requests" (maybe for future playback?)
4 - PROCESS dump on error.
5 - Diagnostics - I was told that this is like "loopback" for the queue
    manager. It wouldn't do anything, but would sit there and process any
    commands.
6 - SYSTEM CRASH on queue manager error.

I just _knew_ you would enjoy option 6.

================================================================================
Note 2068.6   Care and Feeding of the New Queue Manager (V5.5+++)     6 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"           19 lines  23-APR-1993 00:23
                   -< A few more (unexplored) keywords >-
--------------------------------------------------------------------------------

Ah - but we aren't through:

> The format of the command line to JBC$UPGRADE> is
>     Keyword Options

If you just "press return", you get:

    %JBCUPGRAD-E-INVFUNC, invalid function
    valid choices are: SAVE, RESTORE, TEST, COMPARE, NEWJBC, DIAGNOSTIC

We already talked about DIAG.

SAVE writes "everything" to a file. (except entries for jobs that are
executing at the time).

It seems reasonable that SAVE and RESTORE are a pair, and that maybe SAVE
would substitute for preserving the 3 files. SAVE writes _one_ file that
looks like it contains everything. [Anybody want to test it?]

And of course there are those other keywords...

================================================================================
Note 2068.7   Care and Feeding of the New Queue Manager (V5.5+++)     7 of 29
EISNER::COY "Dale E. Coy (DECUServe MoS)"            6 lines  23-APR-1993 00:24
                      -< OK - what can you tell me? >-
--------------------------------------------------------------------------------

In summary:

1. I think the "preservation" question is resolved, without much need for
   re-creating FIXQUE.

2. I think I like the new queue manager.

--
-----------------------------+--------------------------------
Brian Tillman                | Internet: tillman@swdev.si.com
Smiths Industries, Inc.      |           tillman_brian@si.com
4141 Eastern Ave., MS239     | Hey, I said this stuff myself.
Grand Rapids, MI 49518-8727  | My company has no part in it.
-----------------------------+--------------------------------