From: CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 7-MAR-1992 13:35:01.06
To: ARISIA::EVERHART
CC:
Subj: RE: Reclaiming deleted mail file space question

From: RELAY-INFO-VAX@CRVAX.SRI.COM@SMTP@CRDGW2
To: Everhart@Arisia@MRGATE

Received: by crdgw1.ge.com (5.57/GE 1.122) id AA04391; Sat, 7 Mar 92 12:51:36 EST
Received: From UU.PSI.COM by CRVAX.SRI.COM with TCP; Sat, 7 MAR 92 09:47:17 PST
Received: from lrw.UUCP by uu.psi.com (5.65b/4.1.011392-PSI/PSINet) id AA08518; Sat, 7 Mar 92 12:35:42 -0500
Message-Id: <9203071735.AA08518@uu.psi.com>
Received: by lrw.UUCP (DECUS UUCP w/Smail); Sat, 7 Mar 92 12:24:49 EDT
Date: Sat, 7 Mar 92 12:24:49 EDT
From: Jerry Leichter
To: INFO-VAX@KL.SRI.COM
Subject: RE: Reclaiming deleted mail file space question

> In response to a msg about how to reclaim deleted mail file space via some kind of batch process, the PURGE/RECLAIM command was mentioned, which I have never used. I have always used COMPRESS, which does a much, much better job of reclaiming space (often by a factor of 2 or more). If anyone can explain the difference between the two better than the VMS Mail manual, please do!

They do very different things. Let's go through the steps again:

a) DELETE simply files a message in your WASTEBASKET folder. The message remains completely accessible; except for its connection to the DELETE and PURGE commands (with PURGE perhaps invoked automatically at exit), the WASTEBASKET folder is the same as any other folder.

b) The PURGE command discards the contents of the WASTEBASKET folder. Any external files containing the bodies of purged messages are deleted. The records containing the message header information, and the bodies of short messages, are deleted at the RMS level. The space they used to occupy is then available to RMS for re-use.

c) MAIL.MAI is an indexed RMS file. Records can't be placed just anywhere within such a file - they have to be placed where the index structure can find them.
   As a result, it can be difficult for RMS to re-use the space freed by PURGE: the space is not only fragmented, but it is fixed into buckets within the file that may not be the buckets that must receive new data. RMS has only a limited ability to consolidate that space, since even if a bucket empties completely, there are still pointers to it within the index structures.

   The PURGE/RECLAIM command uses the callable CONVERT utility to accomplish what CONVERT/RECLAIM would do from the DCL level. It walks through all the buckets in the file, looking for buckets all of whose records have been deleted. It then rebuilds the index structure to remove any pointers to such buckets, and finally returns the buckets to an in-file free list. They can then be used to hold any new records. Since the freed buckets are usually contained within blocks internal to the file, rather than at its end, the size of the file doesn't decrease. Because of the complex structure of an indexed file, PURGE/RECLAIM cannot move buckets down to fill in the now-free space.

d) COMPRESS uses callable CONVERT to build an entirely new file. It copies the records from the old file to the new one, building a fresh index structure.

Your summary:

> COMPRESS works as follows:
> 1. COMPRESS creates a temporary file called MAIL_nnnn_COMPRESS.TEMP.
> 2. MAIL.MAI is copied into this file and compressed.
> 3. MAIL.MAI is renamed MAIL.OLD.
> 4. MAIL_nnnn_COMPRESS.TEMP is renamed MAIL.MAI.

is correct. You say:

> I have seen MAIL.MAIs of over 500 blocks compress down to 60 blocks, so obviously the auto-reclaiming process is not freeing up space very efficiently.

No; you are comparing different things. The space is reclaimed just as (well, almost as) efficiently AT THE RMS LEVEL, but it cannot be returned to the file system. Here's one way to see the difference: suppose you see such a 500-to-60 block change when you COMPRESS every week.
That means that, over the course of the next week, your mail file will slowly grow back from 60 to 500 blocks. Had you done just a PURGE/RECLAIM, your file would have stayed at 500 blocks for the entire week: the free space already within it would have been sufficient to hold all the new mail. The only time you win in the long term is when your mail file is unusually large because of an unusual volume of mail at some point. Continuing with the same example, if one week you received a large number of messages because of a forwarding loop on INFO-VAX :-) your mail file might grow to, say, 700 blocks. PURGE/RECLAIM would leave it at 700, and there it would remain - and at the end of every week, you'd have about 200 blocks still free in MAIL.MAI. If you did a COMPRESS just ONCE, you'd drop back to 60, then over the course of the next week build back up to the steady-state 500-block level.

> The only problem with using COMPRESS is that it leaves the MAIL.OLD files around; we just do a DELETE [*]MAIL.OLD (along with some temporary files that WordPerfect leaves occasionally in [.SCRATCH]) as part of our nightly cleanups.

There's more to it than that. If you regularly do PURGE/RECLAIM's, assuming you are not gradually collecting mail (as in fact all of us do, of course), your disk usage will essentially remain constant. On the other hand, if you do COMPRESS's, your disk usage will oscillate, from a minimum just after you delete the MAIL.OLD file to a maximum just after you COMPRESS the file (at which point you are using the disk space for both MAIL.MAI and MAIL.OLD at once). Of course, if you don't get rid of MAIL.OLD right away, you may use even more disk space.

Beyond this, the continuous creation of entirely new files can potentially make your disk more fragmented. (Of course, as the disk becomes fragmented, the MAIL.MAI file, which is built up slowly, will also fragment. If you kept your 500-block MAIL.MAI across a defragmentation, it would remain unfragmented indefinitely.)
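A toy model of the weekly arithmetic above may make the trade-off easier to see. It is a sketch under made-up assumptions, not a model of how RMS really allocates buckets: sizes are in blocks, all purging happens at once at the end of each week, new records can always reuse space that PURGE/RECLAIM has returned to the file's free list, and the weekly volumes (440 blocks normally, 640 in the forwarding-loop week) were picked only to reproduce the 60/500/700-block figures. The names `MailFile` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MailFile:
    """Toy model of MAIL.MAI in disk blocks (hypothetical, not RMS's real allocator)."""
    allocated: int  # blocks the file occupies on disk
    live: int       # blocks holding records that have not been purged

    def receive(self, blocks: int) -> None:
        # Assumption: new mail fills reclaimed space inside the file first;
        # the file is extended only when that free space runs out.
        self.live += blocks
        self.allocated = max(self.allocated, self.live)

    def purge(self, blocks: int) -> None:
        # Purged records free space at the RMS level; the file does not shrink.
        self.live -= blocks


def simulate(weekly_inflow, policy, live=60):
    """Return (file size after the weekly run, peak disk use) for each week."""
    f = MailFile(allocated=live, live=live)
    history = []
    for inflow in weekly_inflow:
        f.receive(inflow)
        f.purge(inflow)              # steady state: as much mail leaves as arrives
        peak = f.allocated
        if policy == "compress":
            peak += f.live           # MAIL.OLD and the new MAIL.MAI coexist
            f.allocated = f.live     # the new file holds only the live records
        elif policy != "reclaim":
            raise ValueError(f"unknown policy: {policy}")
        history.append((f.allocated, peak))
    return history


weeks = [440, 440, 640, 440, 440]    # week 3: a forwarding loop on INFO-VAX
print("PURGE/RECLAIM:", simulate(weeks, "reclaim"))
print("COMPRESS:     ", simulate(weeks, "compress"))
```

With these invented numbers, the PURGE/RECLAIM file reaches 500 blocks and, after the spike week, stays at 700; the COMPRESS file is back to 60 blocks after every run, but peak disk use is 560 blocks (760 in the spike week) while MAIL.OLD and the new MAIL.MAI exist side by side.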
In practice, few mail files are large enough for either the disk-usage oscillation or the added fragmentation to be significant on modern disks - except, of course, that we all know that no disk is EVER big enough, so all disks are filled to within a couple of hundred free blocks, right?

There is, however, another effect to keep in mind: the DELETE and PURGE operations can be done with shared access to the MAIL.MAI file. As a result, you can receive mail while doing them. This is not true for either the PURGE/RECLAIM or the COMPRESS command: if an attempt is made to deliver mail to you while these commands are running, it will be rejected with a "file access conflict" error. A COMPRESS usually runs longer than a PURGE/RECLAIM, sometimes much longer, leaving this window for error open longer.

Beyond that, there are less pleasant, if much less likely, failures that can occur. For example, it is impossible for COMPRESS to execute the following two steps atomically:

> 3. MAIL.MAI is renamed MAIL.OLD.
> 4. MAIL_nnnn_COMPRESS.TEMP is renamed MAIL.MAI.

What happens if a mail message arrives between these two steps? At that point, no MAIL.MAI file will exist, so the mail server process will create one. When step 4 completes, you'll have two MAIL.MAI versions: version 2 will contain all your old mail, and version 1 will contain the message that just arrived. MAIL normally reads only the most recent version, so unless you know this has happened, you will lose that message.

I can't pin it down exactly, but there's another possible timing window, which I've seen leave messages in the NEWMAIL folder of MAIL.OLD that are not in the new MAIL.MAI file. It's possible that these are messages that arrive between the time callable CONVERT closes the MAIL.MAI file and the time step 3 above is executed.
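The rename race between steps 3 and 4 can be replayed in a few lines. This is a toy sketch: `VersionedDir`, `deliver` and `compress` are hypothetical stand-ins, not VMS services, and the rule that renaming onto an existing name yields the next-higher version is assumed here because it reproduces the outcome described above, not taken from the RMS documentation.

```python
class VersionedDir:
    """Toy VMS-style directory: each file name maps to its versions, ;1 first."""

    def __init__(self):
        self.files = {}  # name -> list of versions; each version is a list of messages

    def create(self, name, messages):
        # A new file gets the next version number after any existing ones.
        self.files.setdefault(name, []).append(list(messages))

    def highest(self, name):
        versions = self.files.get(name)
        return versions[-1] if versions else None

    def rename(self, old, new):
        # Assumption (matching the scenario above): renaming onto an existing
        # name produces the next-higher version of that name.
        messages = self.files[old].pop()
        if not self.files[old]:
            del self.files[old]
        self.create(new, messages)

    def deliver(self, message):
        # The mail server appends to MAIL.MAI, creating it if none exists.
        mailbox = self.highest("MAIL.MAI")
        if mailbox is None:
            self.create("MAIL.MAI", [message])
        else:
            mailbox.append(message)


def compress(d, arrives_between_3_and_4=None):
    temp = "MAIL_nnnn_COMPRESS.TEMP"
    d.create(temp, d.highest("MAIL.MAI"))      # steps 1-2: copy into a new file
    d.rename("MAIL.MAI", "MAIL.OLD")           # step 3
    if arrives_between_3_and_4 is not None:
        d.deliver(arrives_between_3_and_4)     # the unlucky arrival
    d.rename(temp, "MAIL.MAI")                 # step 4


d = VersionedDir()
d.create("MAIL.MAI", ["old message 1", "old message 2"])
compress(d, arrives_between_3_and_4="new message")
print(d.files["MAIL.MAI"])      # ;1 holds the new message, ;2 the old mail
print(d.highest("MAIL.MAI"))    # what MAIL actually opens
```

The directory ends up with two versions of MAIL.MAI, and the highest one, which is the one MAIL opens, doesn't contain the new message. Calling `compress` without an arrival leaves a single version.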
Whatever the reason for this second glitch, the interval in which it can occur seems to be much longer than the interval for the previous one: I do a COMPRESS weekly, and I've actually seen it happen a couple of times a year. (It seems to be rarer in more recent versions of VMS.)

> The following COM file does a COMPRESS and then deletes the MAIL.OLD file....

BAD idea. Really BAD idea. The problem is those pesky timing windows. Do this on a regular basis for all users on a system, and I'll guarantee you that some mail messages will be lost. COMPRESS and friends are just not 100% bullet-proof: they need a little human supervision.

I have a weekly command file that does a COMPRESS (actually, it does a PURGE/RECLAIM/STATISTICS first, which is redundant, but I like to have the statistics available in the log file - and since it runs in batch late at night, I really don't care that it takes a little longer). After the whole process completes, it sends me a mail message telling me to "check and delete MAIL.OLD". (If the PURGE/RECLAIM or COMPRESS fails - which can happen if I left a process in MAIL, for example; that happened just last night - it tells me about that, too.) I read my new mail in order; when I get to this message, I do:

    SET FILE MAIL.OLD
    DIR NEWMAIL

(Note: *NOT* DIR/NEW, which would switch back to MAIL.MAI.) I make sure that all the new messages actually appeared in MAIL.MAI. (In fact, the only place there can be a difference is at the END of the list, where every once in a long while a message I haven't seen will appear; so this is pretty simple. In fact, just a count of messages is enough.) I then exit from MAIL and do a DIR of my mail subdirectory for MAIL.*. This will let me catch multiple versions of MAIL.MAI - as well as multiple versions of MAIL.OLD, which can happen if I forget to clean up MAIL.OLD one week. Only when I'm sure that all is well do I delete MAIL.OLD.
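The bookkeeping part of that check can be scripted, even though the decision to delete MAIL.OLD should stay with a person. A minimal sketch, assuming you have already captured the bare file names from DIR MAIL.* and have some hypothetical key for each message (plain strings here). `check_before_delete` is a made-up name, and it tests membership rather than just comparing counts, which is stricter than strictly necessary.

```python
import re
from collections import Counter


def check_before_delete(dir_listing, old_newmail, new_file_messages):
    """Return a list of problems; an empty list means MAIL.OLD looks safe to delete.

    dir_listing       -- bare file names from DIR MAIL.*, e.g. "MAIL.MAI;2"
    old_newmail       -- message keys in the NEWMAIL folder of MAIL.OLD
    new_file_messages -- message keys present anywhere in the current MAIL.MAI
    """
    problems = []
    versions = Counter()
    for entry in dir_listing:
        m = re.fullmatch(r"(MAIL\.(?:MAI|OLD));\d+", entry.strip().upper())
        if m:
            versions[m.group(1)] += 1
    # More than one version of either file means something went wrong
    # (a race during COMPRESS, or a MAIL.OLD left over from last week).
    for name, count in sorted(versions.items()):
        if count > 1:
            problems.append(f"{count} versions of {name}")
    # Anything in MAIL.OLD's NEWMAIL that never reached MAIL.MAI would be lost.
    missing = [msg for msg in old_newmail if msg not in new_file_messages]
    if missing:
        problems.append(f"{len(missing)} message(s) only in MAIL.OLD NEWMAIL")
    return problems


print(check_before_delete(["MAIL.MAI;1", "MAIL.MAI;2", "MAIL.OLD;1"],
                          ["msg A", "msg B"], {"msg A"}))
```

An empty result only says that nothing obvious is wrong; it is a prompt for the human check above, not a substitute for it.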
Call me paranoid, but I consider the effort worth my time to avoid accidentally losing mail, even if it's once a year - or once every 5 years.

I would be VERY annoyed at a system manager who arbitrarily started doing COMPRESS's on my mail file. (Actually, I'd simply fix the problem - I'd make sure that my MAIL.MAI file was always accessed in shared mode during the interval when he tried to COMPRESS it. His COMPRESS would fail with a "file access conflict".)

It would be nice if there were a completely reliable, bullet-proof way to compress mail files. But there isn't. Imposing an UNreliable method on all your users is not good system management policy.

-- Jerry