Article 152524 of comp.os.vms:

In article <4u97f9$ji1@nntpd.lkg.dec.com>, vandenheuvel@eps.enet.dec.com says...
>
>In article , Ferguson@mag.aramark.com (Linwood Ferguson) writes...
>>In this case the file went from a disk with cluster size of 3 to a large raid
>>set with cluster size of 11. Turns out the FDL we used had a bucket size of
>>6, and *apparently* (though I have not seen this written down) CONVERT will
>>build buckets on cluster boundaries, at least sometimes.
>
>Yes, CONVERT and RMS will start out fresh EXTENDs (and AREAs
>are extends) at a fresh cluster boundary.
>If you are extending a lot, then you are also likely to be
>fragmenting a disk, and likely to lose the contiguous-best-
>try attribute if you ever had it. RMS tries to minimize split
>IOs by at least starting out a fresh extend on a cluster
>boundary; if it did not do this, then the bucket that filled
>out the last few blocks in the current extend and then
>had its remainder in the fresh extend would be guaranteed
>to require two (or more) IOs due to the split. In a picture

Interesting. I went back and did some research into how and why we were doing
all this. You are correct, though we did not have too small an allocation in
the FDLs; we had none.

These were some general-purpose routines written about 10 years ago (i.e. I
remembered little about the details, though I wrote them). They were intended
to specify the general characteristics of files that could easily vary in size
by 10x (it was a commercial package), and to do a monthly reorg of the file.
We "discovered" that the simplest way to stay generic was to not specify
allocations and let CONVERT find the size as it built the file. We did that to
a scratch disk, then copied the file back where it needed to go, which tended
to get rid of any extents (or as many as were practical given the size and
state of the disk). This was simpler than trying to automatically edit in
allocations (which we didn't know exactly) and less disruptive to attributes
we wanted to maintain than letting EDIT/FDL do its often somewhat misguided
optimize thing. In fact it worked quite nicely (with the exception of the
cluster-size issue, which has only become relevant recently with huge disks).
It never really clicked that if CONVERT was extending for every bucket (and I
guess it is obvious it would in that case), it was placing each one on a fresh
cluster boundary.

I did some experimenting. Leaving out both allocation and extension gives you
one bucket per cluster (more or less; I didn't check ALL of them). Specifying
an adequate allocation removes the cluster-size impact (well, except at the
tail end), i.e. the file ends up the right size. Specifying an extension
appears to be a good compromise for our case. For example, on that
160,000-block file, a 10K extension got the result to nearly the same size but
still didn't require a hard-coded allocation (a sketch of the FDL settings I
mean is below my signature).

Thanks for solving a minor mystery.

--
Linwood Ferguson              e-mail: ferguson@mag.aramark.com
Mgr. Software Engineering     Voice:  (US) 540/967-0087
ARAMARK Mag & Book Services
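
P.S. A rough sketch of the kind of FDL area section and CONVERT step I mean.
The file names and device are invented for illustration; only the bucket size
of 6 and the 10K extension come from the discussion above, and the exact
numbers would of course depend on the file:

    AREA 0
        ! No ALLOCATION line: leaving it out entirely is what gave us
        ! one bucket per cluster; a generous EXTENSION is the compromise.
        BUCKET_SIZE           6
        EXTENSION             10000
        BEST_TRY_CONTIGUOUS   yes

    $ CONVERT /FDL=GENERIC.FDL MASTER.DAT SCRATCH$DISK:[REORG]MASTER.DAT

The copy back from the scratch disk to where the file belongs is what tended
to squeeze out whatever extents were left.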