From: CSBVAX::MRGATE!RELAY-INFO-VAX@CRVAX.SRI.COM@SMTP  1-OCT-1988 01:49
To:   ARISIA::EVERHART
Subj: VAX disk performance (well, VMS anyway)

Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Fri, 30 SEP 88 20:58:20 PDT
Received: from M5.Sdsc.Edu by KL.SRI.COM with TCP; Fri, 30 Sep 88 20:31:35 PDT
Date: Sat, 1 Oct 88 03:32:11 GMT
From: gkn@M5.Sdsc.Edu (Gerard K. Newman)
Message-Id: <881001033211.25c0005d@M5.Sdsc.Edu>
Subject: VAX disk performance (well, VMS anyway)
To: info-vax@kl.sri.com
X-ST-Vmsmail-To: ST%"info-vax@kl.sri.com"

Believe it or not, this wasn't prompted by the recent Unix vs. VMS jihad
which has been polluting my mailbox.  I actually needed some performance
figures for the VAXen running VMS so a colleague and I could design a fast
file transfer protocol between a Cray and our VAX over a channel which runs
at a measured 4.5 Mbytes/sec (sustained!).  I think the results are
interesting enough to share.

I also think it would be interesting to see similar numbers for Unix systems
with similar hardware (VAX, RAxx).  However, I am not sufficiently proficient
with Unix to do what I consider a fair test, so someone who is is invited to
contribute.

The benchmark writes a file of a given size and then reads it back, and
reports the elapsed time for the writes and the reads.  The user can specify
the size of the file, the size of the I/O buffer, the number of I/O requests
which can be pending (buffering depth), whether the file is to be contiguous,
whether the file is to be allocated on a cylinder boundary, and whether or
not to use RMS.  It is written in assembler.

The program simply reads and writes blocks in the file.  The file is
pre-allocated, and the last block is written before the test begins to
prevent file high-water marking from being a factor in the timing.

As it turns out, using RMS vs. QIOs makes no difference in performance,
which speaks quite well of RMS block mode I/O.  Another interesting fact is
that there is no further performance gain after buffering things 4 levels
deep (in other words, 4 pending requests perform as well as 32 pending
requests).

I ran the tests on 4 machines in my cluster.  In all cases a 40960 block
(20 Mbyte) contiguous, cylinder-aligned file was used, with a buffering depth
of 32 requests.  I used I/O buffer sizes of 32, 64, 96, and 127 blocks
(128 blocks is 65536 bytes, 1 byte too large for the I/O subsystem to handle
in a single request).  The disks involved were DEC RA82s and RA81s on a
non-busy controller connected to an HSC-50.  None of the machines were busy.

Here are the results.  All times are in seconds, reported as
write time/read time.

    Buffer size:   32           64           96           127

    8350   82      24.26/16.68  24.01/16.25  24.40/16.43  23.15/16.26
           81      24.24/18.73  24.62/18.47  24.85/18.25  25.31/18.34

    6210   82      24.85/15.91  24.43/16.16  24.45/16.27  24.22/16.06
           81      25.45/18.27  25.96/18.58  25.63/18.56  25.51/19.02

    785    82      18.51/15.53  19.33/15.93  18.91/15.81  18.54/15.85
           81      20.50/18.29  20.54/18.72  20.52/19.38  20.24/18.65

    750    82      24.06/15.90  24.01/15.67  23.76/15.66  23.89/15.86
           81      24.75/18.33  26.79/19.27  25.22/18.64  25.12/18.66

Note that varying the buffer size doesn't make much of a difference, except
for the 750 on an RA81 with a buffer size of 64 blocks.  I ran that test 5
times and it is consistently slower by 2-3 seconds.  I have no idea why.
The 8350 and 6210 both have a CIBCA CI interface, which is surprisingly slow
when compared to a CI780 (it performs about the same as a CI750!!).
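For anyone who wants to try this on another system, here is a minimal sketch
of the measurement logic in C, assuming POSIX I/O.  This is not the benchmark
itself (that is VAX assembler issuing up to 32 concurrent $QIOs against a
contiguous, cylinder-aligned file); it is a synchronous analogue with a queue
depth of 1, and the file name and all identifiers are invented for
illustration.  Note also that a Unix buffer cache can satisfy the read pass
from memory, which would flatter the read numbers.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define BLOCK 512                     /* VMS disk block size, in bytes */

    static double seconds(void)           /* wall-clock time in seconds */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        long fileblocks = 40960;          /* 20 Mbyte file, as in the tests */
        long bufblocks  = 127;            /* I/O buffer size, in blocks */
        char *buf = calloc((size_t)bufblocks, BLOCK);
        if (buf == NULL) { perror("calloc"); return 1; }

        int fd = open("testfile.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Write the last block first so that extending the file (and, on
           VMS, high-water marking) is not charged to the timed write pass. */
        lseek(fd, (off_t)(fileblocks - 1) * BLOCK, SEEK_SET);
        write(fd, buf, BLOCK);
        lseek(fd, 0, SEEK_SET);

        double t0 = seconds();            /* timed write pass */
        for (long b = 0; b < fileblocks; b += bufblocks) {
            long n = fileblocks - b < bufblocks ? fileblocks - b : bufblocks;
            if (write(fd, buf, (size_t)n * BLOCK) < 0) { perror("write"); return 1; }
        }
        fsync(fd);                        /* force the data out to the disk */
        double tw = seconds() - t0;

        lseek(fd, 0, SEEK_SET);
        t0 = seconds();                   /* timed read pass */
        for (long b = 0; b < fileblocks; b += bufblocks) {
            long n = fileblocks - b < bufblocks ? fileblocks - b : bufblocks;
            if (read(fd, buf, (size_t)n * BLOCK) < 0) { perror("read"); return 1; }
        }
        double tr = seconds() - t0;

        double mbytes = fileblocks * (double)BLOCK / 1e6;
        printf("write: %.2f sec (%.2f Mbytes/sec)  read: %.2f sec (%.2f Mbytes/sec)\n",
               tw, mbytes / tw, tr, mbytes / tr);
        close(fd);
        return 0;
    }

The transfer rates below fall out directly from the elapsed times: 40960
blocks x 512 bytes is about 20.97 Mbytes, so the 785 writing the RA82 in
18.51 seconds works out to about 1.13 Mbytes/sec.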
An RA82 can be written at about 1.13 Mbytes/sec and read at about 1.35
Mbytes/sec on a 785, or at about .90 Mbytes/sec and 1.28 Mbytes/sec on
anything else I've got.  An RA81 can be written at about 1.03 Mbytes/sec and
read at about 1.14 Mbytes/sec on a 785, or at about .86 Mbytes/sec and 1.14
Mbytes/sec on anything else I've got.

Just for grins I tried a 20480 block (10 Mbyte) contiguous, cylinder-aligned
file on my shadow set (2 RA82s, which is also my system disk).  The reason
for the smaller file is that it's the largest contiguous free space I could
find on the volume which I could align on a cylinder boundary.  Here's the
comparison between a shadowed RA82 and a non-shadowed RA82 on my 8350:

    Buffer size:          32          64          96          127

    8350   shadowed 82    19.97/8.42  19.77/8.68  20.45/8.19  19.35/7.57
           single 82      12.27/8.28  12.04/8.31  11.92/8.37  11.57/8.12

So, for writes a shadow set is about 37% slower than a non-shadowed disk,
but about the same for reads.  Actually, I suspect it would do better on
less sequential reads than the ones I was doing, since the odds are that one
set of heads in the shadow set will have a shorter seek to reach your data.

Another totally useless set of numbers from ...

gkn

----------------------------------------
Internet: GKN@SDS.SDSC.EDU
Bitnet:   GKN@SDSC
Span:     SDSC::GKN (27.1)
MFEnet:   GKN@SDS
USPS:     Gerard K. Newman
          San Diego Supercomputer Center
          P.O. Box 85608
          San Diego, CA 92138-5608
Phone:    619.534.5076