From: CSBVAX::MRGATE!RELAY-INFO-VAX@CRVAX.SRI.COM@SMTP  1-OCT-1988 01:49
To:   ARISIA::EVERHART
Subj: VAX disk performance (well, VMS anyway)

Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Fri, 30 SEP 88 20:58:20 PDT
Received: from M5.Sdsc.Edu by KL.SRI.COM with TCP; Fri, 30 Sep 88 20:31:35 PDT
Date: Sat, 1 Oct 88 03:32:11 GMT
From: gkn@M5.Sdsc.Edu (Gerard K. Newman)
Message-Id: <881001033211.25c0005d@M5.Sdsc.Edu>
Subject: VAX disk performance (well, VMS anyway)
To: info-vax@kl.sri.com
X-ST-Vmsmail-To: ST%"info-vax@kl.sri.com"

Believe it or not, this wasn't prompted by the recent Unix vs. VMS jihad
which has been polluting my mailbox.  I actually needed some performance
figures for the VAXen running VMS so a colleague and I could design a fast
file transfer protocol between a Cray and our VAX over a channel which runs
at a measured 4.5 Mbytes/sec (sustained!).  I think the results are
interesting enough to share.

I also think it would be interesting to see similar numbers for Unix systems
with similar hardware (VAX, RAxx).  However, I am not sufficiently proficient
with Unix to do what I consider a fair test, so someone who is is invited to
contribute.

The benchmark writes a file of a given size and then reads it back, and
reports the elapsed time for the writes and the reads.  The user can specify
the size of the file, the size of the I/O buffer, the number of I/O requests
which can be pending (buffering depth), whether the file is to be contiguous,
whether the file is to be allocated on a cylinder boundary, and whether or
not to use RMS.  It is written in assembler.

The program simply reads and writes blocks in the file.  The file is
pre-allocated, and the last block is written before the test begins to
prevent file high-water marking from being a factor in the timing.

As it turns out, using RMS vs. QIOs makes no difference in performance,
which speaks quite well of RMS block mode I/O.  Another interesting fact is
that there is no further performance gain after buffering things 4 levels
deep (in other words, 4 pending requests perform as well as 32 pending
requests).

I ran the tests on 4 machines in my cluster.  In all cases a 40960 block
(20 Mbyte) contiguous, cylinder-aligned file was used, with a buffering depth
of 32 requests.  I used I/O buffer sizes of 32, 64, 96, and 127 blocks
(128 blocks is 65536 bytes, 1 byte too large for the I/O subsystem to handle
in a single request).  The disks involved were DEC RA82s and RA81s on a
non-busy controller connected to an HSC-50.  None of the machines were busy.

Here are the results.  All times are in seconds, reported as
write time/read time.

    Buffer size:   32           64           96           127

    8350   82      24.26/16.68  24.01/16.25  24.40/16.43  23.15/16.26
           81      24.24/18.73  24.62/18.47  24.85/18.25  25.31/18.34

    6210   82      24.85/15.91  24.43/16.16  24.45/16.27  24.22/16.06
           81      25.45/18.27  25.96/18.58  25.63/18.56  25.51/19.02

    785    82      18.51/15.53  19.33/15.93  18.91/15.81  18.54/15.85
           81      20.50/18.29  20.54/18.72  20.52/19.38  20.24/18.65

    750    82      24.06/15.90  24.01/15.67  23.76/15.66  23.89/15.86
           81      24.75/18.33  26.79/19.27  25.22/18.64  25.12/18.66

Note that varying the buffer size doesn't make much of a difference, except
for the 750 on an RA81 with a buffer size of 64 blocks.  I ran that test 5
times and it is consistently slower by 2-3 seconds.  I have no idea why.
The 8350 and 6210 both have a CIBCA CI interface, which is surprisingly slow
when compared to a CI780 (it performs about the same as a CI750!!).
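For anyone who wants to try this on another system, here is a minimal sketch
of the measurement logic in C, assuming POSIX I/O.  This is not the benchmark
itself (that is VAX assembler issuing up to 32 concurrent $QIOs against a
contiguous, cylinder-aligned file); it is a synchronous analogue with a queue
depth of 1, and the file name and all identifiers are invented for
illustration.  Note also that a Unix buffer cache can satisfy the read pass
from memory, which would flatter the read numbers.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define BLOCK 512                     /* VMS disk block size, in bytes */

    static double seconds(void)           /* wall-clock time in seconds */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        long fileblocks = 40960;          /* 20 Mbyte file, as in the tests */
        long bufblocks  = 127;            /* I/O buffer size, in blocks */
        char *buf = calloc((size_t)bufblocks, BLOCK);
        if (buf == NULL) { perror("calloc"); return 1; }

        int fd = open("testfile.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Write the last block first so that extending the file (and, on
           VMS, high-water marking) is not charged to the timed write pass. */
        lseek(fd, (off_t)(fileblocks - 1) * BLOCK, SEEK_SET);
        write(fd, buf, BLOCK);
        lseek(fd, 0, SEEK_SET);

        double t0 = seconds();            /* timed write pass */
        for (long b = 0; b < fileblocks; b += bufblocks) {
            long n = fileblocks - b < bufblocks ? fileblocks - b : bufblocks;
            if (write(fd, buf, (size_t)n * BLOCK) < 0) { perror("write"); return 1; }
        }
        fsync(fd);                        /* force the data out to the disk */
        double tw = seconds() - t0;

        lseek(fd, 0, SEEK_SET);
        t0 = seconds();                   /* timed read pass */
        for (long b = 0; b < fileblocks; b += bufblocks) {
            long n = fileblocks - b < bufblocks ? fileblocks - b : bufblocks;
            if (read(fd, buf, (size_t)n * BLOCK) < 0) { perror("read"); return 1; }
        }
        double tr = seconds() - t0;

        double mbytes = fileblocks * (double)BLOCK / 1e6;
        printf("write: %.2f sec (%.2f Mbytes/sec)  read: %.2f sec (%.2f Mbytes/sec)\n",
               tw, mbytes / tw, tr, mbytes / tr);
        close(fd);
        return 0;
    }

The transfer rates below fall out directly from the elapsed times: 40960
blocks x 512 bytes is about 20.97 Mbytes, so the 785 writing the RA82 in
18.51 seconds works out to about 1.13 Mbytes/sec.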
An RA82 can be written at about 1.13 Mbytes/sec and read at about 1.35
Mbytes/sec on a 785, or at about .90 Mbytes/sec and 1.28 Mbytes/sec on
anything else I've got.  An RA81 can be written at about 1.03 Mbytes/sec and
read at about 1.14 Mbytes/sec on a 785, or at about .86 Mbytes/sec and 1.14
Mbytes/sec on anything else I've got.

Just for grins I tried a 20480 block (10 Mbyte) contiguous, cylinder-aligned
file on my shadow set (2 RA82s, which is also my system disk).  The reason
for the smaller file is that it's the largest contiguous free space I could
find on the volume which I could align on a cylinder boundary.  Here's the
comparison between a shadowed RA82 and a non-shadowed RA82 on my 8350:

    Buffer size:          32          64          96          127

    8350   shadowed 82    19.97/8.42  19.77/8.68  20.45/8.19  19.35/7.57
           single 82      12.27/8.28  12.04/8.31  11.92/8.37  11.57/8.12

So, for writes a shadow set is about 37% slower than a non-shadowed disk,
but about the same for reads.  Actually, I suspect it would do better on
less sequential reads than the ones I was doing, since the odds are that one
set of heads in the shadow set will have a shorter seek to reach your data.

Another totally useless set of numbers from ...

gkn

----------------------------------------
Internet: GKN@SDS.SDSC.EDU
Bitnet:   GKN@SDSC
Span:     SDSC::GKN (27.1)
MFEnet:   GKN@SDS
USPS:     Gerard K. Newman
          San Diego Supercomputer Center
          P.O. Box 85608
          San Diego, CA 92138-5608
Phone:    619.534.5076