Article 129411 of comp.os.vms:
Path: nntpd.lkg.dec.com!crl.dec.com!crl.dec.com!caen!news.eecs.umich.edu!newsxfer.itd.umich.edu!newsfeed.internetmci.com!howland.reston.ans.net!math.ohio-state.edu!uwm.edu!homer.alpha.net!mvb.saic.com!info-vax
From: ivax@meng.ucl.ac.uk (Mark Iline - Info-VAX account)
Newsgroups: comp.os.vms
Subject: Re: RAID strategies in lieu of Spiralog
Message-ID: <95091513222720@meng.ucl.ac.uk>
Date: Fri, 15 Sep 1995 13:22:27 GMT
Organization: Info-Vax<==>Comp.Os.Vms Gateway
X-Gateway-Source-Info: Mailing List
Lines: 91

> Given that Spiralog will offer "up to 10 times" write performance
> over the existing filesystem, how will RAID strategies be affected?
> To give a for instance to get things rolling, suppose:
>
> - You have the opportunity to RAID your existing AlphaVMS box.
> - You want 2 to 3 times write speedup over your existing drives.
> - You anticipate starting with 8 Gig and growing to at most
>   10 in 5 to 10 years.
> - Will be using Digital controller based RAID.

A quick RAID summary, largely lifted from the "Digital Guide to RAID
Storage Technology".

Raid 0 (striping): provided requests don't generally cross chunk
boundaries, the read and write request rate scales directly with the
number of drives. No cost penalty (in terms of capacity lost), but
reliability is worse than that of a single drive.

Raid 1 (shadowing): good reliability; the read request rate scales
with the number of drives, while the write request rate is <= that of
a single drive.

Raid 5 (data and parity striped across multiple drives): provided
requests don't generally cross chunk boundaries, the read request rate
scales with the number of drives. However, writing generally involves
combining the incoming data with existing parity data, so a write
request may well involve reading the old data and parity off two
drives, then writing the new data and new parity back to the same two
drives. Write performance will suffer.
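The Raid 5 read-modify-write just described can be sketched in a few
lines. This is a toy illustration of the parity arithmetic only (my
own names throughout; it is not code from any Digital controller):

```python
# Sketch of the Raid 5 small-write penalty described above.
# Hypothetical illustration only, not any real controller's code.

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, the parity operation used by Raid 5."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one chunk: read old data and old parity off two drives,
    then write new data and new parity back -- up to 4 operations."""
    # New parity = old parity XOR old data XOR new data, so the
    # other data drives in the stripe need not be read at all.
    new_parity = xor(xor(old_parity, old_data), new_data)
    ops = {"reads": 2, "writes": 2}
    return new_parity, ops

# Check: stripe parity stays consistent after updating one chunk.
d0, d1, d2 = b"\x01", b"\x02", b"\x04"
parity = xor(xor(d0, d1), d2)
new_parity, ops = small_write(d1, parity, b"\xff")
assert new_parity == xor(xor(d0, b"\xff"), d2)
```

The point of the shortcut is that only the two affected drives are
touched, at the cost of the extra reads; a full-stripe write would
need no reads at all.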
Intelligent controllers may partially work around this, as may careful
software control of the write patterns.

If you want a write performance increase (assuming we mean request
rate, ie smallish requests), on the face of it only Raid 0 will help.
However, the chances are that you'll be concerned about storage
reliability, so you'll want to combine it with Raid 1 (0+1). To get
the 3-fold increase you'll need 3 times the drives, going to 6 times
the drives with shadowing. This may well not be what you want...

Spiralog gives a write performance increase because the writes to disk
tend to occur in a 'spiral pattern', which is typically as fast as the
disk can go. This benefits JBOD, Raid 0, 1 and 0+1. However, there's
an extra benefit with Raid 5. Because Spiralog (rather than the
application) has control of what is actually written to the media,
because the writes are generally large (due to the re-writing of
retrieval information along with the data), and because of Spiralog's
ability to group writes when working in a write-behind mode, Spiralog
can ensure that the writes to the Raid-set are an integral number of
stripes in length and occur on stripe boundaries. Hence, Spiralog can
potentially overcome Raid 5's write problem (up to 4 operations for
every write) and still write at close to the spiral write rate. Since
Raid 5 is very good in terms of cost, read request rate and
reliability, this is a big win.

However, what about intelligent, caching Raid 5 controllers? Provided
that you build a suitable level of redundancy and battery backup into
your controller, you can minimise the risk of operating in write-back
mode. On the face of it, controller-based write-back caching doesn't
do much to improve 'write to media' throughput, which is one of
Spiralog's strengths.
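The stripe-aligned write advantage can be made concrete with a toy
I/O-counting model (my own sketch with made-up names; nothing here is
taken from the Spiralog implementation):

```python
# Toy I/O-count model for writes to a Raid 5 set with n data drives
# plus the equivalent of one parity drive. Illustrative numbers only.

def small_write_ios(chunks_written: int) -> dict:
    """Unaligned writes: each chunk costs a read of the old data and
    old parity, plus a write of the new data and new parity."""
    return {"reads": 2 * chunks_written, "writes": 2 * chunks_written}

def full_stripe_ios(n_data_drives: int, stripes: int) -> dict:
    """Stripe-aligned writes: parity is computed from the new data
    already in memory, so no reads -- just n+1 writes per stripe."""
    return {"reads": 0, "writes": (n_data_drives + 1) * stripes}

# Writing one full stripe's worth of data on a 4-data-drive set:
print(small_write_ios(4))     # 8 reads + 8 writes = 16 I/Os
print(full_stripe_ios(4, 1))  # 0 reads + 5 writes =  5 I/Os
```

In this model a filesystem that can always issue whole stripes does
the same logical work in roughly a third of the disk operations, which
is the essence of the Raid 5 win claimed for Spiralog above.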
However, given that applications may well read-modify-rewrite disk
blocks, and there's a fair chance that multiple applications may be
updating the same disk blocks, a number of optimisations can be
applied. If the old data is already in the cache, the write operation
has no need to re-read it to calculate parity. The parity data may
also be in the cache, due to a previous write to the same stripe.
While the write operation still requires 2 writes (one each to two
disks), this goes part way to alleviating the problem.

Secondly, if the same disk block is re-written 10 times, only the last
update actually needs to make it onto the disk. If the controller can
defer its writes, it can make significant savings. Similarly, if it
can 'stack up' writes until it can write a complete stripe
(effectively writing in Raid 3 mode), it can make further savings.
[This might sound unlikely, but it's quite possible when copying
sizeable files around, and is a feature of some DEC controllers.]

This really leaves it down to Raid 5 with an intelligent controller
(ie not host-based), or Raid 5 with Spiralog (either host-based, or a
more 'basic' controller-based implementation). Both have their pros
and cons, but I suspect that Spiralog's backup will clinch it in many
cases.

Mark Iline                                      system@meng.ucl.ac.uk
Dept Mech Eng, University College, London. UK
Read at your own risk.