<<< MOVIES::DISK$SYSDATA:[NOTES$LIBRARY]DOLLAR_INFO.NOTE;1 >>>
                   -< Dollar File System Information >-
================================================================================
Note 86.3              Additional Product Requirements                    3 of 8
LEDER1::PETTENGILL "mulp"                            82 lines  26-APR-1994 02:22
                 -< Don't use CRC, use a Fletcher code >-
--------------------------------------------------------------------------------

In reading the notes about backup, I was reminded that a major issue with backup is the cost of computing the CRC. Well, to my knowledge there is NO advantage to using a CRC a la CRC-CCITT or CRC-32 or any of the other CRC variants. A Fletcher code makes much more sense: it is much cheaper to compute and provides equivalent error detection. In fact, computing a Fletcher code on Alpha will cost no more than computing a checksum, and it is simple enough to be combined with other functions such as computing the XOR redundancy block.

And just in case you feel that hardware is now so reliable that there is no need for an FCS (frame check sequence) or redundancy, remember the DEQNA. That was the Ethernet adapter that corrupted data so badly, with no reasonable workaround, that NISCS had to include an FCS to catch its errors. Guess what: a similar problem has popped up in another LAN adapter. So far it has only been reproduced with VMS NISCS doing disk I/O, and the larger the transfer, the more likely it is to occur.

The problem is being fixed, but it was found only because CVG does lots and lots and lots of testing of disk and file I/O and checks almost all of the data for validity. After some months of investigation the cause was identified, and a fix is being made to the host drivers so that the hardware can ship prior to the respin of the GA. We have also found data corruption problems in many other adapters (CI, DSSI, NI, FDDI, etc.), in a number of controllers (HSC, KDM70, etc.), and in disk drives.
(For the record, we find far more software problems than hardware or firmware problems.) But we do NO SIGNIFICANT TESTING OF TAPE. While storage is certainly doing tape testing, they are not doing complete-system testing that looks at end-to-end data integrity. What is the true error rate of the complete system, including the VMS tape class driver, the tape MSCP server, the LAN software, firmware, hardware, etc.? A backup can be no more reliable than that whole chain; its error rate is roughly the sum of all those individual error rates.

---

For those unfamiliar with Fletcher codes, they are computed like this, with all of the sums starting at zero:

    for i = 1..n
        b  = data[i]
        c0 = (c0 + b)  mod m
        c1 = (c1 + c0) mod m
        c2 = (c2 + c1) mod m
        c3 = (c3 + c2) mod m

Typically c0, c1, c2, and c3 are each 8 bits. If you want a 16-bit FCS, compute just c0 and c1; for increased redundancy and larger blocks, compute additional values. The modulus m can be something convenient such as 256, or even 255, with little impact on cost. Alternatively, c0, c1, ... can be 16 bits or any other convenient size.

While the mathematical algorithm calls for byte processing, a real implementation would normally work on larger units, and on Alpha it can all be done with longwords, even with a modulus of 255. The code of significance would be something along the lines of:

    loop:   ldl     x           ; get longword x
            addl    c0, x       ; c0 = c0 + x
            addl    c1, c0      ; c1 = c1 + c0
            addl    c2, c1      ; c2 = c2 + c1
            addl    c3, c2      ; c3 = c3 + c2
            lda                 ; update pointer
            cmp                 ; test for done
            br      loop        ; repeat until done

If an XOR block were also being computed, only four instructions would be added to a similar loop. (The above would be unrolled and scheduled for best performance.)

One might also imagine it being used to protect segments in a data cache at essentially zero cost. Since data in a data cache is copied to and from the user buffer, the data is already loaded into a register; the cost is in the load/store, not the computation, so computing an FCS along the way would be very cheap.
The error detection capability of a Fletcher code is similar to, and sometimes greater than, that of a CRC, provided the data is checked for octet framing by some other mechanism, which is certainly the case with disks, FDDI, etc. That is why OSI Transport uses a Fletcher code instead of a CRC. (I guess to a mathematician Fletcher codes and CRCs are the same, since both are simply polynomials, but the cost of computing CRCs in software isn't quite so simple.)