To: mathog@seqaxp.bio.caltech.edu
Subject: RE: ramdisk vs. file cache, and the winner is, file cache

I wonder how much of the slowdown is due to directory fiddling in
VMS...stuff like sorting on disk...that Linux avoids?

It would also be interesting to look at the I/O counts in SHOW DEV/FULL
on the memory disk to see how many actual driver operations are
requested.  Was the I/O count you give the total after a clean boot and
after running the programs, so that 216,000+ ramdisk operations were
done?  Or, in any case, how many I/Os does one run of your program
generate?

-----Original Message-----
From: mathog@seqaxp.bio.caltech.edu
[mailto:mathog@seqaxp.bio.caltech.edu]
Sent: Saturday, June 10, 2000 6:40 PM
To: Info-VAX@Mvb.Saic.Com
Subject: ramdisk vs. file cache, and the winner is, file cache

I had to do some system maintenance this weekend and thought that I'd
revisit the file caching performance issue by installing a ramdisk on
my system.  So after 2.3 was installed I ran the programs which follow
my signature on a 32 Mb RAMdisk on a DS10, with these results:

$ r maketest
$ mysplit:==$mda0:[temp]mysplit
$ create testsplit.com
$ sho time
$ define/user sys$output nla0:
$ mysplit test.nfa 200
$ sho time
^Z
$ @testsplit
10-JUN-2000 15:03:22
10-JUN-2000 15:03:23

(The delta varied between 1 and 2 seconds in multiple runs.)

$ sho dev mda0/full

Disk SEQAXP$MDA0:, device type RAM Disk, is online, mounted,
file-oriented device, shareable.

    Error count                    0    Operations completed          216498
    Owner process                 ""    Owner UIC           [SYSMGR,SYSTEM]
    Owner process ID        00000000    Dev Prot       S:RWPL,O:RWPL,G:R,W
    Reference count                1    Default buffer size              512
    Total blocks               64000    Sectors per track                 64
    Total cylinders               32    Tracks per cylinder               32

    Volume label              "MDA0"    Relative volume number             0
    Cluster size                   3    Transaction count                  1
    Free blocks                62700    Maximum files allowed           8000
    Extend quantity                5    Mount count                        1
    Mount status              System    Cache name   "_SEQAXP$DKA0:XQPCACHE"
    Volume owner UIC [SYSMGR,SYSTEM]    Vol Prot S:RWCD,O:RWCD,G:RWCD,W:RWCD

  Volume Status:  ODS-2, subject to mount verification, file high-water
      marking, caching is disabled.

That's MUCH faster than I could ever achieve by RMS tuning, but oddly,
STILL not as fast as the same code run on Linux on an otherwise similar
DS10.  It runs there about 2-3X faster, as judged by the rate at which
the names of the created files scroll by (it completes in under a
second, so it is hard to time precisely).  This is when the mysplit
program is run without suppressing the messages.  A small part of the
speed difference may be a longer image activation on the VMS side, but
once it gets rolling it is clearly taking longer per file on the
OpenVMS end.

I tried

$ set RMS/extend=204

(the size of the output files) but that didn't speed things up at all.
Caching was disabled already (it's pointless when going to a RAMdisk,
isn't it?).  Turning off high-water marking didn't help either.

So this is the situation:

  OS                  OpenVMS          Linux
  Version             7.2-1            RedHat 6.2
  Machine             DS10             DS10
  input file          ramdisk          file cache
  output files        ramdisk          file cache
  program             ramdisk          file cache
  C RTL               disk             file cache (?)
  compiler            Compaq C         Compaq C
  version             V6.2-007         ccc-6.2.9.002-2
  Run time (seconds)  1-2              0.5

So why does OpenVMS STILL run slower than Linux?  While 2-3X slower is
certainly better than the 100X slower it registered "vanilla", the
result seems very wrong: CPU-intensive programs usually run within a
few percent of each other on the two platforms, and here I've
essentially reduced this disk I/O application to a pure CPU/memory
application, yet there's still a 2-3 fold difference.
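A finer-grained measurement than SHOW TIME's one-second granularity
would help pin down the per-file cost on each machine.  Something along
these lines could be compiled and run on both systems (a minimal
sketch, not part of the original test: the TIMEFILES name, the file
count, and the tmp%.4d name pattern are my inventions, and clock()
reports CPU time rather than elapsed time on most systems):

/* TIMEFILES.C - hypothetical helper, not from the original post.
   Times NFILES create/write/close cycles and reports the mean cost
   per file.  clock() measures processor time used by the process,
   which is the interesting quantity once the I/O is going to a
   ramdisk or file cache rather than a physical disk. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NFILES 1000

int main(void){
int i;
clock_t start,stop;
char name[32];
FILE *fp;
   start=clock();
   for(i=0; i<NFILES; i++){
      (void) sprintf(name,"tmp%.4d",i);
      fp=fopen(name,"w");
      if(fp==NULL){
         (void) printf("Could not open %s\n",name);
         exit(0);
      }
      (void) fprintf(fp,">test%.4d\n",i);
      (void) fclose(fp);
   }
   stop=clock();
   (void) printf("%d files, %.3f CPU seconds, %.3f ms/file\n",
      NFILES,
      (double)(stop-start)/CLOCKS_PER_SEC,
      1000.0*(double)(stop-start)/CLOCKS_PER_SEC/(double)NFILES);
   return 0;
}

Comparing the ms/file figure on the two boxes would separate a fixed
per-file overhead from any per-byte cost.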
I wonder if it may be related to the earlier result in my TCP/IP tests,
where TCP/IP Services sending data through a pipe to itself did so more
slowly than Linux did - by a similar ratio.

Anybody care to speculate about what accounts for the remaining large
difference in performance?

Regards,

David Mathog
mathog@seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech
**********************************************************************
/* MAKETEST.C makes a 16000 entry fasta file, each entry containing a
   500 bp sequence */
#include <stdio.h>
#include <stdlib.h>

int main(void){
int i,j;
FILE *fd;
   fd=fopen("test.nfa","w");
   if(fd==NULL){
      (void) printf("Could not open output file test.nfa\n");
      exit(0);
   }
   for(i=0; i< 16000; i++){
      (void) fprintf(fd,">test%.4d\n",i);
      for(j=0; j < 10; j++){
         (void) fprintf(fd,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n");
      }
   }
   (void) fclose(fd);
   return 0;
}
**********************************************************************
/* MYSPLIT.C quicky program.  It splits a fasta file into a series of
   new files every N entries.  First argument is the filename and the
   second is the number of entries per file fragment.  Single long
   sequence lines will not be handled properly if they exceed the
   buffer size. */
#include <stdio.h>
#include <stdlib.h>

#define MYMAXSTRING 100000

int main(int argc, char *argv[]){
char *infile;
char root[]="frag";
char bigstring[MYMAXSTRING];
char outname[200];
int n,count,fragcount;
FILE *fin;
FILE *fout;
   fout=NULL;
   fragcount=0;
   count=0;
   n=0;
   if(argc != 3 || (sscanf(argv[2],"%d",&n) != 1) || n<1){
      (void) printf("Usage: mysplit infile N, where N is the number of entries per fragment\n");
      exit(0);
   }
   infile=argv[1];
   (void) printf("Processing %s with n=%d\n",infile,n);
   fin=fopen(infile,"r");
   if(fin==NULL){
      (void) printf("Could not open input file %s\n",infile);
      exit(0);
   }
   while( fgets(bigstring,MYMAXSTRING,fin) != NULL){
      if(bigstring[0] == '>'){             /* start of a new fasta entry */
         count--;
         if(count<=0){                     /* start the next fragment file */
            count = n;
            fragcount++;
            if(fout!=NULL)(void) fclose(fout);
            (void) sprintf(outname,"%s%.3d",root,fragcount);
            (void) printf("Opening output file %s\n",outname);
            fout = fopen(outname,"w");
            if(fout==NULL){
               (void) printf("Could not open output file %s\n",outname);
               exit(0);
            }
         }
      }
      (void) fprintf(fout,"%s",bigstring);
   }
   if(fout!=NULL)(void) fclose(fout);
   (void) fclose(fin);
   (void) printf("All done, entries in final file segment: %d\n",n-count);
   exit(1);   /* status 1 = success on VMS */
}
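One knob the programs above leave at its default is the stdio buffer
size, and the two C RTLs may well flush to the file system in very
different unit sizes.  A way to test that is the standard setvbuf()
call, which must come after fopen() and before any other operation on
the stream.  A minimal sketch follows (the 64 KB size is an arbitrary
assumption, not a measured optimum; in mysplit the call would go right
after each fopen() of fout, and the same static buffer can be reused
because the previous stream is always closed first):

/* SETVBUF sketch - hypothetical variation, not from the original
   post.  Enlarges the stdio buffer on an output stream so the RTL
   hands the file system fewer, larger writes. */
#include <stdio.h>
#include <stdlib.h>

#define MYBUFSIZE 65536                /* arbitrary; tune and remeasure */

static char iobuf[MYBUFSIZE];          /* must stay valid while stream is open */

int main(void){
FILE *fout;
   fout=fopen("frag001","w");
   if(fout==NULL){
      (void) printf("Could not open frag001\n");
      exit(0);
   }
   /* must precede any read or write on the stream */
   if(setvbuf(fout,iobuf,_IOFBF,MYBUFSIZE) != 0){
      (void) printf("setvbuf failed, using default buffering\n");
   }
   (void) fprintf(fout,">test0000\n");
   (void) fclose(fout);
   return 0;
}

If the run time drops noticeably with the larger buffer, the per-file
overhead is in the flush pattern rather than in file creation itself.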