From: HENRY::IN%"DCOTTLER%rca.com%csnet-relay.csnet%sri-kl.ARPA%relay.cs.net@rca.com" 1-MAR-1987 17:13
To: info-vax@sri-kl.arpa
Subj: RE: system hangs

>...We are experiencing a strange problem on one of the Vaxen in our
>cluster (consisting of 2 11/750s, 1 11/785, 1 8650). We recently
>increased the number of logins allowed from 64 (default) to 120 on our
>8650. Shortly there after the system would start to hang with about 70
>or 80 users. The system would slow down and processes would be placed in
>a RWSWP state. These processes would hang completely. eventually the
>system would completely hang and no one could do anything, including the
>console... ...processes in the RWSWP state... The last time the system
>started doing this I was able to do a 'SHOW MEM' from DCL and our swap
>file was more that 90% used, but the page file had plenty of free space.

We also went through this problem when we first expanded our
VAXcluster. It took the TSC several weeks to get back to us. By that
time, via the VMS internals manual and the doc set, we had solved it
ourselves. The TSC answer simply confirmed it...

The RWxxx (Resource Wait) process states are simply breakdowns of the
MWAIT (Miscellaneous Wait) state. RWSWP -- Resource Wait for Swap File
Space -- indicates that those processes were hung waiting for space in
the swap file.

Under VMS V4, swap file and page file space are allocated a bit
differently than in VMS V3. See the tuning guide for details -- the
bottom line is that each process must have a minimum area in the swap
file. This space is used for the process header, page tables, etc. If
that area isn't available when needed, the process hangs. After a
while, this situation will either clear itself when processes are
deleted, thus freeing space, or the problem will snowball and hang the
entire system.

In a VAXcluster, it is often/usually the case that:

    you have NO swap file on the system disk;
    you have a MINIMUM page file on the system disk;
    you have large secondary page and swap files on other disks.

This organization is done for performance purposes. In a VAXcluster,
you can easily swamp a system disk if you are doing extensive paging or
swapping on it. The minimum page file is just enough to get VMS booted,
then let you mount the other disks and enable their secondary page and
swap files from systartup.com. Thus all real paging/swapping gets done
to the secondary files. This is also a BIG disk space savings in large
VAXclusters -- when you have a lot of layered software, and large
memory VAXes (so you need large sysdump files), space on the system
disk is critical.

In pre-VMS V4, the minimum size of the page file on the system disk was
4K blocks. In VMS V4, due to size changes in the executive, the minimum
is now 8K blocks; i.e., if your page file on the system disk is less
than 8K blocks, you may hang during boot.

Actual sizing of these files will vary depending on your application.
AUTOGEN looks for a pagefile of at least 2*VIRTUALPAGECNT. This may or
may not be acceptable.

Our VAXcluster has a VAX-8650 with 52MB of real memory. Our main (and
largest) memory applications are VLSI design and AI. We need to supply
our users with 40 to 60MB of virtual memory. In our case, our 8650 was
hanging because the secondary swap file was MUCH too small. This has
been adjusted and we are now running with:

    VIRTUALPAGECNT = 128000
    WSMAX          = 33000
    IJOBLIM        = 90
    BJOBLIM        = 5

(For the memory files listed below, S: is the system disk, and U: is a
user disk.)

    S:[SYS7.SYSEXE]SWAPFILE.SYS - Doesn't exist.
    S:[SYS7.SYSEXE]PAGEFILE.SYS - 8192 blocks.
    S:[SYS7.SYSEXE]SYSDUMP.DMP  - 106500 blocks (52MBish+dump header)
    U:[VAXVMS]SWAPFILE.SYS      - 100000 blocks.
    U:[VAXVMS]PAGEFILE.SYS      - 300000 blocks.

During the day, the 8650's secondary swapfile runs about 60 to 70% full.
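
The figures above can be cross-checked with a little arithmetic. This
is only a minimal sketch in Python, assuming 512-byte VAX pages and
512-byte disk blocks (so one page occupies one block). The helper and
variable names are illustrative, not anything VMS or AUTOGEN provides.

```python
# Sizing arithmetic for the figures quoted in this message.
# Assumption: 512-byte pages and 512-byte disk blocks, so 1 page == 1 block.
BYTES_PER_BLOCK = 512
BLOCKS_PER_MB = 1024 * 1024 // BYTES_PER_BLOCK   # 2048 blocks per MB


def mb_to_blocks(mb):
    """Convert megabytes to 512-byte blocks (or pages)."""
    return mb * BLOCKS_PER_MB


def blocks_to_mb(blocks):
    """Convert 512-byte blocks (or pages) to megabytes."""
    return blocks / BLOCKS_PER_MB


virtualpagecnt = 128000        # pages, from the parameter list above
physical_memory_mb = 52        # real memory on the 8650
secondary_pagefile = 300000    # blocks, U:[VAXVMS]PAGEFILE.SYS
required_pagefile = 450000     # blocks, the size we would actually need

# Largest virtual address space one process can have, in MB.
max_virtual_mb = blocks_to_mb(virtualpagecnt)

# Page file size AUTOGEN looks for: 2 * VIRTUALPAGECNT.
autogen_pagefile_min = 2 * virtualpagecnt

# Blocks needed to hold all of physical memory in the dump file,
# before the dump header is added.
dump_body_blocks = mb_to_blocks(physical_memory_mb)

# How far the secondary page file falls short of the required size.
pagefile_shortfall = required_pagefile - secondary_pagefile

print("max virtual memory per process (MB):", max_virtual_mb)
print("AUTOGEN page file target (blocks):  ", autogen_pagefile_min)
print("dump body (blocks):                 ", dump_body_blocks)
print("page file shortfall (blocks):       ", pagefile_shortfall)
```

Under these assumptions, VIRTUALPAGECNT = 128000 works out to 62.5MB
per process, in line with the 40 to 60MB we need to supply, and the
300000-block secondary page file clears the 2*VIRTUALPAGECNT figure of
256000 blocks.
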
The secondary page file runs around 80% full, but has been known to
completely fill if too many chip designers get too ambitious at once.
This is a limitation we've decided to live with because we can't afford
the disk storage to size the page file to the actually required 450,000
blocks.

I hope this helps. The full explanations can be found in the books
distributed in DEC's VAX Performance Seminar.

Dan Cottler
RCA Advanced Technology Laboratories
Moorestown, NJ