From - Thu Sep 25 07:57:07 1997
Path: news.mitre.org!blanket.mitre.org!agate!newsfeed.kornet.nm.kr!newsfeed.dacom.co.kr!newsfeed.internetmci.com!208.206.176.15!dimensional.com!not-for-mail
From: efricha@alumni.cs.colorado.edu (Eric F. Richards)
Newsgroups: vmsnet.internals,comp.os.vms
Subject: RMS/Proc Perm File/Terminal driver confusion
Date: Thu, 25 Sep 1997 03:11:48 GMT
Organization: What organization?
Lines: 148
Message-ID: <3429c747.1107792@news>
NNTP-Posting-Host: p17.pm-4.pm.dimcom.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Newsreader: Forte Agent 1.5/32.451
Xref: news.mitre.org vmsnet.internals:5337 comp.os.vms:179395

Greetings, all...

I've been doing some exotic things with RMS lately, and have found
an unusual problem.  Under certain circumstances, RMS can get
completely confused about its internal state and do things like
return RMS$_BUSY from SYS$WAIT.  (And just what are you supposed to
do to recover from that?)

The following program is a simple demonstration of the problem.
Simply run it on a "soft" terminal (telnet session, DECterm),
type CTRL/S, wait a bit, type CTRL/Y, then type EXIT.  Note that
the exit handler cannot -- even using $WAIT -- use the stream
again.

After the BUG.MAR listing, I describe an even more bizarre
manifestation of this problem.

---begin BUG.MAR
        .title  bug	RMS misbehavior demo
        .ident  /V1.00/
        $rmsalldef
        .psect  bug, page
        .entry  bug, ^m<>	; main entry point, no regs saved
        $dclexh_s desblk=blok	; set up an exit handler
        blbc    r0, 20$		; on error, bail out

        $create fab=fab		; open SYS$OUTPUT
        blbc    r0, 20$		; error check

        $connect rab=rab	; connect a stream to SYS$OUTPUT
        blbc    r0, 20$		; error check

10$:    $put    rab=rab		; start spewing output -- doesn't
        blbs    r0, 10$		; ...matter what it is

20$:    ret			; error out -- this never executes

        .entry  ex, ^m<>	; this is the exit handler
        $wait   rab=rab		; wait for I/O to complete -- FAILS

        pushl   rab+rab$l_stv	; put out the error from $WAIT
        pushl   r0		; ...
        calls   #2, msg		; ...

10$:    movab   done, rab+rab$l_rbf		; set up a string to
        movw    s^#done_len, rab+rab$w_rsz	; ...output
        $put    rab=rab				; write it out -- FAILS

        pushl   rab+rab$l_stv	; put out the error stats from $PUT
        pushl   r0		; ...
        calls   #2, msg		; ...

20$:    $close  fab=fab		; close the file -- "succeeds", sorta

        pushl   fab+fab$l_stv	; put out SYS$CLOSE status
        pushl   r0		; ...extended status as well
        calls   #2, msg		; ...

30$:    ret			; end of exit handler

        .entry  msg, ^m<>	; quick & dirty way to get errors out
        $putmsg_s msgvec=(ap)	; ...vector looks just like a call frame
        ret			; ...

        .align  long
fab:    $fab    fnm=<sys$output:>, rat=cr, fop=cif, -
                fac=put, shr=<get,put,del,upd,trn>
rab:    $rab    fab=fab, rbf=line, rsz=line_len

line:   .ascii  /This is a very long line of some meaningless/
        .ascii  / text to output/
        line_len = . - line

done:   .ascii  /The exit handler produced this/
        done_len = . - done

        .align  long
blok:   .long   0
        .address ex
        .long   1
        .address 10$
10$:    .long   0
        .end    bug			; end of demo program
---end of BUG.MAR
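
For anyone who doesn't read MACRO every day, here is a rough C view of
two details in the listing: the four-longword block at "blok" that gets
handed to $DCLEXH, and the low-bit test that BLBS/BLBC apply to R0.
This is only a sketch.  The struct, its field names, and both helper
functions are made up for illustration; addresses are held as 32-bit
values because that is what a VAX longword is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the block at "blok" (names are invented). */
struct exit_block {
    uint32_t flink;        /* .long 0       forward link, used by the system */
    uint32_t handler;      /* .address ex   exit handler entry point */
    uint32_t argcnt;       /* .long 1       number of arguments that follow */
    uint32_t status_addr;  /* .address 10$  longword that receives the status */
};

/* What BLBS/BLBC test: a VMS status has its low bit set on success. */
static int vms_success(uint32_t status)
{
    return (status & 1u) != 0;
}

/* Fill the block in the same order as the .long/.address directives. */
static struct exit_block make_exit_block(uint32_t handler, uint32_t status_addr)
{
    struct exit_block b = { 0, handler, 1, status_addr };
    assert(sizeof b == 16);    /* four longwords, no padding */
    return b;
}
```

So "blbc r0, 20$" reads as "if (!vms_success(r0)) goto error", and the
"10$" longword after "blok" is simply the place where the final image
status lands before "ex" is called.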

Now, this program doesn't fail every time: sometimes it works as it
should, and other times it fails as described above.  I had the
failures using a telnet session from a PC; VAXman reproduced the
failures on his machine from a DECterm.

....now, if you've made it this far.......

If you've been reading vmsnet.internals lately, you probably saw the
earlier discussion regarding catching SYS$PUT.  This all came about
because of some debugging code I'd put into my intercept.  The
intercept code and the debugging FAB and RAB all live in P1 space.

SDA> exam/inst sys$put+2
SYS$PUT+2:  JMP @#INTPUT

...

intput::movpsl	r1			; check processor mode
	cmpzv	#psl$v_curmod, -	; verify that we're in
		#psl$s_curmod, -	; ...user mode
		r1, #psl$c_user		; ...and bail if not
	bneq	realput+2	; copy of the original vector

	movl	g^ctl$gl_phd, r0	; get process header
	cmpl	imgcnt, phd$l_imgcnt(r0); do we have to set up
	beql	10$			; ...again?
	bsbw	setup			; if so, do it here
10$:
....
;
;	Set up some debugging data for display to SYS$ERROR
;
	pushal	dbgrab		; this RAB produces the output
	calls	#1, realput	; display the output

....
setup:		; init FAB, RAB for debug, OPEN/CONNECT, etc.
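
The mode check at the top of the intercept is just a two-bit field
extract from the PSL.  CMPZV takes its operands as position, size,
base, source, and the current access mode lives in PSL bits <25:24>,
with 3 meaning user mode.  A minimal C sketch of that test follows; the
macro and function names are my own stand-ins for the psl$ symbols,
not anything from a real header.

```c
#include <assert.h>
#include <stdint.h>

#define PSL_V_CURMOD 24u   /* bit position, stands in for psl$v_curmod */
#define PSL_S_CURMOD 2u    /* field size,   stands in for psl$s_curmod */
#define PSL_C_USER   3u    /* user mode,    stands in for psl$c_user   */

/* Zero-extended bit-field extract, the value CMPZV compares against. */
static uint32_t extract_zv(uint32_t pos, uint32_t size, uint32_t base)
{
    /* Keep the field inside one 32-bit register and the shifts defined. */
    assert(size > 0 && size < 32 && pos + size <= 32);
    return (base >> pos) & ((1u << size) - 1u);
}

/* Nonzero when the PSL says the caller is running in user mode. */
static int in_user_mode(uint32_t psl)
{
    return extract_zv(PSL_V_CURMOD, PSL_S_CURMOD, psl) == PSL_C_USER;
}
```

Anything that isn't user mode takes the "bneq" and falls through to
the copy of the original vector untouched.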


Like the earlier demo program, if you XOFF the output this produces,
then CTRL/Y, then XON, any call referencing dbgrab will return
RMS$_BUSY.  This is true even if you write zeros over the FAB and RAB!
            ---------------------------------------------------------
Even more bizarre, if RMS doesn't get confused, you can still call
SYS$PUT using the address of where the RAB used to be, and get output!
(I found that a code path was successfully writing to SYS$ERROR before
it had been opened....)

I'm guessing that at least 3 things are involved:  1) SYS$ERROR is a
process permanent file, so special rules apply.  2) Soft terminals
like telnet sessions behave asynchronously, even with I/O that's
supposed to be synchronous.  I bet the problem cannot be recreated
by someone using a VT100 on a DZ11 (God help them).  3) Some race
condition in RMS rundown.
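
To make guess 3 concrete, here is a toy model in C.  It is emphatically
NOT how RMS is implemented; every name and status value in it is
invented.  It only shows the shape of the failure I suspect: a "busy"
flag kept in internal state, set when an operation starts and cleared
only by a completion path, stays set forever if that completion is
lost, and wiping the caller's own block does nothing to clear it.

```c
#include <assert.h>

enum { ST_OK = 1, ST_BUSY = 2 };    /* made-up values, not RMS codes */

/* Hypothetical internal per-stream state, kept apart from the caller. */
struct internal_stream { int busy; };

/* Hypothetical caller-visible block standing in for a RAB. */
struct user_block { int status; };

static struct internal_stream stream;    /* one stream is enough here */

/* Start a write; refuse if an earlier one never completed. */
static int start_put(struct user_block *u)
{
    if (stream.busy)
        return u->status = ST_BUSY;
    stream.busy = 1;                 /* cleared only by complete_put() */
    return u->status = ST_OK;
}

/* Completion path; in the guess above, this is the step that gets lost. */
static void complete_put(void)
{
    assert(stream.busy);             /* nothing to complete otherwise */
    stream.busy = 0;
}
```

Call start_put() twice with no complete_put() in between and the second
call returns ST_BUSY, no matter what you do to the user_block in the
meantime.  Whether anything like this really happens inside RMS is
exactly what I'm asking.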

Any ideas?  (Besides, "get a new hobby.")

Thanks much!
--
Eric F. Richards
efricha@alumni.cs.colorado.edu
"The weird part is that I can feel productive even when I'm doomed."
 - Dilbert