From: MERC::"uunet!CRVAX.SRI.COM!RELAY-INFO-VAX"  2-NOV-1992 19:43:41.12
To:   info-vax@kl.sri.com
CC:
Subj: Re: Alternate Type Ahead Buffer Questions

In article <1992Oct30.163055.15639@cs.tulane.edu>, Jeff E Mandel writes:
> In article <1992Oct30.023455.813@cmkrnl.com>, jeh@cmkrnl.com writes:
>>> I have a process that is trying to drain a 19.2K line into a file.
>>> To allow it time to process, I have split the process into two
>>> processes that communicate via a global section.  Thus the first
>>> process issues 500-byte QIO reads on the terminal line and places
>>> the results into the circular buffer global section.
>>
>> Unless you are on a multi-CPU (SMP) VAX, there is no advantage to
>> splitting the work between two processes.  Just use an AST-driven
>> thread to read the data.
>
> Well, you may want to refer back to the thread on shared VM zones we
> had last month.  The advantage to splitting it into two processes is
> that it gives you the ability to buffer with less "costly" memory if
> your writing process gets blocked.  Basically, the approach I took was
> to use the PPL$ library and create a shared VM zone.  Now I create a
> task that reads from the serial line, allocates a buffer in the shared
> zone with LIB$GET_VM, and places that buffer into a PPL$ work queue.
> The second task reads the work queue, inserts the data in the record
> into the file (in my case an Rdb database), and LIB$FREE_VM's the
> buffer.  The advantages of this are:
>
> 1) Your buffering is in virtual memory, rather than in a fixed-size
> global section, or worse, in nonpaged pool (if you use a great big
> typeahead buffer).

There is really no difference in terms of the "cost" of the memory
between what you have described and either doing the global section
yourself (just how do you think shared zones are implemented, anyway?),
or using LIB$GET_VM with the reading thread (AST-driven) and the writing
thread (either AST-driven or at the non-AST level) in the same process.
As far as the creation of virtual address space is concerned, these are
just different interfaces to the same mechanisms.  The memory is virtual
in each case.

A "fixed-size global section" has a maximum size, all right, but so does
a VM zone (16 megabytes; a global section can be larger, various SYSGEN
and UAF parameters permitting).  As far as physical memory is concerned,
the global section is just as "virtual" as the as-yet-unallocated parts
of a VM zone.  The only difference is that for the global section, page
table entries are created for the entirety of the section at one time,
while the VM zone causes a gradual expansion of virtual address space.
Either way, no physical memory is used until pages are faulted in (a few
pages at a time, and I might add that you can't control the pfc -- the
page-fault cluster size -- on a VM zone as you can with the $CRMPSC
service), and pages can be paged out of the process[es]' working set[s]
as needed.

Using a great big typeahead buffer:  Funny you should mention that; we
have to do that sometimes in uucp, to support very large packets + large
windows.  It's necessary when you can't count on the code that's reading
the serial port to wake up fast enough.  Even then it's limited to a few
tens of kilobytes (seven packets in the window * 4 Kbytes/packet).  (As
a practical matter, no one really needs to run with windows and packets
that large to optimize uucp throughput.)
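Getting the big buffer, by the way, means turning on the *alternate*
typeahead buffer for the line, whose size comes from the SYSGEN
parameter TTY_ALTYPAHD -- from DCL that's SET TERMINAL/ALTYPEAHD,
usually done once at system startup before anything is reading the
line.  From a program it's an IO$_SETMODE that sets TT2$M_ALTYPEAHD in
the extended terminal characteristics.  Something like the following;
this is typed in from memory and not compiled, so caveat emptor, and
the device name is made up:

    /* Sketch: enable the alternate typeahead buffer on a line.     */
    /* Not from our uucp sources; "TTA2:" is a made-up device name. */
    #include <descrip.h>
    #include <iodef.h>
    #include <ssdef.h>
    #include <tt2def.h>
    #include <starlet.h>
    #include <lib$routines.h>

    static struct {                   /* sense/set-mode buffer       */
        unsigned char  class, type;   /* device class, terminal type */
        unsigned short width;
        unsigned int   basic;         /* TT$M_xxx bits + page length */
        unsigned int   extended;      /* TT2$M_xxx bits              */
    } tchar;

    int main(void)
    {
        $DESCRIPTOR(ttnam, "TTA2:");
        unsigned short chan, iosb[4];
        int status;

        status = sys$assign(&ttnam, &chan, 0, 0);
        if (!(status & 1)) lib$signal(status);

        /* a 12-byte buffer gets the extended characteristics too;  */
        /* real code would check iosb[0] as well as the status      */
        status = sys$qiow(0, chan, IO$_SENSEMODE, iosb, 0, 0,
                          &tchar, sizeof(tchar), 0, 0, 0, 0);
        if (!(status & 1)) lib$signal(status);

        tchar.extended |= TT2$M_ALTYPEAHD;

        status = sys$qiow(0, chan, IO$_SETMODE, iosb, 0, 0,
                          &tchar, sizeof(tchar), 0, 0, 0, 0);
        if (!(status & 1)) lib$signal(status);
        return SS$_NORMAL;
    }

The buffer itself still comes out of nonpaged pool, TTY_ALTYPAHD bytes
of it -- a few tens of Kbytes in the scheme above.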
This is probably about the most data anyone should count on keeping in
the typeahead buffer, since there are a lot of systems out there where
finding a chunk of pool that size is a chancy thing.  (Yes, I know about
NPAGEVIR.  There are a lot of system managers who set it equal to
NPAGEDYN, thinking that they're saving on memory... life would be lots
simpler if one could write applications only for well-managed systems!)

> 2) The buffer is a queue, rather than a circular buffer.  You don't
> have to worry about overwriting live data (until you exhaust VM).

Again... there is no need to go to the PPL$ routines and shared VM zones
to get this behavior.  Nor will these routines be any cheaper in terms
of physical memory.

For this particular problem, reading data from a 19.2 Kbps line (about
2000 characters per second), there really shouldn't be much problem
keeping up unless the data must be read with very small $QIO buffers.
For example, uucp on a MicroVAX 3600 can absorb data from a Trailblazer
modem at 1400 bytes/sec, and write it to disk, and leave at least 70% of
the CPU free.  This involves doing two $QIO reads for every 64 bytes of
data (one for the 64 bytes, and one for the 6-byte header that precedes
it), PLUS a $QIO write to send the ACK to the modem for each 64-byte
packet.  At 2000 char/sec I wouldn't worry about overrunning a circular
buffer unless the CPU is VERY busy (or slow).

Now:  I agree that using LIB$GET_VM to allocate chunks of memory, and
queueing these to a listhead, etc., as you have described, is a good
technique for queueing messages from one execution thread to another,
whether the two threads are part of the same process (using ASTs, and a
process-private VM zone) or are in different processes (using the PPL$
library).  It's a heck of a lot easier than setting up a section (global
or private) and managing your own space allocation!  However, there are
time penalties to be paid for all that convenience.  The closer you are
to the edge of acceptable performance, the more you need to bypass the
"convenient" interfaces and do the work yourself.  And the mechanisms
behind the convenient interfaces need to be understood before
implementation decisions can be made.

I can't resist just one more nit:  You do need to be aware that if your
second thread is running in its own process (whether under control of
the PPL$ routines or not), there will be a slight performance penalty
due to process context switching, vs. switching between AST contexts in
a single process.  The size of the penalty depends on how many context
switches happen per second, the virtual addressing behavior of the
processes (since a context switch causes the translation buffer to be
flushed), the speed of the CPU (although process context switch speed
doesn't scale linearly with CPU speed), etc., etc.  I can't quote
numbers for this; I can tell you that this penalty is unmeasurable on a
MicroVAX II that's doing 50 context switches a second (quantum set to
2).  (That is, two compute-bound processes can get the same amount of
work done whether quantum is set to 2 or 200.)  On modern VAXes you
probably don't need to worry about this until you are doing a few
hundred or more context switches per second.

> 3) If you decide you want to use a new version of the writing process,
> kill the old one and start a new one.  You don't have to shut down the
> serial line while you restart the program, as you would in an
> AST-threaded single process.

Yep.  The application permitting, you can even set up multiple processes
to pick the data out of the queue (a rough sketch of such a consumer
follows).
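If the queue lives in memory shared between processes, the natural
primitive is the VAX interlocked, self-relative queue (LIB$INSQTI /
LIB$REMQHI) -- self-relative links are exactly what let two processes
map the section at different virtual addresses and still share the
queue.  Here is a rough consumer loop using those RTL routines rather
than the PPL$ work-queue calls Jeff used; the entry layout, listhead,
and hiber/wake handshake are my own inventions, typed in from memory
and not compiled:

    /* Sketch: one consumer draining a shared interlocked queue.     */
    #include <ssdef.h>
    #include <starlet.h>
    #include <lib$routines.h>

    globalvalue int lib$_quewasemp;  /* "queue was empty" status     */

    struct qent {
        int flink, blink;          /* self-relative links, managed   */
                                   /* by the INSQTI/REMQHI routines  */
        unsigned short len;        /* bytes of live data             */
        char data[500];
    };

    /* The listhead must be quadword-aligned, and for two processes  */
    /* it must really live in the shared zone / global section; a    */
    /* static here just for illustration.                            */
    static int qhead[2] = {0, 0};

    void consumer(void)
    {
        struct qent *e;
        unsigned int nbytes = sizeof(struct qent);
        int status;

        for (;;) {
            status = lib$remqhi(qhead, &e, 0);
            if (status == lib$_quewasemp) {
                sys$hiber();    /* producer $WAKEs us after INSQTI;  */
                continue;       /* a premature wake is remembered,   */
            }                   /* so no lost-wakeup race            */
            if (!(status & 1)) lib$signal(status);

            /* ... dispose of e->data[0..e->len-1] here ... */

            /* give the buffer back; 0 means the default zone, but   */
            /* this would be the shared zone-id in Jeff's scheme     */
            status = lib$free_vm(&nbytes, &e, 0);
            if (!(status & 1)) lib$signal(status);
        }
    }

The producer side is the mirror image:  LIB$GET_VM a buffer, fill it in
(in the terminal-read AST, say), LIB$INSQTI it onto the listhead, and
$WAKE the consumer(s).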
Suppose that instead of writing the data to disk, the job was to
summarize the data in some way.  On an SMP machine you could use one
process on one CPU to grab the data blocks and put them on the queue,
and n-1 "summarizer" processes.

> 4) When someone gives you an SMP VAX for Christmas, you have a
> parallel application all ready to go!

Not likely to happen here, but I'll keep it in mind.  :-)

> Jeff E Mandel MD MS
> Associate Professor of Anesthesiology
> Tulane University School of Medicine
> New Orleans, LA
>
> mandel@vax.anes.tulane.edu

--- Jamie Hanrahan, Kernel Mode Consulting, San Diego CA
drivers, internals, networks, applications, and training for VMS and Windows-NT
uucp 'g' protocol guru and release coordinator, VMSnet (DECUS uucp) W.G., and
Chair, VMS Programming and Internals Working Group, U.S. DECUS VAX Systems SIG
Internet:  jeh@cmkrnl.com, hanrahan@eisner.decus.org, or jeh@crash.cts.com
Uucp:  ...{crash,eisner,uunet}!cmkrnl!jeh