From: SMTP%"@CRVAX.SRI.COM,@UNIX.SRI.COM:lrw!leichter@LRW.COM"  2-SEP-1994 08:50:37.93
To: EVERHART
CC:
Subj: Re: re: posix

Message-Id: <9409020249.AA22260@uu3.psi.com>
Date: Thu, 1 Sep 94 22:36:03 EDT
From: Jerry Leichter
To: INFO-VAX@SRI.COM
Subject: Re: re: posix
X-Vms-Mail-To: UUCP%"ivax@meng.ucl.ac.uk"

Everhart@arisia.gce.com:
> When I heard an early presentation on VMS Posix, it was discussed
> that on a given piece of iron, unix took ~20 msec. to fire up a new
> process, VMS took over 100 msec. (sorry, I forget the figure) and
> that VMS Posix would reduce the amount of stuff to be carried and
> expected to get to ~70 msec. for fork.

[...]

Anyone know if there's any truth in the rumour that kernel based threads
are coming to VMS?  (Presumably these would make threads 'schedulable
entities', allowing threads to be scheduled independently, and also spread
across multiple processors.)  By my limited understanding, these would be
a much better match for fork than subprocesses are.  (?).

I've heard the rumors, too, from a well-informed source.  Don't know when
they will arrive, however.

Kernel-based threads would not, however, provide an effective replacement
for fork().  fork() creates a new address space (which starts off as a
copy of the original address space).  Multiple threads share a single
address space.

The "kernel-based" part just has to do with who schedules threads.  In a
kernel-based implementation, the schedulable entity is the thread, not the
process.  In a user-mode thread implementation - such as DECthreads - the
schedulable entity is the process.  The fact that, within that process, a
separate scheduler decides which thread to run when the process is given
control is of no concern to the operating system, which in fact is
generally not even aware that multiple user-level threads exist.

There are advantages and disadvantages to each approach.  The pluses for
a kernel-based design include:

	- On a multiprocessor, multiple threads can run simultaneously on
	  different processors;

	- OS-level blocking operations can easily be set up to block the
	  thread, not the process.  (Since the OS doesn't know there are
	  multiple user-level threads within a process, if one thread does
	  a QIOW, all the threads in the process wait for it to complete.)

	- Thread priorities can mean something across processes, not just
	  within them.

Advantages of user-level threads include:

	- They can be implemented much more cheaply.  Switching from one
	  user-level thread to another takes 10-20 instructions in a
	  reasonable implementation.  It's hard to get within a factor of
	  10 of that cost in a kernel-mode implementation.

	- User-level threads don't *really* run in parallel, so
	  inter-thread synchronization is much simpler.  This (a) is
	  easier to get right, a worthwhile consideration if you don't
	  really *need* true parallelism; and (b) is much more efficient.
	  In fact, in many cases a non-preemptive user-level thread
	  implementation is good enough, making synchronization even
	  easier.  (A rough sketch of that model follows this list.)

	- User-level threads can be implemented without changing the OS.
	  Since the code that manages them is purely user-mode code, it
	  can safely be changed.  Thus, user-level threads can be much
	  more flexible.

	- Various management policy decisions don't need to be made.  In
	  a kernel-based thread implementation, you could increase your
	  share of the CPU by simply starting many threads, set up so
	  that they are really all doing pieces of the same thing.  In a
	  user-level implementation, this gains you nothing, since it's
	  still your one process that gets scheduled.  There are ways to
	  handle this, but they seem complex (and add yet more overhead
	  to a kernel-based implementation).
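
To make the user-level model concrete, here's a rough sketch of the sort
of thing a cooperative package does, written for a typical Unix system
using the ucontext routines.  It is not DECthreads (or any real package),
and the ucontext calls are heavier than the 10-20 instruction switch
mentioned above, but the shape is the same: two "threads" share one
address space, switch only when they explicitly yield, and bump a shared
counter with no locking at all, because neither can be preempted in the
middle of an update.

    #include <stdio.h>
    #include <ucontext.h>

    /* Two cooperative user-level threads.  All switching is explicit
       and happens entirely in user mode; the OS sees one process. */

    static ucontext_t main_ctx, ctx_a, ctx_b;
    static char stack_a[64 * 1024], stack_b[64 * 1024];
    static int counter = 0;                /* shared, yet needs no lock */

    static void thread_a(void)
    {
        int i;
        for (i = 0; i < 3; i++) {
            counter++;
            printf("A sees counter = %d\n", counter);
            swapcontext(&ctx_a, &ctx_b);   /* hand control to B */
        }
    }

    static void thread_b(void)
    {
        int i;
        for (i = 0; i < 3; i++) {
            counter++;
            printf("B sees counter = %d\n", counter);
            swapcontext(&ctx_b, &ctx_a);   /* and back to A */
        }
    }

    int main(void)
    {
        getcontext(&ctx_a);
        ctx_a.uc_stack.ss_sp = stack_a;
        ctx_a.uc_stack.ss_size = sizeof stack_a;
        ctx_a.uc_link = &main_ctx;         /* resume main when A returns */
        makecontext(&ctx_a, thread_a, 0);

        getcontext(&ctx_b);
        ctx_b.uc_stack.ss_sp = stack_b;
        ctx_b.uc_stack.ss_size = sizeof stack_b;
        ctx_b.uc_link = &main_ctx;
        makecontext(&ctx_b, thread_b, 0);

        swapcontext(&main_ctx, &ctx_a);    /* start thread A */
        return 0;
    }

No scheduling decision here ever involves the kernel; the process is the
only thing the OS sees, which is exactly the tradeoff described above.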

There's actually an intermediate technique, called scheduler activations.
In brief summary, there is now a third object, an "activation", that is
the schedulable entity.  A process has one or more threads, and one or
more activations.  When the kernel schedules one of the activations in a
process, the code that gets control in the process is the user-level
thread scheduler, which picks a thread to run.  If there's one activation
in the process, you get a user-level implementation; if there are as many
activations as threads, you get a kernel-based implementation.  In
between, you get something that trades off the advantages and
disadvantages of the two extremes.  I think the Solaris lwp (light-weight
process) package does something of this general sort.

							-- Jerry