From:	SMTP%"raspuzzi@mrlat.enet.dec.com" 28-OCT-1994 09:36:48.80
To:	EVERHART
CC:	
Subj:	Re: Any more recent info/articles on LAT rating algorithm?

X-Newsgroups: comp.os.vms
Subject: Re: Any more recent info/articles on LAT rating algorithm?
Message-Id: <38f07n$gma@mrnews.mro.dec.com>
From: raspuzzi@mrlat.enet.dec.com (Michael D. Raspuzzi)
Date: 23 OCT 94 20:43:44
Organization: Digital Equipment Corporation
Nntp-Posting-Host: mrlat.enet.dec.com
Lines: 316
To: Info-VAX@Mvb.Saic.Com
X-Gateway-Source-Info: USENET


In article <1994Oct20.024416.29188@ultb.isc.rit.edu>, dsf5454@ultb.isc.rit.edu (D.S. Foster ) writes...
>Howdy...
>	There's a file obtainable via anon FTP at:
> 
>ACFCLUSTER.NYU.EDU, [VMS]LAT.EXPLAINATION
> 
>That has a post from Nick Smith to comp.os.vms circa 1990 that had interesting
>info on the LAT rating algorithm. I'm aware that the algorithm has most
>likely changed since then... so I'm curious if anyone had any info or
>pointers where any could be found for that particular subject that was
>done after 1990?
> 
>-Dan
>Internet:	dsf5454@rit.edu

The rating in use in 1990 (~VMS V5.5 release) is still the same one today.
Below is a lengthy explanation of some rating anomalies that have been seen
from time to time.  I hope that the explanations make some sort of sense.
It is 200+ lines.

I should also mention that we are considering supplying a default rating
algorithm that can be easily modified at sites (that is, you can write
your own LAT rating algorithm and plug it into LTDRIVER).  This is in the
works for a future release of OpenVMS (both VAX and AXP).  At this time, I have
no release information so I don't know when it will be available.  The plan
would be to ship the sample rating algorithm (written in C and a little MACRO)
in SYS$EXAMPLES.

--
Mike Raspuzzi (raspuzzi@mrlat.enet.dec.com)
Digital Equipment Corporation

-----------------------LAT Rating Info-----------------------

We get a lot of questions about the LAT rating algorithm.  This text
describes some scenarios in which the LAT rating "doesn't feel right" or
doesn't look right, usually because it is not clear exactly what the
rating algorithm is doing.

First, let's dissect the rating algorithm to understand its makeup.

The LAT rating algorithm is as follows:

    !
    ! The LAT rating is calculated using system load average and number of
    ! free job slots:
    !
    !          20 * (IJOBLIM - IJOBCNT)    min(235,IJOBLIM) * 100
    ! RATING = ------------------------ +  ----------------------
    !                 IJOBLIM              (100 + GHB$L_LOADAVG)
    !
    ! If IJOBLIM = 0 or IJOBLIM = IJOBCNT then rating is set to 0.
    !
    ! GHB$L_LOADAVG is a quantity that equals 100 if there is an average of
    ! 1 job waiting in a computable queue (200 if 2 jobs, etc.)  and is
    ! a moving average taken every 5 seconds.  Note, only those processes
    ! whose priority is DEFPRI or higher are included in the load average.
    !
    ! The first term in the above formula is dubbed the "AVAILABILITY" term,
    ! and represents the fraction of free job slots available.  The second
    ! term is known as the "LOAD" term, and varies according to system load.
    !  
    ! The factor "min(235,IJOBLIM)" may be changed by the LATCP command
    ! "SET NODE/CPU_RATING=nnn".  The quantity "nnn" represents the system
    ! manager's estimate of relative CPU power.  Its value can range
    ! from 1 (for a low power CPU) to 100 (for the highest power CPU which
    ! offers the given LAT service).  This range is scaled up by LATCP
    ! to produce a factor which ranges from 1 to 235 in order to fit in
    ! the above formula.
    !
    ! In addition to the above terms, there may be a term which represents
    ! a penalty of up to 40 rating points if the system is low on free 
    ! memory. This term is:
    !
    !		    (FREEGOAL + 2048 - FREECNT)
    ! PENALTY = 40* ---------------------------
    !                   (FREEGOAL + 2048)
    !
    ! This term is SUBTRACTED from the rating ONLY IF it is positive.  If
    ! FREECNT is high enough, this term is not used.
    !

In its simplest form, the LAT rating is:

RATING = AVAILABILITY + SYSTEM LOAD

Since the rating can only have a maximum value of 255, LTDRIVER must
take care in making sure that AVAILABILITY + SYSTEM LOAD does not
exceed the value 255.  To do this, LTDRIVER only lets the AVAILABILITY
term have a maximum value of 20 and the SYSTEM LOAD can only have a
maximum value of 235.  So, if both were set at their maximum values,
the LAT rating would be 20 + 235 or 255.

To start with, let's look at the first term.  It is dubbed the "AVAILABILITY"
term.

[MYTH]
If the login limit on a system is set to 100 and 50 people are logged in, I
would expect the LAT rating to be half way to 0.

[FACT]
This is not entirely true.  Remember, the AVAILABILITY term is only worth
20 points of the rating.  So, if 50 people out of 100 are logged in, then
the AVAILABILITY is calculated accordingly:

	AVAILABILITY = 20 * (50 / 100) = 10

Let's assume the SYSTEM LOAD term stays at a constant 100.  As people log in,
the rating will drop more slowly than one would expect.  In the simplest case,
let's say that the interactive login limit for a node is 20.  With a SYSTEM
LOAD of 100, the rating will be (with no users logged in):

	RATING = 20 * (20 / 20) + 100 = 120

Assume the system load stays constant as users begin to log in.  Every time
a user logs in, the rating will only drop 1 point.  So, with 10 users logged
in, the rating is:

	RATING = 20 * (10 / 20) + 100 = 110

It is a common misconception that the LAT rating drops drastically as people
log in.  This is not entirely true if the system load remains the same.  The
only exception to this is if the interactive login limit is reached.

When the interactive login limit is reached, the rating is unconditionally
set to 0.
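
The slow drop in the AVAILABILITY term can be tabulated with a short Python
sketch.  The helper name availability is made up for illustration, and
integer division is an assumption.

```python
def availability(ijoblim, ijobcnt):
    """AVAILABILITY term: 20 points scaled by the fraction of free job slots."""
    if ijoblim == 0:
        return 0  # no interactive logins allowed at all
    return 20 * (ijoblim - ijobcnt) // ijoblim


IJOBLIM = 20
for users in range(0, IJOBLIM + 1, 5):
    term = availability(IJOBLIM, users)
    print(f"{users:2d} users logged in -> AVAILABILITY {term:2d}")
```

With a limit of 20, each login costs one point.  With a limit of 100, the
term loses one point for roughly every five logins.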

Now, let's look at the load term.  It is the most important term
since it makes up 235 / 255 or about 92% of the entire LAT rating.

The first thing to notice is that the system load term can only have a
maximum value of 235.  It *could* be smaller.  For example, a node that
has an interactive job limit of 64 will not use 235 as its upper limit.
It uses 64 as the upper limit of the SYSTEM LOAD.  This is observed by
looking at MIN(235,IJOBLIM) of the SYSTEM LOAD term.  So, if IJOBLIM is
below 235, then the value of IJOBLIM will be used.

Keeping this in mind, what is the absolute highest RATING a system with
an interactive job limit (IJOBLIM) of 64 can have?  First, you know that
0 users have to be logged in to get the maximum out of the availability
term.  Next, you know that the load average has to be at its lowest value
(0) indicating the system is completely idle.  Using those values (no one
logged in and a load average of 0) the highest rating a system with
an interactive job limit of 64 can have is:

	RATING = 20 * (64 / 64)   +   (MIN(235,64) * 100) / (100 + 0)
	RATING =     20           +   (64    *     100) / 100
                 AVAILABILITY           SYSTEM LOAD

As you can see, the maximum rating is 84 (20 from the AVAILABILITY and
64 from the SYSTEM LOAD).  If a system has an interactive job limit of 64
and the LAT rating is higher than 84, then something odd is going on (either
an incorrect calculation in LTDRIVER or someone is changing dynamic system
parameters - IJOBLIM - on the fly).  Chances are, someone changed IJOBLIM.
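
This upper bound makes a quick sanity check.  A Python sketch, assuming the
default factor min(235, IJOBLIM) is in effect (that is, SET NODE/CPU_RATING
has not been used); both function names are hypothetical:

```python
def max_rating(ijoblim):
    """Highest rating possible: nobody logged in and a load average of 0."""
    if ijoblim <= 0:
        return 0
    return 20 + min(235, ijoblim)


def rating_is_plausible(ijoblim, observed_rating):
    """False suggests IJOBLIM changed on the fly (or a CPU_RATING override)."""
    return 0 <= observed_rating <= max_rating(ijoblim)


print(max_rating(64))
print(rating_is_plausible(64, 120))
```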

[MYTH]
A LAT rating of 0 means no one can connect to that node.

[FACT]
A LAT rating of 0 simply means that the system is "not very available" or
overloaded.  It has nothing to do with whether or not the system can be
connected to.  A system with a LAT rating of 0 can still be connected to
but one may not be able to login.

Let's continue to look at the SYSTEM LOAD term.  Note that the load average
it uses is calculated to be 100 if 1 job is computable, 200 if 2 jobs are
computable, etc.  Computable here means a process that is either on a
COM/COMO queue or in one of the following states:

		MWAIT, COLPG, FPG, PFW

THIS DOES NOT TAKE INTO ACCOUNT ANY PROCESS THAT IS CURRENTLY RUNNING
ON A CPU.  This is a common misconception about the LAT rating.  It is
assumed that because the CPU is 100% busy that the LAT rating must be
real low because the system is loaded.  THIS MAY NOT BE THE CASE!!!

[MYTH]
Monitor or other system performance tools show the CPU to be 100% busy
and several processes are contending for the CPU.  The LAT rating should
therefore be very low.

[FACT]
While the above is a very general statement, it does NOT apply to all
situations.  Why?  Remember that the only processes counted in the system
load average are those in the COM queues AND THE PRIORITY MUST BE AT OR
ABOVE THE SYSGEN PARAMETER DEFPRI.

For example, a system with 20 batch jobs running at priority 3 (with
DEFPRI set to 4) may have the same LAT rating as a system that is totally
idle.

Moreover, when a process is CUR on a CPU, it is not any of the COMputable
states that LAT uses for its rating.  Therefore, a CUR process has ABSOLUTELY
NO BEARING ON THE LAT RATING ALGORITHM LOAD AVERAGE.  When several processes
begin contending for the CPU, some of them will be in the COMputable state
and that is when the system load average begins to fluctuate.  Having only
1 compute-bound process using 100% of the CPU is not causing CPU contention
and therefore, this may not affect the LAT rating.
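
Which processes count can be shown with a small Python sketch.  It computes
only one instantaneous sample; how LTDRIVER smooths the samples into the
moving average is not described here, so that step is left out.  The process
lists, the tuple layout and the function name are invented for illustration,
and the priority test follows the "DEFPRI or higher" wording of the formula
description.

```python
# Scheduling states that count toward the load average, per the text above.
COUNTED_STATES = {"COM", "COMO", "MWAIT", "COLPG", "FPG", "PFW"}


def load_sample(processes, defpri=4):
    """One sample in GHB$L_LOADAVG units: 100 per counted process.

    processes is an iterable of (state, priority) pairs.  CUR is not a
    counted state, and anything below DEFPRI is ignored.
    """
    counted = sum(
        1 for state, priority in processes
        if state in COUNTED_STATES and priority >= defpri
    )
    return 100 * counted


# 20 batch jobs at priority 3 on a single CPU: one CUR, nineteen COM.
batch_node = [("CUR", 3)] + [("COM", 3)] * 19
# Two compute-bound jobs at priority 4 on a single CPU: one CUR, one COM.
busy_node = [("CUR", 4), ("COM", 4)]

print(load_sample(batch_node))
print(load_sample(busy_node))
```

The batch node contributes nothing to the load average even though its CPU
is saturated, while the second node reports one computable job.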

Keeping in mind that CUR processes are not counted as part of the LAT
rating algorithm, let's take a look at how another popular misconception
can lead to confusion.

Let's assume there is a 2 node cluster.  The first node (NODE1) is a VAX
6000 model 600 with 6 processors.  The second node (NODE2) is a VAX 6000
model 400 with 1 processor.  Both systems have 128MB of memory and for
now, assume that the memory penalty is not a factor in the rating
calculation.

Next, assume both systems are configured identically - SYSGEN parameters,
common UAF file, print/batch queues and interactive job limits.  For
the purposes of simplicity, let's assume both systems have an interactive
job limit of 100.  That means that the maximum SYSTEM LOAD either system
can have is 100 (remember MIN(235,100)).  Also, the maximum AVAILABILITY
is 20 so the absolute highest rating either system can have in this
configuration is 120.

Now, let's use some numbers in a simple load calculation (from this point
on, assume all processes used in SYSTEM LOAD calculations are running at
DEFPRI or higher).

On NODE1 - 75 users are logged in.
On NODE2 - 10 users are logged in.

On NODE1, 5 processes are compute intensive and on NODE2 only 2 processes
are compute intensive.  What will the ratings look like?

First, let's look at NODE1.  Five processes are compute intensive.  Because
NODE1 has 6 CPUs in it, chances are, the SYSTEM LOAD will remain at its
highest value (100).  But wait, that doesn't make sense you say!  However,
if you think carefully, it does.  Remember, processes in the CUR state are not
counted in the SYSTEM LOAD.  Since there are 5 compute intensive processes,
chances are, each one is CUR on a processor in the VAX 6660 (and there is
even 1 processor to spare).  So, the rating might look like this:

	RATING = 20 * (25 / 100) + 100
	RATING =      5          + 100

Even though 5 processes are running all out, it is possible that they are
not affecting the system load average at all.  The above shows a RATING of
105.

Now let's look at NODE2.  There are 2 processes that are compute intensive
but the system only has 1 processor.  So, chances are, the load average for
this system is going to be 100 (because 1 process will be CUR while the other
process is COMputable - waiting in a compute queue).  In this node's case,
the rating has a little more math:

	RATING = 20 * (90 / 100) + ((100 * 100) / (100 + 100))
	RATING = 18              +      50

As you can see, the LAT rating for NODE2 is only 68 yet it only has 2
processes running and only has 10 users logged in.  This may seem to
defy logic but it shows how COMputable processes can affect
the LAT rating algorithm.  The other system has plenty of compute
horsepower (even though 75% of its user capacity has been reached).

MORAL of this story:

	BE CAREFUL WITH SMP SYSTEMS!!!!  The more CPUs in a system, the
	higher the chance that the system has a lighter load average.
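
The effect of the CPU count can be sketched in Python with a deliberately
idealized model: each compute intensive process is assumed to occupy one CPU
if one is free, and only the overflow is computable.  Real scheduling is not
this tidy, and the function name is hypothetical.

```python
def idealized_loadavg(compute_bound, cpus):
    """Load average in GHB$L_LOADAVG units when only the overflow waits."""
    waiting = max(0, compute_bound - cpus)
    return 100 * waiting


print(idealized_loadavg(compute_bound=5, cpus=6))  # NODE1
print(idealized_loadavg(compute_bound=2, cpus=1))  # NODE2
```

NODE1's five busy processes leave the load average untouched, while NODE2's
two push it to 100, which halves its SYSTEM LOAD term.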

This story also demonstrates another common problem.  A VAX 6000-600 with
6 CPUs is a pretty powerful machine.  Yet, setting its interactive login
limit to 100 (the same as the VAX 6000-400 with 1 processor)
effectively makes the machines "equal".  Make sure IJOBLIM is set
appropriately for each system in a VMScluster.

The story also demonstrates how COMputable processes (i.e. processes waiting
to use CPU resources) will drop the LAT rating quickly.  It has no relationship
to system load measured by other tools.

Let's take another example where the same 2 node cluster exists, but this
time, both systems are completely identical.  System number 1 (NODE1) is
a VAX 7000 model 600 - 1 processor.  System number 2 (NODE2) is also a
VAX 7000 model 600 with 1 processor.  For simplicity, assume the following:

NODE1 - 90 users logged in - 1 process is 50% computable
NODE2 - 20 users logged in - 1 process is compute intensive

On NODE1, the load average will be about 50 (because the 1 process is 50%
computable - if it were always computable, the load average would be 100).
The LAT rating looks something like this:

	RATING = 20 * (10 / 100) + ((100 * 100) / (100 + 50))
	RATING =     2           +       66

The rating for NODE1 is 68.

On NODE2, the load average will be about 100 because 1 process is
computable all the time.  LAT rating:

	RATING = 20 * (80 / 100) + ((100 * 100) / (100 + 100))
	RATING =      16         +        50

The LAT rating for NODE2 is 66!  Even though there are 70 fewer users logged
into NODE2, the LAT rating is slightly lower because the system load is
higher.

The system load is the dominating factor for the LAT rating.

[MYTH]
The number of users logged into a system should control the LAT rating.

[FACT]
As you can see from the above examples, the dominating portion of the
LAT rating is the load average - not the number of users logged in.

The memory penalty is simply a deduction to the LAT rating when a system
has less physical memory available for processes to use.
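
The penalty term from the formula description can be sketched in Python as
follows.  FREEGOAL and FREECNT are passed in as plain numbers; the function
name, the integer division, and the sample values are assumptions.

```python
def memory_penalty(freegoal, freecnt):
    """Points to subtract from the rating; 0 when free memory is plentiful."""
    threshold = freegoal + 2048
    penalty = 40 * (threshold - freecnt) // threshold
    # The term is applied only if it is positive.
    return max(penalty, 0)


print(memory_penalty(freegoal=1000, freecnt=0))      # no free pages at all
print(memory_penalty(freegoal=1000, freecnt=1524))   # half way to the threshold
print(memory_penalty(freegoal=1000, freecnt=5000))   # plenty of free memory
```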

Note, if the LAT rating algorithm is not suitable to a site, then it
is possible to write a program (based on CSC's DYNRAT program) to calculate
the LAT rating and set a static rating based on that calculation.  The
LAT rating is not perfect but works well in most instances.