Path: news.mitre.org!blanket.mitre.org!philabs!newsjunkie.ans.net!newsfeeds.ans.net!news-was.dfn.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.idt.net!news.voicenet.com!news.new-york.net!news.columbia.edu!news.cs.columbia.edu!versed.smarts.com!usenet
From: Jerry Leichter
Newsgroups: comp.arch
Subject: Re: IA64 Self Virtualizable?
Date: Thu, 20 Nov 1997 15:07:19 -0500
Organization: System Management ARTS
Lines: 98
Message-ID: <34749877.4FED@smarts.com>
References: <64q6l9$q0v@crl.crl.com> <64u9q0$jlu$1@xs155.wins.uva.nl>
NNTP-Posting-Host: just.smarts.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.01Gold (X11; I; SunOS 5.5 sun4d)
To: Marcel Beemster

| >Does anyone know whether the IA64 instruction set will be
| >self-virtualizable?
|
| Ahum. Would anyone be so good as to DEFINE self-virtualizability of an
| instruction set architecture?
|
| Is it a property of the hardware to be its virtual self?

Mainly. There's a whole PhD dissertation (at Harvard, in the early
'70s, I believe) on this subject. The issues, and even the
definitions, can get subtle.

In one sense, self-virtualization is trivial: Write, in the ISP of the
processor, a simulator for the full ISP. Now run the "guest" OSes under
the simulator. Complete virtualization, complete safety. Horrible
performance - it's tough to write a simulator that averages fewer than
5 real instructions per simulated instruction (i.e., a factor of *at
least* 5 slowdown), and simulating things like I/O gets even more
expensive.

To be usable, a self-virtualizing system has to let the hardware handle
the vast bulk of the instructions executed - say, >95% - with no extra
overhead. One line is often drawn between user and privileged modes:
Let all user-mode code run on the raw hardware, but simulate all
privileged code (by trapping all instructions that switch to privileged
mode and simulating until a return to user mode). That's better, but
still expensive - and may not even be workable. (User-mode code may
have direct read access to memory owned by privileged code. If there
are constraints on where this memory may appear in the address range,
you may find it difficult to find a place to "hide" the hypervisor. In
practice, any good virtual-memory system lets you fake this, though it
may call for a lot of map swapping.)

Simulating all the privileged code *still* has way too much overhead to
be really usable. *Most* privileged code is doing non-privileged
things; it's just that it has access to areas of memory that are
normally out of bounds to user-mode code. The next step, then, is to
run the privileged code *but in user mode*. When the hypervisor sees
the guest code try to enter the privileged state, it swaps the memory
mappings around so that the guest's "kernel areas" are now accessible.
Then it lets the guest continue. When the guest tries to execute an
instruction that is *really* privileged - access an I/O device, change
the virtual memory mappings - the hypervisor gains control and
simulates the effects.

An ISP gotcha that can kill you here: If the instruction that switches
from privileged mode back to user mode isn't privileged, the hypervisor
will have no way of knowing when the guest OS is returning control to
user code. That would allow user-mode code within the guest to act as
if it were part of the guest OS. Bad news.

A more subtle problem is a non-privileged instruction that lets a
user-mode program determine what mode the machine is in. The OS will
expect to be told "kernel mode". (This has real uses, as when an OS
routine is designed to be callable from both user and kernel mode, but
with additional checking in user mode.) The x86 has at least one
instruction of this form, making it non-self-virtualizable. (The "fix"
is to make that instruction privileged. Then the hypervisor traps it,
checks to see what *simulated* mode the guest is in, and returns the
appropriate value.)

I/O is a big headache, especially on machines that use memory-mapped
device registers: The hypervisor has to make those pages inaccessible,
trap the accesses, and then figure out whether they are permitted and
what they should really mean. The old 360 channel program design, as
John Mashey pointed out in an earlier posting, does well here because
the hypervisor gets essentially full information about the actual I/O
to be done in a single trap, rather than having to get involved once
per character (in the worst case, for simple serial devices).

BTW, the x86 has the *potential* for a good solution here, since it's
possible to control the accessibility of I/O devices (at least those
that use I/O ports, rather than memory mapping) on a fine-grained
basis. I forget the details, but you can set things up so that the
innermost ring is the hypervisor, and the next outer one has access to
exactly the I/O devices it is allowed to manipulate. (Two rough
sketches of these ideas follow.)
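To make the trap-and-emulate scheme described above concrete, here is a
minimal sketch in C of the dispatch a hypervisor might perform on each
trap. Everything in it is hypothetical - the types, the opcode names,
and the helper stubs are invented for illustration, and a real
hypervisor would decode raw instruction bytes rather than receive a
pre-decoded structure - but the logic is the one described above: keep
a *simulated* mode per guest, swap mappings on mode switches, emulate
what's really privileged, and answer mode queries with the simulated
mode.

/* Minimal trap-and-emulate dispatch sketch. Not any real hypervisor's
   code; all names here are invented for illustration. */

#include <stdint.h>

typedef enum { GUEST_USER, GUEST_KERNEL } guest_mode_t;

typedef struct {
    guest_mode_t mode;       /* mode the guest *believes* it is in */
    uint64_t     regs[32];   /* guest register file */
    uint64_t     pc;         /* guest program counter */
} guest_t;

/* A pre-decoded trapped instruction; real code would decode bytes. */
typedef struct {
    enum { OP_READ_MODE, OP_ENTER_KERNEL, OP_RETURN_TO_USER,
           OP_IO_ACCESS, OP_LOAD_MAP } op;
    int      dest;           /* destination register, if any */
    uint64_t arg;            /* port number, map address, ... */
} insn_t;

/* Stubs for the machinery the sketch leans on. */
static void swap_in_kernel_areas(guest_t *g)           { (void)g; }
static void swap_out_kernel_areas(guest_t *g)          { (void)g; }
static void emulate_io(guest_t *g, uint64_t port)      { (void)g; (void)port; }
static void emulate_map_load(guest_t *g, uint64_t map) { (void)g; (void)map; }
static void fault_guest(guest_t *g)                    { (void)g; }

/* Called whenever guest code - always running in real user mode -
   traps on a privileged (or privileged-made) instruction. */
void on_trap(guest_t *g, const insn_t *in)
{
    switch (in->op) {
    case OP_READ_MODE:
        /* The subtle case from the post: answer with the guest's
           simulated mode, never the real (user) mode. */
        g->regs[in->dest] = (g->mode == GUEST_KERNEL);
        break;
    case OP_ENTER_KERNEL:             /* guest system call / trap entry */
        g->mode = GUEST_KERNEL;
        swap_in_kernel_areas(g);      /* make "kernel areas" accessible */
        break;
    case OP_RETURN_TO_USER:           /* must trap, or we lose track    */
        g->mode = GUEST_USER;
        swap_out_kernel_areas(g);     /* hide kernel areas again        */
        break;
    case OP_IO_ACCESS:                /* really privileged: emulate it  */
        if (g->mode == GUEST_KERNEL)
            emulate_io(g, in->arg);
        else
            fault_guest(g);           /* reflect the fault to the guest */
        break;
    case OP_LOAD_MAP:                 /* guest changing its page tables */
        if (g->mode == GUEST_KERNEL)
            emulate_map_load(g, in->arg);
        else
            fault_guest(g);
        break;
    }
    g->pc += 4;  /* skip the emulated instruction (fixed-size ISA assumed) */
}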
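The fine-grained x86 I/O control alluded to above is the I/O-permission
bitmap the processor keeps in the TSS: one bit per port, consulted on
IN/OUT when the current privilege level isn't privileged enough, with a
set bit forcing a trap. The sketch below shows only the bitmap
bookkeeping a hypervisor might do - the guest_io_t wrapper and function
names are invented, but the convention (65536 ports, one bit each, set
bit = trap) is the x86 one. Ports left denied trap to the hypervisor
and get emulated; ports the guest owns outright run at full hardware
speed.

/* Sketch of per-guest I/O port permissions, patterned on the x86
   TSS I/O-permission bitmap. The wrapper type is invented. */

#include <stdint.h>
#include <string.h>

#define IO_PORTS     65536
#define BITMAP_BYTES (IO_PORTS / 8)

typedef struct {
    uint8_t bitmap[BITMAP_BYTES];
} guest_io_t;

/* Default: every port access traps into the hypervisor for emulation. */
void io_deny_all(guest_io_t *g)
{
    memset(g->bitmap, 0xFF, BITMAP_BYTES);
}

/* Punch a hole for a device the guest is allowed to drive directly. */
void io_allow_range(guest_io_t *g, unsigned first, unsigned count)
{
    for (unsigned p = first; p < first + count; p++)
        g->bitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
}

A guest that owns, say, the first serial port would get io_deny_all()
followed by io_allow_range(g, 0x3F8, 8); everything else it touches
traps and is emulated.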
> Is it a property of the OS to be virtually someone else?

Generally, true self-virtualization is taken to mean that *any* OS can
be supported transparently. If you're willing to modify the OS - to
ensure that it doesn't use that x86 instruction that looks at the
"wrong" processor state, for example - the problem is *much* easier.
Of course, then you can't run the "stock" OS that you would normally
use on the raw hardware within a virtual machine for debugging or what
have you.

Practical self-virtualizing systems (of which VM/370 is probably the
only real example, beyond - I'm sure - some experiments here and there)
tended to *support* any OS, but to *favor* OSes that were aware of the
hypervisor and cooperated with it. (Think about what happens when the
guest and hypervisor both try to do page management independently ...
much, much better for the guest to tell the hypervisor what it's up
to.) The difference in performance can be substantial - that last 1%
(or whatever) needed for full self-virtualization can really cost you.

> Is it a property of the OS that requires hardware support?
> If so, what support?

-- Jerry