What makes a virtual environment immersive?
by Erica Driver and Sam Driver.
What makes a virtual world or campus, immersive learning environment, or 3D business application immersive? Immersiveness isn’t all or nothing. It’s not determined by whether the software used is a Web browser or thick client. Instead, it’s a continuum that is determined by 1) the degree to which the user’s senses are engaged, and 2) the desirability and meaningfulness of the activity in which the user is participating. Below is a description of factors that make virtual environments or experiences more or less immersive: visual, tactile, auditory, and collaboration and interactivity (see Fig. 1). A virtual environment doesn’t need to score high in all of these areas to be immersive, but the more “highs” it gets, the more immersive it is (see Fig. 2).
Fig. 2: The Immersiveness Continuum
| Factor | Low Immersiveness | High Immersiveness |
| --- | --- | --- |
| Visual | ||
| Rich graphics | The environment looks cartoony or avatars look strange or move in a disconcerting way. | Realistic-looking lighting, shapes, textures, avatars, plants, etc. At the high end, graphics are photorealistic. Or, for abstract experiences (e.g., chemistry and mathematics) the visuals contain a high level of detailed information. |
| Avatars | Users do not have graphical representations of themselves in the environment. | Users have configurable or customizable avatars with which they identify. |
| 3D environment | Much or all of the environment comprises 2D images. | The environment uses three-dimensional representations of geometric data. Avatars and objects take up and can move in 3D space. |
| Ability to control viewpoint | The user’s viewpoint into the environment is static or limited to a few pre-selected perspectives. | The user has full control over their visual focus in the environment. They can zoom and pan in all directions. |
| Physics | No physics engine, or a very basic one. | A sophisticated physics engine that simulates properties like mass, velocity, gravity, friction, and wind resistance. The environment includes weather and collision detection. |
| Size of display | The display fills only part of the user’s computer screen | The display fills the user’s entire computer screen |
| Tactile | ||
| Haptics | No support for haptic devices | The user experiences the environment through the sense of touch, via a controller or input device. Through a handheld device, glove, etc. the user feels vibrations, forces, pressure, or motion. An example of this is the Wii controller. |
| Auditory | ||
| Voice | No built-in voice over IP. Or if the system has VoIP, it is not spatialized; instead, it sounds similar to a phone call. | Spatialized, 3D audio. When an avatar is standing to your avatar’s left, you hear that person’s voice in your left speaker. Voices of those whose avatars are closer to yours are louder than those who are farther away. At the high end, voice colorization allows users to modify the way others’ voices sound to make it easier to differentiate among speakers. |
| Non-voice sounds | Sound is mono. | Sounds are stereo and spatialized. |
| Collaboration and Interactivity | ||
| Integrated collaboration, communication, and productivity tools | The environment lacks functionality like built-in voice, screen sharing, collaborative document editing, etc., requiring people to leave the environment (e.g., using the ALT-TAB key combination on a PC to switch applications) to get their work done. | Within the environment participants can communicate with each other via public or private voice chat, local or group or private text chat, messaging, document and object sharing, screen sharing, etc. The applications and information the user needs to complete a task (e.g., have a meeting, deliver a presentation, collaborate on a model) are accessible from and can be displayed within the virtual environment (e.g., via screen sharing or real-time document editing). |
| Gesture and emotion | Avatars do not lip sync. Ability to express emotion visually is limited. Gestures are basic. | Avatars lip sync while users are talking. Users can express emotion visually through their avatars. Today this usually is done by clicking on a menu of icons but in the future it will become more natural through the use of cameras, which will project the user’s movements and expressions onto an avatar. |
| Interactivity | Objects in the environment are static. | Using the mouse or other input device, the user can click on an object to display an item or change the way an item behaves. The user can flip switches to rev up a turbine, sit in the driver’s seat and operate a vehicle, etc. |
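
The idea that an environment becomes more immersive with each “high” it earns can be sketched as a simple tally. This is only an illustration, not a ThinkBalm scoring method: the factor names come from the table above, while the function name `immersiveness_share`, the true/false rating, and the equal weighting of factors are all assumptions.

```python
# Factor names are taken from the table rows. Rating each factor as simply
# high (True) or low (False) and weighting them equally are assumptions
# made for illustration only.
FACTORS = [
    "Rich graphics",
    "Avatars",
    "3D environment",
    "Ability to control viewpoint",
    "Physics",
    "Size of display",
    "Haptics",
    "Voice",
    "Non-voice sounds",
    "Integrated collaboration, communication, and productivity tools",
    "Gesture and emotion",
    "Interactivity",
]


def immersiveness_share(ratings):
    """Return the fraction of factors rated high.

    `ratings` maps a factor name to True (high) or False (low).
    Factors that are not mentioned are treated as low.
    """
    unknown = set(ratings) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    highs = sum(1 for factor in FACTORS if ratings.get(factor, False))
    return highs / len(FACTORS)


# A hypothetical environment that scores high on three factors.
example = {"Rich graphics": True, "Avatars": True, "Voice": True}
print(immersiveness_share(example))  # 3 of 12 factors -> 0.25
```

A share of 1.0 is not required for an environment to be immersive; the number only expresses where it sits on the continuum under these assumptions.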
© 2009 ThinkBalm. All rights reserved.


This is a SUPER start on defining “immersive,” which is absolutely required if we are to establish the Immersive Internet (and Immersive Intelligence and Immersive Learning and…) as valid discipline(s). This is a good springboard, since the community can contribute by expanding each of these technical factors (dimensions) for “immersive.”
1) I like the low-high scale. Perhaps we could expand this to a 5-point scale of immersiveness, where each level has a specific set of requirements.
2) We also need a set of factors on the ‘soft’ side, like the impacts (feelings, thoughts) on a person. What is the term for intimately identifying with your avatar as being yourself? And what is the term for feeling as if you were truly in the virtual environment rather than sitting at your desk in a boring office?
I would add appropriate background noise (wind, traffic, crowd noises, etc.) under non-voice sounds.
Also, we aren’t far off from olfactory cue technology being available to the home user, and this could be huge.
Great Grid!
In the ThinkBalm LinkedIn community, there is an insightful comment stream on “Virtual World in a Browser,” which is debating whether a 2D browser can support “immersive.” It needs to be folded into this stream.
See http://www.linkedin.com/e/vaq/6786617/2005729/6302983/view_disc/
Richard: Thank you for the great mindmap to add to this blog post and for your suggestions re: five-point scale and soft factors.
Dick: Good point about the background sounds. I agree, that adds to immersion.
Here is an excerpt from a slide I use when I speak at conferences on this subject. The point of it is that “reality is where your mind is.” Technology that captures the user’s awareness (via a combination of multi-sensory variables) has succeeded in immersing the user in the target environment. I really think the definition is as simple as that, BUT when we consider the effectiveness of legacy tools in capturing user awareness, relative to “virtual reality” tools, we see that VR really takes 2D environments out of the running. Here is the excerpt:
• According to Winifred Gallagher, author of the recently published “Rapt”, we constantly make decisions determining what we are going to pay attention to. Any events, experiences, and activities not within the scope of our chosen interest will not exist to us as we immerse ourselves in our selected targets of concentration and focus. What we choose to concentrate on defines our state of consciousness and becomes our Reality.
• The fact that our mind and body may be “in two different places at the same time” is not a unique characteristic of Virtual Reality, and does not therefore create the need to refer to this immersive technology as “Virtual Reality.”
  – If we are deep within our own thoughts, and no longer mindful of our physical surroundings, are we in “virtual reality”?
  – If we are reading a news article on a website, and are so engrossed that we forget that we are late for a meeting, do we say that “virtual reality” interfered with “physical reality”?
• The power of “Virtual Reality” software is in its ability to emancipate the mind from the body, making physical “reality” a minor element of our Federated Reality.
  – Federated Reality has two parts:
    • Consciousness
    • Physical Existence
  – For our purposes, we will always need to address the requirements of Physical Existence and Consciousness. However, it is our objective to relegate Physical Existence to nothing more than a set of requirements for accessing a medium that will manipulate the Consciousness away from Physical Existence.
    • In other words, people have bodies: eyes, hands, etc. We must build software that provides convenient physical access. But after that, the vast majority of our attention will be turned to capturing the full attention of the user.
  – We use “federated” to mean that the two parts of Reality have a partnership, but are not so close as to be synonymous. They are individually self-sustaining.
• At Étape Partners we believe that there is only one relevant reality, and it is defined as:
  – A context of stimulation that acutely focuses awareness and defines our state of consciousness.
I agree – this is very good, and deserves ongoing attention from the community (and Erica/Sam of course – grin).
While I see the need (and support the effort) to arrange these things in a continuum, and don’t disagree with the characterization of dipoles as such (e.g., ‘cartoony’ vs. ‘photoreal’), I think it’s important to clarify that what we’re doing here is specifying which platform characteristics tend collectively to induce a complex psychological state in users, and to note that the subjective experience of immersion (and, I would argue, the capacity for immersion, which is highly variable) is independent of any particular platform characteristic. So any of these points is debatable: I’ve achieved immersion with _no_ graphics (in various forms of IM and online chat, and in text-based MMOs), with primitive graphics (Everquest Online), with photoreal graphics (SL), and with surreal graphics (World of Warcraft). So it’s clearly not about the graphics, per se. And yet, on the other hand, it _is_, because we’ve all noted the improvement in our own subjective experience of immersion, and in others’, as graphics density improves.
I agree with Richard – I think we may need to look more closely at immersion. For starters, we might break it down this way:
Scenario – Immersion happens within a scenescape — an ‘online place’ that evokes notions of location, topology, geography. At the very simplest level, a named IM chat-room (no graphics) can be considered a ‘place where people meet.’ And this is, I think, the core of one continuum of immersion: the transition-point between thinking of an online software instance as a tool, and thinking of it as a ‘place.’
I suspect this is the only level at which platform characteristics absolutely determine the capacity of an application to ‘be immersive’ in the sense that we mean, and they do so at a very, very low and fundamental level — e.g., simply by maintaining instantiation and showing traces of human activity — well below considerations of graphics or communications modes and affordances.
Reference frame – Once placefulness is established, the next question is whether the environment enables users to apply known operational reference frames to assist navigation, use-reasoning, and etiquette/protocol, and to constrain task-specific language. That is, given that an IM chatroom can be construed as a ‘place,’ what does it take to then let us construe it also as a ‘classroom’ (to quote a reference frame with which we’re all familiar), and thus infer how to interact in it?
Continuity of self-representation – This is where avatars (of various sorts) come in — be they photoreal or as simple as user-selected ‘screen names,’ it seems as though we all agree that unless the self is represented in some continuous way in an online environment, the environment isn’t immersive.
Communications ‘flow’ – I’m undecided as to whether realtime communications is absolutely necessary for immersion, but certainly, even in asynchronous environments, the sense that one is engaging in conversation seems critical to engender a sense of immersion.
Social memory and validation of experience – For humans, places are about ‘who uses them’ and ‘what happens there.’ I’d argue that we’re more moved to acknowledge the immersive character of a platform/place if our memories of ‘what happened there’ are validated as placeful memories by other people. So there’s a sense in which the whole hierarchy of immersion — at any level above very basic platform conditions expressed in software — is modulated by social convention.
Not to complect or distort the dialogue, but it’s a little like ‘eroticism,’ isn’t it? There’s the mechanics, and then there’s the infinitely-complex and lyrical social web we build on it — along with the sense that immersion, a la pornography, is ‘hard to define but I know it when I see it.’ (grin)
Brian: What a great quote: “Reality is where your mind is.” This makes me think about other succinct ways to describe immersion. In a nutshell, you’re immersed when your attention is captured and held, right?
John: Like you, I see great value in describing an immersive environment not as a thing but as a place. And I like where you’re going with describing the place more and more specifically. You’ll see in the report we published today titled “How to Write a Business Case for Immersive Technology Investments” (http://www.thinkbalm.com/2009/09/08/how-to-write-a-business-case-for-immersive-tech-investments/) that we recommend that advocates and implementers describe the solutions they are proposing as a collaborative workspace, a convention center, rehearsal studio, or facility operation center.
Today I saw a couple of tweets on Twitter offering formulas for immersion, in the context of virtual environments.
* From Twitter user @SurrealNumbers (http://twitter.com/SurrealNumbers/), whose Twitter bio says, “Mathematical scientist engaged in research, design, and development for augmented tourism, virtual education, and immersive information.” He or she describes the formula as “Immersion = (user control) + (breadth and depth of detail) + (degrees of freedom) + (self expression) – (sterility) Q.E.D.”
* From Twitter user @sonicviz (http://twitter.com/sonicviz/), who is Paul Cohen of SonicViz. Paul’s formula is “Connectivity + Context = ‘Immersive Killer App’”
I think the formulas exist already; there is no need to reinvent them. We will use Contextual Innovation to derive our Immersion formulas. I present two examples: Casinos and Hypnosis.
Casinos are all about immersion; further, they are also very much interested in “Sticky Immersion.” So let’s show a couple of casino formulas:
Casino Immersion = Multi-sensory environment + elimination of “outside world” reminders + attentive needs fulfillment + “make me feel important”
Casino Sticky Immersion = Casino Immersion + Effective ability to meet expectations + Surpassment of expectations in an impactful way + delivery of services better than the next-best provider
Hypnosis is a blunt form of demonstrating Federated Reality (your awareness/mind are detached, but still in partnership with your body). Hypnosis may have its own goals, but the first objective is to shift patient awareness away from physical stimulation.
Hypnotic Immersion = patient willingness + effective tool (i.e., therapist)
VR must deliver the best of a Casino, and the willingness of a Hypnosis patient.
VR Immersion = Union of (Casino Immersion + Hypnotic Immersion)
VR Sticky Immersion = Union of (Casino Sticky Immersion + Hypnotic Immersion)
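
One way to read the formulas above is as sets of ingredients, with “+” and “Union of” both meaning set union. The sketch below is only an illustration of that reading: the ingredient strings are paraphrased from the formulas, and the helper `missing_ingredients` and the `pilot_world` example are hypothetical.

```python
# Each formula is modeled as a set of ingredients; "+" and "Union of" are
# both read as set union. This reading is an assumption, not something the
# formulas themselves spell out.
CASINO = frozenset({
    "multi-sensory environment",
    "elimination of outside-world reminders",
    "attentive needs fulfillment",
    "make me feel important",
})
CASINO_STICKY = CASINO | {
    "effective ability to meet expectations",
    "surpassment of expectations in an impactful way",
    "delivery of services better than the next-best provider",
}
HYPNOTIC = frozenset({"patient willingness", "effective tool"})

VR = CASINO | HYPNOTIC
VR_STICKY = CASINO_STICKY | HYPNOTIC


def missing_ingredients(offered, target):
    """Return the ingredients in `target` that `offered` does not cover."""
    return set(target) - set(offered)


# A hypothetical environment that delivers only two of the ingredients.
pilot_world = {"multi-sensory environment", "patient willingness"}
print(len(VR), len(VR_STICKY))                  # 6 9
print(len(missing_ingredients(pilot_world, VR)))  # 4
```

Under this reading, VR Sticky Immersion is a strict superset of VR Immersion, so anything that falls short of the plain formula also falls short of the sticky one.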