Understanding Interactivity, Dag Svanæs


Understanding Interactivity
Steps to a Phenomenology of
Human-Computer Interaction
Dag Svanæs

Preface and Acknowledgments
”What is interaction?” This simple question has been intriguing me for more than a decade.
The present work can be seen as my attempt at answering it. The question emerged from a
growing dissatisfaction with not really understanding the subject matter of my professional
life as software designer, tool developer, lecturer, and researcher. I felt that there must be
more to interaction than what is revealed by straightforward definitions like "the set of
processes, dialogues, and actions through which a human user employs and interacts with a
computer" (Baecker & Buxton, 1987, p. 40). The question became an obsession to me, and
led me on a long quest for the ”essence” of the modern interactive computer.

The present work sums up research done over a period of six years. During these years,
both my understanding of the subject matter and my research strategy changed. When the time
came to present my findings, I was left with the difficult question of genre: to what extent
should the actual process be made explicit? On the one hand, by being journalistically true
to all the details of the research process, I would probably bore the reader with matters of little
relevance to the subject matter. On the other hand, by keeping strictly to a scientific structure with its insistence
on starting out with a well-defined hypothesis, the inductive nature of the research would be
hidden and much of the richness of the material would be lost. As a compromise between the
two extremes, I have kept to a relatively standard structure, while at critical points making
explicit the inductive nature of the process.

Recent texts on research methodology use the term reflexivity for the practice of making
explicit the researcher’s unavoidable bias (Smith, 1996; King, 1996). This rests on the
assumption that research will always be colored by the bias of the researcher. It is
consequently better to make this bias explicit and allow for the reader to make judgements,
than to assume that one is able to take an objective stance, the authors argue.

As our professional bias to a large degree is shaped by the work we have done, the people
we have met, and the books we have read, I have found it necessary to include a short sketch
of my professional background. It will hopefully serve as a clarifying background for the
reader, and provide a necessary framing of the study. Presenting my history in this way might
seem irrelevant or even somewhat exhibitionistic. My intention is to give a necessary feel of
the technological, intellectual, and cultural setting in which my current research questions
have emerged.

My first encounter with the problems related to graphical user-interface design dates back
to 1979. I was then a student at the Norwegian Institute of Technology. I was hired as a
summer intern at their computing center to design and implement an interactive graphical
editor to be used for creating nice and colorful transparencies. The editor was implemented in
FORTRAN on a NORD-10 mini-computer with a Tektronix storage-display terminal as
output and a tablet as input.

Four years later I had the opportunity to play around with the predecessor of the
Macintosh, the Apple Lisa computer. This was for me a revelation and the beginning of a
long-lasting love affair. The bit-mapped screen and the mouse opened up possibilities for
applications that I could only dream of with the Tektronix.

In 1984, the Norwegian Ministry of Education initiated a task force on the introduction of
computers in the Norwegian school system. Ordinary teachers were given summer courses in
software design to enable them to take active part in the future development of educational
software. The teachers worked in groups and came up with suggestions for design on paper.
For the second summer course in 1985, I was invited together with four other programmers to
implement prototype versions of the teachers' designs on the PC-like machines used by the
schools at that time. The machines had no windowing system, and the graphics
and interaction had to be programmed from scratch. After one week of typing in co-ordinates
and writing simple PASCAL programs, I decided to create a tool to enable the teachers to
build their user interfaces themselves. The first version of the tool was implemented in
PROLOG that summer during a couple of nights of hard work. It had a text-based user
interface, and could consequently not be used directly by the teachers, but it greatly increased
the productivity of the programmers. The head of the task force saw the usefulness of such a
tool, and I was hired to lead the design and implementation of a production version. The
resulting interface builder was ready in the spring of 1986, and was named MOSAIKK. It was used
by approx. 1000 teachers and software developers in courses and software development teams
all over Scandinavia over the following five years (Svanæs and Thomassen, 1991).

I was frequently invited as lecturer and adviser in these summer courses. This gave me a
unique opportunity to observe how the interface builder was being used, both by programmers
and by computer-naive users. The MOSAIKK editor included a very simple event-based
scripting language for expressing user-interface behavior. All tool users learned this
formalism relatively fast, but I started wondering whether some other way of expressing
behavior could have been more intuitive for the non-programmers. I had designed the
scripting system to be very easy to use, but the users still had to learn programming in some
sense. Pondering that question led me for the first time to the conclusion that I did not
really understand very much about the nature of the medium I was working with.
In the mid-80s, I studied social anthropology for three semesters. This included a short
field study of encounter groups in the new-age movement, approached as a cultural
phenomenon. This led me eventually into the organizing committee of the 1985 Scandinavian
summer conference on humanistic psychology. Humanistic psychology at that time included
such things as psychodrama, Jungian psychology, gestalt therapy, music therapy, Tai Chi
Chuan, and Marxist network therapy. At that time, all this was new in Scandinavia. Today it
is part of the big global supermarket of self-realization techniques.
In fall 1986, I started teaching at the Department of Computer and Information Science in
Trondheim. This brought me in close contact with the academic world, including the HCI
community. Unfortunately, I found very little in the early HCI literature that satisfied my
curiosity concerning the "essence" of the modern computer. Help came from another
direction.

In 1986, Winograd and Flores published their now classical "Understanding Computers
and Cognition". This book was for me a turning point in my intellectual development. It
opened up a totally new world for me, and became the Rosetta stone I needed to be able to
apply my humanistic interests in a fruitful way to my work with computers. They convinced
me that I should dig deeper into 20th century western philosophy, especially phenomenology
and hermeneutics. I consequently started reading more philosophy, focusing first on
Heidegger and later on Merleau-Ponty. Here I found at least some answers to my
philosophical questions, but more important, under the influence of current continental
philosophy I hopefully freed myself of some of the "positivistic"/"Platonic" bias from my
computer-science training.

Returning to the question on interactivity, reading Heidegger and Merleau-Ponty brought
me back to focusing on the world of the end user; but now knowing why this is necessary, and
with some guidance on how it should be done. The question "What is interaction?" had for me
slowly been transformed into the question "How is interaction experienced?". The first
question is posed, and can only be answered, within "the world of ideas", while the latter asks
for empirically-based answers.

It is also worth mentioning that as a Scandinavian computer scientist I am to a large
extent influenced by the so called "Scandinavian School of Systems Development", or the
Participatory Design tradition, as it has been termed in the US. I feel both politically and
philosophically very close to this tradition, and I have had great pleasure in knowing some of
its proponents.

In the early 90s I was involved as a systems designer in a large EU project on software
for the disabled, the COMSPEC project (Svanæs, 1993b). One of the results from the project
is a software tool for building and configuring computer-based communication devices for
physically disabled users. One of the user groups for the tool was computer-naive therapists
and teachers. Here I met again the problem of finding intuitive ways to let non-programmers
express interactive behavior. We ended up with a combination of a hi-fi metaphor for
expressing data flow, and a simple scripting language for detailed control. The usability tests
showed that it worked pretty well, but personally I was only partly satisfied. There had to be
some other way of doing it. It is my hope that the current work will open the design space for
similar projects in the future, i.e. making more design alternatives available.

At the 1993 CHI conference in Amsterdam, I presented a short paper on some of the
early ideas leading up to this study (Svanæs, 1993a). This brought me in contact with Bill
Verplank from Interval Research and Stephanie Houde from Apple Computer. Two years
later I was invited to spend a fall sabbatical with Apple Computer’s Advanced Technology
Group (ATG) in Cupertino, California. ATG was at that time headed by Don Norman, and
was a very inspiring place to be. This gave me an opportunity to discuss my ideas with some
of the most knowledgeable researchers in the field. I owe a great debt to Stephanie Houde,
Dave Smith, Allan Cypher, Jim Spohrer, Alan Kay, Don Norman, and many others for
inspiration and feedback. Also special thanks to game designer Scott Kim for interesting
discussions, and for introducing me to his very exciting world.

After three years back in Trondheim, I was again invited to Silicon Valley in 1998. This
time I spent a winter semester at Interval Research and Stanford University, invited by
Verplank and Winograd. At Interval, I was involved in work with Tangible User Interfaces
and the phenomenology of interaction. At Stanford, I was involved with Winograd’s
“phenomenology course” based on his book. I was also involved with Verplank on his User-
Interface course. I had come full circle. Having had the opportunity to discuss my ideas and
work with these giants is a rare privilege that I could not have dreamed of when I read
Winograd and Flores’ book 10 years earlier.

Last, I would like to thank all colleagues and friends who have given me feedback and
encouraged me to work on. This includes Statoil and PAKT that provided me with office
space in an inspiring environment.

A very special thanks goes home, and to my kids Germain and Aleksander for their
patience with me over the last years.
Trondheim, December 1999
Part I
Theory

Part I, Theory 1
Chapter 1 Introduction
”This new ‘metamedium’ is active...
We think the implications are vast and compelling.”
Alan Kay and Adele Goldberg, 1977.¹
On most popular personal computer platforms, a variety of multi-media tools are currently
available for doing interaction design. These are easy to use, require little or no skill in
programming, and range from editors for pixel graphics and animation, to tools like
MacroMedia Director for integrating the different media resources. Most of the tools provide
excellent support for graphics, sound, and video. The problems arise when designers want to
be creative concerning interactivity. If a designer wants to create interactive solutions that
were not imagined by the tool makers, he or she has to make use of a scripting language like
Lingo, or even leave the tools altogether and do programming in traditional programming
languages like C++ or Java.

Most designers do not have training as programmers, and for these users programming
becomes a barrier that cannot be crossed without detailed help from a professional
programmer. If such help is not available, the designer has hit the wall and has to settle for
solutions with less interactivity. As the potential for interactivity is the most powerful feature
of the computer compared to other media, this is a very unfortunate situation.
In "Drawing and Programming" Laursen and Andersen (1993) describe the problems they
had with the design and implementation of a multimedia system about the Scandinavian
Bronze Age. To illustrate how a landscape was experienced by people in the Bronze Age,
they introduced the concept of Interactive Texture. The idea was quite simple:
"In the Bronze Age, the geography had a very different meaning from nowadays.
While we now see water as a hindrance to locomotion, and firm ground as a help, the
situation was to some extent the opposite at that time when water united (because of
boats) and land divided (because of large forests and moors). We can let the user
experience this through his fingers by making the cursor move differently in different
areas. If the spot is on land, it travels slowly, while it goes quickly if it is on sea." (p.
260)
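
The mechanism itself is simple to sketch in code. The following is a minimal, hypothetical illustration, not code from Laursen and Andersen's system; the terrain grid and the speed factors are invented for the example:

    # A sketch of the "interactive texture" idea: the cursor is slowed down
    # over land and moves freely over sea. The terrain map and speed factors
    # are invented for illustration.

    LAND, SEA = "land", "sea"

    TERRAIN = [                      # a toy 4x4 map of the landscape
        [SEA,  SEA,  LAND, LAND],
        [SEA,  LAND, LAND, LAND],
        [SEA,  SEA,  LAND, SEA],
        [LAND, SEA,  SEA,  SEA],
    ]

    def terrain_at(x, y):
        """Return the terrain type under a cursor position in the unit square."""
        row = min(int(y * len(TERRAIN)), len(TERRAIN) - 1)
        col = min(int(x * len(TERRAIN[0])), len(TERRAIN[0]) - 1)
        return TERRAIN[row][col]

    def move_cursor(x, y, dx, dy):
        """Advance the cursor, scaling the movement by the terrain under it:
        slow over land (forests and moors), fast over sea (boats)."""
        speed = 0.3 if terrain_at(x, y) == LAND else 1.0
        return x + dx * speed, y + dy * speed

    if __name__ == "__main__":
        x, y = 0.1, 0.1                          # start out at sea
        for _ in range(6):
            x, y = move_cursor(x, y, 0.15, 0.1)
            print("cursor at (%.2f, %.2f) over %s" % (x, y, terrain_at(x, y)))

The point of the sketch is only that the meaning of land and sea is carried by the "feel" of the cursor, not by its appearance.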
¹ (Kay and Goldberg, 1977, p. 254)
To implement this feature they found the Hypermedia tools they were using quite inadequate.
Instead, they had to do advanced scripting. From this and similar experiences they conclude:
"Logical-mathematical intelligence is of course necessary for programming, but it
brings forth sterile and boring products... The real solution is to invent programming
environments and to create a style of programming that artistic people find
themselves at home with ... to remove programming from the clutches of logical-mathematical
intelligence, and hand it over to musical and spatial intelligence". (p.
262)
Computer literacy
Alan Kay points to the same problem in the preface of the book "Watch what I do" (Cypher,
1993). The book sums up a decade of research in end-user-programming. He writes:
“The term "computer literacy" also surfaces in the sixties, and in its strongest sense
reflected the belief that the computer was going to be more like the book than a Swiss
army knife. Being able to "read" and "write" in it would be as universally necessary
as reading and writing became after Gutenberg.” (p. xiii)
Having described "reading" he continues:

“’Writing’ on the other hand requires the end-user to somehow construct the same
kind of things that they had been reading - a much more difficult skill. ... One of the
problems is range. By this I mean that when we teach children English, it is not our
intent to teach them a pidgin language, but to gradually reveal the whole thing: the
language that Jefferson and Russell wrote in.... In computer terms, the range of
aspiration should extend at least to the kinds of applications purchased from
professionals. By comparison, systems like HyperCard offer no more than a pidgin
version of what is possible on the Macintosh. It doesn't qualify by my standards.” (p.
xiii)

There is little reason to believe that the current technological development alone will change
this situation. Most of the popular windowing systems have internally grown increasingly
complex during the 90s. The programming skills needed to implement inventive behavior on
top of the current windowing systems now far exceed what a curious user-interface designer
can learn in her spare time.

A modern personal computer today consists of layer upon layer of constructed systems of
abstraction, from assembler languages, through programming languages and user interface
environments, up to the visible levels of the applications in use. The structure and content of
these layers are the result of thousands of design decisions made by numerous programmers
and systems architects over a long period of time. It is by no means determined by the
hardware at hand, as a lot of different systems of abstractions are possible. In constructing
these abstractions, the systems developers were constantly confronted with the trade-off
between simplicity and expressive power. For every new layer that was added to the software
system, some part of the design space was lost.

The complexity of the modern computer makes it impossible to bring the full expressive
power of the lowest levels available all the way up, but on the other hand, adding new layers
often makes it practically feasible to implement products that would otherwise require
enormous efforts in programming. From a theoretical point of view, no part of the design
space is lost when a new level is added, as it is always possible to mix levels. From a practical
point of view however, most users of a certain level do not master the levels below. In
addition, their imagination is often shaped by the abstractions that are being made available to
them.

From the point of view of a computer user with little or no skill in programming, who
wants to create interactive software, the current technological development has created the
following situation:

• The simplifications and abstractions made by the tool designers make it practically
impossible to experiment with and implement many interactive solutions that are
technically possible.
• The user’s image of the design-space concerning interaction is fragmented and
incomplete. It is shaped by the tools they have been using and by the solutions they
have seen. Consequently, many solutions that are technologically possible are
"unimaginable" to most interaction designers.

At the root of these problems lies our metaphorical understanding of the computer. We
conceptualize the computer through metaphors (e.g. information system, hyper-media,
communication medium), and externalize this understanding in the conceptual model
underlying the systems software (e.g. the desktop metaphor, World-Wide-Web, e-mail). We
thus “freeze” a certain understanding of the nature of the computer, and this understanding is
reinforced every time a new piece of software is created within one of the existing structures.
To open up the design space, it is necessary to see beyond the metaphors and investigate
the computer’s technology-specific properties.
1.1 Beyond Metaphor

New phenomena are often first understood as modifications and combinations of phenomena
we already understand. This is also the case concerning new technologies and media. The
telephone was first envisioned as a one-to-many medium for broadcasting concerts directly to
the homes. The very first motion pictures were taken in front of a stage with performing
actors. Both the telephone and film rapidly became separate media with no need to be
understood metaphorically with reference to other media or technologies. They are now used
metaphorically to describe new media and technologies.
As a medium, the modern personal computer has not yet reached a similar level of
maturity. Its "individualization process" is still in its early stage. A good indication of this is
the way in which the computer is largely described metaphorically as modifications and
combinations of other media (e.g. multi-media). In its early days in the 50s and early 60s, the
computer was described as a calculator. With his pioneering work in the 60s and 70s, Engelbart
(1988) may have been the first to see it as anything else, with his metaphor of the computer as
”brain augmentation”. Today the computer is understood metaphorically as a simulator, theater,
tool, sign-system, art-medium, typewriter, multi-media machine, and a window to the
Internet, just to mention a few.

If we also consider the perspectives and scientific traditions that have been made relevant
for understanding the medium in use, we have to include cognitive psychology, anthropology,
activity theory, Marxism, sociology, and applications of the work of philosophers like
Wittgenstein, Heidegger, and Searle. The modern personal computer is like a crystal ball that
indirectly reflects images of current technology, culture, and science, but little of the crystal
ball itself. The situation is similar to the old Indian myth about the blind men and the
elephant. The blind men's descriptions of the elephant tell us more about the men, their
culture, and their physical environment than about the elephant.

Since the modern personal computer with bit-mapped display and mouse emerged in the
research labs in the late 70s, the PC has become part of the everyday environment of millions
of people, both at work and at home. Despite its enormous popularity, there have been
relatively few attempts since the "pioneer era" at developing a deeper understanding of its
media-specific properties.

In (Verplank, 1988) William Verplank sums up the experiences from the design of Xerox
Star, the first graphical workstation to reach the market. As an experienced graphics designer,
he observed in the late 70s that he was now working with a new medium that created new
design challenges. He described the medium as:

"dynamic, manipulable graphics with complex behavior". (p.370)
At the same time Kay and Goldberg (1977) reflected on the properties of the new medium in
their paper "Personal Dynamic Media":
"...This new "metamedium" is active... this property has never been available before
except through the medium of an individual teacher. We think the implications are
vast and compelling". (p. 254)
They saw that the modern personal computer with high-resolution graphics and mouse input
differed from all previous media mainly in its potential for interactivity. But what does
interactivity mean in this context?
The interactive dimension
A first and necessary step to take before this question can be answered is to define the
relations between the terms Interaction, Interactive, and Interactivity. An interaction involves
at least two participants. In the context of human-computer interaction, the human is
interacting with the computer. I define an artifact to be interactive if it allows for interaction.
I further use the term interactivity to denote the interactive aspects of an artifact. The relation
between interactivity and interactive consequently becomes the same as between radioactivity
and radioactive: Uranium is radioactive; Madame Curie studied radioactivity. Modern
computers are interactive; the current study is about interactivity. Interactivity can be used
both as a noun signifying a general phenomenon and to signify a property, as in “the
interactivity of the modern computer”, in the same manner as radioactivity can both
refer to a general phenomenon and be used as in “the radioactivity of Uranium”.
One way to approach interactivity as a phenomenon is to start with the notion of "look
and feel". The term has become more or less synonymous with how the term style is used in
other design disciplines. In a concrete sense, the "look" of a GUI is its visual appearance,
while the "feel" denotes its interactive aspects. Designing graphics for a computer screen is
not very different from designing for other visual media. As already noted by Verplank
(1988), the fact that the screen is built up of a limited number of pixels provides some
interesting challenges. Along the visual dimension we can draw on rich traditions in art,
graphics design, advertisement, film, and television. For centuries people have struggled
with visual media. Our current technical, aesthetic, artistic, and psychological knowledge is
the culmination of the lifework of thousands of people over the last 2-3000 years. Compared
to this, the ”feel” dimension has hardly been investigated.
Is it fruitful to study the “feel” dimension of interactive media as a phenomenon separate
from the visual? How can we study interactivity in isolation as a “pure” phenomenon when it
requires appearance? It is true that interactivity requires appearance, but it requires
appearance no more than color requires form, or form requires color. The latter has never
stopped us from seeing the "yellowness" of the sun as a property separate from its
"roundness". In the world of the concrete, a form will always have a color, and a color can
never fill the entire universe. We never find pure form or pure color in nature. They are
abstractions, but that does not make them less useful as concepts.
Figure 1 shows an example of how form and color can be varied independently in a
formal analysis similar to what was done by the modernist painters2 early in this century.
2 See Chapter 4.2 on Kandinsky and the modernists.
Figure 1. The design space for three forms and three colors (form along one axis, color along
the other).
With three forms and three colors we get a design space of 3x3 possible combinations. What
could be a similar example with appearance and interactivity on orthogonal axes?
Figure 2. The design space for three appearances and three behaviors (appearance along one
axis; push-button, toggle, and double-click toggle behavior along the other).
Let us pick three standard user interface components of similar complexity: a button, a
checkbox, and a radio button. Let us in the same fashion pick three different interactive
behaviors: push-button behavior, single-click toggle behavior, and double-click toggle
behavior. As illustrated in Figure 2, this gives rise to nine distinct combinations of appearance
and behavior.
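
To make this orthogonality concrete in code, here is a minimal sketch, hypothetical and not part of the original study, in which three "looks" and three "feels" are combined freely; all names are invented for illustration:

    # A sketch of appearance and behavior as independent dimensions.
    # Three "looks" combined with three "feels" give a 3x3 design space,
    # as in Figure 2.

    from dataclasses import dataclass
    from itertools import product

    APPEARANCES = ["button", "checkbox", "radio button"]   # how the widget looks

    @dataclass
    class PushButton:
        """On while the pointer is pressed, off again when it is released."""
        on: bool = False
        def press(self): self.on = True
        def release(self): self.on = False

    @dataclass
    class ClickToggle:
        """Each click flips the state."""
        on: bool = False
        def press(self): self.on = not self.on
        def release(self): pass

    @dataclass
    class DoubleClickToggle:
        """Only the second of two presses in a row flips the state."""
        on: bool = False
        clicks: int = 0
        def press(self):
            self.clicks += 1
            if self.clicks == 2:
                self.on = not self.on
                self.clicks = 0
        def release(self): pass

    BEHAVIORS = [PushButton, ClickToggle, DoubleClickToggle]   # how it reacts

    if __name__ == "__main__":
        # Enumerate the full 3x3 design space: any look with any feel.
        for look, feel in product(APPEARANCES, BEHAVIORS):
            widget = feel()
            widget.press(); widget.release()     # a single click
            print("%-13s + %-17s -> on=%s" % (look, feel.__name__, widget.on))

Any "look" can be wired to any "feel"; the interactivity lives entirely in the behavior classes, independently of the appearance.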
This example indicates that it is meaningful to study interactivity as a phenomenon
separate from form and color.
1.2 Methodology: How to Study Interactivity?
As Bødker has pointed out (Bødker, 1990), computer science has always been multidisciplinary
in that it has borrowed from other fields. Borrowing from other disciplines
always involves elements of selection, translation, and synthesis. These processes are by no
means straightforward. In its early days, when the research problems were mainly related to
making the computer work in a purely technical sense, computer science borrowed mainly
from formal disciplines like logic, mathematics, and linguistics. It took a fairly long time before
computer scientists had to take seriously the fact that computer users are human beings with
bodies, minds, history, culture, language, and social relations. Today, a lot of the research
problems are related to how computers are used. Computer science consequently now
borrows from the humanities and the social sciences.
Learning from history, I do not expect new insights concerning interactivity to emerge
from within the current computer-science tradition alone. I have consequently found it
necessary to seek inspiration from outside my own field, with all the complications involved.
The structure of the sciences
One way to start a scientific investigation is to look for the largest possible picture. In this
case that would be to have a look at the structure of the sciences. In “Knowledge and Human
Interests”, Habermas (1971) identifies three "knowledge interests" working as driving forces in
science. He uses the terms technical interest, hermeneutic interest, and emancipatory interest:
• The technical interest governs research aimed at improving our control of nature
and/or society. Most research in the natural sciences fits this description.
• The hermeneutic ("practical") interest aims at getting a deeper understanding of a
phenomenon, not focusing on the "usefulness" of such an endeavor from a technical/
economical point of view. Most research in the humanities fits here.
• The emancipatory interest governs research aimed at removing repression and
injustice by unveiling the power relations in society. Such research is always
political in some sense or another. Examples of such research can be found in the
Action Research tradition in organization theory.
Figure 3 illustrates how these three knowledge interests relate.
Figure 3. Habermas’ three “knowledge interests” in science: the technical interest (controlling,
predicting; "It works!"; technological research), the hermeneutic interest (understanding,
interpreting; "I understand it!"; case studies and reflections), and the emancipatory interest
(social change, personal growth; "It helped others!"; action research).
For Habermas all three interests are equally “scientific”, and together they form science as a
whole. Traditionally, computer science has had a strong bias towards a technical interest and
the relevance of a project has been judged solely by its technical usefulness. There is no
reason why this should continue to be the case in the future.
Research aiming at understanding a phenomenon is in Habermas' terminology driven by
a hermeneutic interest, and has historically belonged with the humanities. It
should not come as a surprise that most work related to computers and computer
usage driven by a hermeneutic interest has been initiated by researchers belonging
to, or having training in, one of the humanistic disciplines (e.g. Andersen, 1990). Examples of
research in computer science done from an emancipatory knowledge interest can be found in
the Participatory Design tradition (see Ehn, 1988).
Research methodology in related work
Since the mid-80s there has been an increased interest in digging below the technological
surface of computer-related research problems. These works are relevant both concerning
methodology and content. The following is a short list of some of the most important early
works within this emerging "critical/reflective" tradition in computer science, focusing
on overall research methodology:
• In (Winograd and Flores, 1986), the authors make an analysis of the implicit
assumptions of the AI research tradition, and sketch an alternative theoretical
foundation for the design of computer systems. They explicitly express technical
applicability as an aim of their investigation, but there has been a dispute as to how
useful their practical advice actually has been (Suchman, 1991). I use their general
methodological approach as an inspiration: They start out by making explicit some
of the implicit assumptions about humans and computers within the AI community
(as they see it). They continue by introducing three other scientific traditions, and
reformulate the AI project within these frameworks. They then draw some general
conclusions as to how these new insights could be applied to practical design.
• In (Suchman, 1987), the author is studying some implicit assumptions within the
computer science community concerning human cognition and man-machine
communication. She also sketches out an alternative view, in this case by drawing to
the reader’s attention the importance of taking into account the situatedness of
human activity. She follows Winograd and Flores‘ structure with one major
exception: the use of a detailed empirical case study. By underpinning her arguments
with references to empirical data from everyday man-machine communication, she
adds a new depth to her investigation.
• In (Turkle, 1984) and (Turkle, 1995), Sherry Turkle studies the computer as a
cultural artifact in different subcultures. Methodologically she belongs within the
social science tradition in that she does not state technical applicability as an aim of
her research. Her studies differ from Suchman's in that she to a larger extent uses the
empirical data inductively. Where Suchman uses her case to illustrate a conclusion
she has already made, Turkle enters into the data analysis without a well-defined
hypothesis to test.
• In (Ehn, 1988), the author sums up and reflects on the early years of “Participatory
Design” in Scandinavia. He re-frames systems development from three positions:
Marxism, the philosophy of Heidegger, and the late work of Wittgenstein.
Throughout the book, he keeps a focus on the relations between work, workers,
designers, and systems; and does not hide that the political dimension of the research
is a further democratization of the workplace.
A common methodological denominator of these studies, and of the work by Bødker (1991),
Laurel (1991), and Andersen (1991), is that they reframed computer-related problems within
theoretical frameworks that were at that point not made relevant to computer science (i.e.
Activity Theory, Dramatic theory, and Semiotics). In addition to the reframing, some of the
authors try to ground their resulting conclusions in empirical findings (i.e. Andersen, Ehn,
Suchman and Turkle).
All works mentioned show a strong hermeneutic knowledge interest, while (Winograd &
Flores, 1986) and (Bødker, 1991) in addition explicitly express a technical knowledge
interest. The emancipatory interest is most explicit in Ehn’s work, but is also largely present in
the work of Winograd & Flores.
Overall research methodology
The current study belongs within this "critical/reflective" tradition in computer science
concerning overall research methodology:
• It shares with all the above mentioned studies a reframing of the research problem
from fields outside of traditional computer science.
• Of the above mentioned, it shares with Ehn, Suchman, and Turkle the use of detailed
empirical studies.
• It differs from all the above studies except Turkle’s in that it does not start out with a
strong hypothesis of what will be found, but lets the research question be gradually refined
through the research process. This inductive research strategy has similarities with how
designs gradually evolve in iterative design processes.
• The study differs from all the above studies in that the empirical data are from
experiments. The detailed rationale for this choice is given in Chapter 5.
In the terminology of Habermas, the current study has primarily been driven by a hermeneutic
knowledge interest, in that the aim has been to broaden our understanding of interactivity as a
phenomenon.
At the same time, this pure curiosity has been accompanied by a technical knowledge interest.
The new knowledge is intended to enable tool developers to construct better software tools to
support interaction design.
The research has to a much lesser degree been driven by an emancipatory knowledge
interest. The focus has not been on the relations between people or groups of people. The data
are from experiments where the power struggle of real-world systems development and use
has to a large degree been eliminated. This is not to say that I have intended the research to be
totally neutral concerning current design practice, but the scope is relatively narrow compared
to the early work in the PD tradition. For example, my suggestion that dance classes should be
included in the curriculum of interaction designers (see Chapter 12), can hardly be
described as an upheaval of the current order of things.
1.3 The Larger Cultural Backdrop
The current research draws on many sources and traditions. I have included next a short
presentation of the four most relevant traditions. It will hopefully frame the research for
readers not familiar with these traditions:
• The AI debate and 20th century philosophy.
• The user-centered design tradition.
• The Scandinavian systems development tradition.
• Media studies, and the work of the abstract painter Kandinsky.
A more in-depth presentation of the "critical" work of Winograd, Flores, and others can be
found in Chapter 2. Heidegger is discussed implicitly in the presentation of the work of
Winograd and Flores. More information about the user-centered design tradition in computer
science can be found in (Norman and Draper, 1986; Nielsen, 1993a; Nielsen, 1993b). The
participatory design tradition is introduced in (Ehn, 1988) and (Greenbaum and Kyng, 1991).
Chapter 3 gives an introduction to Merleau-Ponty. More about Kandinsky and media studies
can be found in Chapter 4.
The AI debate and 20th century philosophy
Philosophy is one of the most important sources for getting alternative perspectives on a
problem. In their critique of the Artificial Intelligence (AI) tradition, Winograd and Flores
were inspired by the work of the German philosopher Martin Heidegger. Winograd & Flores
found Heidegger particularly interesting in that they found his ideas to be in direct opposition
to most of the implicit assumptions of the AI field at that time.
Winograd and Flores argued that Heidegger’s understanding of the human condition is a
better foundation for understanding and designing computer technology than the ruling
paradigm in AI at that time. As they saw it, the cognitivist approach to understanding
computers in use must be rejected if we take Heidegger seriously. In this critique they
followed up the early work of Dreyfus (“What computers can’t do”, 1972).
The AI debate as such has limited relevance for a study of interactivity. The reason is that
its research question is totally different. The AI debate has centered around the question “Can
we build intelligent computers?”. This was a reaction to early AI research that mainly asked
the question “How do we build intelligent computers?”. The debate led to the question “What
is intelligence?”, which again led to the question “In what ways are computers different from
people?”. The latter is the question Dreyfus discussed in (1986) with reference to philosophy.
We are then very close to the philosophical question “What does it mean to be human?”.
Dreyfus entered the AI debate from philosophy. He soon realized that the early AI
researchers were in many respects doing philosophy of mind, but with little or no knowledge
of, or reference to, two and a half thousand years of philosophical research on the subject. The
main differences between the AI researchers and the philosophers were their choice of
medium and their choice of research methodology. For the philosophers, the medium has
always been language, and the methodology has always been the philosophical discourse. For
the early AI researchers, the medium was the computer, and the research methodology was
systems construction. Dreyfus showed how the AI research, despite these differences,
repeated ancient discussions in the philosophy of mind. The strength of Dreyfus’ analogy is
that it enabled him to make predictions about the results of these “discourses” based on his
knowledge of the similar philosophical discourses. His predictions have so far to a large
degree been correct. I find this to give strong credibility to his argument.
The strongest relevance of the AI-debate for the current study of interactivity is in its use
of philosophy. It showed to many in the computer-science community that philosophy can be
used as a resource and inspiration without having to become a philosopher, much in the same
way as researchers in computer science have always used mathematics without becoming
mathematicians. Dreyfus (1972) draws mainly on three philosophers: Heidegger, Merleau-
Ponty, and Wittgenstein. As the work of these philosophers has relevance for the current
work, a short introduction is appropriate.
Martin Heidegger (1889-1976) belongs to the phenomenological tradition in Continental
philosophy. One of its most influential proponents was his teacher Edmund Husserl (1859-
1938). For Heidegger it was important to move philosophy back from the realm of the spheres
to the reality of everyday human life. This meant, as he saw it, a definite break with 2000
years of philosophical tradition. In his "Being and Time" from 1927 (Heidegger, 1997), he
breaks with the tradition of exploring ideas without reference to our factual existence as
human beings. He departed from his teacher concerning the possibility of making explicit this
“background” of everyday practices that gives meaning to the world. In trying to develop a
philosophy starting out with our factual human existence, he found himself trapped in the web
of meaning produced by the basic assumptions of Western civilization. He found it necessary
to develop a set of new concepts better suited for the task. Reading Heidegger consequently
becomes a difficult task, as one first has to acquire his new "language". The problem is that
this language cannot be fully understood purely through definitions referring back to our
"ordinary" language. The meaning of his concepts slowly emerges through the reading of the
work. The reading of Heidegger thus becomes an iterative process, or what in philosophy is
called a hermeneutic circle.
The French philosopher Maurice Merleau-Ponty (1908-1961) was heavily influenced by
both Husserl and Heidegger. Put simply, Heidegger brought philosophy back to everyday
human life, while Merleau-Ponty took it all the way back to the human body. In Merleau-
Ponty’s most important work The Phenomenology of Perception from 1945 (Merleau-Ponty,
1962) he explored the implicit assumptions about perception at that time. He ended up with an
understanding of perception that is totally different from the naive idea of perception as
stimuli reception. The latter view can still be found in part of the current literature on Human-
Computer Interaction (HCI). To Merleau-Ponty, perception is a process where an active body
enters into a "communion" with its surroundings. Perception is a continuous interaction
involving the subject's intentions, expectations, and physical actions. From this perspective,
every attempt at applying some variation of Shannon and Weaver's information theory (see
Reddy, 1993) to Human-Computer Interaction becomes an absurdity. There is clearly no purely
active "sender" or purely passive "receiver", nor any well-defined "information" or "point in
time". The fact that his understanding is in direct opposition to some of the most influential
theoretical foundations of the HCI field, makes a study of Merleau-Ponty an interesting
starting point for an exploration of human-computer interaction.
Since Merleau-Ponty published Phenomenology of Perception in 1945, phenomenology
as a philosophical discipline has developed further. The most thorough attempt to date at
building a complete analysis of human existence based on the phenomenological insights is
that of Schutz and Luckmann in their The Structure of the Life-World (1973). Luckmann uses
this framework as a foundation for his current empirical study of everyday social interaction.
As a sociologist, he makes use of light-weight video equipment and films long sequences of
everyday interaction between people in their natural surroundings. He then analyzes these
sequences in search of levels of meaning.
In British philosophy we find a similar questioning of the limits of analytical approaches
in the late work of Ludwig Wittgenstein (1889-1951). Wittgenstein started out with a pure
analytical approach to philosophy. His main interest was the philosophy of language. In his
most important early work “Tractatus” (Wittgenstein, 1923), he argued for the logical nature
of language and worked out a complete system for determining the “truth value” of sentences.
Referring to the AI debate, his early position would have placed him among the first
generation of AI researchers with their trust in an analytical, symbolic, and de-contextualized
approach.
After publishing “Tractatus”, Wittgenstein lost interest in philosophy, as he thought
its problems to be “solved”. After some years as a school teacher in Austria, he started
questioning the foundations of his early work, and returned to Cambridge. He struggled until his
death with all the paradoxes he found in his early approach. He never developed his
ideas into a coherent philosophy, but published his thoughts on the subject in
“Philosophical Investigations” (published posthumously in 1953). He found one of the most important
problems of the analytical approach to an understanding of language to be its lack of attention
to context. This led him to develop the concepts language game and life form. To the late
Wittgenstein, the meaning of a sentence is given by its use. Language is primarily a means of
communication. In a certain use situation, there is a context of language users, physical
objects, and practices that give meaning to the words. He described these local uses of
language as language games. He further argued that all use of language takes place within a
certain language game, whether it involves only two people coordinating a specialized
task, or a discourse about the language of philosophy itself. For language users to be able
to comprehend the words of another language user, they need a shared background of
experience. This includes culture, corporeality, sensory system, social life etc. Wittgenstein
uses the term life form for this. To him, language users of different life forms can never truly
communicate.
We see strong similarities between Heidegger, Merleau-Ponty, and Wittgenstein in their
critique of a purely analytical approach to philosophy. They all ended up with a focus on
everyday life, and on our factual existence as human beings. This is why Dreyfus found them
relevant for the AI debate, and this is why they are relevant for a discussion of interactivity.
Listening to the user
I share with Luckmann, Suchman, and Turkle the intent to base my results on empirical
findings. My inspiration here has to a large extent come from the User-Centered Design
tradition and the Scandinavian School of Systems Development (in the US also known as the
Participatory Design tradition). Up until the 60s there was no tradition whatsoever within
engineering and design for including the end-user in the design process. No matter whether
the person in charge was an architect, an engineer, a designer, or a systems developer; and no
matter whether the end-product was a house, a bridge, a tea pot, or a computer system; he (it
was mainly a he) saw the design process as a problem to be solved, in much the same way as
problems are solved in mathematics and physics. When things did not work as expected, the
answer was always to improve the technical methods, the mathematical foundation, and the
accuracy of the data, i.e. to make the work more "scientific".
The ultimate example of this approach can be found in Simon’s “The science of the
artificial” (1969). This approach to design is still dominant among many engineers, architects,
and systems developers. At least, this is the dominant way of describing engineering practice.
There is much evidence (see Schön, 1983) that there is a great difference here between
rhetoric and practice. Everyday design and engineering is less scientific and to a large extent
based on simple rules of thumb, improvisation, habits, and rough estimates.
In his book “From Bauhaus to our house”, Wolfe (1981) gives an excellent example of
what can happen when the end-user is kept out of the design process. The Bauhaus was an
avant-garde German school of arts and architecture that lasted from 1919 until it was closed
down by the Nazis in 1933. It attracted some of the most talented artists and architects of its
time, including the painters Kandinsky and Klee. When Bauhaus was closed down, many of
its teachers fled Nazi Germany to the US and became the new ”Gods”3 of American
architecture. One of the important ideas they brought with them was that the working class
needed clean and functional houses. There was no room for the underlying philosophy in the
US, and only the style survived. This led to functionalism in architecture and the construction
of large suburbs with extremely sterile blocks of flats.
It took close to 50 years from when the ideas behind the functionalist style in architecture first
evolved among the avant-garde in Weimar until anybody actually asked a "user" what it felt
like to live in such a construction. As Wolfe reports, the latter happened in 1970 when the city
council of St. Louis arranged a mass meeting with all the residents of one of its suburbs. The
meeting was arranged to address the growing social problems that resulted from this style of
architecture. The residents were asked what to do with their houses, and from the crowd
came a unison "Blow it up". Two years later the city of St. Louis decided to do just that. It is
quite a paradox that some years earlier the architect in question had won an award from the
American Institute of Architects for designing just this building complex. This and similar
incidents led to an increased focus among architects in the 70s on the psychological and social
effects buildings have on their residents.
3 Wolfe’s term.
User-centered and participatory design
A similar interest in involving the end-users early in the design process emerged at about the
same time concerning software development. This happened in the early 70s independently in
the US and in Scandinavia. Due to the differences in cultural setting, both the rationales for
bringing in the users and the ways in which it was done were radically different on the two
sides of the Atlantic.
In the US, the need to get early feedback on the usability of graphical user interfaces led
to the development of a systematic methodology for doing usability testing. This was first
done on a large scale in the design of the Xerox 8010 "Star" workstation in the late 70s (see
Johnson et al., 1989 and Smith et al., 1982). Prototypes were developed to simulate the
behavior of different design alternatives, and these were tested with representative users. The
results from the tests were fed back to the design process. Similar iterative design
methodologies, involving cycles of design exploration, prototyping, usability testing, and
redesign, now dominate in the software industry, at least in theory. A good introduction to
these methodologies can be found in Jakob Nielsen's "Usability Engineering" (Nielsen,
1993a).
The success of the Xerox Star user interface as the mother of Lisa, Macintosh, Windows,
and all other GUIs in the 80s and 90s is a good indication of the usefulness of systematically
bringing the users in early in the design process. In the capitalistic setting of the US in the
70s, the rationale for bringing in the users in the design process was to get better products, i.e.
products that would eventually sell better in the market.4
In the Scandinavia of the 70s, the rationale for involving the users was totally different,
as was the cultural setting. Around 1970, approx. 90% of the work force in Scandinavia was
unionized, the governments were run by social democrats, and the academic world was
dominated by Marxist ideas. Around 1975, automation in industry had led to fear of
deskilling and unemployment. This led the workers’ unions to cooperate with radical
computer scientists on initiating research projects with a large component of worker
involvement. Much of the theoretical and ideological roots of this approach can be traced
back to the Tavistock Institute of Human Relations in England in the 50s and 60s.
As described by Ehn (1988), the two main rationales were Industrial Democracy and
Quality of Work and Products. As all projects were concerned with in-house development,
products in this context did not mean computer artifacts, but the end-products from the work
done by the computer users. A good example of such research is the UTOPIA project, a cooperation
between academia, graphics workers’ unions, and a newspaper. One of the aims of
the UTOPIA project was to develop technology for computer-based layout together with the
graphics workers. A lot of powerful design techniques were developed, and the resulting
4 The commercial failure of the Xerox Star in comparison to its successors, and the commercial success of a
windowing system that for a very long time was lagging years behind its largest competitor concerning usability,
indicates that the rationale for bringing in the users might have been built on a naive model of the PC market.
The rule in this niche of the computer market seems to be that as long as the technology has a certain minimum
of usability, money is better spent on increased marketing, smart sales strategies, and clever lawyers than on
substantial investments in product quality and usability.
design was very similar to what a little later emerged on the market as desktop publishing.
The experience from this and similar projects is today carried on by a research tradition on
"Participatory Design" both in Scandinavia, in the rest of Europe, and in the US. (see
Greenbaum and Kyng, 1991).
Due to the current structure of American society, with its lack of job security, its weak
unions, and its lack of traditions for collective solutions in general and democracy at the
workplace in particular, Participatory Design is understood in the US mainly as a collection of
useful techniques for improving product quality.
We see here a process of translation from the European context to the American context
with similarities to what happened to the Bauhaus ideas. The focus moved from being on
people, to being on objects. In Habermas' terminology, end-user involvement in the design
process is in the US largely driven by a technical knowledge interest, while in Scandinavia it
was in the 70s driven by an emancipatory knowledge interest.
For the current purpose of investigating the interactive aspects of the modern computer,
my knowledge interest is less technical, less emancipatory, and more hermeneutic. I want to
explore empirically what happens in interaction mostly for the reason of deepening our
understanding. For this purpose, both the User-Centered Design tradition and the
"Scandinavian School" has provided insights, research methodologies, and techniques that has
made it possible for me to go beyond a purely theoretical investigation and study the
phenomenon empirically.
Media and communication studies
The media perspectives of Kay & Goldberg, Laurel, and Andersen all recognize the computer
as a new medium with totally new qualities that need to be explored. They thus belong within
a long tradition of media and communication studies in the humanities and the social sciences.
In the paper “From Field to Frog Ponds”, Rosengren (1994) sums up the field and finds it
resting on a multitude of different paradigms.
Research in media and communication studies can be categorized by its focus:
• Much research focuses on the effect of electronic media on society. An example of such
research is how television has changed the political process in a country.
• Another important sub-tradition deals with content analysis. An example can be the
analysis of genre in television ads.
• Relatively little research has been devoted to the analysis of the media themselves, decontextualized
from concrete content and social meaning.
For the purpose of studying the computer as medium, only the last focus is of direct interest.
The most important contributor to our understanding of media as such is Rudolf Arnheim (see
Arnheim 1969, 1974, 1977). He has written extensively on painting, sculpture, architecture,
and film. His background from gestalt psychology and art studies enabled him to see the
uniqueness of each of the media he studied, and the ways in which they relate.
Unfortunately, Arnheim has not made an analysis of interactive media. The current work is in
many ways close to Arnheim, both in choice of subject and approach.
Modern art
Where can we find guidance on how to explore a new medium? As Arnheim and others have
found, much can be learned from the history of art. A lot of art forms and traditions could
have been selected as paradigm cases, but I have concentrated in this study on painting;
especially on the movement towards abstraction that happened in the second decade of this
century. One reason for selecting painting is that it is relatively close to interactive graphics,
or at least closer than poetry and opera. Another reason for having a closer look at abstract
painting is that we find here a tradition that deliberately explored the possibilities of a
medium.
The Russian painter Wassily Kandinsky (1866-1944) was one of the first to move
painting from the concrete towards the abstract. This happened around 1910. He is
particularly interesting in that he not only painted, but also wrote eloquently on what he did
(Kandinsky, 1994). He compared painting with other media and art forms, and tried to
identify the media-specific properties of his medium. To him, painting consisted of form and
color, and this led him to identify the basic form and color elements, and to explore how
these elements interact on the canvas. This search resulted in an aesthetic theory based on his
observations and experience as a painter. The correctness of Kandinsky's theories is not
important for this investigation. The important inspiration comes from how he and his colleagues
deliberately explored their medium while reflecting on what they did.
1.4 Research Question and Structure of the Study
As already stated, the main research question of this study is: “What is Interactivity?”
Interactivity can be studied from many different perspectives. The search for a new
understanding of interactivity is presented here as a long circular movement that starts and
eventually ends within computer science.
It starts with a survey of how interactivity has been understood in Computer Science,
with focus on the Human-Computer Interaction (HCI) tradition. This is followed by an
attempt to reframe the problem from two different perspectives, i.e. the phenomenology of
Merleau-Ponty, and media and art studies. This reframing points to the need for an empirical
grounding.
Next follow some experiments and their interpretation. Taken together, the reframing of
the research question and the interpretation of the experiments form the basis for a new
understanding of interactivity.
The work ends with an attempt to show how the results can be applied to tool design, and
how the research methodology can be “re-used” for a different technology.
Following this outline, the study is organized into three parts: Theory, Experiments, and
Reflections.
The chapters in more detail
• Theory
− Chapter 2 starts out by developing three concrete scenarios of interactive behavior.
This is followed by an identification of the relevant theoretical frameworks within
the HCI field for analyzing interactivity. Each framework is then applied to the
analysis of the three scenarios, and the results are compared. The chapter ends by
pointing to interesting traditions that could shed more light on the phenomenon of
interactivity.
− Chapter 3 contains a short introduction to the phenomenology of Merleau-Ponty and
shows how the problem of interactivity can be reformulated within his philosophical
framework.
− Chapter 4 starts with an introduction to media and art studies. It continues by
introducing the ideas and works of the early modernist painters represented by
Kandinsky. Inspired by the modernists’ attempts at seeking the essential properties
of their medium through abstraction, it presents some examples of abstract
interactive graphics.
• Experiments
− Chapter 5 presents the empirical research question "how is interactivity
experienced?” It further deals with the methodological issues related to an empirical
approach to the study of interactivity. It ends by giving an overview of the
experiments described in detail in Chapters 7, 8 and 9.
− Chapter 6 describes Square World, some examples of abstract interactive graphics
developed as input for the experiments.
− Chapter 7 describes an experiment where subjects were asked to explore and explain
the simple examples of abstract interactive graphics in Square World.
− Chapter 8 describes an experiment where the subjects were exposed to a set of
different design tools for building the kind of artifacts found in Square World.
− Chapter 9 describes an experiment where the subjects were involved in a process of
designing a tool for building abstract interactive graphics.
• Reflections
− Chapter 10 is a synthesis of the results from the experiments. A general ontology of
Square World is presented.
− Chapter 11 is a discussion of the theoretical consequences of the experimental
findings for our understanding of interactivity.
− Chapter 12 shows how the results apply to Interaction Design. This chapter also
includes the description of an experimental agent-based software tool that resulted
from the theory. This tool allows for constructing interactive graphics at the pixel
level.
− Chapter 13 shows how the methodology from the experiments can be generalized
and applied to the study of different technologies. This chapter also contains a
critical examination of the methodologies used.
− Chapter 14 contains an attempt to draw a set of conclusions from the findings
concerning the nature of interactivity. In addition, it points to interesting research
questions that arise from the findings.
Chapter 2 Interactivity in the HCI Literature
“In 1974, we were in the position of having to create a new field.”
Stuart K. Card and Thomas P. Moran (Card and Moran, 1986, p. 183).
Since the late 60s there has been a community of researchers defining their area of research as
Human-Computer Interaction (HCI). The first ACM SIGCHI conference was held in 1982
(Schneider, 1982), and marked the field’s entrance into the mainstream of computer science.
A detailed description of the history of HCI research up until 1995 can be found in (Baecker
et al., 1995). Currently, the HCI field is a well-established area of research. In ACM's latest
strategic document (Wegner and Doyle, 1996), it is listed as one of 22 areas in computer
science.
The field currently rests on a multitude of theoretical foundations. This can be interpreted
negatively as a sign of immaturity, but also positively as an interesting potential for cross-fertilization.
The tradition started out in the Anglo-American scientific context with its
emphasis on formal analysis and experimental psychology. Cognitive science with some
variations is still the ruling paradigm of HCI, but alternative theoretical foundations like
phenomenology, activity theory, semiotics, and Gibsonian psychology are currently seen as
fruitful supplements. As described in Chapter 1, there has during the 90s been a cross-fertilization
between the HCI tradition and the Scandinavian Participatory-Design (PD)
tradition. A description of the HCI field at the end of the 90s should consequently also include
the PD tradition.
What do these theories have to say about interactivity, and to what extent do they
adequately describe the phenomenon? This chapter can be seen as an attempt to answer these
two questions through a comparative analysis. Comparison and evaluation of scientific
theories belonging within different research traditions is not trivial. It is much easier to
compare theories belonging within the same "paradigm” (Kuhn, 1962), having a shared
foundation of evaluation criteria, concepts, and research practices. In multi-paradigmatic
research areas like the HCI field, there is little such common ground. Comparison of different
theoretical frameworks done in the abstract, and based solely on a set of “objective”
evaluation criteria, can easily be criticized for being biased. On the other hand, evaluating each
theory only with the examples and evaluation criteria provided by their proponents is not
satisfactory.
One starting point is to accept all current approaches to the analysis of human-computer
interaction as equally valid. An evaluation can then be done by applying all theoretical
approaches to the same specific phenomenon, and evaluating the usefulness of the resulting
descriptions of the phenomenon. Such an evaluation will not compare the theories as such, but
only their explanatory power in relation to a specific problem. For the current purpose, such
an analysis is sufficient, as the phenomenon to be studied is interactivity in a relatively
restricted sense and not every aspect of cognition.
As a vehicle for this analysis, I have developed three concrete scenarios of human-artifact
interaction. The scenarios each highlight one or two important aspects of interactivity. All
theoretical frameworks are applied to all three scenarios, giving rise to a matrix of
descriptions. From this exercise there emerges a set of blind spots for each framework, i.e.
phenomena in the scenarios that the framework could not account for. By comparing and
contrasting these blind spots, there emerges a set of common problems. This points the way
to the need for theory development.
2.1 Narrowing the Scope: Three Scenarios
Human-computer interaction as a phenomenon covers a broad class of systems and use
situations. A full analysis of the phenomenon is far beyond the scope of this study. Such an
analysis would on the computer side have to include a classification of all existing kinds of
hardware and software and the ways in which these systems are inter-related. On the human
side, it would have to include all current uses of computer systems and the ways in which the
interactions are given meaning by the social settings in which they take place. Taken to its
extreme, the latter would force us to broaden the scope to include the organizational settings
in which the interactions took place. This could further take us to an analysis of the cultural
settings of the organizations etc.
The aim of this study is less ambitious. The current research question emerged from an
analysis of some of the problems facing designers of interactive systems. This led to a search
for a deeper understanding of the media-specific properties of the computer – our material
for building interactive systems. With this aim in mind, we can narrow the scope to
interactivity in a more restricted sense.
To be able to do a detailed analysis of human-computer interaction, it is necessary to
make the examples very simple. There is no guarantee that simple examples scale, but on the
other hand we can not hope to be able to understand the complex if we do not understand the
simple.
Scenario Ia: A switch
One of the simplest interactive artifacts is the light switch. Electronic devices like switches
are by nature interactive, even if their behaviors are in most cases of a very simple kind.
Figure 4. A schematic diagram of Mrs. X's switch and light.
Let us start out with a switch with two states. Every interaction happens in a social and
physical context. To make the scenario concrete, let us invent a user Mrs. X. Let us imagine that
she interacts with the light switch illustrated in Figure 4. The scenario could go as follows:
Mrs. X has her home office in the attic of her house. It is Sunday afternoon, and she has
been reading her electronic mail. She has just decided to go back down to the rest of her
family. She stands by the door, and is just about to turn off the light in the room. The
switch is by the door, and it needs only a light touch of her finger to turn the light off. It
has what user-interface designers call toggle behavior.
Mrs. X touches the switch, and the light in the room goes off. The switch is of a design
that glows with a low green light when it is turned off, to make it easier to locate in the
dark. Having turned off the light, Mrs. X leaves the room.
Scenario Ib: Interrupted interaction
So far, this is an example of uninterrupted interaction. A theory of Human-Computer
Interaction should also be able to deal with interrupted interactions. Returning to our scenario,
the latter would have been the case if the switch had been out of order. The above story could
then have ended differently:
Mrs. X touches the switch, but experiences to her surprise that the light does not go off.
What surprises her even more is that the switch starts glowing as if the light had gone off.
She utters "strange" and leaves the room.
Scenario II: Interaction design and interactive messages
As illustrated in the uninterrupted version of Scenario I (Ia), a lot of our coping with
interactive artifacts requires very little thinking. In some cases though, we do reflect on the
interaction. The most obvious cases are situations like Scenario Ib, where we are forced to
reflect on what is going on because a device ceases to behave as expected.
In other cases, users intentionally reflect on interaction as part of the job they are doing.
One important example is when interaction is being consciously designed. All construction of
interactive behavior is done through some kind of a tool. Further, design rarely happens in a
social vacuum, and designers need to communicate about interactive behavior.
To shed light on the processes involved in interaction design, consider the following
scenario:
Figure 5. The stack Mrs. X sent to Mr. Y
Let us assume that the same Mrs. X happens to be a product designer who is currently
working on the design of a switch. It is Monday morning and she is at work in front of her
computer. She has been discussing with one of her customers, a Mr. Y, the design of a
switch to be placed in all rooms of a large public building he is planning. They have
ended up with a design very close to the switch Mrs. X has in her home.
Mr. Y is located in a different city, so to give him a feel of some of the design
alternatives, she has built a prototype in HyperCard and sent it to him via electronic
mail.
The prototype is shown in Figure 5. She here presents him with two parameters
concerning the switch design that she wants his feedback on. The first parameter is about
whether there should be visual feedback in the switch itself as it is in her switch back
home. The second parameter has to do with the interactive behavior of the switch. She
wants him to feel the difference between a switch where the reaction comes immediately
when you press on it, and a switch that reacts when you remove your finger. She
simulates the behavior of the switch in HyperCard, and lets Mr. Y control through the
radio buttons whether the switch reacts on mouse-button-press, or on mouse-button-release.
Having received the prototype, Mr. Y tries out the switch with different combinations
of visual feedback and behavior. He then sends an e-mail back to Mrs. X telling her that
he prefers a switch with visual feedback that reacts when you remove your finger,
because as he puts it: "it feels right".
This scenario also highlights an important property of the computer as a medium: The
computer can be used for communicating interactive messages. From a media perspective, the
content of the HyperCard stack includes the behavior of the switch and the ways in which it
can be modified. This is a property we do not find in any other medium.
Scenario III: Magic
The two previous scenarios show interactivity of a kind that most people in the industrialized
world are now becoming acquainted with, but the computer also allows for interactivity of a less
usual kind. Sometimes a new medium can not easily be understood solely as modifications
and combinations of other media.
The last scenario is built around the computer game "Heaven and Earth" by puzzle
designer Scott Kim. This game explores ways in which the computer can be used for
constructing new “worlds” that are more than simulations of existing media. The workings of
the game are illustrated through the scenario.
Figure 6. The start situation of a puzzle from “Heaven and Earth”.
Let us assume that Mrs. X has taken a copy of “Heaven and Earth” home to her husband
Mr. X. Mr. X is sitting in front of the computer trying to figure out how to solve one of
Scott Kim's interactive puzzles.
Mr. X first sees the game in its initial state as shown in Figure 6. The task is to recreate
in the left square the shape to the right. The squares are black and the background is
white.
Figure 7. Six snapshots from Mr. X's interaction with the game.
Mr. X soon finds out that the squares can be moved around with the mouse. He realizes
that there must be a trick involved here, as the game only provides four squares and five
squares are required to build the cross.
Mr. X keeps moving the four black squares around for a while in arrangements that are
close to the solution. Snapshot II of Figure 7 shows one such attempt. A little later he
creates the configuration shown in snapshot III. He realizes that he is no closer to a
solution.
He then brings the squares into the configuration in snapshot IV. He now gives up and
says to his wife: "This is as far as it is possible to get. I have a cross, but to get it right I
have to get rid of the square in the middle". Mrs. X nods and with a mysterious smile she
says "Yes". "What kind of an answer is that, you mystery queen?” he replies with a smile.
He then thinks for a while and says, "No, but that's impossible". He clicks on the white
"square" in the middle, drags it away as shown in snapshot V, and places it on the
background as shown in snapshot VI. Under the white square there is now a black square
and the puzzle is solved. The game congratulates him with “pling”. "Magic!" Mr. X says.
2.2 Seven Theoretical Frameworks
The current literature on human-computer interaction is vast. One way of getting an overview
of the field is to start with the two most commonly used text books: (Shneiderman, 1998) and
(Preece et al., 1995). Shneiderman lists a number of books, proceedings, and journals (pp. 32-
49). Preece et al. present an extensive list of works, and give an overview of the most
important approaches in the field.
From the diversity of works in HCI, I have selected seven important contributions to the
field, each representing a distinct approach to the analysis of Human-Computer Interaction.
All works, except (Andersen, 1991) are listed by Shneiderman or Preece et al. The semiotic
approach of Andersen is added because it represents the only important direction of the
Scandinavian tradition that is not embraced by the HCI field.
Given a different focus, the list could have been different. The list represents approaches
that are important for understanding the three examples of interactivity developed here. It
does not include early “paradigm” works on HyperText (Nelson, 1981), Internet (Turkle,
1995), Participatory Design (Ehn, 1988), World-Wide Web (Berners-Lee et al., 1994),
Cyberspace and Virtual Reality (Gibson, 1986), or Ubiquitous Computing (Weiser, 1991).
The listed books are all major works that were the first introduction of a certain intellectual
perspective to the HCI community. As such they are all “paradigms” in Kuhn’s original sense
of the term (Kuhn, 1962). A “paradigm” was to Kuhn primarily a major scientific work that
introduced a totally new way of conceptualizing a domain. Classic examples of such paradigm
works are Newton’s work on gravity and Kepler’s work on the planetary orbits. Such works
create new “paradigms”. Generations of researchers follow in their footsteps and soon forget
that one could once be burned at the stake for claiming what they now see as self-evident. Not only do
paradigm works lay a new foundation of axioms, they also in many cases change the way
research is done. From this perspective, it is important to study original works in science.
The “paradigms” are here listed with secondary sources:
1. The Model Human Processor.
• 1983: Card, Moran, and Newell's book "The psychology of Human-Computer
Interaction". This was the first attempt at building a complete theory of humancomputer
interaction.
Secondary source:
• The authors’ reflections on their research 5 years later in a paper titled "From pointing
to pondering" (Card et al., 1988).
2. Cognitive Science & Gibsonian perspectives
• 1986: Norman & Draper’s book “User-Centered System Design” with a focus on Don
Norman's paper "Cognitive Engineering".
• 1988: Don Norman’s book “The Psychology of Everyday Things” (1988), which
almost immediately acquired cult status in the design community.
Secondary sources:
• Gentner and Stevens’ (eds.) (1983) classical collection of papers “Mental Models”.
3. Phenomenology and Speech-Act Theory
• 1986: Winograd & Flores’ book "Understanding Computers and Cognition" was
originally written as part of the AI debate of the 80s, but ended up having a
much stronger impact in the HCI community.
Secondary sources:
• Capurro's critique of Winograd & Flores (Capurro, 1992).
• Dreyfus’ "Being-in-the-world", which is currently the most influential commentary on
Heidegger’s “Being and Time” in the English-speaking world (Dreyfus, 1991).
4. Ethnomethodology
• 1987: Lucy Suchman's book "Plans and Situated Action" was the first book on
human-computer interaction written by an ethnomethodologist.
Secondary source:
• Hutchins’ book "Cognition in the Wild" (1994) does not deal with computers, but it is
still relevant as an example of this approach to the study of work and technology. It is
also relevant for (Norman, 1988).
5. Activity Theory
• 1991: Susanne Bødker's "Through the Interface" represents the first introduction of
Russian activity theory into the field of HCI.
Secondary source:
• Nardi (ed.): "Context and Consciousness" (1996a) is a collection of papers
representing state-of-the-art in the application of Activity Theory to HCI.
6. Semiotics
• 1991: P.B. Andersen's book "A Semiotics of the Computer" applies semiotics to the
study of human-computer interaction.
Secondary source:
• P.B. Andersen's paper "Computer Semiotics" (1992). Here Andersen clarifies some of
the difficulties in his book.
7. Dramatic Theory
• 1991: Brenda Laurel's book "Computers as theater" is an application of dramatic
theory to HCI.
Secondary source:
• The basic ideas in the book had already appeared in her paper “Interface as Mimesis”
(Laurel, 1986) in Norman & Draper’s collection “User-Centered System Design”.
By using a major work like Card, Moran, and Newell's book (1983) as a representative of a
research tradition, there is a danger of doing injustice to the tradition by attacking a straw
man. I clearly see this danger, and try to supplement with secondary sources when this is
possible. One argument for focusing on frequently cited major works is that they often
represent the common understanding of a theory within a certain research community. This is
not to say that I take a layman's perspective on the theories, but that my main focus is on the
theories as they have been presented to the HCI community.
By looking at the professional background of the above mentioned authors, we find that
only Newell, Winograd, and Bødker started out in computer science, while Card, Moran,
Flores, Suchman, Turkle, Laurel, Norman, Draper, Nardi, and Andersen all started out in the
humanities or the social sciences. This illustrates the cross-disciplinary nature of the field, but
it also illustrates the relatively limited interest from within traditional computer science in
reflecting on our own tradition. The strong influence from the humanities and the social
sciences has been very important in giving the field an awareness of the context in which the
modern computer exists. By doing research on computer-related problems without training in
traditional computer science, there is on the other hand a danger of losing sight of the
subtleties of the computer as medium.
Development since the Paradigm Works
Since the publication of the “paradigm works” in the late 80s and the early 90s, there have
been some interesting developments.
The popularity of the Activity Theory (AT) approach has grown. Since Bødker’s
introduction of AT to the HCI community in 1990, it has matured into a field with its own
conferences and publications. A number of books have been published on the subject (see
Nardi, 1996a for an overview). This development has not changed Activity Theory, and the
conclusions from the analysis of Bødker’s book still apply.
The Participatory-Design tradition has long had its own conferences (e.g. the IRIS
conferences) and journals (e.g. Scandinavian Journal of Systems Development). With the
exception of Andersen’s clarifications of his position (Andersen, 1992), I have found no
further attempts within this tradition at investigating the interactive aspects of the computer.
The focus has largely been on systems development, organizational studies, and on the
underlying philosophy of the approach.
2.3 The Model Human Processor
From the late 70s, Stu Card and T.P. Moran from Xerox had been cooperating with A. Newell
at Carnegie-Mellon University on the application of cognitive psychology to the study of
human-computer interaction. In 1983 they published their now classical textbook "The
psychology of Human-Computer Interaction" (Card et al., 1983). Here they made explicit
their model of the user as a "Model Human Processor" (MHP).
Figure 8. The Model Human Processor
Figure 8 shows the famous drawing of their model as it appeared in the book (p. 26).
Human-Computer Interaction is here described as symbolic information processing. They
borrowed their terminology from computer science, systems theory, and Shannon and
Weaver's communication theory. They did not state that human beings actually work like
computers, and they made explicit that their theory is a model. Despite these reservations the
model has been very influential in the HCI community.
The model describes the user’s “cognitive architecture” as consisting of three processors,
four memories, 19 parameters, and 10 principles of operation. In their model, information
reaches the perceptual processor as chunks. These pieces of information are stored in image
store, transferred to working memory, manipulated by the cognitive processor, and sometimes
stored in long-term memory. Sometimes the activity in the cognitive processor leads to
commands being sent to the motor processor, leading to muscles being contracted and actions
being performed.
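To give a feel for how the model is used, the following minimal sketch computes a predicted simple reaction time from the nominal cycle times Card et al. (1983) give for the three processors (roughly 100 ms perceptual, 70 ms cognitive, 70 ms motor). The code and its names are illustrative only, and are not taken from their book:

    # Nominal Model Human Processor cycle times (Card et al., 1983).
    PERCEPTUAL_CYCLE_MS = 100  # tau_p: perceive a stimulus as a chunk
    COGNITIVE_CYCLE_MS = 70    # tau_c: one cognitive decision cycle
    MOTOR_CYCLE_MS = 70        # tau_m: issue one motor command

    def simple_reaction_time_ms(cognitive_cycles: int = 1) -> int:
        """Predicted time from stimulus onset to a simple motor response."""
        return (PERCEPTUAL_CYCLE_MS
                + cognitive_cycles * COGNITIVE_CYCLE_MS
                + MOTOR_CYCLE_MS)

    # Pressing a key as soon as a light appears: roughly 240 ms.
    print(simple_reaction_time_ms())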
Scenario I
In their model, the interaction between Mrs. X and the switch-light system can be described as
follows:
1. A CHUNK (throughout this chapter, all technical terms are CAPITALIZED) named
"switch" reaches Mrs. X’s PERCEPTUAL PROCESSOR through her
VISUAL CHANNEL.
2. This CHUNK reaches the VISUAL IMAGE STORE in her WORKING MEMORY within approx.
100 milliseconds.
3. Her COGNITIVE PROCESSOR processes this CHUNK and produces a SIGNAL "touch
switch", which is sent to her MOTOR PROCESSOR.
4. Her MOTOR PROCESSOR takes care of moving her hand to the right position.
5. The CHUNK "Light is off" reaches her PERCEPTUAL PROCESSOR.
6. The CHUNK "Switch is green" reaches her PERCEPTUAL PROCESSOR.
7. The CHUNK "Light is off" is passed to her VISUAL IMAGE STORE.
8. The CHUNK "Switch is green" is passed to her VISUAL IMAGE STORE.
9. Based on these CHUNKS her COGNITIVE PROCESSOR emits the SIGNAL "Leave room" to
her MOTOR PROCESSOR.
The TASK of turning off the light is a SUB TASK of the larger TASK of leaving the room.
The whole interaction is thus a good example of rational goal-directed behavior.
The most obvious blind spot in this description is that it sees the interaction only from the
position of the detached observer. The theory gives no hint whatsoever as to how Mrs. X
actually experiences the interaction, what it means to her, or how it is part of the rest of her
life. The latter is only slightly touched upon by saying that the interaction is a "sub task".
The problem with applying a task analysis is that the task structure is derived from
observation only, and not from asking Mrs. X what she is actually doing. Like most of us,
Mrs. X would probably not feel very comfortable with having her whole life reduced to a set
of tasks.
Scenario Ib
The theory has no way of dealing properly with situations where devices behave in an
unexpected way. Scenario Ib would have been described as just another sequence of chunks
reaching the Cognitive Processor, and her experience of surprise would go unnoticed. The
description is identical to scenario Ia up until step 4. From step 5 it could go:
(4. Her MOTOR PROCESSOR takes care of moving her hand to the right position.)
5. The CHUNK "Light is on" reaches her PERCEPTUAL PROCESSOR.
6. The CHUNK "Switch is green" reaches her PERCEPTUAL PROCESSOR.
7. The CHUNK "Light is on" is passed her VISUAL IMAGE STORE.
8. The CHUNK "Switch is green" is passed to her VISUAL IMAGE STORE.
9. Based on these CHUNKS her COGNITIVE PROCESSOR emits the SIGNAL "Speak out:
'Strange!'" to her MOTOR PROCESSOR.
Scenario II
Scenario II contains a large element of person-to-person interaction. As the MHP approach is
blind to the social dimension, this theory is only able to deal with the surface level of what is
going on here. The theory can be used to describe in detail how each primitive element of the
interaction is performed, but it would not be able to shed any light on Mrs. X's rationale for
performing these actions. It would only be able to deal with the mechanical details of the
interaction in a very fragmented manner.
The blindness to the social dimension would hide the fact that the stack can be seen as a
message from Mrs. X to Mr. Y; actually an interactive message. Card et al.'s theory would not
be able to say anything about how the stack is part of a social interaction between two persons
within a larger cultural context.
Nor would it be able to shed light on how the stack can be seen as a combination of at
least two genres: the informal note and the computer application. It is also blind to the fact
that the stack has an intended meaning. The theory would have given the same resulting
analysis if the text had consisted of meaningless syllables.
Scenario III
An application of Card et al.'s theory to Mr. X's interaction with the puzzle in Scenario III
would again only touch on the surface of the phenomenon. Seen through the glasses of the
MHP theory, this interaction would be just like any other manipulation of objects on a screen.
As the theory has no concept of meaning, it would miss the fact that to Mr. X the puzzle behaved in
an unexpected manner when the squares were placed in a certain combination.
Discussion
We see that the MHP approach works well in describing the quantitative aspects of routine,
low-level interaction. These are also the areas of application in which the more advanced
versions of this model have been most successful (e.g. Kieras, 1988). However, when the
tasks are not routine, such models quickly lose their predictive power (see Baecker, 1995,
pp. 579-581 for a discussion).
The Model Human Processor should not be taken as Card, Moran, and Newell’s final say
about human-computer interaction. As Card and Moran put it in a later paper (1986, p. 190):
“However, the book represented only the main line of our research efforts that fit
together into a tightly knit view. There were many other areas of our work that we
decided not to put in the book, such as the issues of learning and of the user’s mental
model.”
2.4 Cognitive Science
In 1986, the psychologists Steven Draper and Don Norman edited a book on "User-Centered
System Design" (UCSD) (Norman and Draper, 1986) with contributions from 19 researchers
in the HCI community. As they put it (p. 2):
"This book is about the design of computers, but from the user's point of view".
In “The Psychology of Everyday Things” (POET) (Norman, 1988) that came out two
years later, Norman further developed some of the ideas presented in UCSD and added some
new material, mainly on Gibson. POET was written in a less academic voice, and the book soon
acquired a cult status in the design community. The two books complement each other, and
are here treated together.
Figure 9. Norman's model of human-computer interaction.
Norman's contribution in UCSD is called "Cognitive Engineering". This paper is a clear
exposition of the Cognitive Science approach to HCI. He here decomposes human action into
seven stages (p. 41):
1. Establishing the Goal.
2. Forming the Intention
3. Specifying the Action Sequence
4. Executing the Action
5. Perceiving the System State
6. Interpreting the [System] State
7. Evaluating the System State with respect to the Goals and Intentions.
Figure 9 shows his graphical representation of the seven stages as it appears in the book
(p.42). In a footnote he says that the number of stages is rather arbitrary, but he insists that
"a full theory of action, whatever its form, involves a continuum of stages on both the
action/execution side and the perception/evaluation side". (p. 41)
Another issue central to Norman in this paper is the notion of mental models. Users form
mental models of the systems they are using:
"I believe that people form internal, mental models of themselves and of the things
and people with whom they interact. These models provide predictive and
explanatory power for understanding the interaction". (p. 46)
He does not provide any example of such models, but refers to other contributions in the
book, i.e. Riley and diSessa.
Mary Riley (1986) uses a text editor as an example in her discussion of mental models.
The model she builds is very similar to the kind of representation used by AI researchers at
that time. Riley’s model consists of semantic nets representing the internal structure of the
text editor and of the syntax and semantics of its commands.
Andrea diSessa (1986) makes a distinction between three kinds of mental models:
structural models, functional models, and distributed models. Structural models describe the
system. This is similar to Riley's models of the text editor. Functional models say something
about how to get things done. This corresponds to Riley's model of command syntax and
semantics. Distributed models make use of one or more models from other domains that are
applied metaphorically to an understanding of the new domain.
The mental models of Riley and diSessa belong within the cognitive science paradigm.
The models are symbolic representations of the user's understanding. It is assumed that, when
interpreted by some cognitive architecture, these models can reproduce human behavior close
to what is observed with real users. This is not to say that these authors believe that mental
models literally exist in the brains of the users, but there is reason to believe that they think
such symbolic models reflect important properties of human perception and understanding.
As such, they belong to a long tradition in cognitive science. The book "Mental Models",
edited by Gentner and Stevens (1983) gives a good overview of the early work in cognitive
science on mental models.
Norman’s use of Gibson’s ecological approach
In his book "The psychology of everyday things" Don Norman (1988) draws on a lot of
theoretical traditions including Gibsonian psychology, Cognitive Science, Industrial Design,
and Architecture. He presents a set of useful concepts of which only "affordance" will be
applied here to the analysis of the scenarios. "Affordance" is a concept Norman has adopted
from Gibsonian psychology. Norman defines affordance as:
"...the perceived and actual properties of the thing, primarily those fundamental
properties that determine just how the thing could possibly be used... A chair affords
sitting... Glass is for seeing through, and for breaking. Knobs are for turning...". (p. 9)
A designer might say that affordance is “form indicating function”. Norman's interpretation
differs somewhat from the standard understanding of Gibson. In a footnote (p. 219) he states:
"my view [on affordance] is in conflict with the view of many Gibsonian
psychologists, but this internal debate within modern psychology is of little relevance
here".
Some years later, Norman clarified his use of the concept further in a public discussion on a
newsgroup:
“J.J. Gibson invented the term affordances, although he doesn't use them for the same
purpose I do. I got the idea from him, both in his published writings and in many
hours of debates with him. We disagreed fundamentally about the nature of the mind,
but those were very fruitful, insightful disagreements. I am very much indebted to
Gibson. Note that in The Design of everyday Things, the word 'affordance' should
really be replaced (if only in your mind) with the phrase 'perceived affordance'. Make
that change and I am consistent with Gibson.” (Norman, 1994)
From a cognitive psychology perspective, it might seem a bit odd to include J.J. Gibson’s
ecological theory of perception in a chapter on Cognitive Science. The main reason for
placing Gibson here is that his theories were introduced to the HCI community mainly
through Norman. In his books, he never describes Gibson’s ecological approach to human
cognition and perception as incommensurable with the symbol-processing approach. A different
reading of Gibson could have resulted in a break with the foundations of cognitive science,
but as long as no such reading has been attempted, Norman’s eclectic application of Gibson to
the study of human-computer interaction can be catalogued as Cognitive Science.
The latter is not to say that Norman’s introduction of the “affordance” concept to the
Interaction Design community has not been fruitful. To the contrary, it might be the single
most useful concept in the analysis of how people interact with graphical user interfaces. In
the first iteration of most user-interface design projects, most of the design flaws are simple
“affordance” errors, i.e. cases where the designers have not been able to communicate to the
user which of the graphical elements on the screen are interactive.
Distributed cognition
In UCSD, Norman has a paper together with Hutchins and Hollan on direct-manipulation
interfaces (Hutchins, Hollan, and Norman, 1986). They here made a first attempt at describing
the psychology of direct-manipulation interfaces, i.e. user interfaces where the user works by
manipulating graphical objects on the screen. This led them away from a linguistic/cognitivescience
perspective on information, to a focus on representation (“Form of Expression”).
They expressed these two aspects of an interface in the concepts “Semantic Directness” and
“Articulatory Directness”. The first is concerned with meaning in a traditional “abstract”
sense, while the latter “has to do with the relationship between the meanings of expressions
and their physical form” (pp. 109-110).
In POET, Norman continues this line of thought in his distinction between “Knowledge
in the Head” and “Knowledge in the World”. The first is what we know and can recall
without being in any physical context. An example of such is “The capital of Norway is Oslo”. The
next is knowledge that “exists in the world”; knowledge that we can only make use of when
we are in a concrete physical environment with those artifacts present. An example of the
latter is the abacus. For a trained user of an abacus, it enables her to do calculations that she
can not do without the physical abacus present. Her skills can not be de-contextualized from
the physical artifact. Norman uses coins as examples: Most people can easily distinguish
between all the coins of their currency when the coins are presented to them, while without
the coins present few can describe in any detail the coins they use every day.
In his view on distributed cognition, Norman has probably been inspired by the work of
his co-author Hutchins. In a more recent study (Hutchins, 1994), Hutchins applies this view
on cognition to an ethnographical study of ship navigation. To Hutchins, cognition is not only
distributed between users and their artifacts, but also between users. He argues that to
understand what happens when a crew brings a supertanker to port, one has to include the
users, their communication, and their use of navigation artifacts. The cognition is distributed,
and can not be adequately understood with a focus only on either individual cognition, artifact
use, or communication patterns.
Scenario Ia
A description of Scenario Ia could consist of two parts: Mrs. X's mental model of the switch-light
system and a description of her cognitive processes.
Figure 10. Mrs. X's mental model of the switch.
Many mental-model formalisms are possible. As the choice of formalism is of little
importance here, we will use one of the simplest formalisms that fits the task: a Finite State
Automaton (FSA).
The description of Scenario Ia would then go as follows:
The FSA has two STATES: 0) light-off and 1) light-on. It has two TRANSITIONS between the
STATES, both being TRIGGERED by the ACTION touch-switch. In the STATE light-off, the
roof light is off and the switch light is on. The opposite is the case in the other STATE.
Figure 10 shows Mrs. X’s MENTAL MODEL of the switch.
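To make the formalism concrete, the two-state automaton of Figure 10 can be sketched as follows. The class and its names are illustrative only:

    # A minimal sketch of the FSA in Figure 10.
    # State 1 is light-on, state 0 is light-off (switch glowing).

    class SwitchLightFSA:
        def __init__(self, state: int = 1):
            self.state = state  # 1 = light-on, 0 = light-off

        def touch_switch(self) -> None:
            # The single action triggers the transition between the two states.
            self.state = 1 - self.state

        @property
        def roof_light_on(self) -> bool:
            return self.state == 1

        @property
        def switch_glowing(self) -> bool:
            # The switch glows exactly when the roof light is off.
            return self.state == 0

    fsa = SwitchLightFSA(state=1)   # Mrs. X finds the light on
    fsa.touch_switch()              # the ACTION touch-switch
    assert not fsa.roof_light_on and fsa.switch_glowing  # goal STATE 0 reached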
Her cognitive process consists of seven stages:
1. Establishing the GOAL:
The GOAL is to turn off the light.
2. Forming the INTENTION:
As she knows that the roof light is on and that the switch light is off, she
knows that the system is in STATE 1. She also recognizes that the GOAL will be
fulfilled if she can make the system reach STATE 0. The model tells her
that to do this she has to PERFORM the ACTION touch-switch.
3. Specifying the ACTION SEQUENCE:
To reach the goal she has to touch the switch.
4. Executing the ACTION:
She next ACTIVATES her MOTOR SYSTEM and moves her hand.
5. Perceiving the SYSTEM STATE
She now sees that the roof light is off and that the switch is glowing.
6. Interpreting the [SYSTEM] STATE:
From this she concludes that the SYSTEM is now in STATE 0.
7. Evaluating the SYSTEM STATE with respect to the GOALS and INTENTIONS:
SYSTEM STATE 0 satisfies the GOAL, and this cognitive PROCESS comes to an
end.
The blind spots for this scenario are very similar to the blind spots produced by the Model
Human Processor approach. What we have gained by introducing a mental model is a first
step towards a description of Mrs. X's understanding of the switch, but the analysis is still
formal in the sense that it is based on a general theory of human cognition, and not on
empirical data from the interaction as experienced by the user. The workings of the switch are
described as an FSA. Few switch-users have ever heard of a Finite State Automaton.
Scenario Ib
The malfunctioning switch of Scenario Ib would have shown up in the analysis in stages 6
and 7. Her inability to reach the goal would have triggered another "goal establishment" to
solve that problem.
In Scenario Ib, the light did not go off when she touched the switch. The green light in
the switch started glowing as if she had turned the light off. This behavior is inconsistent with
Mrs. X's mental model of the switch-light system. Her verbal response to the event is an
indication that she observes this inconsistency, but there is no way that Norman's seven-stage
cognitive process could have led her to that conclusion. Norman's model describes
uninterrupted routine action, and is unable to account for reactions to the unexpected. In its
most formal interpretation it is consequently blind to learning processes.
Affordance
If we add to this analysis an application of Norman's concept of affordance, we get a
somewhat different interpretation:
To Mrs. X, the switch is an object that AFFORDS turning the light on and off. She turns
the light off by touching the switch.
The “affordance” of the switch as experienced by Mrs. X reflects her previous interactions
with the switch. The concept of affordance would shed no light on the situation where the
switch was out of order. This would simply have been described as an affordance that did not
"hold what it promised".
As already mentioned, Norman’s use of “affordance” differs from Gibson’s original use.
In a footnote Norman defines it:
"I believe that affordances result from the mental interpretation of things, based on
our past knowledge and experience applied to our perception of the things about us".
(Norman 1988, p. 219)
The important word here is “interpretation”. According to Norman’s view, the “objective”
sensory data are interpreted by the subject, and this gives rise to meaning. Expressed
mathematically this could go somewhat like: m = i(d), where m is the meaning, i() is the
function of interpretation, and d are the sense data.
Gibson would probably have objected to such use of his term. Gibson (1979) says about
affordance:
“The theory of affordances is a radical departure from existing theories of value and
meaning. It begins with a new definition of what value and meaning are. The
perceiving of an affordance is not a process of perceiving a value-free physical object
to which meaning is somewhat added in a way no one has been able to agree upon; it
is a process of perceiving a value-rich ecological object.” (p. 140).
This view is in conflict with the more traditional mental-model analysis. Applied to Scenario
I, Mrs. X’s perception of the switch would require no internal representations of the switch, of
her interpretation of the switch, or of her previous interactions with the switch.
Mrs. X would simply perceive the switch directly as a “value-rich ecological object” with a
certain affordance.
Scenario II
A Cognitive Science analysis of Scenario II would be much more complex than the analysis
of Scenario I. A full analysis of the user's work with HyperCard would create a very large
hierarchy of dynamically changing cognitive tasks. This hierarchy would have to be
accompanied by descriptions of Mrs. X's evolving mental models of both HyperCard and of
the stack she was creating.
Without a combination of a think-aloud protocol from her work and a detailed interview
with Mrs. X, it would be hard to be precise concerning her mental models. Let us assume that
we had access to such data, and let us for one moment also forget about the methodological
problems related to interpreting these data. Let us first concentrate on how she perceives the
stack she is creating.
Following diSessa (1986), a possible mental model of the stack could consist of a
structural model telling us something about the structure of the stack and a functional model
telling us something about how it works. Young (1983) provides an analysis of different
approaches to mental modeling of interactive devices. He suggests using task/action mappings
as a basis for such models. This is very close to diSessa's conclusions.
Figure 11. Mrs. X's structural model of the stack.
A structural model of Scenario II is shown in Figure 11. The dotted lines in the figure indicate
that Mrs. X has a notion about what parts of the stack can be changed by the user. The
diagram is a semantic network that represents the following description of the stack:
The stack consists of two main parts: a text and a simulated switch-light system. In
addition to a static text, the text contains some parameters that can be set by the user to
control the simulation. These parameters are visual feedback and switch sensitivity.
Visual feedback can be true or false. Switch sensitivity can be "on press" or "on release".
The simulation consists of a switch and a light bulb. The switch can be tried out with
different settings of the parameters.
The functional model could be simply a set of rules:
To test out the switch for a certain set of parameters do in sequence:
1. Set the parameters.
2. Try out the switch
In more detail:
1. To set a parameter, do one of:
1a. If it is a Boolean parameter, turn the check box on or off by clicking.
1b. If it is a set of radio buttons, select a value by clicking in the
corresponding radio button.
2. To try out the switch do:
Press on the switch and observe what happens.
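Taken together, the structural and functional models can be summarized in a small sketch of the simulated switch with its two user-settable parameters. This is a simplification in Python, not actual HyperCard code, and all names are illustrative only:

    # A sketch of the simulated switch in Mrs. X's stack, with the two
    # parameters Mr. Y can set through the check box and the radio buttons.

    class SimulatedSwitch:
        def __init__(self, visual_feedback: bool = True,
                     sensitivity: str = "on release"):
            self.visual_feedback = visual_feedback   # check box: True / False
            self.sensitivity = sensitivity           # radio buttons: "on press" / "on release"
            self.light_on = False
            self.glowing = self.visual_feedback and not self.light_on

        def _toggle(self) -> None:
            self.light_on = not self.light_on
            # The switch glows only when feedback is on and the light is off.
            self.glowing = self.visual_feedback and not self.light_on

        def mouse_press(self) -> None:
            if self.sensitivity == "on press":
                self._toggle()

        def mouse_release(self) -> None:
            if self.sensitivity == "on release":
                self._toggle()

    # Mr. Y tries one combination of the parameters:
    switch = SimulatedSwitch(visual_feedback=True, sensitivity="on release")
    switch.mouse_press()    # nothing happens yet
    switch.mouse_release()  # the light goes on, the glow goes off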
Mrs. X’s mental model of HyperCard would have a similar structure, but it would be much
more complex and would have to include a reference to her mental model of the underlying
operating system. Her mental model of HyperCard would also have to be able to describe how
she knows that she is building her stack "within" HyperCard.
Norman’s affordance concept would apply only to the GUI elements of the HyperCard
stack: the radio buttons, the check box, and the button. The “Knowledge in the Head/World”
distinction would tell us that the visually presented aspects of the stack represent “Knowledge
in the World”, while hidden aspects of its nature, like the fact that it is a HyperCard stack, are
“Knowledge in the Head”.
Compared to the Model Human Processor approach, we see that Cognitive Psychology
can account for the meaning of the stack as seen by Mrs. X. This is true only if we have
access to verbal protocols from the interaction, i.e. if the model is empirically based. In the
analysis of Scenarios Ia and Ib, we did not make this assumption. Many references to mental
models ignore this difference between an idealized model of how users could interpret a
phenomenon, and an empirically-based model of how one or more users actually interpreted
their interactions.
One blind spot has to do with the lack of a social dimension in the analysis. Her
understanding of the stack as it shows up in the analysis is only related to the stack itself
without reference to the fact that it is intended as a message to Mr. Y.
Cognitive Psychology, at least the part of it that has so far been applied to the analysis of
computer use, does not contain any notion of software as a carrier of meaning in human-human
communication. In Scenario II, Mr. Y's intended interaction with the stack is part of
the content of the communication between the two. Cognitive Psychology is blind to this
aspect of Scenario II.
Scenario III
Scenario III lends itself well to a "mental model" analysis:
First, Mr. X sees the game as consisting of squares on a background as if the squares were
"real". He consequently applies all the relevant RULES he knows from dealing with the
physical world. This leads him to see the squares as being in the foreground "over" the
background. His first MENTAL MODEL of the board can be described as a real-world
METAPHOR.
The scenario lends itself easily to a TASK ANALYSIS as we have here a well-defined TASK,
i.e. to figure out how to solve the puzzle. At some point the user realizes that he is not
able to reach his GOAL while insisting on his real-world METAPHOR. Hinted to the solution
by his wife, he tries out a different MODEL. In this MODEL it is possible to move squares
even if earlier on they did not exist as movable squares. The MODEL can be stated
something like: "Where ever there is something that looks as if it is a square, it is a
square". This involves swapping foreground/background. This reframing led Mr. X to a
new MENTAL MODEL of the board, which enabled him to reach the GOAL of his TASK.
With an inclusion of Gibson and Distributed Cognition, we must add that the squares afford
dragging. As all relevant aspects of the puzzle are visually present, the mental model must be
understood as “Knowledge in the World”. The puzzle is as such a good example of direct-manipulation
interfaces.
In the following I will borrow a metaphor from philosophy of science and refer to Mr. X's
experience of reframing as his "paradigm shift". Thomas Kuhn actually referred to a classical
psychological experiment by Bruner & Postman involving reframing as an illustration of his
concept of paradigm shifts in science (Kuhn, 1962, p. 63).
The blind spot here has to do with this shift. The mental-model theory is useful for
identifying Mr. X’s mental models before and after the shift, but it sheds no light on the shift
itself.
In another paper in the same collection, Williams et al. (1983) point to a similar blind
spot in the mental-model theory. They list some experimental data that they were not able to
explain with this theory. In a psychological experiment involving reasoning about simple
physical systems they found that one subject often used more than one mental model. They
stated:
"We consider the use of multiple models to be one of the crucial features of human
reasoning" (p. 148).
This feature was listed among what they considered to be the blind spots of Cognitive Science.
Discussion
We see that Cognitive Science has provided us with a set of useful concepts. First of all, there is the
very idea that users will in most cases have an understanding of an artifact that is very
different from the technical understanding of the designer. This focus on “the user’s
perspective” was old news to psychology, but to a field with deep roots in engineering and the
natural sciences, this was important input. It has made software designers realize that in
addition to making things work, there is a need for some empathy with the user.
Don Norman’s application of Gibsonian psychology to HCI has made concepts such as
“affordance” and “constraints” part of the field’s language. The affordance concept has been
the most successful. It has given designers a way of talking about what a product tacitly
signals about its possible uses. This is included in the term “product semantics” from
Industrial Design, but Norman’s concept is more precise in pointing to use.
One of the reasons for the success of the mental-models approach in the HCI field, is
probably that it enabled software developers to express their understanding of the user in a
language they were familiar with, i.e. formal systems.
The fact that Cognitive Psychology makes use of the same language of description as AI,
and rests on the same epistemological assumptions, makes debates on AI highly relevant for
a discussion of “mental modelling”.
2.5 Phenomenology and Speech Act Theory
The first systematic attempt at coming up with an alternative foundation for software design
was done by Terry Winograd and Fernando Flores in their book "Understanding Computers
and Cognition, a New Foundation for Design" (1986). As mentioned in Chapter 1, the book
was targeted mainly at the AI community as a critique of this approach, but it has become
influential also in the HCI community.
Winograd and Flores (W&F) started out by analyzing and rejecting what they saw as the
ruling paradigm in computer science. They named this paradigm "the rationalistic tradition",
and showed that its epistemological basis is shared by most of Anglo-American science and
philosophy. As "an alternative foundation for design" they presented the works of Maturana,
Heidegger, and Austin, three important thinkers whose philosophies in different ways break
with important aspects of Anglo-American epistemology.
Maturana was included in the book because his writings marked their first step away
from Cognitive Science towards Phenomenology and Hermeneutics. We will here only
mention that Maturana and his co-author Varela found it necessary to reject the cognitivist
foundations when they wanted to describe how the brain works.
The speech act theory of Austin and Searle was included to challenge the implicit
assumption in cognitive science that the meaning of an utterance can be determined
independently of the context in which it is uttered. Austin states that we do things with words.
He calls these acts speech acts.
A considerable part of the book is devoted to an interpretation of Heidegger. From the
vast work of this philosopher, they picked the following concepts and insights:
• The impossibility of a neutral point from where it is possible to make explicit all our
implicit beliefs and assumptions.
• The primacy of practical understanding over detached theoretical understanding.
• Heidegger’s rejection of "the assumption that cognition is based on the manipulation of
mental models of representations of the world.." (p. 73).
• His rejection of understanding interpretation as an activity detached from social context.
To analyze tool use, they introduced Heidegger’s terms breakdown, readiness-to-hand, and
present-at-hand. They used Heidegger’s example of the carpenter hammering in a nail:
“To the person doing the hammering, the hammer as such does not exist… The
hammer presents itself as a hammer only when there is some kind of breaking down
or unreadiness-to-hand.” (p. 36)
To say that the hammer does not exist must be understood metaphorically. What it means in
this context is that the one using the hammer does not at that specific time deal with the
hammer as an object in the world. To the one doing the hammering, focus is on the task at
hand, and on all the materials and other elements involved. If a nail is being hammered into a
piece of wood by a carpenter, the nail and the piece of wood exist as objects for the carpenter,
but the hammer is “invisible/transparent” in use.
Heidegger is not easy reading, as he invented a complex terminology (see Chapter 1.3).
To avoid the subject-object connotations of our words for the human being, he invented the
term Dasein (German for “being there”) to denote the subject in the world (Being-in-the-world).
Scenario Ia
Winograd and Flores would most certainly have applied Heidegger to scenario Ia:
The analysis starts with what is given, i.e. that Mrs. X is THROWN into the situation of
being a social being in this technological world. She has decided to leave her home office.
If we assume that this is not the first time she leaves that room, the switch is part of "the
BACKGROUND of READINESS-TO-HAND that is taken for granted without explicit recognition
or identification as an object." (p. 36).
The act of turning off the light is part of her unreflective concernful acting in the world. She has no mental representation of the switch, the light, or the act itself. DASEIN is simply acting. To the Heidegger of W&F, the working switch is part of the unreflective BACKGROUND of DASEIN as long as it works properly. It is part of what is READY-TO-HAND (Zuhandenes).
Scenario Ib
If for some reason the light had not been turned off when she touched the switch, the
unexpected behavior would have created a BREAKDOWN, and the switch would have
emerged to DASEIN as PRESENT-AT-HAND (Vorhandenes).
An alternative Heideggerian interpretation of Scenarios Ia and Ib can be achieved if we
include an aspect of Heidegger's theory that is implicitly introduced in W&F's examples.
On p. 37 they use the example of word processing:
"I think of words and they appear on my screen. There is a network of equipment that
includes my arms and hands, a keyboard, and many complex devices that mediate
between it and the screen".
Here they implicitly introduce Heidegger's tool concept (Equipment = Zeug) and the concept
of mediation. The story would now go:
Mrs. X has decided to leave her home office. This includes turning off the light. The act
of turning off the light is MEDIATED through the switch. During this act, the switch and all
the electric devices between the switch and the light become part of the EQUIPMENT of
DASEIN. In actual use it becomes TRANSPARENT to the user. The light is thus a "thing" that
is turned off.
What are the blind spots here? The first blind spot has to do with the level of detail that an
application of Heidegger's theory can provide. In this example the story contains no detailed
description of the working of the switch, nor what bodily movements are necessary to operate
it. The switch itself and Mrs. X's "mental model" of it appear as "black boxes" through
Heidegger's glasses. There is for example no way of deducing from the description that the
switch has toggle behavior. This is not to say that it is not possible to extend a Heidegger-style analysis down to the micro level of human-artifact interaction, but neither Heidegger nor W&F give any advice on how this should be done.
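As a contrast, the kind of micro-level detail that stays hidden behind Heidegger's glasses is easy to state in the language of the mental-model analysis in Chapter 2.4. The following is only an illustrative sketch of the switch's toggle behavior as a finite state automaton; the coding, the names, and the use of Python are my own assumptions:

class SwitchLightFSA:
    # A minimal finite state automaton for the switch-light system in Scenario Ia.
    def __init__(self):
        self.light_on = True          # the lamp is lit when the scenario starts
        self.switch_glows = False     # the green glow is visible only when the light is off

    def touch_switch(self):
        # The only user input: touching the switch toggles both state variables.
        self.light_on = not self.light_on
        self.switch_glows = not self.light_on

fsa = SwitchLightFSA()
fsa.touch_switch()                    # Scenario Ia: the light goes off and the switch glows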
The lack of detail should not come as a surprise if we take into consideration Heidegger's rationale for analyzing tool use in his "Being and Time". According to Capurro (1992), Heidegger's tool analysis was not intended to say anything specific about modern technology. It was included to help answer the more general philosophical question:
"How do we become aware of the world in terms of the open dimensions of our
existence in which we are normally immersed?" (Capurro, 1992, p. 367).
Heidegger's answer to this question is that we become aware of the world through
disturbances, what W&F call breakdowns. Capurro argues that W&F have an incomplete
understanding of Heidegger's concept of breakdown:
"Winograd & Flores call such a disturbance breakdown, thus simplifying the
Heideggerian terminology and missing the point. What happens in these cases is not
simply that tools become present-at-hand (Vorhandenes), instead of their former
practical way of being as ready-to-hand (Zuhandenes), but that the world itself, i.e.
the possibility of discovering beings within a structure of reference, becomes
manifest". (p. 367)
We should consequently not expect that an instrument developed for a different purpose
should be directly applicable to the analysis of human-computer interaction.
The second blind spot that emerges from a Heideggerian analysis of Scenario Ia has to do
with the way in which the switch indicates its availability, its “affordance”. In the scenario it
does this both through its shape and through its green glow. W&F did not include Heidegger's
treatment of signs in their book. Dreyfus (1991) addresses this question when he deals with
how Heidegger is able to explain that we also perceive the world when there are no
disturbances.
"Can we be aware of the relational whole of significance that makes up the world,
without a disturbance?" (p. 100)
Heidegger has a sign concept to deal with this problem, but to him a sign is always something
that functions in a practical context for those who "dwell" in this context. A sign can never
have a meaning independent of context and interpreter. Applied to Scenarios Ia and Ib, the
introduction of a Heideggerian sign concept would have added to the story that the green glow
for Mrs. X indicated that the switch was off, and that it could be turned on.
Scenario II
The Heidegger of W&F is not able to shed very much light on Scenario II. It is possible to
analyze Mrs. X's interaction with the computer as tool use, but W&F's Heidegger would be
blind to the details of the interactive nature of the stack and its role in the communication
between Mrs. X and Mr. Y.
To account for the social aspects of computer use, W&F included Austin and Searle’s
speech act theory. Austin argues that when we speak, we do not simply refer to things.
Everyday conversation consists of "speech acts" of different kinds such as descriptions,
questions, promises, rumors, and flirtations. It is thus meaningless to talk about the meaning
of an utterance without reference to the social context in which it appears.
From Austin's perspective, the stack in Scenario II can be seen as a communicative act
from Mrs. X to Mr. Y. This gives us a new perspective on Scenario II compared to the Model Human Processor, Cognitive Science, Gibsonian psychology, and W&F’s Heidegger. The
most obvious blind spot concerning Scenario II has to do with the nature of this "speech act",
i.e. the fact that its content is interactive. We see that both Heidegger's tool perspective and
Austin's communication theory miss the fact that for Mrs. X an important aspect of her
"message" to Mr. Y is in his intended interaction with the stack.
Scenario III
W&F would probably have described Scenario III as a case of breakdown in the Heideggerian
sense:
Mr. X has a set of expectations about the behavior of the game, but the artifact does not
behave as expected. This leads to a BREAKDOWN. Mr. X consequently starts a process of
reflection on his own action, which results in a new interpretation of the situation. With
this new understanding of the situation he is able to solve the problem.
"Breakdown" as defined by W&F is slightly different from what happens here. W&F's
examples of breakdown are all related to tool malfunction. The computer game in scenario III
never stops behaving as expected. The problem is of a different kind, namely that the user is
not able to reach his goal using the "tool" the way he is doing.
If we see the interactive aspects of the game as a tool, this situation is a breakdown
("disturbance") in the original Heideggerian sense. Cappuro (1992) lists three kinds of
breakdown described by Heidegger, of which the first is called "Conspicuousness"
(Auffälligkeit)7. Conspicuousness is described as
"when we meet tools as something unusable, i.e. "not properly adapted for the use we
have decided upon. The tool turns out to be damaged, or the material unsuitable (p.
102 ["Being & Time"])" (p.368).
Mr. X's understanding of the behavior of the game was unusable for the given task. The fact
that there was nothing wrong with the tool as such, but with Mr. X's understanding of the tool,
makes little difference in a Heidegger-style analysis because his focus was on tools as they are
subjectively experienced and understood in use, and not as decontextualized objects "in the
world".
A second interesting difference has to do with Mr. X's process of reinterpretation resulting from the breakdown. To W&F, a breakdown always leads to a reinterpretation of the situation in the sense that objects appear from the background of "ready-to-hand" activity.
When a tool is damaged, this happens quite spontaneously and often reveals some aspects of
the tools that can be of use in trying to "repair" the situation. In Scenario III, Mr. X’s
reinterpretation is a result of a long internal struggle involving both thinking and
communication with his wife. Again referring to Capurro, this reinterpretation is more similar
to the process Heidegger calls change-over (umschlagen), where objects show themselves as different things for different purposes. Change-over does not cover this case 100%, as Mr. X's purpose has not changed, but it is as close as we get using the terminology of Heidegger.

7 The other two kinds of breakdown are "obtrusiveness" (when something is missing) and "obstinacy" (when something is blocking further use).
In addition to ready-to-hand (Availableness) and present-at-hand (Unavailableness), Heidegger lists Occurentness and Pure Occurentness as two more “modes of being” towards an object. Pure Occurentness is when we simply contemplate an object without any intention whatsoever. Occurentness, on the other hand, is when we decontextualize the object (e.g. the carpenter’s hammer) from its natural environment (i.e. the carpenter’s working hand) and treat it purely as a physical object with certain objective properties like weight and color. Occurentness is the scientific attitude towards the world.
Applied to Scenario III, one could say that Mr. X’s “mode” of being towards the computer game changed from Availableness to Unavailableness when he almost gave up, while it
changed into Occurentness when he started reflecting on the game in cooperation with his
wife.
Discussion
From this analysis of the application of W&F’s book to the three scenarios, we see that it has
added in important ways to our understanding of human-computer interaction.
Most importantly, W&F’s breakdown concept gave us for the first time an instrument to deal with the interrupted interaction in Scenario Ib. The implicit tool perspective has added an
important dimension to the analysis of the interaction. The breakdown concept also allowed
for the first time for an adequate analysis of Scenario III.
By including speech act theory, they provided a way of describing human-human
communication. Two different theories were used to describe human-computer and human-human interaction (Heidegger and Austin respectively). This combination made it impossible
to describe the interactive message in Scenario II. This phenomenon still asks for an
integrated theory.
2.6 Ethnomethodology
Lucy Suchman's book "Plans and situated actions: the problem of human-machine communication" (1987) was written as a contribution to the ongoing scientific discourse at
that time on machine intelligence. As with W&F's book the year before, Suchman's work
turned out to be an inspiration also for the HCI community.
Suchman's main point in her book was that purposeful human action is not primarily
rational, planned, and controlled as we like to think, but is better described as situated, social,
and in direct response to the physical and social environment. This view differed from the
dominant paradigm both in the AI community and in Cognitive Science at that time. She
made the point that ordinary interpersonal interaction is very complex and differs dramatically
from human-computer interaction.
Suchman's theoretical foundation is a combination of ethnomethodology and
phenomenology. Suchman based a lot of her work on detailed analysis of empirical data.
Ethnomethodology provides a powerful methodology for uncovering the subtle details of
human communication.
To Suchman, language as it is used in conversation is mainly indexical in the sense that
the meaning of an utterance can only be understood with reference to the actual situation in
which it is uttered. Following Peirce, she observed that the sign
"is also a constituent of the referent. So situated language more generally is not only
anchored in, but in large measure constitutes, the situation of its use." (p. 62).
The latter means that the situation is a result of the conversation itself. Ethnomethodology
treats non-verbal action in this same way, as both anchored in practice and at the same time as
constituent of practice.
To be able to deal with tool use, Suchman introduced Heidegger in her theoretical
toolbox. For an ethnomethodologist, this is a very natural thing to do, as the field has strong
philosophical roots in phenomenology.
Scenario I
Suchman's cases deal with devices much more complex than the switch in Scenario I, namely copiers. Confronted with Scenario I, she would probably have made a Heideggerian analysis. Her interpretation of Heidegger is more detailed than what is found in W&F's book.
She cites Dreyfus (1991) concerning the transparency of "equipment" in use and the different kinds of disturbances ("breakdowns"). Her analysis would consequently be more or
less identical to the Heideggerian analysis in Chapter 2.5.
As with W&F, this would leave us with at least two blind spots: First, the analysis leaves
out a lot of the details of the interaction. Second, it does not account for what Norman would
have seen as the affordance of the switch.
Ethnomethodology would probably not be of very much help in analyzing Scenario I, as
the interaction is very primitive and does not contain any linguistic data. An analysis is
presented here to introduce her coding scheme.
In her data analysis of human-computer interaction she coded the data into four columns, two for the user and two for the system. For the user she listed to the left everything that was said and done that was not available to the system. To the right she listed those actions that were detectable by the system. For the system she listed to the left the parts of the system state that were available to the user, and to the right she listed the design rationale.
Scenario Ia could be coded into the following table:
User (not available to the system): Mrs. X is walking towards the door.
System (available to the user): Light is on. Switch is visible.
Design rationale: Switch can turn off light.

User (available to the system): Mrs. X touches the switch.
System (available to the user): The light goes off. The switch glows.
Design rationale: Switch can be seen in the dark.

User (not available to the system): Mrs. X leaves the room.

Table 1. The interaction in Scenario I.
As long as everything works smoothly there is not much more to be said about Table 1. If the switch had been damaged and the action had not led to the desired result, this would have appeared here as a table without the last two rows. If Mrs. X's next move had been to touch the switch a second time, this would have been coded as a new row.
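The coding scheme itself is easy to mimic with a simple record structure. The following sketch is my own rendering in Python, not Suchman's notation; it only shows how the rows of Table 1 could be recorded, and how a second touch of the switch would be coded as one more row:

from dataclasses import dataclass

@dataclass
class Row:
    not_available_to_system: str = ""
    available_to_system: str = ""
    available_to_user: str = ""
    design_rationale: str = ""

table_1 = [
    Row(not_available_to_system="Mrs. X is walking towards the door.",
        available_to_user="Light is on. Switch is visible.",
        design_rationale="Switch can turn off light."),
    Row(available_to_system="Mrs. X touches the switch.",
        available_to_user="The light goes off. The switch glows.",
        design_rationale="Switch can be seen in the dark."),
    Row(not_available_to_system="Mrs. X leaves the room."),
]

# A damaged switch followed by a second touch would simply add one more row:
table_1.append(Row(available_to_system="Mrs. X touches the switch a second time."))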
The blind spot with such an analysis of non-verbal human-artifact interaction is that it
only records the user's actions. To say anything about what an action means to the user, we
have to make use of other methods.
Scenario II
Given that we had access to video recordings of both Mrs. X and Mr. Y's interaction with
their computers, Scenario II would have provided a practitioner of ethnomethodology with a
substantial amount of data. These data would have to be coded into tables like Table 1 with
columns for the human actions, the computer-generated responses, and the design rationale
behind these responses. To give a feel of what this could look like, let us perform an analysis
on a small section of these two imaginary videotapes.
User (not available to the system): She writes code on paper.
System (available to the user): The switch button is selected. She is in "button mode".

User (available to the system): She opens a window on the code of the switch.
System (available to the user): The code is available as text.
Design rationale: The code can be read, modified, and saved.

User (available to the system): She changes the code to check for the parameter.
System (available to the user): The new code is there as text.

User (available to the system): She saves the code and enters "run mode".
System (available to the user): The hand appears to indicate "run mode".
Design rationale: It is important to provide visual clues for modes.

User (available to the system): She sets "visual feedback" to "no".
System (available to the user): The cross disappears.
Design rationale: Check boxes are WYSIWYG.*

User (available to the system): She presses the switch button.
System (available to the user): The "light" turns off, but no "glow".
Design rationale: This is the simulation.*

User (available to the system): She sets "visual feedback" to "yes".
System (available to the user): A cross appears in the check box.
Design rationale: Check boxes are WYSIWYG.*

User (available to the system): She presses the switch button.
System (available to the user): The "light" turns off, and the switch "glows".
Design rationale: This is the simulation.*

Table 2. Mrs. X programs her stack and tries it out.
For Mrs. X, let us have a look at how she connects the "visual feedback" radio button to the
behavior of the switch. Let us assume that she does this by programming the switch in such a
way that when a user clicks on it, it checks the state of the "visual feedback" parameter to
determine if it should highlight or not. Let us further assume that having done this, she tests
out the switch to see if her HyperTalk code works. This interaction sequence could be coded
as in Table 2.
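The scenario does not give Mrs. X's actual HyperTalk script, but the logic she is assumed to implement is simple: when the switch button is clicked, it consults the "visual feedback" parameter before deciding whether to show the glow. A hypothetical sketch of that logic, with Python standing in for HyperTalk and all names being my own:

class Stack:
    # A stand-in for the stack's parameter settings.
    def __init__(self):
        self.parameters = {"visual feedback": "yes"}

class SwitchButton:
    def __init__(self, stack):
        self.stack = stack
        self.light_on = True
        self.glows = False

    def on_click(self):
        # Toggle the simulated light; show the glow only if "visual feedback" is set.
        self.light_on = not self.light_on
        if self.stack.parameters["visual feedback"] == "yes":
            self.glows = not self.light_on
        else:
            self.glows = False

stack = Stack()
switch = SwitchButton(stack)
stack.parameters["visual feedback"] = "no"
switch.on_click()        # the "light" turns off, but there is no "glow" (as in Table 2)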
There is a notation problem here in the "Design Rationale" column in the boxes marked
with *. When the user is constructing something in HyperCard, we should state the
design rationale behind HyperCard. The problem arises when the stack designer is testing out
something she has made herself. Should we then list the design rationale of HyperCard (i.e. that users should be able to test their stacks), or should we state the design rationale of the stack designer?
The reason for these problems lies in the nature of Scenario II compared to the cases
provided by Suchman in her book. Suchman's users never design interaction. Someone else
had always designed the intended interaction, and her users could never change these rules.
To be able to analyze software design of the kind we find in Scenario II, we would have to
identify whose design rationale we are talking about when we interpret the interaction.
Let us cut out a similar sequence from Mr. Y's interaction with the stack. Let us assume
that he tries out the switch with the two different settings of the "Visual Feedback" parameter.
This could have been coded into Table 3.
User (available to the system): He opens the stack in HyperCard.
System (available to the user): The hand appears to indicate "run mode".
Design rationale: It is important to provide visual clues for modes.

User (not available to the system): He reads the text.

User (available to the system): He sets "visual feedback" to "no".
System (available to the user): The cross disappears.
Design rationale: Check boxes are WYSIWYG.

User (available to the system): He presses the switch button.
System (available to the user): The "light" turns off, but no "glow".
Design rationale: This is the simulation.

User (available to the system): He sets "visual feedback" to "yes".
System (available to the user): A cross appears in the check box.
Design rationale: Check boxes are WYSIWYG.

User (available to the system): He presses the switch button.
System (available to the user): The "light" turns off, and the switch "glows".
Design rationale: This is the simulation.

Table 3. Mr. Y tries out the stack.
We see that in Table 3 we have the same interaction as at the end of Table 2, but Mr. Y's interaction has a totally different meaning. To him it is a set of design alternatives to be actively tested, while to Mrs. X the same interaction was a test of whether the stack worked the way she wanted.
We see that ethnomethodology highlighted very well the situated nature of human-computer interaction. Meaning is always created in a context, and the same interaction, as
seen from outside, can have many possible interpretations depending on the frame of
reference.
The level of detail that the methodology creates is very helpful. As a researcher it helps you get an empathic understanding of the use situation, but unfortunately ethnomethodology
alone does not help us very much when we want to communicate this understanding. This is
probably the reason why Suchman included a chapter on Heidegger.
In the current analysis, I have used Suchman's coding table as a general method for
describing human-computer interaction. The table was created for the purpose of showing
how little information about the user is actually available to a machine. Applying this table
rigorously here might not be the best way to use ethnomethodology.
In addition to highlighting the details of the interaction, ethnomethodology could also
have provided us with an analysis of Mrs. X and Mr. Y’s organizational context and how their
communication should be understood as part of this.
Scenario III
Let us again assume that we have access to a video recording of the interaction between Mr.
X, his wife, and the computer in Scenario III. We could then have coded this interaction into a
table. An analysis of this table would have identified a breakdown when Mr. X was not able
to solve the puzzle. This breakdown differs from the breakdowns reported in Suchman's case
study in that it is a breakdown by design. The design rationale behind the whole puzzle is to
force the user into a dead-end situation where he has to reframe his conceptions about
foreground and background. In the same manner as Brecht used forced interruptions (Verfremdung) in his plays to force the audience to reflect on the relation between fiction and reality, the forced breakdowns in "Heaven and Earth" create in the user an awareness about
the computer as a medium.
The blind spot here is that ethnomethodology does not provide us with any concepts for
dealing with problem reframing of the kind happening in Scenario III. Used as we have done
here, it is blind to the details of what is happening on the screen and it can never capture "the
rules of the game" as they are perceived by the user.
Discussion
Ethnomethodology, as applied by Suchman to human-computer interaction, has provided us
with a deep appreciation for the contextual nature of meaning in interaction. Especially for
Scenario II it showed how meaning is always created in a situation, and how the interpretation
of the situation in the next moment constitutes the situation.
Comparing this with the theories treated so far, the situatedness of interaction and
communication is only partly implicit in Austin’s speech act theory. W&F’s analysis of
Heidegger does not treat the contextual nature of tool use explicitly. From a certain
“objectivist” reading of W&F it is possible to conclude that Heidegger would have made the
same analysis of a carpenter hammering a nail into a piece of wood, as of a Roman soldier
hammering someone to a cross. This is of course a “devilish” reading of both W&F and
Heidegger, but highlights ethnomethodology’s major contribution.
2.7 Activity Theory
Bødker (1990) presents Activity Theory as an alternative foundation for HCI. Activity Theory
(AT) is a branch of Marxist psychology that was developed in the Soviet Union by followers of the psychologist Vygotsky (1896-1934). The focus of AT is on
individual and collective work practices, but the framework can easily be applied to other
domains of human life. AT is presented here as it appears in Bødker's book.
Simply stated, the theory breaks human work down into three levels and two aspects. The
topmost level in the analysis is called activity. Examples of activities are: traveling from town
A to town B; cleaning the house; writing a letter. Individual activities can be part of collective
activities involving more than one person working for the same goal. Examples of collective
activities are: playing a game of soccer; landing the first man on the moon; constructing a
software product.
Individual activities are composed of actions. An action is something the subject is
conscious of doing. The activity of writing a letter on a PC could include the following
actions: powering up the PC, starting the word processor, typing in the letter, saving it,
printing it out, turning off the PC.
Actions are composed of operations. Operations are usually not articulated. They can be
articulated in retrospect, but in the actual work situation the worker is not conscious of
performing the operation. Examples of operations are: pressing a key to type in a letter,
moving the mouse to place a cursor, taking a piece of paper out of the printer. Operations are
triggered by the material conditions of the situation.
Every action can have two aspects: a communicative side and an instrumental side. The
communicative side has to do with the action as part of communication with other human
beings. The communicative aspect of writing a letter is that it is a way of communicating with
the recipient. The instrumental side of an action is everything having to do solely with the
material aspects of the work. The instrumental side of writing a letter on a computer is the
changes of computer state involved. The AT research tradition makes a strong point of not
using communication to denote interaction with inanimate matter. It is also meaningless in
this tradition to talk about making changes to states in other people's minds. Instrumental
activities are directed solely towards objects; communicative activities are directed solely
towards subjects.
When an action fails to give the anticipated result, a breakdown situation occurs. The
operations that the action is built up from then get conceptualized and might become actions
in the next try. In the same manner, an action that is done many times gets operationalised
(automated) and becomes an operation. Actions both towards objects (instrumental) and
subjects (communicative) are often mediated through artifacts.
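The layering, and the way a breakdown moves material between the layers, can be illustrated with a small data structure. The sketch below is mine and has no status within AT; it only mirrors the three levels and the conceptualization of an operation into an action, using the letter-writing example above:

class Operation:
    # Not articulated while it is performed; triggered by the material conditions.
    def __init__(self, name):
        self.name = name

class Action:
    # Something the subject is conscious of doing; composed of operations.
    def __init__(self, name, operations):
        self.name = name
        self.operations = operations

class Activity:
    # The topmost level of the analysis; composed of actions.
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions

def conceptualize(operation):
    # In a breakdown situation the operation is articulated and may become a
    # conscious action in the next try, built from more primitive operations.
    return Action(operation.name, operations=[])

letter = Activity("writing a letter on a PC", [
    Action("typing in the letter", [Operation("pressing a key"),
                                    Operation("moving the mouse to place the cursor")]),
    Action("printing it out", [Operation("taking a piece of paper out of the printer")]),
])

# If a key press fails to give the anticipated result, the operation is conceptualized:
retry = conceptualize(letter.actions[0].operations[0])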
Bødker sees her approach as a synthesis of the tools approach of Ehn (1988) and others in
the Participatory Design community, and the media approach of Andersen (1991). Concerning
design practice, she puts a strong emphasis on user involvement in the design process, and on
the use of early prototyping.
Scenario Ia
From an Activity Theory perspective, Scenario Ia might look as follows:
Mrs. X's INDIVIDUAL ACTIVITY of leaving her home office is part of the COLLECTIVE
ACTIVITY of the family, which might be to get ready to see their favorite late night show
on the television. Leaving the room to a large extent only has an INSTRUMENTAL side. The
larger ACTIVITY of going up there in the first place also had the COMMUNICATIVE aspect of
reading her e-mail and possibly responding to it. The interaction with the computer itself
is not regarded as communication as the computer is not (and never can be) a subject.
The ACTIVITY of leaving the room involves the ACTION of turning off the light. This
action includes the OPERATION of touching the switch. The switch, the light, and the rest of
the room constitute the MATERIAL CONDITIONS of the situation. These conditions determine
what OPERATIONS are possible.
Other ACTIONS are the opening of the door, the stepping out, and the closing of the door.
The ACTION of turning off the light is MEDIATED through the switch and the electric
equipment.
[Figure: the activity "Leaving the room" at the top, the actions "Turning off the light" and "Opening the door" below it, and the operations "Pressing the switch" and "Turning the knob" at the bottom, ordered along a time axis; the upper levels are "conscious", the operation level "non-articulated".]

Figure 12. The three levels of the instrumental side of Scenario I.
Figure 12 illustrates the interaction in Scenario Ia when the switch works as expected. This
drawing is included here only as an illustration and does not represent any graphical notation
of Activity Theory.
Scenario Ib
If the light had for some reason not gone off when she touched the switch, this would have led to a BREAKDOWN SITUATION, and the OPERATION would have been
CONCEPTUALIZED. If she had next decided to try once more, the act of touching the switch,
which had previously been an OPERATION, would now be a conscious ACTION consisting of
primitive OPERATIONS like moving the arm.
[Figure: the activity "Leaving the room", the action "Turning off the light", and the operation "Pressing the switch" along a time axis; after the breakdown, "Pressing the switch" is conceptualized and reappears as an action composed of operations such as "Moving the hand".]

Figure 13. The breakdown situation in Scenario Ib.
Figure 13 shows the process of conceptualization that occurs immediately after the breakdown situation in Scenario Ib.
One blind spot here has to do with the methodological problems concerning how to
decide what is an operation and what is an action in this example. Bødker's book gives very
little direction on how to interpret empirical data, and we consequently have to make a lot of
assumptions concerning what is operationalised in Scenario I.
In the same manner as in the Heideggerian analysis, the sign aspect of the switch is partly
hidden by AT. The switch is described as part of the material conditions of the situation, but AT can not provide us with anything similar to Norman's affordance concept. Such a concept could have helped us describe in detail how certain properties of the material conditions
"afford" certain operations.
Another blind spot produced by Activity Theory as it is presented by Bødker is related to
Mrs. X's understanding of the working of the switch. The mental-model concept did this
within the Cognitive Science paradigm. As the epistemological foundation of AT is
incompatible with the cognitivist view that knowledge "resides" in the mind of the user as
symbolic representations, AT would have to account for the user's understanding of the
interactive aspects of an artifact in a different manner.
Scenario II
The following description is quite general, but hopefully it gives an idea of how Activity
Theory could have treated Scenario II:
Mrs. X is engaged in the COLLECTIVE ACTIVITY of designing a switch to be placed in a large building complex. Her INDIVIDUAL ACTIVITY is to build a stack in HyperCard to
illustrate for Mr. Y some design alternatives. This ACTIVITY has two aspects: a
COMMUNICATIVE and an INSTRUMENTAL.
The COMMUNICATIVE aspect is the fact that the stack is part of the communication between Mrs. X and Mr. Y. It is a message with an intended meaning. From this
perspective the creation of the stack is a COMMUNICATIVE act.
The INSTRUMENTAL aspect of the ACTIVITY of creating the stack consists of ACTIONS like
creating a button, writing a HyperTalk script, and testing the stack. Each of these ACTIONS consists of OPERATIONS like clicking a button, typing in text, and making selections in a
menu. When using HyperCard in "edit mode", her work towards the stack is MEDIATED
through HyperCard. The stack becomes an OBJECT that she works on with the TOOLS
provided by HyperCard. When she tries out the stack, her interaction is MEDIATED
through the operating system and the HyperCard runtime system.
We see that with AT we are in the fortunate situation of being able to cope with both the
communicative and the instrumental aspect of Scenario II within the same theoretical
framework. W&F's book had to use phenomenology to deal with the instrumental side, and
speech act theory to deal with the communicative side.
Activity Theory still leaves us with some blind spots. First, it does not have any notion of
interactive signs. As was the case with Cognitive Science, Heidegger, and Austin,
Activity Theory is not able to give an adequate description of the fact that an important part of
Mrs. X’s message to Mr. Y is in his intended interactions with the switch.
The second blind spot has to do with the meaning of the stack. The stack contains a lot of
internal references both explicit in the text and implicit in the underlying code. The text refers
to the switch as for example in: "Try changing its behavior....". The setting of the parameters
automatically leads to behavioral changes in the switch. These blind spots result from the
lack of something similar to a semiotic theory in Activity Theory as it is presented by Bødker.
Scenario III
Activity Theory as presented by Bødker provides a breakdown concept very similar to its
Heideggerian counterpart. It lies somewhere between W&F’s version of Heidegger and Heidegger's original use of breakdown in "Being and Time". Bødker defines a breakdown
situation as a situation
"in which some unarticulated conflict occurs between the assumed conditions for the
operations on the one hand, and the actual conditions on the other; between the
human reflection of the material conditions, and the actual conditions" [p. 27].
To her, a breakdown both leads to the appearance of transparent tools as objects, and to a
conceptualization of the tacit operations involved. The first part is identical to W&F's
interpretation, the latter implies a self-reflection that is not described explicitly by W&F, but
that is present in the original Heidegger. Heidegger's concept of disturbance (breakdown)
goes further. To him, a disturbance leads to a disclosure of the world of the subject. It is an
event that forces the subject to become aware not only of the details of his operations, but also
of the whole structure of his environment and of the way in which he is present in this
environment. Scenario III according to Bødker could go as follows:
Mrs. X is showing a computer game to her husband. Mr. X is applying to his interpretation of the game his experience from everyday practice, i.e. handling physical objects. This leads him to make certain assumptions about its behavior.
His ACTIONS consist of OPERATIONS, like moving a square. These OPERATIONS are
triggered by the MATERIAL CONDITIONS of the game. These MATERIAL CONDITIONS include
the visual appearance of the squares, and their spatial arrangement. The MATERIAL
CONDITIONS of the game all trigger OPERATIONS that correspond to OPERATIONS in the
physical world.
At a certain point in the interaction we get a "conflict between the assumed conditions
for the operations on the one hand, and the actual conditions on the other hand" (p. 27)
i.e. a BREAKDOWN SITUATION. This BREAKDOWN SITUATION plus some thinking leads to a
new understanding of the MATERIAL CONDITIONS of the game. This makes it possible for
him to perform an ACTION that he would otherwise not have thought possible.
We see here that we end up with a problem very similar to the problem we encountered with a
Heideggerian interpretation. Bødker's prediction would be that a breakdown should lead to a
conceptualization of the operation involved. In this case this would be the "dragging and
dropping" of squares. This is only partly the case, because in the discussion with his wife he
says "...to get it right I have to get rid of the square in the middle.. ..but that's impossible...".
This can be interpreted as a conceptualization of the "drag and drop" operation, but the
breakdown and the resulting reflection also lead to a reinterpretation of the material
conditions. In this example, the behavior of the game was designed in such a way that this
reinterpretation made it possible to apply the "drag and drop" operation without modification.
Mr. X only needed to see the game "through new glasses".
Bødker's book does not contain any description of how a breakdown situation can lead to
a reinterpretation of the material conditions. It is possible to read her book as if she believes
material conditions exist in the world more or less independent of personal and social
contextualisations. This "naive realism" interpretation of Bødker is probably unjust and due to
a lack of examples involving reframing in her case studies.
Discussion
With Activity Theory we got an integrated theory capable of dealing with both human-computer and human-human interaction/communication. AT's distinction between communicative and instrumental actions is in this respect very clarifying. I find its insistence on not treating objects as subjects, or subjects as objects, healthy.
With its layered analysis of activities, actions, and operations, it also gives a deeper
insight into the role of consciousness in interaction and communication.
2.8 Semiotics
In the book "A theory of computer semiotics" (1990) P.B. Andersen presents a framework for
analyzing the computer and its use from a semiotic perspective. His book has three parts: "Theory", "Computers", and "Language, work, and design".
In the theory part he discusses the different traditions within linguistics and semiotics,
and how this applies to a semiotics of the computer. He explicitly rejects both Chomsky's
generative paradigm and Montague's logical grammars as fruitful frameworks for analyzing
human-computer interaction. He justifies this with their lack of attention to context and the
situated nature of language use. He also partly rejects Halliday's systemic grammar and
proposes to use the Glossematics of the Danish linguist Hjelmslev as a basis for his analysis.
One of his reasons for choosing Hjelmslev is this theoretician's insistence on always starting a
linguistic analysis directly from the text to be studied, and not from any external
understanding of its content. As Hjelmslev's theories had been "out of use" for two decades,
he had to reinterpret and extend Hjelmslev to fit the task.
Andersen goes on to develop a design methodology for user interfaces that makes use of
careful linguistic analysis of situated work language as input. In one example he shows how a
command language was redesigned to better suit the mental model of its users. This "mental
model" was induced from an analysis of the implicit metaphors in their work language. He
based this part of his work partly on the metaphor concept formulated by Lakoff and Johnson
(1980).
Andersen explicitly discusses how his semiotic approach relates to Bødker's book. To
him, a semiotics of the computer will always have a blind spot concerning the construction of
systems. Where Bødker puts an emphasis on user participation, Andersen simply shows how
to make an analysis of work practice. He identifies another blind spot in his theory concerning
what he calls "silent operations", i.e. the parts of the interaction with the computer that are
“automated” by the user. Andersen recognizes that he has no concepts to deal with tools that
disappear in a use situation.
Neither does he have any concept to deal with situations where the behavior of a system differs from the user's expectations, nor does the framework have any concept for
describing a user's intention. To him the system is like a text in the observable world that is
"read".
A sign analysis of interactive media
In his part II, "Computers", he develops a formal semiotics of user interfaces. He sees the
interface as composed of signs, of which some are interactive. He uses a modified petri-net
formalism to describe the appearance and behavior of these signs. Every sign can have
permanent, transient, and handling features. The permanent features of a sign are the parts of
its visual appearance that are stable throughout the interaction. The transient features are the
parts of the visual appearance that can be changed. The handling features describe the ways in
which the user can interact with the sign.
[Figure: the graphical elements of the formalism: region, user input, user action, visible state, and process.]

Figure 14. The elements of Andersen's petri-net formalism.
His petri-net formalism introduces the concepts state, process, region, and user input. The visual representation of these elements can be seen in Figure 14 (from p. 142).
He goes on to identify six classes of signs (pp. 199-212) as different combinations of the three features:

1. Interactive signs   Permanent + Transient + Handling
2. Actor signs         Permanent + Transient + Internal action
3. Controller signs    Permanent + Internal action
4. Object signs        Permanent + Transient + can be controlled by others
5. Layout signs        Permanent only
6. Ghost signs         Invisible, but can control other signs.
The petri-net representation of an interactive sign consists of a state, a process, user input, and
user action.
[Figure: an interactive sign in which a user action triggers the sign's internal process, which affects both its own state and a state in another sign.]

Figure 15. An interactive sign.
A user action triggers a user input. The input, together with the internal state of the sign and possibly states in other signs, triggers the internal process of the sign. This process
may affect both the internal state of the sign and states in other signs. This is illustrated in
Figure 15.
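Read as a data structure, an interactive sign can be sketched in a few lines of code. The sketch below is only an illustration, not Andersen's formalism: the feature names (permanent, transient, handling) and the flow from user action to input, process, and state changes follow his description, while the class layout and all identifiers are assumed:

class Sign:
    def __init__(self, permanent, transient):
        self.permanent = permanent      # visual appearance that is stable throughout the interaction
        self.transient = transient      # visual appearance that can change, i.e. the sign's state

class InteractiveSign(Sign):
    def __init__(self, permanent, transient, process, controls=()):
        super().__init__(permanent, transient)
        self.process = process          # the sign's internal process
        self.controls = list(controls)  # other signs whose states the process may affect

    def handle(self, user_action):
        # Handling feature: a user action becomes user input, which triggers the
        # internal process; the process may change this sign's own transient state
        # and states in the signs it controls, as in Figure 15.
        self.process(self, user_action, self.controls)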
Language games
In part III of his book he introduces language games as an analytical tool to deal with real-life
linguistic data. His concept of language games is very similar to Wittgenstein's language game concept (Wittgenstein, 1953), but he never refers directly to Wittgenstein's philosophy. This is a bit peculiar, as he refers directly to Pelle Ehn's Ph.D. thesis (1988), which uses Wittgenstein as one of its most important theoretical foundations.
It is hard to understand Andersen's position concerning the status of a semiotic analysis. On the one hand he states that no analysis is possible without reference to concrete subjective interpretations:
"The sign product we construct only comes into existence when people interpret it;
only at that point, and not before, can we use semiotic methods for analysing it"
(p.25).
This view dominates his analysis of situated work language in part III, where the starting
point is always the observed use of language. In part II, however, he starts an analysis of the sign structure of a user interface without any empirical foundation whatsoever. Reading
only part II of his book, his semiotics of the computer can easily be interpreted as a formal
method that can be applied rigorously without reference to actual users or use situations.
He never shows how his methodology for analyzing linguistic data from part III of the
book should be applied to the analysis of users interacting with graphical user interfaces. To
bridge this gap he would have had to develop a way of dealing with what he calls "the silent
operations" of human-computer interaction. As Andersen recognizes, an application of
linguistic/semiotic theory to graphical user interfaces is not straightforward.
Scenario Ia
Andersen writes specifically about the computer, but his semiotic framework is general
enough to be applied to other interactive artifacts. The following is a semiotic analysis of Mrs.
X's interaction with the switch-light system:
The switch is an INTERACTIVE SIGN. The HANDLING FEATURE of the switch is the fact that it
changes its STATE when you touch it with your finger. It has both PERMANENT and
TRANSIENT FEATURES. Its PERMANENT FEATURE is its outer shape and color, i.e. every visual
aspect of the switch that does not change. Its TRANSIENT FEATURE is the green glow that
can be in one of the two STATES: On or Off.
The light is an OBJECT SIGN. It is CONTROLLED by the switch SIGN, and can be in one of
the two STATES On or Off. It does not have any HANDLING FEATURES as it can only be
turned on and off through the switch. Its PERMANENT FEATURE is its appearance as a lamp,
while its TRANSIENT FEATURE is whether it emits light or not.
Mrs. X reads these two SIGNS and makes an ACTION. The ACTION is to touch the switch.
This makes both the switch SIGN and the light SIGN change STATE.
[Figure: the switch sign, with the states On/Off and Glow, controlling the light sign with the state Bulb.]

Figure 16. Scenario Ia as an interactive sign and an object sign.
Figure 16 shows the petri-net for Scenario Ia, consisting of an interactive sign controlling an object sign.
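In these terms, Scenario Ia can also be given a small, self-contained sketch that parallels Figure 16. Only the sign categories and feature names come from Andersen; the class and attribute names are assumed:

class LightSign:
    # Object sign: permanent and transient features, no handling features;
    # it can only be controlled by another sign.
    permanent = "lamp"
    def __init__(self):
        self.lit = True                  # transient feature

class SwitchSign:
    # Interactive sign: permanent, transient, and handling features.
    permanent = "shape and color of the switch"
    def __init__(self, light):
        self.glow = False                # transient feature
        self.light = light               # the object sign it controls

    def touch(self):
        # Handling feature: touching the switch triggers the internal process,
        # which changes both the switch's own state and the state of the light sign.
        self.light.lit = not self.light.lit
        self.glow = not self.light.lit

light = LightSign()
switch = SwitchSign(light)
switch.touch()                           # the light goes off and the green glow appears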
Scenario Ib
If for some reason the light had not been turned off when she touched the switch, Mrs. X
would have been dealing with a different SIGN system where the switch sign did not
CONTROL the light SIGN.
As expected from Andersen's own analysis of the blind spots in his theory, the semiotic
analysis of Scenario I does not say anything about what parts of the interaction are conscious
to the user and what are "silent operations". On the other hand, the semiotic analysis is very
strong concerning the details of how the switch-light system appears and works. As such, the
resulting semiotic analysis looks very much like an idealized "mental model" of that system.
If, instead of identifying the "signs" of the interface, we had gone directly to a petri-net analysis, we would have got the model shown in Figure 17. The switch in Scenario Ia is here represented as one input, one process, and two state variables. This model has a lot in common with the FSA that resulted from the mental-model analysis in Chapter 2.4. As such, the petri-net models do not differ from the technical models used by electrical engineers and computer programmers to describe system behavior.
[Figure: a single switch-light system with one user input (On/Off), one internal process, and the two state variables Glow and Bulb.]

Figure 17. A petri-net representation of Scenario I.
Using Norman's terminology (Norman, 1988), the semiotic analysis gives us the design model
of the switch-light system, i.e. the designer’s conceptual model of how the artifact "should" be
interpreted. The design model is not always identical to the user's model, and this brings us
directly to the second part of Andersen's book where he puts an emphasis on grounding a
semiotic analysis in empirical field data. Since empirical data to Andersen are always of a
linguistic kind, a semiotic analysis of the concrete situation in Scenario Ia would be
impossible because little was said when she turned off the light.
Scenario II
The analysis of Scenario II will be done at two levels, following the division in Andersen's
book. First, I will present an analysis of Mrs. X and Mr. Y's interaction and show how the
stack fits into the larger picture of their language game. Second, I will present a more detailed
analysis of the stack using Andersen's sign theory.
An analysis of Mrs. X and Mr. Y's collective work practice could go as follows:
Mrs. X and Mr. Y can be seen as being involved in a LANGUAGE GAME about the design of
a switch. Calling it a game is not meant to indicate that this work is not taken seriously by the participants; it is just a way of pointing to the structured nature of their interaction.
The LANGUAGE GAME can be described as the PROFESSIONAL WORK PRACTICE of designers
and engineers. Mrs. X and Mr. Y's professional relation is to a large extent given by their
roles in their organizations and by the way these organizations cooperate. The latter
includes formal agreements, deadlines, and mutual responsibilities. The LANGUAGE GAME,
at least as it is described here, is a game with the explicit goal of reaching an agreement
on a design alternative. If we assume that there is no hidden agenda involved, their
interaction can be interpreted as an open creative process. Different hidden agendas
would have forced us to make different reinterpretations of their observed behavior.
The stack is part of the interaction between Mrs. X and Mr. Y. Mrs. X has created it to
present an idea. It is an invitation to Mr. Y to try out the practical consequences of some
of Mrs. X's design alternatives. It has a form similar to an informal note or an internal
design document. This is a GENRE that is very common among designers. The stack differs
from the informal note in that it is interactive. It borrows its interactive elements from the
GENRE of computer applications.
Andersen's empirical cases deal exclusively with verbal communication, but his theory could
easily be extended to include written communication like informal notes. It is more difficult to
see how a concept of interactive messages could be included in his language games. He ends
up with a blind spot when it comes to messages that are intended to be tried out and used, i.e.
when the intended meaning is in the intended interaction.
The stack can also be analyzed as a sign system:
The static text in the stack consists of LAYOUT SIGNS because they have no TRANSIENT FEATURES or HANDLING FEATURES, i.e. they do not change appearance and cannot be interacted with. The interactive elements of the stack can be described as INTERACTIVE SIGNS. The parameter-setting SIGNS CONTROL the switch-light SIGNS.
The major problem with this analysis is of course that it is not based on empirical evidence,
and the choice of primitive signs is consequently arbitrary. Andersen's book gives us very
little advice on how to identify the signs of this scenario: Should the check-box sign also
include its leading text? Should the two radio buttons be regarded as one sign with two states,
or as two separate signs? Should the switch and the light be regarded as one sign or as two
signs? The lack of an empirical grounding makes the analysis a formal exercise.
Andersen's approach assumes that users decompose user interfaces along his sign
categories. The generality of this remains to be shown empirically. In Scenario II, Mrs. X might
as well have perceived the two parameters plus the switch as the basic interactive unit, instead
of seeing the stack as a composition of interactive signs. I find Andersen's concept of
interactive signs to be an important step towards a theory of interactivity, but I see it as a
problem that his choice of first-class objects is based on formal properties of the computer
medium, and not on empirical data.
Scenario III
If we concentrate on Mr. X's interaction with the game, we can make a semiotic analysis of
the game as it appears to him before his "shift".
The board consists of a LAYOUT SIGN and four INTERACTIVE SIGNS. The LAYOUT SIGN is the
cross to the right consisting of the five squares. The INTERACTIVE SIGNS are the four
movable squares. The PERMANENT FEATURES of the interactive squares are their shape and
color, and the TRANSIENT FEATURES are their position. They also have a HANDLING FEATURE,
which is the fact that they can be moved around with the mouse.
The only possible blind spot so far in the analysis is the lack of something similar to Norman's
constraint concept as presented in “The psychology of everyday things” (1988). Such a
concept would have made it possible to express the way the squares block each other to
prevent overlapping.
After the shift, Mr. X has a different model of the game board. In semiotic terms, we
would have to say something like:
A certain combination of the INTERACTIVE SIGNS leads to the possibility of creating new
INTERACTIVE SIGNS. When square SIGNS are positioned such that a black square appears, it
is possible to remove this square to reveal another white square behind it.
The first blind spot here is that there is no room for sign creation of this kind in Andersen's
semiotics of the computer. Andersen's petri-nets can not be used to express Mr. X's mental
model after his shift.
Andersen's framework has no concepts to deal with "paradigm shifts" of this kind. From a semiotic point of view such a shift could be described as a change of context. Andersen partly points to this problem when he questions the assumption of a non-problematic relation between a software designer's intended meaning and the user's interpretation of the running code:
"The weakness of the Physical Model view is that it implicitly assumes that the ideas
expressed by the programmer in the program texts are communicated to the user
through the program execution" (p. 192).
The example he uses to illustrate this point is of a similar kind as Scenario III, but without an
interactive dimension. He shows a picture created by a computer program as squares within
squares. The squares are drawn black on white, but they create the impression of white
columns on a black background. This example is similar to the classical drawing used by the
gestalt psychologists of a vase/two faces, and Wittgenstein's duck/rabbit.
We see that even though Andersen has a concept to deal with interactive signs, he lacks a
way of describing how interactive signs are embedded in an interactive medium. Different
media give different relations between the signs. With text, the signs are ordered in a
sequence. In a painting, the signs have spatial relations. In a film, the signs have both spatial
and temporal relations. In an interactive medium, the relations between the signs have
interactive aspects. His framework is not able to deal properly with cases where not only the
signs are interactive, but where the medium itself has interactive properties. The implicit
medium in Andersen's analysis is something similar to a HyperCard stack with cards and
buttons; a set of two-dimensional spaces on which interactive signs can reside together with
non-interactive signs. Scott Kim’s “Heaven and Earth” in Scenario III hopefully illustrates
that there is more to interactive media than can be captured by HyperCard’s card metaphor.
Discussion
The most powerful new concept from Andersen’s semiotics of the computer is the interactive sign. By making signs interactive, he knocks an interesting first hole in what Reddy (1993) calls the conduit metaphor for describing linguistic phenomena. With this simple addition to semiotics he enables a semiotic analysis of much interactive software. His analysis is still very much formal, and as such very close to the Cognitive Science tradition. Its semiotic nature nevertheless makes it much more natural than a mental-model analysis.
As Andersen pointed out, his semiotic approach is blind to the tacit aspects of human-computer interaction.
2.9 Computers as Theatre
In her book "Computers as Theatre" (1991) Brenda Laurel uses classical Aristotelian dramatic
theory as a guide for understanding human-computer interaction. She sees interactivity as "acting within a representation" (p. 21). Already in (Laurel, 1986) she had developed the
concept of "first-personness" to describe the user's experience of agency within a
representational context. She describes this experience as "engagement":
"Engagement is only possible when we can rely on the system to maintain the
representational context. A person should never be forced to interact with the system
qua system; indeed, any awareness of the system as a distinct, "real" entity would
explode the mimetic illusion, just as a clear view of the stage manager calling cues
would disrupt the "willing suspension of disbelief" for the audience of a traditional
play". (Laurel, 1991, p. 113).
She then reinterprets human-computer interaction in terms of the six levels of a play as
defined by Aristotle in his Poetics. To Aristotle, each level is made up of elements from the
level just below as its "material cause". The six levels are from bottom-up: Enactment,
Pattern, Language, Thought, Character, and The-whole-action. The levels go from
uninterpreted sensory input at the bottom to the whole human-computer interaction at the top.
The concept of interactivity is only introduced at the top level, i.e. The-whole-action. She
illustrates this with a diagram of what she calls "the flying wedge in interactive forms".
[Figure: two wedges along a time axis. In theatre, the plot narrows from the possible, through the probable, to the necessary. In interactive form, the potential fans out into many possible endings (Necessary 1 ... Necessary n).]

Figure 18. "The flying wedge" of drama and of interactive form.
Figure 18 (pp. 70-72) contrasts "the flying wedge" of traditional drama with "the flying wedge" of "interactive form". For traditional drama, the wedge illustrates the plot's
progression from the possible, to the probable, to the necessary. For human-computer
interaction there are a lot of possible whole actions, depending on the choices of the user.
Her main point is that, for an involved user, interaction means acting within a representation. Human-computer interaction becomes interactive drama. This must be
understood as part of an argument against seeing the user-interface as something "external" to
the user. In (Laurel, 1986) she explicitly identifies the latter with the tool perspective.
Scenario Ia
As Laurel writes specifically about human-computer interaction, application of her theory to
low-tech experiences like turning off a light has to be speculative. The following description
is done as if both the switch and the light were simulated on a computer:
Mrs. X is acting within a REPRESENTATIONAL CONTEXT given by the switch and the light.
The six levels of the interaction are:
1. The WHOLE ACTION is the whole episode of turning off the light.
2. There is only one "CHARACTER" here, and that is the switch-light system. It can be
seen as something that she interacts with, an entity.
3. The "THOUGHT" of the switch-light system is its functionality, the way it works.
4. LANGUAGE, "the selection and arrangement of signs" [p. 50], is the switch and the light as SIGNS.
5. PATTERN is the shape, color etc. of the switch and the light.
6. ENACTMENT is the sensory basis of the interaction.
The interactive aspect of this whole episode has to do with the fact that Mrs. X could have
decided not to turn off the light when she left the room. That would have created a
different progression from POSSIBILITY to NECESSITY (see Figure 18 on "the flying
wedge").
Scenario Ib
When the light did not go off when she touched the switch, this resulted in a loss of
FIRST-PERSONNESS that "exploded the mimetic illusion".
One blind spot here has to do with the lack of an empirical grounding. As with Andersen's
semiotic analysis, this description is purely formal. The analysis takes for granted that Mrs. X
actually experiences the switch-light system as it is objectively described in Aristotelian
terms.
The analysis contains no reference to Mrs. X's intentions. It does, though, implicitly
contain a reference to her expectations, because without an expectation there is no way that
the "mimetic illusion" could "explode". It is probably a wrong interpretation of Brenda
Laurel's book to say that she does not see computer users as human beings with intentions,
but throughout this analysis the interaction emerges more as something that happens to the
"user", than as something that she does.
Scenario II
Laurel sees the computer mainly as a medium. Following this notion, the stack is an
expression in this medium.
The stack is Mrs. X's way of ORCHESTRATING Mr. Y's ACTION. It is intended to give him an
EXPERIENCE. It contains a potential for a lot of possible interactions.
Following Laurel's six levels of human-computer activity, the stack can be seen at the
LANGUAGE level as "an arrangement of signs". The case of this particular stack involves
an interaction between levels because some of the SIGNS control the behavior of another
SIGN.
This change in SIGN behavior is experienced at the level below, i.e. the PATTERN level:
"the pleasurable perception of patterns in sensory phenomena". In dramatic terms this
might correspond to an actor referring to his own melody of speech.
The blind spot here has to do with how the stack is part of the particular interaction between
Mrs. X and Mr. Y. To Laurel there are only two roles for a computer user: as designer or as
user. She does not deal with issues like organizational context, speech acts, or interpretation.
Her "computer as medium" is mainly “computer as mass medium", and not the computer as a
communication medium between two or more subjects. The mass-media approach works well
in describing multi-media and computer game production, but it fails to capture important
aspects of what happens when end-users start exploring the interactive potential of the
computer medium in new and creative ways. This blindness to person-to-person
communication should not come as a surprise if we observe that the theatre is one-to-many or
few-to-many, but never bi-directionally one-to-one.
More important, the theatre is a cultural institution that provides a collectively accepted
and relatively fixed context for interpretation. In the theatre, the play starts at a certain time,
and what happens on the stage from then on should be interpreted as an illusion that creates
its own context. The intended meaning of the play is the same on Tuesday, Wednesday, and
Thursday. This is different from what happens before and after the play, where all interactions
acquire their meaning from the interaction history of the involved subjects.
An example: You see the same play performed the same way with the same ensemble
both on Tuesday and Wednesday. Both days you meet with one of the actors immediately
afterwards and congratulate him. Both days he replies: "Thank you very much my friend.
Sorry, but I have to leave you now. See you later", and leaves. On Wednesday morning this
actor had borrowed a lot of money from you, and had promised to pay you back immediately
after the performance. His leaving you that evening without mentioning the money acquired a
very different meaning from the same act the day before. The interaction history between
you and the actor changed the context for interpretation. This again changed the meaning of
the action. The meaning of the play itself, though, stayed the same.
Laurel says that human-computer interaction is like being on stage with the actors. What
does this mean? Does it mean something similar to structured role-playing games where you
are given a role to interpret in interaction with other players? Or should "being on the stage" perhaps be understood as a metaphor for life itself, as Erving Goffman does in his "The Presentation of Self in Everyday Life" (1959), and Moreno (1987) in his psychodrama therapy?
Using "being on stage" as metaphor for life poses a problem in this context because we
lose of sight the representational quality of human-computer interaction. Every non-psychotic
computer user knows that the signs on a screen would disappear if someone switched off the
computer. This corresponds to the "willing suspension of disbelief" (p. 113) that
characterizes the "contract" between the actors and their audience. For most of us life itself is
not "just pretend". A Buddhist guru might not be surprised if one day someone suddenly
"powered down the world" and revealed to us the true world behind Maya, but most of us
choose to believe that life is not an illusion8.
Using structured role-playing games (e.g. psychodrama, and “live” role games) as
metaphor for human-computer interaction creates other problems. It creates a blindness
concerning how the personal histories of the users affect the interpretation of their
interactions. It also makes it very hard to incorporate situations where interactive software is
used as part of person-to-person communication. With two playwrights on the same stage, the
interactive messages would have to be seen as a new play within the play, with new characters
and rules. Very soon this becomes very complex and the whole dramatic metaphor becomes
absurd and breaks down.
It is hard to tell how Aristotle would have reacted to the use of his Poetics to describe a
play where the audience is on the stage. He would probably have described such a social
event as a ritual and not as tragedy or comedy. This fits well with the prehistory of the Greek
drama as Dionysian rites, and we end up instead with the notion of "Computers as Carnival".
8 The movie The Matrix explores this “psychotic” idea.
Scenario III
The third scenario is closer to the examples Laurel uses in her book in that it involves a
computer game. Using Laurel's framework, we might start with a general description of game
playing:
As a game player, Mr. X is engaged in the REPRESENTATIONAL CONTEXT created by the
running software on the computer. He interacts with the computer as if the computer-created
illusions were real. He does not believe that what he sees on the screen is real, but
he implicitly accepts the illusions through a "WILLING SUSPENSION OF DISBELIEF". As such,
his interaction with the computer is little different from watching a movie.
Next follows a more detailed analysis structured according to the six qualitative elements of
Aristotle:
1. To start at the top, the WHOLE ACTION is Mr. X's game playing. It has a definite start and
a definite end, and it follows a certain path. The computer constantly creates POTENTIALS
FOR ACTION of which Mr. X only takes advantage of a few. The fact that the interaction
follows one particular path out of many possible paths is a consequence of the interactive
nature of this kind of ACTION.
2. The notion of CHARACTER in human-computer interaction creates a lot of interesting
questions. Laurel observes that as computer users we always implicitly attribute AGENCY
to either the computer itself or to some part of it like the operating system. This becomes
evident when something goes wrong and users say things like "My word processor
trashed my file" (p.60).
Turkle (1984) points to the same phenomenon in her study of how children describe
computers. She found that computers were typically described in biological and
psychological terms like in "It is sort of alive" (p. 325), and in "Computers are good at
games and puzzles, but they could never have emotions" (p.331).
In our case, the CHARACTER would probably be the game application. However, without
clues to Mr. X's understanding of the interaction, we have to guess. One such clue could
have been an interruption that would have led him to say something about his interaction.
If for example the game had stopped without reason at a certain point and produced a
dialogue box asking him if he enjoyed the game, Mr. X would probably have said
something like "What a stupid game!". This would have led us to postulate that he was
interacting with a CHARACTER being "the game".
3. In the same way as the WHOLE ACTION is built up from the interaction with the
CHARACTERS, the user's experience of a CHARACTER is built up from the THOUGHT process
attributed to it. In the realm of human-computer interaction, the THOUGHT is the
recurring behavioral patterns of the entity we interact with, i.e. the way it works. As
Laurel notes, this is different from mental models of how and why a piece of software
works the way it does.
The THOUGHT process of the game in Scenario III is the fact that it is possible to move
the squares around, and the fact that it is possible to move squares that emerge from the
background. The THOUGHT process is more or less identical to the rules of the game as
experienced by the user.
4. To Aristotle, THOUGHT is composed of LANGUAGE. LANGUAGE in this case is the squares
as SIGNS, and the ways in which they can be arranged. Laurel's analysis differs from the
semiotics of Andersen in that she never discusses interactivity at the level of the SIGN. To
her, interactivity enters into the picture only at the topmost level of the WHOLE ACTION,
and the SIGNS themselves are consequently not interactive.
5. Laurel's treatment of the next two levels is far from easy to understand. She diverges
from the orthodox interpretation of Aristotle that identifies MELODY (PATTERN) with
the auditory channel and SPECTACLE (ENACTMENT) with the visual channel. To Laurel such
an understanding of Aristotle breaks with the general idea that every level should be the
MATERIAL CAUSE of its preceding level. Her definition of PATTERN includes both visual and
auditory elements of the user interface. PATTERNS are "pleasurable accessories" that are
"pleasurable to perceive in and of themselves" (p.55). The PATTERNS in Scenario III must
consequently be the aesthetic experience created by the forms and colors of the game.
6. Given Laurel's understanding of MELODY, SPECTACLE is now the direct auditory and
visual experience at the lowest level of perception. In our example, this could mean the
most primitive form and color elements of the game. Another possible interpretation is
that SPECTACLE simply is the sum of the pixels on the screen.
How would Mr. X's "paradigm shift" be described by Laurel? In her book, she devotes a
subchapter to the use of "Discovery, Surprise, and Reversal" in drama. This aspect of Scenario
III clearly falls into this larger category, and is very close to what Laurel describes as a
"reversal". A reversal is
"a surprise that reveals that the opposite of what we expected is true.... Reversals can
cause major changes in our understanding of what is going on and our expectations
about what will happen next..." (p. 91)
The "paradigm shift" in Scenario III differs from a reversal in that it does not alter Mr. X's
interpretation of his interactions with the game preceding the shift. His new understanding of
how the game works does not force him to reinterpret his direct-manipulation operations with
the squares. The opposite would have been the case if for example the game at a certain point
had revealed to the user that he was actually playing a version of Space Invaders, and that the
squares should be interpreted as enemy ships. A true reversal of this kind would have differed
from Scenario III in that it would have forced Mr. X to reinterpret the interaction history.
Returning to the discussion of the six qualitative elements, it is necessary to identify
where in the hierarchy the reversal takes place.
1. To begin at the top, the REVERSAL changes the course of action as it enables a "happy
ending" to THE-INTERACTION-AS-A-WHOLE.
2. At the level of CHARACTER it does not change very much as the user's conception of
playing a game is the same before and after the shift.
3. At the level of THOUGHT we get a radical change in Mr. X's understanding of how the
game works.
4. At the level of LANGUAGE we get a change in the SIGN vocabulary as signs emerging
from the background were not part of the language of the game before the REVERSAL.
5. At the level of PATTERN we get a qualitative change because the feeling of dragging a
"background square" is different from the feeling of dragging a "foreground square".
6. At the level of ENACTMENT it is hard to see any changes.
To sum up, we get structural changes at the levels of Pattern, Language, and Thought that
affect the course of action at the Action level.
Staying within the dramatic frame of reference, it might be more accurate to take as an
object for analysis the actual interaction history of Scenario III. This is indeed also more true
to Aristotle as he had no concept for changing ontologies. From this perspective, it is more
appropriate to talk about the language of the game as a constant that from the very beginning
of the episode also contained the elements "background squares" and their behavior. This
differs from an understanding of Language as a dynamic entity that at any time in the
interaction reflects the user's growing understanding of the game. As with Andersen's
semiotics, we see that Laurel makes no distinction between "design model" and "user model".
Discussion
What are the blind spots here? Concerning interactivity it is a drawback in Laurel's theory that
she introduces interactivity only at the topmost level. Put in mathematical terms, most
interactive systems have more than one "state variable". This gives rise to a multi-dimensional
space of potential actions. The perceived state variables in the beginning of Scenario III could
be the four individual positions of the squares. A user of this game would most probably not
think of the potentials for action in terms of one dimension with N to the 4th possible values,
but would implicitly recognize a similar degree of freedom along four axes and thus
drastically reduce the complexity of the task to N times 4 possible values9. Introducing
interactivity also at the other levels (i.e. character, thought, and language), would have made it
much easier to analyze interactive software, but it would also have necessitated the
introduction of concepts to describe the interrelationship between the interactive entities. In
scenario III the latter would have included a description of how the freedom to move one
square is affected by how you have positioned the others.
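As a rough numerical illustration of the state-variable point above (the number of positions per axis, N, is a hypothetical parameter and not a figure given in the scenario), the two readings of the action space can be contrasted as:

\[
\underbrace{N \cdot N \cdot N \cdot N = N^{4}}_{\text{one composite state variable}}
\qquad \text{versus} \qquad
\underbrace{N + N + N + N = 4N}_{\text{four independently perceived axes}}
\]

For N = 10 positions per axis this is the difference between 10,000 combined configurations and 40 perceived values, which makes concrete the drastic reduction in experienced complexity suggested above.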
Another potential problem with Laurel's analysis is that it only partly takes the
perspective of the computer user. This is a bit unfair to Laurel as her work has contributed to
putting the user in focus. The problem has to do with the methodological difficulties that arise
when we want to ground this kind of analysis in empirical data. Most dramatic analysis since
Aristotle has been formal, and staying within this tradition Laurel does not give any advice on
how to elicit from the users the details of the software as experienced reality. Traditionally,
this has been the role of the critics, i.e. the self-appointed representatives of the audience. In
software evaluation, the critic corresponds to the user-interface expert used in different kinds
of formal evaluations. As with a play, any piece of software allows for a lot of possible
interpretations, and a particular user rarely recognizes more than one or two of them.
The lack of an empirical grounding is most evident when it comes to the interactive
aspects. Laurel's observation that interactivity means a potential for action is enlightening, but
it does not say anything about how this potential for action is perceived by the user at a
certain point in the interaction. As most of us have very limited mental capacities, the
perceived potential for action is very different from the actual potential for action. Expert
chess players are able to see ten moves or more ahead, while most of us see only two or three moves ahead if we are lucky. To continue the chess metaphor, the formal analysis of chess or
of a particular game of chess is very different from the particular experience of playing it.
2.10 What do the Theories Shed Light on?
From the analysis there emerge some common themes in the theories. In search of
improvements it is important not to forget what we already have. I will here try to sum up
some of the insights concerning human-computer interaction from the theories.
9 This would not necessarily have been the case if the input device had been a data glove with each of the four squares controlled by a separate finger. It is important to note that "the computer medium" in this context means a high-resolution pixel-based display with mouse input. Other hardware configurations would most probably have led to different ontologies (see Buxton, 1988).
Tool use
• When we use a computer, this involves both "silent" and "conscious" operations. As we
learn to master a tool, using it becomes "second nature" to us and we do not have to
think about the details of how it works. We simply use it.
• Sometimes we encounter problems in our "fluent" dealing with the physical
environment. These problems can be due either to malfunctions in the artifacts or to
incomplete understanding on our side of how they work. These problems (breakdowns)
force us to think about what we do. Our focus then changes from being on the objects
we are dealing with, to being on the means for dealing with them.
• Breakdown situations are particularly interesting from a research perspective because
the "silent" part of the user's practice then becomes articulated and more easily available
for investigation. Breakdowns are often followed by attempts at "repair", either by
fixing the artifact or by learning more about how it works. After a successful "repair",
the articulated operations again become "silent" and no longer have the attention of the
user.
• Objects on the screen often tacitly signal their use. This is Norman’s “affordance”.
Communication and meaning
• Communication between human subjects is always done in a social context. To be able
to understand the experienced meaning of the communication (i.e. for the participants),
it is important to understand this context.
The computer as medium
• The computer creates an interactive illusion that the user accepts as illusion. As users
we act within this illusion in the same way as when we watch a movie. Only when
something goes wrong and this illusion breaks down are we forced to pay attention to
the computer as the creator of this illusion.
• The user always interacts with "something". What this "something" is differs for
different users and different situations. Examples include the computer itself as an
entity, the operating system, and an application. In Laurel's theory this "something" is
called "Character".
• In some way or another, a user "knows" how the software he is using works. This
practical competence ("Thought" in Laurel's terminology, "Mental Models" in Norman)
is different from an articulated theory of how the software works.
• Some of the experienced first-class objects of a graphical user interface are interactive.
Andersen calls these basic carriers of meaning in interactive media "interactive signs".
• The medium itself is interactive. This makes possible illusions that have no resemblance
to experiential domains in the physical world.
2.11 Blind Spots
Having seen all seven theories in use, there emerge some blind spots that none of the theories
are able to deal with properly. These blind spots are of special interest because they point to
phenomena that are currently not properly understood by the HCI field.
Blind spot 1: Qualitative analysis of non-verbal interaction data.
None of the theoretical frameworks were able to provide a methodology for analyzing non-verbal empirical data from the interaction. The exception is of course Card et al.'s framework,
but their quantitative model is of no help in studying Mrs. X's interactive experiences.
The lack of an empirical grounding makes it impossible to say for certain which parts of
Mrs. X's interactions were automated (operationalised, transparent). It also makes it
impossible to say anything definite about her understanding (mental model, thought) of the
switch and of the stack. We were left with a lot of possible choices of experienced first-class
objects (entities, signs, interactive signs) in Scenario II.
Blind spot 2: Interactive messages as part of person-to-person communication
None of the theoretical frameworks were able to deal properly with the fact that the stack of
Scenario II was intended as an interactive message. When the behavioral dimension of the
computer medium is used as a carrier of meaning, a simple information-flow analysis breaks
down.
Blind spot 3: Interactive signs embedded in an interactive medium
With Scenario III we need to account for gestalt phenomena along the "feel" dimension that
are very similar to their visual and linguistic counterparts. When the new square created by
the black squares can be interpreted both as background and as foreground, this is the same
phenomenon as we observe in the visual domain when a vase becomes two faces. The first is
not a purely visual phenomenon, because it is only through interaction that the square shows
its double nature. A theory of interactivity should be able to explain these phenomena and
describe in which way the "feel" domain differs from the visual domain and the linguistic
domain.
The closest we get to an understanding of the "magic" of the squares of Scenario III
comes from combining Andersen's concept of interactive signs with Laurel's notion of
interactive form. With this hybrid theory, the squares could be interpreted as interactive signs
embedded in an interactive medium.
Blind spot 4: "Paradigm shifts" in the interactive domain
The frameworks also left some blind spots related to Mr. X's reinterpretation of the game in
Scenario III. None of the theories were able to deal with the fact that this reinterpretation was
a result of a "breakdown by design". Neither were the theories able to account for how Mr. X
was able to go from a "physical" understanding of the game to an understanding that was not
based on anything he had ever experienced before.
2.12 A Common Theme in the Blind Spots
Is it possible to trace the reasons for these blind spots in the philosophical roots of the
theoretical frameworks, and could such an analysis give guidance on where to find new
answers? Before we embark on such a journey, it is necessary to identify the common theme
in the blind spots.
All seven frameworks treat symbolic and non-symbolic interaction with different sets of
analytical tools. The closest we get to an integrated view is AT, but this framework also uses
two different sets of concepts to treat communicative and instrumental actions. There emerges
in the frameworks a tendency to study "meaning" only in connection with the verbal/symbolic aspects of interaction. The non-verbal/bodily aspects of interaction are treated as more primitive, i.e. animal-like, processes.
Concerning human-artifact interaction, it shows up in Scenario I as blind spot 1: The lack of a methodology for doing qualitative analysis of non-verbal interaction data, i.e. for getting at
the perceived meaning of an interactive experience.
Concerning computer-mediated communication it shows up in Scenario II as blind spot 2:
Interactive messages.
The difference between symbolic and non-symbolic interaction shows up in all frameworks:
Model Human Processor and Cognitive Science
Cognitive Science describes verbal and non-verbal communication with two different
terminologies. Symbolic "content", i.e. information, resides one logical level above the
perception and motor processes that are seen as physical. This corresponds to the form/matter
distinction in Aristotelian physics. This is often combined with applications of what Reddy
(1993) calls the "conduit metaphor", which leads us to treat form as if it was matter. The result
is terms like "information flow", which are at best misleading.
Winograd and Flores
Winograd & Flores use two totally different schools of though to deal with symbolic and nonsymbolic
interaction/communication. They use speech act theory to deal with verbal
communication, and Heidegger's analysis of tool use to deal with non-verbal communication.
This split between the mental and the physical, which still dominates most of western
epistemology, is what Heidegger tried to overcome in his philosophy, but paradoxically it
shows up in the structure of their book. An inclusion of Heidegger’s analysis of mitsein
(being-with-others) might have helped overcome this split. On the other hand, more Heidegger
would probably have made the book inaccessible to its audience, and the book would not
have been the success it has been.
Ethnomethodology
As W&F, Suchman also uses two different theories to deal with symbolic and non-symbolic
interaction. Non-symbolic interaction is treated with Heidegger's tool concept, and symbolic
interaction/communication is treated with concepts from ethnomethodology.
Activity Theory
Activity Theory also uses two different analytical tools to deal with symbolic and non-symbolic interaction. The symbolic is treated as the communicative aspect and the non-symbolic as the instrumental aspect of the interaction/communication. In Activity Theory, the
two perspectives are brought together under one umbrella, but the clear distinction is still
there between meaningful communication and action without meaning. This has the
consequence that Activity Theory as a whole has little room for meaningful interactions (e.g.
the stack of Scenario II) or communication without meaning (e.g. non-figurative art).
Semiotics
Andersen does not deal with non-symbolic interaction at all in his theories. He only points to
the fact that his semiotics of the computer has a blind spot when it comes to the "silent
operations" of human-computer interaction. Only the part of his book dealing with the
analysis of "language games" is relevant here. The part dealing with the formal semiotic
analysis of graphical user interfaces makes no explicit psychological claims, but the concept
of interactive signs can be interpreted as a first attempt at providing a remedy for the blind
spots created by the dichotomy in the theoretical foundation.
Dramatic theory
Laurel's Aristotelian analysis is very similar to Andersen's formal semiotics and deals solely
with symbolic interaction. Her way of "fixing" Aristotle to deal with interaction is by
introducing a branching time concept where the user makes choices concerning where to go.
Discussion
The problem with doing comparisons of theoretical frameworks based on “second hand”
sources is that it is easy to do injustice to the original works. It is therefore important to note
that the resulting “blind spots” came from an analysis of the books listed, and do not
necessarily tell the full story about the research traditions behind these books. As will be
evident from the next chapter, this is particularly true for phenomenology.
Bearing this in mind, the common themes in the blind spots relate to the relation between
body and mind.
To sum up:
• In all the frameworks the body is only involved in tacit interaction.
• For the body there is no meaning.
• Meaning exists only for mind, and is always symbolic.
This all points to the body/mind split of western philosophy, which can be traced back to
Descartes. Chapter 3 is devoted to this topic.
2.13 Related Discourses and Comparisons
Since the publication of the paradigm works, there have been two debates between proponents
of the different positions that are of special relevance. In addition there have been some
comparisons of the different approaches that deserve attention.
The “Coordinator” debate: Suchman vs. Winograd
In two consecutive issues of the journal Computer-Supported Cooperative Work in 1991,
Lucy Suchman and Terry Winograd discussed the practical implications of the ideas
presented in (Winograd and Flores, 1986). Suchman argued that the language/action
perspective of Winograd&Flores is blind to the political and ethical aspects of systems design
(Suchman, 1991). The discussion centered on this question, and on the question of the
neutrality of computer systems. The latter relates to her analysis of the actual use of “The Coordinator”, a system described by Winograd & Flores as an example of the language/action
approach.
The debate is interesting and important because it relates two of the seven works, and
involves two of the authors. For the current purpose of studying interactivity in a more narrow
sense (as made concrete through the scenarios), the debate is unfortunately of little relevance.
A debate on Situated Action: Cognitive Science vs. “the rest”
The journal Cognitive Science in 1993 had a special issue on Situated Action. The
introduction was by Norman: “Cognition in the Head and in the World” (Norman, 1993). As
he put it, the issue contains “
a debate among proponents of two distinct approaches to the study of human
cognition. One approach, the tradition upon which cognitive science was founded, is
that of symbolic processing… The other more recent approach, emphasizing the role
of the environment, the context, the social and cultural setting, and situations in which
actors find themselves, is variously called situated action or situated cognition.” (p.1)
The issue starts out with a long paper by Vera and Simon (1993). They here defend the
traditional cognitive science view against a number of approaches they chose to lump together
as “Situated Action” (SA). The paper is then answered in four papers by Greeno and Moore, Agre, Suchman (1993), and Clancey. Vera and Simon end the issue by responding to these.
Vera and Simon first (1993a) define SA to include the phenomenology in (Winograd &
Flores, 1986), the ethnomethodology of (Suchman, 1987) and the robot work of (Brooks,
1991). For all these, they argue that their views can be incorporated into a symbolic cognitive
science.
All replies, except Clancey’s, see Vera and Simon’s SA category as problematic. They do not see how their views fit together with the other views lumped under this label, and they feel to a large extent that Vera and Simon’s rendering of their views is over-simplified.
The main topic for all articles in the issue is Artificial Intelligence. The debate is
therefore not directly relevant for the current discussion of human-computer interaction. None
of the papers add to the analysis done of the three scenarios. The issue is, though, interesting in
that it shows that there are no “two distinct approaches to the study of human cognition” as
Norman proposed in the introduction (i.e., if we take the reaction of the proponents of the
“SA” approaches more seriously than the analysis from outside done by Simon and Vera).
This situation has not changed since 1993. Phenomenology, Ethnomethodology, and Distributed Cognition, which are the approaches treated here, are theoretical frameworks that cannot easily be fused together in a unified “Situated Action” as an alternative to symbolic
cognition. This conclusion justifies the separate treatment of these approaches in the analysis
of the scenarios.
Ehn on the nature of Computer Artifacts
In (Ehn, 1988), Pelle Ehn devoted the last of four parts to the nature and role of the computer
in design. The main question was to what extent the computer should be made into a tool, and
what consequences that would have.
It must first be noted that Ehn’s focus was on the use of the computer in traditional crafts
like typography. He was at that time not concerned with situations where the end-products of
work were computer-based, interactive, or virtual in nature. Multi-media, HyperMedia, and
VR were at that time in their infancy.
In his discussion of the nature of the computer, Ehn makes relevant the Cognitive Science
and Distributed Cognition approach in (Norman & Draper, 1986), the phenomenology of
(Winograd & Flores, 1986), early work on semiotics by Andersen (1996), Laurel’s “Interfaces
as Memesis” paper (1986), and Bødker’s Ph.D. thesis that was later published as (Bødker,
1990). Of the seven approaches analyzed, only Card, Moran & Newell’s “Model Human
Processor” and Suchman’s work are not present.
He analyzes four different perspectives on the computer: as tool, as media, as dialogue
partner, and as interactive play. He sees the tool and media perspective as complementary (p.
411), but sees the dialogue partner perspective as less fruitful as a design principle. The
Interactive Play approach is treated together with the media perspective. He sees the
interactive-play perspective as an interesting ideal for software design (p. 415).
His use of the tool concept is slightly different from how I have used it in the present
analysis. This becomes evident in his analysis of spreadsheet applications: “A spreadsheet
application is an example of a computer-based tool that transcend properties of traditional
tools and materials.” (p. 430). He also sees HyperCard as a tool. By treating both spreadsheets
and HyperCard basically as tools, he loses sight, from my perspective, of the strong media
nature of these applications. When I build a HyperCard stack, it clearly has tool (and material)
properties. When I navigate in a stack built by someone else, it is no longer meaningful to see
it as a tool. I am then interacting with a medium, and that medium has “interactive play”
properties. Scenario II deals with this double nature of HyperCard.
Ehn only slightly touches on this interactive nature of the computer. He notes concerning
media that computers
“are a different kind of medium or material, that, due to the capacity for signal
manipulation, can be designed to change operations according to our interaction with
them”. (p. 410).
About the “meta-medium” nature of the computer he says:
“Since computers are media or materials we can deliberately design computer
artifacts as media (just as we can design them as craft tools)”. (p. 411)
He continues by listing advanced books, hypertext, mail handlers, and shared materials for
planning and coordination as such new media.
Ehn’s analysis is interesting in that he compares so many perspectives, and in that he sees
how the approaches complement each other. He shares with all the approaches listed a
mind/body “dichotomy” in seeing the computer partly as media and partly as tool. He
recognizes that the tool perspective can create “a blindness towards the media aspects of use
of computer artifacts” (p.411). His argument for making the computer more tool-like is, to Ehn,
part of a critique of what he sees as a Cartesian approach to systems design (p.51) (i.e. doing
design with boxes and arrows). He thus comes close to Winograd & Flores in attempting to
overcome the mind/body split by going to the other extreme. In relation to the current analysis of
the seven frameworks and three scenarios, Ehn’s analysis thus adds no new theoretical
perspectives, but confirms the results concerning the tool/media split in the prevailing
theories.
Kammersgaard: Four perspectives on HCI
Kammersgaard (1990) identified four perspectives on human-computer interaction:
1. The systems perspective.
This perspective treats the human being as one component among many in a system.
We see this perspective in the Model Human Processor, and partly in Cognitive Science.
2. The dialogue partner perspective.
This perspective treats the computer as if it was a human being. This is the perspective
underlying attempts at natural language interfaces, and much AI.
We see this perspective partly in the Cognitive Science perspective in concepts like
“dialogue”, “commands”, and “vocabulary”.
3. The tool perspective
This is the idea of seeing the computer as a tool used to perform some task.
We find this perspective made explicit in W&F’s use of Heidegger, in the
Instrumental side of Activity Theory, and in Suchman’s use of Heidegger.
4. The media perspective
This is the idea of seeing the computer as a medium, like film or radio.
We find this perspective in Laurel’s theatre metaphor, and in Andersen’s semiotics.
Kammersgaard’s analysis gives a good overview, but by making gross simplifications, it loses sight of the uniqueness of each approach. He does, for example, treat neither the problem of context nor the difference between formal and empirically-based theories. His analysis is included here because it was one of the earliest attempts at doing a meta-analysis of the
HCI field.
Bannon and Bødker: Activity Theory vs. Cognitive Science
In “Encountering Artifacts in use” (Bannon and Bødker, 1991), the authors argue for an
extension of the HCI field to pay more attention to issues of context. They argue against the
limited scope of traditional cognitive science, and conclude that one promising alternative
candidate is Activity Theory. They further argue that a focus on the social, physical, and work
context has the consequence that design should take place closer to where the users are.
The paper adds no new theoretical perspectives to what can be found in (Bødker, 1990),
but it is interesting in that it clearly contrasts Cognitive Science and Activity Theory. For the
current purpose of studying interactivity, its main relevance is in making clear the
fundamental differences between the two approaches.
Nardi: Activity Theory vs. Situated Action and Distributed Cognition
The paper “Studying Context: A Comparison of Activity Theory, Situated Action Models,
and Distributed Cognition” (Nardi, 1996b) appears in a collection of papers on the current
status of Activity Theory in HCI (Nardi, 1996a). In the paper, Nardi evaluates the three
approaches with respect to their descriptive power for real-life HCI phenomena.
She first compares the Distributed Cognition (DC) approach with Activity Theory (AT),
and finds DC’s concept of “knowledge in the world” to be problematic. “But an artifact can
not know anything”, she argues, “it serves as a medium of knowledge for a human” (p. 87).
She finds this view incommensurable with AT’s clear divide between human subjects and
physical objects. She recognizes that the DC approach has given rise to many interesting
studies, but argues that their focus on the role of artifacts in cognition is already present in
AT’s concept of persistent structures. The collaborative aspect of cognition in the DC
approach is also present in AT.
Having thus described how AT can account for all phenomena dealt with by the DC
approach, she goes on to compare AT to Situated Action (SA). Nardi makes a more precise
use of SA, and refers to the field studies of Suchman and Lave. She also refers to the
debate in Cognitive Science discussed above. In the introduction to the book (Nardi, 1996a), Nardi
refers to (Bannon and Bødker, 1991) for a discussion of AT vs. Cognitive Science.
In comparing AT and SA, she again describes AT as a richer framework that
incorporates the perspectives of SA. She further criticizes what she sees as the limited scope of the SA approach. By only focusing on “situations”, she argues, the SA approach loses sight of the larger contexts in which users work. By not trusting what people say, and paying
attention mainly to what they do, the SA approach has become too narrow in scope. She
applies the same critique to Winograd & Flores’ use of Heidegger.
Using the terminology of AT, she argues that the SA approach deals only with the two
lowest levels (i.e. operations and actions) in AT’s three-level model. SA consequently is blind
to the level of activity. She sees another major difference between AT and SA in their
treatment of consciousness. SA has had an “aversion” against including the consciousness of
the user in their understanding of a situation. She argues that this is to take the critique of
Cognitive Science’s symbolic approach to cognition too far.
Nardi’s analysis is very interesting in that it clarifies the differences between AT, SA,
and DC. The analysis of the three scenarios supports the view that AT is more powerful than
both SA and DC in describing real-life human-computer interaction. From my reading of
(Suchman, 1987) I am not convinced that Nardi is right concerning SA’s total lack of
attention to AT’s activity level. Suchman describes the larger context of her subjects, but
treats this as background information. Even though AT might theoretically be more complete
than all other approaches together, I feel less informed from reading some of the papers in
(Nardi, 1996a) than from reading studies such as (Suchman, 1987) and (Hutchins, 1994). I see a danger in that AT’s complex “apparatus” can get in the way of both seeing and communicating real-world phenomena of human-computer interaction. As researchers, we need to develop an empathic understanding of the subjects we study. Too much formal theory often gets in the way of such empathy, and makes us construct abstractions too early in the
process. The real life of real people is rich and complex, and can only partly be captured
through ready-made categories.
Concerning the nature of the interactive computer, none of the papers in (Nardi, 1996a)
had anything to say beyond Bødker’s analysis (Bødker, 1991).
Chapter 3 Non-Cartesian Alternatives
“Cogito ergo sum” (I think, therefore I am),
René Descartes, 1641.
The analysis of the blind spots from the previous chapter showed that none of the theories
were able to describe symbolic and non-symbolic interaction within the same framework.
Symbolic interaction and thinking are in western culture to a large extent seen as mental activities, while non-symbolic interaction is described with terms like “body language”. From a
philosophical perspective, this points to an ongoing discourse in philosophy about the relation
between body and mind. This discourse has relevance for a study of the phenomenon of
interactivity.
The relation between “body” and “mind” was discussed already by Plato and Aristotle,
but the contemporary discourse on the subject relates back to the work of the French
philosopher René Descartes (1596-1650). His writings have up until the present had an
enormous influence on the course of philosophy and science.
René Descartes
One way of introducing Descartes is to start with his cultural and political context. We are in
Continental Europe in the first half of the 17th century. Politically, the French revolution is
more than a century into the future, and the world is ruled by Kings and Popes. The medieval
worldview is dominant, and science is to a large extent a matter of interpreting sacred texts
and the Greek masters. In 1633, Galileo is sentenced to life imprisonment by the Inquisition for having
publicly supported Copernicus' heliocentric cosmology. He is later given amnesty by the
Pope, but it is clear that the Catholic Church can not accept a competing cosmology that
insists on an empirical foundation. Such a belief system would undermine the authority of the
Holy Bible, and threaten the raison d'être of the Catholic Church.
Descartes' philosophical project was to build a worldview based solely on reason. In
Meditations (1641) he argues that if everything can be doubted, the only thing I can know for
certain is that I am the subject doing the reasoning: cogito ergo sum (I think, therefore I am).
Trusting nothing but reason, he splits the world in two: a "pure" mental domain (res cogitans)
and a physical domain (res extensa) external to the mental domain. The two domains are ruled
by different laws and are only connected through the pineal gland in the brain. Of the four
Aristotelian causes, res extensa is stripped of all but the efficient cause. The subject becomes
a self-conscious mental entity in res cogitans, who has a body in res extensa. Will and
consciousness belong solely in the mental domain, while the physical world (including the
human body) is like a machine ruled by natural laws. In Descartes' view, the subject is
essentially an immaterial mind having a body. The body becomes an object for mind.
Descartes ended up with an extreme rationalism. Trusting only mathematical reason, he
generalized his own ideal and created a world where man became a detached observer of
himself.
Escaping the Cartesian dualism
Descartes’ body/mind problem has been commented on by almost every philosopher since. I
will here only deal with its treatment in 20th century philosophy.
Some of the most influential 20th century attempts at breaking out of the Cartesian
paradigm were made in Germany by Heidegger (Sein und Zeit, 1927, Trans. 1997), in France by Merleau-Ponty (Phenomenology of Perception, 1945, Trans. 1962), and in Britain by Wittgenstein (Philosophical Investigations, published posthumously in 1953). In the US, it is possible to interpret the pragmatism of Peirce (1839-1914) as a break with the Cartesian tradition.
It is not until the late 1980s that we find in established American science and philosophy
an interest in the role of "body" in "mind". Two different approaches are here worth
mentioning. Lakoff and Johnson (Metaphors we live by, 1980, Women, Fire, and Dangerous
things, 1987, The body in the mind, 1987) start out with linguistic and cognitive research
problems related to the use of metaphor, and end up with an embodied model of cognition.
Varela et al. start out with systems theory, and inspired by Tibetan Buddhism and Merleau-
Ponty's philosophy they end up with a focus on the Cartesian body/mind split (Varela et al., The embodied mind, 1991, and Hayward et al., Gentle Bridges (together with the Dalai Lama), 1992).
Of the above-mentioned works, I will here only deal with Heidegger, Merleau-Ponty’s
Phenomenology of Perception (1962), and Johnson’s The body in the mind (1987). My
motivations for not including the rest are as follows: Varela et al. (1991) end up with a theory
very close to that of Merleau-Ponty. The late Wittgenstein is treated by Ehn (Ehn, 1988).
Wittgenstein is also partly implicit in Andersen’s “language game” approach.
3.1 The Human Body in Heidegger
With the seven works analyzed in the previous chapter in mind, the most obvious place to
start a search for a non-Cartesian understanding of the body is with Heidegger. Dreyfus (1991)
notes that Heidegger never included the bodily aspects of Dasein in his analysis. Heidegger
did not see an inclusion of the human body as necessary to be able to escape the subject-object dichotomy of Western thought.
This might come as a surprise since some of his most famous examples, like hammering,
largely involve corporeal issues. Quoting from Dreyfus (p. 41), Heidegger acknowledged that
“This ‘bodily nature’ hides a problematic of its own. (143)[108]”. (The numbers refer to page
number in Being and Time.) Again referring to Dreyfus (1991, p. 137), there is no way to
infer from reading Heidegger that Dasein has a left and a right, a front and a back etc. These
properties of Dasein can only be explained with direct reference to the particular structure of
the human body, e.g. that left-right is symmetrical, while front-back is not. Heidegger never
makes this analysis. Dreyfus concludes that this is not satisfactory.
I will argue that the situation is even worse. An understanding of the world also requires
direct reference to some of its material properties, e.g. gravity. An analysis of the up-down
relation will make this evident. Up-down is used both to refer to objects in the world, and to
refer to the human body. An example of the first is “I went up the hill”, and the second “The
spider crawled up his leg”. If you stand on your head, the world appears “upside-down”. For
the body, the topology is that feet are down, while head is up. In the world, down is the way
things fall. Up is heaven, down is earth. It follows from this that our understanding of up and
down requires gravity. In weightless space, there can be only one true up-down relation, i.e.
that of the body. Every other up-down relation is an indirect reference to life with gravity.
Heidegger’s analysis of Dasein in Being and Time is useful as a general description of
any living subject’s existence in any world. To get a deeper insight into the specifics of
human existence in our everyday environment in a more concrete sense, we have to look
elsewhere.
3.2 Maurice Merleau-Ponty's "Phenomenology of Perception"
All modern phenomenology refers back to the works of the German philosopher Edmund Husserl (1859-1938). Husserl’s phenomenological method consisted of a process of what he called transcendental reductions. Reduction was not meant in the sense of “making
less”, but in the sense of “getting to the thing itself”. The aim of the phenomenological
method was to make explicit the “background” of a phenomenon, i.e. the conditions that make
a phenomenon show itself the way it does. After such a reduction, a description of the
phenomenon must include a description of the hierarchy of implicit “background” skills that
make the phenomenon possible.
Maurice Merleau-Ponty (1908-1961) was, besides Jean-Paul Sartre, the most influential French philosopher in the 1940s and 50s. Like Heidegger, he used Husserl's phenomenological method as a way to overcome the Cartesian dualism.
Like Heidegger, Merleau-Ponty also stressed that every analysis of the human condition
must start with the fact that the subject is in the world. This being-in-the-world is prior to both
object perception and self-reflection. To Merleau-Ponty, we are not Cartesian self-knowing
entities detached from external reality, but subjects already existing in the world and
becoming aware of ourselves through interaction with our physical environment and with
other subjects.
In his major work "The phenomenology of perception" from 1945 (trans. 1962), Merleau-
Ponty performs a phenomenological analysis of human perception. His purpose is to study the
"pre-cognitive" basis of human existence. He ends up rejecting most of the prevailing theories
of perception of his time.
In all his writing there is a focus on the first-person experience. Merleau-Ponty dealt with many topics in the book. Only the aspects of his work that are of direct relevance to human-computer
interaction will be treated here. These aspects can be summarized:
• Perception requires action.
Without action there can be no experience of anything "external" to the subject.
Every perception is consequently an "interactive experience", or as Merleau-Ponty
puts it: a “communion” with the world. (p. 320)
• Perception is governed by a “pre-objective” intentionality.
Most of these interactions are going on in "the pre-objective realm" and are governed
by an inborn intentionality towards the world.
• Perception is embodied.
We perceive the world with and through our active bodies "The body is our general
medium for having a world " (p. 146).
• Perception is an acquired skill
Perception is to a large extent an acquired bodily skill that is shaped by all our
interactions with the world.
• The perceptual field
Our immediate interpretation of what we perceive is given by our previous
experiences. Our experiences have shaped our way of being in the world. This creates
what Merleau-Ponty denotes the perceptual field.
• Tool use
When we learn to use a tool, it becomes integrated into our body both as potential for
action and as medium for perception.
• Bodily space
When we act in the world, our body has a dual nature. On the one hand, we can see it
as an object among other objects in the “external” world. On the other hand, it exists
to us as our experiencing/living body (le corps propre). As a living body, we move
within a space given by the structure and limitations of our own body; our bodily
space.
• Abstract vs. concrete movement
A movement changes nature from “concrete” to “abstract” when it is done
consciously.
Perception requires action
Merleau-Ponty rejects the idea of perception as a passive reception of stimuli. The first
chapter of Phenomenology of Perception begins:
“At the outset of the study of perception, we find in language the notion of sensation,
which seems immediate and obvious.... It will, however be seen that nothing could in
fact be more confused, and that because they accepted it readily, traditional analyses
missed the phenomenon of perception.” (p. 3)
When we perceive objects with our eyes this is not a passive process of stimuli reception, but
an active movement of the eyeballs in search of familiar patterns. He quotes Weizsäcker:
"sense-experience is a vital process, no less than procreation, breathing or growth" (p.
10).
Merleau-Ponty continues:
"The thing is inseparable from a person perceiving it.... To this extent, every
perception is a communication or a communion" (p. 320).
This view is in total opposition to the popular view in “information processing” psychology
of seeing perception as sense data being passively received by the brain. To Merleau-Ponty
there is no perception without action; perception requires action. In their explanation of this
aspect of Merleau-Ponty's philosophy, Varela et al. (1991) refer to a couple of relevant
psychological experiments.
In a study by Held and Hein (1963), two groups of kittens were raised in the dark and
only exposed to light under controlled conditions. One group was kept passive while the other
was free to move around. The experiment was arranged such that the passive group
experienced the same sense data as the active group. When both groups were released after a
couple of weeks, the passive group bumped into things as if they were blind. As Varela et al.
(1991) put it:
“This beautiful study supports the enactive view that objects are not seen by the
visual extraction of features, but rather by the visual guidance of action.” (p. 175).
Varela et al. next refer to research done by Bach y Rita (1972) on a visual prosthesis for blind
users. A video camera was connected to a matrix of vibrating stimulation points on the skin of
the user in such a way that the image reaching the lens was mapped directly onto an area of
the skin. It was found that it was only when the users were allowed to move the camera that
they were able to learn to "see" with the device. Again, this supports the view that perception
requires action. It is only through interaction that objects appear to us as immediately existing
in the external world.10
To use the terminology of cognitive science, the “motor system” takes active part in the
creation of our "mental maps" of the world. Against this use of cognitivist terminology,
Merleau-Ponty would probably have argued that it is misleading to talk about a motor system
because the body is very integrated and contains no truly autonomous systems.
Intentionality and the embodied nature of perception
Merleau-Ponty saw perception as an active process of meaning construction involving large
portions of the body. The body is governed by our intentionality towards the world. This
“directedness” is a priori given. When I hold an unknown object in my hand and turn it over
to view it from different angles, my intentionality is directed towards that object. My hands
are automatically coordinated with the rest of my body, and take part in the perception in a
natural way.
"Sight and movement are specific ways of entering into relationship with objects..
(p.137).
Any theory that locates visual perception to the eyes alone does injustice to the phenomenon.
Perception hides from us this complex and rapid process going on “closer to the world” in "the
pre-objective realm". To Merleau-Ponty, the body is an undivided unity, and it is meaningless
to talk about the perceptual process of seeing without reference to all the senses, to the total
physical environment in which the body is situated, and to the "embodied" intentionality we
always have towards the world.
Perception is an acquired skill: The phenomenal field
Merleau-Ponty introduced the concept of the phenomenal field to signify the ever changing
"horizon" of the subject. The phenomenal field is shaped by all the person's experiences of
being in the world. It is what the subject brings into the situation. It is not the sum of
experiences, but closer to what Heidegger calls the background. For the subject, it determines
the structure of the world.
It is at this stage that the real problem of memory in perception arises, in association with the general problem of perceptual consciousness. We want to know how, by its own vitality, and without carrying complementary material into the mythical unconscious, consciousness can, in course of time, modify the structure of its surroundings; how, at every moment, its former experience is present to it in the form of a horizon which it can reopen - "if it chooses to take that horizon as a theme of knowledge" - in an act of recollection, but which it can equally leave on the fringe of experience, and which then immediately provides the perceived with a present atmosphere and significance. [p. 22]
10 These experiments might help explain the extremely complex recognition problems created by the current “computer vision” approach that does not let the computer interact with the world.
The phenomenal field is a solution to the problem of the relation between memory and
perception. Instead of saying that perception involves memory, Merleau-Ponty argues that
perception is immediate and happens within the horizon of the phenomenal field.
We can choose to reflect on this horizon, i.e. our perceptual habits, but most of the time it
is part of the unarticulated “background” of perception.
Tool use
The dynamic nature of the phenomenal field is seen clearly when we acquire a new skill like
learning to use a new tool. The body has an ability to adapt and extend itself through external
devices. Merleau-Ponty uses the example of a blind man's stick to illustrate this. When I have
learned the skill of perceiving the world through the stick, the stick has ceased to exist for me
as a stick, and has become part of “me”. It has become part of my body, and at the same time
changed it.
Merleau-Ponty uses an organ player to illustrate how we internalize external devices
through learning. He uses an example where an organ player had to play on a new organ, and
in the course of one hour before the concert he acquainted himself with all its levels, pedals,
and manuals. This was not an intellectual activity where he followed a logical course of
action, but an interaction with the instrument with the aim of “understanding” it. The organ
became part of his experienced body, and changed his bodily space. Through skill acquisition
and tool use, we thus change our bodily space, and consequently our way of being in the
world. Metaphorically, we could say that by learning new skills and using technology we
change the world we live in. As Merleau-Ponty puts it:
"The body is our general medium for having a world" (p.146).
Bodily space and le corps propre
Merleau-Ponty next makes the distinction between the spatiality of the body and the spatiality
of objects. As a human being, I am aware of my body both as an object among other objects
and more directly as my experienced/lived body (le corps propre). The latter he sees as a
"spatiality not of position, but of situation" (p. 100). When I touch the tip of my nose with my
index finger, I act directly within the spatiality given by my body. This is different from what
happens when I operate a switch on the wall. The switch on the wall exists as an object among
other objects, external to the experienced body. When I operate the switch, I move my hand in
relation to the switch, not in relation to my body. Merleau-Ponty's point here is that when we
act in the world, we treat the body both directly as the experienced body and indirectly as
objective reality.
The structure of my bodily space is very different from the structure of the external space
of objects. While external space is organized along the axes up-down, left-right, and front-back,
my bodily space is constituted by my potential for action in the world.
When the blind man picks up his stick, an object in the external world becomes part of
the experienced body. When I move my hand towards a cup of coffee in front of me, I relate
to this object as a thing positioned outside of my body. When I grasp the object and move it to
my mouth, the cup is no longer primarily an object external to me whose position I judge in
relation to other objects, but it has entered into the space of my experienced body and thereby
changed its structure. When again I place the cup of coffee on the table, it leaves my
experienced body and becomes part of the space of objects.
The bodily space is different from the external space in that it exists only as long as there
are degrees of freedom and a skillful use of this freedom. The bodily space is mainly given by
the subject’s specific potentials for action. For a totally paralyzed body with no kinesthetic
experiences, there is no bodily space. Different bodies give rise to different spaces, and so
do external factors such as clothing, tool use, and different kinds of prostheses. It is
important to notice that learning a new skill also changes the bodily space.
Abstract movement vs. Concrete movement
Merleau-Ponty makes a distinction between movements made on purpose "as movements"
and movements done naturally as part of a situation. The first he calls abstract movements, the
latter concrete. When I ask someone to move his left foot in front of his right foot, the
movement becomes abstract because it is taken out of the normal context of bodily
movements. As part of everyday walking, the same movement is a concrete movement.
3.3 An Example: The Mr. Peters Button
This integrated view of action and perception makes Merleau-Ponty an interesting starting
point for a discussion of meaningful interactive experiences. A consequence of his theory is
that it should be possible to lead users into interactions with the computer that are meaningful
at a very basic level. The interactions themselves are meaningful. To test out this idea I have
created some examples in HyperCard. Figure 19 shows one of these examples.
Figure 19. The Mr. Peters button
The two buttons have scripts that make the text “Mr. Peters” jump over to the other button
when the cursor is moved into them. The user tries to click on the button but experiences that
“Mr. Peters” always “escapes”. In an informal study I have tested this example with a limited
number of users. They all understood the intended meaning of the example and described Mr.
Peters as a person who always avoids you, a person you should not trust. The interaction itself
works as a metaphor for Mr. Peters' personality.
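The behavior is simple enough to sketch outside HyperCard. The following small Tkinter program is a rough reconstruction of the idea, not the original HyperCard script; the widget names and sizes are arbitrary assumptions. The label "Mr. Peters" jumps to the other button whenever the cursor enters the button that currently carries it, so the user can never click on it.

```python
import tkinter as tk

# Minimal sketch of the "Mr. Peters" behaviour (a reconstruction, not the
# original HyperCard script). Two buttons; the label jumps to the other
# button whenever the cursor enters the button that currently carries it.
root = tk.Tk()
root.title("Mr. Peters")

left = tk.Button(root, text="Mr. Peters", width=12)
right = tk.Button(root, text="", width=12)
left.pack(side="left", padx=30, pady=30)
right.pack(side="right", padx=30, pady=30)

def escape(event):
    # The button under the cursor gives up the label, so it can never be clicked.
    if event.widget.cget("text") == "Mr. Peters":
        other = right if event.widget is left else left
        event.widget.config(text="")
        other.config(text="Mr. Peters")

left.bind("<Enter>", escape)
right.bind("<Enter>", escape)

root.mainloop()
```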
How does the philosophy of Merleau-Ponty shed light on this example?
• Perception requires action
Perception of the button requires action. The button as interactive experience is the
integrated sum of its visual appearance and its behavior. Without action, we are left
with the drawing of the button, not the actual button as it emerges to us through
interaction.
• Perception is an acquired skill
One of the necessary conditions in the Mr. Peters example is that the user has acquired
the skill of moving the mouse cursor around. This skill (Merleau-Ponty: habit) is part of
being a computer user in the 1990s. Without this skill, the only perception of the
Mr. Peters button would be its visual appearance.
• Tool integration and bodily space
For the trained computer user, the mouse has similarities with the blind man’s stick. The
physical mouse and the corresponding software in the computer are integrated into the experienced body of the user. The computer technology, and the skills to make use of it, change the actual bodily space of the user by adding the potentials for action presented by the computer to his potentials for action in the physical world. His world of objects is in a similar manner extended to include the “objects” in the computer.
This view of interactive media is different from the popular view that computer
technology can create a “Virtual Reality” that the user “enters into” while leaving
“reality” behind. Unless you are truly in a psychosis, you never leave reality behind.
The use of computers implies a willful expansion of our potentials for action. As human
beings we can never escape the fact that our living bodies form an important part of the
background that makes experience possible.
• The phenomenal field
In the above example the context of the button is given by the leading text on the “card”
and by the user’s past experiences with GUIs. It is important to notice that this example
only works with users who are used to clicking on buttons to find more information.
This is the horizon of the user, i.e. the phenomenal field that all interaction happens
within.
The Mr. Peters button emerges as a meaningful entity because the appearance of a
button on a Macintosh leads to a certain expectation and a corresponding action. The
action is interrupted in a way that creates an interactive experience that is similar to that
of interacting with a person who always escapes you.
• Perception is embodied
To experience the “Mr. Peters” button requires not only the eye, but also arm and hand.
Mouse movements and eye movements are an integrated part of the perceptual process
going on in the “pre-objective” realm. The interactive experience is both created by and
mediated through the body.
• Intentionality towards-the-world.
As a computer user I have a certain “directedness” towards the computer. Because of
this “sub-symbolic” intentionality, a button on the screen presents itself to me not
primarily as a form to be seen, but also as a potential for action. From seeing a button to
moving the cursor towards it, there is no need for a “mental representation” of its
position and meaning. The act of trying to click on the button is part of the perceptual
process of exploring the example. When the button jumps away, the user follows it
without having to think.
Mr. Peters according to the HCI theories
How would this example have been described within the theoretical frameworks presented in
Chapter 2?
• Cognitive Psychology
It would be very difficult to conceive this example within the cognitive science school
of computer science with its notion of information processing. There is no room for
meaningful interactive experiences of this kind in Card et al.'s model human processor.
A typical interaction with the Mr. Peters example would be described simply as a
process of goal seeking behavior leading to a new goal formation. This experience
might be stored in long-term memory, but only as a memory of unsatisfied goals and
not as "input" at the perceptual level.
• Heidegger
Both Winograd & Flores and Suchman could have used Heidegger to analyze the Mr.
Peters button. An interpretation of the Mr. Peters example according to Heidegger will
have to be speculative. To a trained Macintosh user, the button is "invisible" in use. Its
sudden jump on the screen can be seen as a breakdown of this transparency. The button
goes from being in the background of what is ready-to-hand to being present-at-hand. In
Heideggerian terms it is hard to see how it can present itself as anything but a
malfunctioning tool (Zeug).
• Activity Theory
In activity theory terms, the intention of clicking on the button is part of an
operationalized action for a trained user. The button does not behave as expected, and
this leads to a re-conceptualization of the click-button-to-get-information operation. It
might be possible to interpret the button behavior as a sign in the original writings of
Leontjew, but such an interpretation would have to be speculative.
• Semiotics
In P. B. Andersen’s semiotic analysis, the button would be an example of an interactive
sign. In his examples he only uses interactive signs that change appearance and not
position. A liberal reading should allow for position as a possible way for interactive
signs to indicate state. This allows for the button to be a sign, but in his theory the
meaning, i.e. what the sign signifies, is either built on visual resemblance or on cultural
convention. He has not included behavioral resemblance as a possible function from
sign to the signified. His theory is a modification of a general theory (i.e. semiotics)
that, to the extent that action is mentioned at all, treats action and perception as two
separate processes.
• Computer as theatre
In Laurel’s theatre metaphor interaction is seen as a choice of direction in an interactive
play. The interactive behavior of "Mr. Peters" would not appear at the sign level of her
analysis. It might appear as a meaningful entity at the character level, but this
interpretation is speculative because the most straightforward application of Laurel's
theory to this example is that the unexpected behavior of the button would lead to an
explosion of the mimetic illusion. Her theory does not give any details on how we learn
the conventions of different mimetic illusions, or how "explosions" are repaired "in
situ".
We see that the closest we get to an understanding of the Mr. Peters button with the other
theories, is P. B. Andersen’s concept of interactive signs.
3.4 Merleau-Ponty on the Three Scenarios
As a way of applying Merleau-Ponty's philosophy to HCI, I will show how one could apply
his theories to the three scenarios from Chapter 2.
Scenario I: Concrete movement
A description of Scenario I could go as follows:
Mrs. X inhabits the room. Operating the switch is one of her bodily skills. She has the
intention of turning the light off. Her previous experience with the switch gives her the
implicit expectation that a touch of her finger will turn the light off. This expectation is
part of her PHENOMENAL FIELD at that point.
She moves her hand towards the switch. This movement is part of an integrated process
involving her body as a whole. Both her bodily movements and her perception are driven
by her “PRE-OBJECTIVE” INTENTION towards the world.
The visual perception involves moving her focus in search of meaningful patterns. The
movements of her eyes are integrated with the movements of the rest of her body. She
touches the switch, and the light goes off. This happens as expected, and she continues
out of the room.
Turning off the light is thus a CONCRETE MOVEMENT because it is an integrated part of
her everyday coping with her environment. If she had been an actor who had been asked
to turn off the light on her way out of the room, the action would have been an ABSTRACT
MOVEMENT, at least during rehearsal.
Through her interaction with the switch, the switch confirms its way of working. It
thus adds to her perceptual field. This field is already shaped by all her "communions"
with that switch, both tactile and visual.
To Merleau-Ponty there is little difference between interaction and perception. They represent two different ways of describing the experiences of a living body in its environment. We see that with Merleau-Ponty we finally have a theory with a concept for Mrs. X's interaction with
the switch. The interactive experience is identical to her perception of the switch, which is the
switch to her. The touch of the finger is part of her perceptual process, and it is consequently
meaningless to talk for example about her "mental model" of the switch as consisting of
"states". The sum of her experiences with the switch-light system is stored in her body as her
attitude towards that switch.
Merleau-Ponty’s philosophy gives us a language for discussing the interactive experience, but it does not provide a concrete methodology for interpreting non-verbal interaction data, i.e. for getting at the specific meaning of an interaction for a specific person in a specific situation.
Scenario II: Concrete and Abstract movement
The stack in Scenario II could be used as an illustration of the difference between abstract and
concrete movements.
When Mrs. X is trying out the button in her HyperCard stack, this is an ABSTRACT MOVEMENT because it is something she does deliberately, with focus on the action itself.
To Mr. Y it is also an ABSTRACT MOVEMENT because he tests out the switch with focus
on how it feels to use it. The movements are abstract in different ways for the two,
because Mrs. X has Mr. Y's interaction in mind, while Mr. Y has the end-user in mind.
The parameter setting is a CONCRETE MOVEMENT for both of them, because it is
movement done as part of something else.
When Mr. Y has learned to set the parameters, the radio buttons and the logic that makes them affect the switch behavior have become internalized. He has understood it with his body. He has it in his fingers, just like a blind man's cane. To Mr. Y, the stack in use is
integrated into the BODILY SPACE as three dimensions: the two parameters and the switch.
The light is an object external to the body.
The stack is interesting in that the parameters control the way in which the light is
operated. A metaphor could be a patient with a leg cut off below the knee who is using his
two arms to try out different prostheses. The arms control the way in which the body is
extended, and thus the structure of the BODILY SPACE. It is only through actual use that
the patient can judge exactly how the BODILY SPACE is different with the different
settings for his artificial leg. Without actually trying to walk on it, he has to rely on his
image of how it could be. This image is very different from the bodily experience of
actually walking on it.
The stack is thus an example of a self-referencing system within the bodily space. From
this it is possible to speculate that after long training, Mr. Y could have learned to
integrate how the parameter settings control the switch behavior. It would then have been
possible for him to operate the switch quite fluently including parameter setting. This is
similar to how for example a car mechanic changes tools throughout his work. The stack
as a whole would then have been "understood" by the body as a possible BODILY
EXTENSION towards the simulated light.
We see that we here have been able to describe the details of Mrs. X and Mr. Y's interaction
with the stack. The blind spot so far is that we have not been able to account for how the stack
is part of the communication between the two. The problem with interactive messages has to a
certain extent disappeared, because when perception is an active process, the stack is no
different from other signs.
Scenario III: “Making the phenomenal field a theme of knowledge”
In Scenario III, Mr. X's PHENOMENAL FIELD at first makes him see the puzzle as four
squares that he can move around on a background. He tries it out, and he "understands" it.
When this perspective becomes insufficient, he "take[s] that horizon as a theme of
knowledge". He starts reflecting on his own PHENOMENAL FIELD and ends up changing it.
The new FIELD includes the possibility of dragging emergent squares. Having tried it, this
new understanding of the puzzle is brought into his BODILY SPACE.
The reflection happens in co-operation with his wife, and the solution is
hinted at by her. She is aware of how his PHENOMENAL FIELD differs from her FIELD
concerning this puzzle and is able to direct him towards a solution. Having understood the
solution, it is no longer possible for Mr. X to experience the puzzle from his initial
perspective because his PHENOMENAL FIELD has changed. He can pretend that he does
not know how the puzzle works, but the change made to the puzzle as it appears to him is
irreversible.
We see that with Merleau-Ponty we get a notion of a change in perspective that makes the
learning experience in Scenario III a very natural thing.
3.5 Mark Johnson's "The Body in the Mind"
In 1980, the philosopher Mark Johnson wrote the book "Metaphors we live by" together with
Berkeley linguist George Lakoff. They are both brought up in the Anglo-American
philosophical tradition with very little influence from 20th century Continental philosophy.
They seem also to have been unaware of Wittgenstein's break with logical positivism
(Wittgenstein, 1953, see also Ehn, 1988).
Through their work with metaphor, Lakoff and Johnson came to doubt the foundations of
their tradition. They ended up rejecting the two dominant epistemological positions at that
time, which they named subjectivism and objectivism.
Of special interest for an investigation of interactivity is Johnson's book "The Body in the
Mind" (1987). Here he starts an exploration of the experiential basis of cognition in general
and of metaphor in particular. He argues that all understanding of language is rooted in our
experience of being human beings in a physical and social environment. Where in "Metaphors
we live by" Lakoff and Johnson approached metaphor from a linguistic angle, Johnson now
explores the “background” phenomena that make metaphorical understanding possible. The
latter includes an analysis of the structure of our interaction with the physical environment.
As an example, he shows how Boolean logic can be seen as a metaphorical projection of
the inside-outside schemata. Johnson argues that the inside-outside schemata results from our
everyday experience with containment. He mentions entering and leaving rooms, and putting
things into boxes as examples. This experiential domain has a structure that is isomorphic to
the structure of Boolean logic. True corresponds to inside and False to outside.
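To make the claimed isomorphism concrete, the following small sketch (my own illustration, not Johnson's) lets set membership play the role of "inside"; the set operations then mirror the Boolean connectives.

```python
# Illustration of the containment / Boolean-logic isomorphism (my own sketch,
# not Johnson's): membership plays the role of "inside" (True), absence the
# role of "outside" (False), and the set operations mirror the connectives.
room_a = {"chair", "lamp"}
room_b = {"lamp", "box"}
thing = "lamp"

inside_a = thing in room_a            # True  <->  inside room A
inside_b = thing in room_b            # True  <->  inside room B

assert (thing in (room_a & room_b)) == (inside_a and inside_b)  # intersection ~ AND
assert (thing in (room_a | room_b)) == (inside_a or inside_b)   # union ~ OR
assert (thing not in room_a) == (not inside_a)                  # complement ~ NOT
```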
To support this analysis of Boolean logic, Johnson examines a lot of linguistic material. It
is for example common sense that a proposition can not be true and false at the same time. In
the same way you can not be present and not present in the same room at the same time. A
consequence of his view on the embodied nature of logical reasoning is that we do not need to
learn all the rules of logic to be able to practice it. As long as we have implicitly understood
the connection to the correct experiential domain, the rest comes for free through
metaphorical projection.
Johnson also spends some time on Eve Sweetser's work on the experiential basis of
modal verbs. She argues (Sweetser, 1990) that our understanding of "can", "may", "can not"
etc. is based on kinesthetic image schemata resulting from the everyday interaction with our
physical environment. "I can" is seen as a force that is not blocked. "I can not" is a blocked
force.
Kinesthetic image schemata and interactivity
The most interesting thing with this theory is that Johnson argues that kinesthetic image
schemata not only work as a basis for understanding, but that we are able to reason with this
kind of schemata much in the same way as we can reason with other schemata. This implies a
kind of tacit thinking in the kinesthetic domain that might have many similarities to visual
thinking (Arnheim, 1969).
Mark Johnson / Merleau-Ponty
Mark Johnson belongs within a tradition that largely treats visual and auditory perception as a
passive process of stimuli-reception. Johnson also believes in mental representation. As such,
he is far from Merleau-Ponty.
In his analysis of the kinesthetic sense modality, Johnson gets closer to Merleau-Ponty’s
epistemological position concerning the importance of intentionality in perception. This
should not come as a surprise as it would be very awkward to argue that our everyday
interaction with the environment is something that just happens to us as passive recipients.
The role of action in perception is seen much more clearly in the kinesthetic domain than in
vision and hearing. When I try to open a door to see if it is locked, this is clearly something I
do. What I learn about the door results from the sum of my intentions, my expectations, my
actions, the reaction from the environment, and my interpretation of this reaction, i.e. my
interaction with the door.
Johnson does not include the intentionality of the subject in his kinesthetic image
schemata. In his analysis of the inside-outside schemata, he says that this schemata results
from our experience of being in rooms, and of things being within other things. Merleau-
Ponty would probably have seen this as two different experiential domains. To treat them as
one phenomenon would to him mean to ignore bodily space and only see the body as an
object in external space. From the bodily-space perspective, leaving and entering rooms
means crossing borders. My relation to objects having inside-outside relations, including my own body, gives rise to a different kinesthetic image schemata.
If ordinary Boolean logic is based on the "objective" inside-outside schemata, it should
consequently be possible to find a logic based on the border-crossing schemata. Spencer-
Brown's "Laws of Form" (1969) might be an attempt at describing such a logic. A further
investigation of this topic is outside the scope of this work.
What does Johnson add to Merleau-Ponty's understanding? As I see it, he adds to
Merleau-Ponty's phenomenology a treatment of transfer. Merleau-Ponty relates language
development to kinesthetic development in relation to malfunctions, but he does not show any
examples of how the kinesthetic and the linguistic/cognitive relate in the well functioning
individual. As Johnson and Merleau-Ponty's epistemologies are irreconcilable, one can not
simply add Johnson's theories to Merleau-Ponty and get a coherent theory. They appear here
together because they both have made useful contributions to an understanding of the
phenomenon of interactivity. They both represent examples of a “corporeal turn” in science.
3.6 Discussion
We saw in Chapter 2 how the different theoretical approaches to HCI use different metaphors
for capturing the nature of the interactive computer. The two most relevant metaphors were
seen to be Computer-as-Tool and Computer-as-Media. These were seen to represent the two
parts of the Cartesian Body/Mind split.
With an application of Merleau-Ponty’s philosophy to human-computer interaction, we
get a new understanding of Interaction-as-Perception. By seeing interaction as a perceptual
process involving both “Body” and “Mind”, we overcome the Tool/Media dichotomy. When
perception is understood as an active process involving the totality of our body, it no longer
makes sense to see it as passive reception of information through a medium. When action in
the same way is seen as an expression of our being-in-the-world, it no longer has meaning to
see hammering etc. as a purely “bodily” activity.
Chapter 4 Computer as Material
“Each material conceals within itself the way in which it should be used.”
Wassily Kandinsky, 1912.11
Having thus gone into detail about the human side of Human-Computer Interaction, time has
come to have a closer look at the computer side: What is the nature of the interactive
computer?
4.1 The Nature of the Computer
We saw in Chapter 3 how the Interaction-as-Perception perspective made us overcome the
Tool/Media dichotomy. This makes necessary a new understanding of the nature of the
interactive computer.
Computer-as-material
At the risk of replacing one set of metaphors with another, it is tempting to let Interaction-as-Perception go together with Computer-as-Material. Material must in this context be interpreted quite widely: it means simply something that can be shaped into something else.
The computer material has some interesting properties. It is first of all interactive. This is
not only in the sense that a piece of clay is interactive. It is interactive by nature, and
everything it can be shaped into will be interactive.
In (Budde and Züllighoven, 1991), the authors argue for a “tools and materials” metaphor
for the computer. They see the computer as opening up the possibility of building different kinds of new
“materials” that can be crafted with different kinds of tools. Their argument is similar to the
above, but they never see the computer itself as a material.
If we treat the computer as a raw material, which we can make into a myriad of shapes, it
is important to get a feel of this material.
11 (Kandinsky, 1994, p.154)
Arnheim’s media studies.
As introduced in Chapter 1, there is a tradition in Media and Art Studies for asking questions
concerning the nature of the medium/material being studied. Compared to the study of
social and cultural impact, media studies with a focus on the properties of the medium are
rare. The most important author on this subject is Rudolf Arnheim.
Arnheim has dealt mostly with film, painting, drawing, sculpture, and architecture. For
these media he has analyzed their media-specific properties from an artistic and psychological
perspective. In the introduction of (Arnheim, 1974), he states explicitly that he is not
concerned with the cognitive, social, or motivational aspects. Nor is he concerned with “the
psychology of the consumer” (p.4). His focus is on shapes, colors, and movements, and how
they interact.
By ignoring all elements of social function and meaning in a traditional sense, he is free
to discuss issues such as balance, space, shape, form, and movement in relation to the
different media. Arnheim draws heavily on examples from art and gestalt psychology.
The concrete results from Arnheim’s studies are of little relevance here. What is relevant
is his approach to the study of a new medium. I use the term medium here simply as a way of
placing the computer in relation to this tradition. It does not imply a Computer-as-Medium
metaphor in the sense it was discussed in Chapter 2.
Following Arnheim’s methodology, much can be learned from having a detailed look at a
tradition that deliberately explored a new medium. As presented in Chapter 1, the Modern Art
movement at the beginning of this century is an example of such a tradition.
4.2 The Abstract Art of Kandinsky
The Russian painter Wassily Kandinsky (1866-1944) is often heralded as having painted the
first abstract painting. In (Lärkner et al., 1990), P. Vergo (1990) argues that Kandinsky to a large extent was inspired by Schopenhauer's ideas about the role of art. He explains
Schopenhauer’s view:
"All that matters is the Idea, that inner essence which lies behind the merely external
aspects of the world, and which it is the function of art - all art - to reveal". (p. 55)
Kandinsky saw music as the ultimate art form, and wanted to create visual art that like music
rested in a realm of its own without references to real "objects". He named his abstract
paintings "compositions" to reflect the similarity with music. In the same volume T.M.
Messer (1990) comments on Vergo's paper and writes concerning Kandinsky's analogies
with music:
"This raises the question whether painting fully can aspire to the condition of music,
whether eye and ear are really analogue to that degree". (p.163)
This is a question that clearly has to do with the nature of painting as a medium. In his book
on the intellectual environment of Klee and Kandinsky (Roskill 1992), the historian Roskill
does not mention Schopenhauer explicitly. Continental Europe in the first decades of this
century was a melting pot of a multitude of currents in philosophy, in politics, in the arts, and
in religion. Kandinsky was inspired by Nietzsche, Russian formalism, the theosophy of
Blavatsky, Marxism, and almost all the intellectual currents of the time. He became himself a
transformer of culture, and has had great influence on the way art is today perceived and
taught. Kandinsky's book "On the spiritual in art" (1912) is both a personal manifesto on the essence of art and the first attempt at a theory of abstract painting. In the chapter "The pyramid" he writes (about himself?) that if a painter wants to express his "inner world", it is
very understandable that he turns to music as an inspiration.
"From this comes the current search in painting for rhythm, mathematical abstract
construction, and the use of the tones of color in such a way as to bring about a
movement of the colors" (Author’s translation from German, p. 55).
He compares music with painting and points out the following differences:
• The composer has time available to him, which the painter has not.
• The painter can on the other hand give the viewer an impression of the work as a
whole in a blink of the eye. This is not possible with music.
• Music does not make use of external forms, while painting currently (i.e. 1912)
almost exclusively makes use of such forms.
On the last point he has a lengthy footnote on the use of "natural" sounds in music. His point
is that it is obvious that the use of sounds from a farmyard in a musical composition is not a
very good idea. Music has its own ways of putting the listener in a specific mood without
this kind of “Varieté-like” means of expression.
The chapter "Language of form and color" also starts out with a reference to music:
"Musical sound has direct access to the soul". (Kandinsky, 1994, p. 161)
To be able to create abstract paintings that are like musical compositions, he continues, the
painter has available two means: Color and Form. A large portion of the book is from then on
devoted to a theory of color and form.
The book was written at a point in the history of visual art where abstract painting was
still in its very beginning. The movement towards abstraction in Kandinsky's painting can be
illustrated with five paintings from 1906, 1911, 1926, and 1930.
Figure 20. "Riding couple", 1906 and “Abstract Composition (Painting with a circle)”, 1911
His "Riding couple" from 1906 is very much figural in the sense that it clearly shows a
dream-like motif of a riding couple. It is on the other hand not naturalistic, neither in the sense
that it shows an existing natural scene, nor in the sense that it tries to give a correct
reproduction of "external form".
Kandinsky called his "Abstract Composition (Painting with a circle)" from 1911 the
world’s first abstract painting. Others (i.e. Mondrian, Delaunay, and Malevitch) were working
with similar ideas at the same time, but this is at least Kandinsky's first abstract painting. We no longer see any motif; this is pure color and form.
"Counterweigths", or "counterbalance" as he also named it, was painted in 1926 (Figure
21). We see here a good example of his experimentation with color and form elements. Seen
as a composition, this painting has the strong regularity of a fugue by Bach. The number of
form elements is very low, and with a couple of exceptions, all colors are pure. The painting
conveys his idea of the three primary forms circle, triangle, and square.
To the left in Figure 22 is shown his painting “Thirteen rectangles” from 1930. In the
painting "Two squares" to its right, Kandinsky took one move further towards simplicity. The
elements are here reduced to one form element and two colors.
Figure 21. “Counterweights”, 1926.
Figure 22. “Thirteen rectangles” and "Two squares", both 1930.
To sum up, Kandinsky's movement towards abstraction can be characterized as:
1. Going from the figurative to the non-figurative.
2. Going from the complex to the simple.
3. Identification of the two orthogonal dimensions form and color.
4. Identification of "pure" form and color elements.
The aim of his move towards abstraction and simplicity was to make painting a new medium
of expression that would allow the painter to "speak directly to the soul". Much of the work of
Kandinsky can be seen as a deliberate search for the properties of this medium.
The Constructivists and the Bauhaus.
Kandinsky was not the only painter at that time who moved towards the abstract. Pablo Picasso
and the other cubists had around 1910 already distorted the object to such an extent that it was
hardly recognizable. For the following decade, cubism served as an inspiration for a variety of
directions in visual art that removed the recognizable object altogether.
In (Barr, 1936) the former director of The Museum of Modern Art in New York, A.H.
Barr, draws a diagram of the different schools and their interrelations. The picture is very
complex and contains a lot of cross-fertilizations. He ends up with two major schools just
before World War II: the first being the expressionists and surrealists belonging to the same
tradition as Kandinsky, the second being the Constructivists, the Purists, and other proponents
of purely geometric art. For the latter, abstract form was the means that made it possible to
express visually the ideology of the era of the machine. They saw the logic of the machine as
that of form following function.
The conflicting views of these schools of abstract art can be seen most clearly in the
history of the Bauhaus, especially the shift around 1924. The Bauhaus school of art and
architecture existed in Germany from 1919 until the Nazis closed it down in 1933. Despite its
relatively short life span, this school has had an astonishing influence on 20th century art and
architecture. Kandinsky was teaching here from 1922 until it was closed down, and the school
attracted some of the most influential artists of its time.
In its first years, the basic course was given by the Expressionist painter Johannes Itten
(see Itten, 1975). His pedagogy can be characterized as a subjective experimentation with the
inherent properties of the materials. In 1923, Itten was replaced by the Constructivists Albers
and Moholy-Nagy. As Hunter and Jacobus (1992) put it:
"They [Albers and Moholy-Nagy] abandoned his [Itten’s] intuitive approach for more
objective and rationalistic methods. Taking advantage of many aspects of modern
technology, such as visual experience derived from photography, and using industrial
materials, among them clear plastics, they organized their teaching essentially around
a core of Constructivist aesthetics." (p. 245)
4.3 Abstraction in the Computer Medium
Kandinsky says about the individuality of materials (media):
"The artist must not forget ... that each one of his materials conceals within itself the
way in which it should be used, and it is this application that the artist must
discover.” (Kandinsky, 1994, p. 154)
A literal interpretation of this passage leads to a half-mystical metaphysics, while understood
as poetic metaphor it tells us something important about materials/media.
What can be learned from the modernists concerning a deliberate exploration of the
computer medium? First, the very idea of exploring a design space by creating abstract (i.e.
non-figural) pieces of work. Next, that it is possible to identify the orthogonal dimensions of a
medium and the elementary building blocks of these dimensions.
What are the orthogonal dimensions of the computer medium? In the spirit of Kandinsky
it is tempting to start out with color and form, and add interaction as a third dimension as we
did in Figure 2 on page 6. The popular term “look and feel” is clarifying in this respect. If the
“look” is the experience of color and form, the “feel” is the experience of interacting with the
computer.
Abstract expressions in interactive media
One of the most extreme expressions of abstract art imaginable would be a white canvas.
Figure 23 shows such a painting in an exhibition.
Figure 23. Untitled.
Imagine now that we make the medium interactive. What would then be the simplest possible
abstract expression?
Figure 24. Untitled II (Interactive).
Figure 24 shows an interactive version of “Untitled”. “Untitled II” (interactive) could for
example go from white to black on touch, and return to white when the hand is removed. For
a detached observer it would appear identical to its non-interactive counterpart. Its interactive
quality would only be revealed when the viewer touched it.
A similar expression in the computer medium is shown in Figure 25 together with its
corresponding State Transition Diagram (STD). It initially appears as a single black frame on
a white computer screen. When you press on it, the frame is filled with black. When you
release the mouse button, it returns to its initial state.
Figure 25. Abstract Interaction, with its State Transition Diagram: "Press" leads from the initial state to the filled state, and "Release" leads back to the initial state.
The interaction of this example is orthogonal to its form and color. Both form and color can
be changed, but the interaction would still be recognizable as a change between two states
triggered by "press" and "release". The same interactive behavior can for example be found in
the 3D buttons of the Windows-95 user interface. Form and color are totally different in the two cases, but the interaction is the same. Some doorbells also behave like this: when you press one, it lights up; when you release it, the light dies.
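As a rough sketch, the same two-state element can be written as a small Tkinter program (my own illustration; the geometry and colors are arbitrary assumptions). It makes the State Transition Diagram of Figure 25 explicit: pressing the mouse button on the square fills it, and releasing it restores the initial state.

```python
import tkinter as tk

# Sketch of the abstract interaction element in Figure 25 (my own
# illustration). Two states: an empty black outline (initial) and a filled
# black square (pressed). "Press" and "Release" are the only transitions.
root = tk.Tk()
canvas = tk.Canvas(root, width=200, height=200, bg="white")
canvas.pack()

square = canvas.create_rectangle(50, 50, 150, 150, outline="black", fill="white")

def press(event):
    canvas.itemconfig(square, fill="black")    # transition: Press -> filled state

def release(event):
    canvas.itemconfig(square, fill="white")    # transition: Release -> initial state

canvas.tag_bind(square, "<ButtonPress-1>", press)
canvas.tag_bind(square, "<ButtonRelease-1>", release)

root.mainloop()
```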
4.4 How is Abstract Art Experienced?
To what extent do theories of art like that of Kandinsky actually reflect what the ordinary
viewer sees in a painting? This question came up during his years as a teacher at the Bauhaus
school in Weimar. To answer it, he made a questionnaire to be filled out by his students. It
showed his three basic forms, and the students were asked to add the colors that made the result
"harmonious".
Figure 26. Kandinsky’s questionnaire about form and color
Figure 26 shows a questionnaire filled in by one of his Bauhaus students (from Droste, 1998,
p. 86). The answers all confirmed Kandinsky's ideas, but he had forgotten that his art students
had already learned about his theory of color and form in his classes. They knew the "correct"
answer. Since the Bauhaus 70 years ago, little empirical work has been done on this subject.
The fact that the works of art from this period are still inaccessible to most people can be seen
as a sign that they did not succeed in getting "direct access to the soul". Modern art is
expressed in a language that has to be learned, like any other language. This was even the case
for the modernists themselves, but they wrongly interpreted their invention of a new visual
language as an exploration of a pre-existing “world”. An exploration of a medium is in itself
always an act of creation. The practical consequence of this is that if we “find” a nice set of
dimensions and elements we should not believe that these are the only possible ways of
making sense of the medium. At some stage, the theories must be tested with empirical
methods.
This misunderstanding comes as a natural consequence of Kandinsky’s idealistic
philosophical background. He saw his music of the canvas as a Platonic idea that existed prior
to man and culture. To the true idealist, only God creates ideas. Mortals can only hope to see
glimpses of eternal beauty in moments of inspiration. There is a certain poetic beauty to this
mediaeval epistemology, but it unfortunately leaves us blind to the fact that ”reality” to a
large extent is a cultural product that is constantly created and re-created.
How is abstract interactive art experienced, and what can be learned from that concerning
the interactive computer as medium? To answer that question it is necessary to move from a
theoretical to an empirical study of interactivity. The next part is devoted to this.

Part II
Experiments

Chapter 5 Doing Empirical Research on Interactivity
“One begins with an area of study
and what is relevant to that area is allowed to emerge”.
Anselm Strauss and Juliet Corbin, 1990.12
Part I can be seen as a theoretical investigation of interactivity in the context of interactive
computers. Part II will describe three psychological experiments aimed at grounding an
understanding of interactivity in empirical findings. This turn to psychology should not be
interpreted as a shift of focus away from the computer as a medium. In a similar manner as
when Kandinsky wondered how viewers actually experienced his color and form elements
and as a response distributed his famous Bauhaus questionnaire, my focus is specifically on
interactivity as experienced reality. Where experimental psychology would ask questions
about the psychological mechanisms involved, my intent is not to inform psychology as a
research field, but to build an understanding of how interactive media are experienced in use.
Qualitative vs. Quantitative methodologies
As the research question is of a qualitative kind, i.e. understanding the interactive experience,
it asks for a qualitative research methodology. There are obviously other ways to analyze
interaction data. A researcher from the human factors tradition would probably have focused
on the time an average subject needs to perform a certain task. From these data, one could for
example hypothesize a “law” of how the time to explore an artifact is correlated to its
complexity.
Although such an approach would most probably lead to interesting theories, it is hard to
see their usefulness when it comes to understanding the interactive experience. To illustrate
the point, how would the interactive experience be different if the resulting correlation was
linear vs. quadratic?
This is not to say that quantitative methodologies are irrelevant per se, only that when the
research question is of a qualitative kind, much is gained by applying a qualitative research
methodology. An analogy is Sigmund Freud’s research. Freud did not come up with his
theory of the subconscious from distributing questionnaires to the Vienna population about
their traumas. His understanding of the human mind came from detailed analysis of qualitative data from a very low number of cases.
12 (Strauss and Corbin, 1990, p.23)
Field study vs. Experiment
Qualitative methods have largely been used for interpreting field data (Bannister, 1994). In
our context, this would correspond to the analysis of human-computer interaction in natural
settings. Such field data are available (e.g. Suchman, 1987), but they do not provide the level
of detail I am aiming at here. To be able to study the interactive experience qualitatively down
to the level of simple interaction elements, I have found it necessary to set up a
controlled experiment.
Up until recently, “qualitative experimental psychology” would be regarded by many
researchers as a contradiction in terms. Qualitative research techniques are still mainly used to
interpret field data and other data obtained by non-obtrusive methods, but there is a growing
openness for also using such methods to interpret data obtained by experimental methods. To
illustrate this, Richardson’s Handbook of Qualitative Research Methods for Psychology and
the Social Sciences (Richardson 1996) contains two chapters on protocol analysis and thinkaloud
techniques for use in psychological experiments.
5.1 Relevant Research Traditions
Qualitative research in the social sciences
The area of research loosely referred to as “qualitative” includes a multitude of research
traditions making use of a wide variety of methods. One important early contribution in the
development of qualitative research was Glaser and Strauss’ “The discovery of grounded
theory” (1967). In a recent introduction to grounded theory, Strauss and Corbin (1990) define
it:
“A grounded theory is one that is inductively derived from the study of the
phenomenon it represents. That is, it is discovered, developed, and provisionally
verified through systematic data collection and analysis of data pertaining to that
phenomenon.” (p. 23)
They continue:
“One does not begin with a theory, then prove it. Rather, one begins with an area of
study and what is relevant to that area is allowed to emerge”. (p. 23)
This inductive approach to theory development is an important property of all qualitative
research. Where a researcher in the natural sciences would in most cases start out with a
hypothesis to be tested, a qualitative study should start out with an openness towards the
phenomenon in question, and allow for a theory to emerge from the data. In Glaser and
Strauss’ methodology, the resulting grounded theory should then be verified against the data
to assure that no part of the data contradict it.
An important element in their methodology is to be as true as possible to the “raw” data.
This means entering the research with as few ready-made categories as possible, and letting the
categories used for coding the data emerge slowly in an iterative process.
Qualitative methods are today widely used in the social sciences. Grounded Theory is just
one of many methodologies in use. Others include ethnography, content analysis, discourse
analysis, case studies, life history, and action research.
All the above traditions work mainly with verbal data obtained in natural settings. This is
very different from the situation in experimental psychology where the setting is constructed,
and where the material often includes non-verbal data. Despite these differences, there is still
a lot to learn from the qualitative traditions in the social sciences concerning the focus on
human experience “from the inside”. The empathic understanding of how a phenomenon is
experienced in situ is very different from a detached observer’s “objective” understanding of
the phenomenon. The latter has often been the focus of experimental psychology.
Early qualitative work in experimental psychology
In his classical psychological experiments on the perception of causality, Michotte (1963)
found that his subjects showed a remarkable degree of agreement in their description of
simple motion pictures with two or more colored squares moving on a blank screen.
Figure 27. Snapshots from one of Michotte's experiments.
In a typical experiment with two squares he let a black square move towards a non-moving
red square (see Figure 27, from Bruce, Green, and Georgeson, 1996, p. 333). When the black
square reached the red square, he made it stop and made the red square start moving with the
same speed and direction. All subjects described this as the black square causing the red
square to move.
The central point to Michotte was not to show that people are capable of providing causal
explanations to what they see, but that causality in these cases was directly perceived by the
subjects. He claims that the primacy of these perceptions makes it plausible that our
understanding of causality is embodied.
Figure 28. Snapshots from one of Heider & Simmel's experiments.
Another classical work in experimental psychology of relevance is Heider and Simmel's
study of the apparent behavior of simple geometric figures (Heider and Simmel, 1944). In one
experiment their subjects were shown a movie with a square, two triangles, and a circle
moving in various directions and at various speeds on a white background (see Figure 28,
from Bruce et al., 1996, p. 338). Nearly all their subjects projected human characteristics into
the objects and described their movements with words like "chasing", "coming", and
"leaving". In a similar manner as Michotte did concerning causality, Heider and Simmel
wanted to show that the perception of social situations to a large extent is embodied.
Expressed in the terminology of Merleau-Ponty, one could say that these psychologists
wanted to show that our interpretation of quite complex phenomena like causality and social
situations to a large extent is immediately available to us. These habits (i.e. skills) are part of
"the knowledge of the body". Referring to their novice-expert hierarchy, Dreyfus & Dreyfus
(1986) would have said that the subjects showed mastery in these domains at the highest level
of competence. As ordinary adults we are all "experts" in areas like the perception of causality
and the interpretation of social situations.
Both Michotte and Heider&Simmel’s experiments can be seen as specialized Rorschach
tests for particular domains of knowledge. Where the original Rorschach tests were designed
to unveil general contents of the mind, these tests were designed to answer questions that
were more specific. The moving abstract figures on the screens contained no explicit
references to real phenomena, but the test subjects still projected meaning onto them. From
analyzing the structure and content of these projections, the authors drew conclusions about
the domains of knowledge in question.
Gestalt Psychology
Michotte, and to a certain degree Heider, belonged within the loosely defined school of
Gestalt Psychology. Since its birth around 1912, its focus has been on establishing the rules
that govern our perception of reality. Central to this school is the concept Gestalt.
Encyclopædia Britannica Online (1999) describes Gestalt Psychology:
Its precepts, formulated as a reaction against the atomistic orientation of previous
theories, emphasized that the whole of anything is greater than its parts. The
attributes of the whole of anything are not deducible from analysis of the parts in
isolation.
Gestalt is described:
The word Gestalt is used in modern German to mean the way a thing has been
gestellt; i.e., "placed," or "put together." There is no exact equivalent in English.
"Form" and "shape" are the usual translations; in psychology the word is often
rendered "pattern" or "configuration."
Gestalt Psychology can be characterized as a phenomenological research tradition in that
ideally such research does not start out with a ready-made hypothesis to test. It is open to
whatever contents the test subjects project onto the stimuli of the experiment. Most work in
this area has been on visual perception, and the tradition has mainly dealt with relatively low-level
perceptual processes.
As a research methodology, the early work used introspection, which simply means
speaking out loud what you see and experience. Introspection as used by the early gestalt
researchers differs from today’s think-aloud in that the test subjects were taught a complex
technical language for describing their perceptual processes.
Mental-models research
As introduced in Chapter 2.4, one research tradition that goes beyond the low-level perceptual
processes dealt with by the gestalt tradition, but still uses a qualitative methodology, is the
empirical study of mental models and naive theories. A collection of works in this area can be
found in (Gentner and Stevens, 1983). Studies of naive theories have been done on domains
as diverse as physics, electricity, heat, Micronesian navigation, and arithmetic. Naive theories
can give insights into the nature of alternative understandings of a domain, and have proven
valuable to designers and instructors.
The naive theories of physics present a paradigm case. Several experimental studies
indicate that it is common to reason about motion with pre-Newtonian concepts. In a study by
White and Horwitz (1987), high school students were shown the drawing in Figure 29 and
asked what happens when the runner holding the ball drops it. Does it follow path A
(correct), B or C? Only 20 percent got it right. When asked to explain their reasons for
answering B or C, the students made use of a naive theory of motion very similar to the
medieval impetus theory that was dominant in the millennium from the 6th century until the
time of Galileo and Newton.
Figure 29. Possible paths for a falling ball.
The research methodology used in the study of naive theories is similar to that of Gestalt
Psychology in that the researchers are open to whatever naive theory about the phenomenon
in question the subjects chose to express.
Metaphor Analysis
Naive theories have similarities with Lakoff & Johnson’s (1980) classification of important
implicit metaphors in everyday English. These metaphors are available as linguistic resources
to the language user. Lakoff & Johnson's work says nothing about what metaphor a given
language user will make use of in a certain situation, but it lists the metaphors available.
If we take the description of temporal phenomena as an example, Lakoff & Johnson say
that in English, time is described as either:
• something stationary that you pass through like in "We are approaching the end of
the year", or
• a moving object like in "The time for action has arrived".
No other implicit metaphors are in common use in English. A multitude of other metaphors
are theoretically possible, such as seeing time as a color. From Lakoff & Johnson's work we can
conclude that sentences like "The time is green today" are meaningless in English, or that the
meaning of creative metaphors of this kind needs to be explained at length to the listener.
To a designer of a natural-language user-interface for a scheduling system, this kind of
knowledge would be very useful even though it would not predict what a specific person
would say in a specific situation. It would, on the other hand, dramatically restrict the cases that
would have to be dealt with, and justify putting some strong limitations on the grammar of the
user interface.
We see that the research methodology of Metaphor Analysis is also of a qualitative kind.
It differs from Gestalt Psychology and the study of Naive Theories in that there are no
experiments. The empirical basis is language as it is commonly spoken and written.
Computer Science
Andersen & Madsen (1988) report on a study of the naive theory of Danish librarians
concerning their database system. Based on their empirical findings, the authors proposed a
redesign of the query language to better fit the conceptions of the librarians.
Little has been done on studying naive theories of graphical user interface (GUI)
behavior. Every usability test involving qualitative questions about the user’s intuitive
perception of a GUI can be seen as an attempt at searching for such theories, but I have found
no attempts at generalizing from these findings to a naive theory of the workings of the
computer medium itself. Turkle (1984) has described how users intuitively reason about and
relate to computers, but her study was done before the widespread use of GUIs and focused
on the verbal interaction between user and computer. With the advent of bitmapped graphics
and mouse input, it is possible to see the computer as an interactive medium where the
dominant mode of interaction is non-verbal.
5.2 Evaluation Criteria
In the natural sciences there is in most areas of research a well-defined set of evaluation
criteria that define “good research”. This is also the case in most quantitative areas of the
social sciences. In the qualitative traditions, there is still an ongoing discussion about what
constitutes quality in research. Different authors focus on different elements, and they often
differ in vocabulary.
This does not mean that “anything goes” in qualitative research. It means that because the
field is relatively new, each researcher has to reflect on methodology. This is very different
from fields like chemistry where all such issues have long been settled, and where we find a
well-established vocabulary and fixed criteria for evaluating the quality of research. In the
terminology of Kuhn (1962), chemistry’s evaluation criteria are part of its paradigm. The
field has reached the phase of normal science where methodological and philosophical issues
have moved into the background, and research has become the day-to-day activity of puzzle-solving.
From a Kuhnian perspective, qualitative research is still in an early phase where
different ways of doing, understanding, and evaluating research are competing.
This does not imply that the choice of evaluation criteria can be like a trip to the
supermarket. That would have made it impossible to know what studies to trust. Comparison
of studies would have been almost impossible, and it would be possible to justify almost
anything as research. To avoid this situation, each research project has to be explicit on its
choice of evaluation criteria. Each criterion must then be justified with reference to the
research question and the nature of the study. When the evaluation criteria are made explicit,
it is easier to know what conclusions to draw from a study. It thus becomes possible
to compare studies.
Relevant literature
Smith (1996) in (Richardson, 1996) provides the following list of evaluation criteria for
qualitative psychology:
• Internal coherence: Do the conclusions drawn follow from the data, and does the analysis deal
with loose ends and contradictions in the data? Are alternative interpretations of the
data considered?
• Presentation of evidence: The nature of a qualitative study makes it hard to argue for
a certain interpretation of the data if not enough raw data is included in the report to
make it possible for the reader to follow in detail an example of the researcher's
process from data to theory.
• Independent audit: One way of validating the conclusions is by letting one or more
other researchers analyze a part of the data. Again this requires that the data is
recorded and coded in such a way as to allow for an independent audit to get a rich
picture of the phenomenon in question.
• Triangulation: The low number of subjects in most qualitative studies, their contextual
nature, and the unavoidable bias of the researcher often make it hard to argue for
validity. To improve the quality of the results, much qualitative research combines
different methods and perspectives through triangulation. By approaching a
phenomenon from different angles at the same time it is possible to get a richer and
more valid picture. “The essential rationale is that, if you use a number of different
methods or sources of information to tackle a question, the resulting answer is more
likely to be accurate” (p.193).
• Member validation: To check the conclusions drawn with the individuals of the
study.
• Investigator reflexivity: To make explicit the role of the researcher in the study.
Smith argues that it is impossible for a researcher to be 100% objective. The report
of a study should therefore include enough information about the researcher to make
it possible for the reader to judge the effect of the researcher’s bias.
• Participant reflexivity: To document the understanding that the participants have of
the study.
Banister et al.'s Qualitative Methods in Psychology: A Research Guide (1994) has a chapter on
“Issues of evaluation”. The three main criteria are triangulation, reflexivity, and ethics.
They identify four kinds of triangulation (pp. 142-159):
• Data triangulation: To combine studies done with a wide diversity of subjects and
context.
• Investigator triangulation: To combine the results from studies performed by different
researchers.
• Methodological triangulation: To use different research methods to collect the
information.
• Theoretical triangulation: To be multidisciplinary in the interpretation of the data.
They list the following ethical criteria:
• Informed consent: That the subjects are fully informed from the start of the important
elements of the research through an open and honest interaction between researcher
and participants.
• Protecting participants: To ensure that there is no exploitation of the participants.
This means for example that the participants should be told that they can at any time
withdraw from the research, or choose not to disclose some piece of information.
• Confidentiality and anonymity: To make it virtually impossible for the readers of a
report to trace the study back to the participants.
• Accountability: In the study of people in their natural setting, one may in some cases
be faced with moral dilemmas that force you to choose between loyalty to your
participants and other loyalties. You should in these cases be explicit about where your
loyalty lies.
Miles and Huberman’s Qualitative Data Analysis: An Expanded Sourcebook (1994) lists five
main evaluation criteria (pp. 277-280):
• Objectivity/Confirmability: “The basic issue here can be framed as one of relative
neutrality and reasonable freedom from unacknowledged researcher biases – at the
minimum, explicitness about the inevitable biases that exist” (p. 278). They include
here also the explicit description of method, procedure, and analysis. Richardson’s
criterion of investigator reflexivity is covered by this criterion.
• Reliability/Dependability/Auditability: “Have things been done with reasonable care”
(p. 278). This includes clear research questions, and a study design congruent with the
questions. It also includes showing a consistent process of study over time and methods.
• Internal validity/Credibility/Authenticity: The issue of internal validity includes
Richardson’s criteria Internal coherence, Presentation of evidence, Triangulation, and
Member validation. Concerning presentation of evidence they make use of Geertz’
(1973) term thick description. A thick description contains enough detail and context to
allow the reader to get a more direct feel of the phenomenon than one would get from
getting access to it only through the categories and interpretations of the researcher.
• External validity/Transferability/Fittingness: Can the conclusions be generalized and
made relevant in different settings, with different people? Is the scope of generalization
made explicit?
• Utilization/Application/Action orientation: Can the findings of the study be made use
of? Does it help solve a problem?
In (Morse (ed.), 1994) M. Leininger (1994) lists six criteria: Credibility, Confirmability,
Meaning-in-context, Recurrent patterning, Saturation, and Transferability.
Of these, only Saturation is not covered by the other authors. It refers to the
completeness of the study, i.e. that the researcher has dealt with all the data available and can
find no further interpretations of the phenomenon.
Choice of criteria
From the above list of criteria I have picked those that make direct sense in a qualitative
experiment with the focus on the interactive experience. I find all five of Miles and
Huberman’s general criteria relevant, though with a slight modification because of the
experimental nature of the study. In addition, I would like to include Banister et al.'s criterion of
ethics.
This creates the following list:
• Objectivity: Is the role of the researcher in the study accounted for?
• Reliability: Does the process by which the data are gathered fit the research question,
and is the method, procedure, and analytical process well documented? For an
experimental study, it follows that the descriptions should be accurate, detailed, and
complete enough to allow for a replication of the study.
• Internal validity: Are the conclusions grounded in the data, and do they show
coherence?
• Transferability: Can the findings be made relevant for other people, places, and
contexts?
• Utilization: Can the findings from the study be made use of to solve problems?
• Ethics: Is the research done according to the ethical standards required for such
studies?
None of these strategies, alone or in combination, guarantee that the results will be valid in a
natural-science sense. This should not keep us from pursuing a qualitative approach, as no
other empirical road to my knowledge is available to get insight into the understanding of
interactivity.
5.3 Overall Experiment Design
Square World
Having decided to design a psychological experiment of this kind, one is faced with the
question of whether the stimuli should mimic a realistic situation or whether it is possible to
learn anything from analyzing how users interact with purely abstract stimuli. Again referring
to the Bauhaus tradition, the present research has a lot in common with Itten’s systematic
investigation of how different colors interact (Itten, 1973). Itten’s resulting theory says things
like: If you want an object in a painting to appear as if it is behind another object, you should
make it darker than the one in front. Rules of thumb of this kind hold whether you paint fruit,
skyscrapers, or people, and whether the painting is to hang on a wall or be used as an
illustration in an advertisement. If we assume that the kind of phenomena we are looking for
are of a general kind, there should be no need to use figurative examples or provide a
naturalistic context for interpretation. This opens up the possibility of using purely abstract interactive
examples as "stimuli" in the experiment.
Chapter 6 describes “Square World”, a collection of such non-figurative interactive
stimuli, constructed to span an area of the design space of interactive behavior.
The three experiments
Following the idea that triangulation increases quality, three different experiments have been
designed to give insight into how non-programmers understand artifacts with interactive
behavior. Of Banister’s four kinds of triangulation, mostly methodological triangulation
applies to the study.
The three experiments use different methods, but for practical reasons, they are not very
different in choice of subjects, equipment, and physical environment. As I have been the sole
investigator, I have not been in the position to perform investigator triangulation. I have tried
to be as multidisciplinary as possible in the interpretation of the data, though within the limits
given by my background.
• The first experiment (A) is an investigation into what metaphors and mental models of
interactive behavior subjects spontaneously project into interactive artifacts. It follows
the methodology of the “Mental Models” tradition (see Gentner and Stevens, 1983).
• The second experiment (B) makes use of the results from the first experiment to
construct editors for designing interactive behavior. The usability of these editors is
then empirically tested to validate the initial results. Experiment B uses a
combination of “the linguistic approach to software design” as described in (Andersen
and Madsen, 1988) and state-of-the-art usability testing (Gomoll, 1990).
• The third experiment (C) is a controlled group activity of participatory design. The task
given to the group is to design an editor to enable them to express interactive behavior.
Through this creative process, the subjects are expected to show their
implicit understanding of interactivity. The method used is state-of-the-art Participatory
Design as described in (Greenbaum and Kyng, 1991) combined with discourse analysis
(see Richardson, 1996).
The experiments, each with detailed experiment design, method, results, and a short
discussion, are presented in Chapters 7, 8 and 9 respectively. The interpretation of all three
experiments taken together is postponed until Part III.
5.4 Measures Taken to Meet the Criteria
For each of the six criteria, I list one or more measures taken to ensure that this criterion is
met in the experiments.
Objectivity
Is the role of the researcher in the study accounted for?
• I have included a short biography in the preface to document my bias (researcher
reflexivity).
• In the description of the experiments, I have included relevant information about my
role as researcher.
Reliability
Does the process by which the data is gathered fit the research question?
• This question was discussed at the beginning of this chapter, where I concluded that
the research question fits well with an experimental qualitative approach.
Is the method, procedure, and analytical process well documented?
• Detailed descriptions of method, procedure, and analysis can be found for each
experiment. Care is taken to make these descriptions detailed enough to allow for
replication of the experiments.
Internal validity
Are the conclusions grounded in the data, and do they show coherence?
I have made use of the following techniques to ensure the internal validity of the research:
• Thick descriptions: I have included detailed examples of transcripts with the raw
data separated from my interpretation. This serves both to justify my conclusions and
to illustrate how I have analyzed the data.
• Triangulation: If the same result shows up in three different experiments on the same
topic, this gives more credibility to the result. The triangulation of the three
experiments is my most important technique for ensuring internal validity.
• Saturation: For all three experiments, I have been watching the tapes and reading the
transcripts a number of times to ensure that no parts of the data are left unaccounted
for in the analysis. Saturation is in this context the experience as a researcher of
going through the data without being able to come up with new interpretations.
• Validating the interpretation against the data: This means going through the material
without finding phenomena that are not analyzed, or finding data that contradict the
conclusions drawn.
Transferability
Can the findings be made relevant for other people, places, and contexts?
In my discussions, I refer to existing software to illustrate the general nature of the findings.
Chapter 13 shows how the research strategy developed for the study can be applied to the
study of the perceived ontology of other technologies. Through this analysis, the scope of the
experiment becomes clear. Chapter 11 is a detailed discussion of how the results from the
experiments apply to our general understanding of interactivity as a phenomenon.
Utilization
Can the findings from the study be made use of to solve problems?
The four empirically-based editors of Experiment B can be seen as an example of a utilization
of the results, even though they are of little or no practical use to solve any real-world
problems. Chapter 12 is a discussion of how the findings apply to interaction design. This
chapter includes a description of a design tool based on ideas from the experiments.
Ethics
Is the research done according to the ethical standards required for such studies?
Apple Computer’s internal guideline for usability testing was used in the experiments. The
steps it prescribes follow the criteria of ethical standards listed by Banister et al.
Chapter 6 Square World
“God is in the details”.
Ludwig Mies van der Rohe (1886-1969), director of Bauhaus 1930-33.
To exemplify interactive behavior, I have built on the example of the interactive square from
Chapter 4.3 (Figure 25), and added examples with two and three squares. The resulting
“Square World” consists of 38 interactive examples of which only 14 are presented here. The
rest can be found in the appendix.
The examples can be seen as abstractions of a certain class of interactive software, i.e.
software with a user-friendly graphical user interface. This software creates what Laurel
(1991) calls first-personness. It has at least the following properties: perceived stability,
deterministic behavior, modelessness, and immediate feedback.
The squares can be either white or black. Changes in color always come as an immediate
response to a user's action. The two kinds of user actions detected are "mouse button pressed
in square X" and "mouse button released in square X". All examples are deterministic.
All examples can be described formally as an initial state and a set of production rules.
States can be represented as N-ary vectors of color values, where N is the number of squares
in the artifact. The production rules are triples: <pre-condition, user action, response>. Both
the pre-conditions and the responses are represented as states, and the user actions are pairs
<action type, square number>. The legal action types are "press" and "release", and the
squares are numbered left to right.
An N-square example can also be described as a Finite State Automaton (FSA) on the
alphabet {click-1, release-1, ..., click-N, release-N}. The states are labeled with the 2^N
possible N-ary binary vectors. This simple formalism gives rise to an unexpectedly high number
of possible artifacts.
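To make the formalism concrete, the following sketch shows how two of the examples presented below (FSA #1 and FSA #11) could be represented as transition tables and "run" against a sequence of user actions. The sketch is in Python purely for illustration (the examples themselves were implemented in Smalltalk/V, as noted in Chapter 7), and all names in it are mine, not part of the original implementation.

    # A state is a tuple of square colors ('W' or 'B'), one entry per square.
    # A user action is a pair (action type, square number).
    # Transitions map (state, action) to the next state; actions with no
    # matching transition leave the state unchanged (the implicit no-change case).

    FSA_1 = {                                     # one square, black while pressed
        (('W',), ('press', 1)):   ('B',),
        (('B',), ('release', 1)): ('W',),
    }

    FSA_11 = {                                    # two squares swapping colors
        (('B', 'W'), ('press', 1)): ('W', 'B'),
        (('W', 'B'), ('press', 2)): ('B', 'W'),
    }

    def run(fsa, state, actions):
        """Follow the arrows of the STD from the given state."""
        for action in actions:
            state = fsa.get((state, action), state)
            print(action, '->', state)
        return state

    run(FSA_1, ('W',), [('press', 1), ('release', 1)])
    run(FSA_11, ('B', 'W'), [('press', 2), ('press', 1), ('press', 2)])

Reading such a table row by row, starting from the initial state, corresponds to the way of "experiencing" a State-Transition Diagram described below.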
The complexity of the 38 examples ranges from one-state FSAs with no transitions to an
8-state FSA with 24 transitions. The 14 examples shown in this chapter are presented in an
increasing order of complexity, starting with all the examples with only one square, then some
two-square examples.
The examples are described here with State-Transition Diagrams (STDs). Their initial
state is indicated with a double circle. The examples are presented in this chapter with as little
interpretation as possible. This is done intentionally to avoid creating the impression that there
is such a thing as a “correct” interpretation of an interactive behavior.
It allows the reader to make his/her own interpretation of the examples. Seeing the STD
of an interactive artifact is very different from actually experiencing its interactive behavior,
but it might be the closest we get in a static medium. One way to “experience” an STD is to
start in its initial state and follow its arrows from one state to the next without too much focus
on the totality of the diagram. This reading process resembles the interactive process.
Single-square examples
Figure 30. FSA #1.
The first example is the interactive square from Chapter 4. It starts out white. When you press
the mouse button while the mouse cursor is on it, it goes black. When you release the mouse
button, it goes white.
Figure 31. FSA #2.
The second example goes black when you press on it.
Figure 32. FSA #3.
The third example goes black when you release the mouse button on it.
Figure 33. FSA #4.
Example #4 starts out black. When you press on it, it goes white. When you release the mouse
button, it goes black.
Figure 34. FSA #5.
Example #5 starts out white. When you press on it, it goes black. When you press a second
time, it goes white.
Figure 35. FSA #6.
Example #6 starts out white. When you release the mouse button on it, it goes black. When
you press, it goes white.
Figure 36. FSA #6 with an implicit transition added.
FSA #6 is a bit peculiar. It has only two transitions, but because a release always requires a
press, the implicit no-change transition from white to white on a press should be added to the
STD for clarity. Example #3 also has this property, but because its behavior is less complex, the
omission does not complicate the understanding of its behavior in the same way.
Figure 37. FSA #7.
Example #7 is black, and has no behavior.
Figure 38. FSA #8.
Example #8 is white, and has no behavior.
Some double-square examples
Figure 39. FSA #9.
The first double-square example starts out with both squares white. Pressing in any of the two
squares makes both go black. Releasing in any makes both go white.
Figure 40. FSA #10.
Example #10 starts out with both squares white. Pressing in the left square makes the left
square go black. Releasing it makes it white. Pressing in the right square makes the right
square go black. Releasing it makes it white.
Figure 41. FSA #11.
Example #11 starts out black-white. Pressing in the left square makes both squares change
color. Pressing in the right square in this state again makes both squares change color.
Figure 42. FSA #12.
Example #12 starts out white-white. Pressing in the left square makes that square black.
Pressing again makes it white. Pressing in the right square when the left is black, makes that
square black. Releasing it makes the right square white.
Figure 43. FSA #13.
Example #13 starts out white-white. Pressing in the left square makes the right square go
black. Now pressing in the right square makes both squares change colors. Then pressing in
the left square makes both squares change color.
Figure 44. FSA #14.
Example #14 starts out black-white. Pressing in any square makes both squares change color.
Pressing now makes both squares change color again.
Chapter 7 Exp. A: Metaphors We Interact By
“"It is easy to do, but hard to explain",
One of the participants of the experiment.
The qualitative research methodology used in the experimental study of Mental Models
(Gentner et al., 1983) can be summarized:
1. Identify the domain of knowledge to be studied.
2. Build a number of stimuli that exemplify the domain.
3. Expose your test subjects to these stimuli and record what they say and do.
4. Use the verbal protocols to build a theory of how the domain is
intuitively conceptualized.
Following this scheme, an empirical study of interactivity should start out by identifying the
domain to be studied. Next, a number of stimuli should be designed that exemplify the
domain. These stimuli should then be “tested” on “users”, and from how the stimuli are
described, we should be able to get empirically based insights about the interactive
experience.
Scope
The scope of this experiment (and the two following) is constrained by the nature of the
stimuli, i.e. Square World. The Finite State Automata (FSAs) used in Square World capture
some basic properties of modern interactive software, while they leave others out. As none of
the examples have hidden states, this aspect is not explored. Other areas not covered include
analog input as in scrollbars, the use of sound, the use of animation, the use of color, the use
of voice input/output, and communication with other machines and people.
All these added dimensions open up new and interesting design possibilities and require
detailed studies to unveil their “psychology”. The simplicity and generality of the FSA make
it an interesting starting point for an investigation of the “psychology” of the interactive
computer.
7.1 Method
Experiment design
Experiment A follows the outlined four-step research methodology. The test subjects are
asked to explore all 38 examples of Square World while describing what they are doing
(think-aloud protocol). All tests are taped, and the mouse actions are annotated to the
transcripts.
The verbal protocols are then analyzed in search of implicit metaphors. Lakoff and
Johnson’s “Metaphors We Live By” (1980) was the inspiration for this approach. In addition to
metaphor analysis, the non-verbal data are analyzed to gain insight into the processes at the
micro-level of interaction.
The resources needed to analyze qualitative data have forced me to keep the number of
subjects very low (N=7). The results from the experiment are consequently not statistically
significant. What results from the experiment is not a set of general design rules that
can be applied directly to the design of interactive software. It is a catalog of some of the
perceptual, cultural, and cognitive resources that are potentially available to a user of
interactive software.
Despite the qualitative nature of the material and the low number of subjects, numbers
have been added to give an indication of whether a certain way of describing a phenomenon
was rare or common in the experiment. The numbers should not be taken as anything but
indications of possible trends.
Subjects
The subjects were seven undergraduates, age 17-19, five male and two female. Their
experience with graphical user interfaces ranged from "almost nothing" to "a bit". They were
paid to participate.
Equipment
The examples were presented to the subjects on a Macintosh IIci personal computer running
SMALLTALK/V.
Stimuli: Square World
The 38 interactive examples of Square World, as described in chapter 5.3 and the appendix,
were implemented and presented to the subjects as shown in Figure 45. The subjects were
offered an additional "repeat" button that enabled them to bring the examples back to their
initial state whenever they liked.
Figure 45. The examples as they were presented to the subjects.
Figure 45 shows a snapshot from the screen with a two-square FSA. The “next” button was
used to move on to the next example.
Procedure
The subjects were instructed to freely describe the examples as they explored them. They
were explicitly told that the experiment was set up to learn about how people experience
interacting with computers; not to learn about how good that individual subject was at
reaching some answer. The subjects ran the test individually.
Apple's Usability Testing Guideline (Gomoll 1990, also reprinted in Tognazzini 1992,
pp. 79-89) was followed rigorously when setting up the experiment. This included instructing
the subjects on how to think aloud, and explaining to them beforehand the purpose of all the
equipment in the room. The participants were told that if for any reason they felt
uncomfortable, they had the right to stop the experiment at any time without losing their pay.
The 38 examples were presented to the subjects one at a time, and the subjects were given
control of the progress of the experiment. As seen in Figure 45, a "next" button was put on
the screen for this purpose. Care was taken not to induce in the subjects any preconceptions
about the nature of the artifacts they were about to explore. The experimenter referred to the artifact
by pointing to "that there" on the screen. The experimenter sat by the side of the subjects
throughout the session, and was available for technical assistance.
The sessions were videotaped. The camera was pointed at the screen, and the subjects
were aware that they themselves were never filmed. An external directional microphone was
used to increase audio quality.
In addition, all interactions were logged with time stamps that made it possible to
reproduce the interaction in detail. The resulting four hours of taped material was transcribed.
The protocols consist of the verbal accounts together with the corresponding actions in
brackets.
7.2 Three Cases in Great Detail
With 7 participants and 38 examples, we get 266 cases altogether. Before any analyses of the
results are presented, three of these cases are shown in detail to give the reader a feel of the
material.
A one-square FSA
Figure 46. An interaction with FSA #1.
Example #1 consisted of one square that was initially white, and went black when you
pressed the mouse button in it. When you released the button, it went white again. Figure 30,
p.129, shows its STD. Figure 46 shows one subject's interaction with this example. The
numbers along the X-axis show seconds since the interaction started. The top line shows the
state of the mouse button, the middle line shows the state of the FSA, and at the bottom is
presented what the subject said.
We see that he (a male subject) first produced a very short click. This resulted in a
"blinking" of the square. In the subsequent clicks, he held down the mouse button for a certain
period. From the mouse interaction and the verbal protocol I conclude that at a certain point
(at around t=10 secs.) he understood that there was a causal relation between his holding
down the mouse button and the change of color in the square. The last mouse click was made
as an illustration to the experimenter of his description.
The subject described this FSA as:
"[It] colors the field when I press down the button"
From this I draw the following conclusions:
1. The subject saw a causal relation between his actions and the observed response on the
screen.
2. He attributed the response to some mediator ("it"), which was not the square itself.
3. The square itself was perceived as a passive entity that was colored. The source domain
of the coloring metaphor could be a real-world canvas, or maybe the "canvas" of a
paint program.
A two-square FSA
Figure 47. The same subject's interaction with FSA #11.
Figure 47 shows the same subject's interaction with the 11th example of Square World (see
Figure 41, p.131, for its STD). This example consisted of two squares, the left initially black
and the right initially white. When you pressed the mouse button on a black square, the colors
swapped and it appeared as if the black square was moving. The notation used is the same as
in Figure 46, except that the position of the mouse cursor is indicated with a "1" for the left
square and a "2" for the right square.
We see that the subject first clicked on the white square three times. Nothing happened
and he went on to click on the left square. He held the mouse button down for approx. 1 sec.
(at t≈5). This can be seen as a way of testing whether there was a difference between "press" and
"release". This click led to a swapping of colors. In response, he immediately moved the
mouse to the right square, which had now become black, and clicked on this one. He then
continued moving the black square around by clicking on it. Only once (at t≈16) did he again
try clicking on a white square.
From the log, it seems as if the example was "understood" already the first time he
observed the response to clicking on a black square (at t≈5). The subsequent interactions can
be seen as an attempt to verify his immediate hypothesis concerning its behavior.
He described the FSA as:
"The black field is moved when I am in the field and press the button".
From this description, I conclude the following about his interactive experience:
1. As with example #1, he saw a causal relation between his actions and the behavior on
the screen.
2. As with example #1, the response was attributed to some mediator existing between
the subject and the squares. ("The black field is moved").
3. The swapping of colors led to spatialization ("The black field is moved"). The example
was described as if it consisted of a movable black square on a white background. The
source domain of this spatial metaphor is probably the domain of everyday physical
objects.
4. The cursor was experienced as being part of the extended body of the subject ("..when I
am in the field.."). The subject described the situation metaphorically as if he had a
position on the screen. This spatialization described the relation between mouse
movements and the position of the mouse cursor, and had nothing to do with the
spatialization caused by the swapping of the square colors.
A three-square FSA
Example #23 was the first presented to the subjects consisting of three squares. Its State
Transition Diagram can be seen in Figure 48.
Figure 48. The State Transition Diagram for FSA #23.
It starts out with the color combination white-black-white. It has four states. The color of the
rightmost square can be changed by clicking on it. It has "toggle behavior". When the
rightmost square is black, it is possible to swap color between the leftmost and the middle
square by clicking on the one being white.
Figure 49 shows one subject's (female) interaction with this FSA. The notation is the
same as in the previous figure. The capital letters refer to the verbal protocol.
A. ...Oh. Now there are three [squares] ...
B. ...I try this one [square 3] ...
C. ...It [square 3] got black...
D. ...Nothing happened to this one [square 2] ...
E. ...Then the one in the middle [square 2] lost its colour...
F. ...Now it [square 1?] did not move when it [square 3] was not black...
G. ...The one to the right has to be black for the one to the left to be black...
H. ...but it is not possible to make all three black...
I. ...goes white [square 3] ...
J. ...I am not able to move [square 1]...
K. ...ok, you have to begin with the one to the right to be able to do anything.
Then you can move that one [square 1] ... And turn the one to the right on
and off...
Figure 49. The interaction with FSA #23.
It took her about 100 secs. to explore the example. I will present here her exploration together
with a possible interpretation of the learning process.
(A) ...Oh. Now there are three [squares]...
This example was the first one presented with three squares. After having struggled with two-square
examples for quite some time, a three-square example seems to have come as a surprise.
(B) ...I try this one [square 3]...
(C) ...It [square 3] got black...
By accident she went right to the only square being "input sensitive" at that time. She clicked
on it, and observed that it changed color.
(D) ...Nothing happened to this one...
She then tried clicking on square 2 (middle square) and observed that nothing happened.
(E) ...Then the one in the middle [square 2] lost its color...
She had just prior to this clicked on square 1 (leftmost). The leftmost and the middle square
had then swapped color. From her comments it is only possible to conclude that she observed
that the one in the middle changed color.
(F). ...Now it [square 1] did not move when it [square 3] was not black...
The verb "move" is the first indication of spatialization. She had just "moved" the squares
around a couple of times (t=25 to 35), and had made the rightmost square white again. She
then tried "moving" again by clicking on the leftmost square and observed that this was not
possible.
(G). ...The one to the right has to be black for the one to the left to be black...
She had now made the leftmost square black by first clicking on the rightmost and then on the
leftmost. It seems like this was a goal she set herself; to make the leftmost square black.
(H). ...but it is not possible to make all three black...
This quote confirms that she had set herself the goal of making all squares black. She first
made the rightmost black, then the leftmost, but observed that clicking on the middle square
made the leftmost square lose its color again. From this she concludes that it is not possible to
reach her goal.
(I). ...goes white [square 3]...
This is just a comment on the fact that the rightmost square goes white when clicked on.
Together with what comes (J and K) it might be an indication that she had now observed that
the behavior of the rightmost square was independent of the rest of the squares.
(J). ... I am not able to move [square 1]...
She has just tried clicking both on the leftmost and on the middle square while the rightmost
is white. As she has already observed (t=40) and commented on (F), a white square to the
right inhibits the two other squares.
(K). ...ok, you have to begin with the one to the right to be able to do
anything....then you can move that one [square 1]... and turn the one to the right on
and off...
The "ok" can be interpreted as "now I understand how it works". She went on describing the
behavior of the example. From this description, I conclude the following:
• As the other subject did with the one-square and the two-square examples, she saw a
causal and deterministic behavior between her actions and the observed behavior.
• Concerning agency of behavior, the quotes taken together show an interesting pattern.
She first described the behavior as something happening to the squares, without
explicitly referring to any mediator. In the last two quotes (I,K), the passive voice (“it
happens”) is changed to a more active voice ("...you can move that one..."). The
behavior is now seen as actions the user can do to the squares.
− In Merleau-Ponty’s terms, one could say that the example had been understood by
the body and had become part of its repertoire of "habits". When the behavior was
understood, the example changed from being an object in the "objective space" to
becoming part of the "body space" of the user.
− From an activity-theory perspective, one could say that through the learning process
the interaction with the example was operationalised (automated). It then changed
from being actions to being operations. Through the learning process there was also a
change in the interpretation of the material conditions, from just being three squares
to being a switch and a movable white square.
• She made use of two metaphors here: a switch metaphor (“on”/”off”) and a spatial
metaphor (“move”). The rightmost square was described as a switch that could be
turned on (black) or off (white) by clicking on it. The behavior of the leftmost and
middle squares (when the rightmost was "on" ) was described as a white square that
could be moved on a black background. Figure 50 shows an illustration of this
combined metaphor. To the left is shown how it actually looked on the screen. To the
right is illustrated my understanding of how it was conceptualized (i.e. her "mental
model"). I have used the standard Macintosh notation of graying out inhibited user-interface
elements.
Figure 50. A graphical representation of how example #23 was described (left: the FSA as it
appeared on the screen; right: the subject's mental model of the FSA).
• In quote K she did not explicitly say that the rightmost square was a switch that
controlled whether the white square could be moved or not. She only stated that you
"have to begin with the one to the right to be able to do anything". This is a statement
about the temporal ordering of the actions, and not about mechanisms in the FSA.
She has here introduced a time perspective, i.e. linear time, in her description of the
behavior of the example.
• From this it is an open question whether she saw the "switch" as something
controlling the behavior of the white square, or whether it was the action "click in the
rightmost square" that started the magic.
7.3 Results
General Observations
All subjects got engaged in the task very quickly. On some occasions, they asked the
experimenter whether they were doing the right thing, but they never lost concentration.
With a total of three exceptions, all examples were explored and described by all subjects. The
subjects all controlled the progress of the experiment in a very natural way. It seemed as if at
a certain point in the exploration of an example it was completely "understood", and the
subject was ready for the next one.
All subjects developed a good skill at exploring the examples. In some cases, complex
examples were fully "grasped" in a few seconds. They often had a much harder time
describing them. One subject put it: "It is easy to do, but hard to explain".
CODING SCHEME AND METHOD OF ANALYSIS
I have extracted 929 atomic descriptions of interactive behavior from the verbal protocols. An
atomic description is a clause containing at least one verb. This material was analyzed in the
following manner (a small illustrative sketch of the coding follows the list):
• The clauses are classified according to whether they used transitive or intransitive
verbs. Examples: "I turn it off" (transitive) vs. "It is a switch" (intransitive).
• For the cases where a transitive verb was used, the different loci of agency are
identified. Examples: "I turn it off" (the user is the agent), vs. "It turned itself off"
(the square is the agent) vs. "It was turned off" (passive voice, computer as implicit
agent).
• For the cases where an intransitive verb was used, the kinds of references are classified.
• An analysis is done of how the modal verb "can" was used.
• The clauses are classified according to the implicit metaphors in use. This reveals 9
distinct metaphors in the material.
• There is an interesting correlation in the material between loci of agency and use of
metaphors. This is reported.
• The uses of negative descriptions, as in "It does not turn itself off when I click on
it", have motivated an analysis of the role of the interaction history as context.
THE USE OF VERBS
Transitive vs. Intransitive verbs
The 929 clauses can be split into two main categories concerning the use of transitive vs.
intransitive verbs.
• In 864 cases (93%), the behavior was described directly using a transitive verb as in:
"It is painted black", or "It moves to the left".
• In 65 cases (7%), the behavior was described indirectly by stating something about
the example using an intransitive verb as in: "It is a switch”, or "It is like the
previous one".
Transitive verbs: Agency of the behavior
As exemplified by the three cases, the subjects put the agency of the interactive behavior in
different places during the experiments. In the 65 cases in which the behavior is described
indirectly with reference to named mechanisms or previous interactions, it is hard to
determine where the implicit locus of agency should be put.
In the 864 clauses describing the behavior directly using transitive verbs, five distinct loci
of agency can be identified. They are here listed in decreasing order:
1. In 442 cases (48% of total), the passive voice is used as in: "When I press on it, it
gets black". The implicit agent might be "the computer", "the program", or
something similar, but we can not tell from the material. We see here an event/action
structure.
2. In 261 cases (28% of total), the agency is placed in the subject like in: "I move the
square".
3. In 145 cases (16% of total) the agency is placed in a square like in: "Then that square
turned itself off".
4. In 13 cases (1.5% of total) the agency is placed in the example as in: "Then the
whole thing turned itself off".
5. In three cases (0.3% of total) the agency is placed explicitly in some agent outside
the squares as in: "...then he paints it black".
Intransitive verbs: References to named mechanisms (14 cases).
In 14 cases, the user described the example with a name:
• In eight cases, a square with toggle behavior was described as an on/off switch.
• In two cases, a square was described as something that did not work, without saying
what this thing should have been.
• In one case, a square that controls the behavior of another square was described as an
"inverter button".
• One of the three-square examples consisted of three independent toggles. This
example was in one case described with reference to pixel-level editors in GUI
environments: "..as when you create your own screen..".
• One example was described as "..something that opens up..".
• In one case a square was described as "..it functions as something that turns on the
two others.."
Modal verbs
A total of 130 of the descriptions of interactive behavior began with "I can" or "I can not".
Modal verbs and conditionals were in many cases combined as in: "When it is black, then I
can blink with the other one".
• Of the 929 phrases analyzed, 90 cases (9.7%) described behavior as potential for
interaction, e.g.: "I can move it by clicking to the left".
• In 40 cases (4.3%) the behavior was described as inhibition, e.g.: "I can not move the
left one when the right one is on".
The locus of agency for the above cases was distributed:
• Potential for action (tot. 90):
− In 88 cases, the locus was by the subject as in: "I can make it white by clicking on
it".
− In one case, the modal verb referred to a square: "It can not move when the other one
is white".
− In one case, the modal verb was used together with a passive voice as in: "It is
possible to move it".
• Inhibition (tot. 40):
− In 27 cases, the locus was by the user as in: "I can not move it".
− In five cases, the locus was in a square as in: "It can not move".
− In eight cases, it was in a passive voice as in: "It is not possible to move".
THE METAPHORS
The search for the implicit metaphors in the linguistic material was done within the analytical
framework developed by Lakoff et al. (1980).
Many of the descriptions contained composite metaphors, as exemplified above in the three-square
case. Next follows a description of the nine basic metaphors identified. The total
number of clauses making use of metaphor was 759. The percentages in parenthesis for each
metaphor refer to the total number of metaphor-clauses.
Colored objects (366 cases, 48.2%)
It was common to describe the examples as colored squares, as in the description of the
example in Figure 51: “It is white. It gets black when you click on it”. The “it” is an object
that has a color.
Figure 51. The first example seen as a colored field.
On 27 occasions (of the 366), a rapid change of color was described as blinking. The "it"
that was blinking was the square itself, as a black and white square. For some of the subjects it took
some time to realize that they had control over the "blinking". They "clicked" on the squares,
i.e. rapid press + release, without understanding that you could get a different effect by holding
down the mouse button.
The source domain of this "colored objects" metaphor is the physical world of objects. It
might not be correct to categorize these cases as "real world metaphor". It is common sense to
see black squares on a computer screen as no less real than black squares on a real
chessboard. By listing these cases among the metaphors, we do not make any distinction
between "reality" and "metaphor". This is in accordance with Lakoff & Johnson's concept of
implicit metaphors.
The paint metaphor (62 cases, 8.2%)
Figure 52. Example #1 seen as a background to be painted.
One subject described the example in Figure 52: “..an area that gets colored..”. When the
square got white again (to use the previous metaphor), it was described as “..it is removed..”.
The square is seen as having a background color (white) that is painted with another color
(black). In all 62 occurrences of this metaphor, white was seen as background and black as
paint.
On one occasion, a change of pattern was described as paint floating
out from the middle object.
In Figure 52 the paint metaphor is illustrated as if the source domain was pixel-based
drawing programs. As discussed in the description of the single-square case, the source
domain could as well be real-world painting.
The switch metaphor (72 cases, 9.5%)
Figure 53. An example seen as a switch.
The switch metaphor was used by the subjects only to describe squares that had toggle
behavior. The subjects typically said: “it is on”, “it is off”, “I turn it on”, “I turn it off”, “It
gets turned off”, and “It was on”. On all 72 occasions, black was interpreted as 'on' and white
as 'off'. Physical switches are an integral part of life in the 20th century, and constitute the
source domain of this metaphor.
In a majority of the cases (61 of 72), the functionality of the switch was seen as
something under user control.
A spatial metaphor: objects moving in a 2D space (191 cases, 25.2%)
Figure 54. An example giving rise to a spatial metaphor.
As shown in Figure 54, the example used in the two-square case starts out with one black
and one white square. The colors “swap” back and forth when the user clicks on the black
square. This kind of interactive behavior was often described as a 2-D object moving. One
subject put it: “...It jumps back and forth when you click on it...”. This metaphor was only
applied when two squares of a different color changed color simultaneously. One color
becomes the object color, and the other becomes the background color. The source domain of
this metaphor is the world of movable physical objects.
On 7 of the 191 occasions, a change of pattern was described as a
splitting of the middle object.
It is locked (14 cases, 1.8%)
The two-square example shown in Figure 55 starts out all white. Both the left and the right
square behave as switches, but if both are turned on (black), nothing more can be done. One
subject (male) described this deadlock situation as "..then it is locked..".
Figure 55. An FSA where one state was described as "locked".
The source domain of this metaphor is a physical object that is locked; an object that you can
no longer move or interact with.
The square is dead (4 cases, 0.5%)
In four cases, the inability to make changes to a square was described as "It is dead". In some
cases, this was used to describe squares that were seen as switches.
State-space metaphor #1: The subject moving in a state space (5 cases, 0.6%)
Figure 56. A three-square example described as a state space.
The three-square example in Figure 56 has only two states, the initial one at the top. One
subject (male) described his exploration of this example as if he was moving in a state space.
He first pressed on the middle square and described this as: "..press on the middle one... ...and
invert it...". This brought him to the second state. He then pressed on the left square, and as
shown in the figure described this as going back to the starting point.
Figure 57. A two-square example described as a state space.
The same subject also described the behavior of the two-square example in Figure 57 in state-space
terms. This example consisted of two states, starting out black-white. No matter where
you pressed, it went all black and you were not able to "go back" to the initial state. This was
described in negative terms as: "..you are not able to get back...".
In both examples, the states of the example constituted a space in which the subject was
moving. The source domain of this metaphor is the physical world with recognizable "places"
that the subject can move between, each state signifying a distinct place. The squares form
patterns that become the "signature" of each place.
State-space metaphor #2: The example is moving in a state space (10 cases, 1.3%)
Figure 58. An example described as moving in a state space.
The example in Figure 58 starts out with both squares being white. When you click on the left
square, you enter into a situation where you can move a black square around by clicking on it.
You never get back to the initial state. One subject (female) tried it out. As shown in the
figure, she described the lack of a return transition as ”..it is not possible to bring it back to the
starting point that both are white...". She here clearly states that the starting point is "that both
are white". The "it" can not be anything but the example itself.
In this metaphor, the potential states of the artifact constitute a space in which the artifact
itself is moving. For the artifact to go back means to change back to an earlier state. To stop
means to continue being in a state.
It is interesting to note that on one occasion it was not possible to decide whether “..it is
not moving...” should be interpreted as object movement (the spatial metaphor) or change of
state (this metaphor). That case was classified as neither.
Temporal metaphor #1: The example is moving in linear time (11 cases, 1.4%)
The 'deadlock' situation in Figure 59 was described by one subject (female) as: “..but it
continues being black...”. This can be interpreted as the example moving forward in a linear
"time space". The source domain of this metaphor is our cultural notion of time as a space we
move through as in: "as we go further into the 90s".
As pointed out, Lakoff & Johnson (1980, p. 41-44) listed this metaphor as one of the two
ways time is conceptualized in English. The other is TIME-IS-A-MOVING-OBJECT,
as in: "The time for action has arrived".
Figure 59. An example where behavior was described in temporal terms.
To be precise, the metaphor used by the subject is of an object moving forward in time. This
is the same implicit metaphor as in the concept “time capsule”, i.e. to view time as a space we
can “shoot” objects into.
Temporal metaphor #2: The temporal ordering of user actions (24 cases, 3.1%)
The two-square example shown in Figure 60 starts out all white. If you click on the right
square, this square gets colored and you can do nothing more. If you instead click on the left
square, this square can be turned on and off. Again, if you click on the right square, it gets
colored and you are stuck.
Figure 60. A two-square example with two temporal relations ("left before right" vs. "left after right").
One subject (male) described this example:
"When the right field gets colored, then the left one is not colored afterwards, but if I
do the left one first, then I am able to color the right one".
We see that he described the behavior in terms of the temporal ordering of his actions.
Different ordering gave different results. There is an implicit time concept here, which is the
same as in the previous metaphor. The only difference is that it is the user actions and the
resulting changes that are described, and not the state of the example.
Correlation between metaphor and agency
The correlation between agency and metaphor shows some interesting patterns. Table 4 lists
these numbers.
Metaphor      Passive       Computer   FSA   Square        User          Total
Color         273 (162%)    0          2     30 (45%)      61 (49%)      366
Paint         28 (130%)     2          0     1 (9%)        31 (147%)     62
Switch        3 (9%)        1          2     5 (38%)       61 (249%)     72
Spatial       23 (26%)      0          0     89 (260%)     79 (122%)     191
Lock/Dead     3             0          4     6             5             18
State #1      0             0          0     0             5             5
State #2      0             0          2     3             5             10
Temp. #1      4             0          2     4             1             11
Temp. #2      14 (127%)     0          0     1             9             24
Total:        348           3          12    139           257           759
Table 4. The correlation between agency (columns) and metaphor (rows).
The numbers in parentheses show the deviations from an expected equal distribution of
agency (100%). They express how the count for a certain agency within a certain metaphor differs
from what would have been expected if agency had been distributed in the same way for all
metaphors. No deviations are calculated for counts less than 10.
As can be seen from Table 4, there is an interesting correlation between metaphor and
agency in some of the cases:
• In 62% more cases than expected, the color metaphor was described in the passive
voice as in: "it gets black".
• The paint metaphor was seen as initiated from a square in only 9% of the expected
number of cases. This means it was either seen as "just happening" (30% more than expected)
or as initiated by the user (47% more than expected).
• The switch metaphor was seen as controlled directly by the user, as in "I turn it off",
in 149% more cases than expected.
• The spatial metaphor was described as if the agency was in the squares themselves,
as in "It moves to the left", in 160% more cases than expected.
Another way of analyzing the data is to see the locus of agency as a continuum running from
the least “engaged” (passive voice), through placing the agency in the computer, FSA, or
square, to “full engagement”, where the agency is experienced as residing with the user. If we
assign the numbers 1 to 5 to these five levels of engagement, we can calculate the mean locus of agency
for each metaphor. The resulting mean values differ from the corresponding medians. The
fact that a certain metaphor gets its mean locus of agency at a certain point on the axis
consequently does not reflect a strong correlation between that metaphor and the
corresponding locus. The values merely indicate, for each metaphor, the mean level of
engagement found on the scale from 1 to 5. I have combined the two state metaphors and the two
temporal metaphors in Table 5.
1 2 3 4 5
Color --------------> (1.9)
Paint --------------------------------> (3.1)
Switch -----------------------------------------------------------> (4.7)
Spatial ---------------------------------------------------> (4.1)
Lock/Dead -------------------------------------------> (3.6)
State --------------------------------------------------------> (4.5)
Temporal ----------------------------> (2.7)
Table 5. The mean level of engagement for each metaphor.
We see that the “color” metaphor is special in that its mean level of engagement is relatively
low. The highest mean levels of engagement are found for the switch and state metaphors.
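As a concrete check, the short Python sketch below (my own illustration; the original analysis was not carried out this way, and the variable names are mine) recomputes the mean levels of engagement in Table 5 from the counts in Table 4, scoring passive voice as 1 and user agency as 5.

# Sketch: recompute the mean level of engagement (Table 5) from the counts in
# Table 4. Engagement levels: passive=1, computer=2, FSA=3, square=4, user=5.
counts = {                         # (passive, computer, FSA, square, user)
    "Color":     (273, 0, 2, 30, 61),
    "Paint":     (28, 2, 0, 1, 31),
    "Switch":    (3, 1, 2, 5, 61),
    "Spatial":   (23, 0, 0, 89, 79),
    "Lock/Dead": (3, 0, 4, 6, 5),
    "State":     (0, 0, 2, 3, 10),     # state metaphors #1 and #2 combined
    "Temporal":  (18, 0, 2, 5, 10),    # temporal metaphors #1 and #2 combined
}
for metaphor, row in counts.items():
    mean = sum(level * n for level, n in enumerate(row, start=1)) / sum(row)
    print(f"{metaphor:10s} {mean:.1f}")
# Prints 1.9, 3.1, 4.7, 4.1, 3.6, 4.5, and 2.7, matching Table 5.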
BETWEEN-EXAMPLE EFFECTS
Intransitive verbs: Explicit references to previous examples (48 cases)
All subjects showed an ability to describe artifacts with respect to other artifacts in one way or
another. The 48 explicit references to previous examples or another part of the current
example can be classified as follows:
• On 13 occasions, the subject described an example as being identical to some previous
example as in: "It is like the previous one".
• On 12 occasions, a square within an example was described as identical to another
square in the same example.
• On two occasions, an example was described as a modification of another example.
• On 15 occasions, a square was seen as a modification of some other square.
• On a total of six occasions, an example was described as the negation of another
example:
− On four occasions they described a negation of color.
− On one occasion a negation was described with respect to press/release behavior.
− On one occasion a negation of the temporal ordering of event/actions was described.
Context: Implicit references to previous examples.
In many cases, it was not possible to interpret the meaning of a clause without knowing the
interaction history up to that point. In those cases, the interaction history constituted a context.
Figure 61. The State Transition Diagrams for examples #1, #2, and #3.
Figure 61 shows the State Transition Diagrams for the three first examples. One subject
described example #3:
"It does not color the field when I press, but when I release".
The meaning of this description can not be captured fully without knowing the context of the
subject at that point. The meaning of the negatively phrased part of this utterance becomes clear
only when we know that the previous example changes from white to black on press.
In some cases, the description of an example sheds light on the subject’s understanding of
some previous examples. This phenomenon can be seen in the same subject's description of
the two first examples:
Example #1 was described:
"[I] color the field when I press down the button".
Example #2 was then described:
"[I] color the field when I press the button, but it is not erased when I release the
button".
Example #1 has a push-button behavior. A complete description of this behavior using the
paint metaphor would have been to say that it was colored when pressed on and erased when
the mouse button was released. The subject did not include the “erase on release” in his
description of Example #1, but included it implicitly in his description of Example #2. The
description of Example #2 consequently contains negatively stated details about Example #1
that were not expressed in the description of that example.
Taken together, these two phenomena made it necessary to read the transcripts for a
subject more than once to get the intended meaning. In philosophical terms, one could say that
the material necessitates a hermeneutic reading, as an understanding of the parts required an
understanding of the whole, and understanding the whole required understanding the parts.
THE DIVERSITY OF THE DESCRIPTIONS
To illustrate the diversity of the descriptions, I have chosen a three-square example that was
described differently by all 7 subjects. The example starts out with the middle square black
and the two outer squares white. As long as you hold down the mouse button in the middle square,
the colors are inverted: the middle square is white and the two outer squares are black.
The STD is shown in Figure 62.
Figure 62. The STD for the example.
Subject I
The first subject described the behavior as an inversion of the whole example:
"..It seems like it gets inverted when I .... hold the button down in the middle...
Inverter button in the middle.."
The middle square was seen as a button that makes this happen. This is an example of the
colored-object metaphor.
Subject II
The second subject described it first as three dice, then as black color spreading to the sides,
and finally as "an inversion function":
"Three dices.. hold down the mouse button and it looks as if the black spreads out
both to left and right... may be an inversion function.."
This is first the paint metaphor and then the colored-object metaphor. The subject used "dice"
to describe the squares in many cases. The "dice" metaphor was not developed further.
Subject III
The third subject described the example as a black "something" that could be split in two:
"You can split it and get two.. Yes, I split the black one.."
This is a variation of the 2D spatial metaphor.
Subject IV
The fourth subject described it as blinking on the sides and the one in the middle
disappearing:
"I try the one in the middle... It blinks on the sides.. the one in the middle disappears
at the same time..."
Subject V
The fifth subject first described it as something that can be colored, then as color inversion:
"Here I can color them.. the areas change color when I hold the button down.."
This is first an example of the paint metaphor, followed by the colored-object metaphor.
Subject VI
Subject six described the behavior purely as color inversion:
"The one in the middle inverts it, so to say"
This is an example of the colored-object metaphor.
Subject VII
The last subject first described it as something moving, then as a change of color, and finally as
something that opens up:
"When I press on the black one, the black moves.. it gets white and the two others get
black... as something that opens up.."
We see here the 2D spatial metaphor, the colored-object metaphor, and an innovative "real
world" metaphor (a door).
7.4 Discussion
Agency
Referring back to the theories of HCI described in Part I, the phenomena related to placement
of agency can be understood through W&F's Heidegger, Laurel, Activity Theory, and
Merleau-Ponty. Neither the Model Human Processor, the cognitive psychology of Norman,
Semiotics, nor Ethnomethodology can inform us on this.
The agency was put in 5 different places:
1. Passive voice. (It just happens)
2. In the computer. (The computer makes it happen)
3. In the example. (The example changes itself)
4. In a square. (A square makes changes to itself, or moves)
5. In the user. (Direct manipulation)
From the perspective of Laurel, the change of voice from passive to active would be a good
example of the "firstpersonness" resulting from "engagement". The user spontaneously
projects a "mimetic illusion" into the computer.
The analysis of correlation between metaphor and locus of agency showed that all
metaphors except the “color” metaphor correlate with a high level of engagement. This is a
strong indication that the “color” metaphor is not a “true” metaphor. There are no illusions being
created in the “color” metaphor. Seeing interactive squares as switches involves illusion, while
simply seeing black and white squares on a screen is not an illusion. The fact that there is a
strong correlation between the use of active voice and “true” metaphors illustrates Laurel’s
analogy with theatre: engagement appears when illusion is at work and the props and actors
become objects and characters of a play.
Laurel's theory only describes the difference between "illusion" and "not illusion". This
corresponds to cases 3, 4 & 5 vs. cases 1 & 2 in the above list. Her theory does not say
anything about the difference between placing the agency in the square/FSA or directly in the
user. Due to the passive role of the audience in traditional theatre, the theatre metaphor does
not fully capture this interactive aspect of human-computer interaction.
In Winograd & Flores’ terminology, the change of locus of agency from the FSA/Square
to the user reflects a change of mode of interaction from treating the squares as PRESENT-AT-HAND
to treating them as READY-AT-HAND. The squares go from being something you act on
to being something you act with or through.
A Heideggerian analysis (W&F) does not shed very much light on the change from
passive to active voice. The analysis starts when a certain interpretation of the situation is
given.
Taken together, Laurel and W&F complement each other in describing all loci of agency:
from before the illusion is established, through the point where the illusion is visually at work,
to the point where the user starts acting within the illusion.
Activity theory provides an interesting addition to the two previous theories with its
concept of automation (Bødker: operationalisation). When a certain operation is learned, it is
operationalised and no longer requires conscious attention. Learning to see a square with
toggle behavior as a switch could be a good example of this phenomenon.
Merleau-Ponty's concept of body space provides a coherent theory that can be applied
here. As described in Chapter 3, this theory says that through being an experienced member of
a culture and its everyday life, our bodies learn to deal with our environment in a very natural
way. When I use a familiar object, the “skin” of my body space is where "the I" meets "the
world of objects". As the interaction in the experiment can be seen as a learning process, it
makes sense that the reach of the body space changes during the process.
Figure 63. The body space for different loci of agency.
Figure 63 illustrates the body space for the different cases. In the two first cases the body
space is not in direct contact with the example or its squares. The user moves the cursor on the
screen, but this is the only extension of the body space that has been established. When the
agency is put in the example or its squares, this can be seen as if the body space extends until
it touches upon these. When in the last stage the interaction is described as direct
manipulation, the objects on the screen are integrated into the body space of the user. The
body has understood the example.
Spaces in the metaphors
One way of organizing the bulk of the metaphors is to look at the three different spaces
emerging from the material, i.e.
1) Physical space,
2) State space and
3) Linear time.
In all three spaces you can move either the user or the object. This is shown in Table 6.
                               User moves                         FSA/square moves
Physical space (Cartesian 2D)  "I go to the left one and click"    "It moves to the right"
State space (like a maze)      "I go back.."                       "It goes back to being all white"
Time as space (linear time)    "I do it the opposite way"          "It goes on being white"
Table 6. The three spaces.
This simplification of the more complex picture in Table 4 serves to focus on the spaces
emerging in the metaphors. A further elaboration on this is done in Chapter 10.
Composition
In addition to the “space” metaphors above come artifacts with named behavior, e.g. switches,
locks, and doors. This opens up the question of how these metaphors could be combined.
More complex interactive artifacts can be composed from simple interactive artifacts.
The composition can happen in 2D-space, as when an artifact is seen as composed of three
independent switches. The composition can also happen in state-space, as when one switch
determines whether two other switches should behave in synchrony or as separate switches. It
can also happen in time, as when the behavior of something changes over time.
This allows for composition of behavior through nested containment in 2D-space, state-space,
and time. Artifacts created through such composition can be given new names, and
used as building blocks for new composition.
Interaction Gestalts
The fact that all subjects spontaneously showed an ability to compare different examples
lends itself to some speculations. When the subjects compared two examples that only
differed in behavior (like #1 and #2), it is an open question what was actually being
compared. The possibility of a purely linguistic comparison can be ruled out, at least for some
of the subjects, because in some cases they described aspects of a previous example for the
first time as differences from the one at hand. A purely linguistic comparison would require
that all aspects of the previous example had already been stated positively.
This leads to the assumption that what was being compared were the actual examples as they
were experienced in interaction. A possible term for the examples as perceived reality could
be Interaction Gestalts. For this assumption to hold, one must assume that the subjects
actually followed the instruction to think aloud, and that there is no such thing as
"unconscious verbalizations".
The point made here is similar to Michotte’s insistence that the perception of causality is
direct and immediate, and not an interpretation of “sense data”. It is also similar to J.J.
Gibson’s view concerning the perception of objects. The interaction gestalt view is in
opposition to the Human Information Processor model of Cognitive Psychology, as
interactions can not be first-class objects in a simple data-flow model. The discussion of the
Mr. Peters button made in Chapter 3 is of direct relevance here.
A note on language
All experiments were done in Norway with Norwegian participants. The quotes from the
experiments are all translations from the original Norwegian transcripts. The metaphors
emerging from the experiments are based on my interpretation of the original Norwegian
linguistic material. The translation of the material into English did not change those
metaphors. The reason for this is most probably that the two languages are very close at the
structural level. In the Viking era, just a thousand years ago, they were almost identical also in
vocabulary and word ordering.
The fact that a translation of a quote to another language results in the same implicit
metaphor does not guarantee that a native speaker of that language would not have preferred
different words with a different implicit metaphor. The only way to resolve such questions is
by performing new studies.
My assumption is that, because of the closeness of the two languages, the results from the
study should apply also to users having English as their native language. A cross-cultural study of the
perception of interactive behavior could give interesting insight into how structural
differences at a deep linguistic level affect choice of metaphors.
Chapter 8 Exp. B: Designing Interactive Behavior
“A small matter of programming”.
Title of a book by Bonnie A. Nardi on end-user programming.13
From a computer science perspective, there is no point in building a theory of how a domain
is understood if this knowledge can not be applied to solve a problem of some kind. Moving
from a hermeneutic knowledge interest to a technical interest changes the question from
“How is this domain understood?” to “How do we make use of this understanding?” The
criterion of success changes from insight to applicability.
Does this mean that no insights can be learned from trying to apply the results from
Experiment A to the solution of a practical problem? The opposite is the case. Not only do
applications justify “pure” research; they can also give useful feedback and serve as a
validation criterion for the original results.
One way of applying the results from a qualitative investigation of a domain is by using it
as input for the design of editors for this domain. An editor is in this context a program that
allows you to construct, modify, and test objects in that domain. There are at least two ways
in which knowledge of the user’s understanding of a domain can be made use of in the design
of editors:
• More design alternatives. Knowledge of the user’s perspective can give rise to design
ideas that would otherwise not have come up. Novelty does not guarantee quality, but in
the early phases of a design process it is always useful to have many alternatives to pick
from.
• Improved quality. By basing the design on a model of the users’ understanding, it
should be possible to match the editor to the terminology, concepts, and structure of
their world. One would assume that such an approach should lead to more “intuitive”
software than what would result from taking a traditional engineering approach to the
problem.
An example of a design process making use of qualitative data is described by Andersen and
Madsen (1988). Underlying their work is a belief in the fruitfulness of using the work
13 (Nardi, 1993)
language of the users as a gateway to their mental world. Bødker (1990) has termed this design
methodology "the linguistic approach". Andersen and Madsen made this explicit by applying
the metaphor analysis of Lakoff and Johnson (1980). Their example relates to a Danish
library. In their study, they used the results from a linguistic analysis of situated
communication among the librarians as input to the design of a database query language.
Unfortunately, they did no actual implementation or testing of the resulting system. I have
found no case studies in the literature exploring in detail the whole development process for a
project of this kind, from data collection to analysis, design, implementation, testing, and
evaluation.
8.1 Method
Experiment Design
Experiment B was designed to evaluate the applicability of the results from Experiment A on
the design of editors for expressing interactive behavior. The experiment can be seen as a
combined design-exercise and psychological experiment, where the criteria of success are both
novelty and quality-of-use.
As a baseline, four editors were designed based on available state-of-the-art techniques.
Four more editors were designed based exclusively on design ideas emerging directly from
the results of Experiment A. These two sets of four editors were tested empirically in two
parallel studies to gain insight into their usability. The results from these two studies were
then compared. In the following, the two studies will be referred to as the state-of-the-art
study and the empirically-based study.
The two studies ran in parallel and followed the same procedure. The studies were not
interrelated in any way. The procedure can be described as a four-step process:
1. Run each subject individually through Experiment A as it is described in the previous
chapter. This is done to expose them to Square World.
2. Perform usability test of all four editors with all subjects (N x 4 tests).
3. Bring all subjects together as a design group, and let them as a team compare the four
editors, choose a favorite, suggest improvements, and bring in new ideas.
4. In between sessions, their resulting design is prototyped. It is then evaluated by the
group in a second “design meeting”. The group is again asked to suggest
improvements and bring in new ideas for their final design.
An alternative to splitting the subjects into two groups and running two separate studies as
was done here, could have been to bring all subjects together in a pool and expose all subjects
to all eight editors. My rationale for not doing it this way was the benefit I expected to get
from the chosen design concerning the iterative design process following the usability tests.
By not exposing the empirically-based group to the view on interactivity found in the
computer-science tradition, my purpose was to get a clearer insight into the naive theories
they brought into the study. Vice versa, I assumed that by not exposing the state-of-the-art
group to any of the design ideas that emerged from Experiment A, I would get a better picture
of how the computer-science view is received by “end-users”.
Subjects
For this experiment and Experiment C, I involved a class from a local high school (N=11, age
16-17, six male and five female). They all had little or no prior experience with computers.
From this school class I formed at random two groups of size 3 for this experiment.
Procedure
The two studies were run in parallel over a period of 20 weeks. To avoid crossover effects
between the two studies, the subjects were asked not to discuss what happened in the
experiment with any of their classmates until after the experiment was over.
After all subjects had explored the 38 examples of Square World, they each tested the
four editors in four separate sessions. The interval between sessions was one week for all
subjects in both studies. The order in which the editors were tested was different for each
subject. This was done to avoid any effects in the final group discussions concerning
preference from the order in which the subjects had been exposed to the editors.
As in Experiment A, Apple Computer’s usability testing guidelines (Gomoll, 1990) were
followed rigorously in the editor tests. The task given in all tests was to reproduce with the
editor being tested a set of examples from Square World. The examples were presented to the
subjects one by one. The speed was controlled by the subjects as in Experiment A. The
examples were presented “raw” as in Experiment A, i.e. as interactive squares with a “repeat”
button.
In the evaluation and design sessions, the groups were asked to be as open as possible to
new ideas. They were instructed that the purpose of the session was not for them to find the
“correct” answer, but to teach me as a researcher about how non-experts think about these
matters. In both sessions, they had the computer available to try out the existing editors. They
also had available paper, pencils, a flip chart, and a blackboard. In the final sessions they also
had available my implementation of their first designs.
Equipment and software tools
I have used the term prototype to indicate software that is not of production quality, but
sufficient for the purpose of the experiment. All the described functionality was implemented,
and I experienced very few technical problems with the altogether 12 prototypes that were
produced during the experiment.
To implement the editors I used an Apple Macintosh computer with Digitalk's
Smalltalk/V, the Acumen Widgets interface builder, and a public domain YACC-like tool for
Smalltalk. (For a discussion of YACC, see Levine et al., 1992, and Aho et al., 1986.) All tests
were run on a Macintosh IIci computer.
Data collection and analysis
All usability tests and design sessions were videotaped with a Hi-8 camera and a directional
microphone.
In the analysis of the usability test I first measured the number of tasks performed
correctly for each editor. Next I focused on a qualitative analysis of breakdown situations.
Following the theoretical framework of Winograd & Flores, a breakdown situation occurs
when the person performing a task is interrupted because the tools at hand behave differently
from what was expected, or because a problem can not be solved with the available tools.
Breakdown situations are interesting because in these cases we often make explicit and
question our “mental models” of tasks, tools, and materials. Breakdown situations thus
become “the royal road” to understanding a person’s mental models (as Freud said about
dreams and the subconscious).
In the design sessions, I interpreted the choice of favorite editor as a strong indication of
how the domain was conceptualized. In the discussions they also implicitly showed their
understanding of interactive behavior.
8.2 The State-of-the-Art Editors
Based on a study of the technical literature on user-interface design tools, I designed and
implemented four different solutions to the design problem. I have tried to make the four
resulting prototypes as mainstream and "objective" as possible by applying only well-established
design techniques and user-interface guidelines.
AN EVENT-ACTION LANGUAGE
The first prototype can be seen in Figure 64. It is a syntax-driven editor for a simple event-action
language. Event-action languages are common in interface builders (see Svanæs, 1991,
or HyperCard for examples).
Figure 64. A syntax-driven editor for a simple event-action language.
The editor included seven one-square examples from Square World that the subjects were
asked to reconstruct with the given tool. The subjects could control the progression of the
tasks with the buttons “Neste oppgave” (Next problem) and “Forrige oppgave” (Previous
problem). The “Igjen” (Again) button brought the examples back to their initial state. By
clicking on the "Prøv/Igjen" (Try/Again) button, the subject could test the artifact created.
The available vocabulary was presented to the subjects both as labeled buttons and in a
scrollable list. The buttons were dynamically grayed out to make it impossible to construct
syntactically incorrect sentences. This prototype allowed only for constructing one-square
artifacts.
The language contains seven rules. Its corresponding grammar G(S) in English would be:
S ::= <Initial_State> {<Event_Action_Rule>}
Initial_State ::= “Start” <Color> “.”
Event_Action_Rule ::= <Event> “:” (<Action> | <Condition>) “.”
Condition ::= “If” <Color> “then” <Action>
Action ::= “make” <Color>
Color ::= “white” | “black”
Event ::= “Press” | “Release”
In Figure 65 is shown example #5 from Square World as it could have been expressed in the
above event-action language.
Start white.
Press: If white then make black.
Press: If black then make white.
Figure 65. A single-square toggle with event-action description.
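To make the semantics of this little language concrete, here is a minimal interpreter sketch in Python. It is only my own illustration (the actual editor was implemented in Smalltalk/V), and it handles no more than the seven rules above; the unconditional "make" branch follows from the grammar rather than from any example in the text.

# A minimal, illustrative interpreter for the event-action language above.
# Not the original Smalltalk/V implementation; names and structure are my own.
import re

def parse(program):
    """Split 'Start white. Press: If white then make black. ...' into an
    initial color and a list of (event, condition, new_color) rules."""
    sentences = [s.strip() for s in program.split(".") if s.strip()]
    initial = sentences[0].split()[1]              # "Start white" -> "white"
    rules = []
    for sentence in sentences[1:]:
        event, body = [part.strip() for part in sentence.split(":", 1)]
        match = re.match(r"If (\w+) then make (\w+)", body)
        if match:                                  # conditional rule
            condition, new_color = match.groups()
        else:                                      # unconditional "make <color>"
            condition, new_color = None, body.split()[-1]
        rules.append((event, condition, new_color))
    return initial, rules

def run(program, events):
    """Feed a sequence of events to the artifact and print the resulting color."""
    color, rules = parse(program)
    for event in events:
        for rule_event, condition, new_color in rules:
            if rule_event == event and condition in (None, color):
                color = new_color
                break
        print(f"{event:8s} -> {color}")

toggle = ("Start white. "
          "Press: If white then make black. "
          "Press: If black then make white.")
run(toggle, ["Press", "Release", "Press"])
# Press    -> black
# Release  -> black   (no rule fires)
# Press    -> white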
Results
All three subjects learned to use this editor, and all managed to recreate all examples.
A RULE-SHEET EDITOR
The second prototype can be seen in Figure 66. It was a rule-based editor allowing the user to
define event-action rules by coloring squares in a matrix. It can be seen as a visual-programming
editor incorporating simple direct-manipulation techniques.
Figure 66. A rule-based direct-manipulation editor.
This editor worked for up to three squares. The squares in the rule area to the right were
colored by clicking on them. The user selected current drawing color (white, black or gray) in
the area in the middle.
Graying out a square tells the editor that it should not appear in the artifact. The top line
defines the initial state. The subsequent lines are rules with pre-condition and post-condition.
The user chooses among the six circles to select the event (press or release on square one,
two, or three).
The behavior described in Figure 66 is the “jumping black square” (Example #12 in
Square World) shown in Figure 41 on page 131.
The subjects were given 16 examples to recreate: seven one-square, five two-square, and
four three-square. The tasks were presented in increasing order of complexity.
Results
All three subjects learned to use this editor, and all managed to recreate all examples. The
subjects ran into a problem when they went from simple one-square examples to more
complex examples with “branching” in the STDs. They typically started out with a model of
the rule-sheet as a sequential description of interactive behavior.
If you add to this an implicit rule of restarting from the top when you reach the bottom,
this model works well for describing one-square examples and examples without branching.
To be able to solve a problem like example #10 of Square World (two independent push-buttons,
Figure 40, page 131), they had to re-conceptualize their model of the domain.
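To illustrate the rule model this editor embodies (an initial state plus rules consisting of a full-state pre-condition, an event, and a post-condition), the sketch below is my own Python re-encoding, not the Smalltalk/V prototype, and the rules listed are my reconstruction of example #10. It shows how the "branching" arises: every reachable combination of square colors needs its own rule.

# Illustrative re-encoding of the rule-sheet model: an initial state plus
# (pre-condition, event, post-condition) rules over all squares.
# The rule values below are my own reconstruction of example #10
# (two independent push-buttons: a square is black while pressed).

def run_rule_sheet(initial, rules, events):
    """Apply each event in turn: the first rule whose event and full-state
    pre-condition match determines the next state."""
    state, trace = tuple(initial), []
    for event in events:
        for pre, rule_event, post in rules:
            if rule_event == event and pre == state:
                state = post
                break
        trace.append(state)
    return trace

W, B = "white", "black"
rules = [
    ((W, W), ("press", 1),   (B, W)),  ((W, B), ("press", 1),   (B, B)),
    ((B, W), ("release", 1), (W, W)),  ((B, B), ("release", 1), (W, B)),
    ((W, W), ("press", 2),   (W, B)),  ((B, W), ("press", 2),   (B, B)),
    ((W, B), ("release", 2), (W, W)),  ((B, B), ("release", 2), (B, W)),
]
print(run_rule_sheet((W, W), rules,
                     [("press", 1), ("press", 2), ("release", 1)]))
# [('black', 'white'), ('black', 'black'), ('white', 'black')]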
PROGRAMMING-BY-DEMONSTRATION
The third prototype can be seen in Figure 67. It is a programming-by-demonstration editor
allowing the user to teach the computer the behavior of an artifact step by step.
Figure 67. A programming-by-demonstration editor.
It is built around an idea that can be found in interface builders like Peridot (Meyers, 1993). A
number of other examples of such systems can be found in (Cypher, 1993).
The squares in the rightmost area toggle between white, black, and gray when clicked on.
To create a new artifact, the user first toggles the squares to indicate the initial state. By
clicking on one of the six buttons above, the user indicates an event (press/release). The
squares are then grayed out to indicate that the editor does not yet know what the resulting
state looks like. The user then toggles the squares to indicate the resulting state. In this way,
the artifact is built up ("Slipp" = "Release", "Trykk"="Press", "Startsituasjon"= "Initial state").
Results
All three subjects learned to use this editor, and all managed to recreate all examples. In
solving complex problems they all discovered that it was possible to solve the problem stepwise
by going back and forth between the task example and the copy they were building. With this
strategy they could deal with one transition at a time without having to take in the full
complexity of the examples.
Their reason for taking this route was the lack of visual feedback in the editor on the
example being built. On complex examples they easily lost track of what they
had done, and instead invented this more mechanical strategy as the only solution.
STATE-TRANSITION DIAGRAMS
The fourth prototype can be seen in Figure 68. It is an editor for constructing artifacts by
drawing state-transition diagrams.
Figure 68. A STD-editor.
The thick circle can be moved around. It indicates the initial state. Lines from the "T" buttons
indicate transition on press and lines from "S" buttons indicate transitions on release. (“T” is
the first letter in “Trykk”, which is Norwegian for “Press”, while “S” is the first letter in
“Slipp”, Norwegian for “Release”.)
The artifact being constructed in Figure 68 is a “jumping black square” similar to the one
built with the rule-sheet editor. The only difference is that this three-square example has a
“static” third square to the very right. This is because the editor only allows for expressing
solutions with exactly three squares.
The button “Slett alt” at the bottom is an erase-all button. It erases all connections and
allows you to start drawing a new STD.
Results
All three subjects learned to use this editor, and all managed to recreate all examples.
THE IMPROVED VERSION
In Figure 69 can be seen the prototype resulting from the evaluation and iterative re-design by
the group. They selected the STD-prototype and wanted some extensions. An extra function
enables the user to erase connections. They also felt a need for an indicator in the middle that
always shows the last visited state in the design process.
Figure 69. The resulting prototype.
They also added an erase-last-drawn-line button (“Slett linje”) in addition to the erase-all
button (“Slett alt”).
8.3 The Empirically-Based Editors
The four empirically-based editors were intended to draw as little as possible on the
established computer-science view on interactive behavior. They were built solely on the
material from Experiment A. From a computer-science perspective they might seem both
strange and inefficient.
A NATURAL-LANGUAGE EDITOR
A purely linguistic approach was taken to the problem in the first of the empirically-based
editors. Following the methodology of Andersen and Madsen (1988), an LR(0) grammar with
attached semantic rules was constructed based on recurrent patterns in the linguistic material.
The resulting grammar had 40 rules.
Figure 70. A natural-language editor.
The YACC-like compiler generator was used to construct a table-driven compiler for this
subset of Norwegian. The editor is shown in Figure 70. Except for the language, the editor
works like the event-action editor described before. The same 7 one-square examples from Square World
were given as tasks.
Results
All subjects learned to use this editor, and all managed to recreate all examples. On some
occasions, they were not directly able to express what they wanted. In those cases they looked
at what was available and treated the language more as an artificial language.
THE PHYSICAL METAPHORS
The second prototype can be seen in Figure 71. It makes use of four metaphors dealing with
physical objects that were identified in the material from Experiment A. As with the rule-sheet
editor, the user here describes an artifact as an initial state and a set of what-happens-if
rules.
Figure 71. The editor based on physical metaphors.
The editor allows for the state of an artifact to be built up from elements of different physical
metaphors. The icons represent white and black in the different metaphors. The user
constructs an artifact by placing icons in the appropriate slots in the working area to the right.
In the current implementation, the icons in the state definitions are only different graphical
representations of the colors white or black. Their semantics reside with the user.
The following four metaphors from Experiment A were implemented:
• The paint metaphor: A paint brush was chosen because terms like "..then I paint it
black.." were frequently found.
• The locked square metaphor: A lock was chosen to illustrate "dead" squares. This
was done because the phrase "it is locked" was often used to describe "blocked"
input.
• The switch metaphor: The states of the toggle switches were indicated with the
words "on" and "off". ("På"/"Av"). This metaphor was used by the subjects only to
describe squares that had toggle behavior. The subjects typically said: “it is on”, “it
is off”, “I turn it on”, “I turn it off”, “It gets turned off”, “It was on”. On all
occasions, black was interpreted as 'on' and white as 'off'.
• The 2D spatial metaphor: The frog was chosen to symbolize jumping behavior. The
cross is meant to symbolize a landing spot for the frog. Example #11 of Square
World is a good example. This was interpreted as a 2-D object moving on a
background. One subject put it: “...It jumps back and forth when you click on it...”.
Results
All subjects learned to use this editor, and all managed to recreate all examples. All subjects
found it to be a very obscure editor. They found the icons to be very childish and of little use.
They ended up solving the tasks using only the pure black and white icons. It is interesting to
note that when you ignore the icons, you end up with the rule-sheet editor of the state-of-the-art
approach.
THE HOUSE METAPHOR
The third prototype can be seen in Figure 72. The state-space metaphor leads naturally to the
idea of seeing an artifact as a house with doors and rooms. Referring back to Experiment A,
the return to an initial state was on one occasion described: "[I] click on the left one and go
back to the starting point". In this metaphor, the potential states of the artifact constitute a
space in which the subject is moving.
Figure 72. A house.
The "signature" of each room is the corresponding state, and the possible doors correspond to
"click" and "release" in the different squares.
The house seen from the outside symbolizes the initial state. All other states are
illustrated as rooms. The signature of each room is defined by toggling the individual squares.
New rooms are defined simply by opening the door leading to it and setting a new signature.
Results
All subjects learned to use this editor, and all managed to recreate all examples. Again all
subjects found the editor to be very obscure. The house metaphor never caught on. They
ended up solving the tasks using the “doors” as buttons. By doing so, they reduced the editor
to the programming-by-demonstration editor of the state-of-the-art approach.
DESIGNING WITH INTERACTION GESTALTS
Figure 73. An editor based on the idea of Interaction Gestalts.
On some occasions, the subjects of Experiment A described an artifact as being identical or
similar to some previous artifact. The artifacts were often described as modification of
previous artifacts: "It is like that other one, but when ...". All subjects showed an ability to see
artifacts with respect to other artifacts in one way or another. This led me to assume that the
first-class objects in this domain were Interaction Gestalts.
The editor shown in Figure 73 is based on the idea of Interaction Gestalts. It works only
with single-square artifacts. Each artifact is treated as a separate entity that can be cut, copied,
and pasted. A set of unary and binary operations have been defined. A single-square artifact
can be "inverted" along two different "axes": black/white, click/release. All transition rules in
the artifact are affected by these unary operations. An artifact’s initial state can also be
toggled between black and white. In addition, an artifact can be added to or subtracted from
another artifact. Addition and subtraction are implemented as corresponding set operations on
the transition-rule sets. ("Byggeklosser" = "building blocks")
The four interactive squares in the upper left corner are pre-defined and can not be
altered. The user can make copies of these building blocks, and use the working area to
construct new artifacts. The "end product" is copied into a separate box. The content of the
clipboard (“Klippebord”) is always visible and tangible. By pressing the “Prøv” ("Try it")
button, the user can test all artifacts, including the clipboard.
The editor can be seen as implementing an algebra of single-square artifacts. The
potential for working directly with user interface elements as an interactive material in this
manner has not been exploited in any system known to the author.
The algebra
To be able to call this an algebra on single-square artifacts, we have to show that all such
artifacts can be constructed from the single-rule artifacts using the defined unary and binary
operations.
As described in Chapter 6, the artifacts of Square World can be described as an initial
state and a set of transition rules between states. In the case of single-square FSAs, there can
only be two states: white and black. Either one of these can be the initial state. The
possible actions are press and release. This gives rise to 4 possible transitions between
states as combinations of white/black and press/release:
Rule 1: <white, press, black>,
Rule 2: <white, release, black>,
Rule 3: <black, press, white>,
Rule 4: <black, release, white>.
Each of these rules can be present or not present in an FSA. For a given initial state this gives
rise to 2^4 = 16 possible combinations of transition rules. As two different initial states are possible, the
number of different single-square FSAs is 32.
Let us start out with two FSAs, and name them FSA-0 and FSA-1. FSA-0 is white and
has no rules. FSA-1 starts out white, and has only rule number 1. (FSA-0 is identical to the
leftmost building block in the editor, while FSA-1 is the third from the left in the list of
“Byggeklosser” (building blocks)).
To prove that all 32 FSAs can be created from FSAs 0 and 1, we need the binary
operation addition and the three unary operations for swapping black/white, press/release,
initial state. As in the editor we also need at least one working area. This will show up as one
level of parenthesis.
From the above it follows that each of the 32 FSAs can be identified by a unique five-place
Boolean vector:
<initial state, rule 4, rule 3, rule 2, rule 1>
If we represent false with a 0, and true with a 1, we can from each such vector create a unique
five-digit binary number as signature. As the unary operation for changing initial state does
not affect the rules, we only need to do the proof for one initial state:
• FSA-0 has the binary number <00000>, while FSA-1 is <00001>.
• Each of the four rules can be created from rule 1 by combinations of unary operations.
Let us use C to denote color swapping and M to denote swapping of the mouse button
events. Let AOp mean FSA-A modified by unary operator Op. Let A+B denote the sum
of FSAs A and B.
• From FSA-1 we can create the other FSAs containing only one rule as: 1M, 1C, and
1MC. This gives us FSA-2 (<00010>), FSA-4 (<00100>), and FSA-8 (<01000>)
respectively.
• As addition is the set union of transition rules, all FSAs <00000> through <01111> can be
created as combinations of FSA-0, FSA-1, FSA-2, FSA-4, and FSA-8. QED.
The first example of Square World is an FSA that starts out white and has push-button
behavior. In the above formalism it could be created as: 1 + (1MC). It would consequently be
FSA-9 (<01001>).
A detailed analysis of the 32 FSAs shows that only 16 different interactive behaviors are
possible. This is because some FSAs contain unreachable states, and because some have
transitions that can never fire because of the nature of press/release. This does not change the
argument that all single-square behaviors can be created with the algebra.
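The argument can be checked mechanically. The Python sketch below is only my own illustration of the algebra (the editor itself was written in Smalltalk/V, and the triple encoding of rules is mine): rule sets over <state, event, new state> triples, the unary swapping operations C and M, an initial-state toggle, and addition as set union, used to reconstruct the push-button FSA as 1 + (1MC).

# Illustrative sketch of the single-square algebra; my own encoding,
# not the original Smalltalk/V editor.
from dataclasses import dataclass

FLIP_COLOR = {"white": "black", "black": "white"}
FLIP_EVENT = {"press": "release", "release": "press"}

@dataclass(frozen=True)
class FSA:
    initial: str         # initial color
    rules: frozenset     # set of (color, event, new_color) transition rules

    def C(self):         # unary operation: swap black/white
        return FSA(self.initial,
                   frozenset((FLIP_COLOR[s], e, FLIP_COLOR[t]) for s, e, t in self.rules))

    def M(self):         # unary operation: swap press/release
        return FSA(self.initial,
                   frozenset((s, FLIP_EVENT[e], t) for s, e, t in self.rules))

    def I(self):         # unary operation: toggle the initial state
        return FSA(FLIP_COLOR[self.initial], self.rules)

    def __add__(self, other):   # addition = set union of transition rules
        return FSA(self.initial, self.rules | other.rules)

FSA0 = FSA("white", frozenset())                                   # no rules
FSA1 = FSA("white", frozenset({("white", "press", "black")}))      # rule 1 only

# Example #1 (push-button behavior) as 1 + (1MC):
push_button = FSA1 + FSA1.M().C()
print(push_button.initial, sorted(push_button.rules))
# white [('black', 'release', 'white'), ('white', 'press', 'black')]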
Results
All subjects learned to use this editor, and all managed to recreate all examples. This result
was not expected.
THE IMPROVED VERSION
In Figure 74 can be seen the prototype resulting from the evaluation and iterative re-design by
the group.
Figure 74. The resulting prototype.
In the first iteration, they removed all the icons except white, black, and gray. The resulting
design is 100% identical to the rule-sheet editor of the state-of-the-art study.
In the last iteration they removed the icons altogether and instead borrowed the idea of
color toggling from the "house" prototype. They ended up with a strongly simplified rule-based
editor. The resulting design in Figure 74 is very simple and easy to use.
8.4 Discussion
No scientifically valid conclusions can be drawn from this study due to the low number of
subjects. The artificial setting and the nature of the design task also make it an open question
to what extent the results apply directly to real-world design processes. The results still
contain much interesting material that invites interpretation.
With both approaches, all subjects were able to understand and use all four editors. From
a purely quantitative perspective this should indicate that the state-of-the-art understanding of
interactivity works well, and that all the findings from Experiment A tested for in Experiment
B were verified. This interpretation is too simple. The tasks were often solved in unintended
ways, and a qualitative analysis is therefore necessary.
The state-of-the-art editors
The fact that the subjects were able to solve the tasks with the state-of-the-art editors does not
contradict the findings from Experiment A. All implicit metaphors found in these four editors
can also be found in the linguistic material:
• Event-action rules: Both the event-action editor and the rule-sheet editor rely solely
on an understanding of interactive behavior in terms of if-then relations. If-then
structures like “It goes black if I click on it” were found very frequently in
Experiment A.
• State-space: The STD-editor is built on the idea of the subject moving in a state
space. This metaphor was also found frequently in Experiment A.
The programming-by-demonstration editor can be seen as building on both event-action rules
and an implicit understanding of a state-space.
Natural language interfaces
The lesson learned from building the natural-language editor was that making such interfaces
is very resource consuming. When 40 language rules were not sufficient for the simple
problem of adding behavior to a single square, the complexity of a real-life problem would
probably be unmanageable.
The fact that the participants managed to understand the limitations of the language and
found ways of getting around these limitations shows that they had no problems dealing with
it as an artificial language.
The fact that the editor works does not say anything about the correctness of my analysis
of the implicit metaphors in the linguistic material. It shows that my interpretation of the
syntactic structure of their “work language” is correct. It also shows that my interpretation of
the semantics of this language is correct, but only at the shallow level of being able to
translate the sentences into the correct corresponding FSAs.
No deep understanding of the users’ “mental models”, “naive theories”, “implicit
metaphors”, or “structuring principles” of the domain is necessary to be able to create a natural
language interface. All you need to do is to map the natural language of the users, and
implement a compiler/interpreter for this language. The complexity of the task, though, makes
this brute-force approach impractical for most purposes.
A comparison of the resulting grammar of the natural language editor with the linguistic
material from Experiment A, shows some interesting semantic differences. The syntax allows
for negatively stated facts like: "..but when you release, it is not erased". The corresponding
grammar rules have no semantics attached. The reason for this is that the editor has a static
"context", while the context for the subject at any given time includes all previous interactions
up to that point, i.e. the interaction history.
Winograd and Flores (1986) argued that problems of this kind are due to the positivistic
paradigm in current computer science. For a negatively stated fact about an artifact's behavior
to be meaningful to a computer, the implicit references to previous artifacts must be made
explicit, or they must be induced automatically.
Advanced contextual user modeling might enable editors to make use of this aspect of
natural language in situ. Suchman (1987) analyses the fundamental difficulties confronting
designers of such systems. She argues that these difficulties are due to the physical differences
between humans and computers.
The nature of metaphor
All subjects in the empirically-based group found the "house" and "physical metaphor"
prototypes to be obscure. They found it very unnatural to compare an artifact with a house,
and they never used the pictograms in the "physical metaphor" prototype.
One explanation could be that the icons were very badly designed and did not give the
right associations to jumping frogs, locks, rooms, and doors. My visual design was indeed not
very impressive, but I still do not find this to be a plausible explanation. The subjects were
initially told what the icons were intended to mean, and I am therefore not convinced that
improved visual design would have improved the usability.
Another possibility is that the subjects did not find the icons of any use because they did
not come with built-in behavior. A possible improvement would consequently have been to
add functionality to the icons. The frog could for example become a jumping square, and the
lock could inhibit input. This sounds plausible, but I suspect that the usability would only
increase slightly. Only a usability test of a new design of the editor can give further insight
into this.
A third and more radical explanation is to question the fundamental understanding of
metaphor underlying the design of the two editors. When the user is asked to move a frog
around to indicate jumping behavior, the underlying assumption is that spontaneous
descriptions like “The black square jumps” are metaphorical. A metaphorical description
understands one phenomenon in terms of another. The jumping
square is understood as moving because it behaves like something physical that moves.
An alternative to this view is that the descriptions were not metaphorical, but literal.
This would explain why the two editors did not work. According to the latter interpretation,
the subjects of Experiment A actually saw the squares moving. To them, the jumping squares
were just as real as jumping frogs. This view fits well with Michotte’s and Heider and Simmel’s
findings concerning the immediate perception of phenomena on screens. Following this line
of thought, the “metaphors” from Experiment A must now be reinterpreted because what they
describe is not a metaphorical understanding of the domain, but the direct perception of the
domain.
This confusion about metaphor can come from a certain reading of Lakoff & Johnson.
Lakoff insists that everything is metaphorical. This leaves no room for literal meaning. I do not
think this is Lakoff’s intention. His point is that there is an underlying structure even in
meaning that we ordinarily see as literal.
Another possible source of confusion is the use of the term metaphor in user interface
design. The Apple Macintosh “desktop metaphor” has become the prototypical example of
user interface metaphor. I will argue that to most users this is not a metaphor, but a visual
formalism for structuring information. The link back to the office desktop is long lost. The
fact that kids who have never been in an office work fluently with “the desktop metaphor”
should make us question our use of the term. The Apple Macintosh Finder works because it
simulates on the screen objects in a flat physical world. Any “metaphor” would probably do
as long as it makes use of a spatial representation of information. Even a non-metaphorical,
i.e. formal, representation would probably work just as well.
Formal systems vs. metaphor
The insistence on metaphor in user interfaces is rooted in the belief that end-users can not
learn formal systems. As Nardi (1993) points out, this view has been widely held among
computer scientists (me included). She argues convincingly that we find numerous real-world
examples of people without scientific training who use quite complex formal systems. Her
examples include knitting instructions, baseball scoring forms, and drumming in New Guinea.
These examples all indicate that as long as the formalisms are domain-specific, users will
have no problems accepting them. The problems arise when the formalisms require an
understanding of abstract concepts outside of the domain, such as the von Neumann machine.
The empirically-based group stripped away every visual metaphor in their final design,
and ended up with a simplified version of the rule-sheet editor. This can be taken as another
indication that Nardi is correct in assuming that end-users deal well with domain-specific
formal systems. The domain in this case is the interactive squares, while the formalism is the
visual event-action language of the rule-sheet editor.
The lesson to be learned is that GUI metaphors are not always superior to abstract
representations when it comes to ease-of-use and ease-of-learning. Applied to GUI design,
this means that one should not be afraid of considering new formalisms for end-user systems.
As with metaphors, the formalisms and their visual and/or textual representations should be
rooted in an understanding of how the end-users conceptualize their domain of knowledge.
Interaction gestalts
The fact that the interaction-gestalt editor worked well is an indication that this is a viable
approach to the problem. The editor relies on the assumption that it is possible to work
directly with building blocks having inherent behavior without the need for any external
representation of their working. No such additional visual or textual representations were
present in the editor. We can therefore conclude that this approach worked for the
participants.
I consequently conclude that this usability test supports the interaction-gestalt
interpretation of the results from Experiment A.
Chapter 9 Exp. C: Participatory Tool Design
“Design is where the action is”.
Susanne Bødker, 1990, quoting Newell & Card.14
As discussed in Chapter 1, there is a strong tradition in Scandinavia of involving the end-users
in the design of information systems. The main rationale for this approach to software design
was originally emancipatory, i.e. to prevent alienation, job stress, and deterioration of quality
of work. Recognizing that the end-users will always be the ultimate experts of their domain of
knowledge, the participatory-design (PD) tradition has developed design techniques to ease
the communication between end-users and software designers. Lately there has been a
growing awareness that the inclusion of end-users in the design process also makes sense
from a purely economic, i.e. technical, point of view.
Through the process of designing tools to solve a problem, designers are often
forced to make explicit their understanding of the problem. This observation opens up the possibility of
using PD not only as a method for constructing software, but also as a research methodology
for getting insight into how a domain is understood by a user group. By using participatory
design as a methodology in qualitative experimental psychology, we change the knowledge
interest from being emancipatory and technical to being hermeneutic.
Following this line of thought, Experiment C was designed as an artificial
participatory-design process. The participants were asked to design a tool to enable them to
construct artifacts in Square World. The design process was then analyzed to get
empirical data on the participants’ understanding of interactive behavior in Square World.
14 (Bødker, 1990, p.3)
9.1 Method
Experiment design
The PD-tradition has developed a large number of techniques for involving the end-users in
the design process (see Greenbaum and Kyng, 1991 and Ehn, 1988). A common theme in the
rationale for many of these techniques is the shortcomings of abstract representations as
a medium for communicating about design. Both scenario-based techniques and techniques
involving the participants in paper-based design make use of concrete representations of the
design. The concrete examples of possible systems become a common ground for
communication between participants from different backgrounds. The need for working with
concrete representations of possible systems has spurred the development in the PD
community of tools for rapid prototyping (see CACM, 1992).
Recognizing the importance of being able to discuss concrete prototypes, I have
borrowed the idea of iterative design from the usability-engineering tradition (see Nielsen,
1993a, 1993b). Iterative design applied to PD means structuring the design process as a circular
movement between design sessions and prototype development.
After each design session with the participants, I implemented a running prototype of
their design. The prototype was brought into the next design session for discussion. In the
actual experiment this iteration was done 4 times. Figure 75 shows an illustration of this
process.
Figure 75. The steps of the PD approach.
Subjects
The participants for the experiment were taken from the same school class as the participants
of Experiment B. A group of five was picked at random, three female and two male. As
with the participants of Experiment B, they were aged 16-17 and had little or no prior
experience with computers.
Procedure
Each of the 4 sessions lasted 45 minutes. The interval between sessions was approx. two
weeks. The sessions took place at the university in a room shielded off from the rest of the
activities at the department.
As for Experiment B, all participants had, prior to the first design session, been through the
procedure of Experiment A to familiarize them with Square World.
In the first design session, the group was given the collective task of designing a piece of
software that could enable them to construct new interactive squares. I told them that I would
do all the programming. I gave no help or hints as to what such an editor should look like. I
was prepared to help them, but wanted to see the results of just letting them struggle with their
own frustrations for some time. I encouraged them to develop a positive group attitude and
stopped attempts at killing off ideas too early.
Near the end of each session I asked them to decide on a design that I should
prototype and bring back into the next session. In all sessions they had the computer available
to try out Square World and the prototypes. They also had paper, pencils, a flip chart,
and a blackboard available.
Equipment and software tools
The tools and computer used for creating the prototypes were identical to those used in
Experiment B. In the design sessions, a Macintosh PowerBook 100 laptop with the same software was used.
Data collection and analysis
Three kinds of data were collected:
• A video recording of each design session.
• The written material resulting from the sessions.
• The running prototypes.
Interesting parts of the video material were transcribed, together with references to written
material and running prototypes. The underlying assumption of the experiment was that by
letting the participants get involved in the process of designing a tool for building artifacts in
Square World, they would make explicit their understanding of the "phenomenon".
An analogy to this is to see the design process as a process of scientific discovery, i.e. a
scientific discourse. According to this metaphor, Square World is the phenomenon, and the
design ideas are the scientific theories. This analogy works because the computer programs
were designed for the purpose of constructing such artifacts. In use, the programs describe the
artifacts, just as applications of a scientific theory attempt to explain a phenomenon.
This gives me a framework for analyzing the data. Seeing design as theory building,
the focus is now on how different design ideas emerge, change, and compete. The analysis will
thus resemble the historical analysis of a scientific discourse. All "theories" give me valuable
input on possible ways of understanding interactivity in Square World.
The idea of seeing the editor design as theory building is similar to, but different from, Peter
Naur’s view of programming as theory building (Naur, 1985). Naur uses the analogy that a
program is a theory that describes the workings of the software, while “theory” here means a
set of building blocks and operations (tools) that enable an end-user to construct
“explanations” of an interactive phenomenon.
9.2 The First 10 Minutes
I find the design process interesting enough to include extensive parts from the first 10
minutes of the first session. The resulting “thick description” (Geertz, 1973) will give the
reader the opportunity to judge the correctness of the conclusions I draw from the material.
The setting
I started the first design session by placing the participants around a table with a laptop running
Square World. The participants had all prior to the session explored the examples, and this
was the first time they met with the rest of the group. The participants were classmates, and
there was no need for the round of introductions otherwise required in such situations.
Figure 76 shows the room used in the experiment, and how the experimenter (Exp) and
the five participants were placed in the first session. The participants are in the following
referred to as m1, m2 (male) and f1, f2, f3 (female).
Figure 76. The room used in Experiment C.
Framing the design task
After we had sat down, I introduced the participants to the design task. To give an idea of the
process by which the task was understood, I have included a part of the transcript.
Exp.: The job to be done is to make such [referring to Square World], but different ones,
with one, two or three.
m1: Yes, you want to...
f2: You can not have more than three then?
Exp.: Yes, one could imagine using the same to make...
f2: Yes, you can take more than three and do different things
f1: But does it need to be only squares?
Exp.: No not necessarily, but we thought first to do it with squares, and then may be to see
if we can make it with other things later.
f2: different things, some squares, some circles, different things like that. One could have a
big square, square, triangle...
Exp.: Mmmmm
f1: What is it really you want to make then?
m1: How?
f1: How we shall make it?
Exp.: How you would like such a program to look like.
f1: With squares et cetera
Exp.: That makes it possible to tell the computer how such squares should behave.
f1: Oh yes..
Here ends their explicit discussion about the nature of the design task. Immediately after f1
said “Oh yes”, m1 presented the first design idea. I take this as an indication that at least f1
and m1 had understood the design task at that point. In the next couple of minutes all
participants showed in one way or another that they had understood the task.
Even with my clear reference to Square World, they at first made a very wide
interpretation of the design problem. They wanted to allow for free composition from multiple
forms with multiple behaviors. From a computer science perspective, they wanted to build
something like HyperCard. By focusing on the behavior, I managed to make them narrow
down the task.
In a real-world design project, the formulation and reformulation of the design problem
would have been a major part of the process in all its phases. Keeping the design problem
fixed as I did in this experiment would in a real project not have been in line with the
democratic spirit of the PD tradition.
Design idea #1
The next transcript starts immediately after f1’s “Oh yes..” of the previous transcript. The design
task has now been understood, and the first design idea comes up:
m1: But then. Then I would have had different possibilities. All the possible ways such a
square could behave. Kind of placed in a row below, and then you press the mouse on
the square and then you place it up where you want it to be. Then you can have different
arrangements of squares. nothing happens and.. I am not sure how many possibilities
there were. When I press on one, the other one change color and the..
f2: A little bit simple.. Either it does not change color, or it changes color.
Exp.: Do anybody want to walk up to the blackboard and draw it.
m1: [walks up to the blackboard and starts drawing]. I imagine it works like you have the
big square like this [Drawing 1, Figure 77]. Here are a lot of squares that you press on
with the arrow and then hold the button down [see Drawing 2]. One that nothing
happens, and when you press on it nothing happens. And one that is black and then
nothing more happens to that one [see Drawing 3]. And then you can place different
squares like that. This is kind of my idea about how it should look.
Figure 77. Three drawings done on the blackboard.
The idea presented here was simply to identify all possible artifacts in the design space, and
list them as artifacts. As theory it says: “An artifact is one of this, this, this, ..., or this”. The
construction becomes simply a matter of identifying the correct artifact among the alternatives
presented. It involves no naming, decomposition, representations or abstraction.
A critique of idea #1
Next followed a discussion about this idea:
Exp.: [to the group] What do you think about that?
f3: ...that square. What happens when you press on it...
m2: You actually have to try out how they behave...
m1: that is because when they are all white then you do not know...
m2: yes exactly
f1: yes, difficult to find out what is what..
m1: yes..
We see that participant m2 argued against just presenting the artifacts on the screen. He
observed that to be able to pick the right one, you have to try them all. The rest of the group
followed his argument. What they observed was that you can not guess an artifact’s behavior
from its visual appearance.
Questioning their understanding of the design problem
Next followed a short dialogue where the group started to question their understanding of the
design task.
f1: But what is it all about then?
m1: I only thought I would make a program such that you have one to 35 different so when
you look at a new one you make such a program.
m1: I am sure that is the idea.
Exp.: It is how to make such a program...
m1: If we are to make one of those [pointing to the computer on the table] then...
Exp.: Make arbitrary such..
It is interesting to note that this intermezzo came during their first “crisis”. We have all
experienced that in times of crisis we often start doubting our own understanding of the
problem at hand.
Design idea #2
Having verified that their understanding of the design problem was correct, they went on
discussing the design:
m2: Wouldn‘t it have been just as well to have a list of how...
m1: If you take the mouse and press four or five times, then you see how they behave, and
if it is the kind you want up there. And then you just hold down the button and drive...
m2: [nods]
Exp.: [to m2] You had a suggestion.
m2: Yes, if you have the possibility to draw with the mouse and just press the corners for
example. And then it draws the lines itself in the big square. And then you have a list of
how it behaves. Written. You read it, and then you figure out what you want on it.
Exp.: Could you draw that.
Participant m2 starts drawing on the blackboard.
Figure 78. The second idea (m2).
m2: If you have that square here. You can draw the square with the mouse... Then you
have a list down here. Written how the squares behave [see Figure 78]. Then you
choose. If you choose one then maybe it becomes all black or no reaction.
f3: You have to use the mouse for in a way....
m2: To pick up...
f3: yes...
m2: Or the keys the...
The second idea was similar to idea #1 in that it listed all possible artifacts. It differed in that
it involved both a naming of the artifacts, and a simple description of their behavior in an
informal language.
Describing behavior
The discussion then continued about how the behavior should be described:
Exp.: Could you try.. Yes, great... Could you try to give some examples on what should be
written down there.
m2: [smiles]
Exp.: The rest of you too, if....
f2: Becomes black.
m1: yes, be black, comma, then white and then - when you press then it becomes black and
then white right away.
f1: press on the button then it keeps.. Then you press on it again then it gets white.
m1: press once, it becomes black, and once more and it becomes white.
Exp.: Could any of you try to write in how it should look on the screen..
m1: It will just be written like that under there then...
Exp.: If I should draw it. Here it says [points to first “line”]. What is written here?
m1: Stays black it says.
f1: One
m1: One yes.
Exp.: One [Goes up to the blackboard and writes a “1“]
m1: Stays black
Exp.: [Writes “1: Stays black“. See Figure 79]
m1: When you press on it it gets black... and then no matter how much you press, nothing
more happens there.
Exp.: When I press on what?
m1: On the square as we do on the screen there.
Exp.: Yes
m1: Nothing happens then... stays black.
1: Stays black
Figure 79. Idea #2 with written list of square behavior.
As a theory of Square World, design idea #2 can now be formulated: “An artifact is either a 1,
which is described “nothing happens“, or a 2, which is described “..“, etc.”
We see that already after 10 minutes of design work the group had come up with some
interesting ideas and insights.
9.3 Results
The group turned out to be very creative, and they needed my assistance in the design sessions
mainly as a “communications consultant”.
The design ideas of the first session
Around 30 minutes into the first session, a third idea emerged. Idea #3 was totally different
from #1 and #2. It stated that an artifact could be seen as layer upon layer of squares with
different colors that are flipped when you interact with the mouse.
In a way, idea #3 sees the behavior of an artifact as a movie that is advanced one frame
each time a user interacts with it. The group found that this idea lacked the expressive power
to bring the user back to an initial situation. It was for this reason given up.
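A minimal sketch (my own illustration, not the group's notation) shows both the idea and its limitation: a behavior is treated as a fixed sequence of frames advanced one step per click, and a plain sequence has no way of expressing a return to the initial situation.

    # Illustrative sketch only: design idea #3 as a fixed frame sequence.
    frames = ["white", "black"]   # made-up example: starts white, turns black

    def click(position):
        # Each click advances one frame; at the end nothing more can happen.
        return min(position + 1, len(frames) - 1)

    position = click(0)          # the square turns black
    position = click(position)   # stuck on black: no way back to white
    assert frames[position] == "black"

Expressing the return would require either a loop in the sequence or explicit transitions between situations, which is essentially what the group's later notion of "states" provided.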
Having given up idea #3, they returned to their old ideas and fused idea #1 with idea #2.
A sample artifact to be tried out was added to each description. This made each description
both textual and interactive.
Symmetry
It was next recognized that for every artifact starting white, there is a corresponding one
starting black. Stated as a theory of Square World: “For every artifact initially white, there is a
corresponding one that is “inverse“ with respect to color, i.e. black and white are swapped”.
It was then recognized that for every artifact reacting to a mouse-button press, there was a
corresponding one reacting similarly to a mouse-button release. As theory: “For every artifact
there is an “inverse“ artifact with respect to mouse-button press / mouse-button release”.
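The two symmetries can be illustrated with a small sketch (mine, for illustration only) operating on behavior descriptions of the kind used above: one transformation swaps the colors, the other swaps press and release.

    # Illustrative sketch only: the two symmetries as transformations on a
    # behavior description given as (event, resulting color) pairs.
    behavior = [("press", "black"), ("release", "white")]

    def invert_color(rules):
        # Black/white symmetry: swap the two colors everywhere.
        swap = {"black": "white", "white": "black"}
        return [(event, swap[color]) for event, color in rules]

    def invert_event(rules):
        # Press/release symmetry: swap the triggering events.
        swap = {"press": "release", "release": "press"}
        return [(swap[event], color) for event, color in rules]

    assert invert_color(behavior) == [("press", "white"), ("release", "black")]
    assert invert_event(behavior) == [("release", "black"), ("press", "white")]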
The first prototype
The first prototype can be seen in Figure 80. In the original version all texts were in
Norwegian. To ease readability I have translated all text fields word by word into English.
Figure 80. The first version.
The design builds on the described combination of design ideas #1 and #2. The design group
numbered the possible elementary one-square artifacts (behaviors) from 1 to 10, and from
then on referred to them with these numbers. When asked to construct a certain artifact they
simply said: "That is a number 5". The press/release symmetry is made explicit through the
button “At release”, while the color symmetry is implicit in the ordering of the 10 artifacts.
The ten elementary interactive squares in the list to the left can all be tried out. The tiny
buttons to the left of the squares are "repeat" buttons. The group also let each interactive
square be accompanied by a simple description. An artifact is selected simply by clicking on
it.
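As an illustration of the structure of this design, the catalogue of elementary squares could be modeled as below. This is my own sketch; the entries and names are made up for illustration, and only the numbering scheme, the short descriptions, and the “At release” option follow the prototype described above.

    # Illustrative sketch only: numbered building blocks, each with a short
    # description and a behavior that can be tried out; "At release" swaps
    # press and release, as in the first prototype.
    catalogue = {
        1: {"description": "Stays black",
            "rules": [("press", "black")]},
        2: {"description": "Black while pressed",
            "rules": [("press", "black"), ("release", "white")]},
        # ... the remaining entries up to 10 would follow the same pattern
    }

    def select(number, at_release=False):
        # Pick an elementary square by its number; "At release" inverts events.
        rules = catalogue[number]["rules"]
        if at_release:
            swap = {"press": "release", "release": "press"}
            rules = [(swap[event], color) for event, color in rules]
        return rules

    # "That is a number 2", with the press/release symmetry applied.
    assert select(2, at_release=True) == [("release", "black"), ("press", "white")]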
interesting solutions going beyond the structure of the life-world should be preferred just because they are interesting.

Chapter 14 Conclusions and Future Work
“something more is missing in the gap between people trained in graphic arts and people trained in programming. Neither group is really trained in understanding interaction as a core phenomenon”.
Terry Winograd, 1997.20

This study has touched upon many topics, and has taken many detours from its main course. This last chapter gives a short summary of the main conclusions and implications. The chapter ends with a presentation of some directions for future work.
The study started out from a dissatisfaction with the current state-of-the-art concerning software tools for interaction design. A possible cause for this situation was found in the current metaphorical understanding of the computer, and how this limited understanding is “hardwired” into the implicit models of the computer found in the tools. From this it followed that a new and less metaphorical understanding of the computer was necessary to be able to build a new class of software tools. The search for this new understanding focused on the most important property of the modern computer: Interactivity. The search for the nature of interactivity led to a search for a new understanding of the interactive experience.
14.1 Conclusions
Blind spots in the HCI theories
From my reading of seven important contributions to the HCI field, I draw the conclusion that the field at present can not provide a deep understanding of the nature of the interactive experience. One reason for this is the ignorance of the role of the body in human-computer interaction. It was found that all seven works used different theories to deal with nonsymbolic (“body”) interaction and symbolic (“mind”) interaction.
20 (Winograd, 1997) p. 162.
The relevance of the mind/body discussion in philosophy
This dichotomy between body and mind is reflected in the two most commonly used metaphors for understanding the computer: as tool (body) and as medium (mind). To go beyond this split understanding of the computer-in-use requires a less split understanding of the human user. This points to an ongoing discourse in philosophy on the relation between body and mind. One interesting contributor to this discussion is the French philosopher Merleau-Ponty.
“Interaction Gestalts”
The need for empirical grounding led to the design of three psychological experiments. From the experiments, I draw the conclusion that in every interaction with an interactive artifact there is at work a perception of interactive behavior. This perception is direct and immediate, and shows gestalt properties. What is perceived is an understanding of the working of the artifact as an interaction gestalt. Interaction gestalts are similar to visual gestalts in that they are wholes, and not compositions of analytical elements.
“Kinesthetic Thinking”
The experiments further shed light on the cognition involved in interaction design. In describing, reasoning about, and designing interactive behavior, we make use of our ability to think with interaction gestalts. This kinesthetic thinking has strong similarities with visual thinking and musical thinking. Kinesthetic thinking involves the recall, comparison, combination, imagination, etc. of interaction gestalts as experiential wholes.
This way of working with interactive behavior is very different from the analytical method of computer science, where interactive behavior is broken down into discrete states and the transition rules between these. The kinesthetic sense modality has been long neglected in Western culture. The body as medium and background In every interaction between a human being and an interactive artifact, there is an implicit lived body. Having a body is necessary to understand interactive behavior. Without a body, the interactive experience would be just a formal construct without meaning. As human beings we have prior experience of having a me and a not me, a The “Save” button and the “<” button to its right was included to allow for saving artifacts that had been built. The “Undo” button simply cleared the working area. We see in the description of the artifacts an implicit rule-based event-action formalism very similar to the language of the event-action editor of state-of-the-art study in Experiment B. These descriptions were never decomposed and turned into a language for describing interactive behavior. The second session The second session started by trying out the prototype I had developed from their initial design. They found it to reflect what they had tried to express, and continued discussing improvements. In the discussions they now referred to the artifacts by number. It is interesting how these “novices” already after a couple of hours developed their own local “expert language”. For an outsider their discussions would now have sounded totally absurd. The following is an example of one such dialogue from the second session: m1: Yes, that‘s a one. m2: One m1: That is a four f2: A three In this session they went on to tackle the problem of designing two-square artifacts. They specified that this should be done by dragging building blocks from the list onto two empty squares. They soon realized that the behaviors of the two squares often were interrelated. To deal with this problem they identified a set of modifiers and relations between the squares. Their final design from this session is shown in Figure 81. It allows for the construction of two-square artifacts. In this editor, complex artifacts are built from the elementary interactive squares by drag-and-drop. The check buttons define relations between the interactive squares that an artifact is built up from. 194 Understanding Interactivity Figure 81. The second version. The “R” check boxes over the squares in the working area are used for inverting press/release behavior as in the first prototype. Press in left Press in right hite toggle Black toggle Shared input, Only on black Figure 82. Reconstruction of jumping square In Figure 81 and Figure 82 can be seen how the group reconstructed example #10 from Square World (the black square that jumps back and forth when you click on it). First they dragged a no. 8 and a no. 7 onto the squares. These elementary interactive squares have toggle behavior. No. 8 is initially black and toggles when you click on it. No. 7 is initially Part II, Experiments 195 white and toggles when you click on it. The second relation was then activated by pressing the “Both react” button. From a computer science perspective this means that user events are sent to both squares. The fifth relation was also enabled. It says "functionality only on black". In technical terms this means that all non-black squares are blocked for input. This gives the correct artifact. 
The right square is blocked for input because it is not black. A click in the left square makes both squares toggle. The left square now gets blocked because it is non-black. A click in the right square again leads to a toggling of both squares, and we are back in the initial situation. The last two sessions The prototype did not change much from version 3 to the fourth and final version. I therefore describe the two last sessions together. Figure 83 shows the final version of their design. Figure 83. The fourth and final version. In the implementation I took the liberty of cleaning up a bit and change some buttons into check boxes. Having added the modifier and relations in version 2, they found that some behaviors still could not be expressed. To solve this problem they allowed for more complex artifact to be constructed by defining new "states". The “state” of an artifact was in this editor not the static property of its squares (patterns of black and white), but their dynamic properties (inherent behavior) and relations. 196 Understanding Interactivity A typical discussion among the subjects could go: “I think it is a five and a seven. When you click on the left one it becomes a two and a four. When you click on the right one, it goes around and becomes a five and a seven again”. To define a new “state”, you clicked on the “Continue” button and where given four choices: “Happens on click in left”, “on release in left”, “on click in right”, or “on release in right”. A choice would then empty the workspace, and create a new empty line in the upper right window. A coded description of each state was automatically placed there, and a star indicated what “state” was being worked on. When the result was tested out, the star indicated what “state” you were in. In their final session they also expressed a need to visualize branching behavior between states. When the experiment ended, they were very close to having developed the idea of state-transition diagrams to visualize the transitions between their “states”. Their first attempt at this is seen in the lower right window of the editor. 9.4 Discussion Theories, design ideas, and metaphors Keeping focus on the participants’ understanding of Square World, I have identified three levels of formalization in the material: • At the highest level of formalization are the four editors as running prototypes. Seen as theories of the domain, they resemble mathematical theories. They are unambiguous, complete, consistent, and are built up from a set of formal building blocks and operations on these. • Less formal are the design ideas as they emerged in the design sessions. Taking idea #3 as an example, it contained the idea of describing the examples as sequences of states, but did not specify details such as what transitions between states were possible. The design ideas are similar to the running prototypes in that they are intended as general theories capable of describing all possible artifacts of the domain. • At the lowest level of formalization are the implicit metaphors / mental models that result from an analysis of the linguistic material. Their scope is in most cases restricted to the concrete example being described. In the complex cases, it might even be restricted to just some aspects of the example. Their structure is further never made explicit and has to be induced from the data. The textual descriptions of the ten building blocks in the prototypes fall somewhere in between the last two levels. 
They are informal, but their scope is general. Labels like “1. P: Black/White” and “1. P: White 2. P: Black” look like formal representations of behavior written in a formal language. For a computer-science person, it is easy to construct such a Part II, Experiments 197 language from the ten descriptions. The participants of the experiment though never reflected further on the structure of these descriptions. Any formal language we construct to “explain” the data will therefore never be more than a theory of what the participants could possibly have meant. It is as such less formal than a design idea. The internal validity of the theories is higher the more formal the original data. This means that we should put more trust in our analysis of the prototypes than in the analysis of the design ideas and the linguistic material. Summary of the theories of Square World The development of the participants’ understanding of Square World can be summarized: 1. The design space consists of a finite (small) set of possible artifacts. You chose between artifact by trying them out. The artifacts are numbered. 2. As some artifacts differ only in behavior, a textual description of their behavior is required to enable the user to identify them. 3. An artifact can be described as a linear sequence of static states. This design idea was rejected before it got implemented. 4. There is no conflict between ideas 1 and 2. They can therefore be fused. Each artifact now has a number, a textual description of its behavior, and can be tested out. (This idea was implemented) 5. There are two axes of symmetry in the design space: Press/Release and Black/White. (Press/Release was implemented as a function, Black/White was implicit in the ordering of the artifacts in the editor) 6. The behavior of an artifact can be described as a sequence of event/action rules. It is implicit in the formalism that when you reach the end of the sequence you return to the top. Press/Release pairs are described “Color-1/Color-2”. (This theory was implicit in the descriptions of the building blocks) 7. Two-square artifacts are constructed as pairs of one-square artifacts. (Implemented) 8. To describe relationships between the behaviors of the two squares, a set of modifiers and relations can be applied. These are: Happens in the other, Both react, Works only in left, Works only in right, Works only on black, Works only on white. (Implemented) 9. The behavior of more complex artifacts can be expressed by connecting more than one pair of building blocks. Each pair now becomes a “state” in a FSA where the states are not static patterns of color, but artifacts with behavior. The legal transitions between “states” are press/release in left or right square. (Implemented) 198 Understanding Interactivity Comparison with Experiment A Most of the important metaphors and linguistic constructs from Experiment A can also found in the material from Experiment C: • The Interaction Gestalt idea is found in the way the participants composed complex artifacts from building blocks with inherent behavior without breaking their behavior up further. • The Linear-Time metaphor is the basis for design idea #3. • The State-Space metaphor is to a certain degree implicit in their design idea #9. • The event-action way of expressing behavior was found in design idea #6. Participatory design as experimental method Surprisingly little knowledge of computers was needed before the participants could start doing quite advanced design work. This came as a surprise to me. 
Initially I was prepared to step in and give them hints, but this was never needed. The only really novel design idea that resulted from months of hard struggle with the empirical data from Experiment A was the Interaction Gestalt idea. Exactly the same idea popped up spontaneously in the PD group after 10 minutes of brainstorming. I find this especially interesting because I have found no reference in the literature to similar design ideas. Comparing Experiments A and C, the PD approach was an order of magnitude more cost effective in getting to the core of the domain. Comparing with Experiment B, the state-of-the-art group was not able to conceive the Interaction Gestalt idea in any of their design sessions. Neither this group, nor the empirically-based group conceived any new design ideas at all. They made interesting and well functioning improvements of existing prototypes, but they never stepped out of the frame of reference given by the prototypes they had tried. This makes me conclude that running prototypes have a very strong effect when it comes to closing a domain. An important consequence for participatory design is that care should be taken not to expose the participants to existing solutions too early in the design process. Concluding from Experiment C, the participatory-design methodology as modified here works well as a way of getting insight into how a domain of knowledge is understood. The use of design as a way of getting insight into the “naive theories” of a domain involves some pitfalls. It is important to note that after a couple of design sessions, the participants are no longer “naive” about the domain. Through the design process they develop their own local language and understanding of the domain. This new understanding is not always similar to an existing “expert” understanding of the domain, but it can on the other hand no longer be said to be “naive”. The design process creates in the participants a “blindness”, which resembles the blindness we as researchers want to overcome in the first place by doing experiments. The experiment then goes from being an explication of the naive ontologies of the domain, to becoming a creation of micro-culture. These effects should warn us from using the results from artificial design studies without some caution. Part II, Experiments 199 Applied to Experiment C, this means that we should put more trust in the early design ideas than in the late ideas as “naive theories” of interactive behavior. We see this suspicion confirmed if we try to find support in Experiment A for the late design ideas. Nowhere in Experiment A do for example the subjects talk of “both react”. The modifiers described in design idea 8 are constructions made to solve a problem, and do not reflect the structure of the participants’ spontaneous understanding of Square World. That does not make them less interesting as an example of a non-standard way of conceptualizing interactive behavior. Part III Reflections Part III, Reflections 203 Chapter 10 Making Sense of Square World “Objects are not seen by the visual extraction of features, but rather by the visual guidance of action”. Francesco Varela et al., 1991.15 Experiment A took the subjects on a journey through an unknown world. The findings from the experiment tell us how they made sense of Square World. The results from Experiments B and C give further credibility to the findings. Taken together, the material allows us to build a coherent local (grounded) theory of how Square World was perceived. 
The resulting interpretation of the results ignores individual differences. The phenomenological perspective of Heidegger and Merleau-Ponty opens up for trying to reconstruct from the linguistic material the “world” of the subjects. There is unfortunately no way of avoiding that the resulting analysis will be colored by the general worldview of phenomenology. As such, the result must be seen as one of many possible interpretations of the data. The analysis will deal with the interactive experience (process); with the perceived structure of Square World (content); and with what can be learned from the two concerning the nature of the perceiving subject. 10.1 The Interactive Experience The changing locus of agency found in Experiment A is a good starting point for an analysis of the interactive experience. “ME” AND “NOT ME” At first, the subjects clicked on the squares and observed simply that change took place. At this point there had already been established an “extension” of the subject into Square World through the mouse-cursor mechanism. All subjects were prior to the test familiar with the use of mouse. They controlled the cursor, and their “extended bodies” extended to the edge of Square World. As the experiments went on, the subjects moved deeper into the world. At the 15 (Varela et al., 1991, p. 175) 204 Understanding Interactivity same time, the experienced locus of agency moved from being by the computer as a whole to being by the squares. We see from this that whenever a subject’s view of Square World changed, so did the implicit lived body. The figure-ground relation in visual perception can be used as a metaphor to illustrate this relation between body and world. The experienced body of the subject can be seen as the white space emerging when one makes the experienced world ground. The background of the vase becomes two faces. The changes in corporeality found in the experiments were so closely linked to the changes of worldview that it is tempting to draw the conclusion that we are describing a single phenomenon. Following Heidegger, a single word for this sum of “me” and “not me” could be “Existence”. As the living body is our medium for having a world, there is in every perception an implicit perceiving body. “Body” should here not be understood physiologically, but as the subject’s implicit image of his/her body. It is only through interaction with the world that both the subject’s understanding of the world and the implicit body is created. The early phases of an interaction with something new has similarities with a baby’s exploration of the world. As newborn, we have no strong divide between ourselves and the world. By watching our hand move, we slowly learn what is “me” and what is “not me”. As the psychological experiments with kittens and tactile viewing discussed in chapter 3.2 showed, the transition from an unstructured “Existence” to the divide between “Body” and “World” requires action. Only through the interplay between intentions, actions, and reactions does this creation of world and body happen. To talk about a living system’s interaction with its environment in terms of actions and reactions, as if it was a board game, is a bit misleading. There are in most cases no delay between action and reaction, and the interaction is consequently in most cases better described as jazz improvisation than as chess. As human beings, we are in the world, and in constant interaction with it. 
Merleau-Ponty uses communion as a metaphor to describe this total involvement with the world. Most attempts to break interaction into pairs of “my move” / ”your move” will be after-the-fact intellectualizations that do not make justice to the true nature of being-in-the-world. INTENTIONALITY AND THE PHENOMENAL FIELD Behind every interaction there is a will, - an intention towards something. For most of the interactions of the experiments, there were no conscious intentions at the detailed level. As the detailed analysis of the three cases in Experiment A showed, much of the interaction was “tacit”. The speed made the interactions resemble perceptual processes. The objects that the subjects saw in the Square World did thus not result from a logical analysis, but emerged through interaction in the same way as the world emerges to us through perception. Merleau-Ponty uses the term “pre-objective intentionality” to describe our instinctive will to explore the world. As living beings we can not avoid having this drive to perceive and interact. Part III, Reflections 205 Prior to any perception there has to be an openness for perceiving something. What a given person perceives in a given situation is not arbitrary, but depends on the person’s expectations of what is possible. These expectations do not form a set of detailed possible outcomes, but constitute what Merleau-Ponty calls the phenomenal field. Heidegger uses background to describe the same. It is what each observer can not avoid bringing into a situation of previous experiences, perceptual skills, and expectations. The phenomenal field both influences what is being perceived, and becomes a context for interpreting what is perceived. In everyday English one often uses “openness” to describe an important aspect of the phenomenal field. This is seen in expressions like “open minded” and “narrow minded”. From the experiments it follows that large portions of the interactions can better be described as perceptions. From the interaction with the Square World there emerged a world. As the subjects were not told what to expect, the emerging worlds can be seen as reflections of the structure and content of the phenomenal field of the subjects. This is much in the same way as Rorschach inkblot tests are intended to show the structure of a patient’s subconscious. Taken together, the subjects’ descriptions of the Square World show what they brought into the interactions. I CAN – CONTROL In a lot of cases the subjects described their experience in terms of “I can” and “I can not”. By saying “I can make it black by clicking on it”, the subject describes a skill and at the same time shows a belief in causality and in the stability of the world. Implicit in “I can not” is an understanding of control, i.e. as background. Without a prior experience of control, there would be nothing to negate. The subjects also described the behavior of the squares in terms of rules like “It goes black when I click on it”. Again, we see a belief in cause and effect. The cause being the subject’s actions, and the effect being predictable behavior in the squares. From these observations we can not conclude that predictable behavior is what the subjects expected from the computer, because none of the experiments involved random behavior. But as all subjects interpreted the behavior in this way, one can at least conclude that they were open for the possibility of a stable, predictable world with immediate feedback on user actions. 
Other experiments have to be constructed to investigate what other worlds users are open for. If one for example makes a world with much random behavior, and the subjects to a large extent describe it in terms of how it is not stable and not causal, this would make it plausible that stable and causal worlds are what users expect. INTERACTION GESTALTS As the subjects got into Square World, they started identifying objects and describing their behavior and the spaces in which these objects resided. As previously concluded, the rapid interactions that went on in the exploration of the examples can best be seen as “pre206 Understanding Interactivity objective” perceptual processes. With the danger of building a rigid psychological model, the interactions can be described as going on at two levels simultaneously: • At the perceptual level closest to the computer are the rapid mouse movements and button clicks that the subjects did when they explored new examples. • At the cognitive level above emerge the Interaction Gestalts that result from the interactions. When the subjects said “It is a switch”, they did not come to this conclusion from a formal analysis of the State Transition Diagram of the example. Nor did they conclude it from the visual appearance of the square, as the squares all looked the same. The switch behavior slowly emerged from the interaction as the square repeated its response to the subject’s actions. Descriptions like “It goes black when I click on it, and white when I release” must be seen as analysis of the structure of already perceived Interaction Gestalts. What was being analyzed and described was not the actions and reactions of the interaction at the perceptual level, but the resulting image of the behavior at the cognitive level. The experiments seem to indicate that the subjects’ capacity to deal with complex Interaction Gestalts were limited. As the examples got complex, some of the subjects lost track of what they had been doing. As with all perceptual modalities, our capacity to deal with complexity and our capacity to remember is limited. From this one can conclude that the experienced atomic elements of Square World (at the cognitive level) have inherent behavior. This conclusion is supported both by the detailed analysis of the rapid interactions of Experiment A; by how the subjects in Experiment B easily made sense of the Interaction-Gestalt editor; and by how the subjects in Experiment C spontaneously chose as basic building blocks squares with inherent behavior. At the perceptual level, the atomic elements are the single action/reaction pairs. During normal interaction we are too involved to be aware of this level. To be able to access it, we have to slow down the interaction by an order of magnitude and make the single actions and reactions the focus of our attention. This is a mode of perception offered by human-computer interaction and by tactile perception that is not possible with visual and auditory perception. Even though we can take control of the movement of our eyeballs, we can not slow down the rapid eye movements that make up visual perception. Part III, Reflections 207 10.2 The Perceived Structure of Square World CARTESIAN SPACE As the examples got more complex, the subjects started describing the objects of the Square World not only in term of their behavior, but also in terms of the “spaces” in which they resided. 
As with the Body/World relationship, there is a complementary relationship between the perceived objects and the spaces in which these objects where seen to reside. The analysis now consequently moves from describing the interactive process, to describing the perceived structure of Square World, i.e. its ontology. One of the simplest examples of a primitive interactive object is the toggle switch. Most subjects perceived single squares with toggle behavior as switches. Even if real switches always have a position in the world, the single-square switch was never described in spatial terms. The lack of spatial descriptions indicate that no spatializations took place. The same was true for all single-square examples. Spatial terms could have been “The switch is on a black background”, “The switch does not move”. No such descriptions were found. The lack of spatialization does not mean that the subject would not have been able to give meaningful answers to questions like “Where is the switch?” What it says is that no Cartesian space with relations like left-right naturally emerged to the subjects from the interaction. There is a methodological problem related to this interpretation of the data. To interpret the lack of a certain kind of verbal data in a think-aloud experiment as a lack of a certain kind of cognitive activity poses a problem. Strictly speaking, we can not exclude the possibility that one or more of the subjects actually saw the switches as residing on a background, but just failed to report it. On the other hand, we have in general no guarantee that any of the subjects’ descriptions actually reflected what they perceived. This is a possible source of error that we simply have to accept as a drawback of protocol analysis. The first true spatial description came with the two-square example “The jumping square” (#11 in Square World). The same effect can be achieved by letting two light bulbs oscillate 180 degrees off phase as shown in Figure 84. 208 Understanding Interactivity Figure 84. A jumping square and a jumping light. In both cases a figure and a ground is created. What becomes figure and what becomes ground depends on the surroundings. With the light bulbs, the physical world itself becomes ground, while the illusion of a moving light becomes figure. With the Square World there is no physical world to work as a natural background, and both figure and ground must be formed from the colors on the screen. Figure/Ground By changing the context of the two squares, we can change what becomes ground. None of the examples in the experiments explored the effect of changing the context. The following conclusions are consequently based on my own observations trying out some examples made for this purpose. Figure 85. Three variations of “the jumping square”. Figure 85 shows three variations of “The jumping square”. In the example to the left, black naturally becomes the figure color, while in the example to the far right, white becomes Part III, Reflections 209 figure. The example in the middle has no privileged color because the context is as much white as it is black. If the behavior of the two squares is such that both squares accept clicks to swap the colors, we have symmetric behavior, and one can choose what to make figure. The effect is similar to the face/vase drawing, where the viewer can choose what to take as figure. 
When the behavior is made asymmetric such that only squares with a certain color accept clicks, my informal test show that the “active” color is perceived as foreground. If for example black is chosen as the active color, our perception chooses to interpret the behavior as a black square that jumps away when you click on it, and not as a white square that comes to you when you click. This illusion is to a certain extent maintained even when the background is made the same color as the “active” square. From these simple examples one can only conclude that both context and behavior contribute to the choice of ground color. As the effects seem to add up, a rule of thumb for interaction design would be to make what one wants to move active, and at the same time as visually different as possible from the background. To measure the effect of each factor on the choice of figure and ground, detailed experiments would have to be designed that varied systematically both behavior and the color of the context. The logic of 2D+ space The perceived Cartesian space in Square World is organized along the two orthogonal axis left/right and figure/ground. A figure covers part of the ground, and with more complex examples the illusion of layers can be achieved. With more complex examples an up/down dimension orthogonal to left/right would probably have emerged. With more than two layers, the figure/ground dimension becomes in-front-of/behind. This is still not a full 3D space, but more powerful than a pure 2D “Flatland”. This 2D+ space has strong similarities to the space constituted by board games like chess and Reversi (Othello). The logic of moving in 2D+ space can be expressed mathematically as an algebra. Let the 6 movements {Left, Right, Up, Down, Forward, Back} be the elements. By letting the operation “A + B” denote the effect of first making movement A and then making movement B, certain properties of the algebra can be expressed mathematically: • The inverse property. − For any element A there is an inverse A-1 such that A + A-1 = Ø (no action): Left = Right-1 , Right = Left-1 Up = Down-1 , Down = Up-1 Forward = Back-1 , Back = Forward-1 • Commutative. − For any elements, A and B: A + B = B + A, e.g. Left + Up = Up + Left • Associative. − For any elements A, B, and C: A + (B + C) = (A + B) + C, e.g. Left + (Up + Right) = (Left + Up) + Right 210 Understanding Interactivity From these rules one can for example show that moving in a circle Left+Up+Right+Down will take you back where you started: 1. Left+Up+Right+Down = Up+Left+Right+Down (Commutative) 2. Up+Left+Right+Down = Up + Down, because Left+Right = Left + Left-1 = Ø, (Inverse) and A + Ø = A 3. Up + Down = Ø, because Up + Up-1 = Ø. (Inverse) 4. Left+Up+Right+Down = Ø Q.E.D. Users of GUIs are rarely able to express their knowledge in this way, and there is no need because Cartesian space is part of our everyday lives, and is consequently internalized. In Experiment A, the subjects’ position in 2D+ Cartesian space was always identified with the position of the cursor. The cursor can only move left-right and up-down, not between layers. After 20 years with Graphical User Interfaces, the technology has become second nature for us to such an extent that we often forget that mechanisms like the mouse cursor are man-made and could have been different. All GUIs place the cursor on top of all other layers. An alternative could have been a cursor that moved between layers. 
To avoid losing track of it, one could for example make the layers semi-transparent. Features like this are relatively easy to implement, but convention has placed the cursor above all other layers. STATE SPACE Only certain combinations of appearance and behavior create the illusion of Cartesian space. In most of the examples there wasno obvious way of seeing the behavior in terms of Cartesian space. Here other kinds of spatialization took place. Statements like “I am back where I started” referred in most cases not to positions on the screen, but to states of the FSA in question. The space constituted by the states of a FSA could be called its State Space. Each state becomes a place in this space. Either the subject is moving as in “I go back”, or the FSA moves as in statements like “It went back to being as it was”. State Space has no orthogonal dimensions, but is structured by the transitions between states. State Space has no left or right, up or down, or figure-ground. The physical analogy to moving in State Space is moving between places. A certain action can take you from one place to another. In the cases where actions were directly or indirectly reversible, the reverse actions where described as “going back”. In the cases where no “ways” where leading out of a place, the situation was described as “being stuck” or “no way out”. All systems that let the user navigate in an information space make use of this metaphor. Madsen and Andersen (1988) found this metaphor to be widely used by librarians in Part III, Reflections 211 describing their operation of database systems. They "entered into" and "left" the databases in the system. The metaphor can also be seen in HyperCard's "Home" button. The initial card is called "the home card", and the user is instructed that the button enables a "return" to this card. We often talk about "going in" and "going out" of running applications in a multiprocessing environment. We “visit” pages on the Web, and we all have our own “home page”, and we all have a "start page" that we can “return” to. The implicit metaphor of moving between places is also found in our everyday descriptions of situations. We “end up” in situations, try to “get out of” situations, and “position” ourselves “into” situations. Some times we find ourselves “back where we started”. As these metaphors are built into the language, it is hard to tell whether what appears to be a State Space is actually a Situation Space. From the perspective of the subject, the two are quite similar. Being in a known situation means recognizing important properties of the environment as familiar. This is similar to how we experience being in a known place. As with all aspects of the interactive squares, the structure of the state space only emerges through interaction. “Known space” expands as new actions are tried out. In State Space there is no way of knowing for sure how to get to a place without trying it. This is different from Cartesian Space where the fact that the axis are orthogonal makes it possible to reason about the effects of actions. With State Space it is only possible to reason about the effects of actions that one has already tried out. LINEAR TIME The last dimension found in the material is Linear Time. As with the two previous dimensions, both metaphors describing the subject moving in time, and metaphors describing the FSA moving in time where found. Examples are “I do it the opposite way” and “It goes on being white” respectively. 
Linear time is structured along the dimension Before-After. In addition, there is a Present, a Past, and many Possible Futures. “Doing it the opposite way” means swapping Before and After in a sequence of operations. It might seem unnecessary to describe the obvious, but to be able to make an analysis of experienced time in Square World, it will be useful to spell out the details of the temporality of everyday life. At first glance, Linear Time looks identical to one-dimensional Cartesian Space. Before and After becomes Left and Right, and Present becomes Here. The only difference is that in lived time, what is currently the Present is already history the next moment. In Cartesian Space we are in control of the movement of the Here, while in lived time we can not control the movements of the Present. In Square World, the temporality is somewhat different from everyday life in that everything that happens on the screen is a direct response to a user action. This means that time moves forward only when the user does an action. All applications that provide change logs enable the user to "go back" in linear time. This metaphor is also implicit in simulation programs that enable the user to "spool back" to an earlier situation by way of a cassette player metaphor or a slider. 212 Understanding Interactivity There is also a close resemblance between Linear Time and State Space. One way of understanding the behavior of the examples is to model it as an animation sequence. The third design idea from Experiment C builds on this understanding of interactive behavior. This model works for simple examples, but brakes down when there is a need to express branches and circularities. Initial state Click . 2. Figure 86. A state-transition diagram with a corresponding time sequence. In Figure 86, the State-Transition Diagram describes an example without branches or circularities. This gives rise to only one possible time sequence as illustrated to its right. Initial state Click Click 1. . 3 Figure 87. A FSA with the beginning of an infinite time sequence. The FSA described in Figure 87 has a circularity, and its behavior can consequently not be described as a finite time sequence. Linear time works well for representing interaction history. The past has a simple Before-After structure that fits well for this purpose, but the future has no such simple structure. COMPARISON OF THE THREE SPACES An example illustrates how interactive behavior can be understood from all three perspectives. Let us use as an example a three-square FSA with three states. Initially it is ( ). Whenever you click on a white square, that square goes black and the other two go white. Its State-Transition Diagram is seen in Figure 88. This is the state-space view. Part III, Reflections 213 Press right Press left Press right Press left Press middle Press middle Figure 88. An example with three states. Seen as Cartesian Space, it is a black square that moves between three positions. Figure 89 illustrates this. Figure 89. The example seen as Cartesian Space Together with the Cartesian spatialization there must be rules to account for the interactive behavior. In this example, the behavior could be described by the rule: “The black square goes to the position you click in”. Linear Time has a Past, a Present, and Possible Futures. Let us assume that the user has done two clicks, first in the middle square, then in the rightmost square. The two first actions form the Past, while the current state is the Present. 
Seen as Cartesian Space, it is a black square that moves between three positions. Figure 89 illustrates this.

Figure 89. The example seen as Cartesian Space.

Together with the Cartesian spatialization there must be rules to account for the interactive behavior. In this example, the behavior could be described by the rule: "The black square goes to the position you click in".

Linear Time has a Past, a Present, and Possible Futures. Let us assume that the user has done two clicks, first in the middle square, then in the rightmost square. The two first actions form the Past, while the current state is the Present. From the present there are two possible immediate futures. These potential next states again branch out into an infinite number of distant futures. Figure 90 shows this.

Figure 90. The example from a Linear Time perspective.

All interactive examples in Square World have a corresponding State-Transition Diagram, and all interaction histories can be expressed as time sequences. But only certain combinations of color and behavior allow for Cartesian spatialization.

10.3 The Implicit Structure of the Perceiving Subject

The above analysis describes the perceived ontology of Square World. If we combine the source domains of the metaphors used, we get a human being with a physical body in an environment. The environment is populated with objects. The relations between the objects all refer to the subject as observer: left/right, up/down, and an in-front-of/behind. The subject is always in a place, and can move to other places. There is always a past, a present, and a number of possible futures.

We see that this ontology mirrors some fundamental aspects of the spatiality and temporality of everyday life. It indicates that in our struggle to make sense of abstract worlds like Square World, we bring with us our experience of being human beings with a body in a physical environment.

Not only did the body emerge in the implicit metaphors projected onto Square World, but the interpretation of these metaphors also requires an understanding of our physicality. To be able to interpret descriptions like "I go back to where I started", the interpreter needs an understanding of the physical realities referred to. This requires knowledge of simple facts like that the user can only be in one place at a time. Without this vast background of common sense, the metaphors would not make sense.

As an interpreter, I take for granted that the user has a body similar to mine and lives in a similar physical environment. Even small differences in physicality could make the interpretation difficult. If, for example, a similar test had been done in weightless space with subjects who had acquired a different understanding of up and down, the interpreter would have needed an understanding of the peculiarities of weightlessness.

The importance of a common background of lived experience becomes evident when we try to make a computer understand natural language. The lack of attention to this aspect of language and cognition in the early work in AI was one of the main reasons why Winograd and Flores looked to continental philosophy for an alternative perspective on AI.

It might come as a surprise that our reasoning about interactive behavior should be so closely linked to our experience of being physical. It contradicts the common idea of computers as the ultimate expression of abstract, i.e. disembodied, thinking. One could argue that the subjects in the Square World experiments had no training in computer science, and that professional programmers and interaction designers do not need to fall back upon metaphor. The popularity of object-oriented programming (OOP) among programmers is one argument against this view. OOP makes it simple to visualize system behavior because it models software as something physical. It takes for granted an understanding of the physical world of objects.
This return to the body indicates that a deeper understanding of interactivity requires an understanding of embodied cognition. In philosophy this corresponds to the "corporeal turn" that can be found for example in the works of Merleau-Ponty (1962), Johnson (1987), Varela et al. (1991), and Sheets-Johnstone (1990).

10.4 The Interactive World

Arnheim (1977) describes the structure of architectural space by telling an evolving story of an observer and her environment. He starts out with an observer and a point in space. This creates a line between the two, and the dimension here/there. With two points in space in addition to the observer, a plane is created. We now get a new dimension, left/right. When a third point is added, space is created. We now get an inside/outside, and an up/down. He continues by adding gravity, light, etc. What would be a similar story in "interaction space"?

Imagine that you are "thrown" into an unknown world. Imagine further that the kinesthetic sense modality is your primary source of information. This would be the case if it was all dark, and no sounds could be heard. Imagine also that you do not initially know what kind of a body you have been given in this world. To learn anything, you have to act.

The simplest interactive experience comes from the observation that an action can lead to a reaction. You cross a line, and something happens. To make it concrete, let the reaction be a simple tone being played. The first time you experience a reaction there is no way of knowing if it was caused by you, or if it happened by pure chance. It is only when you experience the same reaction to the same action for a second or third time that you will perceive it as interactive behavior. This is causality and control. It creates a "me" and a "not me", and a relation between you and the world.

The next level of complexity is when there is a pair of actions that can "do" and "undo" a change. To make it concrete, let the change be a tone being played continuously. You cross an invisible line to turn the tone on, and cross it back to turn it off. This is the experience of reversibility. With a reversible device, it becomes an extension of your lived body. Something in the world is now under your control, and becomes part of you. The experience of reversibility also creates the potential to experience irreversibility, i.e. that an action can not be "undone".

An interesting variation of the "do/undo" behavior is when you find a line you can cross once to turn a tone on, and a second time to turn it off. In switch terms, this is toggle behavior. This enables you to have control over "things" that are not directly part of you.

If we return to the one-square examples of Square World, we see that all the described behaviors can be found among these examples. In a similar way, one could add more lines (squares to press/release) and tones (squares lighting up), and get the same kind of experienced phenomena as in Square World.
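The three elementary behaviors in this story can be spelled out in a small sketch. It is my own illustration under the assumptions above (crossing a line is a method call, the "tone" is a boolean); the class names are invented for the purpose.

```python
# A sketch (mine, not from the thesis) of the three elementary behaviors in the
# "interaction space" story: trigger, do/undo, and toggle.

class Trigger:
    """Action leads to a reaction: a tone plays once per crossing."""
    def cross(self):
        return "tone plays briefly"

class DoUndo:
    """Reversible: cross to turn the tone on, cross back to turn it off."""
    def __init__(self):
        self.tone_on = False
    def cross_forward(self):
        self.tone_on = True
    def cross_back(self):
        self.tone_on = False

class Toggle:
    """The same crossing alternately turns the tone on and off."""
    def __init__(self):
        self.tone_on = False
    def cross(self):
        self.tone_on = not self.tone_on
        return self.tone_on

t = Toggle()
t.cross()   # True  -> tone on
t.cross()   # False -> tone off, the change is "undone" by repeating the action
```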
Chapter 11
Understanding Interactivity

"The body is our medium for having a world."
Maurice Merleau-Ponty, 1945 (Merleau-Ponty, 1962, p. 146)

How does the previous analysis of the Square-World experiments apply to the understanding of interactivity in general? The experiments dealt with the interaction between a single user and a single computer with screen and mouse. The scope of the present discussion will consequently be restricted to GUIs with a similar hardware configuration. Much will be similar with different hardware, but we can then no longer base the analysis on the results from the experiments. A discussion of how to apply the research methodology developed for the experiments to the study of different technologies can be found in Chapter 13. A natural start is therefore a discussion of the interactive dimension of Graphical User Interfaces (GUIs).

11.1 The Lost "Feel" Dimension

Since the birth of the modern GUI in the early 80s, the focus of attention has mainly been on improving the visual aspects of the interface. We live in a culture where the eye is the dominant sense, and large investments have been made in making the GUIs around us visually appealing. Comparatively less has been invested in the interactive aspects of the interfaces.

The interactive aspects of an interface are perceived not primarily with the eye, but with our sense-for-the-interactive-quality-of-things. This is not a sixth sense with dedicated sense organs, but a faculty of man enabling us to perceive, judge, imagine, design, and reason about the behavioral aspects of our environment.

Kids often have a strong sense-for-the-interactive-quality-of-things. Put something interesting like your latest electronic gizmo in front of a two-year-old and observe. She will not place herself at a decent distance to the object and watch it like some cultivated guest in an art gallery. She will most probably say "See!", grab the gizmo with both hands, turn it around in her hands, and finally bite it to feel how it tastes.

As grown-ups, most of us need to re-learn how to make use of the "feel" sense (although not to the extent that we start biting on our colleagues' gadgets). A simple exercise will illustrate: Pick up a pen. Keep the cap on, and hold it as you do when you are writing. Touch the tabletop or something else in your environment with the pen, and sense the form and texture of your immediate environment through the pen. Close your eyes, and concentrate on how you perceive the world. Block out as many other senses as you can.

When I do this exercise in my Interaction Design classes, the whole atmosphere in the classroom changes. A deep calm fills the room, and it often stays until the next break. In the terminology of Merleau-Ponty, the pen becomes an extension of our lived body. It becomes a medium through which we can perceive the world.

In the "feel" dimension it becomes obvious that all perception is active, i.e. interactive. There would have been no perception of the environment without the movements of the hand. Merleau-Ponty would have added that indeed the perception was more than a passive reception of stimuli, and that an exploration of the tangible aspects of our environment could better be described as entering into and becoming part of that environment. This is Merleau-Ponty's "communion with the world".

Because we kept our eyes closed in the exercise, we were able to concentrate on the "feel" dimension. Try repeating the exercise with your eyes open. Now the visual apparatus takes over, and the "feel" is subordinated to the job of merely correcting and amplifying the visual data. This is similar to what happened to the interactive aspect of GUI design, i.e. to the "feel" of "look and feel". With GUIs it is even more difficult to argue for the "feel" dimension because of the hardware of today's PCs. The feedback from the computer is primarily visual.
All GUIs rely heavily on this, so when you close your eyes in front of a GUI, both the visual and the interactive dimension collapse. That leaves us with the difficult job of arguing for the existence of something "tangible" that you can neither see nor touch.

11.2 Kinesthetic Thinking

The data from the experiments indicate that the interactive experience has gestalt properties, i.e. that its first-class objects are interaction gestalts. These gestalts are similar to visual and auditory gestalts. In the same way as you see a rose, and not a collection of petals, and hear a familiar musical theme, and not a sequence of tones, you perceive the interactive behavior of a GUI widget not as a collection of action/reaction pairs, but as a meaningful interactive whole.

The fact that the subjects of the experiments were able to reason directly with interaction gestalts, without the need to break them up further, leads to the assumption that we are dealing here with a mode of thinking that has a lot in common with both visual thinking (Arnheim, 1969) and musical thinking. Meaningful experiential wholes were mentally compared, superimposed, inverted, etc. as if they were images.

Johnson (1987) proposes "kinaesthetic image schemata" (Johnson's spelling; I have chosen to use the more common spelling elsewhere) as a term to describe experiential wholes resulting from interaction with our physical environment. He claims that these schemata have a lot in common with visual image schemata, and that they have a very fundamental role in cognition and understanding. Following his terminology, I propose the term "Kinesthetic Thinking" to signify direct cognitive operations on tactile-kinesthetic sense experiences, i.e. on Interaction Gestalts.

For many purposes the term "tactile" can be used as a synonym for "kinesthetic". Alex Repenning (1993) uses the term "tactile programming" to describe how his users program with the tool AgentSheets. In the case of GUIs with mouse input, I would find it misleading to talk about "Tactile Thinking", as no true tactile feedback is provided by the computer.

To support the view that kinesthetic image schemata are important in human cognition, Lakoff (1987) reports on psychological experiments where blind subjects perform mental operations on tactile pictures:

It seems to me that the appropriate conclusion to draw from these experiments is that much of mental imagery is kinaesthetic - that is, it is independent of sensory modality and concerns awareness of many aspects of functioning in space: orientation, motion, balance, shape judgement, etc. (p. 446).

The notion of a separate kinesthetic sense modality can be traced back to the early work of the dance theoretician Rudolf Laban (1988). He defines the kinesthetic sense as:

...the sense by which we perceive muscular effort, movement, and position in space. Its organs are not situated in any one particular part of the body, as those of seeing and hearing... (p. 111).

He says about the process of composing a dance:

...this cannot be an intellectual process only, although the use of words tends to make it so. The explanatory statements represent solely a framework which has to be filled out and enlivened by an imagery based on a sensibility for movement. (p. 110).

His "imagery based on a sensibility for movement" is very close to what I mean by Kinesthetic Thinking. It is important to note that imagery in the context of choreography does not mean visual imagery, but an imagery which happens directly in the kinesthetic sense modality when the dancer uses the body as medium and material.
This brings us again back to the role of the body in human-computer interaction.

11.3 The Body in Interaction

The implicit experienced body of the interacting subject appeared as a complement in the ontology of Square World. To make sense of its interactive behavior, the subjects projected onto Square World their prior experience of being physically present in the world. From the structure of these projections (i.e. me/not me, left/right, here/there), it was possible to induce the structure of the interacting body.

This analysis is in many respects similar to Johnson's (1987) analysis of the implicit body in language. To him, almost every understanding is rooted in a prior embodied experience. Our understanding of words like "inside" and "outside" requires a familiarity with physical containment. Without such an experiential grounding the words would be without meaning. Even logical thinking would collapse, he argues, because our understanding of true and false relies on our understanding of containment.

In what way is an understanding of the role of the body relevant for an understanding of interactivity? It is relevant because most of the theories of HCI have treated the body merely as a machine executing the commands of a disembodied logical mind. The exceptions are of course W&F and their application of Heidegger (see Chapter 3.1). Even though Heidegger made an important break with the Cartesian mind/body split, he never went into detail about the embodied nature of "Dasein". Even though he used practical examples like hammering, he never showed how the structure of Dasein's corporeality influences Dasein's way of being in the world.

Restoring language

Similar to what Reddy (1993) pointed out about our language about meaning, we have at the root of Western language a split between the physical and the mental that makes it very difficult to peek out of our epistemological prison. To break the Cartesian dichotomy, we need to restore our language of description.

One approach is to refer to the human being as "Body-Mind". This is to me not satisfying. First, "Body-Mind" has been over-used by Alternative Medicine and the New Age movement, and has thus been given a mystical/spiritual twist that is paradoxically very much Cartesian. Next, by just combining the two opposites of the dichotomy, one does not create any new understanding of the nature of the synthesis.

Another approach is to take one of the two out. This is close to what Merleau-Ponty does when he talks about "Body" (le corps propre) as something close to "the human being". This makes us bodies all the way down; and all the way up. Using "Body" in this way makes us aware of our bodily nature; the part most neglected by the Cartesian split. The drawback with taking "Mind" out of the equation in this way is that it can easily give rise to a misunderstanding about the nature of the remaining "Body". "Body" is often thought of as a "Beast" without moral judgement and without capacity for everything we like to call human.

The opposite of removing "Mind" is to take "Body" out of the equation, and talk about the human being as pure "Mind". This requires that we give mind-like properties also to the world. Such a turn would by most be regarded as mysticism, even though there is much support in new physics for making such an interpretation (David Bohm's theory of the Holographic Universe).
Again, such an approach would by many be interpreted as mysticism. As the point is definitely not to end up in Cartesian mysticism, this approach is not fruitful.

A fourth alternative is the Heideggerian solution of inventing a new term for the human being. His "Dasein" is an attempt at overcoming the linguistic roots of the Cartesian split. Unfortunately, Heidegger's trick did not have the effect he anticipated, at least not outside a small circle of Heidegger scholars.

To sum up, I find none of the four approaches satisfying. In lack of anything better, I prefer to follow Merleau-Ponty in using "Body" as a synonym for subject, as he does with his "lived body", le corps propre.

The body in human-computer interaction

Having made this detour, we can return to the ways in which the body appears in human-computer interaction. Every attempt at building psychological theories, or actually any theory saying something about living systems, runs into the danger of reductionism. With living systems, we have to start out with the fact that they are integrated wholes, and not machines that can be decomposed and understood fully as a sum of atomic entities. The holistic nature of living systems, like Body, makes it dangerous to say that it is composed of entities. When we still do this, it is because an analysis requires us to make a sequential description of the problem at hand. If we keep in mind that the analysis is just one of many possible, and that it does not refer to any divide in the phenomenon itself, we can proceed with an analysis.

The body shows up in human-computer interaction in at least four ways:

• The structure of the subject's physical body determines what kind of physical interactions are possible.
• The bodily skills of the subject determine which of the interactions are actually possible. These skills include the body's given pre-objective intentionality towards the world.
• The body gives meaning to the interactions. Meaning is created by the body, in interaction. The body is the meaning-giving subject in interaction.
• The structure of the body, and the bodily experiences, form the background that colors the meaning given to the experiences.

Concerning the last point, we saw in Experiment A how the structure of the human body, and the way in which we move, was reflected in how the subjects gave meaning to their interactive experiences.

As already mentioned, these "levels" must be seen only as analytical tools. In the process of interacting, the "levels" affect each other. The meaning given to an experience changes the course of the interaction, and this change of course again leads to new meaning being created.

The physical body

Every interaction requires a body in a very physical sense. Without a body to "give input" to the artifact there can be no interaction. The structure of the "normal" human physical body is reflected in the design of keyboard and mouse. Disabled users, elderly users, and kids require different input devices.

Bodily skills

Given that a physical body is present, every interaction requires bodily skills. These skills are always a result of previous interactions; with the artifact in question and/or with other artifacts. For GUI interaction, these skills include the internalized control of the cursor, how to select in menus, and how to scroll windows. These skills come from having a physical body. In the end, the skills in use change the physical body through muscle development.
Skills like knowing how to delete a document by dragging its icon to the trash bin require an implicit understanding of the logic of physical space. This means that below every specialized skill there is always layer upon layer of bodily skills acquired through years of having a living body in the physical world.

The bodily skills shape our interactions in many ways. The lack of a required skill makes certain areas of an interface inaccessible to us (to use a state-space metaphor). In cases where we do not have other means of knowing the structure of the state space (e.g. from reading a user manual), a lack of skill will also restrict our view of the state space. Our bodily skills consequently always influence the development of the interaction.

A lot of these skills and experiences lead to interaction patterns that can not adequately be predicted with formal mental models at the cognitive level. A simple example is the direct-manipulation method in the Macintosh Finder for ejecting a floppy disc. To make the floppy pop out of the Mac, you can drag it to the trash bin. This does not make sense. Rumor has it that it was implemented as a "fix" by a programmer at Apple who had not taken part in the original discussions about the desktop metaphor. Because all the programmers found it convenient, it was never taken out. Once shipped, it could not be taken out for compatibility reasons.

In a formal analysis of the desktop mental model it will show up as an inconsistency, but not as a problem that can not be solved by adding a tiny rule of exception. With this rule added, the interaction should flow smoothly according to a mental-model analysis. But it does not! After 15 years as a Mac user I still feel uncomfortable dragging my floppies to the trash bin. The feeling makes me pause a bit, and it thus changes the flow of the interaction. Against this argument one can say that my reason for feeling uncomfortable is the break with the real-world metaphor. Yes, but I have no problems dragging free-floating windows around in empty space, or magically pulling down menus from the roof, or scrolling text by dragging the cursor in the opposite direction. This is because none of these metaphor inconsistencies awaken in me the almost bodily fear of destroying something by accident.

The body as perceiving and meaning-producing subject

The next level is the role the body plays in making sense of the interactive experiences. As we saw from the experiments, the participants projected their experience of being in physical space onto Square World. The bodily experiences thus became the background that gave meaning to the interactions. The body is at the same time the seat of this meaning-creation.

It is therefore misleading to describe the process of meaning-creation as an "interpretation" of the interaction. The interactive experience never exists to the subject as anything else than an experience already filled with meaning. "Interpretation" would require some other representation of the interactive experience that was interpreted. This is where much theory on human-computer interaction fails. Assuming a level more primary than the already meaningful only makes sense in a Cartesian epistemology where meaning exists only for "Mind". With the Cartesian reduction of "Body" to mere matter, we get this split between "interactive experience" and "interpretation".

11.4 Meaningful Interactions

How can an interaction be meaningful in and by itself?
If we give back to Body the capacity to create, perceive, and communicate meaning, we no longer require all meaning to be verbal or visual. We give life back to the body, and get Merleau-Ponty's lived body.

Common ground

Before we continue further along this line of thought, it is necessary to discuss the preconditions for making an analysis of bodily meaning, and for making the results of such an analysis intelligible. Of relevance here is Wittgenstein's analysis of language-games (see Chapter 2.8). If all use of language is simply disembodied manipulation and communication of symbols in language games, there is no room for bridging from one game to the next. To explain that this still is possible, we have to question the disembodied nature of cognition and communication. We can then draw the conclusion that it is our common ground of having lived human lives with human bodies that enables us to cross between language games. Our common embodied nature makes it possible for us to comprehend the meaning of words as more than mere references to other words. As Wittgenstein pointed out, this understanding requires a human body, having lived a human life: "If a lion could talk, we could not understand him" (Wittgenstein, 1953, p. 223). From this it follows that our discussion of bodily meaning can not avoid resting on the embodied nature of the reader.

An exercise

The best way to illustrate what I mean by bodily meaning is to ask the reader again to do a small exercise:

1. Raise your right hand to the height of your chest. Move it forward slowly with the palm away from you. Move it as if you were pushing something that required you to push it very gently. When your elbow is stretched out, move the hand back slowly. You can again imagine that you are receiving something that requires you to receive it very gently. Repeat this pushing and receiving one or two times. Each push-receive should take at least 10 seconds. Focus on the movement. If necessary, close your eyes.
2. Repeat the exercise with both hands.
3. Having done it with both hands, rest the left hand, and repeat it with only the right hand.

In what way is the meaning of the movement different in (1) and (3)? At least to me, the resting hand is in (3) still part of the picture. Compared to (1), I am now moving one hand and resting another. This is different from the experience in (1) of simply moving a hand. The left hand has in (2) been "given life" by having been experienced in movement. This difference in meaning given to the same movement illustrates the gestalt nature of bodily experiences. It further illustrates the importance of movement history as context. Similar movements, synchronized with breath, form the basis of Tai Chi Chuan, an ancient Chinese martial art / meditation technique aimed at "vitalizing" the body (see Shapiro, 1980).

The tacit nature of bodily meaning

What we experienced in the exercise was meaning in the kinesthetic dimension. In our everyday coping with our environment, we are normally not aware of this aspect of life. We are too busy doing whatever it is we are doing; and that is of course the way things should be. Despite its tacit nature, bodily meaning is there if we choose to pay attention to it. A consequence of the embodied view on cognition that we find in Merleau-Ponty and Johnson is that we must assume that "tacit" bodily meaning affects us, even when we do not pay attention.
This is much in the same way as architects assume that the buildings we inhabit affect our lives, even when we do not reflect on the influence of architecture.

Chapter 12
Interaction Design

"...imagery based on a sensibility for movement"
Rudolf Laban, on the process of composing a dance (Laban, 1988, p. 110)

The term Interaction Design originates from the 80s. It was developed by Bill Moggridge and his colleagues at the company ID2 to describe their design work (Winograd, 1996, p. 165). By focusing on interaction, the term goes to the core of how the design of user interfaces differs from Graphics Design. As this study has illustrated, the interactive dimension of the graphical user interface is fundamentally different from its visual dimension. This should have some consequences. I will deal here with two areas in which I see the results from this study apply to Interaction Design: (1) the education of interaction designers, and (2) tools for supporting interaction designers.

12.1 Educating Interaction Designers

In the context of Interaction Design, kinesthetic thinking involves not only "a sensibility for movement" (Laban, 1988), but also a sensibility for orchestrated responses to movement, i.e. interaction. In "TOG on Interface", Bruce Tognazzini (1992) gives an interesting example of kinesthetic thinking in interaction design. One of Tognazzini's case studies in the book deals with the design of a "One-Or-More Button". He has set himself the task of designing a combination of the Macintosh radio button and the check box. He wants a widget that allows the user to specify one or more of a number of choices. After a lengthy description of how he and his colleagues struggled with the problem, he reports on how the solution was found. We are now at the end of a design session where many alternatives have been tried out:

As the din died down, and the spitballs ceased flying, I became aware of my friend, Frank Ludolph, repeatedly pressing his thumb down on the table in front of him as though he were attempting to assassinate some offensive bug. "You remember," he said, "when we were kids, playing with a drop of mercury (before we all knew it was poisonous)? Remember how it used to squish out of the way when you tried to press down on it? Why don't you make your buttons do that?" We were in the presence of genius. Such a simple solution, and no one else has tumbled to it. (p. 274)

We see from this example how kinesthetic thinking to a large extent is "tacit". It does not involve the manipulation of symbolic representations. In the terminology of Polanyi (1966), the trained kinesthetic thinker possesses "tacit knowledge". Applied to Interaction Design, it is possible to identify the different "levels" of this tacit knowledge:

• Closest to the surface are the practical skills of storyboarding, dialog creation, doing paper design, building prototypes, and analyzing usability tests.
• At a deeper level, we find the faculties that make these skills possible. This includes the ability to imagine, invent, and "tacitly" reason about interactive behavior, as illustrated by Tognazzini's example. It also includes the ability as an observer to understand the interactive experience of others through empathy. A more advanced skill is the ability to combine the two: to be able to imagine and "tacitly" reason about how someone else will experience interactive behavior.
• At an even deeper level, these faculties rest on what Howard Gardner calls "bodily-kinesthetic intelligence" (Gardner, 1993).

The Multiple-Intelligence theory

In an attempt to challenge the view that only logical-mathematical and linguistic abilities should count as intelligence, Gardner suggests that at least seven separate human "intelligences" exist. In Gardner's multiple-intelligence (MI) theory these are: linguistic intelligence, musical intelligence, logical-mathematical intelligence, spatial intelligence, bodily-kinesthetic intelligence, intrapersonal intelligence, and interpersonal intelligence. He says about bodily-kinesthetic intelligence that it is the ability to control fine and gross actions, manipulate external objects, and create and mimic movement. Persons with a well-developed bodily-kinesthetic intelligence can be found among, for example, choreographers, dancers, jugglers, and athletes.

Bodily-kinesthetic intelligence is close to what I have called Kinesthetic Thinking. Reformulating a definition with reference to Gardner: Kinesthetic Thinking is the "mental" part of bodily-kinesthetic intelligence, i.e. the ability to create, imagine, and reason directly in this mode. "Mental" is actually a bit misleading here, as this is not "mind work", but "thinking with your body".

Gardner notes about these intelligences that they always work in combination. A dancer needs bodily-kinesthetic intelligence to control the body, musical intelligence to follow the music, and interpersonal intelligence to be able to communicate the dance. Following Gardner, the interaction designer needs both bodily-kinesthetic intelligence to imagine interactive behavior, and interpersonal intelligence to be able to understand the interactive experience of others. In addition, most interaction designers need spatial intelligence to be able to do graphics design. If we add to this the logical-mathematical intelligence needed to do scripting in today's prototyping tools, we get a feel of the multi-disciplinary nature of interaction design.

The interaction designer as choreographer

If we let each intelligence be identified with a profession, we find represented in the current view on User Interface Design: the programmer (logical-mathematical intelligence), the graphics designer (spatial intelligence), and the psychologist (interpersonal intelligence). What is lacking is the choreographer (bodily-kinesthetic intelligence).

The consequence for the software industry is that multi-disciplinary teams for user-interface design should, if possible, include persons with backgrounds such as dance, martial arts, or choreography. This does not make any of the three other professions less needed in a team.

The Interaction-Design-as-Choreography metaphor implies that the user is the dancer. This fits very well with our focus on interaction as a bodily activity. It also makes clear that an Interaction Designer's product is not a dance to be viewed from a distance, but a dance to be danced, i.e. the experience of the dancer.

Interaction as Tango

For the Interaction-Design-as-Choreography/Interaction-as-Dance metaphor to be precise, it needs a slight adjustment concerning the interactive nature of the experience. Some choreographers allow their dancers to improvise on stage, but in most cases the dancer is very much a puppet on a string for the choreographer. In the latter case we are almost back to Laurel's theatre metaphor (see Chapter 2.9).
What we are after is for the user to have an interactive dance experience. We get this in dances like Tango. Tango, as it was first danced in the bars of Buenos Aires, is a constant improvisation within the rules given by the dance. With Interaction-as-Tango, the role of the Interaction-Designer-as-Choreographer becomes to compose the dance as interaction. This includes the steps of the dance, the interaction between the dancers, and its rules for improvisation. When the "design" job is done, the choreographer leaves the ballroom, but the dance continues as a life form; constantly changing patterns of movement as the dancers explore the possibilities for improvisation and interaction given by the dance. The dance is danced. It becomes lived experience, and a medium for the dancers to project themselves into the dance. This is the role of the Interaction Designer: to create interactive forms for the users to live.

Curriculum development

As multi-disciplinary teams are not the solution to every problem, we need to include in the training of interaction designers a development of bodily-kinesthetic intelligence. For a curriculum in Interaction Design this means the inclusion of practical exercises involving the body. For this purpose, a lot can be learned from the exercises developed for dancers and choreographers, and in the martial arts.

In the same manner as students in interaction design are often advised to take classes in drawing and visual thinking, the students should also be encouraged to take classes in dance, choreography, Tai Chi Chuan, or in other practices that help develop their bodily-kinesthetic intelligence. It might sound far-fetched that taking dance classes should make designers better at developing interactive software, but I find this no different from how drawing classes make designers better at visual design.

The combination of bodily-kinesthetic and interpersonal intelligence is harder to develop. This is the combination of skills needed to be able to design interaction together with the users, and to learn from observing users at work. The training needed for this exceeds dance classes. It is most similar to the training of choreographers. Unfortunately, very few classes are available to give such training. On the other hand, the ability to empathetically "feel" the interactive experience of others is also an advanced skill in Interaction Design, and might come naturally as the designer's sense-for-the-interactive-quality-of-things develops.

12.2 The Bricoleur Designer

As interaction designers we need more tools than just our body. The software tools we have available as designers shape both the design process and its products. They also shape our "mental model" of what can be designed. The design of tools for supporting design consequently also becomes a design of practice and ontology. Following this line of thought, it is natural to start a discussion of design tools with a look at design practices.

Schön (1983) reports on professional practices in fields as diverse as architecture and psychoanalysis. He found an important aspect of the design practices he studied to be a dialogue between the designer and the material. Each practice has its own specific "language of designing", which to a large extent is influenced by the medium through which the design ideas are expressed. Architects communicate their ideas mainly through sketches on paper. Most of their thinking also happens in this medium.
To use the terminology of Norman (1993), the act of drawing sketches on paper is an integral part of the cognitive process of the architect. Architects think with brain, hands, and eyes. They often experience it as if the materials of the situation "talk back" and take part in the process of creation. Most writers have experienced the same phenomenon: the process of writing brings forth ideas that were not necessarily present before the actual writing started. The writer has entered into a dialogue with the text.

This view of the design process is very different from the view held by some theoreticians, e.g. Newell and Simon (1972), that design is a rational process best explained as goal-seeking behavior. Rationalistic views of this kind were until recently widely held within computer science with respect to the software design process. As a result of extensive field studies of software design practices, it is currently hard to defend a view of software design as a rational top-down process.

Papert (1992) borrows the term bricolage/bricoleur from the French anthropologist Lévi-Strauss to describe design processes involving large elements of improvisation based on the material available. As the "father" of the educational programming language LOGO, he has observed how children are able to construct interesting programs in a bottom-up fashion. Their unstructured and playful behavior shows important similarities with what Lévi-Strauss observed among "primitive" tribesmen. The latter constructed their artifacts through playful improvisation from what was available in their natural environment.

Papert's colleague Martin (1995) observed the same phenomenon among university students in the LEGO-Robot course at MIT. In this course, the students were given the task of constructing working robots using an extended version of the LEGO-Technics construction set. This set is extended with useful electronics, and with a programming environment for a microprocessor on board the robots. He observed that:

While some students do take a "top-down" approach towards their projects, enough do not so that we as educators should take notice. Most students' projects evolve iteratively and in a piecemeal fashion: they build on top of their previous efforts, sometimes pausing to re-design a previously working mechanism or integrate several previously separate ones.

From this he concludes that the design environments and materials should encourage playful design and creative exploration in the same way as LEGO bricks do. Applied to software environments, he claims that this means providing clean levels of abstraction with well-defined and observable building blocks.

12.3 Interaction Gestalts vs. Objects

What are the natural building blocks and levels of abstraction for interaction design? Can anything be learned from the Square World experiments? The experiments showed the importance of kinesthetic thinking in interaction design. The current user-interface design tools do not support kinesthetic thinking well. Great benefits should be expected concerning productivity, creativity, and job satisfaction if the design tools could better support this artistic way of working with interactive behavior. The experiments suggested that interaction gestalts are the natural building blocks in kinesthetic thinking.
From a computer-science perspective, the interaction-gestalt editor of Experiment B and the resulting editor from Experiment C represent hybrid solutions that introduce an unnecessary level of abstraction on top of the underlying internal state-transition formalisms. From a user-centered (phenomenological) perspective, it seems that simple widgets with built-in behavior (perceived as Interaction Gestalts) were the natural atomic objects for the participants. The state-transition formalisms normally used for describing GUI behavior rest on an abstraction of linear time with stable states. When designing interactive behavior directly and non-verbally with interaction gestalts, there seems to be no need to involve abstractions of time.

If we contrast these results with the dominant programming paradigm in computer science (Object-Oriented Programming), we find some striking differences concerning the conception of time, and concerning the nature of objects.

Time

In Object-Oriented Programming (OOP), and in all other algorithmic programming approaches, interactive behavior is described in algorithmic terms. Underlying an algorithmic description is the notion of linear, discrete, forward-moving time. In the naive theory emerging from the experiments, linear time is only one out of many possible ways of conceptualizing interactive behavior. In most of the cases the interactive experiences were not dissected into discrete events, but were dealt with as meaningful wholes, either directly or through metaphor.

Objects

In OOP, an object is described as a fixed entity having an objective existence "in the external world". This way of seeing objects conforms to the way natural phenomena are described in systems theory and in most of the natural sciences. Philosophers have called the underlying scientific paradigm "positivism" and "objectivism". The objects described by the subjects in the experiments existed for them only through interaction. The objects emerged as a result of the interplay between the intentions of the users, the users' actions, and the feedback given by the system. When the intentions of the users changed, different objects emerged. From this perspective, it is meaningless to talk about objects as existing "in the external world" independent of the intentionality of the subjects. The latter view fits Merleau-Ponty's epistemology, where the physical world emerges to us only as a result of interacting with it.

12.4 Supporting the Kinesthetic Thinker

As Turkle (1984) has pointed out, the computer differs from other media in that it is both a constructive and a projective medium. In physics you can not change the laws of nature to better fit naive theories of motion, but with a computer you can create software tools that enable users to construct their systems with abstractions and representations that are close to their own intuitive concepts.

To support the bricoleur interaction designer, it is important that the tools and materials allow the designer to work fluently in an iterative and explorative modus operandi. To use the terminology of Schön (1983), it is important that the software environment enhances a "dialogue with the material" in the design process. The popularity of WYSIWYG (What You See Is What You Get) interfaces in computer-based tools for graphics and layout design indicates that this is best done in the sense modality of the resulting product.
For the interactive aspects of user interfaces, this means designing in the kinesthetic sense modality.

The design tools of Experiments B and C have a lot in common with the end-user programming environment AgentSheets, developed by Alex Repenning (1993). AgentSheets provides an agent architecture that allows domain experts to build domain-specific construction sets. The construction sets consist of agents with built-in behavior that can be placed on a grid in a visual programming environment. A number of construction sets have been built and successfully tested on end users. Repenning has coined the term "tactile programming" to describe this way of constructing interactive software.

The findings from the three experiments reported here indicate that it is possible to create tools that enable designers to "tacitly" construct interactive behavior directly in the kinesthetic sense modality. From this it is possible to sum up the previous discussions in a set of guidelines for making user-interface design tools more supportive of Kinesthetic Thinking:

• Avoid forcing the designer to work in a sense modality different from the sense modality of the product. For interaction design, this means providing tools that enable the designer to express the interactive aspects of the product directly through interaction.
• Avoid representations of behavior involving abstractions of discrete time and states.
• Provide building blocks with inherent behavior, and implement algebras that allow for the construction of complex behavior from these simple interaction elements.
• Avoid a "run" mode different from the "build" mode.

12.5 Painting with Interactive Pixels

In an attempt to try out the ideas presented here, a very first prototype was made of a design tool that enables interaction designers to construct graphical user interfaces by "painting" with pixels with inherent behavior. I built this prototype to show the basic functionality of a possible new class of design tools. Informal user tests have been done that encourage me to continue developing the prototype along the lines outlined here.

The basic idea is to allow the designer to construct interactive behavior directly at the pixel level with "tools" resembling the tools in pixel-based paint programs like MacPaint. The prototype has strong similarities with AgentSheets (Repenning, 1993). In AgentSheets, the agents are represented as icons on a grid. Here, I have taken this approach to an extreme by letting each pixel be an interactive agent communicating with its neighboring agents (i.e. pixels).

The functionality of the prototype is best shown through an example. Figure 91 shows an example of an interactive drawing that was constructed with the prototype. It illustrates an on/off switch that controls a light bulb. The switch has toggle behavior.

Figure 91. An interactive drawing.

To construct this interactive drawing in black & white, four additional "colors" were needed, as illustrated in Figure 91; a sketch of how such pixel agents might be modeled follows the list:

• White toggle. This "colour" is initially white, and goes black when clicked on. It returns to white the next time you click on it.
• Black toggle. This "colour" is the reverse of the previous one. It is initially black, and goes white when clicked on. It returns to black the next time you click on it.
• White and black connectors. These "colours" are used to connect interactive pixels without changing their own colour.
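Purely as an illustration, the following is my own sketch of how such behaving pixels might be modeled; it is not the prototype's actual implementation, and the class names and the click-propagation scheme are assumptions. It also includes a naive version of the input sharing between adjacent pixels described in the next paragraph.

```python
# Sketch of "pixels with inherent behavior" (hypothetical; not the actual prototype).
# Each pixel is a tiny agent; clicks spread through connected neighbors, so a
# region of connected pixels behaves as one interactive object.

class Pixel:
    def __init__(self, colour):
        self.colour = colour            # "white" or "black"
        self.neighbors = []             # adjacent interactive pixels

    def click(self, visited=None):
        visited = visited if visited is not None else set()
        if id(self) in visited:
            return
        visited.add(id(self))
        self.react()
        for n in self.neighbors:        # shared input: the click spreads to the region
            n.click(visited)

    def react(self):
        pass                            # default: no visible reaction

class TogglePixel(Pixel):
    def react(self):
        # a white toggle starts white, a black toggle starts black; both invert on a click
        self.colour = "black" if self.colour == "white" else "white"

class ConnectorPixel(Pixel):
    pass                                # passes input on without changing its own colour

# Two toggle pixels joined by a connector behave as one switch:
a, c, b = TogglePixel("white"), ConnectorPixel("white"), TogglePixel("white")
a.neighbors, c.neighbors, b.neighbors = [c], [a, b], [c]
a.click()        # both toggle pixels go black; the connector stays white
```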
A runtime system takes care of making adjacent interactive pixels share input. This makes it possible to construct large areas, like the switch in the above example, that behave as interactive objects. Interactive objects can similarly be split into separate interactive objects simply by taking them apart on the canvas.

Figure 92. An interactive drawing in the making. (Palette: always black, always white, connector, white toggle, black toggle.)

To extend this approach from the well-defined domain of the experiments to the domain of today's complex software, we are faced with a trade-off between visual elegance and flexibility. To be able to give the end-users back the control of their tools, we might have to rebuild today's software systems from scratch using simple interactive building blocks that fit together in simple ways. By doing this we will probably lose some of the elegance of the current software, but we will gain a new kind of flexibility.

The situation is very similar to the difference between a toy built from LEGO bricks and other toys. Most ordinary toys have only one use, and when the kids are tired of them they are thrown away. A LEGO toy can be taken apart, recycled, modified, and extended forever. The drawback is of course that a toy made of LEGO bricks looks much less like what it is supposed to mimic. It is currently an open question how the market would react to user interfaces and applications that have a different look and feel than today's software, but can easily be opened up, studied, understood, modified, and re-used by most ordinary users.

Chapter 13
New Technologies, New Ontologies

"Technology is a way of revealing... Once there was a time when the bringing-forth of the true into the beautiful was [also] called techné."
Martin Heidegger, in The Question Concerning Technology, 1954 (Heidegger, 1977, pp. 294 and 315)

There is a long tradition in the design field for investigating the design spaces of new technologies. An early example of this can be found in Itten's design classes at Bauhaus (1975). Itten encouraged his design students to experiment with different materials and techniques to learn about their essential nature. Such investigations have always been formal in the sense that their aim has been to find out what is possible with a new technology. To be able to make full use of a technology, such investigations are very useful. Much of the research being done at MIT's Media Lab can be characterized as investigations of new design spaces (see Ishii and Ullmer, 1997).

Such investigations rarely tell us much about how the new technology will be experienced in use by its end-users. If we use Square World as a metaphor for a technology, the three experiments gave insights into the perceived ontology of this technology. This ontology was very different from the formal analysis that led to the construction of the 38 examples. Hopefully, an insight into the perceived ontology of a technology can give designers a powerful tool when designing the interactive experience. Such a general "psychology" of a technology can of course never totally remove the need for doing usability tests and iterative design, but it might reduce the number of iterations needed before products can be shipped.

Can the empirical methods used to investigate the ontology of Square World be applied to the investigation of other technologies and domains of knowledge? To answer this question it is useful to start by giving a generalized description of each of the three experiments.
13.1 The Research Methodology Generalized

Experiment A: Exploration with think-aloud

The methodology used in Experiment A can be generalized as follows:

• Given a constrained design space that can be expressed formally as a set of elements and configurations/operations on these elements. The interactive squares from the previous chapters constitute such a design space. Other examples could be: the possible patterns you can create with a 6x8 matrix of monochrome pixels on a screen, the set of possible four-note sequences you can compose from a single octave on a piano, or the possible intonations for the sentence "The cat is on the mat". In the terminology of Heidegger, such formal descriptions are at the ontic level.
• The empirical research question we ask is how this design space is experienced. Depending on the media involved, this means by seeing it, hearing it, interacting with it, or some combination. In the terminology of Heidegger, we are asking for the ontology of the domain. What we are after is not just a corpus of isolated "experiences", but an empirically based theory that is able to make sense of all the data.
• The experimental design consists of picking an interesting subset of the design space and exposing a number of subjects to these stimuli. In addition to recording the subjects' tacit reactions to the stimuli, they are asked to describe the stimuli in their own words. In the case of interactive stimuli they are also asked to describe their interaction with the stimuli. Tacit reaction/interaction can be everything from mouse clicks to eye movements or body posture.
• The resulting corpus of tacit and verbal data is then transcribed in a suitable way and analyzed in search of recurrent patterns.
• The theories emerging from such experiments can be of many different kinds. In our example I have focused on four aspects:
− The analysis at the detailed micro-level of perception gives an idea of how a single artifact is explored. The data used here are to a large extent non-verbal, i.e. mouse operations. The result is a description of the micro-level perceptual process.
− The search for implicit metaphors in the verbal data leads to a cognitive theory of how the domain is intuitively structured. In our example, there emerged from this analysis dimensions like time, state, and space. These descriptions are static and describe the structure of the cognitive content. Due to the small number of subjects, the resulting analysis is mainly qualitative and has no predictive power. The numbers added must be interpreted mainly as indications of possible correlations.
− By looking at the ways the subjects related to the stimuli, we got a theory of different levels of involvement. This includes both the possible relations between the subject and the stimuli (content), and the dynamics of how these relations change (process).
− By looking at aspects of the data that were not captured by the above theories, there emerged the concept of Interaction Gestalts. This allowed us to identify an interesting aspect of how interaction is experienced (content). It also made us aware of the process of perception at a higher level, i.e. how the perception of one artifact is colored by the perception of the previous one.

Experiment B: Testing out editors

The methodology of Experiment B is harder to generalize, as not all technologies allow for building computer-based editors.
For some domains we would have to substitute "formalism" for "editor". The methodology would then go:

• Build as many formalisms for the domain as possible. The formalisms can come from a formal analysis of the domain, or from ways of understanding the domain resulting from experiments like Experiment A.
• If possible, build running editors for these formalisms that allow the participants of the experiment to construct with them.
• Let a group of participants individually be exposed to examples like in Experiment A.
• Give the participant examples to reproduce with the formalisms and/or editors. Record how they work, and ask them to comment on the formalisms and editors.
• Ask the participants as a group to choose their favorite formalism/editor or combination of such.

Experiment C: The participatory design of tools

The same reservation about running editors applies to Experiment C. If we allow for the design group to develop formalisms instead of editors, we get the following generalized methodology:

• Let a group of participants individually be exposed to examples like in Experiment A.
• Give the participants as a group the task of designing a formalism or editor that enables them to express/construct such examples.
• If the domain allows for editors, implement a running prototype of the editor.
• If the domain only allows for building formalisms, try to clean up the formalisms.
• Repeat the design-prototype/design-cleanup cycle a number of times (4-5).
• Analyze both the design ideas that emerge and the resulting editors/formalisms.

From three such experiments, it should be possible to construct an ontology of the technology similar to my analysis of Square World in Chapter 10. It is difficult to predict what methodology will be the most efficient for other technologies, but my findings indicate that the PD approach is the most efficient of the three. If a "low-cost" version of the experiments was required, the findings indicate that two or three PD sessions with a small group of end-users could give interesting results fast.

13.2 Example: Tangible User Interfaces

One interesting emerging technology that could benefit from an empirical investigation of its ontology is Tangible User Interfaces (TUIs). Much current HCI research aims at moving the user interface out of the screen and into the physical environment of the user, or as Ishii and Ullmer put it: "to change the world itself into an interface" (Ishii, p. 237).

Figure 93. The Marble Answering Machine.

One of the most interesting examples to date of a Tangible User Interface is Durrell Bishop's Marble Telephone Answering Machine (Crampton-Smith, 1995) (Figure 93). In his design, each incoming voice message is represented by a physical marble that pops out of the machine. To listen to a message, you simply place the corresponding marble in a tray marked with a loudspeaker icon. To delete a message, you recycle the marble into the machine.

Figure 94. MediaBlocks and Triangles.

Figure 94 shows two more examples of TUIs. To the left is Ullmer's MediaBlocks (Ullmer, 1997). He explored a similar idea as Bishop, but let his blocks represent media content like pictures, video clips, etc. The picture to the right shows Gorbet's Triangles (Gorbet, 1998). The triangles each held a microprocessor that communicated its configuration to a computer. This allowed for using the triangles as a general input device.
Figure 94. MediaBlocks and Triangles.

Figure 94 shows two more examples of TUIs. To the left is Ullmer’s MediaBlocks (Ullmer, 1997). He explored an idea similar to Bishop’s, but let his blocks represent media content like pictures, video clips, etc. The picture to the right shows Gorbet’s Triangles (Gorbet, 1998). The triangles each held a microprocessor that communicated its configuration to a computer. This allowed for using the triangles as a general input device. Information could be associated with each triangle, and different configurations could give different meaning.

When physical objects work as representations of information in this way, new kinds of user experiences are created. The experience of physically recycling a marble into an answering machine is obviously different from the experience of dragging an icon to the trash bin on the Macintosh. The interesting question is how it is different. To be able to answer this question adequately, a combined theoretical and empirical study similar to the present one might be of use.

Studying the ontology of mobile interactive artifacts

Such a study could begin with a formal exploration of the design space of tangible user interfaces. First, its dimensions would have to be identified. Then a number of abstract expressions similar to the examples in Square World would have to be developed. One way of approaching this could be to bring the squares from Square World out of the computer as shown in Figure 95. This would imply building a set of interactive mobile squares (“tiles”) with built-in input, output, computation, and communication. A tool would have to be developed to allow for programming of their behavior. This allows for exploring some of the basic properties of mobile interactive artifacts.

Figure 95. Bringing the squares out of the computer.

Slight modifications in behavior would probably lead to very different interactive experiences. Next follows a short sketch of how such a formal exploration could start.

Figure 96. Interactive behavior.

Figure 96 shows a storyboard of an interaction with a tile with push-button behavior. Other behaviors such as toggling should also be explored.

Figure 97. Remote control.

If we allow for inter-tile communication, we can explore more complex concepts. Figure 97 shows an example where an event in one tile leads to behavior in another. This gives us a simple “remote control”.

Figure 98. Shared identity.

Figure 98 shows an example where an event in any tile leads to behavior in both. This allows us to experiment with concepts of shared identity.
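The behaviors in Figures 96-98 can also be stated as small programs, which is roughly what the programming tool mentioned above would have to support. The sketch below is only a guess at what such a tool could feel like; the tile interface (a light that can be switched on and off, plus press/release events) is invented for the purpose of illustration.

    class Tile:
        """Hypothetical mobile interactive square: a light plus a touch sensor."""

        def __init__(self, name):
            self.name = name
            self.light_on = False
            self.on_press = lambda: None     # behavior is plugged in from outside
            self.on_release = lambda: None

        def press(self):
            self.on_press()

        def release(self):
            self.on_release()

    def push_button(tile):
        """Figure 96: the light is on only while the tile is pressed."""
        def pressed():
            tile.light_on = True
        def released():
            tile.light_on = False
        tile.on_press = pressed
        tile.on_release = released

    def toggle(tile):
        """The other behavior mentioned above: each press flips the light."""
        def pressed():
            tile.light_on = not tile.light_on
        tile.on_press = pressed

    def remote_control(controller, target):
        """Figure 97: an event in one tile leads to behavior in another."""
        def pressed():
            target.light_on = not target.light_on
        controller.on_press = pressed

    def shared_identity(a, b):
        """Figure 98: an event in either tile changes the light of both."""
        def pressed():
            state = not a.light_on
            a.light_on = state
            b.light_on = state
        a.on_press = pressed
        b.on_press = pressed

    a, b = Tile("A"), Tile("B")
    shared_identity(a, b)
    a.press()
    print(a.light_on, b.light_on)   # True True: the two tiles behave as one

Even in this toy form, the difference between push-button, toggling, remote control, and shared identity is a difference in what kind of “thing” a tile is experienced as, and that is precisely what the proposed experiments would have to investigate empirically.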
Empirical grounding

The above examples were part of a possible formal analysis of the design space. To be able to build a theory of how TUIs are experienced in use, an experiment has to be set up. Following the outlined methodology, a first experiment could be to let subjects try out different behaviors for the interactive tiles while their reactions were recorded. An analysis of the linguistic material could then be used to form a theory of the implicit metaphors and mental models involved. Next, different ways of programming the tiles could be explored with the participants of the study. These experiments would then hopefully lead to interesting insights into how mobile interactive artifacts are understood in use.

A possible ontology

In the absence of the necessary empirical grounding, the perceived general ontology of mobile communicating interactive artifacts can only be guessed. With the introduction of true physical space, and one or more subjects in this space, a new dimension is added. We now get the difference between an artifact that is put on a table, and an artifact that is carried around. If we use a “jumping light” behavior as an example (example #11 of Square World with colors inverted), we get with two communicating physical squares a double spatialization. The “light” will now be something that jumps between places in “virtual space”, but at the same time, each position in this virtual space has a true physical position in real space. By moving the “light” between the squares, we change a virtual object’s position in virtual space. By moving the squares around, we change the relationship between real and virtual space.

An example illustrates this: Imagine that I leave the room with the two squares on a table, the “light” in one of them. Imagine further that there are other people around, and when I come back the “light” has moved to the other square. If we assume that the squares are physically identical, I have no way of knowing if someone has swapped the squares, or just clicked on the square with the light. In the same manner, if I come back and see no difference, I have no way of knowing if someone has both clicked on the “light” and swapped the squares.

With this double spatialization, we also get a double presence of the subject. In the above example, the finger of the subject touched “directly” on the light in virtual space. With the introduction of virtual intermediaries, this situation changes. Imagine a scenario with one subject and four mobile communicating interactive squares. Let us for convenience name the squares A, B, C, and D. Let squares A and B have “jumping light” behavior as in the previous example, but let the “input” to these two squares be mediated through squares C and D respectively. Now square C “controls” square A, and D “controls” B. To move the light back and forth between A and B, you now have to click on squares C and D respectively. This configuration creates a situation where you move your experienced body around between positions A and B in virtual space by moving your finger between squares C and D. With different physical arrangements of the squares, we now get a changing “corporeality” of the interacting subject.

Yet another example of this interplay between spaces can be seen if we let the subject’s virtual position be moved through interaction. To illustrate this, let us keep squares A and B from the previous example with “jumping light” behavior, and instead of squares C and D now add a new square E. Let square E control the position of the light: each time you click on E, the light moves from A to B, or back. Through clicking on E, you thus “become” the light that moves between A and B. By also physically moving squares A and B around, you can move your virtual “I” in two ways. If you yourself, carrying squares A, B, and E, also move around, you get the interplay between three kinds of spatiality: your position in virtual space, the real position of this virtual position in physical space, and your real position in physical space.
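This last configuration is easy to lose track of in prose, so a minimal sketch may help keep the two kinds of space apart. The physical positions and the square interface below are, once again, invented for the sake of illustration; the only point is that the position of the “light” in virtual space and the positions of the squares in real space vary independently.

    class Square:
        """Hypothetical mobile square with a light and a physical position."""
        def __init__(self, name, position):
            self.name = name
            self.position = position   # (x, y) in real, physical space
            self.light_on = False

    # Squares A and B carry the "jumping light"; E only mediates the input.
    A = Square("A", position=(0.0, 0.0))
    B = Square("B", position=(2.0, 0.0))
    E = Square("E", position=(1.0, 1.0))

    def click(square):
        """Clicking E moves the light between A and B; clicking A or B does nothing."""
        if square is E:
            A.light_on, B.light_on = B.light_on, A.light_on

    def virtual_position_in_real_space():
        """Where the light (the subject's virtual 'I') currently is, in real space."""
        return A.position if A.light_on else B.position

    A.light_on = True                        # start with the light in A
    click(E)                                 # the light jumps to B, untouched
    print(virtual_position_in_real_space())  # (2.0, 0.0)

    B.position = (5.0, 3.0)                  # carrying B around moves that same
    print(virtual_position_in_real_space())  # virtual position: now (5.0, 3.0)

The same virtual position can thus change in two independent ways, through interaction (clicking on E) and through physical movement of the squares, and the subject’s own movement in the room adds a third spatiality on top of these two.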
If we add to this picture the subtle differences between holding something, interacting with it, carrying it, wearing it, having it placed somewhere, and having it mounted on the wall, we see the possible complexity of an ontology of such artifacts. We also see how only parts of what we learned from the experiments with “mounted” squares apply to this new technology. The moment we metaphorically “take the squares down from the wall”, a whole new world opens up. It is important to note that the new world of the subject will not be a sum of spaces as presented here, but an integral “world”. The moment this new world is integrated into the lived body of the subject, it becomes his or her lived space. The structure of this lived space can be analyzed with concepts such as “real space” and “virtual space”, but to the subject it is simply where he or she dwells. To reflect on its structure requires that he or she “chooses to take that horizon as a theme of knowledge.” (Merleau-Ponty, 1962, p. 22)

13.3 Discussion

The TUI example has shown that the research methodologies developed for this study can be applied to the analysis of other technologies and media. Different areas of application will probably require slightly different methods of investigation, but the basic idea should work across technologies. Inherent in the methodology is a sensitivity to theory. I have in this study drawn heavily on Heidegger and Merleau-Ponty. Other areas of application might be better off with different theoretical approaches. A similar study of net-based technologies would for example probably require good theories of social interaction and draw heavily on ethnomethodology, social psychology, and sociology. A study of film would have to draw on film theory, and so on. What is left as the essence of the methodology is a search for each technology’s individuality. Individuality is in this sense not only what Kandinsky says a technology “conceals within itself”, but most of all how each technology gives rise to a particular experience-in-use. By understanding the individuality of a technology in the latter sense, a designer is given the opportunity not only to design functionality and style, but also to design the interactive experiences.

Beyond the life-world

By focusing only on what the users project onto a new technology, there is a danger of losing sight of some interesting potentials for creative design. There is also a danger of ending up with “boring” solutions. Scenario III in Chapter 2.1 introduced Scott Kim’s interactive puzzle “Heaven and Earth”. This game has a “magical” structure that goes beyond the structure of the user’s life-world. The game still “makes sense” because it has a coherent logic and structure that reflects a possible life-world for human beings. This illustrates the possibility for solutions that go beyond the life-world of the user. When the “naive theory” of a technology has been mapped, the search can begin for other meaningful “theories” beyond the naive. This is not to say that

[...]

here and a there, a left and a right, an up and a down. This makes us give a certain meaning to the interactive experience that would have been different for a subject with a different corporeality. Our ability to be imaginative in the kinesthetic domain also has its root in this bodily experience of being physically present in the world. In addition, the ability to interpret the interactive experiences of others requires a body with a similar history of bodily experiences. Without such a common experiential and corporeal ground, our understanding of the interactive experience of the other would be just as shallow as Weizenbaum’s Eliza’s “understanding” of its “patients”.

The body as meaning-producing subject

One deep philosophical question touched upon by the study is the role of the body in human existence. The experiments indicate that the body is not only “medium” and “background”, but that the human body is the seat of meaning-production for interactive experiences.
To change the role of the human body from being an entity among other entities in a Cartesian res extensa, to being the ultimate creator of meaning and purpose, challenges deeply held assumptions in Western thinking. To start asking such questions is part of an ongoing Corporeal Turn in philosophy with roots back to Merleau-Ponty.

14.2 Implications

The Training of Interaction Designers

Interaction design is a new craft that is different from both graphics design and computer science. The conclusions from the study have some important consequences for the training of interaction designers. In addition to competence in visual design and programming, an interaction designer should have a well-developed sense-for-the-interactive-quality-of-things. This sense includes appreciation for detail in interactive artifacts, and training in kinesthetic thinking. Such training should probably include exercises to develop the body consciousness of the designer, as the root of the skill can be found here. The empathic ability to interpret correctly the interactive experience of others, from usability tests and observations, should also benefit from an advanced sense-for-the-interactive-quality-of-things.

Tools for supporting kinesthetic thinking in interaction design

Current tools for constructing interactive behavior rely on a decomposition of the behavior into discrete events and actions. Construction is, furthermore, always done through representations of the behavior. A focus on kinesthetic thinking opens up for allowing the interaction designer to work in a more direct fashion with building blocks having inherent interactive behavior. Both Experiments B and C and the prototype editor presented in Chapter 12.5 show that it is possible to design tools that enable the designer to work in a modus operandi closer to kinesthetic thinking. This shift from designing through representations to working directly with the product is similar to the shift in visual design on computers that came with the WYSIWYG principle.

A methodology for analyzing new technologies

The experimental research methodology was here used to investigate interactivity in GUI-like artifacts. The results from this analysis cannot directly be applied to new technologies going beyond the current PC paradigm, but the research methodology can be re-applied. Chapter 13 gives an example of one such analysis. It shows how the formal/experimental methodology developed for this study can be used in the analysis of the experienced ontology of a new class of technologies loosely referred to as “Tangible User Interfaces” (TUIs).

14.3 Future Work

New tools

The prototype editors developed show a possible new direction for design tools. This potential should be investigated further by trying to go beyond the simple pixel-level problems tackled so far. One important question is to what extent the ideas presented here scale to real-life design tools. The tools should be developed in close cooperation with a potential user group, preferably experienced interaction designers.

Theory development

It is to me an open question how the psychology of Kinesthetic Thinking should be studied further. From a psychological point of view, the nature of Interaction Gestalts and Kinesthetic Thinking is only pointed to here. How should the experiments be set up, and how should the results be interpreted? Will it be possible to identify simple gestalt principles in the kinesthetic domain similar to those in the visual domain?
If so, will these principles be as applicable to interaction design as the gestalt principles of the visual domain and the insights about visual thinking have been to graphical design?

Methodology development and the study of other technologies

The early analysis of TUIs indicates a direction of research involving new experiments, new “abstract” artifacts, and the refinement of the research methodology. TUIs, or in a broader sense “Mobile Computing”, are a good first candidate for new experiments. This research should be cross-disciplinary, involving contributors from at least experimental psychology, computer science, industrial design, electronics design, and philosophy. The promising results from involving artists in interaction design (see Crampton-Smith and Tabor, 1996) indicate that the list should also include art.

Curriculum development

One of the main conclusions from the current study relates to the training of interaction designers. Both curriculum development and empirical follow-up studies are necessary to verify these conclusions. In an invited ACM paper on the next 50 years of computing, Terry Winograd (1997) predicts that Interaction Design will evolve to become one of the most important fields of computer science. By its cross-disciplinary nature, it will change computer science as we know it today. From this perspective, curriculum development in interaction design is worthwhile. In the same paper, Winograd describes the current situation in Interaction Design and gives directions for change:

“A striking example at the time of this writing is the chaotic state of "web page design". The very name is misleading, in that it suggests that the World Wide Web is a collection of "pages," and therefore that the relevant expertise is that of the graphic designer or information designer. But the "page" today is often much less like a printed page than a graphic user interface -- not something to look at, but something to interact with. The page designer needs to be a programmer with a mastery of computing techniques and programming languages such as Java. Yet, something more is missing in the gap between people trained in graphic arts and people trained in programming. Neither group is really trained in understanding interaction as a core phenomenon. They know how to build programs and they know how to lay out text and graphics, but there is not yet a professional body of knowledge that underlies the design of effective interactions between people and machines and among people using machines. With the emergence of interaction design in the coming decades, we will provide the foundation for the "page designers" of the future to master the principles and complexities of interaction and interactive spaces.” (p. 162)

Hopefully, the present study’s focus on “understanding interaction as a core phenomenon” can help fill this gap in Interaction Design between “people trained in graphic arts and people trained in programming”.
