Wednesday, March 28, 2012

Wrapping up things related to working memory


The new definition of working memory
In their seminal work on individual differences in working memory capacity, Just and Carpenter (1992) redefine the concept by borrowing from a computational theory. They posit that both information storage and processing are supported by the very same property, named “activation”.

When we take in new information from computations, each element is associated with an activation level. An element can be a word, a sentence, or a physical object such as a cat.

During comprehension, only elements whose activation level exceeds a threshold value can enter working memory; in other words, not all candidates can be further processed by the brain, even if they are successfully retrieved from long-term memory.

However, people differ from each other with respect to their working memory capacity, which in this theory is seen as the total amount of activation that the system can sustain.

Therefore, if at any moment the sum is about to exceed the system's limit, certain chunks of old information must be de-allocated in order to accommodate incoming data and computations.
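To make the capacity idea concrete, here is a toy sketch in Python; the threshold, the capacity value, and the elements are my own invented numbers, not values from Just and Carpenter:

```python
# Toy sketch of the capacity idea (my own simplification): each element
# carries an activation level; only elements above a threshold enter
# working memory, and total activation is capped by capacity.

THRESHOLD = 0.5   # hypothetical minimum activation to enter working memory
CAPACITY = 3.0    # hypothetical total activation the system can sustain

def admit(elements, threshold=THRESHOLD, capacity=CAPACITY):
    """Admit (name, activation) pairs into working memory, de-allocating
    the oldest chunks when total activation would exceed capacity."""
    memory = []
    total = 0.0
    for name, activation in elements:
        if activation < threshold:
            continue              # never enters working memory
        memory.append((name, activation))
        total += activation
        while total > capacity:   # over the limit: drop the oldest chunk
            _, old_act = memory.pop(0)
            total -= old_act
    return memory

held = admit([("cat", 0.9), ("sentence", 0.3), ("word", 1.2),
              ("clause", 1.1), ("object", 0.8)])
print(held)   # [('clause', 1.1), ('object', 0.8)]
```

Here "sentence" never clears the threshold, and "cat" and "word" are de-allocated once later elements push the total over the cap, which mirrors the de-allocation described above.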

As one of the fundamental concepts in cognitive science, activation often appears in other literature as well.

Complementary learning systems
The hippocampal (HC) system is active in receiving information and temporarily storing it, which means that novel material cannot be truly learned immediately.

The neocortical systems incorporate the new information gradually, as any quick absorption could be detrimental to existing structures of knowledge.

The novel input needs to go through various consolidation stages to gain membership in the neocortical systems. Appropriate external events can serve as good opportunities and will help adjust neocortical connections.

Usually, the incorporation of unfamiliar material is slow, especially for arbitrary and idiosyncratic material. Thus, as memory traces degrade over time, it is possible to lose them before they can be built into the shared structures of the neocortical systems.

The complementary learning systems are more concerned with information storage and maintenance than with information processing.

However, the weight changes of neural connections may also be tied to “activation”: without appropriate activation, the connections are neither reconstructed nor established.

Although McClelland (McClelland, McNaughton, & O'Reilly, 1995) does not explicitly discuss how the incorporation rate relates to working memory capacity, the concept of “learning rate” may be a potential factor affected by the level of activation.
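The contrast in incorporation speed can be caricatured numerically; the learning rates below are my own illustrative assumptions, not parameters from the paper:

```python
# Toy contrast between a fast hippocampal learner and a slow neocortical
# learner, each nudging a connection weight toward a new target value.
# Rates and step counts are invented for illustration.

def learn(weight, target, rate, steps):
    """Move a connection weight toward `target` by fraction `rate` per step."""
    for _ in range(steps):
        weight += rate * (target - weight)
    return weight

hc_weight     = learn(0.0, 1.0, rate=0.5, steps=3)   # fast: close to target
cortex_weight = learn(0.0, 1.0, rate=0.05, steps=3)  # slow: barely moved
print(round(hc_weight, 3), round(cortex_weight, 3))  # 0.875 0.143
```

After the same three exposures, the fast learner has nearly absorbed the new material while the slow one has hardly moved, which is the asymmetry the complementary-systems story relies on.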

Metaphor
Metaphor is fairly common in linguistic practice: when a person wants to express a certain idea through an utterance, the hearer can often detect the intended meaning rather than the literal meaning of the sentence.

If “S is P” actually refers to “S is R”, the comparison theory attempts to address the question of how we can compute the potential values of R; the interaction theory, on the other hand, tries to look into how the range of R’s values is restricted by the relationship between S and P.

To make this communication possible, the speaker and hearer must have something in common, such as the principles for interpreting utterances.

Searle provides eight principles and suggests the methods by which the utterance of the P term calls to mind the meaning of the R term in ways pertaining to metaphor. For example, one principle concerns human sensibility, and it is applicable to people from multiple cultures, as it is naturally determined.

From the perspective of working memory capacity, the task of locating possible links between S and P, or of narrowing the scope of R values, relies heavily on the activation of each property.

I maintain that the task of interpreting metaphor is less likely to challenge us, in that there are normally three base elements to begin with, unless the range of R is quite large or the “overlapping” attributes of S and P are too many.

In fact, an interesting question here is: which principles consume more working memory resources? The seemingly more complex principle is not necessarily the one that requires a larger “brain”; it also depends on other constructs such as the hearer’s knowledge base in a specific domain.

Categorical Perception
The interplay between perceptual information and high-level knowledge of a particular object is another universal cognitive process. There are basically four rules that people use to group objects.

While prototypes allow us to bring in more attributes to formulate semi-stereotypical representations, exemplar models are stored stably in memory and appear in a highly abstracted form.

The processes of establishing prototypes and exemplars differ in light of McClelland’s theory of complementary learning systems. Specifically, building a new prototype may take longer than modeling an exemplar, as the former has to undergo a systematic procedure to fit within existing structures shaped by other surviving prototypes.

From the standpoint of working memory, capacity should predict the efficiency of categorizing a perceived object well.

Nevertheless, it may not directly determine how soon the new token will reach the neocortical systems.

In the context of metaphor, in addition to referencing currently stored categories for the purpose of producing the metaphorical meanings of an utterance, we can also use the outcomes to re-categorize an object.

Cognitive Breakdowns
Cognitive breakdowns appear in multiple forms: for example, loss of balance, difficulty focusing, or lack of reasoning capabilities.

Working memory is one major component affected; for instance, some patients need to constantly re-chunk information in order to continue searching for the answer to a simple inquiry.

If the brain faces too much cognitive load, certain parts of it will cease functioning. From that perspective, wise use of limited working memory resources is recommended for the subjects concerned.

The Representation of Personality in the Affective Reasoner
The situation is presented in frames, and frame matching demands sufficient resources from working memory.

The computations on situation frames are supposed to generate emotion. Working memory is crucial for instantiating this intensive process on left hemisphere (LHS).

An emotion becomes available only if it is over an intensity threshold, which is determined by bindings on the LHS. Working memory contributes to the binding process.

Some Implications for Design
First, designers should realize that users differ in their ability to store and process information; thus they should not be self-centered as the solution provider.

Second, designers must avoid solutions that are too novel; although such a solution may surprise users with its creativity, it may disrupt existing usage patterns that users hold for other tasks at that moment.

Third, providing a “buffer zone” for chunking and re-chunking is recommended, as is “recognition rather than recall”. Methods such as these reduce users’ cognitive load, thus decreasing the chances of overloading working memory.

Fourth, metaphor is equally important, in that it makes it relatively easier for users to figure out the underlying meaning of design elements such as a button or a gesture.

Furthermore, the call for “universal design” applies not only to people with physical disabilities but also to patients with brain damage at various levels.

Last but not least, if by definition the title of “User Experience” caters to users’ emotional responses, designers are expected to understand a little about the principles of emotion generation.
 
(© 2012 Miaoqi Zhu)

Tuesday, March 6, 2012

What if?

I just had this "crazy" thought:

What if each neuron represented a planet, and each neural connection a galaxy? We may just live in our brains...

One of my friends responded with his own understanding. He said, "we may live in other beings' brains." However, if I see "you", you are being perceived by me; in that sense, it is not implausible to say that you are just "living" in the world of my brain.

The question may need to be answered by philosophers, and it is difficult to prove or disprove anything by science, although it could be possible someday. 

(© 2012 Miaoqi Zhu)

Saturday, March 3, 2012

Complementary Learning Systems, Dreams


The motivation for this post stems from my recent reading of McClelland's complementary learning systems paper on the hippocampus (HC) and neocortex. From a certain perspective, this paper is also a kind of survey. In particular, the authors smartly put relevant theories together to demonstrate their own idea, which is: "the HC provides training trials, allowing the cortical system to select representations for itself through interleaved learning." The paper, of course, starts with two key questions of interest:
  1. Why is the HC system needed? If the neocortical system has more neurons, what do we need the HC for?
  2. Why do the changes in neocortical connections take so long? In other words, can new material be fully absorbed rapidly?
It looks like the HC system is responsible for storing recent memory, while the neocortex holds remote memory. To avoid interference with the knowledge stored in the neocortex, the HC helps accommodate the initial storage from a new learning event; meanwhile, so as not to disturb the existing structure of knowledge, changes within the neocortex should be made slowly. There are certainly “communications” between the HC and the neocortex; otherwise, no information would be transferred and no changes would be made.

Marr (1971) proposed that the HC system stores experience during the day and replays the memories in the HC back to the neocortex at night. Does it imply that brain activities at night, for example dreams, are caused by consolidation (or the opposite)?

A researcher in this domain indicates that dreams can be emotional, as we replay old memories and update them with novel information from recent events. Relating this to personal experience, I agree, but I have become more interested in the functionality of the verb “update”. Specifically, why does the brain need to update, and in which sub-systems does it happen?

So a ROUGH idea here: if dreams happen partially owing to the connection changes (learning) in the neocortical system driven by the HC’s input, let us contemplate some questions below:
  1. Is it because too many computations are requested by day-time tasks, so we had better carry this out at night?
  2. Why should it appear in the form of visual imagery? Does it facilitate the learning process?
  3. Why can't we dream every single night? If dreaming stops, does it mean dreams are not caused by learning in the HC and neocortical systems? Perhaps dreaming causes learning, not the opposite.
  4. The memory trace in the HC will decay; if we have an exceptional set of material, will it take higher priority to be incorporated into the existing knowledge structure during this precious period of dreaming?
  5. What is the implication for HCI? Could we design novel interfaces that “encourage” people to dream, so users learn them faster? This question may be addressed once we know exactly why people dream.

Reference:

Web article: http://www.livestrong.com/article/78256-parts-brain-produce-dreams/

Marr, D. (1971). Simple memory: A theory for archicortex. The Philosophical Transactions of the Royal Society of London, 262(Series B), 23–81.

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (January 01, 1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 3, 419-57.

(© 2012 Miaoqi Zhu)

Monday, February 20, 2012

Activation and direct emotion generation


If by definition the title of “User Experience” indicates catering to users' emotional responses, “activation” can help in understanding the principles of emotion generation. In his ground-breaking work on affective reasoning, Elliott (1992) summarizes emotion generation based on his previous studies. Taking direct emotions as an illustration: when an agent perceives a potential relevance to a situation, frames including the goals, standards, and preferences (GSP) are matched against the eliciting situation frame. If a match is established, the situation officially becomes the agent’s concern in the form of construals. Working memory determines the matching success rate, in that the agent must retrieve information for the GSP. Bindings take place in the left hemisphere (LHS) when the situation frame slots marry the variables in the slots of the construal frame from the previous phase; then an emotion eliciting condition relation (EECR) is created for each construal of that situation. Because there may be many interpretations of a situation, there are multiple EECRs to be confirmed individually and gathered together by the involved agent. In the following step, compound-emotion EECRs are formed from the separation and recombination of the event-based construals and the attribution-based construals. The prospect results work with the domain-independent rules to generate an emotion instance such as hope. From my understanding, a basic underlying activity is activation: for instance, the matching process depends on working memory, and we covered the meaning of activation for working memory earlier. The same story may apply to binding as well, because there are intensive computations to analyze and integrate frame slots from two different sources.

For instance, if a scenario tells us that a group of users is likely to be frustrated under a certain circumstance, what can a designer do to make users feel more at ease emotionally? We can manipulate some variables to suppress activation in working memory. Imagine a busy mom standing in front of an airline kiosk with two playful little boys; she may have already developed a prospect-based emotion of fear, as she can construe the situation based on existing frame slots gained from recent experience in relation to her goals, standards, and preferences. In this case, designers can either employ a quick interface walk-through demo as a light tutorial while users are still in waiting areas, or adjust environmental variables such as lighting and private space, as long as those measures help diminish the chance of activation.



Figure 2: the mapping structure from situation to emotion (Elliot 1992)

I agree that the above theoretical framework is hard to validate, but it is still a good model to consider. As HCI practitioners, we do not have to be "addicted" to a particular theory, or over-criticize it. A well-shaped scientific mind is good for understanding design solutions as inputs to the human brain.

Reference:
Elliott, C. D. (1992). The affective reasoner: A process model of emotions in a multi-agent system. Evanston, IL: Northwestern University.

(© 2012 Miaoqi Zhu)

Friday, February 10, 2012

When the reality tells me the "feeling" is wrong


I was on the north-bound train, almost approaching my destination, when an idea suddenly came to my mind: why did I feel the train was heading in a different direction this time compared to when I boarded?

There is no doubt that the train was heading north; otherwise, I would never have gotten home. With the assumption in mind that the train was right, I began recalling the scenario in which I boarded at Adams/Wabash. This does not sound like a very difficult task; however, a few things make it more complicated than expected:
 
  1. The stop where I departed is underground, so there are few clues for me to get a sense of where the train is going. In other words, the sort of survey knowledge I established before cannot take effect here, at least for me;
  2. The entry to a particular station: specifically, for stations with a centered platform, there are typically two entries located on the two sides. If a passenger going north enters from the north gate, then, since there are two floors, he or she needs to turn around to catch the northbound train once reaching the platform, and vice versa;
  3. The seats face two directions: one aligned with the direction the train is traveling, the other opposite. That said, there could be another couple of mental rotations to carry out;
  4. The train's doors open on either side depending on the design of the station platform.

While the first factor is relatively independent, the other three in fact interact. Imagine you are heading north: you first enter the station from the south gate, which means you don't have to turn around; fortunately, when the train arrives, after boarding through the right-side door, you decide to use a seat on the right side that faces the train's direction of travel; in this case, you may find it less effortful to recognize the direction. Nevertheless, if you get into the train through the left door and decide to sit on the left side where the seats face backward, it might add extra work in terms of figuring out where you are going. If you start from an above-ground station, the puzzle could be even easier to resolve, because you have a number of visual clues to reference, such as landmarks.

Having listed the possible noises preventing me from feeling right about the direction, I continued recalling each scene, from entering the station to boarding the train. Since I am more of a visual thinker, I enjoy playing back every single memory frame. The first challenge was presented by the internal structure of the station; it is important in that the number of turns, along with their directions, co-determines my actual orientation in the world. I resolved this problem by putting myself in an imaginary blueprint of the station and re-experiencing the journey virtually in my head. Then the second challenge followed, because I had to map the spatial relationship between my seat and the door through which I got in. Be aware that each car has two doors on one side, which means that if I sat closer to the door where I boarded, it would be easier; but if I chose a seat further away, or if I didn't remember which door I entered, it would be another story.

Finally, I got the “feeling” corrected by reasoning about the spatial relationships among the objects I was able to recall. Further, playing back the scenario helped in structuring the space I was in before. A big question here is: why did I lose the feeling of direction in the first place? While the train was operating underground, the feeling was alright; yet once the train drove out of the tunnel, I became a little anxious, since my intuition told me that the feeling was incorrect. I was also surprised by how many chunks of information and how much attention I employed to get that feeling fixed, let alone the intensive computations occurring in my head. Every time I spared a little attention for other activities (e.g., talking to friends), I might need to restart the process, because a sub-process generates so much data that needs to be temporarily stored in working memory for the next thread of computation(s).


(© 2012 Miaoqi Zhu)

Sunday, February 5, 2012

Activation


When I was in the “flow” state of reading Anderson’s paper “ACT: A Simple Theory of Complex Cognition”, I could not help referring back to Just and Carpenter’s seminal work on the capacity theory of comprehension. I wonder whether there is a connection between the two theories, namely the “activation level”.

First of all, just a little recap of Anderson’s paper: he tries to understand the basic components and processing principles of our cognition. By studying how people write recursive programs, he claims that there are two kinds of elements: productions (procedural knowledge) and long-term chunks (declarative knowledge). He then further explains “Knowledge Acquisition” and “Knowledge Deployment”.

For “Knowledge Acquisition”, long-term chunks come from encoding the environment, and to transform those chunks, ACT-R looks for existing chunks to map onto. For “Knowledge Deployment”, the author answers the question of how humans select the most appropriate knowledge for a particular context. Based on a rational analysis that “knowledge is made available according to its odds of being used in a particular context, activation process implicitly performs a Bayesian inference in calculating these odds”, he derives a basic equation:


Activation_Level = Base_Level + Contextual_Priming

Anderson further illustrates this equation with three domains: memory, categorization, and problem solving. It looks to me that the way we interpret “contextual priming” differs slightly across these three domains. For instance, for memory, it is the association between chunk n and chunk m; for problem solving, it is the effect of the distance to the goal that participants set up.
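A minimal sketch of the equation in Python; the base levels and the association table here are invented for illustration, and none of these numbers come from Anderson:

```python
# Sketch of Activation_Level = Base_Level + Contextual_Priming.
# Base levels and association strengths are illustrative assumptions.

base_level = {"chunk_n": 1.0, "chunk_m": 0.2}

# Contextual priming: summed association strength from the currently
# attended context elements to the candidate chunk.
association = {
    ("context_a", "chunk_n"): 0.8,
    ("context_a", "chunk_m"): 0.1,
}

def activation(chunk, context):
    priming = sum(association.get((c, chunk), 0.0) for c in context)
    return base_level[chunk] + priming

# With "context_a" in focus, chunk_n is far more available than chunk_m.
print(activation("chunk_n", ["context_a"]))            # 1.8
print(round(activation("chunk_m", ["context_a"]), 2))  # 0.3
```

Swapping in a different context list changes only the priming term, which is the Bayesian-odds intuition the quote above describes: the base level reflects overall usefulness, and priming reflects the current context.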

Going back to Just and Carpenter’s paper: they “redefine” the concept of working memory by presenting a computational theory which suggests that both storage and processing are fueled by an identical property called “activation”. More specifically, each element (e.g., a word, a phrase, an object from the real world) carries an associated activation level. During a course of understanding, relevant chunks are activated either by a computation or from long-term memory; however, not all of them can enter working memory; only those that meet a certain minimum threshold gain permission. As long as the total amount of activation is within the system’s limit, we can process the information; but if the sum exceeds the limit, we need to de-allocate some old elements. It is not hard to get the idea if you picture a scenario in which you try to understand a difficult sentence with several embedded clauses. Some people with a large working memory capacity may understand it more quickly than those with a small capacity.

You may notice that activation level is a common thread in both works. If Anderson is right, can he help explain where the activation level comes from in Just and Carpenter’s theory? Just and Carpenter assess working memory capacity using the “reading span” task, while Anderson addresses his curiosity by studying people writing recursive programs, and his idea was published a few years later. I am just wondering: if we swapped their methodologies, would we still see this common thread?



Reference:

Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355-365.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122-149.

(© 2012 Miaoqi Zhu)

Friday, February 3, 2012

Sing a song


I have kept asking myself this question: why does it feel like it takes just a little effort to remember an entire set of lyrics once I recall the first word or first few words?

While reading the book by Dr. Lawrence Barsalou, specifically Chapter 6 on “long-term memory” encoding, I found the above question partially addressed. I admit that a person may listen to a song many times, basically because she or he very much enjoys it for some reason. In that sense, the amount of processing of the lyrics is improved by presentation duration, chances to rehearse, and number of presentations. These three variables are found to be significant factors determining the quality of information processing.

What appears more interesting to me is the sort of elaboration that contributes to data encoding. First, incidental versus intentional learning. Human beings are in fact ready to encode information without "realizing" it. Although there is a benefit in trying to remember something, because it may increase the number of rehearsals, it is not fair to say that information cannot be learned well just because people are not induced to do so. When we are listening to a favorite song, of course, it is fine to try hard to remember the lyrics word by word; but think about it again: how many times have you found yourself singing the song perfectly without ever intending to memorize it?

Second, the depth of information processing. If you are given a stimulus such as a car engine part that you have never seen before, how can you store it? Ideally, you may develop a set of characteristics pertaining to that stimulus, and as those characteristics become more conceptual, the stimulus will be remembered better. In fact, when you try to recall that stimulus later, the previously generated conceptual information will be activated and retrieved as well. Applying this point to my question: each word in the lyrics relates to another, and together they form a meaning that the musicians want to convey. I can say I still easily recall a song from a decade ago, as it is my favorite one from my favorite show. When I retrieve the lyrics, I can still perceive their meaning for the show. In addition, I wonder whether the rhythm helps extend the depth; although it is a different type of information, the phonological loop could be affected by it, as we sing out individual words with their sounds.

Third, imagery: in other words, you can visualize the information, which may produce more conceptual information for elaboration. For example, if you are asked to remember “subway airport”, you can picture a scene in which “a CTA Blue Line subway is heading to Chicago O’Hare airport.” The decorations attached to the original material do not have to be that rich, yet we see an improvement in remembering. Again, going back to my question of interest: as I said before, I am able to play back a portion of the show (e.g., the main characters getting back together), as the lyrics correspond to the story quite well; meanwhile, some words can be mapped to real objects (e.g., a river, a horse) from the show.

The reason my question is only partially addressed at this point is this: are the lyrics stored as a whole in one chunk, or divided into several chunks with some special “linkage”? Because when we succeed in recalling or recognizing the first word or two, the rest usually flows out like a river. Perhaps people will say it depends on the individual's capacity and strategy for encoding information. :)


Reference:

Barsalou, L. W. (1992). Cognitive Psychology: An Overview for Cognitive Scientists. Hillsdale, NJ: Lawrence Erlbaum Associates. ISBN 0-89859-966-0.


(© 2012 Miaoqi Zhu)

A few words from many years ago


I was intrigued by a comment from my mom’s colleague when I was in China last month, about five years since I last saw her. She could still remember a joke I told over 10 years ago. Although she was trying to make fun of me by quoting that phrase, I started contemplating a question raised by the instance: why can a human being recall a tiny piece of information about a person after such a long time?

After reading the capacity paper (Just & Carpenter, 1992), I am aware of its theory of activation in information processing and storage; but when it comes to retrieving information from long-term memory, setting aside the working memory capacity differences between low-span and high-span people, does each element/item in long-term memory carry an activation level? If so, will that activation also mediate the process and thus affect the outcomes?

I have come across three models pertaining to information retrieval from long-term memory, and it seems that “matching” is a keyword for all of them. For example, imagine that the information-seeking process resembles searching for your bags at luggage claim: you examine each item for a match until the first piece is found, and then the entire process starts over. There is also a model called Resonance Retrieval Theory that treats information as a vector whose elements represent different conceptual subjects.

Let me propose this: people may tend to organize or categorize long-term chunks according to a specific object such as a pet, a friend, etc. Each item appears as a vector with an equal number of attributes, as long as the person puts them into the identical category.

Element_Category_Human_Male_001111 = { A[0], A[1], A[2], …, A[n-3], A[n-2], A[n-1] }

An attribute can be an individual element such as hair color, relationship status, education degree, etc. However, attributes are assigned different weights as the product of the external stimulus and the person’s strategy for encoding information. There is a pointer that always waits at the first attribute, A[0], while the first position is reserved for whichever element bears the most substantial weight. That being said, the weights are changeable, because any attribute can be strengthened or weakened by various events, and this process may be done by an unknown internal mechanism. Thus, when working memory calls for information activated by retrieving and decoding items/elements from long-term memory, the weight of the candidate may affect the efficiency of the process, and I would like to name that weight the activation level of each information unit in long-term memory.
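The proposed structure might look like this in code; the attribute names, weights, and category label below are entirely hypothetical, chosen only to make the idea concrete:

```python
# Hypothetical sketch of one long-term memory item as a weighted attribute
# vector, with the heaviest attribute kept at A[0] where the pointer waits.

def reorder(attributes):
    """Place the most heavily weighted attribute first (at A[0])."""
    return sorted(attributes, key=lambda a: a[1], reverse=True)

def strengthen(attributes, name, delta):
    """An event strengthens (or weakens) one attribute, then reorders."""
    updated = [(n, w + delta) if n == name else (n, w) for n, w in attributes]
    return reorder(updated)

# One item in the hypothetical category "Human_Male_001111"
item = reorder([("hair_color", 0.2), ("education", 0.5), ("funny_joke", 0.4)])
item = strengthen(item, "funny_joke", 0.3)   # the joke is retold years later
print(item[0][0])   # the pointer at A[0] now finds "funny_joke"
```

In this toy version, retelling the joke boosts that attribute's weight past the others, so the next retrieval reaches it first, which is one way to read the colleague-and-joke anecdote above.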

The above model/process is just a "random" thought of mine; I look forward to reading more literature to see whether it is "correct" or the opposite. :)

Reference:
Wickens, C. D. (1980). The structure of attentional resources. In R. Nickerson (Ed.), Attention and Performance VIII (pp. 239-257). Hillsdale, NJ: Lawrence Erlbaum.


(© 2012 Miaoqi Zhu)

Tuesday, January 31, 2012

My question about GPS UI

My question here deals with interface design for the Global Positioning System (GPS). As we may be aware, a GPS device usually provides three features: 1. an audio system broadcasting instantly when the direction changes; 2. textual information showing instructions; 3. a continuously updated map occupying a large portion of the LCD display. Based on Baddeley’s (1986) working memory model, and as Wickens et al. (1983, 1984) argued before, the display format should fit the working memory subsystems used to perform the task. This implication triggers my interest in the two coarsely defined inquiries below:

  • Which feature is most helpful for people driving in an unfamiliar area?
  • If that feature is found, will its effect be improved or diminished by adding additional feature(s)?
What people hear from the audio will be encoded in the phonological store, as will the textual data presented to drivers; at the same time, the spatial information from the map will go to the visuospatial sketchpad. The driving routes may update at varied intervals depending on the characteristics of the area, so people have to receive new information from time to time while correctly following the old instructions. Although the verbal information will eventually reach the visuospatial sketchpad, the central executive needs to transfer, synthesize, and analyze information. These activities add cognitive load that constantly competes for available attention resources. In that sense, from the perspective of reducing cognitive load while satisfying working memory’s needs, we may just need the map, or a combination of two features; perhaps the presence of a certain feature will diminish the benefits of other feature(s).


Another aspect of this problem pertains to human information processing, specifically attention. In a complex environment like driving, we are watching the traffic, listening to the radio or GPS instructions, controlling the speed, and probably talking on the phone. Thus, we are already using three attention resources, namely sight, touch, and hearing. Furthermore, discussions of selective attention mostly concern visual perception, where attention is basically the product of four factors: salience, effort, expectancy, and value (Wickens et al., 2003).


To explain each of them in order: first, salience means that a unique feature may attract you right away from other less salient objects, for instance, a horse running on the highway among cars. Typically, an auditory signal is more attention-grabbing than a visual one, which is why we often take action upon hearing an alarm; if the sound were replaced by a flashing LED light, we can imagine how long it would take us to notice a potentially miserable event. The second variable is expectancy, defined as the knowledge of the probable time and location at which information becomes available. Imagine that you are driving in the mountains with lots of sharp turns: you should keep an eye on the curvy road more continuously when you are traveling fast. On the other hand, if you are just cruising around Chicago on Lake Shore Drive, most of the road is straight, so you can relax a little and attend to your music more than to the road. The third factor is the value of the information.


Let us illustrate this concept with the same driving example: it makes sense for people to look forward most of the time, because you want to be quick to take action whenever you see something occurring. With the second and third factors introduced, there is actually an interaction between them: if we multiply expectancy by value, we obtain a better function of selective attention, which can be made concrete this way: an experienced air traffic controller at a certain airport knows where to scan most, and when. Last but not least, effort. As we know, most people can only pay attention to one thing at a time; if you focus on the mirror while trying to change lanes, you probably cannot share that effort with looking forward, because doing both simply cannot be achieved by the majority of human beings. Hence, effort enters as a negative term, since allocating attention to multiple places may degrade operating performance.


To summarize the message of selective attention, researchers often refer to the following model:

P(A) = sS − efEF + exEX · vV (Wickens et al., 2003)
(S is salience, EF is effort, EX is expectancy, and V is value; s, ef, ex, and v are their respective weights)
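As a rough sketch, the model can be written as a small function. The weights and factor scores below are hypothetical values I made up for the GPS scenario, not numbers from Wickens et al.; the formula follows the expression exactly as written above, with the expectancy and value terms multiplied.

```python
def attention_probability(S, EF, EX, V, s=1.0, ef=1.0, ex=1.0, v=1.0):
    """P(A) = s*S - ef*EF + (ex*EX) * (v*V), as in the model above.

    S, EF, EX, V are factor scores (salience, effort, expectancy, value);
    s, ef, ex, v are their weights. All values here are illustrative only.
    """
    return s * S - ef * EF + (ex * EX) * (v * V)

# Hypothetical scores: a GPS audio prompt is salient and low-effort,
# while on-screen text is less salient and costs more effort to read.
audio_prompt = attention_probability(S=0.9, EF=0.2, EX=0.6, V=0.8)
screen_text  = attention_probability(S=0.5, EF=0.7, EX=0.6, V=0.8)
```

Under these made-up scores, the audio prompt comes out ahead, which matches the intuition that audio is the more attention-grabbing channel while driving.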

Applying this model to our question of interest: in theory, GPS audio instructions are more salient than visual ones; meanwhile, receiving auditory information usually costs less effort than reading information from the screen, even though the GPS is attached to the windshield. However, this advantage may reach its margin in a complex area such as a six-way intersection. In that scenario, visual feedback is better, because otherwise our central executive has to consume more attention resources: the verbal information needs to be rehearsed by the articulatory loop and then converted to visual data for further processing. As for value and expectancy, they vary with the route and are thus hard to predict. Collectively speaking, the problem domain is very interesting to look into, and we may even resort to AI to help us determine which method is more effective and safe.


To address those two questions, an experiment may be required. In addition, a set of independent and dependent variables should be defined along with a reasonable measurement. Without going into too much detail, I think the independent variable would be categorical (e.g., audio, text, and map), and the dependent variable could be “helpfulness”, gauged by “time to drive to destination”, “level of satisfaction”, and “fuel consumed”, with each element carrying a different weight. Out of concern for safety, the study could also be conducted virtually in a lab, given that many driving training games are available; additionally, we can control possible confounding factors relatively more easily under lab conditions. In terms of the analysis method, ANOVA would be optimal, because three types of treatments are involved and we aim to find out whether there are real differences among them (statistics such as the F-value need to be reported).
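To make the analysis concrete, here is a minimal sketch of the one-way ANOVA F computation in pure Python. The times-to-destination (in minutes) for the three treatment groups are entirely made up for illustration; in practice one would use a statistics package rather than hand-rolling this.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (treatment effect)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares (error)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)        # df_between = k - 1
    ms_within = ss_within / (n - k)          # df_within = n - k
    return ms_between / ms_within

# Hypothetical minutes to destination under each GPS feature condition:
audio = [22.1, 24.3, 23.8, 25.0, 22.9]
text  = [26.4, 27.9, 25.5, 28.2, 26.8]
gmap  = [21.0, 20.4, 22.3, 19.8, 21.5]
f_value = one_way_anova_f([audio, text, gmap])
```

With these invented data the F value far exceeds the critical value for (2, 12) degrees of freedom, so we would reject the hypothesis that the three features are equally helpful.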


Reference:

Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.

Baddeley, A. D. (1992). Working memory. Science, 255, 556-559.

Wickens, C. D., and Carswell, C. M. (2006). Information Processing. In Salvendy, G. (Ed.), Handbook of Human Factors and Ergonomics, 3rd Edition, 111-149, Hoboken, NJ: Wiley.

Wickens, C. (1980). The structure of attentional resource. In R. S. Nickerson (ed.), Attention and Performance VIII, 239-257, Hillsdale, NJ: Lawrence Erlbaum.

Wickens, C. D., Vidulich, M., & ILLINOIS UNIV AT URBANA ELECTRO-PHYSICS LAB. (1982). S-C-R Compatibility and Dual Task Performance in Two Complex Information Processing Tasks: Threat Evaluation and Fault Diagnosis. Ft. Belvoir: Defense Technical Information Center.

(© 2012 Miaoqi Zhu)

Sunday, January 29, 2012

From Quantum Computing to HCI

What is quantum computing?

At this point in time, we still rely on digital computing, which basically runs on one simple principle: “the bit is on, or the bit is off”. These bits are grouped into larger structures such as bytes and words, which can represent other types of information like sounds and animations. Although scientists and engineers are making every possible effort to improve the computational density of a regular personal computer, it still cannot match a human brain. You may ask why.

The human brain has about 100 billion neurons. With an estimated average of 1,000 connections between each neuron and its neighbors, we have nearly 100 trillion connections, each capable of a simultaneous calculation. That is a massively parallel processing capability, which is one of the most critical strengths of human thinking.

On the other hand, the weakness is the slow pace of human neural circuitry: about 200 calculations per second. However, with 100 trillion connections, each computing at 200 calculations per second, we get an unbelievable capacity. Moreover, we do not need to worry about running out of memory: as long as you are willing to spend time rehearsing information, it will go into long-term memory, since we have around 100 trillion connections to store it in.

Since neural-net emulations benefit from both strands of the acceleration of computational power, this capacity will double every twelve months. Thus by the year 2020, it will have doubled about twenty-three times, resulting in a speed of about 20 million billion neural-connection calculations per second, roughly equal to the human brain. Besides the help from parallel processing, memory is important as well; nowadays, memory circuits are doubling their capacity every 18 months.
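These back-of-envelope numbers can be checked directly. They are Kurzweil-style estimates, not measurements, and the implied starting capacity is simply what twenty-three doublings back from the target would require:

```python
neurons = 100e9                        # ~100 billion neurons
connections = neurons * 1000           # ~100 trillion connections
calcs_per_second = connections * 200   # each connection at ~200 calc/s
# 2e16 calc/s = "20 million billion" neural-connection calculations per second

doublings = 23                         # capacity doubling once per year
baseline = calcs_per_second / 2 ** doublings  # implied late-1990s starting point
```

So the 100-trillion-connection, 200-calculation figure does land on 2×10^16 calculations per second, and twenty-three annual doublings from a baseline of roughly 2.4×10^9 calculations per second reach it.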

Collectively speaking, as implied by the Law of Accelerating Returns, the exponential growth of computing suggests that in the near future a machine’s thinking power will surpass ours. (I refer to a regular personal computer, not a supercomputer.)

Unlike digital computing, quantum computing has “qubits”, which are zero and one at the same time. The state of a particle stays “ambivalent” until a process of disambiguation forces it to choose which position to stand at. For example, suppose a stream of photons is about to hit a lake’s surface at a 45-degree angle. Each individual photon has to determine whether to bounce off the surface at the same angle or “dive” into the water. In other words, those photons stay “lazy” until the process pushes them to commit to only one path. The same principles, the argument goes, apply to carbon-based neurons.

A series of qubits represents every possible solution to a problem simultaneously. For instance, a single qubit represents two possible solutions; two linked qubits represent four possible answers. A quantum computer with 1,000 qubits represents 2^1000 (approximately a decimal number consisting of a 1 followed by 301 zeroes) possible solutions at the same time.
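We can verify the size of that state space directly, since Python’s arbitrary-precision integers handle 2^1000 exactly:

```python
n_qubits = 1000
states = 2 ** n_qubits        # number of basis states represented at once
digits = len(str(states))     # 302 digits: roughly a 1 followed by 301 zeroes
```

Two linked qubits give 2^2 = 4 possible answers, and each additional qubit doubles the count, which is why the number explodes so quickly.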

Yes, it is fast; it is like a genius who can handle a complex math problem by checking every possible combination. In that sense, it is a good tool for breaking encryption. Dr. Nicolas Gisin found that the “communication” speed between an entangled photon pair is far faster than the speed of light. But they are not communicating information; instead, they are passing each other a sort of randomness. If we could convert that randomness into information, we might be able to disentangle the photon pair; nevertheless, this would require a great many observations of those photons’ decisions.


Do human beings have quantum computing inside brains?

Gödel's famous incompleteness theorem has been considered one of the most important theorems in mathematics. A corollary of Gödel's theorem is that there are mathematical propositions that cannot be decided by any algorithm; in essence, these Gödelian impossible problems would require an infinite number of steps to solve. Dr. Roger Penrose conjectures that machines cannot do what humans can do because machines can only follow an algorithm, and an algorithm cannot solve a Gödelian unsolvable problem, but humans can; hence, in this regard, humans are more advanced. Penrose also suggests that human consciousness comes from quantum computing and quantum decoherence.

Well, there is something inaccurate in Penrose’s conjecture. First, humans can only estimate, and estimation is not equivalent to actually solving the problem. Second, if we do possess quantum computing capability, we do not yet have it outside ourselves, so how do we know that a machine cannot solve Gödelian impossible problems? Plus, if a computing task requires an incredible number of calculations, quantum computing may still fail under certain circumstances.

But if a human brain indeed displays quantum computing ability, that means the technique is feasible, so we need to dig out the mechanism that nurtures this special trait and hopefully “feed” it into another piece of hardware. That is why scientists are busy inventing high-resolution scanning technologies such as Magnetic Resonance Imaging.

Collectively speaking, we still know very little about our brain. We carry a potential so huge that we scarcely dare to imagine it; that being said, “reverse engineering” our brain is a possible means to understand those God-given abilities and apply them to machinery.


What is the implication for HCI?

Will quantum computing change the way we write code? Will it change the way we design interfaces? Will we still have browsers? …

On the other hand, what if a machine can have emotion and personality by then? It may have “a user experience” based on the feedback that humans give. Perhaps machines will be intelligent enough to figure out whether a design offers good usability for humans, as they will have perception by then… Lol, those are just some random thoughts.

The thing is that quantum computers may only be used to tackle certain kinds of problems that are otherwise intractable; the power they carry does not apply across the board. But what I am more interested in is this: during the process of studying our brain (partially driven by quantum computing), how will the outcomes affect the way we deploy more or less profound design mechanisms to take full advantage of human thinking capability? Perhaps by the time we get this thing running, our neuro-cognitive structure will have changed due to computational development rather than biological evolution. I mean, think about it: when interacting with an agent far more intelligent than our brains, would we become better adapted by no longer thinking about hard questions at all? I don’t know.


Reference:

Kurzweil, R. (1999). The age of spiritual machines: When computers exceed human intelligence. New York: Viking.