Tuesday, May 16, 2006

Journal Update

International Journal of Human Computer Interaction

Paula talked to Julie Jacko, the editor of the International Journal of Human-Computer Interaction, about a special issue based on our workshop:

She [Julie Jacko, editor] is interested and said that the first available issue would be the summer 2007 issue, which is due to the printer in March 2007. If we would like to move forward with this, the next steps would be to put together a short description of the proposed special issue and send it to her in preparation for a more detailed discussion on the special issue, the review/publication process & timelines, etc.


About the journal:
"The International Journal of Human-Computer Interaction addresses the cognitive, social, health, and ergonomic aspects of work with computers and emphasizes both the human and computer science aspects of the effective design and use of computer interactive systems. The Journal presents original research both in the generic aspects of interface design and in the special application of interface design in a variety of diversified leisure and work activities." [Source]



Interacting with Computers

Gitte, who is the Deputy Editor of Interacting with Computers (IwC), has been in contact with its General Editor. We have not yet heard from the General Editor about a special issue.

Updated information about IwC

It seems that Interacting with Computers is overbooked with special issues. We would have to submit in August this year, but the publication date would not be until around August 2007.


About the journal:


  • acts as an international forum for the discussion of HCI issues
  • fosters communication between academic researchers and practitioners
  • encourages the flow of information across the boundaries of its contributing disciplines
  • stimulates ideas and provokes widespread discussion with a forward-looking perspective

Sunday, April 30, 2006

Reality Testing Workshop Pictures

Saturday, April 29, 2006

Full Workshop Notes

Highlights from the Reality Testing Workshop,
CHI’06, Saturday 22 April 2006, Hyatt Hotel, Montréal, Canada

Military
Presentation highlights
Challenges
• Hostile environment
• Users work with small handheld computers
• Users carry heavy pack loads (20 pounds) and wear thick protective gear (gloves, helmets, goggles), making it impossible to work with small PDAs
• Users are in motion much of the time
• Researchers cannot observe and capture data in situ
• Users are not available for walkthroughs
• Software designed for desktops was ported to PDAs

Methods used:
• 1st Heuristic Evaluation
• 1st field observations: simulated field use with no connectivity, and field interviews
• 2nd field observations: limited connectivity
• 2nd Heuristic Evaluation
• Competitive HE, trying to push the team towards a smaller, simpler solution
• Interactive training snippets presented on a laptop (heavier than a small PDA, so it doesn’t blow away)
• Reports for shift handover
• Implemented troubleshooting tips, which have to be approved before being presented

Lessons learned:
• Evaluation of system while completing real-world tasks: be ready to spend some down time with testers or be on-call while waiting on a task that meets your requirements.
• Usability inspection in real-world environment: be sure to have reserved an on-site location designated for testing; test one participant at a time.
• Pluralistic evaluation: prepare strategies to help keep testers focused when they veer off track.
• Trial period: the system was available for about three weeks, but locked down to ensure that users were not using it for unintended purposes.

Discussion
Q: Which HE methods did you use?
A: Nielsen’s method as well as methods available in the military. We found many of the same problems in the field study, but subjects did not use the system frequently enough to give a lot of feedback.

Q: Why did you use the HE method first followed by field study?
A: The process we presented in our paper is not what we would do next time. We had a very short lead time; then we realized that there were bigger problems, so we requested additional funding. In the first field study we simply received training as users; in the second field study we observed airplanes in the middle of nowhere.

Q: Was the competing device operational?
A: Yes.

Q: Could you have conducted HE on both devices?
A: Yes! So next time we will try to get hold of competing devices first.

Q: How were your methods organized?
A: We followed a T&E (test and evaluation) plan, but modified it on the fly.

Q: You both (Katie, Keesah) used multiple testing methods – is that what you’d recommend?
A: Yes. This is absolutely necessary in these studies.

Q: What methods would you use if you had limited access to users?
A: Observe them doing actual tasks to learn what users actually do. This helps us ask them for specific time so that we gain the most from the observations. We also worked as a group because we could not record any data electronically, so we relied on coded handwriting (activity analysis); after a session, we went over it with users. Katie’s team observed 6-7 people simultaneously, but each observer shadowed only one person. Keesah had everyone in one room (several people working simultaneously: two usability people, one SME in the work environment, and one developer). SMEs are helpful as they can translate the process, procedures, language, and observations; they also help with user acceptance and specify what questions to ask, a key element as we had limited access to users.

Q: Did users accept testing situations?
A: Some users had limited computer experience, and users were concerned about spelling. It was hard to get the ‘truth’ from users, e.g., they would not admit that their load was heavy and cumbersome. Another problem was that we worked with systems engineers who were of a higher rank than the mechanics who were the users; this may have hampered what users were willing to say.

Q: What would be an ideal method in your situation? What kind of data would you recommend collecting?
A: More information on how users would design a system to support their tasks? Katie Siek observed in her study (of dialysis patients) that users did not deliver anything of value, but simply regurgitated what she had given them. Katie Minardo would give them several options; people know what they like when they see it. So, rather than being completely open-ended, give them something to pull to pieces. Trust developed in Keesah’s study as she followed users through the whole process and got to know them personally; this probably made users willing to make suggestions for the system and opened a dialog. Katie M. was introduced by a high-ranking officer, which seemed to help build trust; she spent 12 hours with them, a lot of it sitting around waiting for things to happen, and it probably took ½ day to build trust. Involving a multidisciplinary evaluation team could help increase the validity of the data and also help select what data to collect. Some data were automatically logged in Keesah’s case, so she could see what users were and were not using; she used these data for interpretation and allowed users to inspect and go over them at the end.

Q: What was the number of users involved?
A: (Keesah: 9 usually, but in some cases 15; in Katie’s case, 5). We won’t be able to do statistical analyses, but qualitative studies have a lot of value; deep stories have different value, and you gather much more data from that one person that you spend a lot of time with.

Everyday life
Presentation highlights
• User mobility was a big problem for one research group: users exhibited sporadic use, there were privacy issues, the settings in which the technology was used were unpredictable, and both tasks and task settings were open-ended.
• A further problem was that users outnumbered researchers in that study.
• The Panopticon method turned out to be a good solution; it facilitates extended field trials leading to more reliable results, and it also facilitates updating of prototypes.
• Studying ITV (interactive TV) is a big challenge, as users don’t like multiple cameras in their homes; the best studies of people in their homes come from architecture.
• When conducting 3-D studies, we found that users ran in front of the screen and continued until they hit the wall.
• In some studies it is unclear what behavioural data to collect when measuring user tolerance towards delays, which are a reality of new technologies. This group wanted to predict ‘user reactions’, but these were not observed as anticipated. Apparently, the problem was our failure to achieve ecological validity.
• Triangulation of methods and new methods are required; existing methods are insufficient.
• Information and data are not accessible in the public domain.
• How can we simulate challenging environments? (The military used training and competitive analyses, HOTLab used situations, Regina used outdoor test environments, Panopticon used a mixture of remote and on-site observations, and Francis used a variety of situations and technologies.)
• The Panopticon researchers found that a courtyard enables slightly more natural observations than when a study is confined to a usability lab.
• Need to know the orientation of the PDA when collecting data, especially with rotating maps.

Discussion
• Sampling issues: Would population segmentation have benefited the studies?
• Austria did do this, based on population demographics: they took demographic data in the home first (e.g., grandparents living with family), then selected the most interesting households (based on financial distribution). Finland went for ‘natural’ groups: a much broader, heterogeneous population, using a very large group living in the same area as the researchers. The argument for the Finnish approach over the Austrian one is that you cannot generalize across populations; instead of a priori population selection, analyze data from a heterogeneous population to learn which factors correlate.
• Giving people ability to choose between two technologies (tested barcodes and voice-input), see what participants select

Q: What behavioural measures did other teams use?
A: SF had to throw away lots of quantitative data
• Tendency to collect lots of data and only decide post hoc what data may yield meaningful information
• Methodologically, it is not always clear whether it is best to start broad or narrow, i.e., define the data to collect before the study starts, or collect data and then look for patterns
• Sometimes we need to back up field studies with lab data; other times it works best the other way round – lab studies preceding field study
• Field studies demand different research questions than lab studies – hence method triangulation.
• Apparently we need hybrid methods: we must import methodologies from other disciplines, then combine and compare methods from diverse fields (communication science, psychology, computer science, architecture)
• Probably the interplay between lab and field studies will yield better results with respect to ecological validity
• Studies should be designed such that we would start broadly, then go into depth – collect lots of data first, then define what you need; stated in another way, begin with qualitative data, then continue with quantitative data once the research problems have been tightly defined
• In qualitative studies where no response data are generated, look at frequencies and usage patterns, so log everything until you can discern patterns



• The best way to gain access to research that is not published is to get to know the relevant researchers personally, and perhaps to establish contact with medical schools (their papers are accessible in the public domain but are often very expensive). It is still a challenge to get hold of papers that are commercial, proprietary, or military secrets; often relevant papers can only be obtained during a project. One major hassle is that even if researchers can get proprietary papers, they usually cannot publish their results.
• When working with industry, timelines are extremely short. One lesson learned was the need to translate data into dollar statements wherever possible and to ensure that these are backed up by data
• Carefully select the data you give to an industry sponsor (e.g., in some environments we know that we get ‘silly’ or embarrassing data anyway)
• Privacy problems: e.g. in museum cannot get names of people; necessary data may not be available or accessible
• Some corporations are not willing to test their prototypes because they are too clunky – as a consequence, some researchers build their own prototypes and applications; often this work is done by students, but this creates gaps between prototypes and eventual product that are not always foreseeable.


Summary of limitations:
• Access to users
• Technology does not exist yet
• Access to relevant literature
• Privacy issues
• Interaction with industry
• Cannot predict what behavioural data will answer research question
• Timeframes (e.g., short-term studies; difficulty understanding other disciplines’ language sets back the study)
• Resources


Medical
Presentation highlights
• Placement of sensors is a challenge. For example, we placed some sensors too close to vents which produced a lot of noise in our early data. Once you have too much data it becomes uninterpretable. It is essential to set individual thresholds very carefully so as to avoid too many false positives. Therefore we found that it makes sense to collect trend data rather than continuous data. We also learned that “human validity” is extremely important: if information is not presented as people want it, they simply won’t use the system
• Stakeholders have different agendas; your role is to make your boss’s boss look good in his boss’s eyes (i.e., how does your boss earn her brownie points?)
• Some users have very low literacy levels, so researchers must be sensitive to the various stakeholders’ cultures and take this into account. Right from the beginning, researchers must develop relationships with users, as users will only volunteer a lot of information once they trust the researcher. We have found that building this trust takes ½ day or more.
• Researchers must plan to multitask, as in some environments only one researcher is allowed with a user at a time
• Researchers must also realize that they will encounter a great deal of variability in work practices between different professionals, different work places, and different users. Furthermore, researchers have no control over the product, the development cycle, or product deployment; they often come on board late in the process.
• In a longitudinal study it may be worthwhile to train users in HF/HCI methods to create a continuous channel of feedback
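The advice above to set individual thresholds carefully and to collect trend data rather than continuous data can be illustrated with a small sketch. It assumes, hypothetically, that sensor events arrive as (day, count) pairs and that a day is flagged when its total lies more than k standard deviations from that person's own baseline mean; the function names, the data format, and k = 2 are illustrative choices, not the presenters' actual method.

```python
from statistics import mean, stdev


def daily_totals(events):
    """Collapse raw (day, count) sensor events into one total per day,
    i.e. trend data instead of a continuous stream."""
    totals = {}
    for day, count in events:
        totals[day] = totals.get(day, 0) + count
    # Return the totals in day order.
    return [totals[day] for day in sorted(totals)]


def flag_unusual_days(daily, baseline, k=2.0):
    """Return indices of days whose total lies more than k standard deviations
    from this individual's own baseline mean (a per-person threshold)."""
    centre = mean(baseline)
    band = k * stdev(baseline)  # sample standard deviation; needs >= 2 baseline days
    return [i for i, total in enumerate(daily) if abs(total - centre) > band]


# Hypothetical data for one resident.
baseline = [40, 42, 38, 41, 39]
daily = daily_totals([(1, 20), (1, 21), (2, 30), (3, 44), (4, 40)])
print(flag_unusual_days(daily, baseline))  # → [1, 2]
```

Because the threshold is derived from each person's own baseline, a level of activity that is normal for one resident can still be flagged for another, which is the point of setting thresholds individually.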

Discussion
Q: If you could do this again, what would you do differently?
A: Get out of the university environment: chasing patents took far too long, and there were too many meetings.
Q: How to test reliability of a real-time system placed in real people’s homes?
A: Ensure that the system records motion when motion is actually performed and not in the absence of motion.
Q: How did you do reliability checks?
A: We had six motion detectors and cross-referenced their outputs. Note, however, that the data analysis is extremely time consuming. Note also the effect of the environment: signals cannot get through steel in the walls. Activity thresholds were determined for each individual; it takes three months to get valid longitudinal data. It is important to note that researchers do not have control over the study or the time frame.

Q: When is it safe to start collecting data in a longitudinal study?
A: We ignore the first two weeks because we know that people play with the system: they wave at the sensors to see if they work, they open and shut fridges, go to the bathroom just for the sake of it, and so on, so the signals are completely unreliable. Over time, they forget about the sensors and settle down to normal life. One should, however, be sensitive to changes in patterns in the data. For example, people behave differently on weekends and on public holidays than during the week.
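The two practices in this answer, discarding the first two weeks and separating weekdays from weekends and public holidays, can be sketched as a simple preprocessing step. This is a minimal illustration assuming readings are (date, value) pairs and that the installation date is known; the function names and the holiday set are hypothetical.

```python
from datetime import date, timedelta

BURN_IN = timedelta(days=14)  # "we ignore the first two weeks"


def drop_burn_in(readings, installed_on):
    """Discard (date, value) readings from the first two weeks after installation,
    when participants are still playing with the sensors."""
    cutoff = installed_on + BURN_IN
    return [(d, v) for d, v in readings if d >= cutoff]


def split_by_day_type(readings, holidays=frozenset()):
    """Separate weekday readings from weekend/public-holiday readings so that
    patterns are compared like with like."""
    weekdays, days_off = [], []
    for d, v in readings:
        if d.weekday() >= 5 or d in holidays:  # Saturday=5, Sunday=6
            days_off.append((d, v))
        else:
            weekdays.append((d, v))
    return weekdays, days_off


# Hypothetical three weeks of daily activity counts after installation.
installed = date(2006, 5, 1)
readings = [(installed + timedelta(days=i), 40 + i) for i in range(21)]
kept = drop_burn_in(readings, installed)
weekdays, days_off = split_by_day_type(kept)
print(len(kept), len(weekdays), len(days_off))  # → 7 5 2
```

Passing a set of public-holiday dates as `holidays` moves those days into the "days off" group, so a holiday on a weekday is not mistaken for unusual weekday behaviour.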

Other challenges we faced:
• It is extremely difficult a priori to determine the nature of the outcome of a longitudinal study
• Difficult to measure impact of technology and intervention
• There is a distinct disconnect between the various stakeholders and costs/benefits
• Often it is difficult to determine who gains from such a study (patients, physicians, hospitals, health system??)

Workshop Summary

Gitte Lindgaard

10 Themes

1. Challenges across domains
- Methodological issues

- Is triangulation of methods an advantage? Are we collecting too
much data? How many methods should we be using?
- Existing methods inadequate
- Need hybrid methods? Which ones? How to select?

- Study-type issues
- laboratory
- field
- hybrid

- Sampling Issues
- "natural" heterogeneous populations
- determine population demographics a priori
- Which way is better? When should we do it?

2. Challenges across domains
- Data collection analysis issues
- Collect everything, then look for patterns
- Determine data to be collected before starting study
- Determine what data to collect
- Determine when to start collecting data in longitudinal study
(first 2 weeks don't count; are there guidelines we can come up
with? How do you do a longitudinal study?)
- Determine the source of variability in data patterns (Halloween,
weekends, etc.) Distinguish between "noise" and signal

3. Situational Issues
- Cannot use electronic recording devices
- Mobility issues
- Privacy issues (researcher is intrusive, sensitive) - Patient
issues, authority (military) issues.
- Informed Consent issues
- time-related issues - Guidelines that help with shortness of study

4. Challenges across domains
- Environmental issues
- indoor/outdoor
- confined space
- only one researcher allowed at any one time - how to check reliability
- User Related Issues
- Access to enough users
- Access to right users
- Access to too many users
- Trust: benefits of developing relationships with users
- Literacy levels; health levels - you want to get your data,
but if patient has a heart attack, it won't help.

5. Challenges across domains
- Industry-interaction issues
- cost/benefit analysis
- understanding the business model
- understand what makes your client look good in her boss' eyes
- understand the different stakeholders (they have different agendas)
- who gains/saves? Who wins? Came up in the health issues. Is
it the patient, researcher, clinician? Who is the loser?
Whenever we solve one problem, we create a problem
somewhere else

- Multidisciplinary-team issues
- takes time to understand language of other disciplines

- Technology issues
- Technologies do not exist yet
- Prototypes are clunky
- Incompatibility of technological platforms
- Heterogeneity of systems

summary

10 Themes:

1. Methodological Issues - 1 - 17 votes
2. Study Type Issues - 9 votes
3. Sampling Issues - 0 votes
4. Data collection issues - 17 votes

COMBINED 5 & 6 - 8 votes
5. Situational Issues -
6. Environmental Issues -

6. User-related issues - 0 votes
7. Industry-interaction issues - 2 votes
8. Multi-disciplinary team issues - 3 votes
9. Technology issues - 0 votes

Keesah - Where would health go? Environmental.

Kevin - Tough to simulate? We can try to simulate it, but will it capture
the truth? How do we generalize it? Put under data collection/analysis
issues.

Kay - Give patients an "out" to use. Put under methodological.

Nice job Gitte.

Rank Order the issues.

Everyone gets three votes...


What we're talking about is...

1. Methodological Issues
2. Data collection issues
3. Study Type Issues
4. Situational/Environmental Issues
5. Multi-Disciplinary team issues

1. Methodological Issues

Triangulation

Gisele - Everyone is collecting a massive amount of data. Why? Because
we can? Because we have gotten lazy? What is causing this? How do we
filter appropriately?

Regina - Because we have to answer more questions with one trial, so I
have to address all these questions. We cannot answer the questions with a
single factor. It is a multi-factor question - we found this because of
X, Y, Z. I really have to measure the interesting factors to give a valid,
possible answer. (Gitte - Have to please a lot of people and have
economic limitations.)

Gitte - Sometimes it is not justified - so if I don't quite know what
a measure is going to give me, I have to try it.

Paula - There is primary and secondary data. We have certain outcome
measures we are interested in understanding. We can look at the data
to see what is going on.

Kevin - (1) The domains are very complex, so we have to collect more
and more data to try to understand the effects. (2) It is nearly
impossible to make a standard controlled laboratory study worthwhile,
so you do not have neat measures. It is always more exploratory
because we do not have constrained environments.

Antti - Other system may fail, so have to do parallel data collection
for different systems.

Tony - Stage of research. You collect a lot of stuff at the beginning
because of the complex domain; you don't know initially what is
important. You need to funnel because you have to focus more and more. You
get enthralled in your data because you love it. There are second, third,
and fourth stages at which to look at what you are collecting and what
should be brought to the next stage.

Paula - You can see which measures are more sensitive over time. It is
better to collect a lot and then iterate.

Gisele - An iterative process.

Gitte - Can the military domain afford multiple stages? Do you have the
opportunity to go back?

Katie M. - No we don't have the opportunity.

Keesah - Each of these types leads to the bigger picture. We have to
monitor smaller things like clicking and put them together to get a
broader picture.

Gitte - Two different things we're talking about

1 - Exploratory at the beginning - Collect data because not
quite sure what we're looking at

2 - Triangulation to provide us with a funnel effect to see the
broader picture

Do you have to collect everything because the environment is unpredictable?

Paula - You have to do both. You have to know what are the best
measures and bring a few other things in to help anticipate what is
going on.

Gitte - If we neglect to collect, we'll miss patterns we'll never discover.

Antti - Many techniques to collect data are unreliable and can fail, so
some redundancy is needed. A colleague used four cameras to videotape so
that something would work at any given time. Have to leave some room for
new findings - just for fun - incorporate some new findings.

Jambon - It is unpredictable, so you need multiple methods to try to
figure out what is happening.


Mine is a little different because training is in a non-traditional
environment and testing is... Most people train in a controlled
environment and then test.

Gitte - Is triangulation an advantage?

Keesah - It is an advantage, different studies give you different data.

Katie M. - Cover your butt with extra methods.

Gitte - Definition of triangulation - Deliberately selecting different
sets of methods to enrich the data

Antti - Approach the same phenomena from different theories.

Gitte - How to be prepared for opinion vs. performance. Is there a
magic number for the number of methods?
- Resources - time and money
- Flexibility - able to move from one methodology;
Tony - I don't care how we collect the data; you tell me how to collect it
- Complexity (Paula)
- Have to integrate into a single message and have them complement
each other (Avi)
- Be aware of failure areas (Antti)
- Observation, interview, to supplement empirical data (Kevin)

Methods are not adequate

Regina - Should validate methods by

Gisele - Could a new method be?

Antti - method - data collection methodologies

Gisele - methodology - a particular technique that can give me data
that can answer a question. What way are existing methods are in

Paula - Do a better job of communicating what types of methods work
for different populations.

Avi - Method - what is our definition

Regina - MAUSE - Project to mature usability evaluation methods: how
to categorize methods, classify them, and try to help people figure out
when to use which method.

Gitte - Regina - could you keep us updated on this?

Avi - we are only talking about collecting data. And we have to talk
about how to analyze it...We will get there.

Katie S. - We need adaptable studies.

Gitte - We are dealing with very random populations

Gitte - Are there hybrid methods that have worked for us...

Antti - Connectivity that goes on two levels - data logging and
interviewing. Neither can work without the other; they complement each
other.

Paula - Keep your eye on what your research goals are and what
outcomes you want - methods must give you the data you want. Research
questions really influence the data you are collecting.

Katie - Need to transcribe quickly through personal codes or shorthand
if you do not have a recording device.

Gitte - Summarizing this section

- We may not be collecting too much data because of the time and what
we need to get

- We cannot say there is a particular number of methods, but they
should combine to convey the same message.

Data Collection and Analysis

Collect everything and figure out what the data may mean...

Start with a hypothesis and go with data collection...

Paula - Depends what stage you are in your research.

Gitte - Collect more data and look for patterns

Gisele - Playing devil's advocate. It seems like it is a trial and
error method as opposed to clearly outlining a research project and
saying, "These are the things I am going to measure." Look at it from
the other end - we have to spend more time deciding what outcomes we
want.

Gitte - It depends whether your focus is theory or application data...

Paula - You'd only collect something that may have a reason... Not
because of an "I can" methodology.

Katie - But we are not experts in the interdisciplinary fields we are
in. So we have to look at a lot of data and then when we discuss it
with the experts say, "Of course we knew that." But now you have data
to identify the trend and back it up.

Kevin - It is limited by the scope of your knowledge of the system.

Avi - Things are unpredictable, we don't know what we're getting
into. Take an exploratory approach. So maybe you know a little bit more.

Gisele - This should be a message we convey to others. Collect as much
data as possible.

Gitte - But don't be too data driven.

Katie M. - Everyone talks about having backup data - so it helps with
reliability.

Paula - I go back to the nurses and doctors and say, "This is what we
saw. This is what we think caused it. Is it correct?" DO a "reality
check" and see if you should be collecting something else. Don't just
collect data to collect data. Have a reason behind it.

Francis - Must collect all low level data as a back up. Keystrokes are
very useful. If you are focused on the tasks, it is easy to find the
data needed.

Gisele - We have to caution the community and let them know we do
collect a lot of data and don't let it drive the solution. Don't
manipulate the data to get your answer.

Paula - If you are drawing conclusions, make sure all triangulation
and back up data is telling the full story. Just don't filter. Tell
the whole story.

Antti - You can read the same data with different types of
analysis. Should we also draw a conclusion with every method?

Gisele - Is it economically feasible?

Paula - Caution people not to get caught up in statistical
significance, but to understand the story of the people you are
studying. If you have a large enough sample size, anything can be
statistically significant. Just give them the practical take-away
point.
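Paula's point that a large enough sample makes almost anything statistically significant can be checked with a little arithmetic. The sketch below keeps a hypothetical, practically negligible difference of 0.05 standard deviations fixed and only increases the sample size; it assumes equal group sizes, unit standard deviation, and a normal (z) approximation rather than a t-test.

```python
import math


def two_sided_p(effect_size, n_per_group):
    """Two-sided p-value for a difference between two group means,
    using a normal (z) approximation, unit standard deviation in each
    group, and equal group sizes."""
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2.0))


# A hypothetical, practically negligible difference of 0.05 standard deviations.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {two_sided_p(0.05, n):.4f}")
```

The p-value falls from well above 0.05 at 100 participants per group to below 0.001 at 10,000, even though the size of the difference, which is the practical take-away, never changes.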

Analysis

Determine what data to collect?

Is there a way to determine

Katie - Give them an "out." If they use the out, they are not very
impressed with the system.

Avi - We have a human tendency to go for the easier alternative.

Sheila - Familiarity. Kids now prefer the web, but older people prefer paper.

Kay - This goes back to who the users are.

Paula - Depends on the research question.

Bruce - The new thing has to be 10 times better for people to adopt it.

Avi - Data metrics drive analysis. They really have to go together.

Antti - You don't just consider data collection or data analysis.

Paula - Have to decide how you are going to analyze the data.

Avi - There are a lot of methods that support exploratory
techniques. Data visualization is very important. How do you present
your data? In graphs or time lines. To help you see patterns.

Paula - Data filters to look through vast amount of data.

Gitte - Summary

- Early in a study or in an environment we are not familiar
with, we should be broad. Then when we have knowledge, we can
sharpen up and get better hypotheses.

- We come up against reliability and validity.

- Meaningfulness is very important.

- Pieces of advice - determine when to start collecting data in
longitudinal study

- Determine source of variability in data patterns

- Cultural and situation context

What comes after the workshop? Where do we go from here?

Bruce is doing a workshop in June.

Papers

Gisele - We all came here because everyone had ad hoc approaches and
there was no literature around it. Special issue journal?

Gitte - It would not be too difficult to expand on the position
papers. A discussion paper that talks about what happened today.

Should we do special issue of a journal or book?

Gisele - Getting a special issue is a good thing. Books are
hard because getting all the information in a timely manner can
delay us.

Gitte - Journals weigh more heavily. Gitte could get us a
special issue in her journal. Special issue in 2008

Paula - Will talk to Julie Jacko to see if we can get it earlier.

Avi - Journal of Usability Studies. Maximum 15-page
limit. Avi is the editor.

Gitte - Organizers will organize the journal.

Journals
Int'l Journal of HCI
Interacting with Computers
Journal of Usability Studies (JUS)

In TWO weeks we'll know what we'll do next. Timelines for
journals.

Regina - We need deadlines for things. Please send out proper
deadlines by email.

Gitte invites people to submit to Usability

Gitte - Dinner

Shamima - Because we are dealing with non-traditional environments, we
cannot deal with non-trad methods or data. Let's not get hung up on
significant differences. Let's look at other things.

Everyday Life

Francis Jambon - ADAMOS Project

Meta Evaluations - Classic and "in the wild" experimental settings are
used at the same time
- Analysis performed separately and their results are compared
afterwards

Map Mobile 2005 Experiment

User moves within a professional building floor during work hours
His/her position and suggested routes are displayed on a moving map
Contextual messages are proactively displayed
Real time logs

Data collection
- Usability analysis
- user comments
- device screen copy
- context DV camera (not remote)
In The Wild Analysis Methodology
- User actions
- Interface feedback
- User localization

Compare results
- Many false negatives (problems not detected)
- facilitator intrusion preventing errors
- data analysis techniques limitations

- Talks seemed too long and potential error in software when
participant was really talking with friends

New experiments
- eSkiing - using system on ski hill - cold, snowfall
- Museum - many thousand visitors, one month duration
- New analysis method - automated usability analysis
- It took us 3 months to analyze data by hand; need automation

Antti Salovaara - The Panopticon

Challenge: user mobility, sporadic use, privacy issues,
unpredictable open-ended task settings, and latent social
conventions.
vs.
Capturing a rich picture in order to get new findings.

Panopticon - hybrid method
- remote data collection
- direct data gathering
Allows for - unobtrusive data logging and increased sensitivity

He met with 7 high school students every 2 weeks

Potentials
More reliable results; Extended field trial durations
- Better-timed and better-prepared interviews - we know the most
important times of their use

Content interventions
- changing the content in the system remotely
- analyzing reactions
Prototype updates
- remote parameter tuning of the prototype
- analyzing the reactions

Regina - Home or Factory

Testing in the Home Environment

How can we test/measure/find out usability problems in non-traditional
areas?

Three month field trial on interactive field trial

Problem - they don't want 10 cameras in their home, and how do I afford
10 cameras?

Using only one method does not help. We have to bundle methods so that
each contributes additional information.

1. Literature analysis - We found the best stuff in the field of
architecture. We have to learn from psychologists, architects,
etc. We have to know everything...

2. What kind of new methods do we need? How can we integrate
methods from other fields?

They do half and half - partly in a lab, but made realistic.

3D games - novice users were very enthusiastic and were running into
the walls. You must be careful.

We have to invent new methods where we can test. We really tried to
separate the data step by step. If the context is relevant, we test it
in this setting.

Combining and comparing methods from several fields in HCI (e.g.,
mobile HCI methods at home)

Extending the basics in HCI by importing new methodology from other
disciplines

Entertainment Applications - Shamima

How are people going to react to delays with new technologies? Push-to-talk
and the next generation of television (Internet-based
television). What is people's tolerance for these technologies?

First study

Early speech - people did speak before they were supposed to, showing
they were impatient with delays. They were annoyed by the delays, but we
could not find any behavioral evidence.

When the delays were increased, they were even less likely to speak
before the signal.

No effect of urgency.

Second study

Simulated conversation, hid equipment, gave them a scenario

They found the same thing. Even in the urgency scenario, participants
did not show frustration.

Third study

We had to do the TV experiment in a lab.

Conclusion
- Unable to achieve ecological validity
- Suitable performance measures
- companies want the data even though the technology does not exist yet
- sponsors want requirements - magic numbers (e.g., no one will buy
your system if it takes 2 seconds)
- industry wanted the results yesterday

Testing moving map algorithms - Sheila Narasimhan

How map rotation helps with navigation.

Challenges
- will map rotation help wayfinding
- how to test the rotation

Study 1: Simulation in the lab
Used only users' preferences and wanted to look at perceptions

Study 2: Had participants use a PDA to navigate through campus

Conclusions and open issues
- Lab study provided sensible results
- identified need to improve adaptive algorithm
- did not help in deciding on a specific algorithm
- not ecologically valid
Is the lab-field combination effective and efficient?
Takes a lot of effort and resources

Is the field orientation test ecologically valid if the field studies
support the lab studies?

Panel Discussion

Which combination will work for each of our particular cases?

In the morning we had a focused in-the-field lab experiment.

This group is combining lab and field experiments.

Kay - On automatically gathering and logging data: it is difficult to go
back and analyze it. Keystrokes may be too detailed.

- Francis - All of it is important. If you find something wrong,
you need to go deeper and look at what is incorrect. Start at
the high level and go deeper for special cases. You can look at
variation in task duration.

- Antti - Quality of data comes first. If we get a feeling about
certain important issues, then we may go into the log data and
see how those issues look there. We are not
measuring response times. We wanted to know about frequency of
use. Exits from screens are difficult to find because you
cannot draw conclusions from all sides.

- Francis - We want to know the orientation of the PDA. How are
they carrying it? Their activity?

- Antti - the clocks have to be synchronized.
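Francis's and Antti's points about log data (start at the high level, look at variation in task duration, keep device clocks synchronized) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ADAMOS or Panopticon tooling: it assumes each device writes timestamped start/end events per task, that a per-device clock offset was measured once per session, and it uses a hypothetical rule of flagging tasks that take more than twice the median duration for closer inspection. The `Event` record, the device names, and the threshold are all invented for illustration.

```python
from dataclasses import dataclass, replace
from statistics import median


@dataclass(frozen=True)
class Event:
    t: float     # timestamp in seconds, on the logging device's own clock
    source: str  # hypothetical device name, e.g. "pda" or "camera"
    task: str    # hypothetical task identifier
    kind: str    # "start" or "end"


def synchronize(events, offsets):
    """Map every event onto one reference clock and sort the merged timeline.

    offsets[source] = reference_time - device_time; assumed to be measured
    once per session, e.g. from a shared sync signal at the start.
    """
    shifted = (replace(e, t=e.t + offsets[e.source]) for e in events)
    return sorted(shifted, key=lambda e: e.t)


def task_durations(events):
    """Pair each task's start and end events and return {task: seconds}."""
    starts, durations = {}, {}
    for e in events:
        if e.kind == "start":
            starts[e.task] = e.t
        elif e.kind == "end" and e.task in starts:
            durations[e.task] = e.t - starts.pop(e.task)
    return durations


def flag_for_inspection(durations, factor=2.0):
    """High-level pass: flag tasks taking more than `factor` x the median."""
    if not durations:
        return []
    typical = median(durations.values())
    return sorted(task for task, d in durations.items() if d > factor * typical)


if __name__ == "__main__":
    # Hypothetical session: the camera's clock runs 3 s ahead of the PDA's.
    offsets = {"pda": 0.0, "camera": -3.0}
    log = [
        Event(0.0, "pda", "t1", "start"), Event(13.0, "camera", "t1", "end"),
        Event(20.0, "pda", "t2", "start"), Event(34.0, "camera", "t2", "end"),
        Event(40.0, "pda", "t3", "start"), Event(52.0, "camera", "t3", "end"),
        Event(60.0, "pda", "t4", "start"), Event(113.0, "camera", "t4", "end"),
    ]
    durations = task_durations(synchronize(log, offsets))
    print(durations)                        # durations on the shared clock
    print(flag_for_inspection(durations))   # tasks worth a deeper look
```

Without the offset, every camera-logged end time is 3 s late and each duration is inflated. The median rule is deliberately crude: it only points the analyst at cases worth going deeper on, in the spirit of "start at the high level".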

Katie M. - Anyone can be your user, which is different from the military
domain. Did you do any kind of segmentation to say we think 18- to
34-year-olds use it like this and teens use it like this?

- Francis - We did not have enough people. Women liked the map, but
not every man did. Too few to create these segmentations - we had 12
people.

- Sheila - We found rotating the map reduces positive load. It is
difficult to do client segmentation because of the ages. We
tested 18- to 34-year-olds.

(Katie - Would it have been better to target a certain group and say we
know it would work for this group?)

- Marianna - We choose our groups based on ethnographic criteria. We
emphasize recruiting elderly people. We use demographic data and
find the most interesting people: families with multiple
generations in the house. Our focus is on elderly people's use of
remote controls and on decreasing the number of remotes and their
complexity.

Paula - Push-to-talk: they voiced their frustration with delays, but
they didn't change their behavior. But in the military, they just did not
use it. In a lab environment, could you give them an out, a different
method, to see if they'll just switch to it?

Gitte - This is the problem with lab environments. Our
company wanted certain results in this time frame, so we
couldn't do it.

Antti to Shamima - You talked about entertainment. Perhaps you can
look at this first-person shooter game that uses push-to-talk
technology and simulates stress.

Antti - On user sampling, from Katie's question: we did not try to
generalize. It is very dependent on who we find. We took one group of
users, and we identified natural users of the application. We don't
create artificial groups just so they start communicating. It was a
natural clique and we chose the best one.

Avi - Retroactive partitioning can be good as well: see whether anything
you learned about participants correlates with the experimental results.

Bruce - Measures of multi-user interfaces. We have people communicating
through multiple systems. What kind of measures did you use? Which were
successful? What did not work?

Antti - We had to throw away a lot of quantitative data. Individual
phone logs did not work well. Logging data from a mobile phone is
difficult. (Bruce: Was satisfaction measured?) No.

Gitte - How many people have background in psych? Four people have a
background in psychology.

Gisele - What were your instructions from your sponsor? Did they define
a frustration level?

Shamima - They did not define a frustration level. They were trying to
decide if they would invest money in these technologies.

Gitte - We were told, "We want to know how much delay people will
take and still buy our product." We had to translate this into
something that is measurable.

Gisele - When you got your data, it may have shown how frustrated people were.

Gitte - We know what people say and what they do are 2 different
things. We were trying to categorize the frustration.

(Missed some questions while trying to get the construction noise lowered.)

Antti - Field studies have different questions than lab studies.

Katie S. - Did they notice the observer?

Antti - Yes - they knew I was there. I had cameras strapped to my
waist on them. They got used to me.

Kay - When participants asked you for help, did you do anything? Did
you help them?

Antti - Only if it was a system problem because without the system
working, you get no data. I helped by facilitating their use, but not
giving guidelines.

Francis - On mobile skiing, one device failed during the study. It
can fail very quickly, so you have to have people there to help.

Avi - Regina said maybe we should learn from other
disciplines. What disciplines?

Regina - Marianna helps because she is from communications. Research is slowed
down because you have to look it. It is not a rush in experience.

Antti - HCI is a field with no theories of its own - we have to take from
other fields. We need a lot of interaction analysis. Learning sciences.

Katie S. - Have to expand your network to find papers in other areas.

Paula - Has a login at the medical school.

Glascock - Commercial loan - interlibrary loan is too small. The problem
is getting access to information. If information is shared, companies
sue. The stuff that is really happening is not published. How do you get
access to the stuff that is not published? How do you publish? What do
you do? Where do you publish? Licensing agreements are too slow - 6-9
months to get something reviewed, approved, and published is too
long. It is paleotechnology. Companies want a two-week study.

Katie M. - Worked on a gesture interface for General Motors. You are
inventing everything and making a lot of assumptions. Do you think the
quality of your prototype affected your
results?

Shamima - We said here are the issues with this. We got some help
because the companies gave us some ideas of what the project looks
like.

Glascock - How did the company respond when asked for requirements?

Gitte - They just said to do the best you can.

Glascock - We are talking about reliable in a research world. They are
talking about reliable in a Mr. Coffee world. Language issues -
corporations take it in a marketing sense. They just choose what they
want to hear.

Sheila - Outdating is a big problem too. A master's student created the
application. Now PDAs come with lots of orientation
technology. But nothing says which method works best when. Maybe that
is our job - to say what works when and how to market it.

Glascock - How do you present your results to a company with high
demands?

Gitte - We collected a lot of data for a system and management did not
really care about the problems we found. They were not owning the
problem.

Antti - We observe people in their real life so they are not willing
to tell everything. We can find this out by logging stuff, but we
still do not know everything. How do we deal with this?

Jambon - We have a committee that approves our studies.

Antti - People have to sign a consent form. We have to configure
phones in front of the people so it is really complicated. We
use the same form as the company. The company has a legal
department.

Edwards - We have a twist on medical records and consent. There are a
lot of regulations around HIPAA. Even having someone present who is not
an employee of the hospital comes into question. You can get into legal
trouble just by looking at the data. We are working with clinicians,
giving them a crash course in human factors, and they are the
observers.

Marianna - some enterprises are not willing to test a prototype with
someone not related to that enterprise. The enterprise is in Japan. So
we test Austrians assuming they will not talk to the Japanese.

Antti - We always combat that problem by building our own prototypes.

Katie M. - How do you minimize the effect of creating your own prototype?
How do you manage the gap when something is not available yet?

Marianna - It depends. The prototype is to get you to the next phase
of the product.

Antti - It is all tradeoffs - if design is more important, then that
is what we focus on. We are not building real products - they have to
be redesigned to make them proper. But to find out answers to certain
issues, they help us.

Sheila - For us it was important. It was a very specific research
question. The CS student was also interested in the question. The
software is for the evaluation. It is not a ready-to-deploy type of
thing; it is for testing.

Overall Ideas:
- Communication/language is important
- Combining lab and field experiments
- Testing prototype of rough system vs. testing full system
(military full system)

Gisele's Sum Up:

The answer: It depends...
- which methodology
- field vs. lab
- prototype vs. full system

I don't think we're going to come up with a solution. We have to look
at how to combine the approaches to solve these problems.

Saturday, April 22, 2006

Military Domain Session



Katie Minardo, MITRE Corporation


They created a system to call in air strikes for the U.S. Air Force. Users
in the field call in strikes during times of need (high
stress). They created a methodology working with expert users to help with
observations (interviews and expert reviews used). They attempted to
distribute inherent bias risk by using multiple evaluators.


Keesah Hall, GTRI


This paper describes a performance support system for aircraft
maintenance. They used many evaluation techniques with varying
results:



  • Evaluation while completing real world tasks (think aloud technique)
  • Users did controlled scripted tasks with an evaluator
  • Cognitive walkthrough (not well described, I’m interested in hearing
    more)
  • Pluralistic Expert Evaluation (many expert users working through
    tasks together with an evaluator)
  • Surveys to collect problems when the evaluators were not there, but
    users were still using the application.


Katie Minardo, MITRE Corp.

Looking at how to do remote evaluation of military application.

What we did:

  • Detailed Heuristic Evaluation
  • Field Observation #1
  • Field Observation #2
  • Secondary Heuristic Evaluation - New Hardware
  • Competing Heuristic Evaluation - Army testing


Keesah Hall, GTRI, MEPSS Maintainer's Electronic Performance Support System


Adapted for an unmanned vehicle
Hosted on a Toughbook (rugged laptop)
Main Components:

  • Online manuals: very good for use in Iraq - laptops are heavier and
    sturdier and won't blow away
  • Maintenance history
  • Shift pass-down
  • Troubleshooting tips - gives tips to future mechanics (tips have to
    be approved before being posted)



Evaluating the system while completing real-world tasks - be ready to
spend some down time with your testers waiting for the appropriate
problem to appear.



Formal usability inspection in real world environments - be sure you
have reserved an onsite location designated for testing and include only
one test subject at a time.


Pluralistic evaluation - prepare strategies to help keep testers focused
when they veer off track.

Surveys - Explain to testers that even bad feedback is positive. It's
not you - it's the system we're testing.

Trial Period - Lock down the system to prevent unintended use.
People spent time playing games.

Questions

Avi Parush - For Katie: Two rounds of heuristic evaluation - who did them, and according to which method? Did you verify?



[Katie M.] Heuristic results were shown in the field.



[Keesah] We used Jakob Nielsen's heuristics. People weren't using the system
enough to get results. Targets like drop-down menus were too tiny, the
screen was too small, and it was too heavy.

We don't know how accurate our findings are. Heuristic evaluation may
not be best to start out with in these environments.


Antti S. - How did you create your methodology? How did you decide on field studies and then heuristic studies?

[Katie M] I wouldn't do it this way if I did it again. We were hired as
consultants.

Second time we had people who had used it more recently. The first time
we had people who had been deployed for a year and then they came back
and told us what they did. But they didn't use it enough.

They preferred to use voice.

Bruce Tsuji - The competing device: was it operational? Is there a degree of acceptance for that alternative? Is there some way to do a heuristic
evaluation with the competing device?

[Katie M.] Yes, that would have been great. The Marines have a similar system. Let's compare the Marine system to our Air Force system.

Gisele Bennett - Were you involved in the original design?

[Katie M.] No, I came in to find out what was wrong with it.

Katie S. - Did you have the same problems as Katie is mentioning about
weight and not wanting to use it?

[Keesah] No, because they are used to heavy manuals.

Shamima Khan - Katie, they didn't want to use it. How did that affect your study?

[Katie M.] Yes, it really did affect how we evaluated it. That is why we used
field studies to evaluate it.

[Shamima] We had problems too with a vending machine reloading system,
figuring out the order of user studies.

Gitte Lindgaard - How did you create your plan?

[Keesah] We modified our plan based on how things were going. If we needed more
data, we would change our plan as necessary.

Avi - You used multiple methods. How did you correlate them? Is this your
recommendation for Reality Testing - to use multiple methods? Would
you do this next time?

[Katie M.] Yes, we had to do it this way. You patch stuff together
because you cannot model reality.

[Keesah] You have to have different methods - Nielsen's work. But his
methods have to be tweaked a little bit. Trial periods and surveys
were very helpful. Coming up with tasks was key.

[Katie M.] Ask people if they would do it differently. Training was not
realistic - we asked them if they would do it this way and they said,
"No, we would never do it this way." We have to identify where the
disconnects are between training, tests, and reality.

Regina - We have limited access to the users (e.g., 2 hours with 2 users). What kind of method would you use when you only have 2 hours with 2
people?

[Keesah] Perform an actual task using the full system. You can see
exactly what they would do. Did they have trouble getting it into the
cockpit and using it? We couldn't use videotape, so we had a coding
scheme with our own symbols. We had to write as fast as we could. As
soon as we got back to the hotel, we wrote everything down in
full. If you wait until tomorrow, you lose it.

[Regina] I learned something - ask them to give me the time during
the most important tasks.

Antti - There were a few of you, and it may be helpful to tape-record
yourselves talking about things.

[Katie M. & Keesah] We each typed up our own notes and discussed it later.

Kay - Can you please tell us about your field evaluations?

[Katie M.] We could not bring laptops, so we did our own coding and wrote notes
quickly. They had three people doing field studies, and they took
notes together right afterward, during the debriefing. So someone had
each person covered.

[Keesah] - (Did you do something similar? Did you do shadowing?)
We had something different. We had two laptops going with three
evaluators. We'd break up 2 and 1.

Avi - All the observers are human factors/usability. Is this correct?

[Keesah] 2 usability and 1 developer. At one point we had 3
usability. At one point we had subject matter experts working on the
environment. They helped us translate the information that some could
not follow.

[Katie M.] We had the same type of thing. We had an Air Force expert that
could help us review our notes and information about what they were
doing and why they did it.

[Keesah] It also helped with user acceptance, because they saw someone
like them. The expert also translated processes and procedures,
helped with "guessing" since we had such limited access to end users,
and helped minimize the number of iterations.

[Katie M.] It also helped us create good, relevant questions. Need
someone who can answer the basic questions.

Avi - Were there any problems having end-users accept the system?

[Keesah] We had a few mechanics who had never used a web browser. They
were really concerned about their spelling. We ended up putting in a
spell checker because they were nervous about doing something
wrong. It had something to do with education and comfort levels with
computers.

[Katie M.] The only problem we had is that the military does not like to
complain about things. They don't want to say something is hard, or that
they are having problems with it, etc. You have to let them know it is
okay to complain about things and we won't tell your supervisor. We had
to get the supervisor out of the way so he was not interrupting the
system. We had to get the developer out of the way so he wouldn't say
how great it is in front of people.

Antti - Do you have an expert with you on your development team?

[Katie M. and Keesah] Yes, we do, and that did help us.

(Did he work as a "translator"?)

[Katie M.] We had some of that. He had a conflict of interest because he was on
the team to make it a success. So we had to understand the reality of
the situation. So he was there to create the product.

[Keesah] We had that. We had a military higher ranking individual. So
he was not the same rank as the mechanics we were working with. The
mechanics were interested in impressing the higher ranking
official. But that could have been an issue.

Regina - What would be your new method? What kind of data would you want to
collect? Some methods have shortcomings. What kind of data is
still missing? In this environment, what data is missing? What would
you want to have additionally?

[Keesah] I want to know what the user would design. That would be the
most helpful. What would they create?

[Katie M.] We have 3 weeks before the next system comes out. What if you
redesigned the screen they spend 90% of their time on and kept giving
them the options? Would this be the correct direction? They know it when
they see it even if they cannot draw it. We found that they preferred
a system they used 10 years ago.

Shamima - They are very enthusiastic about what features they want. But that
may not necessarily be the right thing because they are thinking about
other things. People do not use a lot of what is available in every
system. You do your every day tasks in your preferred system.

How much are users affected by us observing them? I want to be
invisible. Get the knowledge of what they do when no one is
looking. What if the military superior is there?

Gisele - We need to be invisible but we need the interaction to get
data. Sitting there and talking with you is just as critical.

Gitte - People want umpteen functions, they don't think how these
features will be used. They'll think it is fine if it makes coffee
too. They get used to us being there fairly quickly.

Katie S. - Who do they think you are?

[Keesah] They thought we were the rah rah girls. They thought we were
just students. We spent time with them doing ethnographic studies. We
were integrated into the system.

[Gisele] The repeat visits were useful. Embedded usability.

[Katie M] They knew we were not military and we were out of place. We
told them we knew the system was horrible. We were on their side and
there to improve it.

Antti - This is totally different from what I use (he does the Panopticon). We
are talking about ergonomics, usability, methodology, and acceptance.

Gitte - How does this relationship help them trust you? Does it affect
the value of your data?

[Keesah] I think they would tell me things more candidly because I did
know them. They were more open because we were changing the design.

Paula Edwards - Had the same experience in health care environments. They
would disclose more.

Antti - Visiting multiple times helps.

Kay - Katie, visiting once like you did - did it create a "hump" in
communication?

[Katie M.] - We were out three or four days with them. They were getting
used to us. We had to wait for planes coming in. It would take half a day
to a full day to get comfortable with everyone.

Avi - In Katie's case there was a question about acceptance of the system. Maybe a
more multidisciplinary research team could help. Organization and
selection of personnel probably influence acceptance of the
system. Perhaps training people, organizational psychologists, etc.

[Katie M.] That is definitely the case. They passed the system, but it is difficult. There is not a lot of acceptance across the whole organization.

Bruce - Automatic data logging - keystroke-level facilities. Did you use them?

[Keesah] - We did have logging, but not at the keystroke level. (Did you
use that data?) Yes, we did, and they didn't know it. (Did they take
the time to use feedback?)

[Katie M.] We were not allowed to change the system and add that in. We
were not allowed to capture that type of data anyway.

Katie S - Did this personal relationship affect number of participants
you can have?

[Keesah] 9 mechanics; There are times we'd visit another group and let
them look at the system. Probably a total of 15.

[Katie M.] We looked at 4 to 5.

Paula - the war stories are important.

Regina - I would argue for having both statistics and qualitative data. If
I am testing mobile phones or home environments, the number of people
needed for a study depends on the context.

Sunday, April 09, 2006

Reality Testing Workshop Schedule and Papers

Reality Testing Workshop Schedule and Papers are here:
http://www.cs.indiana.edu/surg/CHI2006/WorkshopSchedule.html

Example of a non-traditional environment.

Welcome to the Reality Testing Workshop blog for CHI 2006