Everyday Life
Francis Jambon - ADAMOS Project
Meta Evaluations - Classic and "in the wild" experimental settings are
used at the same time
- Analysis performed separately and their results are compared
afterwards
Map Mobile 2005 Experiment
User moves around one floor of an office building during work hours
His/her position and suggested routes are displayed on a moving map
Contextual messages are proactively displayed
Real-time logs
Data collection
- Usability analysis
- user comments
- device screen copy
- context DV camera (not remote)
In The Wild Analysis Methodology
- User actions
- Interface feedback
- User localization
Compare results
- Many false negatives (problems not detected)
- facilitator intrusion preventing errors
- data analysis techniques limitations
- Talks seemed too long, possibly a software error, when the
participant was actually talking with friends
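The comparison step can be sketched as plain set operations. This is a minimal illustration, assuming each analysis (classic and in the wild) yields a set of problem identifiers and treating the classic analysis as the reference. The function name `compare_problem_sets` and the problem IDs are hypothetical and do not come from the ADAMOS tools.

```python
def compare_problem_sets(reference: set[str], in_the_wild: set[str]) -> dict[str, set[str]]:
    """Split usability problems by which analysis detected them (hypothetical helper)."""
    return {
        # Detected by both analyses.
        "both": reference & in_the_wild,
        # Detected by the reference (classic) analysis only:
        # under this assumption, false negatives of the in-the-wild analysis.
        "missed_in_wild": reference - in_the_wild,
        # Detected only by the in-the-wild analysis.
        "only_in_wild": in_the_wild - reference,
    }

# Hypothetical problem IDs, for illustration only.
classic = {"P1", "P2", "P3"}
wild = {"P2", "P4"}

result = compare_problem_sets(classic, wild)
miss_rate = len(result["missed_in_wild"]) / len(classic)
print(sorted(result["missed_in_wild"]), round(miss_rate, 2))  # ['P1', 'P3'] 0.67
```

Under these assumptions the miss rate gives one rough number for how many problems went undetected, which could be tracked across experiments such as the ski hill and museum studies.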
New experiments
- eSkiing - using system on ski hill - cold, snowfall
- Museum - many thousands of visitors, one-month duration
- New analysis method - automated usability analysis
- It took us 3 months to analyze data by hand; need automation
Antti Salovaara - The Panopticon
Challenge: user mobility, sporadic use, privacy issues,
unpredictable open-ended task settings, and latent social
conventions.
vs.
Capturing a rich picture in order to get new findings.
Panopticon - hybrid method
- remote data collection
- direct data gathering
Allows for unobtrusive data logging and increased sensitivity
He met with 7 high school students every 2 weeks
Potentials
More reliable results; Extended field trial durations
- Better-timed and better-prepared interviews - the researcher knows
the most important times of use
Content interventions
- changing the content in the system remotely
- analyzing reactions
Prototype updates
- remote parameter tuning of the prototype
- analyzing the reactions
Regina - Home or Factory
Testing in the Home Environment
How can we test/measure/find out usability problems in non-traditional
areas?
Three-month field trial on interactive field trial
Problem - people don't want 10 cameras in their home, and how do I
afford 10 cameras?
One method alone does not help. We have to bundle methods so that each
contributes additional information.
1. Literature analysis - We found the best stuff in the field of
architecture. We have to learn from psychologists, architects,
etc. We have to know everything...
2. What kind of new methods do we need? How can we integrate
methods from other fields?
They do half and half - partly in a lab, made as realistic as possible.
3D games - novice users were very enthusiastic and were running into
the walls. You must be careful.
We have to invent new methods where we can test. We really tried to
separate the data step by step. If the context is relevant we test it
in this setting.
Combining and comparing methods from several fields in HCI (e.g.,
mobile HCI methods at home)
Extending the basics in HCI by importing new methodology from other
disciplines
Entertainment Applications - Shamima
How are people going to react to delays with new technologies? Push to
talk and the next generation of television - internet-based
television. What is people's tolerance for these technologies?
First study
Early speech - did people speak before they were supposed to, showing
they were impatient with delays? They said they were annoyed by the
delays, but we could not find any behavioral evidence.
When the delays were increased, they were even less likely to speak
before the signal.
No effect of urgency.
Second study
Simulated conversation, hid equipment, gave them a scenario
They found the same thing. Even with an urgency scenario, participants
did not show frustration in their behavior.
Third study
We had to do the TV experiment in a lab
Conclusion
- Unable to achieve ecological validity
- Suitable performance measures
- companies want the data even though the technology does not exist yet
- sponsors want requirements - magic numbers (e.g., no one will buy
your system if the system takes 2 seconds)
- industry wanted the results yesterday
Testing moving map algorithms - Sheila Narasimhan
How map rotation helps with navigation.
Challenges
- will map rotation help wayfinding?
- how to test the rotation
Study 1: Simulation in the lab
Only measured users' preferences; wanted to look at perceptions
Study 2: Had them use PDA to navigate through campus
Conclusions and open issues
- Lab study provided sensible results
- identified need to improve adaptive algorithm
- did not help in deciding on a specific algorithm
- not ecologically valid
Is lab-field combination effective and efficient?
Takes a lot of effort and resources
Is the field orientation test ecologically valid if the field studies
support the lab studies?
Panel Discussion
Which combination will work for all our particular cases?
In the morning we focused on in-the-field experiments.
This group is combining lab and field experiments.
Kay - Data is gathered and logged automatically, but it is difficult to
go back and analyze it. Keystrokes may be too detailed.
- Francis - all of it is important. If you find something wrong,
you need to go deeper and look at what is incorrect. Start at
the high level and go deeper for special cases. Can look at
variation of task duration.
- Antti - Quality of data comes first. If we get a feeling about
certain important issues, then we may go into the log data and
see how those issues look there. We are not
measuring response times. We wanted to know about frequency of
use. Exits from screens are difficult to find because you
cannot draw conclusions from all sides.
- Francis - we want to know the orientation of the PDA. How are
they carrying it? What is their activity?
- Antti - the clocks have to be synchronized.
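Francis's approach (start at the high level, look at variation in task duration, then go deeper for special cases) and Antti's point about synchronized clocks can be combined in a small sketch. It assumes a hypothetical log format of `(device, local timestamp, task, event)` tuples and a known per-device clock offset measured at a sync point; the function names, device names, and data are illustrative, not taken from any of the projects discussed. Flagging tasks more than `k` sample standard deviations from the mean is one simple outlier rule among many.

```python
from statistics import mean, stdev

# Hypothetical event format: (device_id, local_time_s, task_id, event), event is "start" or "end".
Event = tuple[str, float, str, str]

def align_clocks(events: list[Event], offsets: dict[str, float]) -> list[tuple[float, str, str]]:
    """Shift each device's local timestamps onto one reference clock.

    offsets[device] is assumed to be (reference_time - device_time).
    """
    return [(t + offsets[dev], task, ev) for dev, t, task, ev in events]

def task_durations(aligned: list[tuple[float, str, str]]) -> dict[str, float]:
    """Pair each task's start and end events (in reference time) into a duration."""
    starts: dict[str, float] = {}
    durations: dict[str, float] = {}
    for t, task, ev in sorted(aligned):
        if ev == "start":
            starts[task] = t
        elif ev == "end" and task in starts:
            durations[task] = t - starts.pop(task)
    return durations

def flag_unusual(durations: dict[str, float], k: float = 1.5) -> list[str]:
    """High-level pass: tasks whose duration is more than k sample std devs from the mean."""
    values = list(durations.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return sorted(task for task, d in durations.items() if abs(d - mu) > k * sigma)

# Hypothetical data: starts logged on the PDA, ends annotated from a camera whose clock runs 2 s ahead.
events = [
    ("pda", 0.0, "T1", "start"), ("camera", 12.0, "T1", "end"),
    ("pda", 20.0, "T2", "start"), ("camera", 34.0, "T2", "end"),
    ("pda", 40.0, "T3", "start"), ("camera", 53.0, "T3", "end"),
    ("pda", 60.0, "T4", "start"), ("camera", 71.0, "T4", "end"),
    ("pda", 80.0, "T5", "start"), ("camera", 122.0, "T5", "end"),
]
offsets = {"pda": 0.0, "camera": -2.0}

durations = task_durations(align_clocks(events, offsets))
print(flag_unusual(durations))  # ['T5']
```

Without the -2 s correction, every camera-terminated task in this example would appear 2 s longer. The flagged tasks are the candidates for the deeper look (video, interviews) that Francis describes.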
Katie M. - Anyone can be your user - different from the military
domain. Did you do any kind of segmentation to say 18-34-year-olds use
it like this and teens use it like this?
- Francis - we did not have enough people. Women liked the map; not
every man did. Too few to create these segmentations - we had 12
people.
- Sheila - we found rotating the map reduces positive load. It is
difficult to do client segmentation because of the ages. We
tested 18-34.
(Katie: Would it have been better to target a certain group and say we
know it would work for this group?)
- Marianna - we choose our groups based on ethnography. We
emphasize recruiting elderly people. We use demographic data and
get the most interesting people: families with multiple generations
in the house. Our focus is on elderly people's use of remote
controls and on decreasing the number of remotes and the
complexity.
Paula - Push to talk - They voiced their frustration with delays, but
they didn't change their behavior. But in the military, they just did
not use it. In a lab environment, could you give them an out, a
different method, to see whether they switch to it?
Gitte - this is the problem with lab environments. Our
company wanted certain results in this time frame so we
couldn't do it.
Antti to Shamima - You talked about entertainment. Perhaps you can
look at first-person shooter games that use push-to-talk technology
and simulate stress.
Antti - On user sampling, following up on Katie's question: we did not
try to generalize. It is very dependent on who we find. We took one
group of users and identified natural users of the application. We
don't create artificial groups and wait for them to start
communicating. It was a natural clique, and we chose the best one.
Avi - Retroactive partitioning can be good as well: see if anything
you learned about the participants correlates with the experiment.
Bruce - Measures of multi-user interfaces. We have people communicating
through multiple systems. What kind of measures did you use? Which were
successful? What did not work?
Antti - We had to throw away a lot of quantitative data. Individual
phone logs did not work well. Logging data from a mobile phone is
difficult. (Bruce: Was satisfaction measured?) No.
Gitte - How many people here have a background in psychology? Four
people do.
Gisele - What were your instructions from your sponsor? Did they define
a frustration level?
Shamima - They did not define a frustration level. They were trying to
decide if they would invest money in these technologies.
Gitte - We were told: we want to know how much delay people will
take and still buy our product. We had to translate this into
something measurable.
Gisele - When you got your data, it may have shown how frustrated people were.
Gitte - We know what people say and what they do are 2 different
things. We were trying to categorize the frustration.
(Missed some questions while trying to get the construction noise lowered.)
Antti - Field studies have different questions than lab studies.
Katie S. - Did they notice the observer?
Antti - Yes - they knew I was there. I had cameras strapped to my
waist, pointed at them. They got used to me.
Kay - When participants asked you for help, did you do anything? Did
you help them?
Antti - Only if it was a system problem because without the system
working, you get no data. I helped by facilitating their use, but not
giving guidelines.
Francis - In the mobile skiing study, one device failed. Devices can
fail very quickly, so you have to have people there to help.
Avi - Regina said maybe we should learn from other
disciplines. What disciplines?
Regina - Marianna helps because she is from communications. Research is
slowed down because you have to look into it. It is not a rush in experience.
Antti - HCI is a field with no theories of its own - we have to take
from other fields. We need a lot of interaction analysis. Learning sciences.
Katie S. - Have to expand your network to find papers in other areas.
Paula - Has login to medical school.
Glascock - Commercial loan - interlibrary loan is too small. The problem
is getting access to information. Companies to share, they sue. The
stuff that is really happening is not published. How do you get access
to the stuff that is not published? How do you publish? What do you
do? Where do you publish? Licensing agreements are too slow - 6-9
months to get something reviewed, approved, and published is too
hard. It is paleotechnology. Companies want a two-week study.
Katie M. - Worked on a gesture interface for General Motors. You are
inventing everything and making a lot of assumptions. Did you consider
how the quality of your prototype affects your results?
Shamima - We said here are the issues with this. We got some help
because the companies gave us some ideas of what the project looks
like.
Glascock - How did the company respond when asked for requirements?
Gitte - They just said to do the best you can.
Glascock - We are talking about reliable in a research world. They are
talking about reliable in a Mr. Coffee world. Language issues -
corporations take it in a marketing sense. They just choose what they
want to hear.
Sheila - Becoming outdated is a big problem too. A master's student
created the application. Now PDAs come with lots of orientation
technology built in. But nothing says which method works best when.
Maybe that is our job - to say what works when and how to market it.
Glascock - How do you present your results to a company with high
demands?
Gitte - We collected a lot of data for a system and management did not
really care about the problems we found. They were not owning the
problem.
Antti - We observe people in their real life so they are not willing
to tell everything. We can find this out by logging stuff, but we
still do not know everything. How do we deal with this?
Jambon - we have a committee that approves our studies
Antti - People have to sign a consent form. We have to configure
phones in front of the people so it is really complicated. We
use the same form as the company. The company has a legal
department.
Edwards - We have a twist on medical records and consent. There are a
lot of regulations around HIPAA. Even having someone who is not an
employee of the hospital comes into question. You can run into legal
questions just by looking at the data. We are working with clinicians,
giving them a crash course in human factors, and they are the
observers.
Marianna - some enterprises are not willing to test a prototype with
someone not related to that enterprise. The enterprise is in Japan. So
we test Austrians assuming they will not talk to the Japanese.
Antti - We always combat that problem by building our own prototypes.
Katie M. - How do you minimize the effect of creating your own
prototype? How do you manage the gap with something that is not
available yet?
Marianna - It depends. The prototype is to get you to the next phase
of the product.
Antti - It is all tradeoffs - if design is more important, then that
is what we focus on. We are not building real products - they would
have to be redesigned to make them proper products. But they help us
find answers to certain issues.
Sheila - For us it was important. It was a very specific research
question. The CS student was also interested in the question. The
software is for the evaluation. It is not a ready to deploy type thing
- it is for testing.
Overall Ideas:
- Communication/language is important
- Lab and Field combination of experiment
- Testing prototype of rough system vs. testing full system
(military full system)
Gisele's Sum Up:
The answer: It depends...
- which methodology
- which field vs. lab
- prototype vs. full system
I don't think we're going to come up with a solution. We have to look
at how to combine the approaches to solve these problems.