Saturday, April 29, 2006

Full Workshop Notes

Highlights from the Reality Testing Workshop,
CHI’06, Saturday 22 April 2006, Hyatt Hotel, Montréal, Canada

Military
Presentation highlights
Challenges
• Hostile environment
• Users work with small handheld computers
• Users carry heavy pack loads (20 pounds) and wear thick protective gear (gloves, helmets, goggles), which makes it impossible to work with small PDAs
• Users are in motion much of the time
• Researchers cannot observe and capture data in situ
• Users are not available for walkthroughs
• Software was designed for desktops and then ported to the PDA

Methods used:
• First heuristic evaluation (HE)
• First field observations: simulated field use with no connectivity, plus field interviews
• Second field observations: limited connectivity
• Second heuristic evaluation
• Competitive HE, intended to push the team towards a smaller, simpler solution
• Interactive training snippets presented on a laptop (heavier than a small PDA, so it does not blow away)
• Reports for shift handover
• Troubleshooting tips implemented; these have to be approved before being presented

Lessons learned:
• Evaluation of the system while completing real-world tasks: be ready to spend some downtime with testers, or be on call while waiting for a task that meets your requirements.
• Usability inspection in a real-world environment: reserve an on-site location designated for testing in advance; include one participant at a time.
• Pluralistic evaluation: prepare strategies to keep testers focused when they veer off track.
• Trial period: the system was available for about three weeks, but locked to ensure that users were not using it for unintended purposes.

Discussion
Q: Which HE methods did you use?
A: The Nielsen method as well as methods available in the military. We found many of the same problems in the field study, but subjects did not use the system frequently enough to give a lot of feedback.

Q: Why did you use the HE method first followed by field study?
A: The process we presented in our paper is not what we would do next time. We had a very short lead time; then we realized that there were bigger problems, so we requested additional funding. In the first field study we simply received training as users; in the second field study we observed airplanes in the middle of nowhere.

Q: Was the competing device operational?
A: Yes.

Q: Could you have conducted HE on both devices?
A: Yes! So next time we will try to get hold of competing devices first.

Q: How were your methods organized?
A: We followed a test and evaluation (T&E) plan, but modified it as we went.

Q: You both (Katie, Keesah) used multiple testing methods – is that what you’d recommend?
A: Yes. This is absolutely necessary in these studies.

Q: What methods would you use if you had limited access to users?
A: Observe them doing actual tasks to learn what users actually do. This helps us ask them for specific time so that we gain the most from the observations. We also worked as a group: because we could not record any data electronically, we relied on coded handwritten notes (activity analysis), and after a session we went over them with users. Katie observed 6-7 people simultaneously, but each observer shadowed only one person. Keesah had everyone in one room, with several people working simultaneously (two usability people, one subject-matter expert (SME) from the work environment, and one developer). SMEs are helpful as they can translate the process, procedures, language and observations; they also help with user acceptance and specify what questions to ask, a key element as we had limited access to users.

Q: Did users accept testing situations?
A: Some users had limited computer experience; users were concerned about spelling; it was hard to get the ‘truth’ from users, e.g. they would not admit that the load was heavy and cumbersome. Another problem was that we worked with systems engineers who were of a higher rank than the mechanics who were the users; this may have hampered what users were willing to say.

Q: What would be an ideal method in your situation? What kind of data would you recommend collecting?
A: More information on how users would design a system to support their tasks? Katie Siek observed in her study (of dialysis patients) that users did not deliver anything of value, but simply regurgitated what she had given them. Katie Minardo would give them several options; people know what they like when they see it. So, rather than being completely open-ended, give them something to pull to pieces. Trust developed in Keesah’s study as she followed users through the whole process and got to know them personally; this probably made users willing to make suggestions for the system, and opened a dialog. Katie M. was introduced by a high-ranking officer, which seemed to help build trust; she spent 12 hours with them, much of it sitting around waiting for things to happen; it probably took ½ day to build trust. Involving a multidisciplinary evaluation team could help increase the validity of the data, and also help select what data to collect. Some data were automatically logged in Keesah’s case, so the team could see what users were and were not using; these data were used for interpretation, and users were allowed to inspect and go over them at the end.

Q: What was the number of users involved?
A: Keesah: usually 9, but in some cases 15; in Katie’s case, 5. We won’t be able to do statistical analyses, but qualitative studies have a lot of value; deep stories have a different kind of value, and you gather much more data from the one person you spend a lot of time with.

Everyday life
Presentation highlights
• User mobility was a big problem for one research group: use was sporadic, there were privacy issues, the settings in which the technology was used were unpredictable, and tasks as well as task settings were open-ended.
• The problem was that users outnumbered researchers in that study.
• The Panopticon method turned out to be a good solution; it facilitates extended field trials, leading to more reliable results, and it also facilitates updating of prototypes.
• Studying ITV is a big challenge, as users don’t like multiple cameras in their homes; the best studies of people in their homes come from architecture.
• When conducting 3-D studies, we found that users ran in front of the screen and kept going until they hit the wall.
• In some studies it is unclear what behavioural data to collect when measuring user tolerance towards delays, which are a reality of new technologies. This group wanted to predict ‘user reactions’, but these were not observed as anticipated. Apparently, the problem was our failure to achieve ecological validity.
• Triangulation of methods and new methods are required; existing methods are insufficient.
• Information and data are not accessible in the public domain.
• How can we simulate challenging environments? (The military used training and competitive analyses, HOTLab used situations, Regina used outdoor test environments, Panopticon used a mixture of remote and on-site observations, and Francis used a variety of situations and technologies.)
• The Panopticon researchers found that a courtyard enables slightly more natural observations than a study confined to a usability lab.
• Researchers need to know the orientation of the PDA when collecting data, especially with rotating maps.

Discussion
• Sampling issues: Would population segmentation have benefited the studies?
• Austria did do this, based on population demographics: they took demographic data in the home first (e.g. grandparents living with family), then selected the most interesting households (based on financial distribution). Finland went for ‘natural’ groups: a much broader, heterogeneous population, using a very large group living in the same area as the researchers. The argument in favour of this approach vis-à-vis the Austrian one is that you cannot generalize across populations; instead of selecting a population a priori, analyze data from a heterogeneous population to learn which factors correlate.
• Give people the ability to choose between two technologies (one study tested barcodes and voice input) and see what participants select.

Q: What behavioural measures did other teams use?
A: SF had to throw away lots of quantitative data.
• Tendency to collect lots of data and only decide post hoc what data may yield meaningful information
• Methodologically, it is not always clear whether it is best to start broad or narrow, i.e. to define the data to collect before the study starts, or to collect data and then look for patterns
• Sometimes we need to back up field studies with lab data; other times it works best the other way round – lab studies preceding field study
• Field studies demand different research questions than lab studies – hence method triangulation.
• Apparently we need hybrid methods: we must import methodologies from other disciplines, then combine and compare methods from diverse fields (communication science, psychology, computer science, architecture)
• Probably the interplay between lab and field studies will yield better results with respect to ecological validity
• Studies should be designed such that we would start broadly, then go into depth – collect lots of data first, then define what you need; stated in another way, begin with qualitative data, then continue with quantitative data once the research problems have been tightly defined
• In qualitative studies where no response data are generated, look at frequencies and usage patterns; that is, log everything until you can discern patterns
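To make the "log everything, then look at frequencies and usage patterns" advice concrete, here is a minimal Python sketch. It assumes a hypothetical log of (user, feature) records; the record format and the functions `usage_frequencies`, `unused_features` and `per_user_patterns` are illustrative only, not any team's actual tooling.

```python
from collections import Counter

# Hypothetical log record: (user_id, feature_name). This format is an
# assumption for illustration; real study logs will differ.
LogRecord = tuple[str, str]


def usage_frequencies(log: list[LogRecord]) -> Counter:
    """Count how often each feature appears across all users."""
    return Counter(feature for _user, feature in log)


def unused_features(log: list[LogRecord], offered: set[str]) -> set[str]:
    """Features the system offers that never appear in the log."""
    return offered - {feature for _user, feature in log}


def per_user_patterns(log: list[LogRecord]) -> dict[str, Counter]:
    """Per-user feature counts, to spot individual usage patterns."""
    patterns: dict[str, Counter] = {}
    for user, feature in log:
        patterns.setdefault(user, Counter())[feature] += 1
    return patterns


if __name__ == "__main__":
    log = [("u1", "map"), ("u1", "map"), ("u2", "report"), ("u1", "report")]
    print(usage_frequencies(log))                           # Counter({'map': 2, 'report': 2})
    print(unused_features(log, {"map", "report", "help"}))  # {'help'}
```

Listing the features that never appear is the same kind of evidence Keesah's team drew from automatic logging: what users were and were not using.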



• The best way to gain access to unpublished research is to get to know the relevant researchers personally; perhaps establish contact with medical schools (these papers are often very expensive, although they are accessible in the public domain). It is still a challenge to get hold of papers because they are commercial, proprietary or military secrets, and relevant papers can often only be obtained during a project. One major hassle is that even if researchers can get proprietary papers, they usually cannot publish their results.
• When working with industry, timelines are extremely short. One lesson learned was the need to translate findings into dollar figures wherever possible and to ensure that these are backed up by data.
• Carefully select the data you give to an industry sponsor (e.g. in some environments we know that we get ‘silly’ or embarrassing data anyway).
• Privacy problems: e.g. in a museum you cannot get people’s names; necessary data may not be available or accessible.
• Some corporations are not willing to test their prototypes because they are too clunky; as a consequence, some researchers build their own prototypes and applications. Often this work is done by students, but this creates gaps between prototypes and the eventual product that are not always foreseeable.


Summary of limitations:
• Access to users
• Technology does not exist yet
• Access to relevant literature
• Privacy issues
• Interaction with industry
• Cannot predict what behavioural data will answer research question
• Timeframes (e.g. short-term studies; difficulty understanding other disciplines’ language sets back studies)
• Resources


Medical
Presentation highlights
• Placement of sensors is a challenge. For example, we placed some sensors too close to vents, which produced a lot of noise in our early data. Once you have too much data, it becomes uninterpretable. It is essential to set individual thresholds very carefully so as to avoid too many false positives; we therefore found that it makes sense to collect trend data rather than continuous data. We also learned that “human validity” is extremely important: if information is not presented as people want it, they simply won’t use the system.
• Stakeholders have different agendas: your role is to make your boss’s boss look good in his boss’s eyes (i.e. how does your boss earn her brownie points?)
• Some users have very low literacy levels, so researchers must be sensitive to the various stakeholders’ cultures and take this into account. Right from the beginning, researchers must develop relationships with users, as users will only volunteer a lot of information once they trust the researcher. We have found that this takes ½ day or more.
• Researchers must plan to multitask, as in some environments only one researcher is allowed with a user at a time.
• Researchers must also realize that they will encounter a great deal of variability in work practices between different professionals, different workplaces and different users. Furthermore, researchers have no control over the product, the development cycle or product deployment; they often come on board late in the process.
• In a longitudinal study it may be worthwhile to train users in HF/HCI methods to create a continuous channel of feedback.
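The advice above to set individual thresholds and work with trend data rather than raw continuous readings can be illustrated with a small Python sketch. The choices here are assumptions, not the presenters' method: a rolling mean stands in for "trend data", and each person's threshold is their baseline mean plus `k` standard deviations, with `k = 2.0` picked purely for illustration. The function names are hypothetical.

```python
from statistics import mean, pstdev


def rolling_mean(readings: list[float], window: int) -> list[float]:
    """Trend series: the mean of each run of `window` consecutive raw readings."""
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    return [mean(readings[i:i + window]) for i in range(len(readings) - window + 1)]


def individual_threshold(baseline: list[float], k: float = 2.0) -> float:
    """Per-person threshold: baseline mean plus k population standard deviations.

    k = 2.0 is an illustrative default, not a value reported at the workshop.
    """
    return mean(baseline) + k * pstdev(baseline)


def flag_alerts(readings: list[float], window: int, threshold: float) -> list[int]:
    """Start indices of trend windows whose mean exceeds this person's threshold."""
    trend = rolling_mean(readings, window)
    return [i for i, value in enumerate(trend) if value > threshold]


if __name__ == "__main__":
    threshold = individual_threshold([8, 12, 8, 12])  # 10 + 2 * 2
    readings = [10, 10, 20, 10, 10, 18, 18, 18]       # one spike, then a sustained rise
    print(threshold)                                  # 14.0
    print(flag_alerts(readings, 3, threshold))        # [4, 5]
```

Applied to raw readings, a threshold of 14 would also fire on the single spike at index 2 (the kind of noise a sensor near a vent produces); on the three-reading trend only the sustained rise is flagged. Window length and `k` would need tuning per person and per sensor.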

Discussion
Q: If you could do this again, what would you do differently?
A: Get out of the university environment; chasing patents took far too long, and there were too many meetings.
Q: How do you test the reliability of a real-time system placed in real people’s homes?
A: Ensure that the system records motion when motion is actually performed, and not in the absence of motion.
Q: How did you do reliability checks?
A: We had six motion detectors and cross-referenced their outputs. Note, however, that the data analysis is extremely time-consuming. Note also the effect of the environment: signals cannot get through steel in the walls. Activity thresholds were determined for each individual; it takes three months to get valid longitudinal data. It is important to note that researchers do not have control over the study or the time frame.
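The notes do not say how the six detectors' outputs were cross-referenced. One simple possibility is an agreement rule, sketched below in Python under stated assumptions: the detectors have overlapping coverage, events have already been bucketed into integer time slots (e.g. minutes), and motion counts as confirmed only when at least `min_agree` detectors fire in the same slot. The function names and the rule itself are hypothetical.

```python
from collections import Counter


def confirmed_motion(detector_events: dict[str, set[int]], min_agree: int = 2) -> set[int]:
    """Time slots in which at least `min_agree` detectors reported motion."""
    counts = Counter(slot for slots in detector_events.values() for slot in slots)
    return {slot for slot, n in counts.items() if n >= min_agree}


def unconfirmed_reports(detector_events: dict[str, set[int]], min_agree: int = 2) -> dict[str, set[int]]:
    """Per detector, reported slots that did not reach the agreement quorum.

    These are candidates for false positives (or for a detector whose
    coverage does not overlap with the others) and are worth checking by hand.
    """
    confirmed = confirmed_motion(detector_events, min_agree)
    leftovers = {name: slots - confirmed for name, slots in detector_events.items()}
    # Keep only detectors that actually have unconfirmed reports.
    return {name: slots for name, slots in leftovers.items() if slots}


if __name__ == "__main__":
    events = {"d1": {1, 2, 3}, "d2": {2, 3}, "d3": {3, 7}, "d4": set(), "d5": {9}, "d6": {2}}
    print(confirmed_motion(events))     # {2, 3}
    print(unconfirmed_reports(events))  # {'d1': {1}, 'd3': {7}, 'd5': {9}}
```

A detector that keeps producing unconfirmed reports, or none at all (like `d4` here), may be badly placed or blocked, for example by steel in the walls.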

Q: When is it safe to start collecting data in a longitudinal study?
A: We ignore the first two weeks because we know that people play with the system: they wave at the sensors to see if they work; they open and shut fridges, go to the bathroom just for the sake of it, and so on, so the signals are completely unreliable. Over time, they forget about the sensors and settle down to normal life. One should, however, be sensitive to changes in patterns in the data. For example, people behave differently on weekends and public holidays than during the week.
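Both practices in this answer, discarding the settling-in period and treating weekends and holidays separately, can be expressed as a small preprocessing step. The Python sketch below assumes hypothetical daily summary values keyed by date and a caller-supplied set of holiday dates; only the two-week cutoff comes from the notes, and the function names are illustrative.

```python
from datetime import date, timedelta

SETTLING_IN = timedelta(days=14)  # the "first two weeks" mentioned above


def usable_days(readings: dict[date, float], install_date: date) -> dict[date, float]:
    """Drop daily readings from the settling-in period after installation."""
    cutoff = install_date + SETTLING_IN
    return {day: value for day, value in readings.items() if day >= cutoff}


def split_by_day_type(
    readings: dict[date, float], holidays: frozenset[date] = frozenset()
) -> tuple[dict[date, float], dict[date, float]]:
    """Separate weekday readings from weekend/holiday readings.

    Each group can then get its own baseline, so a quiet Sunday is not
    mistaken for a change in a person's normal pattern.
    """
    weekday: dict[date, float] = {}
    off_day: dict[date, float] = {}
    for day, value in readings.items():
        # date.weekday(): Monday is 0, Saturday 5, Sunday 6.
        target = off_day if day.weekday() >= 5 or day in holidays else weekday
        target[day] = value
    return weekday, off_day


if __name__ == "__main__":
    readings = {date(2006, 4, 10): 1.0, date(2006, 4, 17): 2.0,
                date(2006, 4, 18): 5.0, date(2006, 4, 22): 3.0}
    kept = usable_days(readings, install_date=date(2006, 4, 3))
    weekdays, off_days = split_by_day_type(kept, holidays=frozenset({date(2006, 4, 17)}))
    print(sorted(weekdays), sorted(off_days))
```

In this example the reading from 10 April falls inside the settling-in period and is dropped; 17 April is treated as a holiday, 22 April is a Saturday, and only 18 April counts as an ordinary weekday.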

Other challenges we faced:
• It is extremely difficult a priori to determine the nature of the outcome of a longitudinal study
• Difficult to measure impact of technology and intervention
• There is a distinct disconnect between the various stakeholders and costs/benefits
• Often it is difficult to determine who gains from such a study (patients, physicians, hospitals, or the health system?)
