Saturday, April 22, 2006

Military Domain Session



Katie Minardo, MITRE Corporation


The paper covers a system for calling in air strikes for the U.S. Air
Force. Users in the field call in strikes during times of need (high
stress). The team created a methodology, working with expert users to
help with observations (interviews and expert reviews were used). They
attempted to spread the risk of inherent bias by using multiple evaluators.


Keesah Hall, GTRI


This paper describes a performance support system for aircraft
maintenance. They used many evaluation techniques with varying
results:



  • Evaluation while completing real world tasks (think aloud technique)
  • Users did controlled scripted tasks with an evaluator
  • Cognitive walkthrough (not well described; I’m interested in hearing
    more)
  • Pluralistic Expert Evaluation (many expert users working through
    tasks together with an evaluator)
  • Surveys to collect problems when the evaluators were not there, but
    users were still using the application.


Katie Minardo, MITRE Corp.

Looking at how to do remote evaluation of a military application.

What we did:

  • Detailed Heuristic Evaluation
  • Field Observation #1
  • Field Observation #2
  • Secondary Heuristic Evaluation - New Hardware
  • Competing Heuristic Evaluation - Army testing


Keesah Hall, GTRI, MEPSS (Maintainer's Electronic Performance Support System)


Adapted for an unmanned vehicle
Hosted on a Toughbook (a rugged laptop)
Main Components:

  • Online manuals - very good for use in Iraq: laptops are heavier and
    sturdier and won't blow away
  • Maintenance history
  • Shift pass down
  • Troubleshooting tips - gives tips to future mechanics (tips have to
    be approved before they are posted)



Evaluating the system while completing real-world tasks - be ready to
spend some down time with your testers waiting for the appropriate
problem to appear.



Formal usability inspection in real world environments - be sure you
have reserved an onsite location designated for testing and include only
one test subject at a time.


Pluralistic evaluation - prepare strategies to help keep testers focused
when they veer off track.

Surveys - Explain to testers that even bad feedback is positive. It's
not you - it's the system we're testing.

Trial Period - Lock down the system to prevent unintended use.
People spent time playing games.

Questions

Avi Parush - For Katie: Two rounds of heuristic evaluation - who did them, and according to which method? Did you verify?



[Katie M.] Heuristic results were shown in the field.



[Keesah] We used Jakob Nielsen's heuristics. People weren't using the system
enough to get results. Targets like drop-down menus were too tiny,
screen too small, too heavy.

We don't know how accurate our findings are. Heuristic evaluation may
not be best to start out with in these environments.


Antti S. - How did you create your methodology? How did you decide on field studies and then heuristic studies?

[Katie M] I wouldn't do it this way if I did it again. We were hired as
consultants.

Second time we had people who had used it more recently. The first time
we had people who had been deployed for a year and then they came back
and told us what they did. But they didn't use it enough.

They preferred to use voice.

Bruce Tsuji - The competing device - was it operational? Is there a degree of acceptance for that alternative? Is there some way to do a heuristic
evaluation with the competing device?

[Katie M.] Yes, that would have been great. The Marines have a similar system. Let's compare the Marine system to our Air Force system.

Gisele Bennett - Were you involved in the original design?

[Katie M.] No, I came in to find out what was wrong with it.

Katie S. - Did you have the same problems as Katie is mentioning about
weight and not wanting to use it?

[Keesah] No, because they are used to heavy manuals.

Shamima Khan - Katie, they didn't want to use it. How did that affect your study?

[Katie M.] Yes, it really did affect how we evaluated it. That is why we used
testing field studies to evaluate it.

[Shamima] We had problems too, with a vending machine reloading system,
figuring out the order of user studies.

Gitte Lindgaard - How did you create your plan?

[Keesah] We modified our plan based on how things were going. If we needed more
data, we would change our plan as necessary.

Avi - You used multiple methods. How did you correlate them? Is this your
recommendation for Reality Testing - to use multiple methods? Would
you do this next time?

[Katie M.] Yes, we had to do it this way. You patch stuff together
because you cannot model reality.

[Keesah] You have to have different methods - Nielsen's work. But his
methods have to be tweaked a little bit. Trial periods and surveys
were very helpful. Coming up with tasks was key.

[Katie M.] Ask people if they would do it differently. Training was not
realistic - we asked them if they would do it this way and they said,
"No, we would never do it this way." We have to identify where the
disconnects are between training, tests, and reality.

Regina - We have limited access to the users (e.g., 2 hours with 2 users). What kind of method would you use when you only have 2 hours with 2
people?

[Keesah] Perform an actual task using the full system. You can see
exactly what they would do. Did they have trouble getting it into the
cockpit and using it? We couldn't use videotape, so we had a coding
scheme with our own symbols. We had to write as fast as we could. As
soon as we got back to the hotel, we wrote everything down
completely. If you wait until tomorrow, you lose it.

[Regina] I learned something - ask them to give me the time during
the most important tasks.

Antti - There were a few of you, and it may be helpful to tape-record
yourselves talking about things.

[Katie M. & Keesah] We each typed up our own notes and discussed it later.

Kay - Can you please tell us about your field evaluations?

[Katie M.] We could not bring laptops, so we did our own coding and wrote notes
quickly. They had three people doing field studies, and they took
notes together right afterwards during the debriefing. So someone had each
person covered.

[Keesah] - (did you do something similar? Did you do shadowing?)
We had something different. We had two laptops going with three
evaluators. We'd break up 2 and 1.

Avi - All the observers are human factors/usability. Is this correct?

[Keesah] 2 usability and 1 developer. At one point we had 3
usability. At one point we had subject matter experts working on the
environment. They helped us translate the information that some could
not follow.

[Katie M.] We had the same type of thing. We had an Air Force expert who
could help us review our notes and information about what they were
doing and why they did it.

[Keesah] It also helped with user acceptance, because they saw someone
like them. The subject matter experts:

  • Helped with user acceptance
  • Translated process and procedures
  • Helped with "guessing", since we had such limited access to end users
  • Helped minimize the number of iterations

[Katie M.] It also helped us create good, relevant questions. Need
someone who can answer the basic questions.

Avi - Were there any problems having end-users accept the system?

[Keesah] We had a few mechanics who had never used a web browser. They
were really concerned about their spelling. We ended up putting in a
spell checker because they were nervous about doing something
wrong. It had something to do with education and comfort levels with
computers.

[Katie M] The only problem we had is that the military does not like to
complain about things. They don't want to say something is hard, that they
are having problems with it, etc. You have to let them know it is okay
to complain about things and that we won't tell their supervisor. Had to get
the supervisor out of the way so he was not interrupting the
system. Had to get the developer out of the way so he wouldn't say how
great it is in front of people.

Antti - Do you have an expert with you on your development team?

[Katie M. and Keesah] Yes, we do. That did help us.

(Did he work as a "translator"?)

[Katie M.] We had some of that. He had a conflict of interest because he was on
the team to make it a success. So we had to understand the reality of
the situation. He was there to create the product.

[Keesah] We had that. We had a higher-ranking military individual, so
he was not the same rank as the mechanics we were working with. The
mechanics were interested in impressing the higher-ranking
official. But that could have been an issue.

Regina - What would be your new method? What kind of data would you want to
collect? Some methods have shortcomings. In this environment, what kind
of data is still missing? What would you want to have additionally?

[Keesah] I want to know what the user would design. That would be the
most helpful. What would they create?

[Katie M.] We have 3 weeks before the next system comes out. What if you
redesign the screen they spend 90% of their time on, and keep giving them
options? Would this be the correct direction? They know it when
they see it even if they cannot draw it. We found that they preferred
a system they used 10 years ago.

Shamima - They are very enthusiastic about what features they want. But that
may not necessarily be the right thing because they are thinking about
other things. People do not use a lot of what is available in every
system. You do your everyday tasks in your preferred system.

How much are users affected by our presence? I want to be
invisible. Get the knowledge of what they do when no one is
looking. What if the military superior is there?

Gisele - We need to be invisible but we need the interaction to get
data. Sitting there and talking with you is just as critical.

Gitte - People want umpteen functions; they don't think about how these
features will be used. They'll think it is fine if it makes coffee
too. They get used to us being there fairly quickly.

Katie S. - Who do they think you are?

[Keesah] They thought we were the rah rah girls. They thought we were
just students. We spent time with them doing ethnographic studies. We
were integrated into the system.

[Gisele] The repeat visits were useful. Embedded usability.

[Katie M] They knew we were not military and we were out of place. We
told them we knew the system was horrible. We were on their side and
there to improve it.

Antti - Totally different from what I want to use. (He does panopticon) We
are talking about ergonomics, usability, methodology, and acceptance.

Gitte - How does this relationship help them trust you? Does it affect
the value of your data?

[Keesah] I think they would tell me things more candidly because I did
know them. They were more open because we were changing the design.

Paula Edwards - Had the same experience in health care environments. They
would disclose more.

Antti - Visiting multiple times helps.

Kay - Katie, visiting once like you did - did it create a "hump" in
communication?

[Katie M.] - We were out three or four days with them. They were getting
used to us. We had to wait for planes coming in. It would take 0.5 to 1
day to get comfortable with everyone.

Avi - In Katie's case there was a question about acceptance of the system. Maybe a
more multidisciplinary research team could help. Organization and
selection of personnel probably influence acceptance of the
system. Perhaps training people, organizational psychologists, etc.

[Katie M.] That is definitely the case. They passed the system, but it is difficult. There is not a lot of acceptance across the complete organization.

Bruce - Automatic data logging - keystroke-level facilities. Did you use it?

[Keesah] - We did have logging - not at the keystroke level. (Did you
use that data?) Yes, we did. And they didn't know it. Did they take
the time to use feedback?

[Katie M.] We were not allowed to change the system and add that in. We
were not allowed to capture that type of data anyway.

Katie S - Did this personal relationship affect number of participants
you can have?

[Keesah] 9 mechanics; There are times we'd visit another group and let
them look at the system. Probably a total of 15.

[Katie M.] We looked at 4 to 5.

Paula - The war stories are important.

Regina - Would argue for having both: stats and qualitative. Whether I am
testing mobile phones or home environments, the number of people for a
study depends on the context.
