Pragmatic Evaluation: Lessons from Usability

C. N. Quinn
School of Computer Science & Engineering
The University of New South Wales
Sydney, NSW 2052
C.Quinn@unsw.edu.au

Evaluation is an important component of any development project, requiring both formative and summative feedback. What is desired is a systematic approach that yields reliable results for the minimum investment. In the Human-Computer Interaction field, usability inspection has been validated as providing just such a cost-effective evaluation. This framework may be adaptable for educational evaluation. Here I present the usability approach, which has direct application to educational software development, and then derive an educational equivalent to suggest a pragmatic approach to evaluating educational effectiveness.

1. Introduction

The current enthusiasm for multimedia applications to education has led to financial resources being made available for development. What typically has not been made available is any support for accomplishing evaluation of the developed product [8]. Even if there is money for evaluation, there is pressure to minimise cost. What is needed is a pragmatic approach to realising the maximum evaluative information for the minimum investment.

An approach that has been developed in the Human-Computer Interaction area over the past few years is known as usability inspection. This method has been validated as providing an effective balance between cost and results. Empirical studies comparing different usability inspection methods have shown that this approach produces useful results for low cost [6].

Usability inspection is particularly directed towards evaluating the ease of use of software. This is an area with which educational software must contend, so a presentation of this approach is useful in and of itself. However, there is a further use to which this approach may lend itself.

Evaluation of educational effectiveness is the primary area with which developers of learning software must be concerned. Software for learning might be usable but not educational, or vice versa. The final goal must be for systems to be both usable and educational. It would be desirable for educational evaluation to share the properties of speed, cost-effectiveness, and ease that the usability methods offer.

An analogy will be drawn from the usability methods to suggest a method for low-cost educational evaluation. This is a first consideration of the idea, and the result here will point to areas for further inquiry.

2. Usability Inspection Methods

Usability inspection methods evaluate user interfaces by having individuals examine a particular implementation using specific procedures. There are many ways to evaluate interfaces, including automated testing, formal modelling, empirical testing, and informal evaluation. Usability inspection methods fall into the last of these categories. The definitive guide to this approach is Nielsen & Mack's Usability Inspection Methods [11], from which much of this section is drawn.

During the development cycle, many evaluations are needed, and different methods are useful at different stages [1]. After an information gathering stage, there is a design phase where iterations of the interface are prototyped and evaluated to converge on a final specification. Finally, the product is implemented and trialed. Usability inspection methods are best used as formative evaluation during the design phase, before a final specification is created.

There are a variety of dimensions along which usability inspection methods can vary. These include whether individuals work alone or together during the evaluation; whether experts, users, or both are the evaluators; and whether specific use-scenarios are followed or the evaluators are given free rein to explore the implementation.

The specific types of usability inspection methods of most interest include heuristic evaluation [10], which uses a usability checklist; pluralistic walkthroughs [3] that use a mix of evaluators; and cognitive walkthroughs [12], where specific attention is paid to the thought processes of hypothetical users. Each has strengths and weaknesses. What will be presented here is the heuristic approach. Heuristic evaluation has proved to be the easiest to use successfully, as it is the easiest to learn [7, 9].

The essential idea starts with a prototype that evaluators can sit down and work with. Each evaluator is provided with a checklist of usability heuristics. At least two passes are made through the interface: the first to get a feel for the overall flow; the second to focus on the dialog elements of the specific interactions. The evaluator plays the role of a user. Evaluation sessions typically last one to two hours; for more complex software, the evaluation should be broken into several sessions of no more than two hours each.

Each evaluator does the evaluation individually, and then the evaluators are brought together to compare notes. The evaluation team can come from three different pools of people: the developers, usability experts, and sample users. The usability experts and sample users are easy to justify; the reason to include the developers is to help sensitise them to usability issues.

An observer should be present during the evaluation, to answer evaluator questions and to prompt the evaluator to record all problems found. The observer needs to be familiar with the design, and typically is a member of the design team. However, the observer must not assist the evaluator past a difficult point without first having the evaluator record the difficulty.

For each problem found, the evaluator needs to identify specifically what the problem is, either with reference to the usability checklist or through some other specific statement of the problem. A particular list of ten items (see Table 1) has been shown to be an effective heuristic set [10].


1. Ensure visibility of system status: the user should be kept informed about what is happening, and told what is needed and in what format it should be provided.
2. Maximise match between system and real world: use language and conventions familiar to the user.
3. Maximise user control and freedom: provide ways to exit or undo actions.
4. Maximise consistency and match with standards: follow platform and interface conventions and standards.
5. Prevent errors: anticipating and designing out potential errors is better than an elegant recovery mechanism.
6. Help users recognise, diagnose, and recover from errors: errors that cannot be designed out should clearly indicate the problem in plain language, provide options, and recommend a solution.
7. Support recognition rather than recall: make options and objects available for browsing; information should not need to be remembered from one part of the system to another, and users should be able to point to, rather than type, names that the system already knows.
8. Support flexibility and efficiency of use: provide shortcuts for experienced users, and allow users to customise frequent actions.
9. Use aesthetic and minimalist design: maximise the 'signal to noise' ratio by removing irrelevant or infrequently used information, and provide an appealing overall design.
10. Provide help and documentation: better if the system can be used without it, but any complex system may require at least one reference to such information; it should be oriented around users' goals and easy to search.

Table 1. Usability heuristics [10].
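
To make the recording and aggregation steps concrete, the Table 1 checklist can be treated as a tagging vocabulary, with each evaluator's notes kept as records that reference a heuristic and the individual lists merged for the team discussion. The Python sketch below is illustrative only: the record fields, function names, and example data are my own and not part of the method described in [10].

# Illustrative sketch only: Table 1 as a tagging vocabulary, a simple record
# format for the problems each evaluator notes, and a merge step that
# consolidates the individual lists for the team discussion.
from dataclasses import dataclass

HEURISTICS = [
    "Ensure visibility of system status",
    "Maximise match between system and real world",
    "Maximise user control and freedom",
    "Maximise consistency and match with standards",
    "Prevent errors",
    "Help users recognise, diagnose, and recover from errors",
    "Support recognition rather than recall",
    "Support flexibility and efficiency of use",
    "Use aesthetic and minimalist design",
    "Provide help and documentation",
]

@dataclass
class Finding:
    evaluator: str    # who noted the problem
    location: str     # where in the interface the problem occurred
    heuristic: int    # index into HEURISTICS
    description: str  # the specific statement of the problem

def merge_findings(findings):
    """Group individually recorded problems by location and heuristic, so the
    discussion can see which problems were noted by several evaluators."""
    merged = {}
    for f in findings:
        merged.setdefault((f.location, f.heuristic), []).append(f)
    return merged

# Hypothetical example: two evaluators independently note the same problem.
notes = [
    Finding("evaluator A", "login screen", 0,
            "No feedback while the system checks the password"),
    Finding("evaluator B", "login screen", 0,
            "User is left waiting with no indication of progress"),
]
for (location, h), group in merge_findings(notes).items():
    print(f"{location} / {HEURISTICS[h]}: noted by {len(group)} evaluator(s)")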

It is possible to do heuristic evaluation with paper prototypes, allowing early use of the technique. If the evaluators are not domain experts, a usage scenario (a fully worked out, and ideally comprehensive, sample task with the interface) can be used as a guide for the evaluator.

Nielsen has found that three to five evaluators are the minimum needed to find a significant percentage of usability problems. His analysis demonstrates that further problems are found with more evaluators, but the additional benefit follows the law of diminishing returns [10].
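
The diminishing-returns claim can be illustrated with a simple probabilistic model of the kind underlying this analysis: if each evaluator independently finds any given problem with probability L, then i evaluators together find a proportion 1 - (1 - L)^i of the problems. The sketch below assumes L = 0.31, a figure often quoted as a cross-project average; real values vary, so the numbers are indicative only.

# Diminishing returns: proportion of problems found by i independent
# evaluators, assuming each finds any given problem with probability lam.
# The value 0.31 is an assumed cross-project average, not a constant.
def proportion_found(i, lam=0.31):
    return 1 - (1 - lam) ** i

for i in range(1, 11):
    print(f"{i:2d} evaluators: {proportion_found(i):.0%} of problems found")
# With lam = 0.31 this gives roughly 31%, 52%, 67%, 77% and 84% for one to
# five evaluators, after which each additional evaluator adds progressively less.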

Note also that it is recommended that usability inspection methods alternate with empirical user testing. Results show that the two approaches find different usability problems, and a mixed approach costs less than relying on user testing exclusively.

3. An Educational Analog

Analogy can be used to suggest a modification of usability inspection for educational use. The usability inspection method systematises the informal approach an expert would take in evaluating an interface, and quantifies the outcomes. The original process, a team of evaluators using a list of heuristics to work through a prototype, will remain; however, several aspects will differ. The purpose here is to suggest how that process would appear, to provide a likely candidate for a mechanism to improve educational technology, and to stimulate the interest and debate needed to evaluate the method.

The mapping from usability to educational effectiveness requires several modifications to the standard usability process. In this case, the evaluators should include not only learners, but also educational design experts and content experts. In addition, the list of design heuristics becomes a compilation of elements of good educational design, ideally drawn from typical educational problems found in existing learning software.

One approach would be empirical: to base the heuristic list on common problems found in educational software. The list Nielsen settled on [10] was developed through a factor analysis of 249 usability problems. However, lacking such a comprehensive catalogue of educational problems, this approach cannot be used here. There is clear room for such an investigation.

A second approach would be for the list of elements of good pedagogic design to come from theoretical perspectives on learning and instruction. There are a number of candidate theories; the problem is to choose between them. A compromise would be to propose that there is a convergent model that covers the coarse elements that all agree on, and let this form the basis for an evaluative checklist.

Candidate theories include Cognitive Apprenticeship [5], Anchored Instruction [4], Problem-Based Learning [2], and Laurillard's model of technology-mediated learning [8]. While these approaches may disagree on the order in which the steps should be taken, they implicitly or explicitly share an overall structure with two major components: engaging the learner in a sequence of designed activities that match the learner's abilities; and guided reflection, where learners state their understanding and receive feedback. The activity should reflect and support individual differences, and the goals and outcomes should be clear.

Specifically, the learner should initially be provided with clear goals for the learning activity. This could be through an advance organiser. The activity should be relevant to the domain, and meaningful and motivating to the learner. That is, a problem chosen for solving should be one that requires the skill in the real world, but that is also important and understandable to the learner. Information resources should provide multiple paths of access and navigation to support different learner approaches. For example, learners could choose to look at the problem statement or read the underlying conceptual treatment first. Similarly, the conceptual treatment might represent a process both visually and with a text description. Further, activities must allow learners to take on significant responsibility, but with support to ensure that the task is within their capability. This means that large problems could be supported by having some parts already completed, by working on simpler examples, or by working in teams.

Learners also need to be able to examine their actions and express their understandings, and should receive support on this basis. Feedback needs to be personalised, that is, specific to the learner's actions and expressions, and constructive. Interactive systems can keep records of use that can be discussed later, or communication can be built into the environment. Further, the activity should have clearly stated goals that can be objectively evaluated. The desired outcome for the students should be clearly specified in terms of the context under which performance would be expected and how that performance could be measured. Finally, the learning process should provide support for recognising the utility of the learning in other contexts and for transferring it to problems outside the learning environment. This could be through explicit pointers to different contexts in which the skill is applied, or practice across scenarios.

Based upon this synthesis of learning theories, a preliminary draft checklist can be proposed (see Table 2). The list has room for refinement.


1. Clear goals and objectives: the learner should understand what is to be accomplished and what is to be gained from use.
2. Context meaningful to domain and learner: the activity should be situated in practice and engaging to the learner.
3. Content clearly and multiply represented, and multiply navigable: the message should be unambiguous and support different learner preferences, and allow the learner to find relevant information while engaged in an activity.
4. Activities scaffolded: learner activities need to be supported to allow working within competence yet on meaningful chunks of knowledge.
5. Elicit learner understandings: learners need to articulate their conceptual understandings as the basis for feedback.
6. Formative evaluation: learners need constructive feedback on their endeavours.
7. Performance should be 'criteria-referenced': the outcomes should be clear and measurable; competency-based evaluation should be a goal.
8. Support for transference and acquiring 'self-learning' skills: the environment should support transference of the skills beyond the learning environment, and facilitate the learner becoming able to self-improve.

Table 2. Educational design heuristics.
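
If the educational evaluation is run in the same way as the usability inspection, the Table 2 items can be transcribed into the same kind of tagging vocabulary used in the earlier sketch, so that the same recording and merging code can be reused for the educational findings. The transcription below is again illustrative only; the short labels are taken directly from Table 2.

# Table 2 transcribed into the same tagging vocabulary used in the earlier
# sketch, so educational findings can use the same Finding/merge_findings code.
EDUCATIONAL_HEURISTICS = [
    "Clear goals and objectives",
    "Context meaningful to domain and learner",
    "Content clearly and multiply represented, and multiply navigable",
    "Activities scaffolded",
    "Elicit learner understandings",
    "Formative evaluation",
    "Performance should be 'criteria-referenced'",
    "Support for transference and acquiring 'self-learning' skills",
]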

Armed with this list and a prototype to be evaluated, the evaluation would proceed by having the evaluators make two passes through the system, noting any comments and explicitly connecting them to one of the checklist items. The evaluators would then be brought together to discuss their findings. A separate usage scenario should not be needed here, as the learning task itself serves that role, but the variety of evaluator types will be important.

Note that the list also has a second use. Besides serving as criteria for evaluation, it can serve as a guide for development: referring to the criteria during the design stage will minimise the likelihood that problems in these areas arise to be identified later (as is the case with heuristic usability evaluation).

This evaluation is not designed to ascertain usability problems, and consequently should be used in addition to, not as a replacement for, a usability evaluation. This has several implications. It is likely that a similar number of evaluators to that used in a usability inspection would be needed for the educational evaluation to be effective. However, as the two areas are not completely orthogonal, it is also likely that some problems will be related and may be discovered in either evaluation. This suggests that the number of evaluators could be kept on the low side for each evaluation, say 3 or 4, giving a total of between 6 and 8. This is an area for specific investigation.
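
As a rough illustration of why the numbers might be kept low, the same detection-probability model used earlier can be applied, under the untested assumptions that educational problems are found at a similar rate and that problems common to the two areas are visible to either team. The figures below are indicative only.

# Rough illustration under the same assumed detection probability (lam = 0.31).
# Problems unique to one area are seen only by that team of three evaluators;
# problems common to both areas benefit from all six evaluators.
def proportion_found(i, lam=0.31):
    return 1 - (1 - lam) ** i

print(f"Problems unique to one area (3 evaluators): {proportion_found(3):.0%}")
print(f"Problems common to both areas (6 evaluators): {proportion_found(6):.0%}")
# Roughly 67% versus 89%: overlap between the two evaluations partially
# compensates for keeping each team small.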

As has been discussed for heuristic evaluation, this process undoubtedly serves most appropriately as formative evaluation, and as a complement to, rather than as a replacement for, user testing. For example, it does not provide an assessment of the learning outcome. For this reason, it is probably best used early in the design stage. It should also work with paper prototypes as well as programmed versions.

This approach formalises the informal approaches a good evaluator would use. The advantage of this approach is that the problems found can be tied back to explicit guidelines which in turn suggest remedies.

4. Conclusion

Heuristic evaluation provides a pragmatic method for assessing usability, and such assessment should be a component of educational development as well. Effective educational assessment is equally important, but may be costly; a model for an efficient method would be desirable.

The pragmatic aspects of usability inspection may also serve as a model for efficient and effective educational evaluation. Minimising the dependency on expensive methods while providing useful outcomes makes it easier to justify the process and increases the likelihood of instructional improvement.

Armed with a list of educational design heuristics, a product to be evaluated, and suitable evaluators, this process should lead to qualitative feedback for improvement of educational product design.

References

[1] Alexander, S., & Hedberg, J.G. (1994). Evaluating technology-based learning: which model? In K. Beattie, C. McNaught, & S. Wills (Eds) Interactive Multimedia in University Education: Designing for Change in Teaching and Learning. Amsterdam: Elsevier.

[2] Barrows, H. S. (1986). A taxonomy of problem-based learning methods. Medical Education, 20(6), 481-486.

[3] Bias, R.G. (1994). The pluralistic usability walkthrough: coordinated empathies. In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

[4] Cognition and Technology Group at Vanderbilt, The (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher 19(6), 2-10.

[5] Collins, A., Brown, J. S., & Newman, S. (1989). Cognitive apprenticeship: Teaching the craft of reading, writing, and mathematics. In L. B. Resnick (Ed.) Knowing, learning and instruction: Essays in honor of Robert Glaser. Hillsdale, NJ: Lawrence Erlbaum Associates.

[6] Desurvire, H. W. (1994). Faster, Cheaper!! Are Usability Inspection Methods as Effective as Empirical Testing? In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

[7] Karat, C-M. (1994). A Comparison of User Interface Evaluation Methods.  In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

[8] Laurillard, D. (1993). Rethinking University Teaching: A Framework for Effective Use of Educational Technology. London: Routledge.

[9] Mack, R. L., & Nielsen, J. (1994). Executive Summary. In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

[10] Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

[11] Nielsen, J., & Mack, R.L. (1994). Usability Inspection Methods. New York: John Wiley & Sons.

[12] Wharton, C., Rieman, J., Lewis, C., & Polson, P. (1994). The Cognitive Walkthrough method: a practitioner's guide. In J. Nielsen & R. L. Mack (Eds) Usability Inspection Methods. New York: John Wiley & Sons.

Copyright

C.N. Quinn © 1996. The author assigns to ASCILITE and educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to ASCILITE to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the ASCILITE 96 conference papers, and for the documents to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.