Evaluation is an important component of any development project, requiring both formative and summative feedback. What is desired is a systematic approach that yields reliable results for the minimum investment. In the Human-Computer Interaction field, the usability inspection method has been validated as providing just such a cost-effective evaluation method, and this framework may be adaptable for educational evaluation. Here I present the usability approach, which has direct application to educational software development, and then derive the educational equivalent to suggest a pragmatic evaluation approach for educational software.
The current enthusiasm for multimedia applications in education has led to financial resources being made available for development. What typically has not been made available is support for evaluating the developed product. Even when there is money for evaluation, there is pressure to minimise its cost. What is needed is a pragmatic approach that obtains the maximum evaluative information for the minimum investment.
An approach that has been developed in the Human-Computer Interaction field over the past few years is known as usability inspection. This method has been validated as providing an effective balance between cost and results: empirical studies comparing different usability inspection methods have shown that the approach produces useful results at low cost.
Usability inspection is particularly directed towards evaluating the ease of use of software. This is an area with which educational software must contend, so a presentation of this approach is useful in and of itself. However, there is a further use to which this approach may lend itself.
Evaluation of educational effectiveness is the primary area with which developers of learning software must be concerned. Software for learning might be usable but not educational, or vice versa. The final goal must be for systems to be both usable and educational. It would be desirable for educational evaluation to share the properties of speed, cost-effectiveness, and ease that the usability methods offer.
An analogy will be drawn from the usability methods to suggest a method for low-cost educational evaluation. This is a first consideration of the idea, and the result here will point to areas for further inquiry.
2. Usability Inspection Methods
Usability inspection methods evaluate user interfaces by having individuals examine a particular implementation using specific procedures. There are many methods to evaluate interfaces, including automated testing, formal modelling, empirical testing, and informal evaluation. Usability inspection methods fall into the last category. The definitive guide to this approach is Nielsen & Mack's Usability Inspection Methods [11], from which much of this section is drawn.
Many evaluations are needed during development, and different methods are useful at different stages of the development cycle. After an information-gathering stage, there is a design phase in which iterations of the interface are prototyped and evaluated to converge on a final specification. Finally, the product is implemented and trialled. Usability inspection methods are best used for formative evaluation during the design phase, before a final specification is created.
There are a variety of dimensions along which usability inspection methods can vary. These include whether individuals work alone or together during the evaluation; whether experts, users, or both are the evaluators; and whether specific use-scenarios are followed or the evaluators are given free rein to explore the implementation.
The types of usability inspection method of most interest here are heuristic evaluation [10], which uses a usability checklist; pluralistic walkthroughs [3], which use a mix of evaluators; and cognitive walkthroughs [12], in which specific attention is paid to the thought processes of hypothetical users. Each has strengths and weaknesses. The heuristic approach is presented here, as it has proved the easiest to use successfully, being the easiest to learn [7, 9].
The process starts with a prototype that evaluators can sit down with. Each evaluator is provided with a checklist of usability heuristics and plays the role of a user. At least two passes are made through the interface: the first to get a feel for the overall flow, and the second to focus on the dialog elements of specific interactions. Evaluation sessions typically last an hour or two; evaluation of more complex software should be broken into several sessions of no more than two hours each.
Each evaluator does the evaluation individually, and then the evaluators are brought together to compare notes. The evaluation team can come from three different pools of people: the developers, usability experts, and sample users. The usability experts and sample users are easy to justify; the reason to include the developers is to help sensitise them to usability issues.
An observer should be present during the evaluation to answer the evaluator's questions and to prompt the evaluator to record every problem found. The observer will need to be familiar with the design, and is typically a member of the design team. However, the observer must not help the evaluator past a difficult point without having the evaluator record the difficulty.
For each problem found, the evaluator needs to identify what the problem is, either with reference to an item on the usability checklist or with some other specific statement of the problem. A specific list of ten heuristics has been shown [10] to be effective (see Table 1).
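As a rough illustration of the record keeping this implies, the sketch below models each finding as an evaluator, the heuristic it violates, and a short location label, then pools the individual lists the way the later group session compares notes. The `Finding` type, its field names, the `pool_findings` function, and the sample heuristic labels are hypothetical; treating two findings as the same problem only when heuristic and location match exactly is a simplifying assumption, since in practice the team judges whether two notes describe the same problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One problem noted by one evaluator (hypothetical record format)."""
    evaluator: str
    heuristic: str  # checklist item the problem is tied to
    location: str   # where in the interface it was observed
    note: str       # the evaluator's own description


def pool_findings(findings):
    """Merge individual evaluators' findings into distinct problems.

    Findings citing the same heuristic at the same location are treated
    as one problem (a simplification of the group discussion).
    Returns {(heuristic, location): set of evaluators who found it}.
    """
    problems = defaultdict(set)
    for f in findings:
        problems[(f.heuristic, f.location)].add(f.evaluator)
    return dict(problems)


if __name__ == "__main__":
    notes = [
        Finding("A", "Visibility of system status", "save dialog", "no progress shown"),
        Finding("B", "Visibility of system status", "save dialog", "unclear if saved"),
        Finding("B", "User control and freedom", "quiz screen", "no way back"),
        Finding("C", "Consistency and standards", "menu bar", "labels differ"),
    ]
    for (heuristic, location), who in sorted(pool_findings(notes).items()):
        print(f"{heuristic} @ {location}: found by {sorted(who)}")
```

Keeping the set of evaluators per problem, rather than a bare count, makes it easy to see in the debriefing which problems only one person noticed.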
It is possible to do heuristic evaluation with paper prototypes, allowing early use of the technique. If the evaluators are not domain experts, a usage scenario (a fully worked out, and ideally comprehensive, sample task with the interface) can be used as a guide for the evaluator.
Nielsen has found that at least 3 to 5 evaluators are needed to find a significant percentage of usability problems. His analysis shows that this is the minimum number for useful results, and that more evaluators find further problems, although with diminishing returns [10].
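The diminishing returns can be made concrete with the simple discovery model Nielsen and Landauer used, in which each evaluator independently finds any given problem with probability λ, so that i evaluators are expected to find a proportion 1 - (1 - λ)^i of the problems. The sketch below assumes that model; the function name is hypothetical and the rate λ = 0.3 is an illustrative assumption, not a figure from this paper, since real detection rates vary with the evaluators and the interface.

```python
def proportion_found(evaluators: int, detection_rate: float) -> float:
    """Expected share of problems found by `evaluators` independent evaluators,
    each of whom finds any given problem with probability `detection_rate`."""
    if evaluators < 0:
        raise ValueError("evaluators must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must lie in [0, 1]")
    return 1.0 - (1.0 - detection_rate) ** evaluators


if __name__ == "__main__":
    LAMBDA = 0.3  # assumed per-evaluator detection rate, for illustration only
    previous = 0.0
    for i in range(1, 11):
        share = proportion_found(i, LAMBDA)
        # Each added evaluator contributes a smaller marginal gain.
        print(f"{i:2d} evaluators: {share:6.1%} found (+{share - previous:5.1%})")
        previous = share
```

Under this assumed rate, three evaluators are expected to find about 66% of the problems and five about 83%, while each further evaluator adds less than the one before.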
It is also recommended that usability inspection methods alternate with empirical user testing. Results show that the two find different usability problems, and a mixed approach also costs less than relying on user testing exclusively.
3. An Educational Analog
Analogy can be used to suggest how usability inspection might be modified for educational use. The usability inspection method systematises the informal approach an expert would take in evaluating an interface, and quantifies the outcomes. The core process, in which a team of evaluators uses a list of heuristics to work through a prototype, will remain; however, several aspects will differ. The purpose here is to suggest how that process would appear, to offer a likely candidate mechanism for improving educational technology, and to stimulate interest and debate about the method.
The mapping from usability to educational effectiveness requires several modifications to the standard usability process. First, the evaluators should include not only learners but also educational design experts and content experts. Second, the list of heuristics becomes a compilation of elements of good educational design, ideally drawn from the typical educational problems found in existing learning software.
One approach would be empirical: to base the heuristic list on common problems found in educational software. The list Nielsen settled on [10] was developed through a factor analysis of 249 usability problems. However, lacking a comparably comprehensive collection of educational problems, this approach cannot be used here. There is clear room for such an investigation.
A second approach would be for the list of elements of good pedagogic design to come from theoretical perspectives on learning and instruction. There are a number of candidate theories; the problem is to choose between them. A compromise would be to propose that there is a convergent model that covers the coarse elements that all agree on, and let this form the basis for an evaluative checklist.
Candidate theories include Cognitive Apprenticeship [5], Anchored Instruction [4], Problem-Based Learning [2], and Laurillard's model of technology-mediated learning [8]. While these approaches may disagree on the order in which the steps should be taken, they implicitly or explicitly share an overall structure with two major components: engaging the learner in a sequence of designed activities matched to the learner's abilities; and guided reflection, in which learners state their understanding and receive feedback. The activity should reflect and support individual differences, and its goals and outcomes should be clear.
Specifically, the learner should initially be provided with clear goals for the learning activity, for example through an advance organiser. The activity should be relevant to the domain, and meaningful and motivating to the learner. That is, a problem chosen for solving should be one that requires the skill in the real world, but that is also important and understandable to the learner. Information resources should provide multiple paths of access and navigation to support different learner approaches. For example, learners could choose to look at the problem statement or to read the underlying conceptual treatment first. Similarly, the conceptual treatment might represent a process both visually and with a text description. Further, activities must allow learners to take on significant responsibility, but with support to ensure that the task is within their capability. This means that large problems could be made manageable by having some parts already completed, by working through simpler examples, or by having teams tackle them.
Learners also need to be able to examine their actions and express their understanding, and should receive support on that basis. Feedback needs to be personalised, specific to the learner's actions and expressions, and constructive. Interactive systems can keep records of use that can be discussed later, or communication can be built into the environment. Further, the activity should have clearly stated goals that can be objectively evaluated: the desired outcome for the students should be specified in terms of the context in which performance would be expected and how that performance could be measured. Finally, the learning process should help learners recognise the utility of the learning in other contexts and support transfer to problems outside the learning environment. This could be through explicit pointers to different contexts in which the skill is applied, or through practice across scenarios.
Based upon this synthesis of various learning theories, a preliminary draft can be proposed (see Table 2). The list has room for refinement.
Armed with this list and a prototype to be evaluated, the evaluators would each make two passes through the system, noting any comments and explicitly connecting each one to a checklist item. The evaluators would then be brought together to discuss their findings. A typical usage scenario should not be needed here, as the learning task itself serves as the guiding goal, but the variety of evaluator types will be important.
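The bookkeeping for this educational pass can be sketched in the same way. The checklist below is only a paraphrase of the elements described in the preceding paragraphs, not the contents of Table 2, and the item keys, the `tally_comments` function, and the sample comments are all hypothetical. The one rule it enforces comes from the procedure above: every comment must be tied to a checklist item, so comments with a missing or unknown item are set aside for the group discussion rather than silently counted.

```python
from collections import Counter

# Paraphrase of the elements discussed above (NOT the paper's Table 2).
CHECKLIST = {
    "goals": "Clear goals for the learning activity",
    "relevance": "Activity is relevant, meaningful, and motivating",
    "access": "Multiple paths of access and representation",
    "scaffolding": "Significant responsibility, with support",
    "reflection": "Learners examine actions and express understanding",
    "feedback": "Personalised, specific, constructive feedback",
    "outcomes": "Outcomes specified in context and measurable",
    "transfer": "Support for transfer beyond the environment",
}


def tally_comments(comments):
    """Count evaluator comments per checklist item.

    `comments` is an iterable of (evaluator, item_key, text) tuples.
    Returns (Counter of comments per item, list of comments whose
    item_key is missing or not on the checklist).
    """
    counts = Counter()
    unassigned = []
    for evaluator, item, text in comments:
        if item in CHECKLIST:
            counts[item] += 1
        else:
            unassigned.append((evaluator, item, text))
    return counts, unassigned


if __name__ == "__main__":
    comments = [
        ("learner 1", "goals", "Not sure what I am meant to achieve"),
        ("content expert", "feedback", "Quiz feedback only says right or wrong"),
        ("designer", "feedback", "Feedback ignores the learner's reasoning"),
        ("learner 2", None, "Colours are hard to read"),  # a usability remark
    ]
    counts, unassigned = tally_comments(comments)
    for item, n in counts.most_common():
        print(f"{CHECKLIST[item]}: {n} comment(s)")
    print("Unassigned:", unassigned)
```

Items attracting comments from several evaluator types (learners, design experts, content experts) are natural priorities for the discussion, and the unassigned list often flags usability issues that belong in the separate usability evaluation.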
Note that the list also has a second use. Besides serving as criteria for evaluation, it can serve as a guide for development. Referring to the criteria during the design stage should reduce the number of problems in these areas that are later identified (as is the case with heuristic usability evaluation).
This evaluation is not designed to identify usability problems, and consequently should be used in addition to, not as a replacement for, a usability evaluation. This has several implications. It is likely that about as many evaluators as are used in a usability inspection would be needed for the educational evaluation to be effective. However, as the two areas are not completely orthogonal, it is also likely that related problems will surface in either evaluation. This suggests that the number of evaluators for each evaluation could be kept on the low side, 3 or 4, for a total of 6 to 8 evaluators. This is an area for specific investigation.
As has been discussed for heuristic evaluation, this process is most appropriate as formative evaluation, and as a complement to, rather than a replacement for, user testing; for example, it does not provide an assessment of learning outcomes. For this reason, it is probably best used early in the design stage. It should work with paper prototypes as well as programmed versions.
This approach formalises the informal approaches a good evaluator would use. The advantage of this approach is that the problems found can be tied back to explicit guidelines which in turn suggest remedies.
Heuristic evaluation provides a pragmatic method for assessing usability. This should be a component of educational development as well. Effective educational assessment is important, but may be costly. It would be desirable to have a model for an efficient method.
The pragmatic aspects of usability inspection may also serve as a model for efficient and effective educational evaluation. Minimising dependence on expensive methods while providing useful outcomes makes the process easier to justify and increases the likelihood of instructional improvement.
Armed with a list of educational design heuristics, a product to be evaluated, and suitable evaluators, this process should lead to qualitative feedback for improvement of educational product design.
[1] Alexander, S., & Hedberg, J. G. (1994). Evaluating technology-based learning: Which model? In K. Beattie, C. McNaught, & S. Wills (Eds.), Interactive Multimedia in University Education: Designing for Change in Teaching and Learning. Amsterdam: Elsevier.
[2] Barrows, H. S. (1986). A taxonomy of problem-based learning methods. Medical Education, 20(6), 481-486.
[3] Bias, R. G. (1994). The pluralistic usability walkthrough: Coordinated empathies. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[4] Cognition and Technology Group at Vanderbilt, The (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher, 19(6), 2-10.
[5] Collins, A., Brown, J. S., & Newman, S. (1989). Cognitive apprenticeship: Teaching the craft of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, Learning and Instruction: Essays in Honor of Robert Glaser. Hillsdale, NJ: Lawrence Erlbaum Associates.
[6] Desurvire, H. W. (1994). Faster, cheaper!! Are usability inspection methods as effective as empirical testing? In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[7] Karat, C.-M. (1994). A comparison of user interface evaluation methods. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[8] Laurillard, D. (1993). Rethinking University Teaching: A Framework for Effective Use of Educational Technology. London: Routledge.
[9] Mack, R. L., & Nielsen, J. (1994). Executive summary. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[10] Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[11] Nielsen, J., & Mack, R. L. (Eds.) (1994). Usability Inspection Methods. New York: John Wiley & Sons.
[12] Wharton, C., Rieman, J., Lewis, C., & Polson, P. (1994). The cognitive walkthrough method: A practitioner's guide. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
C.N. Quinn © 1996. The author assigns to ASCILITE and educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to ASCILITE to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the ASCILITE 96 conference papers, and for the documents to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.