Evaluation of how an adaptive tutor improves student learning

Pamela J. Woods and James R. Warren

Advanced Computing Research Centre

School of Computer and Information Science

University of South Australia, Levels, Pooraka 5095

South Australia.

Email: pam.woods@unisa.edu.au, james.warren@unisa.edu.au

Keywords

intelligent tutoring systems, adaptive systems, Pascal programming

Abstract

Students learn conceptually difficult material best when they can persist with and revisit a concept, at a pace appropriate to their current understanding, until it is mastered. Mastering a foundation concept in this way generally takes longer than a traditional single lesson, but subsequent topics that depend on the foundation concept are then learned more quickly, by analogy. We evaluated the Loop Tutor, an interactive, adaptive tutor for learning loop construction in Pascal, with two groups of novice Pascal students. Those who used the Loop Tutor scored significantly higher in a subsequent test than those who did not. Examination of the logs of students using alternative navigational strategies showed that many students persist with and revisit concepts until they are mastered.

1. Introduction

Adaptive tutors offer one way to emulate one-to-one tutoring (Sokolnicki, 1991) for large classes of students with diverse cultural, educational and life experiences. The Loop Tutor was developed to teach loop construction in Pascal to such a diverse group of students.

This paper describes the evaluation of the Loop Tutor by two successive classes of 92 and 60 introductory Pascal students. Students gave useful feedback on the user interface, and logs and a survey were collected to describe how they used the system.

The first class established that the use of the Loop Tutor was beneficial by relating its use to test scores before and after use of the tutor.

The second class was allocated at random to one of two navigation methods (menu, adaptive). A paper-based pre-test was run immediately prior to use of the tutor to measure students' prior knowledge, in addition to the survey and log material collected as for the first class. The two navigational groups did not differ with respect to their scores in the tutor. Examination of their logs showed that many students in the menu navigation group developed a strategy similar to that used by the adaptive tutor, revisiting the basic concept lessons and some others until they scored successfully.

2. The Loop Tutor

2.1 Design

The Loop Tutor is an intelligent tutoring system (ITS) application in the domain of Pascal loop construction. It is built to have a degree of domain-independence using the Rapid Prototyping Adaptive Intelligent Tutoring System, RAPITS (Woods and Warren, 1995). Domain material and tutoring layers are linked by a meta-strategy layer. This meta-strategy is a set of rules based on the student history that determines the appropriate teaching strategy, enabling flexible adaptation.

We are particularly interested in developing a practically useful ITS authoring environment that is not domain-specific and can teach intelligently, that is, adapt its style of tutoring to the student and the topic. To support these goals, RAPITS provides a general template and an adaptable navigational strategy in an electronic book format. RAPITS uses a general teaching model based on the teaching ideology of the COCA-1 system (Major and Reichgelt, 1991). RAPITS' lesson template defines a rule, illustrates it with an example, provides students with an example to execute and gives immediate feedback on their action. Figure 1 illustrates a test screen that measures the student's knowledge of Pascal WHILE loop construction.

Figure 1: The Loop Tutor under RAPITS

After a series of lessons on the concepts of each topic, there are tests that evaluate the extent to which the student has understood the topic. In the case of the Loop Tutor, the test involves guided construction of a specified loop, and the ability to compile and execute the constructed program in the Turbo Pascal environment. Students thus develop the problem solution in the context of a complete Pascal program that they can compile and execute. This approach was used by Boyle et al. (1994) in developing CLEM, a CORE learning environment for Modula-2. The meta-strategy selects the teaching strategy (next topic, detail level, teaching action) based on the results of the test as recorded in the student history. The Loop Tutor is therefore an adaptive system in the style of Benyon and Murray (1993), which can compare student and domain models and automatically change its teaching style.

Pirolli and Anderson (1985), as cited in Escott and McCalla (1988), reported students learning Pascal recursion by analogy. However, Escott and McCalla demonstrated that novices can also make incorrect analogies due to their limited experience, so care must be taken in selecting analogous situations. We provide the option to teach subsequent topics based on analogy to the foundation topic.

2.2 The Teaching Paths

In the adaptive tutoring mode, the meta-strategy directs the student to the next lesson according to their test results. There are three alternative paths for the next lesson following the test on a topic. These are:

a repeat of the summary lesson of the same topic,
the full set of 5 lessons on the next topic,
the shortcut summary lesson on the next topic.

For example, if the student has a poor WHILE test result, she is directed to repeat the topic with the WHILE summary lesson. If her WHILE test result is average, she progresses to the full set of lessons on REPEAT loops, and if she does well in the WHILE test, she can move to the shortcut summary lesson on REPEAT loops.
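The path selection described above can be sketched as a small rule function. This is an illustration in Python, not the RAPITS implementation: the 0-10 score scale, the two cut-off values and the behaviour after the last topic are assumptions, since the paper does not state them. Only the three paths and the topic order (WHILE, REPEAT, FOR) come from the text.

```python
from enum import Enum


class NextStep(Enum):
    REPEAT_SUMMARY = "summary lesson of the same topic"
    FULL_LESSONS = "full set of 5 lessons on the next topic"
    SHORTCUT_SUMMARY = "shortcut summary lesson on the next topic"


# Hypothetical cut-offs on an assumed 0-10 test scale.
POOR_BELOW = 5.0
GOOD_FROM = 8.0

# Topic order as presented by the tutor.
TOPICS = ["WHILE", "REPEAT", "FOR"]


def next_lesson(topic, score):
    """Return (topic, step) chosen after a test on `topic`."""
    if score < POOR_BELOW:
        # Poor result: stay on the same topic and repeat its summary.
        return topic, NextStep.REPEAT_SUMMARY
    position = TOPICS.index(topic)
    if position + 1 == len(TOPICS):
        # Assumption: nothing further to teach after the last topic.
        return None, None
    following = TOPICS[position + 1]
    if score >= GOOD_FROM:
        # Good result: take the shortcut summary of the next topic.
        return following, NextStep.SHORTCUT_SUMMARY
    # Average result: work through all lessons of the next topic.
    return following, NextStep.FULL_LESSONS


if __name__ == "__main__":
    for score in (3.0, 6.0, 9.0):
        print(score, next_lesson("WHILE", score))
```

With these assumed cut-offs, a WHILE score of 3 leads back to the WHILE summary, a score of 6 to the full REPEAT lessons, and a score of 9 to the REPEAT shortcut summary.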

When lesson topics are selected from the menu, the full set of 5 lessons must be traversed, then the summary and test are done. Students can exit the system at any time from the menu. Figure 2 graphically depicts the teaching paths of the Loop Tutor.

Figure 2: Alternative lesson paths

2.3 Evaluation

The Loop Tutor was evaluated by two classes of novice Pascal programmers. The students were enrolled in the introductory programming subject of various engineering and surveying degrees at the University of South Australia. Students used the Loop Tutor from pools of networked IBM-compatible personal computers (with which they were familiar) as part of their regular practical classes. Students received a bonus mark for logging on to the system, but could exit at any time. Students were informed that they were participating in an experiment and that their performance, beyond the bonus point for logging in, would in no way influence their grade in the subject.

The methodology suggested in Cox and Walker (1993) was used to test the usability of the system. This included co-operative user observation of one class of 14 students; tests of student performance before, during and after use of the tutor; questionnaires about their impressions of the system and its interface design; and automatic logging.

2.3.1 Experiment 1.

Design

This experiment was intended mainly to evaluate the user interface and the stability of the system across multiple pools of computers. In addition, test results were collected before and after use of the tutor by the 92 students to determine whether learning was affected by use of the Loop Tutor. Detailed observations were made on, and logs were collected from, one group of 14 students; 34 students voluntarily returned their impressions of the system via the survey.

Results

As reported in Tyerman, Woods and Warren (1996), there were more high scores in the post-tutored test among the 45 students who used the tutor than among the 47 who did not. The groups were evenly matched on the basis of their pre-tutored test scores. Figure 3 illustrates the distribution of test scores for WHILE loops and Table 1 gives mean scores for all three loop constructs.

Figure 3: Frequency distribution of test scores.

Table 1: Mean post-tutor test scores (number of students in parentheses)

Topic     Used Tutor    Not Used
WHILE     7.58 (45)     6.15 (47)
REPEAT    7.72 (39)     6.99 (53)
FOR       4.26 (38)     2.83 (54)

Table 2: Comparison of tutored and untutored test scores

Topic     Difference, D    Probability
WHILE     0.29             0.01 < P < 0.05
REPEAT    0.16             P > 0.05
FOR       0.57             P < 0.001

WHILE and FOR loop construction were significantly improved by use of the tutor. However, REPEAT loops seemed to be more intuitively understood, and there was no significant difference between the tutored and untutored groups. The Kolmogorov-Smirnov one-tailed test was used to compare the cumulative frequency distributions of the post-tutor test scores, testing the hypothesis that use of the tutor improved the post-tutor test score. The maximum difference, D, and the probability corresponding to such an extreme value are given in Table 2.
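The D statistic can be illustrated with a short Python sketch. The individual scores below are invented for illustration, since the raw data are not reproduced in this paper. The probability function uses one common large-sample chi-square approximation for the one-tailed two-sample test (chi-square = 4 D^2 n1 n2 / (n1 + n2) with 2 degrees of freedom); the paper does not say how its probabilities were obtained, so this is an assumption rather than a description of the analysis actually performed.

```python
import math


def ks_max_difference(tutored, untutored):
    """Largest amount by which the untutored cumulative distribution
    exceeds the tutored one (one-tailed D)."""
    points = sorted(set(tutored) | set(untutored))
    d_max = 0.0
    for x in points:
        cdf_tutored = sum(1 for s in tutored if s <= x) / len(tutored)
        cdf_untutored = sum(1 for s in untutored if s <= x) / len(untutored)
        # If tutored students score higher, fewer of them fall at or
        # below x, so their cumulative curve lies below the other one.
        d_max = max(d_max, cdf_untutored - cdf_tutored)
    return d_max


def one_tailed_p_approx(d, n1, n2):
    """Large-sample approximation: chi-square with 2 degrees of freedom,
    whose upper tail probability is exp(-chi_square / 2)."""
    chi_square = 4.0 * d * d * n1 * n2 / (n1 + n2)
    return math.exp(-chi_square / 2.0)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    tutored = [6, 7, 8, 8, 9]
    untutored = [4, 5, 6, 7, 8]
    print("D =", ks_max_difference(tutored, untutored))

    # D values and group sizes as reported in Tables 1 and 2.
    for topic, d, n1, n2 in [("WHILE", 0.29, 45, 47),
                             ("REPEAT", 0.16, 39, 53),
                             ("FOR", 0.57, 38, 54)]:
        print(topic, one_tailed_p_approx(d, n1, n2))
```

Applied to the reported D values and group sizes, this approximation gives probabilities that fall within the ranges listed in Table 2.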

From the survey, the features students reported that they liked included:

the simulated one-to-one tuition,
the interactivity of the examples they could try,
the immediate feedback,
the concise explanations,
the ease of use of the system,
the ability to work at their own pace,
the ability to rework their solutions and correct their mistakes.

Analysis of the logs

Of the 14 students logged, two exited the system immediately. (It was not compulsory to use the tutor, but they gained a bonus for logging on.) One student overrode the intelligent path and returned to the menu to select her next lesson. One student needed to visit each topic only once.

Ten students accepted the intelligent navigation and persisted with the WHILE lessons until they scored well enough to move to the next topic. Time prevented some students from completing all topics, but most did the WHILE lessons completely.

Figure 4: General use pattern of WHILE lessons

Their general usage pattern is illustrated in Figure 4. Once they had mastered WHILE loops, they generally scored well enough to shortcut REPEAT loops, and needed little further repetition. A common path following mastery of the WHILE lessons was:

WHILE tests,
REPEAT summary,
REPEAT tests,
FOR summary,
FOR tests.

Conclusion

We concluded that students persisted with the base topic (WHILE loop construction) until they mastered it. The large number of repetitions needed by some students suggests they were also experimenting with the interface and sometimes skipped the test they needed to proceed further.

This conclusion from the logs is consistent with student feedback that they liked to be able to work at their own pace, correcting their attempts until they mastered the topic.

2.3.2 Experiment 2.

Design

This experiment was designed to compare two methods of navigating the loop lessons: menu-driven and adaptive path. Students were allocated at random to one of two groups: 33 to the menu navigation group and 27 to the adaptive navigation group. In addition to the measures of experiment 1, the students' knowledge of loop construction was measured prior to use of the Loop Tutor in a short unsupervised test. The two groups had similar mean scores (15.5 menu, 14.9 adaptive). The other measures were the same as in the first experiment: pre-tutor and post-tutor test scores, a survey and logs were collected. It was anticipated that the two measures made prior to use of the tutor might reduce the large variation in students' scores and provide a more precise comparison of the two navigation methods. The post-tutor test is still to come and will be reported at the talk.

Results

Analysis of the difference between prior assessed loop knowledge and in-tutor test scores (total of WHILE and REPEAT tests) gave no significant difference between navigation methods (-1.92 menu, -0.59 adaptive). The pre-test scores were always better than those in the Loop Tutor, suggesting students may have colluded on the unsupervised preliminary paper-based test. Also, since the pre-test scores were significantly different prior to the experiment (menu 8.1, adaptive 5.6), a covariance adjustment of the in-tutor test scores for the pre-test scores reduced the variation by 20% but still gave no significant differences between the navigation methods.
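A covariance adjustment of this kind can be sketched as follows, assuming a single covariate (the pre-test score) and a common within-group slope. The scores are invented for illustration and the function name is hypothetical; the sketch shows only how group means are adjusted, not the significance test or the analysis actually run.

```python
def mean(values):
    return sum(values) / len(values)


def adjusted_means(groups):
    """groups maps a group name to (pre_scores, tutor_scores).

    Each group's mean tutor score is shifted to the value expected if
    the group had started from the overall mean pre-test score.
    """
    all_pre = [x for pre, _ in groups.values() for x in pre]
    grand_pre = mean(all_pre)

    # Pooled within-group regression slope of tutor score on pre-test.
    sum_xy = 0.0
    sum_xx = 0.0
    for pre, tutor in groups.values():
        pre_mean, tutor_mean = mean(pre), mean(tutor)
        sum_xy += sum((x - pre_mean) * (y - tutor_mean)
                      for x, y in zip(pre, tutor))
        sum_xx += sum((x - pre_mean) ** 2 for x in pre)
    slope = sum_xy / sum_xx

    return {name: mean(tutor) - slope * (mean(pre) - grand_pre)
            for name, (pre, tutor) in groups.items()}


if __name__ == "__main__":
    # Hypothetical data: the menu group starts with higher pre-test scores.
    groups = {
        "menu": ([6, 8, 10], [5, 6, 7]),
        "adaptive": ([4, 6, 8], [4, 5, 6]),
    }
    print(adjusted_means(groups))
```

In this invented example the raw group means differ by one point, but the difference disappears once the higher starting point of the menu group is taken into account.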

From the survey, students reported they liked the:

ease of use;
clear instructions and layout;
ability to work at their own pace;
ability to correct solutions whilst solving problems;
ability to create a program and run it in Turbo Pascal; and
interactive examples better than a text.

Two students disliked the program execution in Pascal as this slowed them down. Two wanted more precise error feedback. Half the students found the help facilities (an "objective" button for each lesson and a "guide" template to the screen lesson layout) useful. 96% liked being able to execute the program chunk they had constructed in the Turbo Pascal environment. This facility had not functioned in the first experiment.

Analysis of the Logs

Logs were collected from 12 in the menu group and 9 in the adaptive navigation group.

The menu group included 2 students who completed one sequence of the lessons and tests - WHILE, REPEAT and FOR - with very good or perfect results and no repetitions. One student did the WHILE lessons and test only, scoring perfectly. One repeated lessons prior to doing just one set of tests. The remaining 8 students repeated the WHILE test until it was mastered and 3 repeated REPEAT or FOR lessons and tests as well. One student appeared to get lost or just wanted to explore initially.

In the adaptive group, two students used the menu to override the adaptive path selected and reviewed full WHILE lessons before doing the test. Two revisited the WHILE lessons prior to the test using navigation buttons, then resumed the adaptive path. So 7 students completed the lessons as directed by the adaptive path.

Conclusion

Since most students in both groups persisted with WHILE loops until they were mastered, irrespective of the allocated navigation path, it is not surprising that final scores did not differ. What is apparent is that the menu group used the menu intelligently to revisit lessons, particularly the fundamental WHILE loop lessons, until they scored well in the tests. They therefore seem to have used a strategy similar to that built into the adaptive tutor.

The menu group spent a little longer (40.5 minutes) than the adaptive group (37.1 minutes), but time in the system was constrained to a 50 minute class.

3. Discussion

Both experiments support the notion that students like to learn a problem-solving skill, such as Pascal loop construction, in an interactive environment that provides immediate feedback on their attempts. Boyle et al. (1994) and Eliot and Woolf (1995) have also developed systems based on this.

Irrespective of the navigation method provided, students like to persist with a topic until it is mastered, and can then readily learn similar topics more quickly than the foundation concept. We suggest that although the methods of navigation differ, what is critical is the feedback students get on completing the test. If the system tells them they haven't mastered a topic they will return to that topic until it is understood. They have confidence that the tutor can reliably estimate their knowledge. Eliot and Woolf (1995) have also reported such tutor confidence, and the corollary that a student's faith might be undermined by a mistake in the tutoring system. We had a small bug in the scoring of the FOR test which led to feedback consistent with this loss of faith from a couple of students.

So whether the system adapts to the most appropriate lesson or simply returns the student to the menu, the feedback from the test gives students such a strong belief about their competence in the topic that they follow an adaptive path, irrespective of the navigation method applied.

We have not yet collected the post-tutor test data, but the scores collected in the tutor do not indicate any significant difference in proficiency between the two navigation groups. It now seems likely that there will not be a difference in the post-test scores, since the majority of the students were in fact using the same adaptive strategy, very strongly guided by the feedback on the Loop Tutor tests. So our conclusion is not that adaptive tutoring is no different from menu-driven tutoring. We need to look deeper at the pedagogical design, and be aware of other factors affecting the system's use. One of these, reported in Tyerman, Woods and Warren (1996), is that students generally navigate a menu of lessons from top to bottom, assuming an underlying sequence of information. We had only one student who attempted a "lower" topic on the menu before the topics above it.

Another explanation for why the students spent a lot of time on the WHILE topic and far less on the others could simply be the time constraint of a 50 minute lesson. We hope to investigate this more fully from automatic logs as students use the system for exam revision. A further explanation is that proposed by Soloway et al. (1983): students are satisfied to learn just one looping construct, and will prefer that and not develop mastery in other constructs. Perhaps the design of our menu hierarchy could be improved by swapping the REPEAT and WHILE constructs, since tutored and untutored students scored similarly on REPEAT loops, and this may be an intuitively simpler base lesson. Nevertheless, the pattern of Loop Tutor usage supports our fundamental goal of delivering successful one-to-one tuition.

Our future work involves using the RAPITS design to develop adaptive tutoring systems in several other domains, including file organisation in Information Systems, computability theory and possibly a simple parallel to the Loop Tutor for loop construction in C.

We are particularly interested in student use of these systems where the topic hierarchy and complexity may differ considerably from those in the Loop Tutor. Will students be prepared to be guided to the next appropriate lesson by an adaptive tutor, or is the sequential lesson organisation of a paper-based system too strongly embedded for them to change readily? We hope adaptive systems offer a quicker and more interesting path to effective learning for students whose backgrounds and learning rates may be diverse.

4. References

Benyon, D. and Murray, D. (1993). Adaptive Systems: from intelligent tutoring to autonomous agents. Knowledge Based Systems 6, 197-219.

Boyle, T., Gray, J., Wendl, B. and Davies, M. (1994). Taking the plunge with CLEM: The design and evaluation of a large scale CAL system. Computers Educ 22, 19-26.

Cox, K. and Walker, D. (1993). User Interface Design. Prentice Hall, 101-113.

Eliot, C. and Woolf, B. (1995). An Adaptive Student Centred Curriculum for an Intelligent Training System. User Modelling and User Adapted Interaction 00: 1-20.

Escott, J.A. and McCalla, G.I. (1988) In Proceedings of ITS-88, Montreal, 312-319.

Major, N. and Reichgelt, H. (1991). Using COCA to build an intelligent tutoring system in simple algebra. Intelligent Tutoring Media 2, No 3/4, 159-169.

Murray, T. and Woolf, B. (1992). Results of Encoding Knowledge with Tutor Construction Tools. Proceedings of the 10th National Conference on A.I., MIT Press, 17-23.

Pirolli, P.L. and Anderson, J.R. (1985). The role of learning from examples in acquisition of recursive programming skills. Canadian Journal of Psychology 39, 240-272.

Soloway, E., Bonar, J. and Ehrlich, K. (1983). Cognitive Strategies and Looping Constructs: An Empirical Study. Communications of the ACM 26(11), 853-860.

Sokolnicki, T. (1991) Towards knowledge based tutors: a survey and appraisal of Intelligent Tutoring Systems. The Knowledge Engineering Review 6, 59-95.

Tyerman, S., Woods, P. and Warren, J. (1996). LoopTutor and HyperTutor: Experiences with Adaptive Tutoring Systems, accepted for ANZIIS'96, Adelaide, November 1996.

Woods, P. and Warren, J. (1995). Rapid Prototyping of an Intelligent tutorial system. Proc. ASCILITE'95, Melbourne 1995, 557-563.

5. Acknowledgment

We would like to thank all the students of Programming Pascal at the Levels Campus, University of South Australia, in 1996 for their enthusiasm in testing the prototype Loop Tutor, despite the usual teething problems with the implementation of such a system over a personal computer network of variable age and reliability. We are also grateful to our colleague Sally Rice for permitting her Pascal students to participate in the evaluation.
