[ ASCILITE ] [ 2004 Proceedings Contents ]

Animated text: More than meets the eye?

George Borzyskowski
Department of Design
Curtin University of Technology
The universal integration of computers into the environments of work and leisure has been a major factor in the increasing reliance upon imagery to communicate information of all kinds, challenging the supremacy of text as the dominant communication modality in some areas. As a consequence, there is a greater emphasis on the importance of visual literacy, albeit specialised, alongside conventional literacy and numeracy as a requisite for both learning and effective functioning in the workplace. It is evident that the skills acquired from an early age for processing textual and numerical information are not matched by equivalent skills for processing visual information, especially when it is encountered in unfamiliar formats. This can impede learning. This paper proposes the strategic use of animated text labels as didactic agents for directing the spatial and temporal attention of learners when they are introduced to unfamiliar visuals, and considers the capacity of such labels to perform a preliminary explanatory function in addition to identification. The origin of text animation within screen-based media is addressed, as well as aspects of recent research into cognition and attention that can inform the use of such animations in accomplishing instructional goals.


Until the final quarter of the 20th century, printed text was the dominant means of recording, disseminating, discussing, explaining and presenting information. With the advent of computers, however, new tools and methods for powerfully manipulating information in multiple ways have been developed. Because many of these manipulative opportunities concern graphic information, there has been a challenge to the traditional dominance of text over images in the delivery of information. This shift from text to graphics has been technologically driven, a result of the increasing capacity of computers to generate and display graphic representations. It is a trend that appears to be irreversible and, as shown by the increasingly pictorial nature of conventional print materials, is not limited to the screen. The implications of this change are particularly profound for education, where the text component of instructional resources has traditionally been accorded a much higher status than images. Compared with those printed 20 years ago, today's textbooks in many domains reflect an increasing reliance upon imagery of various kinds. Educational multimedia and resources on the Internet show an even stronger preference for graphic representations.

Two aspects of information technology have been major contributors to this scenario. One is the ability to assemble and manage databases and automate their visual representation, making it possible, at the touch of a button, to illustrate statistical information using devices such as fever graphs and pie and bar charts, first seen in William Playfair's 1786 publication, the Commercial & Political Atlas (Holmes 1984, pp. 15-17). The other is the development of visualisation tools that allow rapid assembly and editing of any kind of imagery without the high-level manual graphic skills once necessary. An example is the production of diagrams similar to Charles Joseph Minard's Napoleon's March, produced in 1869 (Tufte 1983, p.41), which combines timeline, events and quantities with cartography in one visual. New paradigms of charting, diagramming and display have also emerged as discipline after discipline has adapted to the efficiencies and advantages of computing. Such developments have challenged, and in some areas overturned, the dominance of text over images as the primary information delivery modality.

Visual literacy and learning

While established methods and priorities are in place to provide effective training in literacy and numeracy from an early age, there is no equivalent provision for 'visual literacy' (Lowe, 2000), the ability to read information from images. Nor, in a general sense, might such provision be possible, because of the sheer generic diversity of imagery in use and the associated domain-specific visual syntaxes that apply. What is apparent from the rapid increase in reliance on imagery for communication is that learners who are unable to read and understand graphics presented in instructional scenarios are placed at a great disadvantage, comparable to illiteracy.

It is easy to assume that because we live in a visual world and have evolved, in no small part, as a consequence of the ability to see, the reading of pictures and diagrams should be second nature. What can easily be overlooked, however, is the contrast with text. Readers learn from an early age how to direct attention to text, according to rules derived from the sequential nature of language. As Lowe and others have discussed, no such rules, or even general principles, exist for directing attention to imagery, either spatially or temporally. Without obvious cues for mapping a visual representation to what it represents, and unaware of the thematic relevance of the visual elements present, novices and learners are forced to concentrate processing effort on what is perceptually significant.

Visual information can be intentionally presented in ways that exploit perceptual significance as an indicator of hierarchy, but such a strategy has limited application. The apparent perceptual hierarchy in images containing highly differentiated graphics is most likely to be coincidental or irrelevant to the thematic structure and intended reading sequence. This results in the same failure to extract meaningful information that occurs with unfamiliar images whose components are perceptually undifferentiated and whose arrangement provides no distinct perceptual cues. In both circumstances, specific domain knowledge or a key is required to initiate the appropriate perceptual strategies for extracting useful information. Egyptian hieroglyphics, for example, in spite of their aesthetic and emotive appeal and their incorporation of recognisable pictorial elements, were undecipherable until the discovery in Egypt in 1799 of the Rosetta Stone (British Museum, n.d.), which revealed a linguistic key that provided external guidance where perceptual cues alone had failed. It follows that novices cannot be expected to learn from complex depictions without help, even when those depictions contain recognisable components.

In the historical examples cited earlier, Playfair and Minard anticipated that their then novel and unfamiliar visual representations would require some hints as to their interpretation. They employed text labels and captions to reveal the meaning of the visuals, allowing relativity, value, relationship, concurrence and difference to be read from their diagrams with an immediacy and clarity unmatched by text or numerical tables alone.

There are, however, many circumstances and categories of visuals, diagrams and displays where static labels, captions and even related body text are insufficient to explain a depiction well enough for a novice or learner to comprehend it. Examples include: where there is high information density; where dynamic events are depicted statically; where relationships between elements need to be known before function can be understood; where sequence or temporal relationships are important in understanding significance; where attention needs to be guided; or where the function of elements requires explanatory hints. It is suggested that in such circumstances the animation of text labels can fulfil a didactic function beyond nomenclature, through attention capture, attention guidance, indication of relationships and analogical movement.
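
As a minimal sketch of the attention guidance idea, the following Python fragment assigns each label of a diagram an onset time according to an intended reading order, so that labels appear one after another rather than all at once. The `Label` type, the `schedule_onsets` function, the example labels and the timing values are hypothetical, chosen only for illustration; they are not taken from any particular animation tool or from empirical findings.

```python
from dataclasses import dataclass


@dataclass
class Label:
    text: str   # wording shown on the diagram
    order: int  # intended position in the reading sequence (1 = first)


def schedule_onsets(labels, interval=1.5, start=0.0):
    """Return (onset_seconds, text) pairs, staggered in reading order.

    interval and start are illustrative values, not empirically derived.
    """
    ordered = sorted(labels, key=lambda label: label.order)
    return [(start + i * interval, label.text) for i, label in enumerate(ordered)]


# Hypothetical labels for a diagram of a pump, listed out of reading order.
labels = [Label("outlet valve", 3), Label("inlet valve", 1), Label("piston", 2)]
timeline = schedule_onsets(labels)
for onset, text in timeline:
    print(f"{onset:.1f}s  {text}")
```

A schedule of this kind addresses only sequence. Which trajectory or form each label should take as it appears is a separate question.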

Animated text

The compelling quality of animated text is generally assumed, and endlessly exploited, in communication media capable of depicting motion. It was first used on screen in the earliest silent animated cinema cartoons to visually indicate exclamations and sound effects, a device borrowed, along with speech and thought balloons, from the newspaper cartoon strips of the day, upon which many such animations were based. Nowadays the capability to animate text is widely available and is implemented in many computer applications, including presentation software such as PowerPoint, which allows the user to make text slide in, drop down, spiral in or appear letter by letter on the screen, though with limited control. The apparent justification for such manipulations is that they draw attention to the text and maintain audience interest. Attracting and maintaining attention is a key precursor to learning, but repeated and unprincipled use of such techniques can distract from the content presented or supported by the text by implying spurious meanings such as urgency, priority or even sophistication. An example of inexpensive, accessible software that might be applied to the typographic task of animating text labels is Swish, a Macromedia Flash compatible text animation tool offering a comprehensive variety and level of spatial, temporal and morphological control over individual letters and words. The availability of such choices prompts the question: what is known that can inform the strategic deployment of animated text so that it assists effectively in such learning scenarios?

A number of recent studies in cognition and attention should be considered. Text, or writing, is an external cognitive tool developed by humans to compensate for the limitations of memory and the ephemeral nature of oral communication. In doing so, text provides a means for the products of cognitive processes "to exist in the world, so that [they] can be perceived instead of computed" (Chandrasekharan 2004), thereby relieving the load on memory. Chandrasekharan (2004) has called such devices 'epistemic structures'. Since epistemic structures act as stimuli, an important aspect of their functioning is revealed by empirical studies showing that all stimuli are subject to pre-conscious, automatic affective processing, resulting in attitude priming that may impinge upon subsequent cognitive processing (Azar 1998; De Houwer & Hermans 2001; Musch 1996). This prompts the proposition that how text appears in the field of vision, what it does and how it looks as it appears may influence how it is read or, more precisely, what information is gained upon its observation and reading. It is known that text, in common with other symbolic systems, has the capacity to communicate outside of its denotative function.

This semantic dimension of text has been discussed by Drucker (1994, p.27), who describes it as 'materiality' with respect to typographic experimentation in the early 20th century. It is relevant to this discussion because it is suggested here that the morphology and particular choreographies, or materiality, of animated text could be arranged to explain or enhance what the text denotes. Experimental work in this general area has included the 'Temporal Typography' project, investigating "the expressive power of time varying typographic form to convey emotion and tones of voice" (Wong 1996); the design of a 'Prosodic Font' to facilitate the representation of "the melody and rhythm people use in natural speech" (Rosenberger & MacNeil 1999); and the development of 'The kinetic typography engine' software, with the purpose of adding time-based emotive content to the display of text (Lee, Forlizzi & Hudson 2002).

Attention and motion

While the automatic or instinctive orienting response to the onset of novel stimuli, first described by Pavlov, is well known (Kubey & Csikszentmihalyi 2004, p.51), some recent research has focused upon the active and passive attention processes that follow orienting stimuli. This research has been reviewed by Ruz and Lupianez (2002), particularly with respect to the differing effects of new stimuli on the capture of previously focused and unfocused attention. The effect of attention as an enabling mechanism for visual processing has also been discussed in detail by Wolfe (2000), who noted motion as one of the most effective pre-attentive factors. It has been observed, however, that not all motion or dynamic events capture attention equally (Franconeri & Simons 2003, p.1007). In formulating their behavioural urgency hypothesis, Franconeri and Simons reported experiments using letterforms which indicated that, in addition to abrupt onsets, looming objects capture attention, whereas receding objects do not (pp. 1006-1007). It is worth noting, however, that in earlier research Hillstrom & Yantis (1994, pp. 399, 410) had concluded that motion in itself was not responsible for capturing attention; rather, an object in motion constituted a new stimulus, and it was that onset that captured attention. Issues to be addressed with respect to the present hypothesis include the cognitive significance of the trajectory, velocity and intensity of animation applied to text labels, and whether instructionally useful semantic information can be communicated through morphological manipulation.
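
To make the distinction between looming and receding motion concrete, the following Python sketch generates scale keyframes for a text label that grows over a short interval, and checks whether a sequence of keyframes grows or shrinks. It assumes, for illustration only, that apparent looming can be approximated by a linear increase in displayed scale. The function names, durations and scale values are hypothetical and are not drawn from the cited experiments or from any animation tool.

```python
def looming_keyframes(duration=0.6, steps=4, start_scale=0.25, end_scale=1.0):
    """Return (time_seconds, scale) pairs interpolated linearly between two scales.

    With the default values the label grows, approximating a looming object.
    All default values are illustrative assumptions.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(steps + 1):
        fraction = i / steps
        scale = start_scale + (end_scale - start_scale) * fraction
        frames.append((duration * fraction, scale))
    return frames


def is_looming(frames):
    """True when scale never decreases and ends larger than it starts."""
    scales = [scale for _, scale in frames]
    never_shrinks = all(a <= b for a, b in zip(scales, scales[1:]))
    return never_shrinks and scales[-1] > scales[0]


frames = looming_keyframes()
receding = looming_keyframes(start_scale=1.0, end_scale=0.25)
print(is_looming(frames), is_looming(receding))
```

Trajectory and velocity could be parameterised in the same way. Whether any such parameter carries cognitive significance for learners remains the open question.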


Because of the growth in the diversity of imagery and the increased reliance upon visuals to communicate general and domain-specific information, as an adjunct to or in place of text, there is an imperative to investigate ways of assisting novices and learners to extract meaning and information from unfamiliar displays, particularly in instructional settings. This paper suggests that enhancing the function of the text labels commonly used in diagrams through animation may be useful. A preliminary review of empirical work in cognition and attention suggests that there is a basis for, and some merit in, investigating the potential didactic agency of animated text labels beyond their conventional use. Previously, implementing such devices would have been labour intensive and possible only within specialised authoring environments, since the most commonly used presentation software lacks the flexibility and control required for what is proposed. The adaptation of recently available, highly automated and customisable typographic animation tools developed for Internet applications makes such investigation feasible and practically relevant.

References


Azar, B. (1998). Split-second evaluations shape our moods, actions. APA Monitor, 29(9). [verified 10 Oct 2004]

British Museum Trustees (n.d.). The Rosetta Stone. [viewed 20 July 2004, verified 10 Oct 2004]

Chandrasekharan, S. (2004). Epistemic structure: How agents change the world for cognitive congeniality. Carleton University Cognitive Science Technical Reports. [viewed 2 Apr 2004, verified 10 Oct 2004]

De Houwer, J. & Hermans, D. (2001). Editorial: Automatic affective processing. Cognition and Emotion, 15(2), 113-114.

Drucker, J. (1994). The visible word: Experimental typography and modern art, 1909-1923. Chicago: University of Chicago Press.

Franconeri, S. L. & Simons, D. J. (2003). Moving and looming stimuli capture attention. Perception and Psychophysics, 65(7), 999-1010. [verified 10 Oct 2004]

Hillstrom, A. P. & Yantis, S. (1994). Visual motion and attentional capture. Perception and Psychophysics, 55(4), 399-411. [verified 10 Oct 2004]

Holmes, N. (1984). Designer's guide to creating charts and diagrams (p.13). New York: Watson-Guptill.

Kubey, R. & Csikszentmihalyi, M. (2004). Television addiction is no mere metaphor. Scientific American Special, 14(1), 48-55.

Lee, J. C., Forlizzi, J. & Hudson, S. (2002). The kinetic typography engine: An extensible system for animating expressive text. In Proceedings of the 15th annual ACM symposium on User interface software and technology (pp. 81-90).

Lowe, R. K. (2000). Visual literacy and learning in science. ERIC Digests, ED463945. [verified 10 Oct 2004]

Musch, J. (1996). Affective priming. Dissertation thesis. Bonn University, Department of Psychology. [viewed 22 Apr 2004, verified 10 Oct 2004]

Rosenberger, T. & MacNeil, R. L. (1999). Prosodic Font: Translating speech into graphics. Conference on Human Factors in Computing Systems (CHI '99) (pp. 252-253). Pittsburgh, Pennsylvania. [viewed 15 Mar 2004, not found 10 Oct 2004]

Ruz, M. & Lupianez, J. (2003). High density ERP indices of conscious and unconscious semantic priming. Cognitive Brain Research, 17, 719-731. [viewed 24 Jun 2004, verified 10 Oct 2004]

Tufte, E. R. (1983). The visual display of quantitative information. Connecticut, USA: Graphics Press.

Wong, Y. Y. (1996). Temporal typography: A proposal to enrich written expression. In CHI 96 [Conference on Human Factors in Computing Systems] electronic proceedings. [viewed 15 Feb 2004, verified 10 Oct 2004]

Please cite as: Borzyskowski, G. (2004). Animated text: More than meets the eye? In R. Atkinson, C. McBeath, D. Jonas-Dwyer & R. Phillips (Eds), Beyond the comfort zone: Proceedings of the 21st ASCILITE Conference (pp. 141-144). Perth, 5-8 December.

© 2004 George Borzyskowski
The author assigns to ASCILITE and educational non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to ASCILITE to publish this document on the ASCILITE web site (including any mirror or archival sites that may be developed) and in printed form within the ASCILITE 2004 Conference Proceedings. Any other usage is prohibited without the express permission of the author.

HTML created 20 Nov 2004. Last revision: 20 Nov 2004.