
Dynamic Presentation of Document Content for Rapid On-Line Skimming

by Branimir Boguraev, Christopher Kennedy, Rachel Bellamy, Sascha Brawer, Yin Yin Wong and Jason Swartz

AAAI Spring Symposium on Intelligent Text Summarization, March 1998

Abstract

Present-day summarisation technologies are imperfect. At a certain level of abstraction, they all work by performing data reduction over the original document source. For such summaries to be useful, it is necessary to know how they relate to the original documents. For them to be usable, they must function as windows into the whole documents, with suitably designed interfaces for navigating to areas of particular interest. This paper discusses the notion of strong contextualisation of document highlights, how this translates into necessary features for document analysis, and how the document abstractions derived from such principles facilitate dynamic delivery of document content. We argue that dynamic document abstractions effectively mediate different levels of granularity of analysis, from terse document highlights to fully contextualised foci of particular interest. We describe a range of dynamic document viewers which embody novel presentation metaphors for document content delivery.

Introduction

Present-day summarisation technologies fall short of delivering fully informative summaries of documents. This is largely due to shortcomings in the state of the art in natural language processing; more generally, how to customise a summarisation procedure for a specific information-seeking task is still an open question. However, given the rapidly growing volume of document-based information on-line, the need for any kind of document abstraction mechanism is so great that summarisation technologies are beginning to be deployed in real-world situations.

The majority of techniques for […] characteristic features of a known domain of interest; this information is used to generate an abstraction of the document […] somehow as representative of the content of the document as a whole (or of some coherent segment of the document). A variety of approaches fall into this general category, ranging from fairly common sentence extraction techniques to newer methods utilising, for example, strong notions of topicality [4], [8], lexical chains [3], and discourse structure [14], [5] (see the papers from the recent ACL workshop on Intelligent, Scalable Text Summarization [2] for an overview). Ultimately, all of these approaches share a fundamental similarity: they construct a characterisation of document content through significant reduction of the original document source, rather than through some kind of generation procedure. This raises several important questions.

First and foremost, what is the optimal way of incorporating the set of extracted fragments that some method identifies as topically relevant into a coherent representation of document content? However, unlike techniques which rely on domain models or other […] question is the issue of granularity of data reduction: what sorts of expressions make the best information-bearing passages for the purpose of summarisation? Are sentences better than paragraphs? Are phrases even better?
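To make the notion of "data reduction" and its granularity concrete, here is a minimal sketch of one generic, frequency-based sentence extraction baseline of the kind referred to above as "fairly common". It is an illustration under stated assumptions, not the method of this paper: the stopword list, the regex tokenisation, the average-frequency score and the names `split_sentences`, `content_words` and `extract_summary` are all hypothetical choices. The unit of reduction is fixed by `split_sentences`; substituting a paragraph or phrase splitter is where the granularity question would enter. Each extract also keeps its position in the source, a minimal link back to the discarded context.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for",
             "on", "that", "it", "this", "as", "by", "with"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    # This function fixes the granularity of reduction at the sentence level.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    # Lower-case alphabetic tokens, minus stopwords.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text: str, k: int = 2) -> list[tuple[int, str]]:
    """Return the k highest-scoring sentences as (position, sentence), in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s: str) -> float:
        words = content_words(s)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Stable sort: ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore document order
    # Positions tie each extract back to its place in the full source.
    return [(i, sentences[i]) for i in keep]


if __name__ == "__main__":
    doc = ("Summaries reduce documents. Reduction discards context. "
           "Cats sleep. Summaries need context to be useful.")
    for pos, sent in extract_summary(doc, k=2):
        print(pos, sent)
```

Everything not selected is simply dropped here; only the returned positions remain as a way back into the source, which is exactly the kind of loss the questions in this section are concerned with.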

The second question concerns the level of user involvement. From the end-user's point of view, making judgements about a document on the basis of a summary involves a sequence of actions: look at the summary, absorb its semantic impact, infer what the document might be about, decide whether to consult the source, somehow call up the full document, and navigate to the point(s) of interest. How, then, can a summary alleviate the cognitive load placed on a user faced with a large, and growing, number of documents every day?

Finally, acknowledging that different information management tasks may require different kinds of summary, even from the same document (a point made recently by Sparck Jones [16]), raises the question of how the data discarded by the reduction process should be retained, in case reference is needed to a part of the document not originally included in the summary.

This paper opens a discussion of these questions and offers some initial answers. In particular, we argue that in order for summaries derived from the source text by extraction techniques to be useful, they must satisfy two constraints: they must incorporate a granularity of reduction that includes phrasal analysis, and they must be presented to users through dynamic interfaces. We demonstrate that such a summarisation technology facilitates a process of […] of the source and leads the user deeper into the content of the original document, while retaining strong notions of contextualisation as an inherent property of the discourse. The organisation of the paper is as follows.

In Section 2, we analyse certain usability aspects of summarisation technologies, and argue for a range of features of the analysis of a document which need to be retained as an integral part of any abstraction or summary for that document, including contextualisation of document highlights. We then sketch an interface environment for delivering such abstractions to end-users, in which a strong notion of context is maintained throughout the interaction between users and documents by dynamic delivery of document content. In Section 3, we outline a technology for phrase-based content characterisation, and in Section 4, we discuss a range of experiments with dynamic visualisations of document content, introducing temporal typography as a particularly promising vehicle for dynamic document delivery. Section 5 describes a range of dynamic document viewers which implement novel modes of summary presentation and content visualisation.