ISO/IEC JTC 1/SC34 N0xxx

ISO/IEC JTC1/SC34/WG2 N210rev

ISO/IEC JTC 1/SC34

Information Technology --
Document Description and Processing Languages

TITLE: 1st WD, Minimum requirements for specifying document rendering systems
SOURCE: Project Editor
PROJECT: TBD
PROJECT EDITOR: Keisuke Kamimura
STATUS: 1st Working Draft
ACTION: For information and comments
DATE: 2005-05-25
DISTRIBUTION: SC34, SC34/WG2 and Liaisons
REFER TO:
REPLY TO: Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
Network: masonjd@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/
ftp://ftp.y12.doe.gov/pub/sgml/sc34/

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: jtc1sc34@scc.ca


Minimum requirements for specifying document rendering systems

Abstract

When a structured document is interchanged between an originator and a recipient, the recipient refers to the style specifications provided by the originator to reconstruct the presentation. However, when the recipient does not have sufficient rendering functionality, it may fail to reconstruct the presentation output as the originator expected. In order to preserve presentation output in the course of interchange, the originator and recipient need to negotiate over functionality with reference to the specifications of their document rendering systems. To satisfy this requirement, this standard provides the minimum requirements for specifying document rendering systems and document formats.

0. Introduction

0.1 Rationale

As more social activities are processed by information systems than ever before, information interchange via electronic documents is becoming increasingly important. Accordingly, electronic text with structure markup and style specifications is becoming more prevalent, because it is well suited to information that travels between machines and human readers.

In many cases a structured document is not human-friendly in its raw form, and it needs to be rendered into a human-readable format. When an electronic document is to be read by a human reader, it must be transformed into a human-readable format on the basis of style and layout specifications. Style specification languages, such as ISO/IEC 10179:1996, Document Style Semantics and Specification Language (DSSSL), and the World Wide Web Consortium's Extensible Stylesheet Language (XSL), are now in place to describe the style and layout of electronic documents in a standardised manner. In theory, applying the style specifications to the structure markup reconstructs the same final presentation for the users at both ends of the communication (e.g. the originator and the recipient of the document).

However, two difficulties arise in reconstructing the exact presentation that the user expects. The first is that it is extremely difficult to prepare an appropriate set of style specifications to convert a given structured document into a human-readable format as designed. There have been various attempts to solve this issue. One such attempt is to create a predefined library of style specifications, from which users can select and apply specifications to their own documents so that they need not define style descriptions from scratch. ISO/IEC TR 19758 [1] is one such example.

The second is that the rendering output is not guaranteed to be the same across document rendering systems. Even if style descriptions are given with adequate precision, document rendering systems may implement them differently, which results in a failure to reconstruct the presentation as originally designed. Although various efforts have been made to set out a common set of document structure or style features, these efforts do not necessarily preserve the rendering output.

Issues such as rendering output have traditionally been considered 'implementation-dependent' and left outside the scope of document description and processing languages. However, as electronic documents are applied to a wider range of application domains, new user requirements such as those described above may arise, and issues that have so far been out of scope need to be considered from a standard-setting perspective.

0.2 User Requirements

The primary user requirement that this standard addresses is the preservation of rendering output across different document rendering systems. Currently, rendering output is not necessarily preserved, because of implementation-specific differences. In some application domains, such as the World Wide Web, where document rendering systems (i.e. web browsers) are relatively few in number and there is one authoritative reference document type or XML schema, efforts to make rendering output consistent across web browsers have been successful to some degree.

However, as more socio-economic and cultural activities are conducted online and the application scope of electronic documents expands, information interchange will depend more on structured documents with style specifications. As a result, structured documents and style specifications will be processed in more heterogeneous environments, where a greater number of document rendering system implementations coexist. Current efforts, which have worked well to make rendering output more or less consistent, may not work in such emerging environments, because there is no language or standard for expressing and evaluating the functionality or specification of document rendering systems. In most cases, differences in rendering output between two or more document rendering systems are simply regarded as implementation-specific.

This could, however, become a source of confusion and dissatisfaction among non-technologists, who make up most of the users of electronic documents. One cause of this confusion and dissatisfaction is that there is no standard to which one can refer to show how interoperable the rendering output is between two or more document rendering systems. It also means that structure markup and accompanying style specifications alone are no longer adequate, because a number of features are left 'implementation-dependent' and the problem remains unsolved. From a user perspective, there is a legitimate and urgent need for a standardised way of describing and comparing the features of document rendering systems.

1. Scope

This proposed standard targets document rendering systems that take as input the logical structure and style specification of a document and produce as output the rendering that results from applying the style specification to the logical structure. The scope excludes any document processing system that does not separate the logical structure of the document from its style specification.

2. Normative references

[tbd]

3. Definitions

[tbd]

4. Conceptual model for minimum requirements for specifying document rendering systems

4.1 Approach

The basic assumption in document processing based on logical markup and style specification is that the presentation format that the originator observes will be reconstructed at the recipient. However, the presentation reconstructed on the recipient's side often differs from that on the originator's side, because each document rendering system differs from the others in the details of its implementation, and possibly in its interpretation of style standards and style specifications.

As mentioned in the previous section, however, it is very difficult to intervene in the final presentation of an electronic document, because it is under the control of each implementation. Leaving the issue to standard setting would not solve it either, because the issue reverts to each implementation. No matter how precise the style specification is, the rendering output is difficult to guarantee, because it is ultimately determined by each implementation. Therefore, standard setting in the area of structure description and style specification alone does not provide an effective way to preserve and guarantee the presentation of electronic documents.

Instead, this standard suggests an alternative method. To preserve rendering output across various document rendering systems, separating structure markup from style specification is not adequate on its own. Traditionally, the rendering output that results from applying the style specification to the structure markup is left to each implementation, and no guarantee is provided as to what the final output looks like, whereas the user expects the rendering output, too, to travel all the way through and reach the recipient's output device.

4.2 Advantages

The proposed methodology focuses on standardisation of the output rather than of the process. It is important for the user to have precise specifications readily available for the style and layout features that he or she may encounter. As mentioned earlier, however, such specifications do not necessarily guarantee the presentation output. Regulating the presentation output itself may seem to be a way to preserve rendering output across different document rendering systems, but this is not accepted practice in document processing based on structure markup and style specification.

In addition to the above, the approach taken in this standard can be expected to offer several advantages over the approach of existing standards in making presentation output consistent.

4.2.1 Effective negotiation between rendering systems

One of the advantages of this approach is that it can facilitate effective negotiation between two or more rendering systems. Suppose two document rendering systems negotiate over a single set of content. They may be built on totally different architectures. One may render documents on a page-by-page basis, whereas the other may render documents as a continuous, lengthy scroll of text. In such a case, the two systems have to negotiate and reach a compromise, before the content is exchanged, on how page-based content should be rendered in a scroll-based environment.
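As a minimal sketch of such a negotiation, assume that each system simply declares the set of layout modes it supports. The names `LayoutMode` and `negotiate_layout`, and the fallback rule, are hypothetical and are not defined by this draft: the originator's preferred mode is used when both sides support it; otherwise a mode common to both is chosen and flagged as a compromise.

```python
# Hypothetical sketch: LayoutMode, negotiate_layout and the fallback rule are
# illustrative assumptions, not part of this draft.
from enum import Enum


class LayoutMode(Enum):
    PAGINATED = "paginated"  # renders page by page
    SCROLL = "scroll"        # renders a continuous scroll of text


def negotiate_layout(originator: set[LayoutMode], recipient: set[LayoutMode],
                     preferred: LayoutMode) -> tuple[LayoutMode, bool]:
    """Return (mode to use, True if the mode is a compromise)."""
    if preferred in originator and preferred in recipient:
        return preferred, False
    common = originator & recipient
    if not common:
        raise ValueError("no common layout mode: negotiation fails")
    # Pick deterministically among the modes both systems support.
    return min(common, key=lambda m: m.value), True


mode, compromise = negotiate_layout({LayoutMode.PAGINATED, LayoutMode.SCROLL},
                                    {LayoutMode.SCROLL}, LayoutMode.PAGINATED)
print(mode.value, compromise)  # scroll True
```

Flagging the compromise tells the originator that the recipient will not reproduce the presentation exactly as designed. A real negotiation would presumably cover many more functionalities than the layout mode alone.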

4.2.2 Common criteria in product description

The second advantage of this approach is that it can provide common criteria for product description. Although product descriptions are provided for various other products, both hardware and software, this is not yet common practice for document rendering systems. In our observation, one of the reasons is the lack of common criteria for describing the specifications of document rendering systems, and this standard is intended to fill that gap.

4.2.3 Extension to multilingual document

Document style and layout features vary from language to language, or more precisely, from culture to culture. Existing standards are continuously being reviewed and revised to incorporate the style and layout features of languages that they do not currently cover. To enable these style features to be supported in electronic documents, standardisation efforts based upon local requirements are necessary.

ISO/IEC TR 19758 has been amended to address the document style and layout features that are specific to some Asian languages. On the one hand, Asian languages, and possibly other languages as well, have much in common in terms of document style and layout; on the other hand, each has a number of distinct and complex style and layout features. To support these features unique to each language in document rendering systems, it is necessary to be able to specify the functionality of document rendering systems and to compare the degree to which each implementation supports such features.
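One way such a comparison might be carried out is sketched below. It assumes that each implementation publishes a profile recording, for each functionality listed in clause 5, whether that functionality is supported. The profile format, the use of clause numbers as feature identifiers, and the `coverage` ratio are illustrative assumptions; this draft does not define them.

```python
# Hypothetical sketch: the profile format and the coverage measure are assumptions
# for illustration only; this draft does not define them.
from dataclasses import dataclass, field


@dataclass
class RenderingProfile:
    """Functionalities declared by one document rendering system.

    Keys are clause numbers from clause 5 (e.g. "5.1.2" for pagination or
    scrolling); values are True if the system claims support.
    """
    name: str
    features: dict[str, bool] = field(default_factory=dict)

    def supported(self) -> set[str]:
        return {clause for clause, ok in self.features.items() if ok}


def missing_at_recipient(originator: RenderingProfile,
                         recipient: RenderingProfile) -> set[str]:
    """Features the originator supports but the recipient does not."""
    return originator.supported() - recipient.supported()


def coverage(originator: RenderingProfile, recipient: RenderingProfile) -> float:
    """Fraction of the originator's supported features that the recipient also supports."""
    needed = originator.supported()
    if not needed:
        return 1.0  # nothing to reproduce, so nothing can be lost
    return len(needed & recipient.supported()) / len(needed)


originator = RenderingProfile("originator", {"5.1.2": True, "5.3.4": True, "5.7.2": True})
recipient = RenderingProfile("recipient", {"5.1.2": True, "5.3.4": False})
print(sorted(missing_at_recipient(originator, recipient)))  # ['5.3.4', '5.7.2']
print(round(coverage(originator, recipient), 2))            # 0.33
```

A single ratio hides which features are missing, so the sketch also reports the missing set; for language-specific features, the missing set is likely to be the more useful result.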

5. Specification of document rendering systems

To satisfy the user requirements mentioned above, this document proposes a new standard that specifies the minimum requirements for specifying document rendering systems.

To specify document rendering systems, the standard should provide a list of the functionalities that a document rendering system may have, together with a name and description for each functionality. The items below are suggested for inclusion in the minimum requirements.

5.1 General
5.1.1 Supported style languages
5.1.2 Pagination or scrolling
5.1.3 Output device
5.1.4 Character model: simple character model or character-cell model

5.2 Page and page sets
5.2.1 Geometry
5.2.2 Margins and padding
5.2.3 Body
5.2.4 Headers, footers, sideline constructs
5.2.5 'Columnation'

5.3 Blocks or line chunks
5.3.1 Justification
5.3.2 Alignment
5.3.3 Keeps and breaks
5.3.4 Control of widows and orphans
5.3.5 Line spacing

5.4 Line and interline
5.4.1 Hyphenation and breaks
5.4.2 Character spacing
5.4.3 Word spacing

5.5 Character-level processing
5.5.1 Font-related features: font selection, font substitution, font embedding
5.5.2 Special characters
5.5.3 Embedding of external glyphs and other symbols

5.6 Figures and graphics
5.6.1 Floating properties
5.6.2 Media types

5.7 Tables
5.7.1 Simple tables
5.7.2 Spanned rows and columns
5.7.3 Complex tables

5.8 Lists
5.8.1 List contour
5.8.2 Bullets
5.8.3 Nesting

5.9 Hyperlinking
5.9.1 Hyperlinking
5.9.2 Other linking mechanisms

5.10 Multimedia objects
5.10.1 Static content or dynamic content
5.10.2 External objects or internal objects

5.11 Miscellaneous functionalities

Annex A. Guidelines for authoring document formats

[tbd]

Annex B. Terminology for the specifications of document rendering systems

[tbd]