ISO/IEC JTC 1/SC34 N0xxx

ISO/IEC JTC1/SC34/WG2 N210

ISO/IEC JTC 1/SC34

Information Technology --
Document Description and Processing Languages

TITLE: 1st WD, Minimum requirements for specifying document rendering systems
SOURCE: Project Editor
PROJECT: TBD
PROJECT EDITOR: Keisuke Kamimura
STATUS: 1st Working Draft
ACTION: For information and comments
DATE: 2005-05-23
DISTRIBUTION: SC34, SC34/WG2 and Liaisons
REFER TO:
REPLY TO: Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
Network: masonjd@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/
ftp://ftp.y12.doe.gov/pub/sgml/sc34/

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: jtc1sc34@scc.ca


Minimum requirements for specifying document rendering systems

Abstract

When a structured document is interchanged between an originator and a recipient, the recipient refers to the style specifications that the originator provides to reconstruct the presentation. However, when the recipient does not have sufficient rendering functionality, it may fail to reconstruct the presentation output as the originator expected. In order to preserve presentation output in the course of interchange, the originator and recipient need to negotiate over functionality by referring to the specifications of their document rendering systems. To satisfy this requirement, minimum requirements for specifying document rendering systems need to be standardised.

1. Introduction

1.1 Rationale

As more social activities are processed by information systems than ever before, the importance of information interchange via electronic documents increases. Accordingly, electronic text with structure markup and style specification is becoming more prevalent, because it is suitable for processing information that travels between machines and human readers.

A structured document is, in many cases, not human friendly, and it needs to be rendered into a human readable format based on style and layout specifications. Style specification languages, such as ISO/IEC 10179:1996, Document Style Semantics and Specification Language (DSSSL), and the World Wide Web Consortium's Extensible Stylesheet Language (XSL), are now in place to describe the style and layout of electronic documents in a standardised manner. In theory, the final presentation format that the users at both ends of communication (e.g. the originator and recipient of the document) observe will be reconstructed by applying the style specifications to the structure markup.

However, the authors have identified two difficulties in reconstructing the exact presentation that the user expects. One is that it is extremely difficult to prepare an appropriate set of style specifications to convert a given structured document to a human readable format as designed. There have been various attempts to solve this issue. One such attempt is to create a predefined library of style specifications, which users can pick up and apply to their own documents so that they need not define style descriptions from scratch. ISO/IEC TR 19758 [1] is one such example.

The other issue is that the rendering output is not necessarily guaranteed across document rendering systems. Even if style descriptions are given with adequate precision, document rendering systems may implement the style descriptions differently, which will result in a failure to reconstruct the presentation as originally designed. Although various efforts, including the authors' own, have been made to set out a common set of document structure or style features, these efforts do not necessarily lead to preserving the rendering output.

1.2 Advantages

The proposed methodology focuses on standardisation in terms of the output, rather than the process. It is important for the user to have precise specifications readily available for the style and layout features that he or she may encounter. As mentioned earlier, however, this does not necessarily lead to guaranteeing the presentation output. Regulating the presentation output may seem to be a solution in order to preserve rendering output across different document rendering systems, but this is not an accepted custom in document processing based on structure markup and style specification.

In addition, the authors are convinced that the approach has a few other advantages over the approach in the existing standards in terms of making the presentation output consistent.

1.2.1 Effective Negotiation between Rendering Systems

One of the advantages of the approach is that it can facilitate effective negotiation between two or more rendering systems. Suppose two document rendering systems negotiate over a single set of content. They may be built on totally different architectures. One may render documents on a page-by-page basis, whereas the other may render documents as a continuous, lengthy scroll of text. In this case, the systems have to negotiate and compromise, before exchanging the content, on how page-based content should be rendered in a scroll-based environment.
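The negotiation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RendererProfile` type, the `negotiate` function, the layout model names and the feature names are all hypothetical and are not defined by this draft or by any existing standard.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RendererProfile:
    """Hypothetical description of one document rendering system."""
    name: str
    layout_model: str          # assumed values: "paged" or "scroll"
    features: FrozenSet[str]   # assumed, illustrative feature names


def negotiate(originator: RendererProfile, recipient: RendererProfile) -> dict:
    """Compare two profiles before content is exchanged."""
    return {
        # Features both systems declare, so output can be preserved.
        "shared": sorted(originator.features & recipient.features),
        # Features the originator relies on but the recipient lacks.
        "missing": sorted(originator.features - recipient.features),
        # True when page-based content must be adapted to a scroll, or vice versa.
        "layout_fallback": originator.layout_model != recipient.layout_model,
    }


paged = RendererProfile("paged-engine", "paged",
                        frozenset({"headers", "footnotes", "hyphenation"}))
scroll = RendererProfile("scroll-engine", "scroll",
                         frozenset({"hyphenation"}))

result = negotiate(paged, scroll)
print(result["missing"])          # ['footnotes', 'headers']
print(result["layout_fallback"])  # True
```

In this sketch the result tells the originator which features need a compromise before the content is sent; how the compromise itself is reached is outside what the example shows.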

1.2.2 Common Criteria in Product Description

The second advantage of this approach is that it can provide common criteria for product description. Although product descriptions are provided for various other products, both hardware and software, this is not yet common practice for document rendering systems. One of the reasons, in our observation, is the lack of common criteria for describing the specifications of document rendering systems, and the proposed standard will fill the gap.

1.2.3 Extension to Multilingual Documents

Document style and layout features vary from language to language, or more precisely, from culture to culture. Existing standards are being reviewed and revised continuously to incorporate style and layout features of languages that are not currently covered. To enable these style features to be supported in electronic documents, standardisation efforts based upon local requirements are necessary.

For the amendment to ISO/IEC TR 19758, the authors, along with other experts in the "Asian Document Style Standardisation for Information Interchange (DocSII)" project initiated by the Center of the International Cooperation of Computerization (CICC) of Japan, have studied document style and layout in selected Asian countries to identify the style and layout features, if any, that are unique to Asian languages. The authors found that, on the one hand, Asian languages, and possibly any other languages, seem to have much in common in terms of document style and layout, but on the other hand there are a number of distinct and complex features of document style and layout in Asian languages. To support such language-specific style and layout features in a document rendering system, it is necessary to be able to specify the functionality of document rendering systems and to compare the degree to which each implementation supports such features.

2. User Requirements

The primary user requirement that the authors try to address is the preservation of rendering output across different document rendering systems. Currently, rendering output may not necessarily be preserved because of implementation-specific differences. In a specific application domain, such as the World Wide Web, where document rendering systems (i.e. web browsers) are relatively limited in number and there is one authoritative, referential document type or XML schema, efforts to make rendering output consistent across various web browsers have been successful to some degree.

However, as more socio-economic and cultural activities are conducted online and the application scope of electronic documents expands, information interchange will depend more on structured documents with style specifications. As a result, structured documents and style specifications will be processed in a more heterogeneous environment where a greater number of document rendering system implementations coexist. Current efforts, which worked well to make rendering output more or less consistent, may not work in such an emerging environment, because there is no language or standard to express and evaluate the functionality or specification of document rendering systems. In most cases, differences in rendering output between two or more document rendering systems are understood as implementation-specific.

This, however, could become a source of confusion and dissatisfaction among non-technologists, who make up most of the users of electronic documents. One cause of this confusion and dissatisfaction is that there is no standard to which one can refer to show how interoperable the rendering output is between two or more document rendering systems. It also means that structure markup and accompanying style specification alone are no longer adequate, because a number of features are left 'implementation-dependent', and the problem remains unsolved. From a user perspective, it is legitimate and urgent to provide a standardised way of describing and comparing the features of document rendering systems.

3. Proposed Methodology

3.1 Approach

The basic assumption in document processing based on logical markup and style specification is that the presentation format that the originator observes will be reconstructed at the recipient. However, the presentation reconstructed on the recipient's side often differs from that on the originator's side, because each document rendering system differs from the others in the details of implementation, and possibly in the interpretation of style standards and style specifications.

However, as the authors mentioned in the previous section, it is very difficult to intervene in the final presentation of an electronic document because it is under the control of each implementation. Leaving the issue to standard setting would not solve it either, because the issue reverts to each implementation. No matter how precise the style specification is, it is difficult to guarantee the rendering output because the output remains under the control of each implementation. Therefore, standard setting in the area of structure description and style specification alone does not provide an effective way to preserve and guarantee the presentation of electronic documents.

Figure 1. Control of Standards in Document Rendering

Instead, the authors suggest an alternative method. To preserve rendering output across various document rendering systems, it is not adequate to separate structure markup and style specification. Traditionally, the rendering output which results from the style specification being applied to the structure markup is left to each implementation, and no guarantee is provided concerning what the final output looks like, whereas the user expects that the rendering output also travels all the way through and reaches the recipient's output device.

Figure 2. New Point of Control in Document Rendering

3.2 Scope

There are three types of electronic document from a document style point of view [2]. One of these types is the final form document. It includes PostScript and PDF documents. These documents are not meant for editing or revising afterwards. Another type of document is the editing form document. Microsoft Word format and Rich Text Format may fall under this type. The other one is the structuring form with style specification. Typical examples of this type of document are SGML with DSSSL and XML with XSL.

This proposed standard targets document rendering systems which take in the logical structure and style specification of a document and produce the rendering output which results from the style specification being applied to the logical structure. The scope excludes any document processing system which does not separate the logical structure and style specification of the document.

3.3 Methodology

To satisfy the user requirements mentioned above, this article proposes a new standard which specifies the minimum requirements for specifying document rendering systems.

To specify document rendering systems, the authors suggest that a standard provide a list of functionalities that a document rendering system may have. The list should also provide names and descriptions of the functionalities. Below are the suggested items that need to be addressed in the minimum requirements.

1. General
1.1 Supported style languages
1.2 Pagination or scrolling
1.3 Output device
1.4 Character-model: Simple character model or character cell model

2. Page and page sets
2.1 Geometry
2.2 Margins, paddings...
2.3 Body
2.4 Headers, footers, sideline constructs
2.5 'Columnation'

3. Blocks or chunks of lines
3.1 Justification
3.2 Alignment
3.3 Keeps and breaks
3.4 Wrapping of widows and orphans
3.5 Line spacing

4. Line and interline
4.1 Hyphenation and breaks
4.2 Character spacing
4.3 Word spacing

5. Character-level processing
5.1 Fonts
5.2 Special characters
5.3 Embedding of external glyphs and other symbols

6. Figures and graphics
6.1 Floating properties
6.2 Media types

7. Tables
7.1 Simple tables
7.2 Spanned rows and columns
7.3 Complex tables

8. Lists
8.1 List contour
8.2 Bullets
8.3 Nesting

9. Hyperlinking
9.1 Hyperlinking
9.2 Other linking schemes

10. Multimedia objects
10.1 Static content or dynamic content
10.2 External objects or internal objects

11. Others

The functionalities listed above are by no means comprehensive, and need further discussion. As the list grows, it is intended to incorporate all the functionalities required to express the specifications of document rendering systems.
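As an illustration of how such a list could be used to describe and compare systems, the following Python sketch records a few of the items above in machine-readable form. It is a sketch under assumptions, not part of the proposal: the dictionary layout, the function names `undeclared_items` and `differing_items`, the declared values and the two example systems are all hypothetical. Only the item numbers and names are taken from the list.

```python
# Illustrative subset of the list above; keys reuse its item numbers.
CHECKLIST = {
    "1.1": "Supported style languages",
    "1.2": "Pagination or scrolling",
    "3.4": "Wrapping of widows and orphans",
    "7.2": "Spanned rows and columns",
}


def undeclared_items(declaration):
    """Return the item numbers that a system's description leaves unspecified."""
    return [item for item in CHECKLIST if item not in declaration]


def differing_items(first, second):
    """Return the item numbers both systems declare, but with different values."""
    return [item for item in CHECKLIST
            if item in first and item in second and first[item] != second[item]]


# Hypothetical declarations for two rendering systems (values are assumptions).
system_a = {"1.1": ["XSL"], "1.2": "pagination", "3.4": True, "7.2": True}
system_b = {"1.1": ["XSL"], "1.2": "scrolling", "7.2": False}

print(undeclared_items(system_b))           # ['3.4']
print(differing_items(system_a, system_b))  # ['1.2', '7.2']
```

A description of this kind would let a user see, item by item, where two systems are expected to produce different output and where a description is silent.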

4. Schedule of Work

Based on the approach and methodology above, the authors have placed a new work item proposal (NP) in ISO/IEC JTC 1/SC 34, which is in charge of the standardization of document structures, languages and related facilities for the description and processing of compound and hypermedia documents. At the time of writing, the NP ballot has already been closed in favour of the proposal, and JTC 1 has circulated the NP for JTC 1 National Bodies' review. No objection from JTC 1 National Bodies was heard, and the project has been formally established in the Program of Work of SC 34. The proposal will be elevated to a Committee Draft (CD) in December 2005, and to a Final Draft International Standard (FDIS) in June 2006.

5. Conclusion

Issues such as rendering output have been considered 'implementation-dependent' and left out of the scope of document processing and description languages. However, as electronic documents come to be applied to a wider range of application domains, new user requirements may arise as mentioned earlier in this article, and issues that have not been in scope need to be considered from the perspective of standard setting. Although the details of the standard are yet to be finalised, the authors expect that the standard can be refined with the expertise of WG 2 and other affiliated experts.

References