15th AFSIT

15th Asian Forum for the Standardization of Information Technology



Document Processing

Multilingual Document Interchange based on Standardized Document Description Languages (XML, XSL, etc.)


Yushi Komachi

Panasonic/MGCS, Shimomeguro, Tokyo, Japan
email: komachi@y-adagio.com

2001-11-07


1. Introduction

1.0 Coded Characters Are Done; What Is NEXT?

Standardization of coded characters has almost been completed through the activities of

What shall we do next regarding characters and their related issues?

1.1 Presentation Style of Character Strings

When several coded characters are assembled and aligned, they form a sentence, paragraph, clause, etc., according to their semantics. Such sets of coded characters (sometimes referred to as logical elements) make up an electronic document.

These assemblies of coded characters are rendered and presented with particular presentation styles so that readers can readily understand their semantics.

Here I wish to emphasize the importance of the presentation styles of character strings in presenting electronic documents: the styles carry meanings of their own and have to be preserved when the electronic documents are interchanged.

1.2 Study of Presentation Style

Presentation styles were developed in conventional printing and typography, based on the cultural background of each country, each territory, and each publishing or newspaper company.

Therefore, when we study presentation styles, we have to take the cultural background as seriously as we did in our study of characters.

When we study presentation styles, we face the problem that there are few authoritative or published references on the presentation styles of each country, each territory, and each publishing or newspaper company. The only authoritative references may be the Oxford rules and the Chicago rules for conventional Western documents.

Besides, we should study presentation styles for web documents, for which appropriate presentation styles have not yet been established.

1.3 Electronic Documents

Before we discuss presentation styles, we should review the structure of electronic documents, in which presentation styles are handled.

Today we have a number of forms of electronic documents, e.g.,

1.4 Works for Electronic Documents

(1) Work Type 1

   -----------                    ------------
   |Creation |------------------->| Reviewed |
   -----------                    ------------
Created documents are interchanged and reviewed on a display or on printed paper. Recipients sometimes edit them.

(2) Work Type 2

   -----------                    -----------
   |Creation |------------------->|Processed|
   -----------                    -----------
Created documents are interchanged and processed without review (e.g., in CAD systems).

(3) Work Type 3

   -----------                    ------------------------
   |Creation |------------------->|Processed and Reviewed|
   -----------                    ------------------------
Created documents are interchanged, processed and reviewed. Processing includes conversion, data merging, linking with databases, etc.

1.5 Required Functionality for Electronic Documents, and [Document Forms]

(1) Work Type 1

(2) Work Type 2

(3) Work Type 3

NOTE: HTML is an application of SGML (XHTML is an application of XML). A style specification has been defined for each HTML element. Therefore, simply structured XML documents are sometimes converted into HTML documents for presentation by using XSLT.
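As an illustration of this kind of conversion, the following Python sketch maps the elements of a simply structured XML document onto HTML elements. Python's standard library has no XSLT processor, so the mapping is written directly with xml.etree.ElementTree rather than as an XSLT style sheet. The element type names (doc, title, clause, heading, para, ...) and the HTML elements chosen for them are hypothetical, and mixed content (text interleaved with child elements) is not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML element types to HTML elements.
# (An assumption for illustration; not defined by any standard.)
TAG_MAP = {
    "title": "h1",
    "author": "address",
    "abstract": "blockquote",
    "heading": "h2",
    "para": "p",
}

def convert(node: ET.Element, parent: ET.Element) -> None:
    """Append the HTML counterpart of `node` (and its subtree) to `parent`."""
    if node.tag in TAG_MAP:
        out = ET.SubElement(parent, TAG_MAP[node.tag])
        out.text = node.text
    else:
        # Unmapped containers such as <clause> become generic <div> elements.
        out = ET.SubElement(parent, "div")
    for child in node:
        convert(child, out)

def to_html(xml_text: str) -> str:
    """Convert a simple XML document into an HTML string."""
    root = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for child in root:
        convert(child, body)
    return ET.tostring(html, encoding="unicode")
```

An XSLT style sheet would express each entry of TAG_MAP as a template rule that matches the element type and emits the corresponding HTML element.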

1.6 Focussing on Work Type 3

Here we focus on electronic documents of work type 3 and discuss presentation style specifications for them.

2. Logical Structures Description

2.0 Logical Structures

The major parts of documents consist of character strings, which are assembled into elements such as title, clause and paragraph according to the semantics of the character strings they contain. The semantics and element structures usually reflect the intention of the document author.

For example, a simple document consists of the following elements:

   -- title --
      -- author --
   -- abstract --
   -- heading of clause 1 --
   ---contents of clause 1 --------------------
   |      -- paragraph 1 --                   |
   |      -- paragraph 2 --                   |
   --------------------------------------------
   -- heading of clause 2 --
   ---contents of clause 2 --------------------
   |      -- paragraph 1 --                   |
   |      -- paragraph 2 --                   |
   --------------------------------------------

2.1 Description of Logical Structures

Logical structures of documents are described with existing markup languages, e.g., SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language), in which element structures are specified by a DTD (Document Type Definition) and actual document instances are marked up with tags for the element types defined in the DTD.

HTML (HyperText Markup Language) is an application of SGML with a fixed DTD. Therefore, HTML can be used to describe the logical structure of a simple document.
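A DTD declares a content model for each element type. For the simple document of 2.0, a hypothetical DTD might declare `<!ELEMENT doc (title, author, abstract, clause+)>` and `<!ELEMENT clause (heading, para+)>`. Python's xml.etree.ElementTree does not validate documents against a DTD, so the sketch below checks these two content models itself by matching each element's sequence of child element types against a regular expression. It illustrates what DTD validation checks; it is not a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, equivalent to:
#   <!ELEMENT doc    (title, author, abstract, clause+)>
#   <!ELEMENT clause (heading, para+)>
CONTENT_MODELS = {
    "doc": r"title author abstract( clause)+",
    "clause": r"heading( para)+",
}

def conforms(node: ET.Element) -> bool:
    """Return True if every element with a declared model matches it."""
    model = CONTENT_MODELS.get(node.tag)
    children = " ".join(child.tag for child in node)
    if model is not None and re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in node)
```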

Examples:

2.2 International Standards for Logical Structure Description

3. Style Specification for Logical Elements

3.0 Processing in Style Specification

A style language applies style properties to logical elements in the following steps:

(1) Identify a logical element in a node tree of an SGML/XML document.

(2) Specify style properties for the element.

(0) Sometimes, before step (1), the node tree is transformed for appropriate rendering.
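The three steps can be sketched in Python over an xml.etree.ElementTree node tree. The step functions, the element type names, and the particular step (0) transformation (prefixing clause headings with their clause numbers) are assumptions chosen for illustration; a real style language performs each step in its own syntax.

```python
import xml.etree.ElementTree as ET

def step0_transform(root: ET.Element) -> ET.Element:
    """Step (0), optional: transform the tree before styling.
    Hypothetical example: number the clause headings."""
    for n, clause in enumerate(root.iter("clause"), start=1):
        heading = clause.find("heading")
        if heading is not None:
            heading.text = f"{n} {heading.text or ''}"
    return root

def step1_identify(root: ET.Element, tag: str) -> list[ET.Element]:
    """Step (1): identify the logical elements of a given type."""
    return list(root.iter(tag))

def step2_specify(elements, properties: dict) -> list[tuple]:
    """Step (2): specify style properties for each identified element."""
    return [(el, dict(properties)) for el in elements]

doc = ET.fromstring("<doc><clause><heading>Scope</heading><para>Text</para></clause>"
                    "<clause><heading>References</heading></clause></doc>")
doc = step0_transform(doc)
styled = step2_specify(step1_identify(doc, "heading"), {"font-weight": "bold"})
for el, props in styled:
    print(el.text, props)
# 1 Scope {'font-weight': 'bold'}
# 2 References {'font-weight': 'bold'}
```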

3.1 Processing by Standardized Style Languages

   Language   step(0)  step(1)  step(2)  Note
   ---------  -------  -------  -------  -------------------------------
   DSSSL 1)   Y        Y        Y        for SGML/XML documents
   XSL 2)     N        Y        Y        for XML documents
   XSLT 3)    Y        N        N        for XML documents
   CSS 4)     N        Y        Y        simple styles for web documents

1)  Document Style Semantics and Specification Language
2)  Extensible Stylesheet Language
3)  XSL Transformations
4)  Cascading Style Sheets

3.2 Processing Model of DSSSL

3.3 Simple Example of Style Specification by CSS

A style sheet is a set of rules.
A rule is written in the form
        Selector {property: value}

where
        Selector identifies an element.
        {property: value} is a declaration, which specifies a property of the element.

For grouping,
        selectors are separated by ","
        declarations are separated by ";"

Implementation (ways to attach a style sheet to a document)
        (1) linking an external style sheet
        (2) embedding a style sheet in the document
        (3) inline style specification
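A rule of this form, including grouped selectors and grouped declarations, can be parsed with a few string splits. The Python sketch below is a simplification assumed for illustration: it ignores comments, @-rules, and quoted strings containing braces or semicolons, all of which a real CSS parser must handle.

```python
def parse_stylesheet(text: str) -> list[tuple[str, dict[str, str]]]:
    """Parse 'Selector {property: value}' rules into (selector, declarations).

    Grouped selectors are split on "," and grouped declarations on ";".
    Sketch only: no comments, @-rules, or quoted strings.
    """
    rules = []
    for chunk in text.split("}"):
        if "{" not in chunk:
            continue  # trailing whitespace after the last rule
        selectors, body = chunk.split("{", 1)
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        # A grouped rule applies the same declarations to every selector.
        for selector in selectors.split(","):
            if selector.strip():
                rules.append((selector.strip(), dict(declarations)))
    return rules
```

For example, `parse_stylesheet("h1, h2 {color: navy; font-style: italic}")` yields one entry for h1 and one for h2, each with both declarations. Values that themselves contain commas, such as `font-family: helvetica, sans-serif`, stay intact because declarations are split only on ";".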

3.4 An Example of Style Specification by CSS

Example description

  body{
      color: black;
      font-family: helvetica, sans-serif;
      background: white;
      margin: 2em
      }
  h3  {
      margin-left: 1em;
      font-size: 95%
      }
  h4  {
      text-align: center;
      font-style: italic
      }
  p   {
      background: yellow
      }

The style sheet is applied to the example text shown in 2.1.
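To see what the style sheet of 3.4 means for an individual element, the Python sketch below resolves the declarations in effect for an element given its ancestor path. It models only CSS inheritance for a few properties that CSS inherits by default (color, font-family, font-style, text-align); it ignores selector specificity, the browser's default style sheet, and computed values such as the percentage font-size, so it is an illustration under those assumptions rather than a CSS engine.

```python
# The style sheet of 3.4, transcribed as selector -> declarations.
STYLE_SHEET = {
    "body": {"color": "black", "font-family": "helvetica, sans-serif",
             "background": "white", "margin": "2em"},
    "h3": {"margin-left": "1em", "font-size": "95%"},
    "h4": {"text-align": "center", "font-style": "italic"},
    "p": {"background": "yellow"},
}

# Some properties that CSS inherits by default. font-size is also inherited,
# but as a computed value, which this sketch does not calculate.
INHERITED = {"color", "font-family", "font-style", "text-align"}

def resolve(path: list[str]) -> dict[str, str]:
    """Return the declarations in effect for the last element of `path`,
    e.g. ["body", "p"] for a paragraph inside the body."""
    style: dict[str, str] = {}
    for tag in path:
        # Only inherited properties pass from a parent to its child;
        # background and margin stay on the element that declares them.
        style = {k: v for k, v in style.items() if k in INHERITED}
        style.update(STYLE_SHEET.get(tag, {}))
    return style
```

For a paragraph, `resolve(["body", "p"])` keeps the inherited color and font family from body and adds its own yellow background, while body's margin does not carry over.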

3.5 International Standards for Style Specification

4. A Typical Set of Presentation Styles

4.0 Today's Status for Presentation Style

Description languages for logical structures and for style specifications have been internationally approved and actually implemented.

Is this status sufficient for everybody to describe the presentation styles of their own documents?

No.


As noted in 1.2, there are not enough references on presentation styles.

We should try to draft reference documents on presentation styles that take into account the cultural background of our countries and our document environments.

Style specifications are too complicated for everybody to write. (The example in 3.4 is an extremely simple one.)

One solution could be a library of style specifications.


Here I show a trial carried out in Japan: JIS/TR X 0010, DSSSL library for complex compositions. Its English version has been submitted to JTC1 as a DTR (Draft Technical Report).

4.1 ISO/IEC DTR: DSSSL library for complex compositions

[1 Scope]

This Technical Report provides a DSSSL library that can specify styles for documents described in SGML or XML. The library makes it feasible to write DSSSL specifications for those documents without any particular knowledge of DSSSL or of particular composition rules.

NOTE: When we started to develop the TR, no draft of XSL had yet been published. CSS cannot satisfy the user requirements for the TR.


[2 References] and [3 Definitions] are given in accordance with the ISO/IEC Directives.

[4 Formatting objects and properties]

Major and comparatively complicated formatting objects and formatting properties (i.e. presentation style) employed in usual publication (or usual Japanese publication) are clarified and defined.

They are:

[DSSSL Library]

The DSSSL library makes it feasible to describe DSSSL specifications required for complicated compositions.

The configuration of the DSSSL library is shown in [clause 5], and the contents of the library files are listed in:

[Processing flow]

The configuration and processing flow of the DSSSL library are shown in the following figure. To use the DSSSL library, a Scheme processor and a DSSSL processor must be available.


(1) Simple parameter data

The simple parameter data are provided to the full parameter generator in the form of an association list. They are default values for a typical composition style; if necessary, further parameter data can be added to them.

(2) Full parameter generator

The full parameter generator runs on a Scheme processor and creates the full parameters from the simple parameter data.

(3) Function set

The function set includes the functions that generate DSSSL flow objects, together with their support functions, for use in the construction rules of DSSSL specifications. These functions refer to the full parameters.

(4) Page model set

The page models are DSSSL descriptions of page styles. These page models also refer to the full parameters.

(5) Flow object construction rules

The construction rules provide actual DSSSL specifications for a specific DTD, using the full parameters, function set and page model set.
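The flow from (1) to (5) can be sketched in Python, with dictionaries standing in for DSSSL flow objects and page models. Every parameter name, value, element type, and function below is hypothetical and is not taken from JIS/TR X 0010; the sketch only shows how simple parameter data are expanded into full parameters that the function set, page model set and construction rules then share. Lookups follow the convention of Scheme's assq (the first matching pair wins), so a parameter added in front of the association list overrides the default.

```python
# Hypothetical sketch of the processing flow; all names are assumptions.

# (1) Simple parameter data: an association list of default values.
SIMPLE_PARAMETERS = [
    ("font-size", 10.0),     # pt
    ("line-gap", 5.0),       # pt of space between lines
    ("chars-per-line", 40),  # full-width characters per line
    ("lines-per-page", 30),
]

def assq(key, alist):
    """Scheme-style lookup: the first pair whose key matches wins."""
    for k, v in alist:
        if k == key:
            return v
    raise KeyError(key)

# (2) Full parameter generator: derive the full parameters.
def generate_full_parameters(alist):
    size, gap = assq("font-size", alist), assq("line-gap", alist)
    lines = assq("lines-per-page", alist)
    return {
        "font-size": size,
        "line-pitch": size + gap,  # baseline-to-baseline distance
        # A full-width character occupies a square em.
        "text-width": assq("chars-per-line", alist) * size,
        # n lines: n character heights plus (n - 1) gaps.
        "text-height": lines * size + (lines - 1) * gap,
    }

# (3) Function set: flow object generating functions using the full parameters.
def make_paragraph(full, text):
    return {"flow-object": "paragraph", "font-size": full["font-size"],
            "line-spacing": full["line-pitch"], "content": text}

def make_heading(full, text):
    return {"flow-object": "paragraph", "font-size": full["font-size"] + 2,
            "font-weight": "bold", "content": text}

# (4) Page model set: a page style described by the full parameters.
def body_page_model(full):
    return {"page-model": "body", "width": full["text-width"],
            "height": full["text-height"]}

# (5) Construction rules for a hypothetical DTD: element type -> function.
CONSTRUCTION_RULES = {"heading": make_heading, "para": make_paragraph}

def format_document(elements, alist=SIMPLE_PARAMETERS):
    """elements: (element type, text) pairs in document order."""
    full = generate_full_parameters(alist)
    flow = [CONSTRUCTION_RULES[tag](full, text) for tag, text in elements]
    return {"page": body_page_model(full), "flow": flow}
```

For example, `format_document([("para", "text")], [("font-size", 9.0)] + SIMPLE_PARAMETERS)` yields a 360 pt text width instead of the default 400 pt, while every other parameter keeps its default value.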

5. Requirements for Extension to the Presentation Styles Specification

5.0 Extension to the Scope

The scope [major and comparatively complicated formatting objects and formatting properties (i.e. presentation style) employed in usual publication (or usual Japanese publication)] of the TR cannot cover all the presentation styles.

For example, it does not support the presentation styles for

5.1 XSL and XSLT Libraries

After the Proposed Recommendation (PR) and Candidate Recommendation (CR) of XSL were published, a number of implementations and implementation plans were announced.

In response to these web technologies, XSL and XSLT libraries will be required.

6. Proposed Work Items

In conclusion, I propose several new work items on the presentation styles required for multilingual electronic document interchange that preserves styles.

The final target could be to draft and submit an ISO/IEC DTR entitled "Style Specification Library for Multilingual Compositions".