3. Model for Multilingual Document Distribution

3.1 Document model

3.1.1 Three forms of document

Multilingual document distribution, like any other document distribution, must take three forms of electronic document into account:
(1) Revisable logical document
(2) Revisable document with formatting information
(3) Formatted form document
In the distributed creation of documents in particular, the interchange of form (1) and form (2) documents is indispensable. HTML or SGML/XML is sufficient to meet the requirements for interchanging form (1) documents.
NOTE 1: HTML - Hypertext Markup Language
NOTE 2: SGML - Standard Generalized Markup Language
NOTE 3: XML - Extensible Markup Language
For HTML processing systems that include predefined formatting specifications, HTML can support the interchange of form (2) documents to some extent. When more complicated formatting specifications are required, XSL or DSSSL is expected to be used.
NOTE 4: XSL - Extensible Stylesheet Language
NOTE 5: DSSSL - Document Style Semantics and Specification Language
Form (3) documents can be described in PDF or a PDL, implementations of which are already available. PDF documents can be interchanged with their data compressed.
NOTE 6: PDF - Portable Document Format
NOTE 7: PDL - Page Description Language

3.1.2 Multilingual consideration

Most of today's browsers offer "multilingual" support. In practice, however, this support amounts to selecting one appropriate language from a menu, which is better described as multi-localization than as multilingual support. Such browsers cannot render a document that contains parts in different languages within a page or within the document itself.

Documents interchanged in the Internet environment are often required to be a multilingual mixture, i.e., written in multiple languages within a paragraph, a page or a document. To distinguish such documents from merely localized ones, we call them real multilingual documents. A typical example is the participant list of an online meeting, in which each participant should be listed in his or her own language and script.
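
The participant-list example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names `participants` and `participant`, the sample names and the language tags are hypothetical and do not come from any particular meeting system. The sketch tags each entry with the standard `xml:lang` attribute and serializes the whole list in a single encoding (UTF-8) that covers all the scripts involved.

```python
import xml.etree.ElementTree as ET

# Qualified name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical participants: (language tag, name in the native script).
PARTICIPANTS = [
    ("en", "Alice Smith"),
    ("ja", "山田太郎"),
    ("ko", "김철수"),
    ("el", "Νίκος Παππάς"),
]


def build_participant_list(participants):
    """Build a logical (form (1)) document with one element per participant."""
    root = ET.Element("participants")
    for lang, name in participants:
        item = ET.SubElement(root, "participant", {XML_LANG: lang})
        item.text = name
    return root


def serialize(root):
    """Serialize to UTF-8 bytes so that every script fits in one document."""
    return ET.tostring(root, encoding="utf-8")


document_bytes = serialize(build_participant_list(PARTICIPANTS))
```

The markup records which language each entry is in, but says nothing about fonts or layout. Those remain the job of the rendering side.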

Real multilingual documents must be rendered and presented with appropriate multilingual formatting. This means that they should be created and processed with the following considerations in mind:

(1) coded character set including multilingual repertoires
(2) font set required for multilingual rendering
(3) style specification for multilingual rendering
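
One way to see how considerations (1) and (2) interact is to split a mixed-language string into runs of the same script, since a renderer typically has to choose a font per run. The sketch below is a rough approximation only: it takes the first word of each character's Unicode name as a script label, which is a simplifying assumption and not a standard script-detection algorithm. The function names are hypothetical.

```python
import unicodedata


def script_label(ch):
    """Return a crude script label, or None for whitespace."""
    if ch.isspace():
        return None
    # First word of the Unicode name, e.g. "LATIN", "CJK", "HANGUL".
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_runs(text):
    """Group consecutive characters that share a script label.

    Whitespace is attached to the run that precedes it.
    """
    runs = []  # each entry is [label, substring]
    for ch in text:
        label = script_label(ch)
        if runs and (label is None or label == runs[-1][0]):
            runs[-1][1] += ch
        else:
            runs.append([label or "NEUTRAL", ch])
    return [(label, chunk) for label, chunk in runs]


runs = script_runs("Tokyo 東京 서울")
```

For the sample string this yields three runs, labelled LATIN, CJK and HANGUL. A real implementation would also need to handle punctuation, digits and combining marks, which this sketch labels naively.
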
For implementation, the following issues should be discussed as future topics:

3.2 Overview of Japanese and Asian Fonts

3.2.1 Characters Jungle

The major East Asian languages (Chinese, Japanese and Korean) use two-byte code systems for their characters. This is inevitable: a single-byte code system like ASCII, which is more than enough to handle the alphabets derived from Greco-Roman scripts, is hopelessly inadequate for the thousands of characters commonly used in East Asian languages.
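
The size argument can be checked with a small Python sketch. It assumes the legacy codecs shipped with Python's standard library (`shift_jis`, `euc_jp`, `gb2312`, `big5`, `euc_kr`) as stand-ins for the two-byte code systems discussed here, and uses the character 山, which is present in all of them, as the sample. The helper names are hypothetical.

```python
# 7-bit ASCII offers 128 code points; a 94 x 94 two-byte grid offers 8836.
ASCII_CODE_POINTS = 2 ** 7
TWO_BYTE_GRID_CODE_POINTS = 94 * 94

# Standard-library codecs used as examples of two-byte code systems.
LEGACY_CODECS = {
    "Japanese (Shift_JIS)": "shift_jis",
    "Japanese (EUC-JP)": "euc_jp",
    "Chinese (GB2312)": "gb2312",
    "Chinese (Big5)": "big5",
    "Korean (EUC-KR)": "euc_kr",
}


def encoded_lengths(ch):
    """Return the number of bytes each legacy codec needs for one character."""
    return {label: len(ch.encode(codec)) for label, codec in LEGACY_CODECS.items()}


def fits_in_ascii(text):
    """Report whether the text can be represented in ASCII at all."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


lengths = encoded_lengths("山")
```

Each codec needs two bytes for 山, while `fits_in_ascii("山")` is false. The same character also receives a different byte value in each code system, so the byte pair alone does not identify it.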

The problem is basically caused by the multitude of "Han-dynasty Characters", t