Structuring XML Documents

ISBN: 0-13-642299-3

Author: David Megginson

Series: Charles Goldfarb Series

David Megginson is a senior architect for Microstar Software Ltd. of Ontario, Canada. Microstar is a leading integrator of large SGML and XML document management, publishing, and production systems. Microstar is also well known for its Near & Far Designer technology, which is the leading CASE tool for the design of SGML/XML document type definitions. Megginson has seven years of experience in structured document design, first as an academic and then as a professional consultant.

Pages: 420 + CD-ROM

Intended Audience:

Structuring XML Documents is aimed at document designers: those responsible for developing the document design. It is not intended for authors or for software developers. Although XML is prominent in the title and on the cover, the book is not intended for XML beginners who wish to learn more about XML. In fact, the design principles presented in this book can apply to XML, SGML, or any other syntax. It is the design process itself that is the focus of the book.

In specifying the intended audience of Structuring XML Documents, Megginson makes several important assumptions:

  • The reader knows SGML / XML syntax
  • The reader knows how to write a basic DTD
  • The reader already has identified the users as well as the project requirements
  • Book-oriented DTDs are the focus of the book (such as tech manuals or legislative documents)
  • Database-oriented DTDs are not specifically within the scope of the text

If these assumptions of Structuring XML Documents do not match your profile, another text will be more appropriate for you. On the other hand, if you are responsible for creating the document design for your project, whether it is a structured authoring project, an SGML project, or an XML project, then this text will provide an excellent starting point for you.

Summary of the Book:

Structuring XML Documents is divided into 4 parts and has back matter which includes an extensive "Index of Element Types and Attributes" as well as a CD-ROM.

The first part of Structuring XML Documents provides the general background required to master the advanced topics presented in the later parts of the book. It includes a review of DTD syntax. Again, this is a review and not a beginning text. If you are not already familiar with SGML, it will not provide adequate background. However, if you are just a bit "rusty," this portion of Structuring XML Documents provides an excellent review. This part of the book also introduces five industry-specific DTDs, which serve as a basis for the discussion of document design throughout the remainder of the book. The DTDs highlighted in this text include:

  • ISO 12083: (journals and books)
  • DocBook: (software documentation)
  • Text Encoding Initiative (TEI): (research-oriented materials)
  • MIL-STD-38784: (technical documentation)
  • Hypertext Markup Language (HTML): (Web documents)

The second part of the book guides readers to review their goals and requirements against the five selected model DTDs. It is the author's philosophy (and mine as well) that if an industry standard DTD can be used directly or adapted, the project will move along much more quickly and surely than if a new DTD were developed from the ground up. It is much more costly for an organization to develop a unique DTD. It also means that the organization cannot take advantage of the lessons learned by those SGML pioneers who worked many long years to develop the model DTDs featured in this text. It is true that meeting an organization's functional requirements in an SGML/XML design is of utmost importance. But I agree with Megginson that many good foundations exist and that most goals can be met by working from an existing model.

Part three of the book focuses on advanced issues in DTD maintenance and design. In particular, issues such as repetition and omissibility of elements are highlighted, along with interchange considerations for document fragments. #CURRENT and SUBDOC are just two of the advanced features discussed in these chapters. Perhaps the most useful portion of part three is the discussion of how to customize each of the five model DTDs. Since customizing an industry standard DTD is the preferred strategy in most instances, tips on how to customize each of the prominent models will be very useful to document architects.
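For readers who want a concrete picture of what "repetition and omissibility" mean in practice: in DTD content models they are expressed with the occurrence indicators ? (optional), * (zero or more), and + (one or more). The short Python sketch below is my own illustration, not code from the book. It handles only flat, comma-separated sequences, and the element names (title, subtitle, para, note) are hypothetical.

```python
import re


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Turn a flat sequence model such as 'title, subtitle?, para+'
    into a regex over space-terminated child element names.

    Assumption: no nested groups and no '|' alternatives are supported.
    """
    parts = []
    for token in model.split(","):
        token = token.strip()
        # A trailing ?, * or + is the DTD occurrence indicator.
        indicator = token[-1] if token[-1] in "?*+" else ""
        name = token.rstrip("?*+")
        parts.append(f"(?:{re.escape(name)} ){indicator}")
    return re.compile("".join(parts))


def is_valid(children, model):
    """Return True if the list of child names satisfies the model."""
    sequence = "".join(child + " " for child in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


# Hypothetical model: required title, omissible subtitle,
# repeatable para (at least one), repeatable and omissible note.
MODEL = "title, subtitle?, para+, note*"

print(is_valid(["title", "para"], MODEL))
print(is_valid(["title", "subtitle", "para", "para", "note"], MODEL))
print(is_valid(["title"], MODEL))          # para is required
print(is_valid(["para", "title"], MODEL))  # wrong order
```

The first two checks succeed and the last two fail. A designer who loosens para+ to para* makes the paragraph omissible, which is exactly the kind of trade-off between flexibility and predictability that this part of the book examines.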

Part four of Structuring XML Documents shows the reader how architectural forms (from the HyTime standard) can be used in DTD design. Architectural forms are introduced first, followed by basic examples and then more advanced uses. This part of the book provides some mechanisms for dealing with design situations which may not be resolved in any other way. My concern here is the degree of available tool support for architectural forms. Since designs must be implemented, the elegance gained by using advanced DTD design strategies may suffer when faced with the realities of system implementation.
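To give a rough sense of the idea behind architectural forms: elements in a local DTD carry an attribute that names the form they correspond to in a shared base architecture, so software can process documents by form rather than by local element name. The sketch below is my own simplified illustration and not an example from the book. The attribute name docarch, the form names, and the element names are all hypothetical, and the attribute is written out in the instance here, whereas a real DTD would typically declare it as a fixed attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document: local element names differ, but each one
# declares which form of the (imaginary) base architecture it maps to.
DOCUMENT = """
<manual>
  <warning docarch="para">Disconnect power first.</warning>
  <step docarch="para">Remove the cover.</step>
  <partno docarch="phrase">A-1138</partno>
  <internal-memo>Not part of the architecture.</internal-memo>
</manual>
"""


def group_by_form(xml_text, arch_attr="docarch"):
    """Map each architectural form name to the local element names using it."""
    root = ET.fromstring(xml_text)
    forms = defaultdict(list)
    for element in root.iter():
        form = element.get(arch_attr)
        if form is not None:  # elements without the attribute are ignored
            forms[form].append(element.tag)
    return dict(forms)


print(group_by_form(DOCUMENT))
```

A tool that understands only the base forms can treat both warning and step as paragraphs. The sketch also hints at my concern: every tool in the chain has to perform this mapping itself.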

The back matter of Structuring XML Documents is a complete index of all elements and attributes from the 5 model DTDs. I suppose the idea here is that you can pick standard tag names from any of the standard DTDs to construct your own DTD. While interesting, I question how valid a mix-and-match approach really can be. In my experience designers want to begin with a standard model, prune the model, and then add tags with specific meaning for their own environment.

The CD-ROM includes a number of XML parsers. It is important to note that most of the text in Structuring XML Documents appears to focus on SGML. Discussions of advanced topics such as #CURRENT, which is not supported in XML, reinforce my impression that this is primarily an SGML text. Very little mention of XML can be found in the book. I would have expected, for example, some chapters presenting a strategy for converting each of the model SGML DTDs to an XML DTD. I suppose the title is XML rather than SGML because of the inclusion of XML parsers on the CD-ROM and because XML is currently a "hot" topic.
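The point about #CURRENT can be checked with any XML parser. The sketch below is my own and not from the book or its CD-ROM. It uses the parser bundled with Python's standard library and a hypothetical doc element with a status attribute, and feeds it the same tiny document twice: once with the XML default #IMPLIED and once with the SGML-only #CURRENT.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical document with an internal DTD subset; only the
# attribute default keyword changes between the two runs.
TEMPLATE = """<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ATTLIST doc status CDATA {default}>
]>
<doc status="draft">text</doc>"""

print(parses_as_xml(TEMPLATE.format(default="#IMPLIED")))
print(parses_as_xml(TEMPLATE.format(default="#CURRENT")))
```

With the standard expat-based parser I would expect the first document to be accepted and the second to be rejected as a syntax error, since XML allows only #REQUIRED, #IMPLIED, #FIXED, or a literal default value in an attribute declaration.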

Final Recommendation:

Structuring XML Documents is an excellent text for beginning document designers. Although the book focuses on SGML and does not have significant XML-specific content, it will clearly be useful to SGML and XML designers alike. I particularly agree with and recommend Megginson's development strategy: begin with a standard model DTD and customize it for individual use. I also believe it is valuable to have the most commonly used DTD models in a single text where the designer can easily compare and contrast them. And the tips for how to customize each model are invaluable. I will certainly try the XML parsers included with the book as I prepare upcoming issues of XML Files for the GCA.