ringmayr

 

Conversion of Large Text Documents into DocBook



DocBook documents can be relatively simple and user-friendly--provided they were authored in DocBook from scratch. However, transforming a larger document that was authored by "traditional," consumer-grade text-processing means (e.g., MS Word, OpenOffice, WordPerfect) is a different matter altogether. Endeavors of this kind all too often create huge problems and are frequently abandoned because of the overwhelming complexities they present.

What is frequently underestimated is the amount of time, effort, and expertise required. Converting, say, a book manuscript by a "DocBook-unaware" author from MS Word into DocBook has almost nothing to do with what is commonly meant by "format conversion" and cannot be compared with converting, for example, an MS Word document into another text-processor format.

Creating DocBook from "traditional text"--especially where larger manuscripts are concerned--is not just a technical process; it demands at least as much editorial skill as technical skill. Only a combination of the two, backed by substantial experience, yields a usable and meaningful product.

Areas typically requiring special attention in the process are:

Semantic Markup

DocBook documents require "functional" instead of "presentational" markup. Strictly speaking, an MS Word document only contains information about the visual presentation of its content in a single, consumer-grade environment (PC, printer). DocBook, however, marks every content element semantically, which allows different presentational designs to be tailored to the element's function in a given environment. Multifunctionality in all sorts of systems, from consumer to professional level, is one of the main characteristics of DocBook. However, functional markup cannot per se be derived from presentational markup, and it certainly cannot be derived in an automated fashion.
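The ambiguity can be made concrete with a small sketch. The element names below are real DocBook elements, but the mapping table and the function names are hypothetical and chosen only for illustration; a real manuscript will have its own, usually larger, set of candidates.

```python
# Hypothetical illustration: one visual style, several plausible DocBook elements.
# The element names exist in DocBook; the mapping itself is an assumption.
CANDIDATES = {
    "italic": ["emphasis", "citetitle", "foreignphrase", "firstterm"],
    "bold": ["emphasis", "guilabel", "command"],
    "monospace": ["literal", "filename", "command", "code"],
}


def semantic_candidates(style):
    """Return the DocBook elements a presentational style might stand for."""
    return CANDIDATES.get(style, [])


def is_ambiguous(style):
    """A style is ambiguous when more than one element is plausible."""
    return len(semantic_candidates(style)) > 1


if __name__ == "__main__":
    for style in CANDIDATES:
        print(style, "->", ", ".join(semantic_candidates(style)))
```

Because every style in this table maps to more than one element, a program cannot choose among them without knowing what the author meant; that decision is an editorial one.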

Structural Model

DocBook documents are based on an entirely different structural model from text-processor output. Text processors treat a text as a linear flow of characters, and structural elements (e.g., headings, list items, emphasized text) are marked only by visual cues (larger text size, color, etc.). DocBook, by contrast, uses an elaborate "container" model that keeps together the elements that belong together. This hierarchical container paradigm is governed by a sizeable and complex set of rules.

The structural paradigm of DocBook is not easily handled by automated routines, at least not without a rigorous and thorough editorial preparation of the manuscript.
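As a minimal sketch of the difference between the two models, the following code nests a flat sequence of headings and paragraphs into DocBook-style section containers. It assumes the editorial preparation has already been done, i.e. every heading carries a reliable level of 1 or higher. The function name and the tuple layout of the input are hypothetical, and the result is not validated against the DocBook schema.

```python
import xml.etree.ElementTree as ET


def build_docbook(flat_items, title="Untitled"):
    """Nest a flat list of (kind, level, text) items into section containers.

    kind is "heading" or "para"; level is only meaningful for headings.
    Assumption: heading levels were cleaned up editorially beforehand.
    """
    root = ET.Element("article")
    ET.SubElement(root, "title").text = title
    # Stack of (level, element); the article itself acts as level 0.
    stack = [(0, root)]
    for kind, level, text in flat_items:
        if kind == "heading":
            # Close every open section at the same or a deeper level.
            while len(stack) > 1 and stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # A paragraph belongs to the innermost open container.
            ET.SubElement(stack[-1][1], "para").text = text
    return root


if __name__ == "__main__":
    items = [
        ("heading", 1, "Introduction"),
        ("para", 0, "Opening text."),
        ("heading", 2, "Background"),
        ("para", 0, "Details."),
        ("heading", 1, "Conclusion"),
        ("para", 0, "Closing text."),
    ]
    print(ET.tostring(build_docbook(items, title="Sample"), encoding="unicode"))
```

The nesting step itself is short. The hard part is the input: in a real manuscript, heading levels are often expressed only through font size or spacing, and applied inconsistently, so the clean list this sketch starts from has to be produced by hand.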

Author Deviations

Every manuscript is unique. Even if authored in accordance with detailed style manuals, nearly every manuscript has its own little peculiarities: creative or involuntary interpretations of the standard. Taken together, these many small deviations from a seemingly clear set of structural rules frequently present a huge potential for problems in the process and cannot be solved by purely technical means.

We have years of experience in solving these kinds of complex problems. We competently convert your manuscripts into valid, complete, and multi-usable DocBook.

Please inquire directly with us regarding our specialized services. You can use our contact form, or call us directly at +49 03 32 96 09 77 0. We love to solve your problems!

 
Last Update:
30 09 2008