Lou Burnard

25 April 1991

# 1. Program Overview

This program has the very simple and pragmatic objective of taking documents drafted in something like SGML and formatting them with LaTeX. Early versions of the program were written to work with the TEIDOC0 DTD developed for P1 by myself and Michael; it has now been generalised somewhat to accept any tagset, but no very principled approach has been taken to supporting every possible feature of SGML. In particular, the program does no validation but assumes the document has already been checked; only a limited number of attributes are acted on; and there are a couple of dirty tricks used -- for example, comments are assumed to be closed on the line in which they are opened (they are simply echoed to the terminal rather than being translated, as I suppose they should be, to LaTeX comments).

The program is driven by a 'dictionary file' which defines the mapping to be applied between SGML tags and LaTeX tags. It does a little more than simply translate the tags, however: an action code may be defined for any tag to specify an additional action, as follows:

• ! -- output the SGML GI as well as the content
• * -- suppress the SGML tag but output the content
• - -- suppress the whole of the element (tag and content)
• : -- assume the presence of an end-tag at the end of the input line (this is a particularly unwholesome feature of the program which I keep meaning to remove)
The program in its current incarnation does not support the - action. Not yet having found any use for it, I never got round to implementing it.
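
The effect of the action codes can be pictured with a short sketch. It is written in Python purely for illustration (the program itself is SPITBOL), and the function name `apply_action` and its return convention are assumptions, not part of the program. It follows the dictionary format, in which a bare `*` as the replacement suppresses the tag, a leading `:` marks an implied end-tag at the end of the line, and a leading `!` echoes the tag name; the `-` action is left out, as in the program.

```python
from typing import Tuple


def apply_action(tag: str, entry: str, rest_of_line: str) -> Tuple[str, str]:
    """Return (latex_output, remaining_input) for one tag.

    `entry` is the dictionary replacement for `tag`; an underscore in the
    dictionary stands for a space in the output.
    """
    entry = entry.replace("_", " ")
    if entry == "*":
        # suppress the tag itself but keep the content that follows it
        return "", rest_of_line
    if entry.startswith(":"):
        # implied end-tag at end of line: the rest of the line becomes the
        # argument of the LaTeX command and is closed with a brace
        return entry[1:] + rest_of_line + "}", ""
    if entry.startswith("!"):
        # output the tag name (GI) ahead of the content
        return "\\bf " + tag + ": \\rm ", rest_of_line
    # plain translation of the tag
    return entry, rest_of_line


print(apply_action("div1", ":\\section{", "Program Overview"))
print(apply_action("hi", "\\bf_", "important"))
```

With the entry `div1::\section{`, for example, the whole remainder of the input line is taken as the section title.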

The dictionary file also contains substitution strings for any entity references in the document.
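
As a rough illustration of the file layout, here is a Python sketch of a loader (the program's own loader is the SPITBOL routine `new_dic`; the name `load_dictionary` and the use of exceptions are assumptions made for this example). Tag entries take the form `name:replacement`, a line reading `*ENTITIES*` switches to entity entries, and those take the form `name=replacement`.

```python
from typing import Dict, Iterable, Tuple


def load_dictionary(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split dictionary lines into a tag table and an entity table."""
    tags: Dict[str, str] = {}
    entities: Dict[str, str] = {}
    in_entities = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_entities:
            # a tag entry is split at its first colon, so 'div1::\section{'
            # yields the name 'div1' and the replacement ':\section{'
            name, sep, value = line.partition(":")
            if sep:
                tags[name] = value
            elif line.startswith("*ENTITIES*"):
                in_entities = True
            else:
                raise ValueError("unexpected dictionary line: " + line)
        else:
            name, sep, value = line.partition("=")
            if not sep:
                raise ValueError("unexpected entity line: " + line)
            entities[name] = value
    return tags, entities


sample = ["div1::\\section{", "head:*", "*ENTITIES*", "amp=\\&", "dash=---"]
tags, entities = load_dictionary(sample)
print(tags)
print(entities)
```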

The program does its best to render 'safe' any of the (numerous) characters which LaTeX finds upsetting; this is done in a separate scan of the input strings carried out after tags have been identified. It also tries, not very successfully, to put spaces in where LaTeX will not subsequently remove them, e.g. between LaTeX tags and following content.
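
The character handling can be sketched as follows, again in Python and only as an approximation. The two character sets are taken from the `any(...)` lists in the `specials` routine of the source; entity references, which that routine also handles, are deliberately ignored here, and the name `make_safe` is an assumption.

```python
# characters that LaTeX accepts once prefixed with a backslash
BACKSLASH_ESCAPED = set("$&%#{}_")
# characters that are wrapped in \verb+...+ instead
VERB_WRAPPED = set("\\^<>|")


def make_safe(text: str) -> str:
    """Return `text` with LaTeX special characters made harmless."""
    out = []
    for ch in text:
        if ch in BACKSLASH_ESCAPED:
            out.append("\\" + ch)
        elif ch in VERB_WRAPPED:
            out.append("\\verb+" + ch + "+")
        else:
            out.append(ch)
    return "".join(out)


print(make_safe("50% of $x"))
print(make_safe("a|b"))
```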

If tags or entities are found in the document that do not exist in the dictionary, the user is given the option to define them, and to update the dictionary.

Three kinds of tag have a special effect: <xmp>, which is translated into \begin{verbatim}, also switches off suppression of LaTeX special characters (though not if they appear on the same input record as the <xmp> itself -- I told you this was a simple-minded program); <eg>, which is translated into \begin{verse}, also causes the LaTeX hard-line tag \\ to be added to the end of each input line; <div> and <head> tags are the really insanitary ones: they are translated into the appropriate LaTeX \section, \subsection etc. and any following <head> tag is simply removed. Any uninterrupted sequence of digits, stops and spaces at the beginning of the heading text is silently removed (this is to cater for Michael's habit of explicitly numbering titles etc. to make the SGML eye-readable). The tag <h1> is treated as a synonym for <div1><head>, and likewise for <h2> etc.
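
The heading treatment is perhaps easiest to see in a small example. The Python sketch below is an illustration only: the function names are invented, the level-to-command table is taken from the `div1` to `div3` entries of the example dictionary, and only a single level digit is recognised.

```python
import re
from typing import Optional

SECTION_COMMANDS = {1: "\\section", 2: "\\subsection", 3: "\\subsubsection"}


def heading_level(tag: str) -> Optional[int]:
    """Return 1, 2 or 3 for div1..div3 and h1..h3 (or H1..H3), else None."""
    m = re.fullmatch(r"(?:div|[hH])([123])", tag)
    return int(m.group(1)) if m else None


def strip_numbering(title: str) -> str:
    """Drop any leading run of digits, stops and spaces from a title."""
    return title.lstrip(" 0123456789.")


def latex_heading(tag: str, title: str) -> Optional[str]:
    level = heading_level(tag)
    if level is None:
        return None
    return SECTION_COMMANDS[level] + "{" + strip_numbering(title) + "}"


print(latex_heading("div1", "1. Program Overview"))
print(latex_heading("h2", "2.1 Details"))
```

Note that stripping is unconditional, so a title that genuinely begins with a number loses it.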

The only attribute values acted on are ID and TARGET: these are used to generate the LaTeX equivalent 'labels', so that cross-references at least work.
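
A hedged Python sketch of this attribute handling follows. The regular expression approximates the `attr_pat` pattern in the source (a value is either single-quoted or runs to the next space); folding attribute names to lower case is an assumption of the sketch, as are the function names. In the program the `\ref{` text actually comes from the dictionary entry for the referring tag; it is written out here to keep the example self-contained.

```python
import re
from typing import List, Tuple

# name = 'quoted value'   or   name = barevalue
ATTR_RE = re.compile(r"\s*([^=\s]+)\s*=\s*(?:'([^']*)'|(\S+))")


def parse_attributes(attrs: str) -> List[Tuple[str, str]]:
    """Break the attribute part of a tag into (name, value) pairs."""
    pairs = []
    for m in ATTR_RE.finditer(attrs):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        pairs.append((m.group(1).lower(), value))
    return pairs


def cross_reference_markup(attrs: str) -> List[str]:
    """Return the LaTeX label/reference text implied by ID and TARGET."""
    out = []
    for name, value in parse_attributes(attrs):
        if name == "id":
            out.append("\\label{" + value + "}")
        elif name == "target":
            out.append("\\ref{" + value + "}")
        # every other attribute is ignored
    return out


print(cross_reference_markup("id=overview type='x y'"))
print(cross_reference_markup("target='overview'"))
```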

The source code for the program follows, in Appendix 1. An example dictionary file, corresponding with the DTD used for the present document, is given in Appendix 2.

# 2. Discussion

Is this program reversible? Clearly not, since there are many LaTeX tags for which no descriptive equivalent exists, and even more which might be used ambiguously in a number of situations. As a trivial example -- what is one to do with the sequence \it some string \rm ? It might just be emphasis (although LaTeX has an \em tag, it doesn't require you to use it for all occasions when emphasis is intended, only those where you might want to change the typeface of the surrounding body text) or it might be a citation or .

I found the LaTeX conventions for dealing with spaces following tags particularly aggravating. These are even worse than the SGML rules about record separators, believe it or not. Otherwise, writing this program and getting it to produce reasonably acceptable output was very easy indeed.

# 3. The program source

*  GMLTOTEX
*  runs with any MACRO spitbol implementation
*  check the NEWFILE routine for system dependent filenames
*  Lou Burnard, OUCS, March 1991
&trim = &anchor = 1
define('tagcheck(tag)')
define('entcheck(str)')
define('new_dic(s)t1,t2')
define('new_file()')
define('yn(s)')
define('specials(s)s1,c')
osp = span(' ') | null
ups = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
los =  'abcdefghijklmnopqrstuvwxyz'
* pattern to recognise and remove comments
comment_pat = ('.' | '<!--') notany('.') rem . terminal
* pattern to find tags
p = break('<') . s1 '<'   break('>') . tag '>'
* pattern to break attributes out of a tag
attr_pat = osp
.         break('=') . a_name '=' osp
.    (  ("'" break("'") . a_value "'")
.     | (break(' ') . a_value span(' '))
.     | rem . a_value)
* -- open next data file and next output file
*
file new_file()                                      :f(close)
*
* -- output any LaTeX initialisation stuff
*
out = '\documentstyle[11pt,A4]{article}'
out = '\frenchspacing'
out = '\def\tag#1{{\boldmath\bf #1}}'
out = '\def\smartitalicx{\ifx\next,\else'
.       '\ifx\next-\else\ifx\next.\else\/\fi\fi\fi}'
out = '\def\smartitalic#1{{\it'
.         '#1}\futurelet\next\smartitalicx}'
*
* -- read in next line from data file
*
read      str = in                                   :f(eofile)
str = specials(str)
str = differ(verse) str '\\'
*
* -- get next tag from input string
*
check     str p =                                    :f(write)
new_str =  new_str s1
attrs =
tag break(' ') . tag span(' ')  rem . attrs
* kludge city -- bad guys dont look
leq(tag,'xmp')                            :s(do_verbatim)
verse = leq(tag,'eg') 1
verse = leq(tag,'/eg')
tag ('div' | any('hH')) span('123') rpos(0)
.                                                      :f(eok)
str span(' 0123456789.') =      :(eok)
do_verbatim
out = '\begin' '{verbatim}' str
in_verb str = in                                 :f(done_verb)
str breakx('<') . s len(1) '/xmp>' =
.                                                :s(done_verb)
out = str                                      :(in_verb)
done_verb  out = s ; s =  ;
out = '\end' '{verbatim}'                     :(check)
* end of kludge city
eok  new_tag = tagcheck(tag)
terminal = leq(new_tag,'*')
.         'No action taken for ' tag ' tag'
.                                                    :s(check)
*
*-- here we deal with attributes
*
chk_attr ident(attrs)                                :s(colon)
attrs attr_pat      =
:f(dud_attrs)
* Extract ID and TARGET attribute values for use as labels
* Other attributes are ignored
a_name ('id' | 'target')                    :s($(a_name))
terminal = 'Attribute ' a_name '=' a_value ' ignored'
.                                                    :(chk_attr)
id        label = a_value                            :(chk_attr)
target    new_tag = new_tag a_value '}'              :(chk_attr)
*
* -- check action code on new tag
*
colon     new_tag ':' =                              :f(shriek)
* implied close at end of line
new_tag = new_tag str '}'
str =
ident(label)                                         :s(store)
new_tag = new_tag ' \label{' label '}'
label =
shriek    new_tag '!'                                :f(store)
* bang out the tag name as well as its content
new_tag = '\bf ' tag ': \rm '
*
* - here we buffer it
store     new_str = differ(new_tag) new_str new_tag
.                                                    :(check)
*
* - and here we output it
write     out = new_str str
new_str =                                            :(read)
eofile    terminal = 'File ended'
*     out = '\end{document}'
endfile(4)                                           :(file)
close     yn('Rewrite dictionary?')                  :f(end)
output(.out,4,dic_name)
t = sort(convert(t,'array'))
i = 0
ti        i = i + 1
out = t<i,1> ':' t<i,2>                              :s(ti)
out = '*ENTITIES*'
et = sort(convert(et,'array'))                       :f(end)
i = 0
ti2       i = i + 1
out = et<i,1> '=' et<i,2>                            :s(ti2)f(end)
*
specials  s break('$&%#{}\^|') . s1 len(1) . c =     :f(no_spec)
* look for entity refs
c '&'                                         :f(not_ent)
s2 =
s (span(ups los) (';' | '.') ) . s2 =
.                                                 :f(keep_amp)
s2 = entcheck(s2)                            :f(keep_amp)
specials = specials s1 s2                     :(specials)
keep_amp
specials = specials s1 ' \& '
s = s2 s               :(specials)
* look for latex specials
not_ent   c any('$&%#{}_') = '\' c                   :s(sp_store)
* anything else can be done with \verb
c any('\^<>|') = '\verb+' c '+'
sp_store  specials = specials s1 c
.                                                    :(specials)
no_spec   specials = specials s                      :(return)
*
tagcheck  tag = replace(tag, ups , los)
tagcheck = replace(t<tag>,'_',' ')
terminal = ident(tagcheck)
.         'What sort of a tag is a ' tag '???'
.                                                    :f(return)
tagcheck = terminal
differ(tagcheck)
.         yn('Shall I add ' tagcheck ' to dictionary?')
.                                                    :s(add)
t<tag> = '<' tag '>'                                 :(return)
add       t<tag> = tagcheck                          :(return)
*
entcheck
* strip off last ; or . before looking it up
str rtab(1) . str
entcheck = et<str>
terminal = ident(entcheck) 'Undefined entity ' str
.                                                    :f(return)
yn('Do you want to define this entity?')
.                                                    :f(skip_ent)
terminal = str '=?'
entcheck = terminal
et<str> = differ(entcheck) entcheck
.                                                    :s(return)
skip_ent  entcheck = str ';'                         :(return)
*
new_file  endfile(3)
terminal = 'Filename?'
f = terminal                                         :f(freturn)
ident(f)                                             :s(freturn)
input(.in,3,f)                                       :f(new_file)
* filename tweaked for VMS and PC only
f ((break(']') | pos(0)) break('.')) . f
.                                                    :f(funnyfilename)
in '<' break(' >') . doctype                         :f(no_dtd)
new_dic(doctype)                                     :f(freturn)
output(.out,4,f '.tex')                              :f(wot)
terminal = 'Output to ' f '.tex'
.                                                    :(return)
no_dtd    terminal = 'Input file must begin with <xxx>'
terminal = ' xxx identifies the doctype '            :(freturn)
*
new_dic
*
* -- open a new dictionary file and load its contents
*    parameter is the name of the file
*
endfile(7)
dic_name = s '.dic'
input(.din,7,dic_name)                               :f(no_dic)
* t is the table for tags and et for entity names
t = table() ; et = table()
nt = ne = 0
load_t    s = din                                    :f(loaded)
s break(':') . t1 ':' rem . t2                       :f(do_entities)
nt = nt + 1 ; t<t1> = t2                             :(load_t)
do_entities s '*ENTITIES*'                           :f(dud_t)
load_t2   s = din                                    :f(loaded)
s break('=') . t1 '=' rem . t2                       :f(dud_t)
ne = ne + 1 ; et<t1> = t2                            :(load_t2)
dud_t     terminal = 'dictionary starts ' s
.         ' - what nonsense is this?'                :(freturn)
no_dic    terminal = 'Where is dictionary ' dic_name '???'
.                                                    :(freturn)
loaded    terminal = nt ' tags, ' ne
.         ' entities loaded from ' dic_name
.                                                    :(return)
*
yn        terminal = s
terminal any('yY')                                   :s(return)f(freturn)
end

# 4. Example dictionary file

!:*
/abstract:\end{abstract}
/action:\end{flushright}
/address:*
/attend:\end{quotation}
/attr:\rm_
/body:\end{document}
/citn:\rm_
/eg:\end{verse}
/fig:\end{figure}
/frontm:*
/gl:\end{description}
/hi:\rm_
/ldoc:</ldoc>
/note:}
/ol:\end{itemize}
/q:''
/tag:$>$}
/ul:\end{itemize}
abstract:\begin{abstract}
act:*
action: \linebreak \begin{flushright} \scACTION:\rf
appendix:\appendix
attend:\begin{quotation}
attr:\it_
author::\author{
back:*
body:\begin{document} \maketitle
citn:\smartitalic_
date::\date{
div1::\section{
div2::\subsection{
div3::\subsubsection{
docnum::\center{
duedate:*
eg:\begin{verse}
front:*
gd:]
gl:\begin{description}
gt:\item[
head:*
hi:\bf_
include:*
li:\item_
note:\footnote{
ol:\begin{itemize}
p:<p>_
q:``
tag:<gi>
term:\smartitalic_
title::\title{
toc:\tableofcontents
ul:\begin{itemize}
who:*
xmp:\begin{verbatim}
xref:\unskip^\ref{
*ENTITIES*
amp=\&
dash=---
gt=$>$
lt=$<$

HTML generated 18 May 1998