TEI ED P1: Design Principles for Text Encoding Guidelines

Design Principles for Text Encoding Guidelines
14 December 1988
rev. 9 January 1990

This document defines the basic design goals and working principles for the text encoding guidelines to be created by the Text Encoding Initiative.

It extends the principles enunciated by the Poughkeepsie Planning Conference of November 1987 (see TEI document no. TEI PCP1) to questions of detail not covered there, and provides basic interpretations of the clauses of the Poughkeepsie Principles.

1. Introduction

The Text Encoding Initiative is a cooperative undertaking of the textual research community to formulate and disseminate guidelines for the encoding and interchange of machine-readable texts intended for literary, linguistic, historical, or other textual research. It is sponsored by the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC). A number of other learned societies and professional associations support the project by their participation in the Initiative's Advisory Board. The project is funded in part by the U.S. National Endowment for the Humanities.

The primary goal of the Text Encoding Initiative is to provide explicit guidelines which define a text format suitable for data interchange and data analysis; the format should be hardware and software independent, rigorous in its definition of textual objects, easy to use, and compatible with existing standards. The Standard Generalized Markup Language (SGML) is expected to provide an adequate basis for the guidelines.

This document attempts to set out the fundamental principles upon which the work of the Text Encoding Initiative is to proceed. In it, the guidelines for text encoding and text interchange to be formulated by the Initiative are referred to simply as ‘the guidelines’; the encoding scheme specified by the guidelines is referred to as ‘the TEI scheme’ to distinguish it from other encoding schemes extant or prospective. ‘Encoding scheme’ and ‘markup scheme’ are here used interchangeably; the term ‘tag set,’ which conveys a different sense, is sometimes also used since the expectation is that the TEI markup scheme will consist largely of a set of SGML tags together with an account of their interrelationships and meanings.

Commercial and research interests do not, in any case, always conflict. Both are best served by an intellectually adequate analysis of textual problems and their representation. Very few problems in the research area lack analogues in commercial areas, even though in research the problems may occur more often and more forcefully.


Also relevant, but less difficult to accommodate, are the national and international standards for data interchange, character sets, character names, etc., and the standards governing library cataloguing, dataset description, transliterations, etc.


This means that problems of ASCII-EBCDIC translation, and the limitations of the ASCII-EBCDIC translations in common use, will be specifically addressed; the interchange character set should be the set of all characters consistently and reversibly translated by all such translation programs. The recently developed 190-character extensions to ASCII and EBCDIC will also be discussed.
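
As a rough illustration of the definition above, the following sketch computes the set of characters that several translations all handle consistently and reversibly. It is not part of the guidelines themselves. It assumes that three EBCDIC code pages from Python's standard library (cp037, cp500, cp1026) can stand in for the translation programs in common use, and the function name interchange_safe_characters is hypothetical.

```python
# Assumption: these standard-library EBCDIC code pages stand in for the
# ASCII-EBCDIC translation programs the text refers to.
EBCDIC_CODECS = ("cp037", "cp500", "cp1026")


def interchange_safe_characters(codecs=EBCDIC_CODECS):
    """Return the 7-bit ASCII characters that every codec maps to the same
    EBCDIC byte and translates back unchanged (hypothetical helper)."""
    safe = set()
    for code in range(128):
        ch = chr(code)
        seen = set()
        for codec in codecs:
            try:
                encoded = ch.encode(codec)
            except UnicodeEncodeError:
                break  # this codec cannot represent the character at all
            if encoded.decode(codec) != ch:
                break  # translation is not reversible
            seen.add(encoded)
        else:
            # Reached only if no codec failed; require one agreed byte value.
            if len(seen) == 1:
                safe.add(ch)
    return safe


if __name__ == "__main__":
    safe = interchange_safe_characters()
    unsafe = [chr(c) for c in range(32, 127) if chr(c) not in safe]
    print("printable ASCII characters that are not interchange-safe:",
          "".join(unsafe))
```

Under these assumptions the letters, digits, and space survive every translation, while characters such as the square brackets and the exclamation mark do not, because the code pages assign them different byte values. A real survey would have to substitute the translation tables actually in use at the sites exchanging texts.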