1. What is the TEI
1.1 Origins
Origins in the literary-and-linguistic computing community.
Immediate and great interest in other areas: computational linguistics is in a
stage of massive database development and concern for reusability of data across
Those affected by the project are:
- researchers (esp. humanists but also computational linguists)
- publishers and industry
- software developers
- data archivists
- funding agencies
- It should specify a common interchange format for machine readable texts.
- It should provide a set of recommendations for encoding new textual
- It should document the major existing encoding schemes, and investigate
the feasibility of developing a metalanguage in which to describe them.
- It must be a set of guidelines, not a set of rigid requirements.
- It must be extensible.
- It should be device- and software-independent.
- It should be language-independent.
- It should be application-independent.
We are conscious of a number of tradeoffs:
- in standardizing notation one risks standardizing the thinking
- there's a long way from the classicist in the garret to the
multi-million-dollar machine-translation project. We have to keep things
simple for the poor scholar, expressive for the team with programmers to
- we want rigorously defined standards, but they should be clear and
expressive. (Enough rigor will render anything unreadable.)
1.2 Organization
Sponsorship by ACH, ALLC, ACL.
Funding is from NEH, EEC, Mellon.
Participation by 15 other organizations.
Steering Committee, Advisory Board, Editors, Working
3. Why Should Industry Care about the TEI?
Why should you care about this? Well, in the SGML revolution, the research
community are the Jacobins or the Bolsheviks. SGML attempts to liberate
electronic texts from paper output, and it takes a while to shake your thoughts
free of the page. The research community, however, has never been fixated on ink
on paper: texts have always appeared to researchers as complex, multi-leveled
cultural and linguistic objects that exhibit a lot of regularity but also a
tremendous variety of form.
Also, whether it's obvious or not: our problems are your
problems, and your problems are our problems. Most industrial firms do not much
care about the textual criticism of the First Folio, but they do face serious
problems of version control -- which take the same form for text applications.
You may not care about literary allusion, but subject indexing has many of the
same problems. You may not care about the problems of theoretical diversity, but
the same problems arise in trying to mediate among conflicting models in page