Corpora Work Group

Objectives and Deadlines


22 October 1990


  1. Reformulate this list of objectives.
  2. Evaluate the adequacy and completeness of TEI P1 with respect to corpus-level and sample-level documentation by preparing a realistic number of sample TEI headers for existing language corpora.
  3. Draft recommendations for classifying texts in a corpus according to text type, subject, socio-linguistic stratum, etc.: specify the axes which should in general be documented, and propose specific values along the axes.
  4. Survey and report on existing text-classification schemes, noting divergences from draft recommendations of the Work Group, and proposing ways of documenting such divergences. If possible, propose a unified classification scheme and draft recommendations for its documentation.
  5. Survey and document major existing annotation schemes for corpora; compare them, and estimate the utility and possibility of harmonizing these existing schemes in a single interchange scheme. If possible, draft a unified annotation scheme and recommendations for its documentation.
  6. Evaluate the recommendations of TEI P1 with respect to the re-use of existing corpora. Propose a minimal subset of linguistic features which must be identifiable (and hence automatically convertible to TEI format) for the useful inclusion of existing texts in new TEI-conformant corpora.
  7. Respond to comments on the relevant portions of TEI P1 routed to this work group by the editors.


The work group reports to the Committee for Text Representation. The head of the work group is Douglas Biber. Two members of the work group are to be named by the work group head with the concurrence of Stig Johansson; at least one should be a European.

Funding is authorized for one meeting.


Deadlines are to be agreed with the work group head.