Handling primary sources in TEI XML

3. Letter forms

  • Unicode (ISO 10646) defines computer codepoints for most, though not all, of the abstract characters recognized by modern scholars when reading ancient sources.
  • Different fonts realise those codepoints in different styles; the underlying character, however, remains the same.
  • Data entry of Unicode characters can be
    • direct: some key combination or menu selection generates the character æ for us
    • indirect, using a numeric character reference such as &#xE6;
    • indirect, using a mnemonic character entity reference such as &aelig; (this requires every document to carry a DTD that declares the entity)
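
The three entry methods can be compared with a short sketch. It is not part of the original material: it assumes Python's standard xml.etree.ElementTree parser, and the element name w and the internal DTD subset are chosen purely for illustration. The aim is only to show that all three routes lead to the same codepoint, and that the mnemonic form fails when no declaration is present.

```python
import xml.etree.ElementTree as ET

# Direct entry: the character is typed straight into the document.
direct = ET.fromstring("<w>æ</w>").text

# Numeric character reference: needs no declaration of any kind.
numeric = ET.fromstring("<w>&#xE6;</w>").text

# Mnemonic entity reference: resolves only if the document declares it,
# here in an illustrative internal DTD subset.
with_dtd = '<!DOCTYPE w [<!ENTITY aelig "&#xE6;">]><w>&aelig;</w>'
mnemonic = ET.fromstring(with_dtd).text

# Without a declaration the parser rejects the document.
try:
    ET.fromstring("<w>&aelig;</w>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(direct == numeric == mnemonic, hex(ord(direct)), undeclared_ok)
```

Under these assumptions the three strings should compare equal, since each parse yields the single character U+00E6, while the undeclared &aelig; raises a parse error. That is the practical cost of the mnemonic form noted in the last bullet.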
