ELRA Work Package 3: first draft

5. Appendixes

5.1. A Feature System Declaration for the EAGLES morphosyntactic Guidelines

This is a complete FSD (feature system declaration) for the EAGLES Guidelines for morphosyntactic analysis, using the formalism defined in chapter 26 of the TEI Guidelines. It consists of a series of feature structure declarations, each represented as an <fsDecl> element and each corresponding to an EAGLES recommended feature. Each <fsDecl> contains a series of <fDecl> elements, each corresponding to one set of the feature-value pairs defined for that feature structure in the EAGLES scheme. The permitted values (<vRange>) are specified as a set of alternatives using the <vAlt> element, reflecting the fact that EAGLES does not permit multi-valued features; a system-dependent default value (<dft>) is, however, permitted for cases where none of the specified values is applicable.
<!DOCTYPE teiFsd2 system "teifsd2.dtd"> <TEIfsd2> <teiHeader> <fileDesc> <titleStmt> <title>Feature System Declaration for the EAGLES tagset</title> </titleStmt> <publicationstmt> <p>Prepared for ELRA WP3 </publicationstmt> <sourcedesc><p>No source: this is an original work</sourcedesc> </filedesc> <revisionDesc> <change><date>2 apr 1997</date> <respstmt><resp>ed</resp><name>LB</name></respstmt> <item>Minor changes for validation; added header</item> </change> <change> <date>31 mar 1997</date> <respstmt><resp></resp><name>APM</name></respstmt> <item>First complete draft</item> </change> </revisionDesc> </teiHeader> <!-- Feature system for Nouns --> <fsDecl type = Noun> <fDecl name = Type> <fDescr>Range types associated with a noun</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Common --> <sym value=2><!-- Proper --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with a noun</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> <sym value=4><!-- Common FOR USE WITH DUTCH AND DANISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with a noun</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Case> <fDescr>Range case associated with a noun</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Nominative --> <sym value=2><!-- Genitive --> <sym value=3><!-- Dative --> <sym value=4><!-- Accusative --> <sym value=5><!-- Vocative --> <sym value=6><!-- Indeclinable VALUE FOR GREEK ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Countability> <fDescr>Optional 
attribute countability</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Count --> <sym value=2><!-- Mass --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Definiteness> <fDescr>Language Specific Attribute Definiteness for Danish</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Definite --> <sym value=2><!-- Indefinite --> <sym value=3><!-- Unmarked --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Verbs --> <fsDecl type = Verb> <fDecl name = Person> <fDescr>Range person associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- First Person --> <sym value=2><!-- Second person --> <sym value=3><!-- Third person --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Finiteness> <fDescr>Range finiteness associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Finite --> <sym value=2><!-- Non Finite --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = FormOrMood> <fDescr>Range form/mood associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Indicative --> <sym value=2><!-- Subjunctive --> <sym value=3><!-- Imperative --> <sym value=4><!-- Conditional
--> <sym value=5><!-- Infinitive --> <sym value=6><!-- Participle --> <sym value=7><!-- Gerund --> <sym value=8><!-- Supine --> <sym value=9><!-- Ing Form VALID FOR ENGLISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Tense> <fDescr>Range tense associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Present --> <sym value=2><!-- Imperfect --> <sym value=3><!-- Future --> <sym value=4><!-- Past --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Voice> <fDescr>Range voice associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Active --> <sym value=2><!-- Passive --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Status> <fDescr>Range status associated with a verb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Main --> <sym value=2><!-- Auxiliary --> <sym value=3><!-- Optional Attribute Semi Auxiliary --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Aspect> <fDescr>Optional Aspect attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Perfective --> <sym value=2><!-- Imperfective --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Separability> <fDescr>Optional Separability Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Non Separable --> <sym value=2><!-- Separable --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Reflexivity> <fDescr>Optional Reflexivity Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Reflexive --> <sym value=2><!-- Non reflexive --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Auxiliary> <fDescr>Optional Auxiliary Attribute</fDescr> <vRange><vAlt> 
<sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Have --> <sym value=2><!-- Be --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = AuxiliaryFunction> <fDescr>Auxiliary Function Attribute Applicable ONLY TO ENGLISH</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Primary --> <sym value=2><!-- Modal --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Adjectives --> <fsDecl type = Adjective> <fDecl name = Degree> <fDescr>Range degree associated with an adjective</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Positive --> <sym value=2><!-- Comparative --> <sym value=3><!-- Superlative --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with an adjective</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with an adjective</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Case> <fDescr>Range case associated with an adjective</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Nominative --> <sym value=2><!-- Genitive --> <sym value=3><!-- Dative --> <sym value=4><!-- Accusative --> <sym value=5><!-- Vocative GREEK ONLY--> <sym value=6><!-- Indeclinable GREEK ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = InflectionType> <fDescr>Optional Inflection Type Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Weak flection --> <sym 
value=2><!-- Strong flection --> <sym value=3><!-- Mixed --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Use> <fDescr>Optional Use Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Attributive--> <sym value=2><!-- Predicative --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = NPFunction> <fDescr>Optional NP Function Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Premodifying --> <sym value=2><!-- Postmodifying --> <sym value=3><!-- Head function --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Pronoun-Determiners --> <fsDecl type = PronounDeterminer> <fDecl name = Person> <fDescr>Range person associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- First Person --> <sym value=2><!-- Second person --> <sym value=3><!-- Third person --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> <sym value=4><!-- Common DANISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Case> <fDescr>Range case associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Nominative --> <sym value=2><!-- Genitive --> <sym value=3><!-- Dative --> <sym value=4><!-- Accusative --> <sym value=5><!-- 
Non Genitive --> <sym value=6><!-- Oblique --> <sym value=7><!-- Prepositional case SPANISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Category> <fDescr>Range category associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Pronoun --> <sym value=2><!-- Determiner --> <sym value=3><!-- Both Pronoun and Determiner --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = PronounType> <fDescr>Range pronoun type associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Demonstrative --> <sym value=2><!-- Indefinite --> <sym value=3><!-- Possessive --> <sym value=4><!-- Int/Rel --> <sym value=5><!-- Personal/Reflexive --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = DeterminerType> <fDescr>Range determiner type associated with a pronoun/determiner</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Demonstrative --> <sym value=2><!-- Indefinite --> <sym value=3><!-- Possessive --> <sym value=4><!-- Int/Rel --> <sym value=5><!-- Partitive --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Strength> <fDescr>Range strength associated with a pronoun/determiner in FRENCH DUTCH AND GREEK ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Weak --> <sym value=2><!-- Strong --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = SpecialPronounType> <fDescr>Optional Special Pronoun Type Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Personal --> <sym value=2><!-- Reflexive --> <sym value=3><!-- Reciprocal --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = WHType> <fDescr>Optional WH Type Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not 
relevant for a language --> <sym value=1><!-- Interrogative --> <sym value=2><!-- Relative --> <sym value=3><!-- Exclamatory --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Politeness> <fDescr>Optional Politeness Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Polite --> <sym value=2><!-- Familiar --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Articles --> <fsDecl type = Articles> <fDecl name = ArticleType> <fDescr>Range types associated with an article</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Definite --> <sym value=2><!-- Indefinite --> <sym value=3><!-- Partitive FRENCH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with an article</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> <sym value=4><!-- Common DANISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with an article</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Case> <fDescr>Range case associated with an article</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Nominative --> <sym value=2><!-- Genitive --> <sym value=3><!-- Dative --> <sym value=4><!-- Accusative --> <sym value=5><!-- Vocative GREEK ONLY --> <sym value=6><!-- Indeclinable GREEK ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Adverbs --> <fsDecl type = Adverbs> <fDecl name = Degree> <fDescr>Range degree associated with an adverb</fDescr>
<vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Positive --> <sym value=2><!-- Comparative --> <sym value=3><!-- Superlative --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = AdverbType> <fDescr>Optional Adverb Type Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- General --> <sym value=2><!-- Degree --> <sym value=3><!-- Particle ENGLISH GERMAN DUTCH ONLY --> <sym value=4><!-- Pronominal ENGLISH GERMAN DUTCH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Polarity> <fDescr>Optional Polarity Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- WH Type --> <sym value=2><!-- Non Wh Type --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = WHType> <fDescr>Range WH type associated with an adverb</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Interrogative --> <sym value=2><!-- Relative --> <sym value=3><!-- Exclamatory --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Adpositions --> <fsDecl type = Adposition> <fDecl name = Type> <fDescr>Range types associated with an adposition</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Preposition --> <sym value=2><!-- Optional Fused Prepositional Article Value --> <sym value=3><!-- Postposition ENGLISH GERMAN ONLY --> <sym value=4><!-- Circumposition ENGLISH GERMAN ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Conjunctions --> <fsDecl type = Conjunction> <fDecl name = Type> <fDescr>Range types associated with a conjunction</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Coordinating --> <sym value=2><!-- Subordinating --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl>
<fDecl name = CoordType> <fDescr>Optional Coordination Type Attribute</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Simple --> <sym value=2><!-- Correlative--> <sym value=3><!-- Initial --> <sym value=4><!-- Non Initial--> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = SubordType> <fDescr>Subordination Type for GERMAN ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- With finite --> <sym value=2><!-- With infinite--> <sym value=3><!-- Comparative--> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Numerals --> <fsDecl type = Numerals> <fDecl name = Type> <fDescr>Range types associated with a numeral</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Cardinal --> <sym value=2><!-- Ordinal --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with a numeral</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with a numeral</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Case> <fDescr>Range case associated with a numeral</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Nominative --> <sym value=2><!-- Genitive --> <sym value=3><!-- Dative --> <sym value=4><!-- Accusative --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Function> <fDescr>Range function associated with a numeral</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language -->
<sym value=1><!-- Pronoun --> <sym value=2><!-- Determiner --> <sym value=3><!-- Adjective --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Unique tags --> <fsDecl type = unique> <fdecl name="interjection"> <fDescr>Range of types associated with interjections</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Interjection --> </vAlt></vRange> <vDefault><dft></vDefault> </fdecl></fsDecl> <fsDecl type = Unique> <fDecl name = InfinitiveMarker> <fDescr>Range types associated with an infinitive marker</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- German marker zu GERMAN ONLY --> <sym value=2><!-- Danish marker at DANISH ONLY --> <sym value=3><!-- Dutch marker DUTCH ONLY --> <sym value=4><!-- English marker ENGLISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = NegativeParticle> <fDescr>Negative particles ENGLISH ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- full form not --> <sym value=2><!-- contracted form of not --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = ExistentialMarker> <fDescr>Existential Markers</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- English existential marker ENGLISH ONLY --> <sym value=2><!-- Danish existential marker DANISH ONLY --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = SecondNegativeParticle> <fDescr>Second negative particles FRENCH ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- French pas --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Anticipatory> <fDescr>Anticipatory Marker er DUTCH only</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- er --> </vAlt></vRange> <vDefault><dft></vDefault> 
</fDecl> <fDecl name = Mediopassive> <fDescr>Mediopassive PORTUGUESE ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Mediopassive marker se --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = PreverbalParticle> <fDescr>Preverbal Particle GREEK ONLY</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Preverbal particle --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Residuals --> <fsDecl type = Residual> <fDecl name = Type> <fDescr>Range types associated with a residual</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Foreign Word --> <sym value=2><!-- Formula --> <sym value=3><!-- Symbol --> <sym value=4><!-- Acronym --> <sym value=5><!-- Abbreviation --> <sym value=6><!-- Unclassified --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Number> <fDescr>Range number associated with a residual</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Singular --> <sym value=2><!-- Plural --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Gender> <fDescr>Range genders associated with a residual</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Masculine --> <sym value=2><!-- Feminine --> <sym value=3><!-- Neuter --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> <!-- Feature system for Punctuation --> <fsDecl type = Punctuation> <fDecl name = Period> <fDescr>Range types associated with a fullstop</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Period --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Comma> <fDescr>Range types associated with a comma</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Comma
--> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> <fDecl name = Question> <fDescr>Range types associated with a question mark</fDescr> <vRange><vAlt> <sym value=0><!-- Value not relevant for a language --> <sym value=1><!-- Question mark --> </vAlt></vRange> <vDefault><dft></vDefault> </fDecl> </fsDecl> </TEIfsd2>
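
As a rough illustration of what such a declaration licenses, the following Python sketch restates part of the Noun declaration as a dictionary and checks a candidate feature structure against it. The dictionary layout, the name check_fs and the DEFAULT marker are assumptions made for illustration only; they are not part of the TEI formalism or of any EAGLES software.

```python
# Hypothetical restatement of part of the Noun <fsDecl>: each feature name
# maps to the symbol values listed in its <vAlt>.
NOUN_DECL = {
    "Type": {"0", "1", "2"},
    "Gender": {"0", "1", "2", "3", "4"},
    "Number": {"0", "1", "2"},
    "Case": {"0", "1", "2", "3", "4", "5", "6"},
}

DEFAULT = "dft"  # stands in for the system-dependent <dft> value


def check_fs(decl, fs):
    """Return a list of problems found in feature structure `fs`."""
    problems = []
    for name, value in fs.items():
        if name not in decl:
            problems.append(f"undeclared feature: {name}")
        elif isinstance(value, (list, tuple, set)):
            # The <vAlt> range admits exactly one value per feature.
            problems.append(f"multi-valued feature: {name}")
        elif value != DEFAULT and value not in decl[name]:
            problems.append(f"value {value!r} not in range of {name}")
    return problems


print(check_fs(NOUN_DECL, {"Type": "1", "Number": "1"}))      # no problems
print(check_fs(NOUN_DECL, {"Type": "7", "Gender": DEFAULT}))  # Type out of range
```

The sketch checks only the features that are present; how absent features should be treated is left open here.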

5.2. Sample Mapping Lists for the EAGLES Obligatory Features

The following tables illustrate how a particular set of analytic tags, in this case the CLAWS7 tagset, can be re-expressed in terms of the EAGLES ‘intermediate representation’. Where a CLAWS7 tag is underspecified with respect to an EAGLES feature, each possible EAGLES value is given, the alternatives being separated by ‘|’.

The tables are organized as follows. Each table relates to one EAGLES obligatory feature and contains an entry for every CLAWS tag grouped under that feature. Each tag is then further analysed in terms of the EAGLES recommended features.
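
One way to read a representation such as C11|40 (given for the CLAWS7 tag CC in Table 8) is to split it into a category code followed by one position per feature, where a position may carry several alternatives separated by ‘|’. The Python sketch below does this under the assumptions that every value is a single digit and that alternatives attach to the position they follow. The name split_representation is hypothetical, and the sketch deliberately rejects entries containing other characters (such as ‘-’) rather than guessing at their meaning.

```python
import re

# Category code (letters), then one or more positions; each position is a
# single digit optionally followed by "|digit" alternatives.
_REP = re.compile(r"([A-Z]+)((?:\d(?:\|\d)*)+)")
_POS = re.compile(r"\d(?:\|\d)*")


def split_representation(rep):
    """Split e.g. 'C11|40' into ('C', [['1'], ['1', '4'], ['0']])."""
    m = _REP.fullmatch(rep)
    if m is None:
        raise ValueError(f"unrecognised representation: {rep!r}")
    category, body = m.groups()
    positions = [p.split("|") for p in _POS.findall(body)]
    return category, positions


print(split_representation("C11|40"))
print(split_representation("N1010"))
```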

Table 1. Mapping list for Nouns

CLAWS7 tag  EAGLES representation
ND1         N1010
NN          N1000
NN1         N1010
NN2         N1020
NNA         N1000
NNB         N1000
NNJ         N1000
NNJ2        N1020
NNL1        N1010
NNL2        N1020
NN0         N1000
NN02        N1020
NNT1        N1010
NNT2        N1020
NNU         N1000
NNU1        N1010
NNU2        N1020
NP          N2000
NP1         N2010
NP2         N2020
NPD1        N2010
NPD2        N2020
NPM1        N2010
NPM2        N2020
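
For the noun entries, which contain no alternations, a representation can be expanded into named feature values by reading its digits against the Noun declaration in section 5.1. The Python sketch below assumes that the four positions follow the order of the first four <fDecl> elements there (Type, Gender, Number, Case); that ordering, and the names NOUN_FEATURES and decode_noun, are assumptions rather than something the table itself states.

```python
# Value labels copied from the comments in the Noun <fsDecl> of section 5.1.
# The position order (Type, Gender, Number, Case) is an assumption.
NOUN_FEATURES = [
    ("Type", ["not relevant", "Common", "Proper"]),
    ("Gender", ["not relevant", "Masculine", "Feminine", "Neuter", "Common"]),
    ("Number", ["not relevant", "Singular", "Plural"]),
    ("Case", ["not relevant", "Nominative", "Genitive", "Dative",
              "Accusative", "Vocative", "Indeclinable"]),
]


def decode_noun(rep):
    """Expand a noun representation such as 'N1010' into named values."""
    if len(rep) != 5 or rep[0] != "N" or not rep[1:].isdigit():
        raise ValueError(f"not a four-position noun representation: {rep!r}")
    return {
        name: labels[int(digit)]
        for (name, labels), digit in zip(NOUN_FEATURES, rep[1:])
    }


print(decode_noun("N1010"))  # the representation listed for NN1
print(decode_noun("N2020"))  # the representation listed for NP2
```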

Table 2. Mapping list for Verbs

CLAWS7 tag  EAGLES representation
VB0         V00012|31|000
VBDR        V2|0|001|2|011|1|2400
VBDZ        V-20111400
VBG         V00029000
VBI         V00025000
VBM         V10111100
VBN         V00026400
VBR         V2|001|211100
VBZ         V30111100
VD0         V-3|0|0|001|2|0|011|1|2|31|1|1|000
VDD         V00011400
VDG         V00029001
VDI         V00025001
VDN         V00026401
VDZ         V30111100
VH0         V-3|0|0|001|2|0|011|1|2|31|1|1|000
VHD         V00011400
VHN         V00026401
VHG         V00029000
VHI         V00025000
VHZ         V30111100
VM          V00011002
VMK         V00011003
VV0         V-3|0|0|0|001|2|0|0|01|1|1|1|01|1|2|3|01|1|1|0|101
VVD         V00011401
VVG         V00029001
VVGK        V00029001
VVI         V00025001
VVN         V00026401
VVNK        V00026403
VVZ         V30111101

Table 3. Mapping list for Pronoun-Determiners

CLAWS7 tag  EAGLES representation
APPGE       PD1|201|2|00|1|20203
DA          PD0|301|200302
DA1         PD0|30100302
DA2         PD0|30200302
DAR         PD0|30000304
DAT         PD0|30000304
DB          PD0|30000304
DB2         PD0|30200301
DD          PD0|30000302
DD1         PD0|30100301
DD2         PD0|30200301
DDQ         PD0|30000304
DDQGE       PD0|3001|20303
DDQV        PD0|30000304
PN          PD0|30000120
PN1         PD0|30100120
PNQ0        PD0|30006140
PNQS        PD0|30001140
PNQV        PD0|30100140
PNX1        PD30100150
PPGE        PD1|2|31|2|301|2|00130
PPH1        PD33101|6150
PPH01       PD31|2106150
PPH02       PD30206150
PPHS1       PD31|2101150
PPHS2       PD30201150
PPI01       PD10106150
PPI02       PD10206150
PPIS1       PD10101150
PPIS2       PD10201150
PPX1        PD1|2|31|2|3100150
PPX2        PD1|2|31|2|3200150
PPY         PD20001|6150

Table 4. Mapping list for Adjectives

CLAWS7 tag  EAGLES representation
JJ          AJ1000
JJR         AJ2000
JJT         AJ3000
JK          AJ1000

Table 5. Mapping list for Adverbs

CLAWS7 tag  EAGLES representation
RA          AV121
REX         AV121
RG          AV122
RGQ         AV112
RGQV        AV112
RGR         AV222
RGT         AV322
RL          AV121
RP          AV123
RPK         AV123
RR          AV121
RRQ         AV111
RRQV        AV111
RRR         AV221
RRT         AV321
RT          AV121

Table 6. Mapping list for Articles

CLAWS7 tag  EAGLES representation
AT          AT1000
AT1         AT2010

Table 7. Mapping list for Adposition tags

CLAWS7 tag  EAGLES representation
II          AP1
IO          AP1
IW          AP1
GE          AP3

Table 8. Mapping list for Conjunctions

CLAWS7 tag  EAGLES representation
BCL         C120
CC          C11|40
CCB         C110
CS          C201
CSA         C203
CSN         C203
CST         C201
CSW         C201|2

Table 9. Mapping list for Numerals

CLAWS7 tag  EAGLES representation
MC          NU10000
MC1         NU10100
MC2         NU10200
MCGE        NU10000
MCMC        NU10000
MD          NU20000

Table 10. Mapping list for Residuals

CLAWS7 tag  EAGLES representation
FO          R200
FU          R600
FW          R100
ZZ1         R310
ZZ2         R320
MF          R300

Table 11. Mapping list for Unique tags

CLAWS7 tag  EAGLES representation  Description
UH          I                      Interjection
EX          UE                     Existential ‘there’
TO          UT                     Infinitive marker
XX          UX                     Negative particle
PUQ         R                      Punctuation mark (quotation)
PUN         R                      Punctuation mark (non-quotation)

5.3. Some current markup validation practice

In the following list, we summarize claims made by the builders of several of the corpora analysed in Work Package 2 regarding how the encoding of their corpus was validated. The information here is only partial, and has not been reviewed by our informants.
BNC
An SGML parser was used to validate all markup against the CDIF (Corpus Document Interchange Format) DTD; all tagging errors reported were then hand-corrected. Some semantic validation was also performed on a portion of each text, for errors such as incorrect or missing headings, with limited manual correction. All analytic tagging was added automatically, but its syntactic validity was checked again using an SGML parser. As a separate exercise, a 2 percent sample of the corpus was hand-checked for accuracy of analytic tagging, and the results were used to improve the original part-of-speech tagging. (The results of this exercise are not yet publicly available, but are due in 1998.)
LOB and Brown
No SGML mark-up used, but structure indicated by means of a simple and automatically verifiable coding. Typographic errors are retained unchanged. Analytic coding performed using similar techniques to those of the BNC.
London Lund Corpus
No SGML mark-up used, but detailed indication of prosodic features using an idiosyncratic markup scheme; no information available as to how this was verified.
Penn Treebank
No SGML mark-up used, but detailed indication of syntactic features using an idiosyncratic markup scheme; validated by own analytic tools.
ICE
Originally used its own SGML-like markup scheme, validated by a suite of WordPerfect macros which inserted text unit markup after full stops etc. This system ‘generally ensures that markup symbols are closed, and reminds users to do so should they try opening the same symbol again before closing it’ (Nelson 1996, pp. 65-66). After developing further software tools to check validity, the project has reportedly converted to an SGML system, but we have been unable to obtain further details of this.
Multext and CRATER
Where applicable, preexisting header data was converted automatically. As for the primary data, in most cases division and/or paragraph level markup of some kind already existed in the texts received, so obtaining <p> and <div> elements was a matter of conversion or automatic insertion. However, corrections were made by hand to paragraph-level markup. Since the projects were dealing with issues of alignment, the accuracy of tags at sentence level and above was crucial; so, while automatic means were used for as many of the steps as practical, markup at sentence level and above (<p>, <quote>, <div> etc.) was also hand-checked. All texts were parsed against their respective DTDs.
Plato
According to our informant, ‘The corpora were produced all over Europe in various formats and by people with varying amounts of experience and expertise in such work. Many started with a paper text, which was then scanned or even keyboarded. So this was clearly an issue to be tackled, especially since we wanted to align the texts and needed the markup to be not just accurate and SGML-wise correct, but also similar enough to assist the aligner. Parsers (nsgmls/xemacs) were used to check and correct the SGML, and most of the hands-on dirty work was done recently at the workshop in Nancy with Laurent Romary and his team. Most of the TELRI-ers who had prepared texts came along and we had the chance to really check and compare the texts. Some of the texts were initially sliced into sentences using tools that have been developed at our sites and which, being SGML aware, can base their work upon an existing <p> structure.’
The Lampeter Corpus
Originally prepared using word processor macros to insert minimal tagging for font changes and some structural features, use of different languages etc. The texts were then converted to true SGML by a combination of automatic and manual means, and have been proofread several times. Correction and validation were carried out using emacs, PSGML, SP, and Author/Editor.
ENPC
Validated against the TEI P3 DTD twice, once after proofreading, and then again after alignment to check that the values of the id and corresp attributes are unique and that the value of the corresp attribute points to an existing id in the parallel text. All validation performed by SP; project has developed its own SGML-aware software for further analysis.
UAMSC
Uses SGML-like coding for speaker identification and vocalic effects but not validated during data capture; some subsequent SGML-based analysis and validation.
Helsinki
Uses simple OCP-style markup only; validated only by analytic tools.
MUC
Some use of SGML-style tagging, e.g. for anaphor markup. No formal validation, other than by analytic tools.
Speech Thought and Writing Presentation Corpus
Some use of SGML-style tagging but no formal validation, other than by analytic tools. Tagging all manually added.
PAROLE
Minimal TEI-conformant DTD defined at start of project, against which all corpora are eventually to be validated. Considerable variation in encoding practices reported amongst partners; no detailed information currently available.
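
The ENPC check described above (unique id values, and every corresp value pointing to an existing id in the parallel text) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: attributes are found with a regular expression rather than an SGML parser, attribute values are assumed to be quoted, and the function names duplicate_ids and dangling_corresp are hypothetical. The project itself is reported to have used SP and its own software.

```python
import re
from collections import Counter

# Simplifying assumption: attribute values are written as name="value".
_ID = re.compile(r'\bid="([^"]+)"')
_CORRESP = re.compile(r'\bcorresp="([^"]+)"')


def duplicate_ids(text):
    """Return id values that occur more than once in `text`."""
    counts = Counter(_ID.findall(text))
    return sorted(v for v, n in counts.items() if n > 1)


def dangling_corresp(text, parallel_text):
    """Return corresp targets in `text` with no matching id in `parallel_text`."""
    known = set(_ID.findall(parallel_text))
    targets = []
    for value in _CORRESP.findall(text):
        targets.extend(value.split())  # corresp may list several targets
    return sorted(t for t in targets if t not in known)


source = '<s id="s1" corresp="t1">A.</s> <s id="s2" corresp="t2 t3">B.</s>'
target = '<s id="t1" corresp="s1">A.</s> <s id="t2" corresp="s2">B.</s>'
print(duplicate_ids(source))             # []
print(dangling_corresp(source, target))  # ['t3']
```

A check of this kind complements, rather than replaces, validation against the DTD: a parser confirms that the markup is well formed, while the cross-document test confirms that the alignment pointers resolve.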
