Trip Report: Tokyo and Boston, December 1993
The International Conference on Building and Sharing of Very Large-Scale Knowledge Bases '93 was organized by the Japan Information Processing Development Center, with the avowed hope that it will be followed by other similar conferences held elsewhere. Some speculation around the conference held that it was intended as a prelude to a large-scale knowledge base project intended to be a successor to the Fifth Generation and Electronic Dictionary Research projects, a connection subtly conveyed by the keynote addresses, given by Kazuhiro Fuchi of the Fifth Generation Project and Toshio Yokoi of the EDR.
Most notable to me, in Yokoi's keynote address, was the emphasis on problems like "Making information machine-readable" and "Making information hyperstructured", for which SGML appears to offer an important tool for future work. I was also interested by the stress on reusability and modifiability of the knowledge representation methods used in building large-scale knowledge bases, since many of the concerns expressed mirror those of the TEI. As the conference wore on, I began to realize that for this audience, at least, the simplest way to describe the TEI's product is as a 'text representation language', analogous to a 'knowledge representation language.' (This notion then invites speculation on how the SGML and TEI communities can develop better languages for modeling, querying, and manipulation of texts.)
During the first morning, Norio Fujisawa gave an outline of a Platonic view of 'knowledge' which provided a high standard of clarity and a refreshing background for consideration of what everyone else was talking about. "Plato, however, clearly denies the information stored in a book (or in a computer) to be in itself true knowledge." Most interesting was Fujisawa's objection to the usual subject-predicate, substance-attribute method of expressing propositional content; his remarks could not help reminding me, however, of an imaginary language described by Jorge Luis Borges in which no nouns exist and the only open lexical class is that of verbs, precisely in order to avoid the use of subject-predicate patterns.
There followed a survey of current language technology, in which Makoto Nagao gave an informative overview of NLP work, and Susan Armstrong-Warwick spoke of problems involved in the acquisition and exploitation of textual resources for NLP. SAW rather stressed the difficulties of acquiring permissions, and downplayed the fundamental problems of data representation, which rather disappointed me, but otherwise I liked her talk quite well.
A session on "Sharable Knowledge Sources" allowed Antonio Zampolli to list the bewildering variety of European projects aimed at producing same, and Susan Hockey to describe the TEI and its relevance for sharable materials. In the same session, Douglas Lenat spoke about the Cyc project, providing (inter alia) a virtuoso set of reasons for not publishing a lot of papers about a project, including the unanswerable one that getting the project done is more important than publishing papers about it. This endeared him to my heart enough that I was able to forgive him when he, too, radically understated the amount of information and sophistication present in properly done representations of textual material.
The concluding panel included a great deal of wise speculation about the future of knowledge bases, which I cannot summarize if I am to get this report out today. Its most memorable moment, for me, came during the question period, when Hisao Yamada (of the National Center for Science Information Systems of Japan) said that everything everyone had said seemed to be floating in the air, because all the techniques they were talking about were suitable for Western languages, but not for scripts which use Chinese characters. Knowledge representation languages, SGML, and the TEI were fine for alphabets, but had not addressed the problem of writing in kana, let alone kanji. I had risen to reply, but the chair recognized Prof. Yokoi, who spoke at some length about the fact that the TEI had in fact addressed the character set issue head on, thanks in large part to the work of Prof. Tutiya, and that continued collaboration with the TEI was an obvious desideratum for Japanese work in document and natural language processing, for which funding was being actively sought. Having nothing to add (and indeed not wanting to spoil the moment), I sat back down without speaking.
During the conference, several members of the steering committee met with several representatives of the Japanese research establishment; a memorandum of the discussion at this meeting will be distributed separately.
The next two days were occupied by a workshop on the same topic as the conference, but somewhat less formal and limited to 60 people, instead of the 450-odd who had been attending the conference. There were a number of good talks, as well as some rather disappointing ones, but the most important results of the meeting for me lay in the personal contacts made. I believe that at least Tim Finin of Univ. Maryland/Baltimore County, who is working on a language for distributed knowledge querying, now has a stronger interest in SGML and the TEI as a basis for text representation, which should fill a gaping hole in the language he is designing, as it now exists. Also notable was the interest in SGML from the database people, including separate invitations from Joachim Schmidt of Hamburg and A. Desai Narasimhalu of Singapore to consider collaborating with them, and/or using their software systems as a basis for work with SGML. Schmidt is developing an object-oriented dbms for objects of a polymorphic type system with variable persistence; Narasimhalu and his colleagues have already developed an SQL-based Document Query Language (which was presented at SGML '93) and, in conjunction with Fujitsu, an SGML system built on a dbms foundation.
On the final morning of the workshop, Syun Tutiya arranged for me to have breakfast with several representatives of East Asian countries, including China, Korea, Malaysia, and Thailand. We exchanged compliments and expressions of mutual interest, and I explained a bit about TEI character set handling, stressing that the definition of TEI conformance has no component governing character set usage, so that TEI conformance is possible no matter what character set mechanisms one is using; this seemed to reassure the Thais in particular. ST expressed some interest in getting people from other countries in East Asia to attend a TEI workshop in Asia sometime in 1994, and even suggested that perhaps it should be held elsewhere in Asia, rather than in Japan. Our colleague in Bangkok (Vilas Wuwongse of the Asian Institute of Technology) later privately offered assistance with organizing such a workshop in Bangkok.
From Tokyo I went directly to Boston, where the Graphic Communications Association sponsored its annual SGML conference, and where Lou Burnard and I made a TEI sandwich of the meeting by giving, respectively, the opening and the closing keynote addresses. Lou managed, by focusing on the accomplishments of the TEI in using SGML, to clarify the direct relevance of the TEI to other SGML projects in a way that previous talks on the TEI had not succeeded in doing; focusing as they often have done on the peculiarly difficult problems posed by some older texts, some of our earlier presentations clearly left some people with the idea that the TEI was all about medieval manuscripts, and had nothing to do with problems like modularity of DTDs, class systems for SGML elements, version control, and the like.
- Charles Goldfarb and Erik Naggum announced the release, in the first quarter of 1994, of a C++ library of SGML parsing routines and a 'portable object-oriented entity manager' (POEM), implemented by a consortium going under the name of Project YAO. This library should make it easier to embed SGML awareness in processors other than SGML parsers, and POEM should make it easier to use external entities other than files in SGML documents.
- A talk by Dave Sklar of Electronic Book Technologies, and a panel of implementers, on the subject of SGML transformation engines, showed quite clearly that the problem of text manipulation is receiving a good deal of attention. The sample problems, and their solutions using the SGML Hammer (Avalanche), OmniMark (Exoterica), Balise and Polypus (AIS/Berger-Levrault), TagWrite (Zandar), and CoST (the Copenhagen SGML Tool, a non-commercial tool written by Claus Harbo, rather in the style of Lou Burnard's Spitbol-based tf filter program) should all be posted on comp.text.sgml, and may be collected and published in a journal of some kind.
- Word Perfect appeared for the first time among the vendors; alas, I never did see the demo. But they were there.
- Several vendors showed database-oriented projects, most (but I think not all) based on an underlying database technology (rather than on the left-right scanning characteristic of most existing SGML products).
The day after the meeting concluded, the vendor consortium SGML Open held a full day of meetings, which Lou and I attended on behalf of the TEI. According to Yuri Rubinsky, the TEI will be an affiliated organization, which gives us the same rights and privileges as a corporate associate member (i.e. somewhat less than the rights of a corporate sponsoring member, and much more than a simple subscriber). Notably, we will have the ability to participate as members in the technical committees of SGML Open, though not the ability to vote on the adoption of committee reports by SGML Open. I was impressed into service on a working committee to address character set issues, but managed to persuade Steve Edwards of Recording for the Blind to chair the committee, and to persuade the committee to accept chapters CH and WD as representing the TEI position on character sets.
C. M. Sperberg-McQueen