TKSESH
    A hieroglyphic database system
    
    Équipe EC.art, laboratoire LRIA, Université Paris
      8
     
    Abstract
    
      We introduce tksesh, a multiplatform editor, database and
      dictionary software. Tksesh is both intended to be a toolkit to build
      applications for the philologist and an example of such application.
      
      The core of tksesh is a hieroglyphic editor which understands "Manuel
      de codage" encodings. The edited texts can be saved in a database, and
      referenced by the dictionary system, via hyperlinks.  
      
      The dictionary can handle complex definitions by multiple
      authors. Most of the text in the dictionary has precise meaning, not
      only for the reader, but also for the computer. This allows automated
      treatments and possibly complex searches. An important feature of the
      dictionary is that it can contain references to the text database, in
      a readable way, and that clicking on these pops up the referenced
      text. Of course, exhaustive searches in the database text are also an
      option.
      
      The Tksesh system is written in the Tcl/Tk language, which is freely
      available for both Windows, Mac, and Unix systems, allowing it to be
      very portable.
      
    
Introduction
    
      The system we introduce here, called TKsesh, is a multi platform system
      (it works under windows 95, Unix, and should work on macintoshes as
      well) built around a hieroglyphic editor (compatible with the manuel
      de codage), and a database engine. 
    
      We started working on it quite a long time ago, but at that time it
      was a rather secondary work. Yet, it appeared that the system was
      potentially useful. Thus we decided  to develop it further. What is
      presented here is a preliminary version of the software. We look forward
      for opinions and criticisms to improve it.
      
    
Context and goals of the system
 
    
    While working on our computer-science thesis, whose subject was
    automatic syntax analysis applied to middle Egyptian, the question
    of what was to be done with the texts we worked on was left in the
    background. However, we ended up with the idea of an integrated
    environment which would allow to store most information one reads
    and produce while working on a text, to share them, and, most
    important, to retrieve them. The system could also be a testbed
    for the Natural Language Processing systems we worked on.
      The goal of Tksesh is both to help the realisation of a complete
      database of Ancient Egyptian texts, and to be a working tool for
      its user, where one can keep notes, lexical files, and so on.
    
The components of the system
 
    
    Tksesh is currently an integrated system : a number of quite different
    elements bound together in one program. We will now proceed to
    describe these elements, starting with the editor.
    
    The editor
    The hieroglyphic editor is a central element of tksesh;  
    its use will be described here, and  information on its code will be given in the system's documentation.
    
    The editor's primary function was typing texts for databases, not
    for printing. So our primary goal was ease and speed of text
    typing, versus accuracy of printed representation.
     
    
    typing text
    As any hieroglyphic editor, tksesh allows entering the signs by a
    menu or by code. Simple sign grouping can be done by use of the
    Manuel de Codage symbols ":" and "*", which allows fast typing;
    but complex grouping is done by menu. So it is impossible to enter
    incorrect codes in the system.
    The strong point of Tksesh as far as typing goes is that it is
    tolerant about codes. When a transliteration is typed, if the sign
    is not the expected one, a press on the spacebar will propose a
    new sign. For example, if I type "mr", I'll get  . If I want the pyramid-sign (O24), I'll press
    "space" a few times, and get
. If I want the pyramid-sign (O24), I'll press
    "space" a few times, and get  . Next time I'll
    enter "mr", the system will remember it and propose O24 first.
    Even better, when the list of possible signs is exhausted, the
    system looks in the dictionary, for words
    having the said transliteration. Thus, in the present state of
    the dictionary, typing "iw" and spacebar will propose :
. Next time I'll
    enter "mr", the system will remember it and propose O24 first.
    Even better, when the list of possible signs is exhausted, the
    system looks in the dictionary, for words
    having the said transliteration. Thus, in the present state of
    the dictionary, typing "iw" and spacebar will propose :  
  
  
  
  
  
  
  
  
  
  .
.  
    Grammatical informations
    
    Supporting the grammatical codes of the Manuel de codage
    was essential for a system whose primary goal was text
    databases. However, there's a problem with the manipulation of
    these codes. The display, and in general the way a user manipulate
    the text, is cadrat-oriented. Words separations and grammatical
    separations are sign oriented. Hence we have two problems : the
    first is to design how the user will manipulate the system to add
    these informations, and the second is how the system will extract
    words and the like. 
    At the time being, grammatical markers are indicated by sign
    colors. Blue signs indicate word endings, yellow signs grammatical
    markers, and green signs word endings that are also grammatical
    markers (see Figure 1).
    
      
FIGURE 1 : WORDS ENDINGS IN LOUVRE C14 STELA
    
    Currently, the user can only change the status of the last sign of
    a cadrat, typing "/" to make it a word ending, and "=" to make it
    a grammatical ending. If the marking is done at the time the text
    is entered, this is not a problem. However, it becomes one when
    the marking is done a posteriori. In these cases, one has
    to break the cadrats and rebuild them after marking. It is not
    convenient. Next version will include a sign-by-sign navigation
    mode, which will allow to navigate one sign at a time, and thus to
    change the status of the current sign.
  
    
    
      A related problem is the difficulty to cut and paste a
	word. This possibility is highly desirable, since it would
      allow, for example, to add commands like "find the current word in
      the dictionary", enter the current word in the dictionary, and
      so on. To do this properly, we have to 
    
    
      - be able to designate the current word --- and for this, the good unit is the sign, not the cadrat ;
- be able to build a cadrat from parts of a cadrat.
This is not currently possible, but should be soon.References
 As our goal was to build an "intelligent" text
    database system, with hypertext links all over the place, we
    needed a way to refer to particular points in a text in an
    efficient way.  
    
      We thought that readable references, very much akin to those
      used while referring to paper editions, would be fine. This has
      many advantages. First, it allows the references to be used
      outside the base : we support things like "O. DM 1567
      verso, 2". 
      A second point was that long texts are not always entered from
      the first line of the first page onward. In fact, you can start
      typing an interesting part of a text, enter information in the
      base about its contents, and some time later, decide, for
      completeness's sake, to type the rest. Our current system allows
      to explicitly give the position of the current part of the text.
      For example, in Figure 2, the whole
      content of P. L2 wasn't entered. But as the first line is
      explicitly "page 2, ligne 6", references to parts of
      the text will still be exact, even if we type the first pages
      afterwards.
    
      
FIGURE 2 : EXPLICIT REFERENCES IN A TEXT
    
    
     In order to create this system while keeping our files "Manuel
    de Codage"-compliant, we used the comment system. The first lines
    of L2 look like this once saved :
    
    
      ++TKSESH DATABASE FILE+s
      ++NAME Ptahhotep,L2+s
      ++COORDS=page 2, ligne 6+s
      -i-r-wn:n-n:k\-m-s-.-sSm
    
    Our main problem now with this reference system is to make it
    really usable by the end-user. The interface to the
    reference-setting system is probably not very user-friendly. As an
    example, I give in Figure 3, a picture of
    the system for the stela Berlin 1157, from an example file of
    Winglyph. The stela has four zones : three called A, B, C; and the
    main text below, which is not designated by a letter. So the text
    is separated in "lines" (which could be columns), grouped into
    zones. The lines are simply numbered (the value NUM for coord 1),
    and the zones (usually used for pages) are separated into A, B
    etc. Hence the numbering system is A1, A2, B1, C1, 1, 2, etc.
    
    
    
      
FIGURE 3 : REFERENCE CREATING SYSTEM
    
    
    Further needs
 
    
    The main shortcomings of tksesh's editor stand in the domain of
    presentation. Improved cadrat rendering and column handling would
    be nice. More seriously, a font editor is absolutely necessary. We
    have already written one, but it runs only under UNIX, and thus
    can't be integrated in the whole system.
      
      Another interesting addition would be the support of multiple
      reference systems. It might be interesting, for example, to be
      able to chose between a reference in the original source, or a
      reference in an edition (That is, for example, between
      P. Leyde I 350 verso 13 and KRI II,813,3).
    
The dictionary
    
    Introduction
    When we decided to transform tksesh into a work environment for
    studying texts, the need for a linked dictionary arose naturally.
    In the first version of the dictionary, entries were quite simple
    : three fields : "transliteration", "spelling", "translation", the
    latter being free text. We included also the possibility to add
    hypertext references to the text database. 
    
      
      However, a real dictionary entry is something both complex and
      very structured. Entering it as free text is not a very good
      option, because the structure is lost to the computer. The human
      reader might be able to reconstruct it, the system won't. Many
      automated processing that would be possible with a well
      structured lexicon are then impossible.
    
  
      On the other hand, giving a very precise and rigid form to the
      dictionary would also be a problem, because it would force an
      artificial structure on all definitions.
    
      
      Last, but not least, the structure proposed should be
      extensible, but no extension should break existing data. 
    
      Hence the structure we are going to describe now. This structure
      is supported by the editor built in the dictionary, which
      prevents unstructured entries to be made. It allows to enter
      many different style of dictionary entries, while keeping the
      maximum amount of structure information. We tested it by
      entering definitions from GARDINER's lexicon, FAULKNER, HANNIG,
      and of P. WILSON's A Ptolemaic
	Lexicon.
    
    Structure of the dictionary
      The fields in the dictionary are of roughly three types : base
      fields, which contain one type and only one type of information
      (for example, a transliteration), complex fields, that can
      contain mixed information (for instance text in transliteration
      and hieroglyphs), and the group and comment fields.
    The group field
      The group field is the main organizational device of the
      dictionary. Groups can be nested, to represent sub-meaning of a
      words, derived words, and so on. The basic point is that if a
      group contains multiple fields of the same kind, let's say
      multiple transliterations, they are supposed to be variants. In
      the case of translations, this would mean near-synonymous
      meanings.In Figure 4, all spellings for
      ip.t are supposed to be equivalent. In a likewise case,
      the completion system described above
      should be able to propose all these writings. 
      
    
      
FIGURE 4 : REPRESENTATION OF VARIANT SPELLINGS
    
    
    
      When some important information (transliteration, spelling, for
      example) is not available in a group, it is supposed to be
      inherited from its parent group.
      Let's take, for example, the entry for Awi
      (Figure 5,). The groups (indicated by
      french quotes << >>) delimit a
      number of new entries. The first one is for the adjective-verb
      meaning be long, which has the same transliteration as
      the head word, but different determinatives. It inherits its
      transliteration, but changes the translation and the spelling.
      
      Then comes the expression "ib=f Aw". Expressions and composite
      words are a tricky problem, whose representation might need some
      improvement. At the time being, we have a number of tags that
      can be used in expressions to represent the currently defined
      word, an animate, or an inanimate. The other words might be free
      text, or explicitly transliterations of words (in which cases
      they are indexed. For instance, in the current case, a search
      for the word "ib" will retrieve this definition).
      
    
      
FIGURE 5 : COMPLEX DEFINITIONS
    
     The comment field
 
    In some cases, a dictionary
    definition can be a true little monograph on a word. In this
    case, the dictionary entry structure is not very efficient. This
    is the reason for the comment field's existence. It is there for
    anything that can't fit in a definition. Many fields can appear in
    it, like in a true little text editor.
    References
 References are hypertext links to the text
    database. They are readable by the human reader, like in the
    example on the right, taken from Amenemope.  These links are made quite
    easily, by using the "copy reference" menu option in the editor,
    and pasting the reference in the dictionary. Afterwards, a click
    on the reference will load the text at the proper place.
 These links are made quite
    easily, by using the "copy reference" menu option in the editor,
    and pasting the reference in the dictionary. Afterwards, a click
    on the reference will load the text at the proper place. 
     Signature
 An important feature for further use of the
    system will be the possibility to share texts and dictionary
    entries, and conversely to identify the author of these
    entries. The Signature field is supposed to be used for this. A
    further development would be to fill it automatically -- at least, at data exchange time.
    Indexation and search
 The system builds an index for each
    dictionary entry, into which it writes references for each
    transliteration, spelling, and translation. Any field of these
    types in dictionary entries is indexed, so even sub-definitions are entered in the database.
    
      An important practical point is that text is treated to ease the
      search. For example, the hieroglyphs are save as Gardiner code,
      no matter their original form, and only the list of signs is
      saved. So, someone looking for "p*t:pt" and typing "p:t-pt" will
      find his word. The transformation could even be improved by
      suppressing redundant phonetic complements and the like, but this
      is future work.
    
     Transliteration are also simplified for searches : all 'j' are
     made into 'i', all points suppressed, etc. Note that this is only
     made at search time. Any point entered in the dictionary entries
     will be retained.
    
Extensions
 
    
    Looking at the current state of the
    dictionary, we see a possible generalization : the same mechanism
    can be used for freer text, for example for notes and the like. So
    we intend to reuse the code for the dictionary to allow the
    edition of general notes, which will benefit from the indexation
    mechanism of the dictionary.
    
      Another interesting extension would be to add an reference
      system to parts of the dictionary. This would allow referencing
      a definition in another one. It would be very interesting in
      expression definitions, as it would allow to reference in a
      precise and explicit way the words which appear in the
      definition. 
    
The transliteration and translation editor
    This facility will be a central working point of the system once
    finished. What we present now is just a model. It allows parallel
    edition of the text translation and transliteration, which can be
    saved separately. The advantage is that this allows multiple
    translations to be edited for one text.
    
      
FIGURE 6 : TRANSLATION EDITOR
    
    In the editor, all texts are synchronized : the edited line is
    always displayed in translation, transliteration, and
    hieroglyphs. We worked a little with the model, and it seems to be
    quite suitable. We linked it with the dictionary, and it is now
    possible to look for the word selected in the hieroglyphic window.
    
    
    Search facilities
    One of the most important facility a database can provide is the
    possibility to retrieve the information it contains. So our base
    allows to find a words (given in transliteration) in the
    texts. For texts which have been manually transliterated, it
    should look in the man-made transliteration, because it's
    supposed to be accurate. But (as we'll see later), we have a
    automatic transliteration program that produce a rough
    transliteration. It's quite fast on a basic Pentium computer, and
    can be used if no time is available to write a
    transliteration. The search made in this case is not complete nor
    sure : some occurrences might be lost, and some words founds can
    be errors. Yet, it can give a fast initial working base, and it
    should improve with our transliteration system.
    
    In Figure 7, we have the result of the search for the word
    Axt, and an example of a solution. 
    
    
      
FIGURE 7 : SEARCH RESULT
    
    Natural Language processing
 
    
    The system includes (currently only in its UNIX version) a prolog
    interpretor which allows us to use natural language processing
    techniques more easily. We had sketched in [ROS94] a transliteration system. After having
    worked more on syntactic analysis problems, we had a student,
    L. KERBOUL, working on the subject .  [KER97]. As the result was interesting, we
    decided to work again on transliteration, and if possible, to end
    up with a usable system. A detailed technical description does not
    fit here; let's only say that the basic principle is still the one
    described previously, but the performances have much improved. The
    main problem now is word cutting, which is a difficult
    problem. For this, the best solution would be an interactive one,
    the system proposing word-cuttings, and the user changing them. It
    is, interestingly, the solution used by some in editing Asian
    language (an example for Thai in MCB97)
    Further developments
    The developments axis of the system will be :
    
      - improvement of the editing system
- improvement of the natural language processing system -- ultimately with the incorporation of grammatical information
- creation of  an exchange module --- to allow easy information sharing and exchange, using disks or even the net.
Conclusions
    The system I've just described is still in his infancy. Its
    foundations are however quite sound, and it works well. Now what it
    needs is users, to make it live, to propose friendlier interfaces,
    and useful additions.
    Appendix
    
    
    
    TCL and TK
    TCL/TK is a programming language developped by John
    OUSTERHOUT at Cambridge University (USA) and then
    in the SUN research department. It is a simple, powerful,
    cross-platform language, specially designed to be embedable in
    programs written in compiled languages like C or Pascal. Tksesh is
    based on a number of extensions, written in C, to TCL/TK. It runs
    under UNIX and Windows 95. Due to the flexible nature of TCL, it
    is possible to use tksesh to write little application that would
    need to display hieroglyphs.
    Availability
    The system will be available free of financial charges, as "textware":
    if the system is of some use to you, please contribute some texts. It
    would be definitly better to ask which texts are needed before sending
    them.
    
    At the time being, the software is quite young, and has had very
    few users. For this reason, I don't release it by simply putting
    it on a ftp server. You will have to register first. The details
    about obtaining the system will be available by next autumn on 
    http://www.iut.univ-paris8.fr/~rosmord/EgyptienE.html.
    References
    
    
      - BIL95 
	S. BILLET, 1995,
	
	  Apports à l'acquisition interactive de connaissances
	  contextuelles
	
	Thèse de doctorat de l'Université Montpellier II
	(another approach of automated translitteration).
      
- KER97 
	F. KERBOUL, 1997,
	
	    Translittération automatique des hiéroglyphes
	
	Rapport de stage de l'ENSTA
      
- MCB97 
	S. MEKNAVIN, 
	P. CHAREONPORNSAWAT and
	B. KIJSIRIKUL, 
	1997,
	
	  Feature-based Thai Word Segmentation
	
	In Natural
	Language Processing Pacific Rim Symposium 1997, Phuket,
	Thailand
      
- ROS94 
	S. ROSMORDUC, 1996,
	
	    Traitement automatique du langage naturel en moyen égyptien 
	  
	In Robert  VERGNIEUX, editor, 
	  Xieme conférence Informatique et Égyptologie
	  
      
- ROS96 
	S. ROSMORDUC, 1996,
	Analyse morpho-syntaxique de textes non ponctués
	Thèse de doctorat, École normale supérieure de Cachan,
      
Serge Rosmorduc