These pages are devoted to the discussion of a new Manuel de Codage for Egyptian hieroglyphic texts in XML. We want to end up with a decent, extensible, clean standard that would be usable by all programs that deal with hieroglyphs. Contributions are welcome.
To contribute, subscribe to hieroxml (send a mail to hieroxml-request@iut.univ-paris8.fr with subject "subscribe" or "subscribe you@your.address").
You can get help for hieroxml simply by mailing a request for help (subject should be "help").

Temporary document. Date: Thu Mar 2 13:19:25 CET 2000
But the Manuel is getting old. For quite a long time now, nobody has really been using the Manuel as such. Instead, people are using extensions proposed by the different hieroglyphic typesetting systems, like Winglyph or Macscribe. Their formats are extensions of the old Manuel. They are needed to address fine typographical points, like sign positioning, or to correct a number of weak points in the original Manuel, like hatching.
This would be fine if these extensions were compatible with one another, which is not currently the case (but should be fixed in the next version), and if the Manuel were easily and logically extensible, which is not really the case. The first problem is a serious one, even if it gets fixed one day: it makes program development difficult if one has compatibility in mind.
There's one thing the current Manuel is fine for: hand-made encoding. The Manuel allows a rather terse representation of a text, and for simple things, that's OK. For communicating a simple hieroglyphic text by way of ASCII codes, you can't compete with the Manuel. It's a strength, but also a weakness. Hand encoding means that errors are made. It's like writing a computer program: here and there, you'll forget something. The problem is well known for web pages: most HTML code on the web is broken. Web browsers have to include error-correcting systems to deal with these broken pages.
This causes two problems: first, it's difficult to write good error correction. Second, with a good formalization, it's quite likely that two different programs will have the same idea of the "meaning" of the same correct text. But for badly coded text, such an agreement is impossible to achieve.
So, if we want to have an encoding of hieroglyphic texts which
XML was chosen for a number of reasons. First, it's easy to extend an XML format. Second, it's easy to parse an XML file, and there are a lot of tools for it: people will be able to manipulate XMLMCD files without being graduates in computer science. Third, XML is being used for a growing number of applications, for instance web browsers. Fourth, there's a user community for XML in the philological world: two interesting examples are the Text Encoding Initiative and the recent conference on XML and Ancient Near East.
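To make the second point concrete, here is a minimal sketch of how a stock XML parser separates well-formed input from broken input, which is exactly the agreement that hand-coded texts lack. It uses Python's standard `xml.etree.ElementTree` module. The element names `text` and `sign` and the helper `is_well_formed` are hypothetical and serve only as illustration; they are not part of any proposal.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if a standard XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical element names, used only for illustration.
GOOD = '<text><sign code="A1"/><sign code="i"/></text>'
# Same content, but the first <sign> element is never closed.
BAD = '<text><sign code="A1"><sign code="i"/></text>'

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Any conforming parser has to reject the second string, so two programs cannot silently disagree about what it means.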
Let's illustrate these points. In the current MCD, data about an individual sign is scattered around it. Consider for example:
=A1\\r1 -i

It means "Sign Gardiner A1", as both grammatical and word ending, reversed, rotated. Fine positional data, colour data, and more are hard to add. On the other hand, the current proposal would represent the same sequence as:
<hieroglyph code="A1" gramend="y" wordend="y" rot="90" reversed="y"> <hieroglyph code="i">

Of course, it's much longer. But the format is not supposed to be directly manipulated by humans, so it's not a real issue. The important point is that it's possible to add data to the signs without breaking the whole encoding.
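The extensibility claim can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not part of the proposal: the fragment above shows no closing tags, so the signs are written here as empty-element tags, and they are wrapped in a hypothetical `<text>` root element. The `colour` attribute and the helper `sign_codes` are also invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: signs are empty-element tags inside a hypothetical <text> root.
ORIGINAL = """<text>
  <hieroglyph code="A1" gramend="y" wordend="y" rot="90" reversed="y"/>
  <hieroglyph code="i"/>
</text>"""

# Same text, with a hypothetical colour attribute added to the first sign.
EXTENDED = ORIGINAL.replace('reversed="y"', 'reversed="y" colour="red"')


def sign_codes(document):
    """Return the sign codes in document order, ignoring all other attributes."""
    root = ET.fromstring(document)
    return [sign.get("code") for sign in root.iter("hieroglyph")]


print(sign_codes(ORIGINAL))
print(sign_codes(EXTENDED))
```

A reader that only looks at `code` gives the same answer for both versions, while a newer program can still pick up the added attribute. With the positional syntax of the old Manuel, a new marker next to the sign code would have to be understood by every parser.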
In particular, Hans van den Berg, from the CCER, has also created a format for Winglyph 2.0. He presented it at the conference mentioned above.
The goal is not to propose two standards, but to start a dynamic to improve the possible ones.
In any case, it seems difficult to document a scene without stating which text refers to which scene element. A standard way of referencing would also be interesting here.
The point is that the representation of hieroglyphic text should not follow this principle. What we want to represent first is the original document. Of course, it's not possible to achieve a completely faithful electronic representation (in this case, a scan would be better).
On XML, the best starting points are the W3C and OASIS sites.