Preliminary Proposal to Encode the Old Sogdian Script in Unicode

Source: Script Encoding Initiative (SEI)
Author: Anshuman Pandey
Date: 14 April 2015

1 Introduction
This is a preliminary proposal to encode the ‘Old Sogdian’ script in Unicode. A discussion and typology of the various ‘Sogdian’ scripts and the requirements for encoding Old Sogdian and its descendants is provided in the forthcoming “Roadmap for Encoding Sogdian Scripts in Unicode”.
This document provides a brief description of Old Sogdian, its character repertoire, and specimens of the script. The code points shown in the code chart and names list are based upon the current allocation for a ‘Sogdian’ script in the Roadmap to the SMP; they are only tentative and the assignments may change. The representative glyphs are also illustrative and are not intended to be normative or typographically aesthetic. Provisional character properties are also included.
The proposal author seeks feedback from scholars regarding the proposed characters and representative glyphs. Issues requiring further discussion are enumerated in section 6. The information presented here may be incomplete and may change as more information on the script is obtained. Research on Old Sogdian is ongoing and a formal proposal to encode it is forthcoming.

2 Background
The Old Sogdian script appears in manuscripts and inscriptions dated between the 4th and 7th centuries CE. The earliest manuscripts containing the script are known as the ‘Sogdian Ancient Letters’ (see figures 1–5). These paper documents were found in 1907 by Aurel Stein in Dunhuang, now in Gansu province, western China. Based upon internal evidence, it has been suggested that the ‘Ancient Letters’ were written in 312–313 CE (Sims-Williams 1985). A script similar to that used in the ‘Ancient Letters’ appears on hundreds of rock carvings in the Gilgit region of Pakistan. These ‘Upper Indus graffiti’ have been dated to the 4th–7th centuries CE (Sims-Williams 1989, 2000; see figures 6, 7).

3 Script Details
3.1 Structure
Old Sogdian is an abjad written from right to left. Letters retain their basic shapes in different positions within a word, but a few letters have distinctive word-final shapes. Structurally, therefore, Old Sogdian is a non-joining abjad, similar to Hebrew. The available sources show instances where letters are connected, but such connections result from the natural flow of writing or from cursive habits; unlike Arabic, Mongolian, etc., joining is not an intrinsic property of the script. As in other abjads, vowels in Old Sogdian are represented using aleph, yodh, and waw.
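To make the contrast with joining scripts concrete, the following is a minimal Python sketch of positional form selection in a non-joining abjad. It assumes each word is stored in logical order as a list of letter names. The paragraph above does not say which letters have word-final shapes, so the set used here (`{"NUN"}`) is a hypothetical placeholder, and the function name `positional_forms` is likewise illustrative.

```python
# Minimal sketch of positional form selection for a non-joining abjad.
# Words are lists of letter names in logical (reading) order; right-to-left
# display is left to the renderer and is not modeled here.


def positional_forms(word: list[str], final_form_letters: set[str]) -> list[tuple[str, str]]:
    """Tag each letter of one word as 'nominal' or 'final'.

    A joining script such as Arabic distinguishes isolated, initial,
    medial, and final forms. A non-joining abjad needs only two cases:
    a letter keeps its basic ('nominal') shape unless it ends the word
    and belongs to the set of letters with a distinct word-final shape.
    """
    last = len(word) - 1
    return [
        (letter, "final" if i == last and letter in final_form_letters else "nominal")
        for i, letter in enumerate(word)
    ]


# HYPOTHETICAL set of letters with word-final shapes, for illustration only.
EXAMPLE_FINAL_FORM_LETTERS = {"NUN"}

print(positional_forms(["MEM", "NUN"], EXAMPLE_FINAL_FORM_LETTERS))
# [('MEM', 'nominal'), ('NUN', 'final')]
print(positional_forms(["NUN", "MEM"], EXAMPLE_FINAL_FORM_LETTERS))
# [('NUN', 'nominal'), ('MEM', 'nominal')]
```

Because there are no initial or medial forms, the only contextual decision is whether a letter ends its word, which is why a simple per-word pass suffices.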

3.2 Script name
There is no standard name for the script of the ‘Ancient Letters’. The catalogue of the International Dunhuang Project at the British Library refers to it as “Sogdian” and does not differentiate between the varieties of the script grouped under this designation. Skjærvø (1996) refers to the script as “Sogdian Aramaic”. The tentative identifier for the script block in Unicode is ‘Old Sogdian’. This name was chosen because it differentiates the script from the ‘Sogdian’ script proper.

3.3 Character names
The names of Old Sogdian characters are tentatively based upon analogous Unicode names for letters of the ‘Imperial Aramaic’ block. The sort order is identical to the encoding order.

3.4 Character repertoire
The repertoire of Old Sogdian letters is based upon that of Aramaic, but it contains 20 letters as opposed to the 22 of Aramaic. It lacks distinctive letters corresponding to Aramaic teth and qoph. The actual number of distinctive letters may be even smaller, since daleth, ayin, and resh may be represented by a single letterform; the same may apply to zayin and nun.
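The naming, ordering, and counting described in sections 3.3 and 3.4 can be checked with a short Python sketch. It derives the 20 letters by removing teth and qoph from the 22 Unicode Imperial Aramaic letter names, builds names under an assumed `OLD SOGDIAN LETTER <name>` pattern, uses each letter's position in the repertoire as its sort key (since sort order equals encoding order), and counts the minimum number of distinct shapes if the possible mergers all hold. The helper names (`char_name`, `sort_key`, `minimum_distinct_shapes`) are hypothetical, and the mergers are treated as unconfirmed possibilities, not as findings.

```python
# Unicode Imperial Aramaic letter names, in encoding order.
IMPERIAL_ARAMAIC_LETTERS = [
    "ALEPH", "BETH", "GIMEL", "DALETH", "HE", "WAW", "ZAYIN", "HETH",
    "TETH", "YODH", "KAPH", "LAMEDH", "MEM", "NUN", "SAMEKH", "AYIN",
    "PE", "SADHE", "QOPH", "RESH", "SHIN", "TAW",
]

# Old Sogdian has no letters corresponding to teth and qoph.
ABSENT = {"TETH", "QOPH"}
OLD_SOGDIAN_LETTERS = [n for n in IMPERIAL_ARAMAIC_LETTERS if n not in ABSENT]


def char_name(letter: str) -> str:
    # ASSUMED naming pattern, modeled on "IMPERIAL ARAMAIC LETTER ALEPH".
    return f"OLD SOGDIAN LETTER {letter}"


# Sort order equals encoding order, so a word's sort key is the sequence
# of its letters' positions in the repertoire (compared lexicographically).
RANK = {letter: i for i, letter in enumerate(OLD_SOGDIAN_LETTERS)}


def sort_key(word: list[str]) -> tuple[int, ...]:
    return tuple(RANK[letter] for letter in word)


# Possible shape mergers mentioned in the text; NOT confirmed.
POSSIBLE_MERGERS = [{"DALETH", "AYIN", "RESH"}, {"ZAYIN", "NUN"}]


def minimum_distinct_shapes(letters: list[str], mergers: list[set[str]]) -> int:
    merged = set().union(*mergers)
    unmerged = [letter for letter in letters if letter not in merged]
    # Each merger group collapses to a single shape.
    return len(unmerged) + len(mergers)


print(len(OLD_SOGDIAN_LETTERS))  # 20
print(minimum_distinct_shapes(OLD_SOGDIAN_LETTERS, POSSIBLE_MERGERS))  # 17
```

Under these assumptions the repertoire has 20 letters but as few as 17 distinct shapes: the 15 letters outside the merger groups, plus one shape for each of the two groups.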
Old Sogdian glyph shapes are quite uniform, as observed in the ‘Ancient Letters’. The glyphs clearly reveal their Aramaic origins, but several changes in letter shapes are noticeable between the two scripts (see table 5). Some Old Sogdian glyphs resemble Parthian ones, but in the two scripts these similar shapes represent different letters.
Distinctive numerical signs are attested for one, ten, twenty, thirty, one hundred, and the fraction one-half. An Aramaic heterogram is used for thousands, and ten thousand is expressed using number words.
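The paragraph above lists the attested number signs but not the rules for combining them. As a purely illustrative sketch, the Python below assumes an additive, largest-first composition and decomposes values below one hundred into the signs for thirty, twenty, ten, and one, plus an optional one-half. Both the composition rule and the names used (`number_signs`, the sign labels) are hypothetical; hundreds, thousands (written with a heterogram), and ten thousands (written as number words) are deliberately left out.

```python
# HYPOTHETICAL sketch: assumes an additive, largest-first composition of
# the attested signs. Only values below one hundred are handled.
TENS_SIGNS = [(30, "THIRTY"), (20, "TWENTY"), (10, "TEN")]


def number_signs(value: int, half: bool = False) -> list[str]:
    """Return sign names in logical order for value (0-99) plus an optional half."""
    if not 0 <= value < 100 or (value == 0 and not half):
        raise ValueError("this sketch covers only 1/2 through 99 1/2")
    signs: list[str] = []
    remaining = value
    # Greedy choice (an assumption): use the largest tens sign that fits.
    for amount, name in TENS_SIGNS:
        while remaining >= amount:
            signs.append(name)
            remaining -= amount
    # Units are written by repeating the sign for one (also assumed).
    signs.extend(["ONE"] * remaining)
    if half:
        signs.append("ONE HALF")
    return signs


print(number_signs(57))
print(number_signs(10, half=True))  # ['TEN', 'ONE HALF']
```

The greedy preference for thirty is itself an assumption: under it, 40 becomes thirty plus ten, whereas scribal practice could equally favor twenty plus twenty. Only the specimens can settle such questions.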
Punctuation marks are not used in Old Sogdian.


