Acta Lingweenie

Monday, June 6, 2011

Kahtsaai: Devising a Practical Orthography

All of my conlangs, past and present, are written in the Latin alphabet. I have from time to time tried my hand at inventing scripts, but the results are never satisfying. One of the first attractions to me about foreign languages was not the languages themselves, but the writing systems. I gave myself an intense early education in calligraphy in several scripts, which makes me a harsh judge of invented writing systems. I rarely find a conscript beautiful, or at least harmonious, and this applies doubly or triply to my own. So, I'm stuck with Latin.

All my early languages aimed at a phonetic representation. Thus I was rather shocked the first time I encountered Dirk Elzinga's wonderful Tepa, which spells things like [tuɣu] as tuku and [yɨška] as yɨyka. But now that I've spent a lot more time staring at Native American languages (including plenty in the Uto-Aztecan family, which seems to be the inspiration for Tepa), I've come to appreciate phonemic writing systems a lot more. Changes in my habits of language construction drive this somewhat, too. So, here's an account of some of the considerations that went into settling on the Latin orthography for Kahtsaai.

The Vowels

Here's the Kahtsaai vowel inventory:

i [i]	ii [iː]
e [ɛ]	ei [eː]			o [o] [ʊ]	ou [uː]
		a [a]	aa [aː]
			aai [aːɪ]

The first issue I had to deal with is tone. I'm very fond of tonal languages — more fond than typology would warrant — but there it is. The only practical way to indicate tone is with diacritics.¹ Since I stick with simple two- or three-tone systems, this is easy. In a two-tone system I use á for a high tone and no accent for low, and for a three-tone system á high, a mid and à low.

However, once I decide to use tone, I'm only really left with one option for long vowels, something else I'm fond of. In a non-tonal language, I use the acute accent for a long vowel. But, since I've already grabbed that diacritic for tone in Kahtsaai, I simply write the vowel twice to indicate length, a and aa, etc. (In the ancient times of ASCII-only terminals, that's how I always wrote long vowels.) In theory I could combine diacritics, and put accent marks above macrons, but I find that difficult to read and a real pain to write legibly or type. In Kahtsaai, each mora of a long vowel may have its own tone, leading to tone contours on long vowels: káar "to save, to preserve" has a falling pitch.
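The mora-by-mora tone reading can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the two-tone convention (acute for high, bare vowel for low), treats every vowel letter as one mora, and is only meant for a word with a single long vowel. The function names and every contour label other than "falling" are mine, not part of the language description.

```python
import unicodedata

ACUTE = "\u0301"       # combining acute accent, read here as high tone
VOWELS = set("aeiou")

# Labels other than "falling" are illustrative assumptions.
CONTOUR_NAMES = {"HL": "falling", "LH": "rising", "HH": "high", "LL": "low"}

def mora_tones(word):
    """Return one H or L per vowel letter, treating each letter as a mora."""
    decomposed = unicodedata.normalize("NFD", word)
    tones = []
    for i, ch in enumerate(decomposed):
        if ch in VOWELS:
            # After NFD, an accent follows the vowel it belongs to.
            is_high = decomposed[i + 1 : i + 2] == ACUTE
            tones.append("H" if is_high else "L")
    return "".join(tones)

def contour(word):
    """Name the tone pattern of a word with one long vowel."""
    pattern = mora_tones(word)
    return CONTOUR_NAMES.get(pattern, pattern)

print(contour("k\u00e1ar"))  # káar
```

Normalizing to NFD first means the sketch does not care whether the accented vowel was typed precomposed or as a letter plus combining accent.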

You will also note that the mid vowels aren't marked long in the same way. Phonemically, e and ei are just short and long versions of each other, but there was such a significant quality change that I decided to write them differently. This spelling also works out well in the phonological processes of the language. Noun stems that end in vowels lose a single mora at the end when they are incorporated. So, the noun kopi "water" becomes just kop- when incorporated, and éi "tree" has the incorporation form é-. This pattern also motivates the spelling of the single, long diphthong as aai. When final, the moraic reduction results in -aa, as in taraa- from taraai "health, condition, status, weather". I think the switch from aai to aa conceals the stem less than a spelling change from ai to aa. The extra reminder that this is a long vowel diphthong doesn't hurt, either.
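Because the orthography writes each mora of a long vowel or diphthong as its own letter, the regular moraic reduction amounts to dropping the last vowel letter. Here is a minimal Python sketch of that idea. The function name is hypothetical, it covers only the regular pattern, and it knows nothing about nouns that have their own special incorporation stems.

```python
import unicodedata

VOWELS = set("aeiou")

def incorporation_stem(noun):
    """Drop the final mora (last vowel letter) of a vowel-final noun."""
    chars = unicodedata.normalize("NFD", noun)
    # Step back over any tone accents attached to the final letter.
    end = len(chars)
    while end > 0 and unicodedata.combining(chars[end - 1]):
        end -= 1
    if end > 0 and chars[end - 1] in VOWELS:
        # Remove the final vowel letter together with its accent, if any.
        chars = chars[: end - 1]
    return unicodedata.normalize("NFC", chars) + "-"

for noun in ["kopi", "\u00e9i", "taraai"]:
    print(noun, "->", incorporation_stem(noun))
```

Consonant-final nouns pass through unchanged apart from the trailing hyphen.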

Finally, the phoneme /o/ has two realizations. In open syllables it is [o], in closed syllables it is [ʊ]. The morphology of Kahtsaai ensures that underlying /o/ in a single root presents itself in both shapes frequently. For example, using the verb -wo "to eat", te'ewo "I ate it" has no evidential due to the first person subject, and is pronounced [tɛ.ʔɛ.wo]. With the direct evidential, -ts, we get yonwots "she ate it" [jʊn.wʊts].
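The open versus closed syllable rule for /o/ is simple enough to sketch in Python. This sketch assumes the input is already divided into syllables with dots (it does no syllabification of its own), leaves the digraph ou alone because that spells the long vowel, and ignores tone accents. It changes only o, so the output is a half-transcription rather than full IPA. The function name is hypothetical.

```python
import re

VOWELS = "aeiou"

def realize_o(syllabified):
    """Rewrite short o as ʊ in closed syllables; open syllables keep o."""
    out = []
    for syl in syllabified.split("."):
        closed = bool(syl) and syl[-1] not in VOWELS
        if closed:
            # The lookahead skips the digraph ou, which spells long [uː].
            syl = re.sub(r"o(?!u)", "ʊ", syl)
        out.append(syl)
    return ".".join(out)

print(realize_o("te.'e.wo"))
print(realize_o("yon.wots"))
```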

The Consonants

The consonants of Kahtsaai are much simpler. I decided not to follow the Americanist tradition of spelling /ts/ as "c", and just use ts. At morpheme boundaries t + s results in tss, so no ambiguity about stem boundaries arises from using this digraph. Since Kahtsaai allows coda stops, this could have become a minor problem.

Before voiced resonants (l r) or glides (w y) the stops (which include ts for this discussion) are pronounced voiced. This change is not represented in the practical orthography: [kid.ɾa] "to tame, subdue" is spelled kitra. Again, this choice is motivated by not wanting the basic stem to be concealed in writing every time a new morpheme is added. Besides, the change is 100% predictable.
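Since the voicing is fully predictable, it can be derived from the spelling mechanically. The Python sketch below shows one way to do that. Only t becoming [d] is attested in the example above. The other pairings (p to b, k to g, ts to dz) are my assumption about what "voiced" means for those stops, and the function name is hypothetical. The output is a rough respelling, not IPA.

```python
import re

# Only t -> d is attested in the post; the other pairs are assumed.
VOICED = {"ts": "dz", "p": "b", "t": "d", "k": "g"}

def surface_voicing(spelling):
    """Voice a stop (or ts) when it directly precedes l, r, w or y."""
    return re.sub(
        r"(ts|p|t|k)(?=[lrwy])",
        lambda match: VOICED[match.group(1)],
        spelling,
    )

print(surface_voicing("kitra"))
```

Listing ts first in the pattern keeps the digraph together, so it is voiced as a unit rather than as a t followed by s.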

_____
¹ Ok, some languages use what look like coda consonants to mark tone instead of actual syllable codas. Hmong, especially, comes to mind. But I tend to favor moderately complex syllables, with actual coda consonants, so that could get very confusing.

Thursday, June 2, 2011

A Little Kahtsaai

I've been churning through sketches and modifications in the last year, resulting in the current rather full language, Kahtsaai. A lot of the work is based on Bixwá, which in turn was the outcome of several sketches. It became clear that Bixwá was getting cognitively unwieldy for my purposes, so I stepped back. I generalized some of the ideas a bit. In particular, I ditched the instrumental prefixes in favor of full-on noun incorporation, with instrumental meaning as one of its available uses (Mithun's type IV NI). This cleaned things up a bit.

I dropped case marking altogether, with one marginal exception. Semantically inanimate nouns are marked when they are the subject of a transitive verb. The verb subject prefix for an inanimate noun is also different. So, in both case marking and verb conjugation, inanimates follow an ergative alignment (mostly), while animates are nominative-accusative:

he-nop
3IN-fall.over
it fell over

kí-tá-nop-im
3IN.TRANS-1SG-fall.over-CAUS
it knocked me over

The language is far enough along that I can complain about the recent weather and environmental conditions:

Ááni	tá-wime	he-tsaaiki-koh	to'pe-yo-se'á
lately	1SG-eye	3IN-itch-INST.APPL	spruce-LNK-wind
lately my eyes have been itching from allergies

Noun-noun compounds have a link syllable joining elements (an idea probably most recently inspired by Coast Tsimshian). Incorporated nouns are abbreviated in various ways, most of them regularly, but a few have particular incorporation stems. So, I could have rephrased things a bit:

Ááni	tei-wim-tsaaiki-koh	to'pe-yo-se'á
lately	1SG-eye-itch-INST.APPL	spruce-LNK-wind
lately my eyes have been itching from allergies

Notice that the incorporated noun, wime, has been reduced to just wim-. You will also see that Kahtsaai has an instrumental applicative to bring in a new argument. There is also a benefactive applicative, as well as a fossilized locative applicative that is not freely productive.

So far I have omitted evidentials, which are usually marked:

tówaar	mós	heweitaraa'ánméín
tówaar	mós	he-wei-taraai-án-mé-n
meanwhile	tomorrow	3IN-very-state-hot-FUT-EVID
it's supposed to be very hot tomorrow

Here we have a hearsay evidential, somewhat merged with the future marker (Kahtsaai is usually aspect-obsessed, not marking tense except for the future). The discourse particle tówaar marks a discourse break, especially a change in topic.

Sunday, May 15, 2011

Addicted to Dependency-marking

The cycle of revisions I've been working on in the last year and a half or so is winding down to a fixed set of features that I really like. But I have found there's one thing I've had a hard time giving up: case marking. A hefty chunk of what I'm aiming for is inspired by various areal features of North American native languages, where case marking (and dependency marking in general) is not exactly common.

Removing cases gives me deep anxieties, even though I know intellectually a language is perfectly capable of working fine without them, even if you have a nonconfigurational syntax. I spent part of today working through the behavior of applicatives, and have finally reassured myself that multiple objects without overt marking can work just fine. Thinking about reasonable discourse situations, rather than concocted grammar puzzles of the sort one finds in old Latin textbooks, is a better guide to where real ambiguities can arise.

Monday, March 7, 2011

Wardwesân

In January of this year a Frenchman working under the pseudonym Frédéric Werst had a book published, Ward : Ier-IIe siècle. I first heard about it at Language Hat. It's a historical anthology of the Wards, an invented people, in an invented language, Wardwesân. I figured my library of conlang grammars (Láadan, Esperanto, Klingon) could use some Gallic company, so I ordered the book. It finally arrived, and I thought I'd give a quick overview of the language. I haven't yet had time to read much of the main body of the text (my French is rusty, and literary French is much harder going for me than technical French), but I'll grab a few examples from it for analysis.

Sound System

The vowels are a, ā, e, ē, o, ō, i, y and u. Long ā has regional variants, pronounced either [œ] or, apparently, as a long /a/ (ou comme un â français très marqué, "or like a very marked French â"). For the /e/ and /o/ sounds, the ones with macrons are tense and those without are lax: e [ɛ], ē [e], o [ɔ], ō [o]. The y is IPA [y], but is also usable as a consonant, [j]. The diphthong ae is [aɪ].

These consonants are as in French: b, d, j, k, l, m, n, p, ph, s, t, z; g is always hard, [g], r is "always rolled;" sh is [ʃ], kh is [x], gh is [ɣ], th is [θ], zh is [ð] (really!), x is [ts] and xh is [tʃ], q is [q], jh is the ich-laut, [ç].

Word accent is always on the first syllable.

There is no discussion of phonotactics or phrase accenting.

The Noun

Much loving attention was devoted to the morphology of the noun.

Most nouns do not have overt gender marking; inanimates never do, animates may, -a for masculine, -e for feminine: westa "king", weste "queen." There are some other minor patterns.

Plurals got quite a lot of work, though not all nouns get plural forms.

First, there is the common class of internal plurals which show some vowel change: gan "night" gaen, mazira "pheasant" mazōra. These seem quite common.

Second, some may have an internal vowel alteration, with or without additional change: thanor "pond" thōnar.

Third, some take a prefix al- or ar-: karz "child" alkarz. This may also involve vowel changes: barw "name" arbyrw.

Fourth, some take suffixes or other word-final alterations: rame "sister" rameth.

Fifth, nouns in -ael may form a plural in -aldon: zagael "young man" zagaldon.

Sixth, nouns starting in o- often have a plural in wo-: ora "plant" wora.

Finally, many nouns, abstract nouns especially, have no proper plural at all. But, if one is really needed, the postpositive particle amōn may be used.

Werst had fun with these many plural forms, with a number of lexical items with the same singular having different plurals: gem, gemazhan "cheek" (actually an example of the dual, -zhan) but gem, argym "cloud."

Vowel alterations are not only used for marking the plural. Agent nouns will convert an /a/ in the first syllable to /e/, and an /e/ in the first syllable to /o/: merwān "to manufacture" morwa "author."

There appear not to be adjectives, but nouns of qualification, which are joined to nouns with the appositive particle ab (which reminds me a lot of Persian ezafé). So, mega "something new", but mega ab magha "new god".

The Verb

The verb system is rather simple. There is a small collection of co-verbs which encode person and time, and participles which encode number. There are three active participles, and these are mixed and matched with co-verbs to produce quite an array of tense forms.

The co-verbs are regular: present wena (1st person), wega (2nd person), weza (3rd person masc.) and wetha (3rd person fem.), with er- replacing we- for the perfect, me- for the imperfect and wa- for the subjunctive.

The participles are -an (pl. -anōn) for present, -azan (pl. -azanōn) for past and -agō (pl. -agōn) for future.

A full present-tense conjugation for arbān "to write":

	Singular	Plural
1st	wena arban	wena arbanōn
2nd	wega arban	wega arbanōn
3m.	weza arban	weza arbanōn
3f.	wetha arban	wetha arbanōn
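The mix-and-match design of co-verb plus participle lends itself to a small generator. The following Python sketch builds forms from the pieces listed above. It takes the bare stem (arb-) as input rather than deriving it from the infinitive, and it will happily produce any co-verb and participle pairing without knowing which ones the grammar actually uses or what they mean. All names in the code are hypothetical.

```python
# Co-verb prefixes by series, and person endings.
COVERB_PREFIX = {"present": "we", "perfect": "er", "imperfect": "me", "subjunctive": "wa"}
PERSON_ENDING = {"1": "na", "2": "ga", "3m": "za", "3f": "tha"}

# Participle endings as (singular, plural) pairs.
PARTICIPLE_ENDING = {
    "present": ("an", "anōn"),
    "past": ("azan", "azanōn"),
    "future": ("agō", "agōn"),
}

def coverb(series, person):
    """Build a co-verb, which carries person and series."""
    return COVERB_PREFIX[series] + PERSON_ENDING[person]

def participle(stem, time, plural=False):
    """Build an active participle, which carries number."""
    singular, plural_form = PARTICIPLE_ENDING[time]
    return stem + (plural_form if plural else singular)

def verb_phrase(stem, series, person, time, plural=False):
    return f"{coverb(series, person)} {participle(stem, time, plural)}"

# Reproduce the table for the stem arb-.
for person in PERSON_ENDING:
    singular = verb_phrase("arb", "present", person, "present")
    plural = verb_phrase("arb", "present", person, "present", plural=True)
    print(person, singular, plural, sep="\t")
```

Swapping the series or participle argument gives the other combinations, for example a perfect co-verb with the past participle.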

This strikes me as rather regular, but an interesting way to split the load. There is, however, a small number of verbs which have irregular — and fully conjugated — perfects (which I will not give here).

There is no passive conjugation, though there are passive participles in -ēnd, which are joined to their noun with ab: kamazh ab arbēnd "a written book."

There is a "gnomic" tense, which ends in -aoth. It has neither tense nor number, and indicates general activities, arbaoth "one writes, people write, it is usual to write." The ending may alter the final consonant of some stems.

The imperative is in -ax (pl. -axō): jarān "to come" jarax, jaraxō.

Prepositions

Prepositions have two forms, strong and weak. It is easiest to say that the weak forms are used whenever the governed noun is modified by another noun, and the strong in all other situations. For example (in order of weak then strong) az, azōn "with". First, strong forms:

azōn yarn "with a friend"

azōn yarn nēs "with my friend"

azōn yarn ab Xamōn "with (my) friend Xamôn"

Weak forms:

az Xamōn yarn "with Xamôn's friend"

az warma yarn "with a sick friend"

Note especially the last form: it causes the qualifying noun to act more like what we expect with an adjective, rather than the warma ab yarn we'd otherwise expect.

Particles

Indeclinable particles do a number of syntactic jobs. As we have seen above, there is the appositive particle ab. The genitive relation is handled with tha, as in barzha tha qaman "the destruction (qaman) of the building."

The particle zha means something like "which has," which can also be used to create the effect of an adjective, mazaraon zha kazhar "a certain dating" (kazhar "certitude").

Various combinations and particles are used for verb aspect (which I'll omit here).

Et Cetera

There are several particle combinations with na which combine evidentiality and judgement, na zant for opinion from experience ((se) dire), na qant to express certainty, etc. (these are a little too regular for my taste, all in C-ant).

There are several kinds of article and demonstratives.

The grammar has a good (though not huge) section on the syntax of the language, which I will omit here.

Small Examples

A sentence grabbed at random from the text part of the book.

Na warzawēr nama aw aexeth ren ab arkan em aw bamastan baratha ab eman. (Il est évident à tout le monde que l'oeil a pour fonction de voir, et pour utilité de regarder, p. 215; "It is evident to everyone that the eye's function is to see and its use is to look.")

na, nāz, "in the function of; according to"

warzawēr "all, everyone"

nama "evidence, spectacle"

aw (also ā, ō) aspectual particle, quant à

aexeth "use, utility"

ren "act, action"

arkan "vision"

em, emzhan "eye"

bamastan "utility, service"

baratha "action, movement"

emān "to see," here in the present participle.

More interesting, for the verse of the Wards he picked a system of assonance and alliteration (a few lines, untranslated):

dura meth math derw mez denan

ukan gōn garth ag urben ganta danagh

dwan jaen jar jarga darnan

zeman nāz naen nāz naba zeran...

Monday, February 14, 2011

A cute little word

I have in the last few months been spending more time working on natural languages. In particular, a treasure trove of documents on Uto-Aztecan languages has been interesting. Unfortunately, the English translation of Michel Launey's "Introduction to Classical Nahuatl" keeps having its release date pushed back. I look forward to getting my hands on that. Most Nahuatl textbooks in English currently available make the old-fashioned philological approach I learned Greek with look like progressive language pedagogy.

As an experiment, I started working on a new language back in October, not using my normal methods of notebook-then-webpage, but straight into LaTeX. The results certainly look more impressive once the major sections start to fill out. The cross-referencing is nicer, too. I think I need to work on a few more style tweaking macros. In any case, the language, Tsariku, started off as a cross between inspiration from Uto-Aztecan languages and ancient Greek. However, it has evolved somewhat from there. I realized last week that I had snuck in a variant on split-ergativity, with the split working along animacy. Inanimate subjects of transitive verbs get a case marker, -s, but are unmarked as the direct object of a transitive verb or the subject of an intransitive.

aiku-s tsi-nepá-n
this-ERG 3IN-hurt-1SG
This hurt me.

aiku ni-nepá-h
this 1SG-hurt-3IN
I hurt this.

sé tsi-lemya aiku
not 3IN-function this
This didn't work.

Note that the conjugation, obligatory for both subject and object in transitive verbs, is still nominative-accusative alignment.
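The case-marking half of the split can be stated as a tiny function. This Python sketch covers only inanimate nouns, since that is all the description above specifies. The role labels A, S and O (transitive subject, intransitive subject, object) and the function name are my own shorthand.

```python
def mark_inanimate(noun, role):
    """Add -s to an inanimate noun only when it is a transitive subject.

    role: "A" = transitive subject, "S" = intransitive subject, "O" = object.
    """
    if role not in {"A", "S", "O"}:
        raise ValueError(f"unknown role: {role!r}")
    return noun + "-s" if role == "A" else noun

for role in ("A", "S", "O"):
    print(role, mark_inanimate("aiku", role))
```

The S and O roles come out identical and A stands apart, which is the ergative pattern the three example sentences show for aiku.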

In any case, a tasty little tidbit of vocabulary. A noun recently concocted is kwehtsa, "fear and uncertainty in response to sudden and uncertain social or political developments". This is less interesting than a recent compound, kwehtsulatú, "the sudden hush that comes over a conversation when an unexpected person approaches because one is uncertain of their loyalties".

Tuesday, October 19, 2010

If Zamenhof had been Cree

In the last few days there have been a few posts on the conlang-l list about conlangs based on moribund or dead languages. Since Native American languages were named, it reminded me of thoughts that have rolled around in my head from time to time over the last year or so.

I don't really deal in auxlangs, but in a moment of musing it occurred to me that the only real chance one has of being widely adopted in the U.S. is if Native Americans one day get sick of conducting their inter-tribal business in English and decide they need something else. No existing Native language would probably really work, for several reasons. First, an auxlang should be a lot easier than a natural language to learn, and there is not a single Native language easy enough to reasonably fill the auxlang role. Second, there would be political problems: there are standing tensions between some tribes, some of which go back very far indeed. For example, there are probably very few Hopi or Navajo who would be willing to learn each other's language (you can google their land dispute on your own).

I'm not actually going to concoct a North American Native auxlang, but I offer here some of my thinking about how one might go about such a thing, with a few hints of what this might look like were I to do so.

Easier — not Easy. Any NAN-auxlang would have to take into consideration the fact that large numbers of Native Americans are now monolingual English speakers, but I don't think a goal should be to make the language familiar to speakers of European languages. There are very widespread areal features in North American languages, and as much as possible these should be drawn on in creating the language. I would, for example, include regular conjugation of verbs, very probably by prefixing. I've been using WALS to verify commonalities in N.A. languages.

The Stock. Taking inspiration from Lojban, as a practical matter I'd draw on the largest language families — Na-Dene, Algonquian, Uto-Aztecan, Siouan and Iroquoian. The Salish family might belong in there, too, as well as Kiowa-Tanoan and Muskogean. For the many isolates, we'd have to rely on areal similarities.

Phonology. A simple, 4 or 5 vowel system. I would, sadly, omit tone, and, more happily, nasalization.

For the consonants I would at the very least include /p/ /t/ /ts/ /k/ /s/ /n/ /m/ (maybe only one nasal) /w/ /y/ /ʔ/ /h/ (which could be [h] or [x]) and /l/. I would probably include /ɬ/. That's less common the further east you go, but occurs even in the Muskogean family. I was prepared to omit ejectives at first, but now I think I'd consider including them. Thanks to Na'vi, I know that most L1 English speakers can learn them pretty easily, and they really extend pretty far east, too.

I'd keep syllable structure simple or at most moderately complex, allowing, say, syllable onsets to have /w/ or /y/ as a second element (if not, I'd add /kᵂ/ to the base inventory), maybe a nasal coda. Accent strictly initial or final.

Grammar. In place of Esperanto's gender system, I'd differentiate by animacy. I'm still not sure if I'd include 4th person/obviation mechanics.

Rather than tense, the verb would be more preoccupied with aspect, by regular suffixing. A future tense adverb might sneak in.

Nominative-accusative alignment, but no case marking. Subjects of verbs would be person prefixes; not sure about objects, but I could be talked into putting those into the verb, too.

Plenty of North American languages have some sort of classifier system. I'm not sure that I'd add those, or if I did it'd be a very simple and regular system, attached to numbers only (i.e., I would regretfully lay aside the verbal encoding that goes on in so many languages).

At the very least, reportative evidential. Maybe inferential.

I probably would include adjectives as a word class, maybe with special marking for predicate adjectives.

Haven't decided on word order, probably SVO with SOV a strong contender.

Vocabulary would be churned through some automatic system to find any mnemonic cross-family similarities there might be, and ensure equal representation.

Monday, October 18, 2010

Old High Coochy-Coo

Of those few conlangs that reach a pretty well-developed state (beyond 1500 words or so, a reasonable corpus), a good number will have well-defined formal and literary registers. Part of this is probably yet another lingering influence of Tolkien, though for most people a literary conlang may be the first they encounter. In my own Vaior I created a good dose of parallel vocabulary for fairly common words used only in the poetic register (raie was the normal word for star, emme poetic), as well as poetic syntax (animate direct objects of perception verbs are in the genitive, not accusative).

One thing I've never seen in a conlang is baby-talk. How different cultures talk to children isn't exactly universal. Some people don't talk directly to children until they have something interesting to say back, without apparently causing developmental problems. But it's a pretty common practice. What I would not have suspected, until I read about it a few days ago, is that it is fairly common for people to use baby talk — or something much like it — when speaking to animals.

A few things are common to baby talk —

Reduplication is very common (in my own family, a bottle is either a ba or a baba).

Much wider pitch range, and a tendency to stay in higher registers.

Simplified grammar (not a surprise).

Vocabulary that exists only in the baby-talk register ("binkie" for "blanket"; in Nootka, paapash "eat!" for adult ha'ukw'i).

Particular patterns of phonological deformation (not exactly simplification, but nearly so).

The word deformations are most interesting to me, and in Native American languages dovetail with some interesting things that happen in story telling registers. In Cocopa, for example, the onset consonant of stressed syllables is turned into a /v/, while other consonants are fronted. Adult kwanyúk "baby" becomes kanvúk. Cocopa uses a very similar register with animals, with different informants finding the register appropriate for speaking to cats, dogs, horses or even chickens. In this register, a palatalized lateral fricative, /łʲ/, is inserted or substituted into every word.

In Quileute certain prefixes might be used when speaking to people with particular characteristics, /s-/ for a small man, /tł-/ for someone who is cross-eyed. But certain characters in traditional stories also have their language altered in particular ways. Raven prefixes /ʃ-/; Deer prefixes /tłk-/ and turns all sibilants to laterals. Coyote, of course, speaks inappropriately and often in highly distorted ways all over the West.

One of the great things about using formal registers is that it becomes that much easier to be rude and impolite. So I was delighted to read that in Nootka, the word deformation for Raven — /-tʃx-/ inserted after the first syllable of the word — is also used to speak of greedy people. But not to their face.

I'll have to try out some of this in some future project.