Saturday, April 7, 2018

Lexical Exploration: "bruise"

The English bruise is related to words for "crush, injure, cut, smash." The usage for blemished fruits is first attested in the 14th century.

In Ancient Greek, several words related to the core sense of "crush" are also given the definition "bruise": θλάω, τρίβω. There is also the rarely attested word μώλωψ, "mark of a stripe, weal, bruise," which generates a denominal verb.

In the Dravidian family, again, quite a few words related to "crush" or "(strike a) blow, beat," and occasionally "press," are also glossed "bruise." See, for example, naci and tar̤umpu.

In the Austronesian family, color terms seem to be a popular source domain: the color root -*dem generates a term in one daughter language, and the root *alem, also related to color, does so in another. There is also *baŋbaŋ₈, which generated terms for a range of skin discolorations. There are other source domains, however, such as baneR, which in addition to "bruise, weal" also generates a specific term for blemishes on fruit.

In Mbula, -berebere across dialects means "be bruised and swollen, itch and burn, have blisters."

Mandarin has a large collection of terms glossed "bruise," most of which seem to be polysemous with more generic injury terms, "wound, abscess, bump," or the aftermath, "scar." The term 烏青 wū qīng refers to the color ("dark/black" + "grue/grey"), and can be used alone as a color term.

Somba-Siawari's yöhöza covers all of "bite, sting, rub, hurt, bruise, weigh down."

In Malayalam, the terms are all polysemous with other injury terms, of which ആഘാതം āghātaṁ is the richest in meaning: "stab, stroke, beat, trauma, blow, waft, bruise, bump, impact, poke, push, shock."

Other dictionaries consulted: Maori, Turkish, Angave, Swahili, Arabic, Wolof, Korean, Armenian, Malay.

Summary: the cause of bruising ("hit, crush, pound, press," occasionally "abrade" or "dent") is a common source domain. In some families, the word is polysemous with terms for other kinds of injury, in particular "weal" and swelling. Color terms are an occasional source. It's hard to tell history from some dictionaries, but there may occasionally be root terms for a polysemous injury word that includes "bruise." Finally, languages that use reduplication robustly seem happy to use it in "bruise" terms (though this might be due to the stative sense rather than to specific semantics).

Tuesday, November 14, 2017

Identity-centered Pronouns

One of the things you might do as a non-hetero or non-gender-conforming conlanger is fiddle around with pronouns or other parts of a language to represent your own reality a bit more. Most first attempts at fiddling with the pronouns are likely to result in a giant pronoun inventory that is unwieldy while also leaving some people out. I've certainly produced a few of these in the past.

For a very slowly developing personal project (Kílta), though, I came up with an idea that might be workable: the identity center.

This is modeled on the deictic center. The deictic center of a narrative or conversation is that location in space to which words of location and motion are oriented: this/that, here/there, come/go, etc. The center can move in narrative to where the action is, but in most interactive conversation the center is where the conversation is taking place.

So, in this model personal and demonstrative pronouns are coded as being either at or away from the currently active identity center. There are neutral pronouns, and much of the time those will be the ones used, but if somehow identity becomes relevant these pronouns can be brought out to signal where things fit. Further, the identity itself can be anything salient. One might, for example, say this to conlangers:

Inna ekólot si kotiho më.
DEM.IDC work ACC understand.PFV NEG.
(They) don't understand this work.

In this, inna is the identity-centered demonstrative, here indicating that the work in question (fiddling with pronouns, say) is somehow related to the conlanger identity.

I translated the Fire, walk with me poem from Twin Peaks into Kílta as a test, and the final line is:

Luëka, án tin tali.
fire.VOC 1SG.IDC with walk.IMP
Fire, walk with me!

By using the identity-centered first person pronoun, án, the reciter is placing themselves into the same identity as the mystical fire being addressed.

Kílta plays with pronouns in several ways. There is, for example, a pair of first person pronouns that code how much agency the speaker feels they had in the state of affairs being described. But by thinking about LGBTQ+ pronoun questions I have concocted a system that is more broadly usable. Still, I'm going to have to use the language for a good bit longer before I'm quite ready to declare it a success.

Wednesday, June 18, 2014

"Conlang" and the OED

So, conlang got an entry in the OED a few days ago. The word has been in use since the early 1990s, and in the post-Avatar, post-Game-of-Thrones world, it is unlikely to fade out of existence any time soon, so this is an obvious move on the part of the OED editorial team.

Compared to some conlangers' reactions, my own personal reaction to this is fairly muted. I absolutely do not view this OED entry as any sort of vindication of the art. First, if I needed approval from others to pursue my hobbies, I wouldn't play the banjo, much less conlang. I don't usually look to others for approval of my pastimes (except my neighbors, I suppose, if I decide to do something unusually loud). Second, there are all manner of very unpleasant behaviors also defined in the OED, which no one takes as a sign of OED editorial approval. The word's in the OED because it is being used now, has been for a few decades, and is likely to continue to be used for decades to come. The OED entry is a simple recognition of that fact.

I was, however, delighted to notice that one of the four citations was a book by Suzette Haden Elgin, The Language Imperative. Few people are neutral on her major conlang, Láadan. I'm a big fan, while at the same time not believing it capable of accomplishing the goals it was designed to attain. I got a copy of the grammar for the language before I had regular internet access, and so it was the first conlang I ever saw that wasn't mostly a euro-clone.1 I learned a lot from Láadan, so I have a warm place in my heart for it. It's a shame Alzheimer's has probably robbed Elgin of the opportunity to know she was cited in the OED.


1 Klingon is not nearly as strange as it looks on the surface. Láadan introduced me to a range of syntactic and semantic possibilities I had not previously encountered: evidentiality, different embedding structures, inalienable possession, simpler tone systems, the possibilities of a smaller phonology.

Monday, June 16, 2014

The Ultimate Dictionary Database System

Is text. End of post.

Ok, it's not quite that simple. You probably want some sort of structured text, semantically marked up if possible. But at the end of the day, all you can really rely on is text.

Why Spreadsheets Suck

First, the format is proprietary and often inconsistent across even minor version changes. You will be in a world of hurt if you want to share your dictionary with anyone else.

Second — and this is the biggest problem by far, assuming you're trying to make a naturalistic conlang — a real dictionary for a real language does not look like this:

  • kətaŋ sleep
  • kətap book
  • kətəs hangnail on the left little finger which interferes with one's needlework
  • kəwa tree
  • kəwah noodle
  • kəwe computer
  • kəweŋ hard

A few words between two languages might have (nearly) perfect overlap, and early in a conlang's history a word might start out as a simple gloss, but a simple word-to-word matching profoundly misrepresents a real language, and in a conlang it signals a relex.

A real dictionary entry looks like this: δίδωμι. It has multiple meanings defined, examples of use, collocations, grammar and morphology notes, references, etc., etc.

The spreadsheet format forces you into a very limited structure for each word. That structure can never hope to cope reliably with all the different words of a single language, much less the variety of things conlangers come up with (to say nothing of natlang variety). A spreadsheet is too rigid a format to grow the meaning and uses of a word over the lifetime of your conlang.

Why Databases Suck

First, they share the format problems of spreadsheets. Technically, SQL is a standard. In reality, all but the most trivial databases tend to use non-standard SQL conveniences offered by whatever database server software the author decided to use. So, you may get something almost portable, but often not.

Second, and again like the spreadsheet problem, a truly universal dictionary tool, a piece of software that could handle everything from Indonesian to Ancient Greek to Navajo — or Toki Pona to Na'vi to High Valyrian to Ithkuil — is going to require a very complex database structure. The SIL "Toolbox" dictionary tool has more than 100 fields available (Making Dictionaries), and all those possibilities need to be in both the database design and the software that talks to the database.

I have, over the years, spent some time trying to design a database that could really be a good language dictionary. The schema for even a simple design was quite complex, and I would not have wanted to write the software to control it. There's this huge problem in that different languages vary wildly in their definitional needs. For Mandarin, for example, you need to cover all the usual purely semantic matters — polysemy, idiom, collocation, multiple definitions, examples, etc. — but there aren't too many morphological worries. But once you add morphological complexity you've got a whole new layer of issues. The Ancient Greek example I link to above is for a fairly irregular verb, with dialectal worries to boot. And for Navajo and related Athabaskan languages the situation is so dire that people write papers called things like Making Athabaskan Dictionaries Usable and Design Issues in Athabaskan Dictionaries (do look at those to get a feel for the issues).

Any truly general dictionary database, one capable of handling enough sorts of languages to be genuinely useful, would have vast tracts of empty space to accommodate information not needed in many languages, with these fields of whitespace in different places for different languages. Even if you target your database and software design to something like Ancient Greek, there will be lots of fields left blank most of the time. It's not like all the verbs are irregular, though it may sometimes seem that way to beginners.

If you had a very good team of developers, you could probably overcome these problems, assuming the users were willing to configure a complex tool to make it easy to use for only the things your language needed. But it's never going to be a money-making venture. I don't expect to see such a tool in my lifetime.

Enter Stage Right: Text

So, we're back to simple text. The benefits:

  • the file is still readable if Microsoft/Apple/Whoever releases a New and Improved (tm) version of this or that proprietary bit of software; a file you find from 10 years ago will still be readable
  • there are zillions of text editors, usually with built in search functions, which will work on the file
  • if part of the file is destroyed, the rest of the file will generally be recoverable (proprietary formats tend to be brittle when bitrot sets in)

Bare text, of course, is not very attractive. The way around this is to use a text-based markup of some sort. You could use HTML. Or even XML with a little more work. I strongly favor LaTeX, which requires more typing than I might like, but it gives me maximum flexibility to change my mind and spits out very attractive results. The point of this is that even though HTML and LaTeX are presentation formats, the underlying basis is still just plain text. If something goes horribly wrong, you'll have a modestly ugly text file to read, but all your hard work will still be recoverable.
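As a small illustration of that recoverability, here is a minimal sketch in Python (not the author's own tooling) that pulls the plain text back out of a marked-up entry using only Python's standard html.parser module. HTML is used here only because Python ships a parser for it; the entry markup and the "entry" class name are hypothetical.

```python
from html.parser import HTMLParser


class TextRecovery(HTMLParser):
    """Collects only the character data, discarding tags and attributes."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def recover_text(markup: str) -> str:
    """Strip the markup and collapse whitespace, leaving the plain entry text."""
    parser = TextRecovery()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of the input
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical marked-up entry; the tags and class name are illustrative only.
ENTRY = '<div class="entry"><b>kəwa</b> <i>n.</i> tree</div>'
print(recover_text(ENTRY))  # kəwa n. tree
```

The presentation layer is thrown away, but the words themselves come back intact, which is the whole argument for keeping the underlying file as text.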

If you are disorganized, a computer will not help you. If you can impose a little order on yourself, though, a computer can make your life a lot easier. And a little thought can make even a plain old .txt file into the best dictionary tool you could ever want.
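One way to impose that little bit of order, sketched here under assumptions: a hypothetical plain-text format where entries are separated by blank lines, the first line of each entry is the headword, and every other line is a "field: value" pair. Fields can repeat (several sense lines, say), and each entry carries only the fields it actually needs, which sidesteps the empty-column problem of a fixed spreadsheet or database schema. The Python below parses such a file and searches it; the format, the function names, and the sample entries are illustrative, not any standard.

```python
import re
from collections import defaultdict

# Hypothetical format: blank-line-separated entries; first line = headword;
# remaining lines = "field: value", and any field may repeat.
Entries = dict[str, dict[str, list[str]]]


def parse_dictionary(text: str) -> Entries:
    """Parse the text into {headword: {field: [value, ...]}}."""
    entries: Entries = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        headword, fields = lines[0], defaultdict(list)
        for line in lines[1:]:
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip a malformed line rather than lose the entry
            fields[key.strip()].append(value.strip())
        entries[headword] = dict(fields)
    return entries


def search(entries: Entries, term: str) -> list[str]:
    """Headwords with any field value containing term (case-insensitive)."""
    term = term.lower()
    return sorted(
        head
        for head, fields in entries.items()
        if any(term in value.lower() for values in fields.values() for value in values)
    )


SAMPLE = """
kətaŋ
pos: verb
sense: sleep

kəwa
pos: noun
sense: tree
"""

entries = parse_dictionary(SAMPLE)
print(entries["kəwa"])          # {'pos': ['noun'], 'sense': ['tree']}
print(search(entries, "tree"))  # ['kəwa']
```

Because the file stays plain text, a garbled line costs you only that line, and the whole dictionary remains readable in any text editor even if this script is lost.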

Sunday, April 6, 2014

Afrihili Days of the Week

In anticipation of last week's release of my Fiat Lingua paper Afrihili: an African Interlanguage, I took to Twitter to do a few Word of the Day posts. Because this is the sort of silliness that amuses me, each Word of the Day was the word for that day. Here they are in a tidy list:

  • Kurialu Sunday
  • Lamisalu Monday
  • Talalu Tuesday
  • Wakashalu Wednesday
  • Yawalu Thursday
  • Sohalu Friday
  • Jumalu Saturday

I wasn't able to find the source languages for these words, each of which ends in alu, "day."

For good measure, here are the months:

  • Kazi January
  • Rume February
  • Nyawɛ March
  • Forisu April
  • Hanibali May
  • Vealɛ June
  • Yulyo July
  • Shaba August
  • Tolo September
  • Dunasu October
  • Bubuo November
  • Mbanjɛ December

Again, the source languages aren't always clear, though the word for July clearly comes from some European language. I must admit I didn't devote too much time to tracking these down, though. Some might be immediately obvious to some of my readers.

There aren't enough examples of time phrases to be sure of everything. The notion of "by (a month)" combines two adpositions: ɛn Shaba fo, "by August."

Friday, February 21, 2014

Níí'aahta Tép Toulta - "Lord Smoke and the Merchant"

I have worked up a full interlinear for one of the shorter stories with Lord Smoke, a sort of trickster figure. I don't go into every subtlety of expression, but most should be clear.

Níí'aahta Tép Toulta (PDF), and a recording (MP3) of me reciting the tale.

Friday, November 22, 2013

What about dying languages?

There are various ways a person can respond to the discovery that I create languages for fun. The most common is noncommittal and polite puzzlement. A few people will be enthusiastic about the idea, especially if they're fans of the recent big films and TV shows involving invented languages in some way. Every once in a while, especially online, someone will object on the grounds that people involved with invented languages should, instead, be Doing Something about dying languages. This objection is so badly thought out that I'm genuinely surprised at its popularity.

First and foremost, anyone complaining about people messing around with invented languages has failed, in a fairly comprehensive way, to understand the concept of a hobby. Time I spend working with an invented language is not taken from documenting dying languages or some other improving activity, it is taken from time I spend with my banjo, reading a novel or watching TV.

Second, while it is true I, along with most language creators, know more about linguistics than the average Man on the Street, documenting undocumented languages is a special skill taking training I certainly don't have. In fact, most people with Ph.D.'s in linguistics won't even have such training. Do people going on about dying languages really imagine anyone can go out and do this sort of work? If someone has a nice garden near their house, we don't harass them about how they should be growing crops to feed the hungry, nor do we demand every weekend golfer go pro. What is it about invented languages that brings out this pious impulse to scold people for not doing something productive with their time when so many other hobbies get no comment at all?

If we step back to more modest goals than documenting a dying language, we're in much the same boat. There is little point to me going out and learning, say, Kavalan (24 speakers left as of 2000) unless I go to Taiwan and spend most of my time among the people who speak it. Sitting at home in Wisconsin learning Kavalan does nothing to preserve it in any meaningful way. You just can't really learn a language from a book. You have to spend time with native speakers.

Using other people's cultures — or fantasies about their culture — as a rhetorical foil has a long history. When Europeans were less approving of sex, they complained that Muslims were libertines, while others used this as an example of a more sensible cultural trait. This is all part of the usual Noble Savage industry. The death of so many languages is a real issue, representing the permanent loss of a wealth of cultural and environmental knowledge. It deserves to be treated with more respect than to be used merely as a rhetorical club to browbeat people who have a hobby you don't like.