Acta Lingweenie

Sunday, October 28, 2018

Kílta metaphor: SALT IS VITALITY

One standard feature of my current grammars for new languages is a separate section after the dictionary where I focus on particular areas of interest or difficulty. For example, copulas and verbs of existence in Kílta have a few complications, so there's a section on those. This lets me limit cross-references in the dictionary definitions to something reasonable, while still being able to give a thorough overview later.

A subsection on conceptual metaphor (Conlangery Podcast #66) is now standard in my grammars. I've recently been working out the metaphor SALT IS VITALITY (for some reason, conceptual metaphors are often given in all-caps like this).

When I first thought about this metaphor, I spent a little while thinking through the implications. In this instance, I already had an idiom involving salt that would interact a bit oddly with it:

Ches si tirat vuëtiso.
salt ACC give.1R-INF try-PFV
They tried to bribe me. (lit., "they tried to give me salt")

I decided this wasn't a vital problem, and that it in fact slightly enhanced both the idiom and the conceptual metaphor I was about to develop.

Kílta has a modest set of derivational affixes, so I first thought about how some of those might work:

chesámin - "saltless," has the standard meanings of dull, lifeless, with an additional sense of mildly ill
chesëtin - "salty, having salt" is the core sense, but also means vital, lively, vigorous

For now, no other derivational elements have suggested themselves for this metaphor. In general, I try to take these metaphorical derivations only if they have a clear literal use, too.

Next, Kílta, as a verb-final language, favors N-V combinations for creating new verbal senses from nouns, rather than derivational affixes. These are more obviously idiomatic, with less clear-cut literal use:

ches si raho - literally, "throw (the) salt," has the same basic sense and tone as the English idiom "kick the bucket," but is a touch less respectful than the English
ches tëníto - literally, "(the) salt is gone," matches the idea of being dejected, or "the life has gone out of him/her/it"
ches si kwilë relo - literally, "carries too much salt," is for someone who has too much nervous energy, or a pet having the zoomies

That's as far as I want to take things for now with a new metaphor. I've made a notation in the dictionary entry for ches "salt" which reminds me of this sense if I add new examples to that headword, in addition to the conceptual metaphors section after the dictionary. Maybe that's as far as this metaphor will go, but it's always nice when a new metaphor-based idiom suggests itself.

Wednesday, October 24, 2018

Two Notes on Walman

The Walman language of Papua New Guinea has two interesting grammatical features: a conjugated "and" and an inflectional diminutive.

Conjugated Conjunction

Walman's transitive verbs have polypersonal agreement, marking both subject and object. Conjunction is handled with two verb stems, -aro- and <-a-> (subject is a prefix, object is a suffix):

nyue w-aro-n ngan
mother 3SG.F.SUBJ-and-3SG.M.OBJ father
a mother and father

Since verb serialization is already present in Walman, it looks like a verb got grabbed to mean "and" and got dropped into the serialization chain. There is also a non-conjugated "and," which may be used instead of the conjugated form, but seems to be preferred for inanimate constituents and clauses. Interestingly, the Lamaholot language of Indonesia also has an inflecting "and," but it can be used to join clauses.

See Verbs for 'And' in Walman for all the glorious details.

Inflectional Diminutive

Walman also has a third person singular diminutive marker which occurs on verbs and adjectives.

Pelen l-aykiri.
dog 3SG.DIMIN-bark
The puppy is barking.

Pelen w-aykiri.
dog 3SG.FEM-bark
The female dog is barking.

Pelen n-aykiri.
dog 3SG.MASC-bark
The male dog is barking.

Pelen y-aykiri.
dog 3PL-bark
The dogs are barking.

The authors of the paper below believe that the diminutive marker was originally a neuter gender.

See Diminutive as an Inflectional Category in Walman for details.

Saturday, April 7, 2018

Lexical Exploration: "bruise"

The English bruise is related to words for "crush, injure, cut, smash." The usage for blemished fruits is first attested in the 14th century.

In Ancient Greek, several words related to the core sense of "crush" are also given the definition "bruise:" θλάω, τρίβω. There is also the rare-appearing word μώλωψ, "mark of a stripe, weal, bruise" which generates a denominal verb.

In the Dravidian family, again, quite a few words related to "crush" or "(strike a) blow, beat," and occasionally "press," are also glossed "bruise." See for example, naci and tar̤umpu.

In the Austronesian family color terms seem to be a popular source domain: the color root *-dem generates a "bruise" term in one daughter language, and the root *alem, also related to color, does so in another. There is also *baŋbaŋ₈, which generated terms related to a range of skin discolorations. There are other source domains, however, such as *baneR, which in addition to "bruise, weal" also generates a specific term for blemishes on fruit.

In Mbula, -berebere across dialects means "be bruised and swollen, itch and burn, have blisters."

Mandarin has a large collection of terms glossed "bruise," most of which seem to be polysemous with more generic injury terms, "wound, abscess, bump," or the aftermath, "scar." The term 烏青 wū qīng refers to the color ("dark/black" + "grue/grey"), and can be used alone as a color term.

Somba-Siawari's yöhöza covers all of "bite, sting, rub, hurt, bruise, weigh down."

In Malayalam the terms are all polysemous with other injury terms, of which ആഘാതം āghātaṁ is most flush with meaning: "stab, stroke, beat, trauma, blow, waft, bruise, bump, impact, poke, push, shock."

Other dictionaries consulted: Maori, Turkish, Angave, Swahili, Arabic, Wolof, Korean, Armenian, Malay.

Summary: the cause of bruising ("hit, crush, pound, press," occasionally "abrade" or "dent") is a common source domain. In some families, the word is polysemous with other kinds of injuries, "weal" and swelling, in particular. Color terms are an occasional source. It's hard to tell history from some dictionaries, but there may occasionally be root terms for a polysemous injury word that includes "bruise." Finally, languages that are robustly reduplicating seem happy to use it in "bruise" terms (but this might be due to the stative sense rather than specific semantics).

Tuesday, November 14, 2017

Identity-centered Pronouns

One of the things you might do as a non-hetero or non-gender-conforming conlanger is fiddle around with pronouns or other parts of a language to represent your own reality a bit more. Most first attempts at fiddling with the pronouns are likely to result in a giant pronoun inventory that is unwieldy while also leaving some people out. I've certainly produced a few of these in the past.

For a very slowly developing personal project (Kílta), though, I came up with an idea that might be workable: the identity center.

This is modeled on the deictic center. The deictic center of a narrative or conversation is that location in space to which words of location and motion are oriented: this/that, here/there, come/go, etc. The center can move in narrative to where the action is, but in most interactive conversation the center is where the conversation is taking place.

So, in this model personal and demonstrative pronouns are coded as being either at or away from the currently active identity center. There are neutral pronouns, and much of the time those will be the ones used, but if somehow identity becomes relevant these pronouns can be brought out to signal where things fit. Further, the identity itself can be anything salient. One might, for example, say this to conlangers:

Inna ekólot si kotiho më.
DEM.IDC work ACC understand.PFV NEG.
(They) don't understand this work.

In this, inna is the identity centered demonstrative, here indicating that the work in question (fiddling with pronouns, say) is somehow related to the conlanger identity.

I translated the Fire, walk with me poem from Twin Peaks into Kílta as a test, and the final line is:

Luëka, án tin tali.
fire.VOC 1SG.IDC with walk.IMP
Fire, walk with me!

By using the identity-centered first person pronoun, án, the reciter is placing themselves into the same identity as the mystical fire being addressed.

Kílta plays with pronouns in several ways. There is, for example, a pair of first person pronouns that code how much agency the speaker feels they had in the state of affairs being described. By thinking about LGBTQ+ pronoun questions, though, I have concocted a system that is more broadly usable. I'm going to have to use the language for a good bit longer before I'll be quite ready to declare a success.

Wednesday, June 18, 2014

"Conlang" and the OED

So, conlang got an entry in the OED a few days ago. The word has been in use since the early 1990s, and in the post-Avatar, post-Game-of-Thrones world, it is unlikely to fade out of existence any time soon, so this is an obvious move on the part of the OED editorial team.

Compared to some conlangers' reactions, my own personal reaction to this is fairly muted. I absolutely do not view this OED entry as any sort of vindication of the art. First, if I needed approval from others to pursue my hobbies, I wouldn't play the banjo, much less conlang. I don't usually look to others for approval of my pastimes (except my neighbors, I suppose, if I decide to do something unusually loud). Second, there are all manner of very unpleasant behaviors also defined in the OED, which no one takes as a sign of OED editorial approval. The word's in the OED because it is being used now, has been for a few decades, and is likely to continue to be used for decades to come. The OED entry is a simple recognition of that fact.

I was, however, delighted to notice that one of the four citations was a book by Suzette Haden Elgin, The Language Imperative. Few people are neutral on her major conlang, Láadan. I'm a big fan, while at the same time not believing it capable of accomplishing the goals it was designed to attain. I got a copy of the grammar for the language before I had regular internet access, and so it was the first conlang I ever saw that wasn't mostly a euro-clone.¹ I learned a lot from Láadan, so I have a warm place in my heart for it. It's a shame Alzheimer's has probably robbed Elgin of the opportunity to know she was cited in the OED.

¹ Klingon is not nearly as strange as it looks on the surface. Láadan introduced me to a range of syntactic and semantic possibilities I had not previously encountered: evidentiality, different embedding structures, inalienable possession, simpler tone systems, the possibilities of a smaller phonology.

Monday, June 16, 2014

The Ultimate Dictionary Database System

Is text. End of post.

Ok, it's not quite that simple. You probably want some sort of structured text, semantically marked up if possible. But at the end of the day, all you can really rely on is text.

Why Spreadsheets Suck

First, the format is proprietary and often inconsistent across even minor version changes. You will be in a world of hurt if you want to share your dictionary with anyone else.

Second — and this is the biggest problem by far, assuming you're trying to make a naturalistic conlang — a real dictionary for a real language does not look like this:

kətaŋ sleep
kətap book
kətəs hangnail on the left little finger which interferes with one's needlework
kəwa tree
kəwah noodle
kəwe computer
kəweŋ hard

A few words between two languages might have (nearly) perfect overlap, and the early history of a word in a conlang might start as a simple gloss, but a simple word-to-word matching is profoundly lying to you for a real language, and in a conlang signals a relex.

A real dictionary entry looks like this: δίδωμι. It has multiple meanings defined, examples of use, collocations, grammar and morphology notes, references, etc., etc.

The spreadsheet format forces you into a very limited structure for each word. That structure can never hope to cope reliably with all the different words of a single language, much less the variety of things conlangers come up with (to say nothing of natlang variety). A spreadsheet is too rigid a format to grow the meaning and uses of a word over the lifetime of your conlang.

Why Databases Suck

First, they share the same problems with spreadsheets with respect to format. Technically, SQL is a standard. In reality, all but the most trivial of databases tend to use non-standard SQL conveniences offered by the database server software the software author decided to use. So, you may get something almost portable, but often not.

Second, and again like the spreadsheet problem, a truly universal dictionary tool, a piece of software that could handle everything from Indonesian to Ancient Greek to Navajo — or Toki Pona to Na'vi to High Valyrian to Ithkuil — is going to require a very complex database structure. The SIL "Toolbox" dictionary tool has more than 100 fields available (Making Dictionaries), and all those possibilities need to be in both the database design and the software that talks to the database.

I have, over the years, spent some time trying to design a database that could really be a good language dictionary. The schema for even a simple design was quite complex, and I would not have wanted to write the software to control it. There's this huge problem in that different languages vary wildly in their definitional needs. For Mandarin, for example, you need to cover all the usual purely semantic matters — polysemy, idiom, collocation, multiple definitions, examples, etc. — but there aren't too many morphological worries. But once you add morphological complexity you've got a whole new layer of issues. The Ancient Greek example I link to above is for a fairly irregular verb, with dialectal worries to boot. And for Navajo and related Athabaskan languages the situation is so dire that people write papers called things like Making Athabaskan Dictionaries Usable and Design Issues in Athabaskan Dictionaries (do look at those to get a feel for the issues).

Any truly general dictionary database, one capable of handling enough sorts of languages to be genuinely useful, would have vast tracts of empty space to accommodate information not needed in many languages, with these fields of whitespace in different places for different languages. Even if you target your database and software design to something like Ancient Greek, there will be lots of fields left blank most of the time. It's not like all the verbs are irregular, though it may sometimes seem that way to beginners.

If you had a very good team of developers, you could probably overcome these problems, assuming the users were willing to configure a complex tool to make it easy to use for only the things your language needed. But it's never going to be a money-making venture. I don't expect to see such a tool in my lifetime.

Enter Stage Right: Text

So, we're back to simple text. The benefits:

the file is still readable if Microsoft/Apple/Whoever releases a New and Improved (tm) version of this or that proprietary bit of software; a file you find from 10 years ago will still be readable
there are zillions of text editors, usually with built in search functions, which will work on the file
if part of the file is destroyed, the rest of the file will generally be recoverable (proprietary formats tend to be brittle when bitrot sets in)

Bare text, of course, is not very attractive. The way around this is to use a text-based markup of some sort. You could use HTML. Or even XML with a little more work. I strongly favor LaTeX, which requires more typing than I might like, but it gives me maximum flexibility to change my mind and spits out very attractive results. The point of this is that even though HTML and LaTeX are presentation formats, the underlying basis is still just plain text. If something goes horribly wrong, you'll have a modestly ugly text file to read, but all your hard work will still be recoverable.

If you are disorganized, a computer will not help you. If you can impose a little order on yourself, though, a computer can make your life a lot easier. And a little thought can make even a plain old .txt file into the best dictionary tool you could ever want.
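
As a rough illustration of how little structure a text dictionary needs before a few lines of code can work with it, here is a minimal Python sketch. The layout ("field: value" lines, blank lines between entries), the field names lx, ps and sense, and the second sense of kətaŋ are all assumptions invented for this example; they are not the format of any particular tool.

```python
from collections import defaultdict

# A made-up plain-text layout: one "field: value" pair per line, entries
# separated by blank lines, every field repeatable and optional.
# The second sense of the first entry is invented for illustration.
SAMPLE = """\
lx: kətaŋ
ps: verb
sense: to sleep
sense: to be dormant (of a volcano)

lx: kəwa
ps: noun
sense: tree
"""


def parse_entries(text):
    """Turn the text into a list of {field: [values]} dictionaries."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = defaultdict(list)
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            field, _, value = line.partition(":")
            entry[field.strip()].append(value.strip())
        entries.append(dict(entry))
    return entries


entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["lx"][0], "-", "; ".join(entry["sense"]))
```

Because each entry only stores the fields it actually has, a word with five senses and a word with one sit side by side without any empty columns, and the file stays readable in any text editor.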

Sunday, April 6, 2014

Afrihili Days of the Week

In anticipation of last week's release of my Fiat Lingua paper Afrihili: an African Interlanguage, I took to Twitter to do a few Word of the Day posts. Because this is the sort of silliness that amuses me, each Word of the Day was the word for that day. Here they are in a tidy list:

Kurialu Sunday
Lamisalu Monday
Talalu Tuesday
Wakashalu Wednesday
Yawalu Thursday
Sohalu Friday
Jumalu Saturday

I wasn't able to find the source languages for these words, each of which ends in -alu "day."

For good measure, here are the months:

Kazi January
Rume February
Nyawɛ March
Forisu April
Hanibali May
Vealɛ June
Yulyo July
Shaba August
Tolo September
Dunasu October
Bubuo November
Mbanjɛ December

Again, the source languages aren't always clear, though July is coming from some European language. I must admit I didn't devote too much time to tracking these down, though. Some might be immediately obvious to some of my readers.

There aren't enough examples of time phrases to be sure of everything. The notion of "by (a month)" combines two adpositions: ɛn Shaba fo "by August."

Friday, February 21, 2014

Níí'aahta Tép Toulta - "Lord Smoke and the Merchant"

I have worked up a full interlinear for one of the shorter stories with Lord Smoke, a sort of trickster figure. I don't go into every subtlety of expression, but most should be clear.

Níí'aahta Tép Toulta (PDF), and a recording (MP3) of me reciting the tale.

Friday, November 22, 2013

What about dying languages?

There are various ways a person can respond to the discovery that I create languages for fun. The most common is noncommittal and polite puzzlement. A few people will be enthusiastic about the idea, especially if they're fans of the recent big films and TV shows involving invented languages in some way. Every once in a while, especially online, someone will object on the grounds that people involved with invented languages should, instead, be Doing Something about dying languages. This objection is so badly thought out that I'm genuinely surprised at its popularity.

First and foremost, anyone complaining about people messing around with invented languages has failed, in a fairly comprehensive way, to understand the concept of a hobby. Time I spend working with an invented language is not taken from documenting dying languages or some other improving activity; it is taken from time I spend with my banjo, reading a novel or watching TV.

Second, while it is true I, along with most language creators, know more about linguistics than the average Man on the Street, documenting undocumented languages is a special skill taking training I certainly don't have. In fact, most people with Ph.D.'s in linguistics won't even have such training. Do people going on about dying languages really imagine anyone can go out and do this sort of work? If someone has a nice garden near their house, we don't harass them about how they should be growing crops to feed the hungry, nor do we demand every weekend golfer go pro. What is it about invented languages that brings out this pious impulse to scold people for not doing something productive with their time when so many other hobbies get no comment at all?

If we step back to more modest goals than documenting a dying language, we're in much the same boat. There is little point to me going out and learning, say, Kavalan (24 speakers left as of 2000) unless I go to Taiwan and spend most of my time among the people who speak it. Sitting at home in Wisconsin learning Kavalan does nothing to preserve it in any meaningful way. You just can't really learn a language from a book. You have to spend time with native speakers.

Using other people's cultures (or fantasies about their culture) as a rhetorical foil has a long history. When Europeans were less approving of sex, they complained that Muslims were libertines, while others used this as an example of a more sensible cultural trait. This is all part of the usual Noble Savage industry. The death of so many languages is a real issue, representing the permanent loss of a wealth of cultural and environmental knowledge. It deserves to be treated with more respect than to be used merely as a rhetorical club to browbeat people who have a hobby you don't like.

Friday, October 25, 2013

Arbitrary Sort Orders in Python (including digraphs!)

Unicode: everyone wants it, until they get it.
Barry Warsaw

I know I'm due to do another post about LaTeX, but that'll have to wait for next week.

I've recently discovered two nice tools for my iPad, Editorial and Pythonista, which let me do some programming along with sophisticated editing and text processing. So, I've been working on some code related to conlanging.

I know some people hate them, but I'm a big fan of word generators for three reasons. First, they help me avoid overusing certain sounds, something I'm normally prone to. Second, they help you verify that the rules you've given for your syllable shapes actually describe what you want. Finally, while I might have phonaesthetic concerns about some vocabulary, I don't want to agonize over the word for "toe" or "napkin" most of the time, so I like having a random pool of words to grab from. I still might change the word, or decide a random selection is not right for the word, so it's not like I'm giving up aesthetic control of my language.
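
To make the idea concrete, here is a minimal sketch of a rule-based word generator in Python. It is not the tool described in this post; the sound inventory, the syllable shapes and the function names are all assumptions chosen for illustration.

```python
import random

# Hypothetical inventory and syllable shapes, purely for illustration.
CATEGORIES = {
    "C": ["p", "t", "k", "ch", "m", "n", "ng", "s", "l", "r", "h"],
    "V": ["a", "á", "e", "i", "o", "u"],
}
SYLLABLE_SHAPES = ["CV", "CVC", "V"]


def make_syllable(rng):
    """Pick a syllable shape, then fill each slot from its category."""
    shape = rng.choice(SYLLABLE_SHAPES)
    return "".join(rng.choice(CATEGORIES[slot]) for slot in shape)


def make_word(rng, min_syllables=1, max_syllables=3):
    """Build a word out of a random number of syllables."""
    count = rng.randint(min_syllables, max_syllables)
    return "".join(make_syllable(rng) for _ in range(count))


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so a run can be repeated
    print(" ".join(make_word(rng) for _ in range(10)))
```

Reading through a batch of output is a quick way to spot a syllable rule that allows something you did not intend.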

In any case, while it is a bit odd to write new software on a tablet, last night in about an hour I created a good tool for generating random new word shapes based on rules. But one serious problem came up — the sorted list was sorted terribly! For a person using a computer intended for English speakers, "á" is sorted after "z", which is not what I want at all. So I spent some time trying to come up with a way to sort arbitrarily.

In addition to the sort order of "á", I wanted to be able to correctly sort digraphs. In some languages, "ng" comes after the entirety of "n" in dictionaries and phone books.
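
Both problems are easy to reproduce. The short Python sketch below uses a made-up word list ("zu" is there only to show where "z" lands) and the plain built-in sorted(), which compares strings by Unicode code point.

```python
words = ["no", "nga", "zu", "áru", "na", "ahi"]

# Plain sorted() compares Unicode code points: "á" (U+00E1) lands after
# "z" (U+007A), and "nga" is filed under "n" + "g", ahead of "no".
default_order = sorted(words)
print(default_order)
# → ['ahi', 'na', 'nga', 'no', 'zu', 'áru']
```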

It turns out there is a terrifying Perl library to accomplish this, Sort::ArbBiLex. As far as I can see, no such library exists for Python, so I had to write my own.

The code could probably be more efficient, but it works for my purposes, and turned out to be fairly simple. I rely on two bits of trickery. First, Python compares ordered collections like lists and tuples element by element, so a list of such collections can itself be sorted. This makes it easy to follow the "decorate-sort-undecorate" pattern for sorting complex items. Second, I use a bit of a regular expression hack. If you split on a regular expression wrapped in a capturing group, you get a strided list as a result, with the text between matches interleaved with the matched separators themselves.

>>> import re
>>> m = re.compile(r'(ch|t|p|k|a|i|o)')
>>> m.split("tapachi")
['', 't', '', 'a', '', 'p', '', 'a', '', 'ch', '', 'i', '']
>>> m.split("tapachi")[1::2]
['t', 'a', 'p', 'a', 'ch', 'i']
>>>

Basically, I split on every single character in the language, which gives me a lot of empty strings, but they're easily filtered out. Notice how it recognizes "ch" as a separate letter of the language.
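
One detail worth spelling out is that the order of the alternatives matters: Python's regex engine takes the first alternative that matches, not the longest. The sketch below adds "c" and "h" as separate letters (an assumption made for the demonstration, since the pattern above has neither) to show what happens when the digraph is listed too late.

```python
import re

word = "tapachi"

# "c" listed before "ch": the first matching alternative wins, so "ch"
# is never recognized as a single letter.
naive = re.compile(r"(c|ch|t|p|a|i|h)")
# Longest keys first, so "ch" is tried before "c".
longest_first = re.compile(r"(ch|c|t|p|a|i|h)")

naive_letters = naive.split(word)[1::2]
good_letters = longest_first.split(word)[1::2]
print(naive_letters)  # → ['t', 'a', 'p', 'a', 'c', 'h', 'i']
print(good_letters)   # → ['t', 'a', 'p', 'a', 'ch', 'i']
```

Sorting the letters by length, longest first, before building the pattern avoids the problem.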

So, the central algorithm of this little bit of code is: convert the unicode string to a sequence of "letters" (however defined in your language), convert those letters into a numerical code, sort the list of numerical codes, turn the collections of numerical codes back into words, spit back the complete result.

import re

# Written for Python 3, where every str is already unicode.
class ArbSorter:
    def __init__(self, order):
        # The alphabet arrives as whitespace-separated "letters".
        elts = order.split()
        # Create a regex to split on each character or multicharacter
        # sort key.  (As in "ch" after all "c"s, for example.)
        # Longest keys go first so "ch" is tried before "c".
        # Gosh, this is not especially efficient, but it works.
        split_order = sorted(elts, key=len, reverse=True)
        self.splitter = re.compile("(%s)" % "|".join(re.escape(e) for e in split_order))
        # Next, collect weights for the ordering.
        self.ords = {}
        self.vals = []
        for i, elt in enumerate(elts):
            self.ords[elt] = i
            self.vals.append(elt)

    # Turns a word into a list of ints representing the new
    # lexicographic ordering.  Python, helpfully, allows one to
    # sort ordered collections of all types, including lists.
    # Anything not listed in the alphabet is silently dropped.
    def word_as_values(self, word):
        w = self.splitter.split(word)[1::2]
        return [self.ords[char] for char in w]

    # Turns a list of weights back into the word it came from.
    def values_as_word(self, values):
        return "".join([self.vals[v] for v in values])

    # Decorate (words to weights), sort, undecorate (weights to words).
    def __call__(self, l):
        l2 = [self.word_as_values(item) for item in l]
        l2.sort()
        return [self.values_as_word(item) for item in l2]

if __name__ == '__main__':
    mysorter = ArbSorter("a á c ch e h i k l m n ng o p r s t u")
    m = "chica ciha no áru ngo na nga sangal ahi ná mochi moco"
    s = mysorter(m.split())
    print(" ".join(s))

(A more attractive presentation.)

Just run the code and it prints out "ahi áru ciha chica moco mochi na ná no nga ngo sangal", exactly what you want. Much better than the "ahi chica ciha mochi moco na nga ngo no ná sangal áru" you'll get on a computer localized for an English speaker.
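
The same ordering can also be expressed as a key function handed to sorted(), which is a variation on the approach above rather than the code from this post. In this sketch the names ALPHABET, WEIGHTS, SPLITTER and sort_key are made up for the example.

```python
import re

ALPHABET = "a á c ch e h i k l m n ng o p r s t u".split()
WEIGHTS = {letter: i for i, letter in enumerate(ALPHABET)}
# Longest letters first so digraphs win over their first character.
SPLITTER = re.compile(
    "(%s)" % "|".join(re.escape(l) for l in sorted(ALPHABET, key=len, reverse=True))
)


def sort_key(word):
    """Return the word as a list of alphabet positions."""
    return [WEIGHTS[letter] for letter in SPLITTER.split(word)[1::2]]


words = "chica ciha no áru ngo na nga sangal ahi ná mochi moco".split()
result = sorted(words, key=sort_key)
print(" ".join(result))
# → ahi áru ciha chica moco mochi na ná no nga ngo sangal
```

Since sorted() hands back the original strings, there is no undecorate step, and a character missing from the alphabet is merely ignored for ordering instead of vanishing from the word.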

If you're on Python 2, it is vital that you tell Python you're working with unicode text here, so make sure to include this in a comment near the top of your code: -*- coding: utf-8 -*-. (Python 3 assumes UTF-8 source files by default.)

Wednesday, August 14, 2013

Conlanging with LaTeX, Part Three

In this post I want to talk about the thing that makes LaTeX so immensely powerful: it is programmable.

It is the great tragedy of modern computing that the industry has, for the most part, systematically trained people to be terrified of their computers. Things are changing all the time, usually in baffling ways, and those little changes all too often completely break other things we rely on. One consequence of this, though other issues compound the problem, is that most people have very powerful universal computing machines at their disposal but never write even a small program to solve a problem they might have.

This is not the place to teach computer programming, but I can introduce you to some very basic programming within LaTeX, to give you the power to radically alter the appearance of your conlanging documentation with just a few simple changes. It is this programmability of LaTeX that makes it such a powerful tool. Fortunately, most easy things are easy, so we'll start with that.

Text Appearance

Before we get to the programming, we'll start with the simple commands LaTeX uses to change basic font appearance. For example, from time to time we might want text to appear in italics or bold. In modern LaTeX, you just wrap the text you want to change in simple commands, \textit for italics and \textbf for bold. For example, \textbf{lorem ipsum dolor sit amet} will typeset that bit of gibberish in bold.

In addition to the bold and italics, there are a few other basic font changes you can use. Many linguisticky formulae use small capitals, for which you can use \textsc. Note, though, that many fonts do not have a true small caps option. If you want to use proper ones, you'll need to pick your font carefully. You can use \textsf to get a sans serif family, and \texttt for a "typewriter" family, with fixed character widths. In my own documentation, I find I mostly use italics and bold, with an occasional use of small caps, if I happen to be using a font that supports it. Unfortunately, Gentium, my favorite font, does not. Here are some fonts I know have small caps, apart from LaTeX's default Computer Modern (which I personally don't care for):

Linux Libertine.
The Brill, free for non-commercial use, has lots of good things of interest to a conlanger.

This introduction to LaTeX has a nice long list of various text tweaking options in LaTeX, Introduction to LaTeX, part 2.

Your Style

In my conlang documentation, I like to use bold font for the conlang and italics for the English translations. So you might think that I have \textbf and \textit all over my documentation. I don't. Instead, I write macros which declare my intent ("this is the conlang," "this is the translation"). That way, if I were to one day change my mind, I only have to update a single macro instead of going through the entire text changing all the \textbfs to something else.

Fortunately, in LaTeX it is trivial to write my own versions of things like \textbf, and I do so freely. My personal convention is to put (English) translations into a \E macro and the example language in \LL. This is how they are defined —

\newcommand{\LL}[1]{\textbf{#1}}
\newcommand{\E}[1]{\textit{#1}}

So, what does all this mean? First, \newcommand does what you'd expect: it creates a new command. The next part, in curly braces, lets you name this new command of yours. Note that LaTeX is case sensitive, so \E and \e would be different commands. Also, note that if you accidentally try to use a name that is already defined somewhere in LaTeX, it will barf out and complain about the redefinition. This is why my "in the language" macro is \LL: there's already a \L in LaTeX (it gives a barred-L for languages like Polish).

The part in the square brackets says how many arguments the macro has. That is, how many different sets of curly braces there will be with the command. Finally is the body of the macro, which is what you want the macro to do. Within the body you can use #1 to refer to the first argument, #2 to the second, etc. So, my \LL macro has a single argument, which is wrapped up in the \textbf command.

On the surface, this looks sort of dumb. I have just written my own command to do something which LaTeX can already do. But, I've replaced a font styling command with a semantic command, for my personal cognitive benefit. \LL everywhere means "this is in the conlang" not just "this is in bold face." This gives me two advantages. First, I can go through the document looking just for examples of the conlang. Second, if I decided later I hate bold for the conlang, I can simply change the macro and let LaTeX do the rest.

You can also just put plain text within a new macro. For example, my dictionary style has this:

\newcommand{\Seealso}[1]{See also \LL{#1}.}

Let's look at a command with more than one argument. This is a simplified version of my "Lexicon EXAMPLE" macro.

\newcommand{\lexample}[2]{\LL{#1} \E{#2}}

An example of the use of this is \lexample{tempus fugit}{time flies}. It will just print the Latin phrase in bold, a space, then the English translation in italics. Note very carefully: normal text parsing rules of LaTeX apply within a macro definition, so you need to take care about extra spaces or line ends. You can get weird effects, and I'll talk about ways to tame that in a later post.

For one last example, sometimes I make small notes to myself within the body of a document I'm working on. Because I want it to stand out, but not take up too much room, I format that note in a smaller font and a different color.

\newcommand{\note}[1]{\textcolor{magenta}{\small\textit{#1}}}

If you're not using XeTeX, you'll probably need to \usepackage{color} to get this to work.

Etc.

A few weeks ago on the conlang-l mailing list someone mentioned that there's a nice LaTeX package to typeset vowel triangles in the way we're used to from a nice IPA chart. Ignoring other package and LaTeX setup details, you just need this:

...
\usepackage{vowel}
...
\begin{vowel}
  \putcvowel{\LL{i}}{1}
  \putcvowel{\LL{u}}{8}
  \putcvowel{\LL{a}}{4}
  \putcvowel{\LLi{e}}{2}
  \putcvowel{\LLi{o}}{7}
\end{vowel}

Which produces this:

If you're using TeXLive, you'll already have the package installed. The package documentation is very clear.

Next Time

The next post will be all about tables, because if there's anything conlangers love, it's paradigm charts.

Monday, July 1, 2013

Conlanging with LaTeX, Interlude One

One of the perennial problems in writing any document dealing with multiple languages is choosing a font that can handle everything. Add a little linguistics, and things get very messy. Since I wrote the first part in this series, I have discovered a new font that's designed for this sort of linguistic work, the Brill. It's been in development for a while, but I hadn't checked it in more than a year, waiting for the bold. Now it's ready.

It has several character sets (Latin, Greek, Cyrillic, IPA), special glyphs for some humanist work, and, best of all, has true bold, italics and small caps which harmonize nicely with the rest of the text.

It's free for non-commercial use — which describes most of us conlangers — so give it a try. I've been working on a personal language document with this font, and it really is very nice. I'm not 100% fond of the italics, but I'll put up with that for true small caps and a well-integrated polytonic Greek.

Sunday, May 12, 2013

Conlanging with LaTeX, Part Two

In the previous post I suggested a LaTeX tutorial you might use to get a basic command of LaTeX. I'm going to assume everyone reading this has played around a little with LaTeX.

Before you can produce any document in LaTeX, you need to tell it a little about what you intend. The very simplest trussing for this will look a lot like this:

\documentclass{article}

\begin{document}
Saluton!
\end{document}

The space between the \documentclass and \begin{document} lines is called the preamble, and this is where you can put all sorts of other declarations to change how LaTeX works, either by changing its default behavior or by adding new functionality. For this post, I'm going to mention a few things that are useful for conlangers to have in their preambles. Specifically, I'm going to focus on what LaTeX calls packages. Fortunately, if you do a web search on most LaTeX packages you can get good documentation on how to use them effectively.

The first thing you should know is that the font size can be changed in the \documentclass line. I usually like a 12pt font, but you can also ask for 10 or 11 points. As always, you need to use other packages to get more font size options.

\documentclass[12pt]{article}

By default, LaTeX has rather large margins. I have no need for so much whitespace, so I use the fullpage package to pull out the margins to something less wasteful of paper:

\documentclass{article}
\usepackage{fullpage}

\begin{document}
Saluton!
\end{document}

And that's all you need to say. Simply by using the package, the changes you want take effect.

The next big thing is a package to manage fonts. In the old days, dealing with fonts in LaTeX was truly a nightmare — strange font names, freaky encodings, fonts themselves in a special LaTeX format, fights between different packages and font expectations, etc. These days, the XeTeX version of LaTeX has much simpler font management capabilities, though you still have to do a little work.

For XeTeX to find a TrueType or OpenType font, the font needs to be installed in the usual place your OS puts fonts, since XeTeX relies on the system's own mechanisms to find it.

There is a utility package that helps manage all this, fontspec:

\documentclass{article}
\usepackage{fullpage}

\usepackage{fontspec}
\defaultfontfeatures{Mapping=tex-text}
\setromanfont{Gentium Basic}
\newfontfamily\gplus{Gentium Plus}

\begin{document}
Saluton!
\end{document}

So, what I'm doing here is loading up the package, then immediately running some commands provided by that package to set some font defaults. The \defaultfontfeatures line tells XeTeX I want to use the normal, old-fashioned LaTeX digraphs and trigraphs for certain kinds of characters. For example, it will convert three consecutive minus signs into an em-dash (—), in the usual LaTeX way. If you omit this line, many examples of LaTeX you might find on the web may break in subtle ways for you.
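Here is a small test file for checking that the digraphs and trigraphs are working. One assumption to flag: it uses Ligatures=TeX, which is the spelling newer fontspec releases use for this feature and which has the same effect as Mapping=tex-text under XeTeX. It sticks to the default font so nothing extra needs to be installed.

```latex
\documentclass{article}
\usepackage{fontspec}
% Newer spelling of the feature; equivalent to Mapping=tex-text on XeTeX.
\defaultfontfeatures{Ligatures=TeX}

\begin{document}
% Two backticks and two apostrophes become curly double quotes,
% three hyphens an em-dash, two hyphens an en-dash.
``Quoted text'' --- with a long dash, and a range like 1--10.
\end{document}
```

If the output shows literal backticks and hyphens, the mapping is not being applied.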

The next line, \setromanfont, picks the default font for the document. I like the Gentium family, since it has lots of accenting support, as well as Ancient Greek, which I often find myself using.

The next line lets me create a font command. It turns out, the Gentium Plus font has much better support for IPA characters, so when I want to type IPA, I can use the \gplus command to get the IPA. Note that you have to enclose the commands created that way in curly-braces to limit their effect. An example from my Kahtsaai grammar:

 \item Double \LL{ł}, \LL{łł}, is 
    pronounced [{\gplus ɮ}:].

So, the \newfontfamily command needs a command name, which you choose, and then a font name. Here, I picked the name \gplus (the leading backslash is required for all LaTeX command names).
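If typing the extra braces gets tedious, one option is to wrap the font command in a macro. This is only a sketch: \ipa is a hypothetical name, and the file assumes both Gentium fonts are installed on your system.

```latex
\documentclass{article}
\usepackage{fontspec}
\setromanfont{Gentium Basic}
\newfontfamily\gplus{Gentium Plus}

% Hypothetical helper: the inner pair of braces keeps \gplus
% from leaking into the text that follows the argument.
\newcommand{\ipa}[1]{{\gplus #1}}

\begin{document}
Double ł is pronounced [\ipa{ɮ}:].
\end{document}
```

This also gives you a semantic command, so you can later find every bit of IPA in the document by searching for \ipa.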

The fontspec package is vast and powerful, allowing many interesting effects. You can look at the documentation to learn about more of its capabilities. I will just add that it is common for LaTeX documentation to have a large section at the end with the actual package code, with explanations. Most of the time, that is safely skipped.

I like to use different sorts of underlining in examples, for which the package ulem is very useful. Just use \usepackage{ulem}, and then you get some new LaTeX commands:

\uline{Just a normal underline.}
\uuline{A double underline}.
\uwave{A wavy underline}.
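One thing to watch for: loaded without options, ulem changes \emph so that it underlines instead of italicizing. The normalem option keeps the usual behavior. A minimal file showing this:

```latex
\documentclass{article}
% normalem keeps \emph as italics; without it, ulem makes
% \emph underline its argument.
\usepackage[normalem]{ulem}

\begin{document}
\uline{Just a normal underline.}
\uuline{A double underline}.
\uwave{A wavy underline}.
\emph{Still in italics.}
\end{document}
```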

Some people will want to use the tipa package, which provides a funky encoding for IPA. I don't use it these days, since I don't always like the look of the output.

These are the most basic packages I use. There are a few more, but they are complex enough, or add such large new functionality, that I will save them for future posts.

Do experienced LaTeX-er conlangers have other basic packages to recommend, other than things like multicol, makeidx or hyperref, which I hope to talk about more in the future?

In the next post, I will talk a bit about defining your own simple macros to ease some formatting tasks, and tables tables tables...

Friday, April 26, 2013

Conlanging with LaTeX, Part One

One common set of questions in conlanging forums is about how to organize the material, the grammar, the dictionary, lessons, etc. While there are some dedicated language tools out there, most of them are fairly complex or expensive. So most people just use word processors for their grammars and sometimes spreadsheets for their dictionaries, assuming they use computers at all.

At this point, I'm prepared to say there are no good tools for writing a dictionary. There are tools out there, but they tend to be very tricky to use well, assuming the hobbyist conlanger can even afford the cash or the time to invest in such tools. And for tools to let people collaborate on a lexicon? Forget it.

So, I just write my dictionaries as text. Here's an example lemma for Kahtsaai,

No spreadsheet is going to produce anything that looks like this without a great deal of programming. It might be nice to have a nifty tool to manage a dictionary entry like this, but a general tool to do that would be so complex that I'm not sure it would be worth the effort.

Because I want my grammars and dictionaries to look good, I had to pick something nicer than a plain text file or even HTML. I went with LaTeX, a very sophisticated typesetting system that started out in the world of mathematics and the sciences, but which humanities folks are starting to learn to appreciate. Unlike a word processor, which is WYSIWYG, "what you see is what you get," LaTeX takes a different approach. You type up your document in a special typesetting language, and then you feed that to a LaTeX program which spits out your document after making all the typesetting decisions and formatting for you. Paraphrasing, you tell LaTeX what you intend, and it produces the nicest possible output matching your intent.

In LaTeX simple things are simple. You could typeset a printed letter in it, and except for some messing about at the start of the file, what you had to type wouldn't look much different from an email (though the output would be far nicer). But, LaTeX is programmable, and is thus capable of very sophisticated things. Here, for example, is a semantic map which was described entirely in TikZ, a graphics language that exists for LaTeX,

It is this ability to do sophisticated things when you need to that makes LaTeX such a powerful tool.

Due to an early encounter with old Latin grammars, I prefer to typeset my grammars with bold face for text in the language, italics for translations, and just the normal font for English explanations. But, rather than tell LaTeX to bold everything in my conlang, I write a macro which I enclose all my conlang in. That way, if one day I decide to format everything differently, I just have to change the macro, run the LaTeX program again, and voilà! out comes a new version of my grammar with everything changed to the new way. I wrote a set of macros to typeset my dictionary entries in the way I prefer.

Reasons a conlanger might want to use LaTeX:

It's programmable, and thus easy to make sweeping formatting changes with minimal effort.
Modern versions speak UNICODE natively, so it's good for fun character sets and accents galore.
Modern versions can also use almost any font you want.
The output is gorgeous.
Conlangers love tables, and LaTeX has very powerful table capabilities.
Cross-references are useful in grammars, and LaTeX has a powerful reference system, which can produce clickable citations in a PDF.
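As a quick taste of the cross-reference system, here is a minimal sketch. The section titles, the label name and the sentences are placeholders; the hyperref package is what makes the reference clickable in the PDF.

```latex
\documentclass{article}
\usepackage{hyperref}  % turns \ref into a clickable link in the PDF

\begin{document}

\section{Copulas}\label{sec:copulas}
Copulas and verbs of existence are described here.

\section{Negation}
% \ref prints the number of the section carrying the label.
For the forms being negated, see section~\ref{sec:copulas}.

\end{document}
```

You need to run LaTeX twice on a file like this: the first pass records the label, and the second fills in the number.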

In the next few blog posts, I am going to explain some features of LaTeX that would be most useful for conlangers. I cannot do a full tutorial on LaTeX. One good tutorial is Learn to use LaTeX, but there are many on the internet easily found by search. I recommend you practice with a few quick and simple documents before reading the other posts.

LaTeX is free software, and there are several different distributions out there. I strongly recommend TeX Live. It's sort of large, but it will have all the extra linguistics packages you want to use, and it includes XeTeX, the most powerful modern LaTeX engine, which speaks UNICODE natively and has far, far nicer font management tools. It's the best choice for conlangers. I will assume XeTeX for all my posts on LaTeX.

In my next post, I will go a bit more into detail about the things you'll want in your LaTeX preamble to make XeTeX pick the best fonts for multilingual work. And maybe start in on tables.