Conlanging with LaTeX, Part One

One common set of questions in conlanging forums is how to organize the material: the grammar, the dictionary, lessons, and so on. While there are some dedicated language tools out there, most of them are fairly complex or expensive. So most people just use word processors for their grammars and sometimes spreadsheets for their dictionaries, assuming they use computers at all.

At this point, I'm prepared to say there are no good tools for writing a dictionary.  There are tools out there, but they tend to be very tricky to use well, assuming the hobbyist conlanger can even afford the cash or the time to invest in such tools.  And for tools to let people collaborate on a lexicon?  Forget it.

So, I just write my dictionaries as text. Here's an example lemma for Kahtsaai:

[Image: example dictionary entry for Kahtsaai]
No spreadsheet is going to produce anything that looks like this without a great deal of programming.  It might be nice to have a nifty tool to manage a dictionary entry like this, but a general tool to do that would be so complex that I'm not sure it would be worth the effort.

Because I want my grammars and dictionaries to look good, I needed something nicer than a plain text file or even HTML. I went with LaTeX, a very sophisticated typesetting system that started out in mathematics and the sciences, but which humanities folks are learning to appreciate. A word processor is WYSIWYG ("what you see is what you get"); LaTeX takes a different approach. You type up your document in a special typesetting language, then feed that to a LaTeX program, which makes all the typesetting and formatting decisions for you and spits out the finished document. In short, you tell LaTeX what you intend, and it produces the nicest possible output matching that intent.
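
As a minimal sketch of that workflow, here is a tiny document. The file name `grammar.tex` and its contents are made up for illustration. You mark what each piece of text is (a section title, an emphasized word), and the engine decides fonts, spacing, and line breaks. With XeTeX, which these posts assume, you would compile it with `xelatex grammar.tex` to get `grammar.pdf`.

```latex
% grammar.tex: a minimal, hypothetical LaTeX document.
% Compile with:  xelatex grammar.tex   (produces grammar.pdf)
\documentclass{article}

\begin{document}

\section{Phonology}
% You say *what* the text is; LaTeX decides how it looks.
This section describes the sounds of the language.
Words that need \emph{emphasis} are marked, not styled by hand.

\end{document}
```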

In LaTeX, simple things are simple. You could typeset a printed letter in it, and except for some messing about at the start of the file, what you type would not look much different from an email (though the output would be far nicer). But LaTeX is also programmable, and thus capable of very sophisticated things. Here, for example, is a semantic map described entirely in TikZ, a graphics language for LaTeX:

[Image: semantic map drawn in TikZ]
It is this ability to do sophisticated things when you need to that makes LaTeX such a powerful tool.
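
To make the letter comparison concrete, here is a sketch using the standard `letter` class. The names and addresses are placeholders. Apart from the few setup lines at the top, the body reads much like an email.

```latex
% A plain letter using the standard `letter` class.
% All names and addresses are placeholders.
\documentclass{letter}
\signature{A. Conlanger}
\address{1 Example Street \\ Sometown}

\begin{document}
\begin{letter}{A. Friend \\ 2 Sample Road \\ Othertown}
\opening{Dear Friend,}

Thanks for your notes on the grammar. I have fixed the verb tables
and will send a new draft next week.

\closing{Best wishes,}
\end{letter}
\end{document}
```

At the other end of the scale, a semantic map can be described as nodes and connecting lines. This is only a toy sketch with placeholder function labels, not the map shown above; the `standalone` class with the `tikz` option loads TikZ and crops the output to the picture.

```latex
% A toy semantic map drawn in TikZ. The labels are placeholders;
% a real map would use your own functions or categories.
\documentclass[tikz]{standalone}

\begin{document}
\begin{tikzpicture}[every node/.style={draw, rounded corners}]
  % Each node is one function on the map, placed by coordinates.
  \node (a) at (0,0)  {function A};
  \node (b) at (3,0)  {function B};
  \node (c) at (6,1)  {function C};
  \node (d) at (6,-1) {function D};
  % Lines connect functions that are adjacent on the map.
  \draw (a) -- (b);
  \draw (b) -- (c);
  \draw (b) -- (d);
\end{tikzpicture}
\end{document}
```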

Due to an early encounter with old Latin grammars, I prefer to typeset my grammars with bold face for text in the language, italics for translations, and the normal font for English explanations. But rather than telling LaTeX to bold every piece of conlang text directly, I wrote a macro and enclose all my conlang text in it. That way, if one day I decide to format everything differently, I just change the macro, run the LaTeX program again, and voilà, out comes a new version of my grammar with everything changed to the new style. I also wrote a set of macros to typeset my dictionary entries the way I prefer.
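
Here is a minimal sketch of that idea. The macro names `\conlang`, `\transl`, and `\dictentry` are hypothetical, as is the example word *tala*; this is not the actual macro set described above, just one way such macros could look.

```latex
\documentclass{article}

% Hypothetical semantic macros: change a definition once and every
% use changes the next time the document is compiled.
\newcommand{\conlang}[1]{\textbf{#1}}  % text in the conlang: bold
\newcommand{\transl}[1]{\textit{#1}}   % translations: italic

% Hypothetical dictionary-entry macro:
%   #1 = headword, #2 = part of speech, #3 = definition
\newcommand{\dictentry}[3]{%
  \par\noindent\conlang{#1}\ (#2)\quad #3\par}

\begin{document}

The made-up word \conlang{tala} means \transl{house}.

\dictentry{tala}{n.}{house, dwelling}

\end{document}
```

If you later decide conlang text should be in small caps instead, changing `\textbf` to `\textsc` in the one `\conlang` definition updates every occurrence in the document.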

Reasons a conlanger might want to use LaTeX:

  • It's programmable, which makes sweeping formatting changes easy.
  • Modern versions speak Unicode natively, so it's good for fun character sets and accents galore.
  • Modern versions can also use almost any font you want.
  • The output is gorgeous.
  • Conlangers love tables, and LaTeX has very powerful table capabilities.
  • Cross-references are useful in grammars, and LaTeX has a powerful reference system, which can produce clickable links in a PDF.
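
As a small sketch of the last two points, here is a table with a label and a clickable cross-reference. The case endings are invented placeholders. It uses the `booktabs` package for table rules and `hyperref` for links, both included in TeX Live. Run the engine twice so the references resolve.

```latex
% Sketch: a table plus clickable cross-references.
% The paradigm forms are invented placeholders.
\documentclass{article}
\usepackage{booktabs}  % cleaner horizontal rules in tables
\usepackage{hyperref}  % makes \ref produce clickable links in the PDF

\begin{document}

\section{Nouns}\label{sec:nouns}

Table~\ref{tab:cases} gives the case endings.

\begin{table}[h]
  \centering
  \begin{tabular}{lll}
    \toprule
    Case       & Singular & Plural \\
    \midrule
    Nominative & -a       & -ai    \\
    Accusative & -an      & -ain   \\
    \bottomrule
  \end{tabular}
  \caption{Case endings (placeholder forms).}
  \label{tab:cases}
\end{table}

\section{Verbs}

For the noun forms that verbs agree with, see Section~\ref{sec:nouns}.

\end{document}
```
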

In the next few blog posts, I am going to explain the features of LaTeX that are most useful for conlangers. I can't give a full LaTeX tutorial here. One good tutorial is Learn to use LaTeX, but many others are easy to find with a search. I recommend you practice with a few quick and simple documents before reading the other posts.

LaTeX is free software, and there are several different distributions out there. I strongly recommend TeX Live. It's rather large, but it includes all the extra linguistics packages you are likely to want, and it includes XeTeX, the most powerful modern LaTeX engine, which speaks Unicode natively and has far, far nicer font management tools. It's the best choice for conlangers. I will assume XeTeX for all my posts on LaTeX.
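
As a brief preview, here is a minimal XeLaTeX sketch using the `fontspec` package to choose a Unicode font. It assumes the font "Charis SIL" is installed on your system; substitute any installed font that covers the characters you need. Compile it with `xelatex`.

```latex
% Minimal XeLaTeX preamble: fontspec selects a system font by name.
% Assumption: "Charis SIL" is installed; swap in any font you have.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Charis SIL}

\begin{document}
% Under XeTeX, Unicode characters can be typed directly.
Some IPA: [ʃ ŋ ə ʔ], and accented letters: ǘ ṓ.
\end{document}
```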

In my next post, I will go into more detail about what you'll want in your LaTeX preamble so that XeTeX picks the best fonts for multilingual work. And maybe start in on tables.


  1. Thanks for writing this, William. I've stolen lots from your stuff in the past, especially the Na'vi grammar. I think I would benefit most if you would start at the beginning and go through your layout. Also, I haven't seen your lexicon layout, so that would be wonderful too. Thanks again.

  2. This post series is a great idea! I myself use LaTeX (well, more specifically XeLaTeX) for everything except work stuff, but unfortunately my Moten dictionary isn't in LaTeX yet (because I use SIL's Toolbox to manage my dictionary, and it doesn't have LaTeX export capabilities). I've wanted to create my own Toolbox-to-LaTeX export system, but I lack the knowledge to do it. Hopefully your future posts will help me make the first steps towards that goal.

    By the way, I really like your example lemma. In my dictionary native words are also typeset in bold, but translations are in normal font. I like the idea of having them in italics. They would stand out better from the rest of the entry. I'll have to see if I can steal that idea (Toolbox's export system uses a Word template, so in principle I should be able to make such a change easily in my dictionary. However, Word templates tend to be black magic, much less clear than LaTeX code, so don't hold your breath! :) ).

  3. Hi William, did you write a post about typesetting a dictionary using LaTeX? Although I know LaTeX for math formulas quite well, I'm completely new to dictionary typesetting, and there's not much useful material out there, even on the ling-tex list. Would you be willing to share your code? Pleeease! :) See you, Chris

    1. I have not yet gotten to the part about dictionary layout. That is the ultimate goal, though.



The English bruise is related to words for "crush, injure, cut, smash." The usage for blemished fruits is first attested in the 14th century.In Ancient Greek, several words related to the core sense of "crush" are also given the definition "bruise:" θλάω, τρίβω. There is also the rare-appearing word μώλωψ, "mark of a stripe, weal, bruise" which generates a denominal verb.In the Dravidian family, again, quite a few words related to "crush" or "(strike a) blow, beat," and occasionally "press," are also glossed "bruise." See for example, naci and tar̤umpu.In the Austronesian family color terms seem to be a popular source domain, as in the color root, -*dem, which generates a term in one daughter language, and the root *alem, also related to color, does in another. Also *baŋbaŋ₈, which generated terms related to a range of skin discolorations. There are other source domains, however, such as baneR, which i…