Detail #434: Number, Clusivity, Personal Pronouns

March 29th, 2023

I imagine this might actually be something that exists in a language. Consider first and second person pronouns and number. Normally, the number marked is the number of the group discussed. I.e. when I say 'we', I might very well be the single person present who belongs to this 'we'. Of course, there's clusivity which can clarify this, but let's consider plural 'you'. This may very well be uttered towards a single person who represents a group that mostly is absent.

Is there any language that encodes both the number of the group it refers to, as well as the number of persons currently present out of that group?

In part, however, this might even be a bit redundant, and we could introduce a further complication beyond the redundancy.

The obvious uses are:

1-singular-plural: I who am the only person present, and some people
2-singular-plural: you who are the only person present, and some people

Is the second slot meant to signify number of non-present, or is it meant to include 'the full number of referents'? These two give different interpretations:

interpretation 1: 1-plural-plural: I, and some people
interpretation 2: 1-plural-plural: I, and some people who are present, and some other people

Thus, we here have two options: conflate the distinction whenever several members of the group are present, or distinguish them thus:

1-plural: I and some people who are present
1-plural-plural: I, and some people who are present and some people who aren't

Naturally, this should be easy to extend to duals and trials.
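
The labelling scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the reading in which the second slot counts group members who are present and the optional third slot counts those who are absent, and the function names `number_label` and `pronoun_gloss` are hypothetical, not part of any existing tool. The `extended` flag shows how duals and trials might slot in.

```python
def number_label(count, extended=False):
    """Map a head count to a grammatical number label."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return "singular"
    if extended and count == 2:
        return "dual"
    if extended and count == 3:
        return "trial"
    return "plural"


def pronoun_gloss(person, present, absent=0, extended=False):
    """Build a gloss such as '1-singular-plural'.

    person  -- 1 or 2
    present -- group members present, counting the speaker or addressee
    absent  -- group members who are not present (assumed reading of the last slot)
    """
    if person not in (1, 2):
        raise ValueError("only first and second person are covered")
    if absent < 0:
        raise ValueError("absent cannot be negative")
    parts = [str(person), number_label(present, extended)]
    if absent > 0:
        # The last slot only appears when part of the group is absent.
        parts.append(number_label(absent, extended))
    return "-".join(parts)


print(pronoun_gloss(1, 1, 3))  # I alone am present, the rest of 'we' is elsewhere
print(pronoun_gloss(1, 3))     # I and some people who are present
print(pronoun_gloss(1, 3, 2))  # I, some people present, and some who aren't
```

Under this reading a bare `1-singular` is just ordinary 'I', and the conflating option mentioned above would amount to dropping the last slot whenever `present` is greater than one.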

An interesting simple approach for a conlang could be this though: just have singular and plural, and distinguish by number of addressees. Also, 'I' can mean 'we' if only I, out of the whole group, am present.

Word break conventions and emergent typology

March 16th, 2023

I've been doing a lot of free-form writing in Koa this year and it's been a pretty revealing experience. There's nothing that exposes semantic gaps and structural shortcomings like trying to write complex, expressive prose; initially all my writing felt unbelievably clumsy, with none of the grace, sophistication or subtlety that I try to embody when I write in other languages I know well. After a month or two, though, I feel like I'm starting to find my voice in Koa -- or maybe more accurately, Koa is finding its own nascent voice.

This is really the first constructed language in which I've navigated this process and it's fascinating (and intimidating): coming from a place of only having written single, unconnected example sentences, how does the language in question construct, say, a whole paragraph? How does it flow structurally? I feel so practiced in other areas of language design, but here I'm just doing my best to move through it all in an intuitive way without getting hung up on my own anxiety. Someday I'll have to try to actually articulate some of these emergent principles, but I think they need time to emerge further first.

In the meantime, another thing that came up as I began keeping a regular journal in Koa was a discovery I only made when I tried to read what I'd written later on. For one thing, I knew theoretically that production and comprehension were different disciplines, but I wasn't quite prepared for just how unpracticed I was at understanding my own language. It makes sense: I'd really never had the opportunity to try to interpret speech or writing coming at me before! In response I added a word recognition module to my vocab learning program; previously it had only been testing me on production in the target language.

More surprisingly, though, it turned out that the way I've always represented Koa is kind of hard to parse. Here's an example block of text written in the traditional style:

Ta lai la ka ásulo ta la ko vúakupu e ko mivami, sii, ta mene la ko kóuva e tule lai la ni. Ni si vima poli lo kopato ve hua i cu misucu, ala he lopu poka i pea pono e ka lila ni sai i si kali. E ka tana i kali i koe ka sena. Hala kehe nu lu nike la ko mova ka kecu, ka nu lu ete la ko mupea ka háote nu ne kene koa.

As soon as I start to read it my eyes sort of go out of focus; with such a rapid stream of little words it's hard for me to keep track of where I am in the text, let alone where I am in the syntactic tree. As a result, over the past month I've been experimenting with writing roots with their particles attached to them. The precise rules about what should be attached and what should be left separate are still developing, but the essence of the system has come together nicely. Here's what that previous paragraph looks like with the new conventions:

Talai lakaásulota lakovúakupu e komivami, sii, tamene lakokóuva e tule lai laní. Nisivima poli lokopato ve hua i cumisucu, ala helopu poka i pea pono e kalílani sai i sikali. E katana i kali i koe kasena. Hala kehe nulunike lakomova kakecu, kanuluete lakomupea kaháotenu nekene koa.

Even though this was unfamiliar, I instantly found it massively easier to parse. Allison said that made sense to the extent that there were many more word shapes now for the brain to grab onto; it's also entirely clear which particles belong to which roots, and morpheme clusters mirror natural intonation groups. Here's an attempt to articulate the principles of the system.

1. Particles whose scope is a predicate -- regardless of how complex it is -- are written together with that predicate. This may require the use of additional accentuation where possessive pronouns and directionals are suffixed to the root.

ninasitemuláheta = "I couldn't make him leave"

2. Particles whose scope is a clause with a pronominal subject are joined to that clause (but see point 6)

nisánota lakomutulu kakúmumani = "I said it to make my teacher angry"

3. Particles whose scope is a clause with a full subject NP are separated from surrounding words

nitovo ko le Kéoni i cutule = "I hope that John will come"

4. Predicate clusters -- compounds and incorporated objects -- are written together, but plain adjectival phrases are not joined to their head nouns

kalopuviko = "the weekend," but
kapasano vime = "the last statement"

5. Pronominal particles follow the same rules as predicates when used as the head of an NP, but must be marked with an accent.

laní = to me
nahunú = none of us

6. Certain particles, principally with clause-level scope, are always written separately: i, e when it means "and," au, ai, ha when it means "if," ve when used as a complementizer, and ko when used alone as a complementizer (this list may not be exhaustive). Le is also separated from its head to avoid muddlement with capitalization and foreign words.

One point of uncertainty: when a particle is written separately from its head but is itself within the scope of other particles, are those particles also separated or should they be attached to the "frontmost" one? For example, which of the following should be the convention?

nisánota lakole Kéoni i cutule
nisánota lako le Kéoni i cutule
nisánota la ko le Kéoni i cutule
"I said it so that John would come"

I'm not sure yet; I'll get back to you after more experimentation. I suspect a standard will shape itself over time.

A bunch of this, incidentally, may actually be an artifact of trying to smoosh Koa into an alphabetic writing system. If the language could be written with a syllabary rather than an alphabet, and if there were some marking that identified the stressed syllable of predicates -- in other words, if predicates were instantly differentiated visually from particles -- then there would be a much closer match between writing and Koa's native structure.

But what, then, is Koa's native structure? I had always thought of it as a basically isolating language, but one thing that really surprised me when I first saw text written with these new conventions is how...agglutinating it looks. I'm sort of shocked that I've never asked this question before, but...where does the structure of Koa really fit, typologically?

The language is certainly about as close as you can get to monoexponential in that each morpheme is (theoretically) encoding one and only one semantic, and since I've been thinking of all particles and predicates as individual "words," my unconsidered classification of isolating seemed justified. But looking at forms like this one from above...

ninasitemuláheta = "I couldn't make him leave"

...I really wonder on what grounds I would not call that a "word." A word constituting a complete sentence, with seven morphemes, which a Turkish speaker could feel right at home with. And if that resemblance isn't just incidental but in fact diagnostic, then classifying those first five morphemes as "particles" is obscuring something important: they're actually prefixes. Occupying slots, in a specific order. Like an agglutinative language.

I'm actually not sure how to make a ruling on this, and more thought and research may be required. Some of those particles certainly can stand on their own in certain contexts -- nate "no, I can't," or keka sa? ni "who is it? me" -- and maybe more revealingly, the pronouns can appear to be gapped: 

"the one who couldn't make him leave"

"the one I couldn't make leave"

On the other hand I've vigorously maintained previously that gapping is in fact not the best explanation for these structures despite the fact that it's possible to draw the trees that way. It may be that this new word break convention and the kinds of apparent agglutinative "words" it produces is itself also obscuring some of the true nature of the base structures. Ultimately this is not a question of graphical representation -- whether we write ni na si te mu lahe ta or ninasitemuláheta -- but what's really happening below the surface. And I'm starting to tie my brain in knots which is a pretty clear sign that I need to put down this problem for a bit.

More to come, clearly.

A parting of the ways

March 8th, 2023

My decision last month to remove all predicate roots beginning with /j/ from the Koa lexicon sparked a significant artistic crisis. I tried to accept the replacements, but as time passed I was confronted with a growing feeling that this change was not okay. I loved those proscribed roots, loved the variation in syllable structure that they provided, and realized that I would like Koa less without them; worse than that, that it would feel like it had lost some of the essence of itself. It would feel like it was no longer mine.

I was clearly right when I said that this phoneme had no place given Koa's charter, but it just doesn't matter: apparently at this point the language has developed such a strong sense of itself, especially after all the vocabulary creation and writing that's been going on this year, that honoring that personality is actually more important. The charter was supposed to be an inspiration, not a prison, and the fact is that I love what Koa has become so much that I would rather change the limits than stifle the language to fit within them.

This may seem like a lot of fuss over 20 roots and a marginal phoneme, but this is the first time I've ever consciously and intentionally prioritized aesthetics over the language's ease or clarity. It's uncomfortable, but also unquestionably the right decision.

Emboldened by this I've found myself thinking crazy thoughts, like considering adding another consonant phoneme. I experimented with [ŋ] and was shocked to discover that I actually loved it, and that it "felt" like Koa despite the fact that it would be completely off the deep end charter-wise. I don't know that I'll really go down that path, but it's sort of a wonderful thing that after 23 years there is something that Koa "feels like" to such a clear extent that it can begin to direct its own course into territory I'd never imagined.

Over the weekend I reinstated all my exiled vocabulary. It was a tremendous relief. Honestly I think I would have died on that hill for iolo alone.

Taadži Linguistics

March 1st, 2023

Lauren Kuffler is a computational geneticist and hobbyist conlanger. They are a Ph.D. candidate in Mammalian Genetics at Tufts Graduate School of Biomedical Sciences, focusing on the 3D context of genetic-epigenetic interactions affecting gene expression. They have a lifelong interest in linguistics. The Taadži language is their first conlang to escape private notebooks. They have been working on the language and its associated worldbuilding for two years.

Tade Taadži is the representative conlang of an ongoing worldbuilding project, focusing on a culture that arises from dispossessed peoples transported to an isolated archipelago. This article will provide a brief historical context for the language, describe its grammar and demonstrate its logo-phonetic writing system with example sentences and an illuminated text. Notable features include an extensive system of ligatures in formal texts, and a five-gender personal pronoun system.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

Lament for the vanished on-glides

February 19th, 2023

Although /j/ as a formal phoneme had been officially nixed a few years previous, beginning in 2011 a number of words appeared in Koa with an initial [j] sound: iolo "joyful," iuna "train," iune "steal," and so on. This was possible because of the adoption of a series with this sound among the particles: ia "yes, definitely," io "already," etc., for which I could think of no objection.

As of this morning the language contained around 20 predicates with this on-glide. I still think the particles are fine, but it suddenly crystallized for me that there was a serious problem with the predicates: the accidental adoption of this phoneme created a growing functional load on the distinction between prevocalic [j], [i] and [ij]:

ka kane iolo
[ka kane jolo]
"the joyful man"

ka kane i olo
[ka kane i olo]
"the man smells (something)"

ka kane i iolo
[ka kane i jolo]
"the man is joyful"

As I've been experimenting with writing Koa without spaces between bound morphemes (post eventually forthcoming) the problem became very stark: the phrases above come out kakane iolo, kakane iolo, and kakane iiolo! It's even worse when the preceding predicate also ends with /i/:

ka hapi iolo
[ka hapi jolo]
"the joyful ant"

ka hapi i olo
[ka hapi i olo]
"the ant smells (something)"

ka hapi i iolo
[ka hapi i jolo]
"the ant is joyful"

Here we're making a distinction between [ijo], [iio] and [iijo]. Heavens above. As much as it -- truly, sincerely, kind of agonizingly -- grieves me, these just couldn't stand: in an artlang, sure, but not with Koa's charter. They just weren't meant to be.

And so, glumly, this afternoon I went through and reassigned all of these predicates. Some of them feel okay, others may take some getting used to or find themselves replaced eventually. The hardest one by far was iolo: there is just no other sequence of sounds that more clearly communicates joy to me after having it as a core predicate for more than ten years. I feel like I want to keep it around as an archaic alternative usable in poetry.

Anyway, for posterity, here are the lost on-glide roots; farewell, and I'll remember you always.

iaho -> auho "flour"
iali -> ali "put away"
iane -> ane "cord"
iapu -> epu "spit"
iehi -> ehi "hate"
iela -> sela "whole, unbroken"
ietu -> cetu "dishonor"
ieva -> teva "gradual"
ioco -> oco "copper"
iolo -> elo "joyful"
iomu -> omu "meat"
ioni -> coni "yoni"
ioti -> toti "perseverate"
iotu -> enu "curious"
iovi -> kovi "wise"
iule -> ulu "apart"
iuna -> vona "train"
iune -> lune "steal"
iuve -> uve "fall short"

Ooh, some of these are still not feeling great...I can tell I'm going to have to give myself time.

Consonant use statistics

February 2nd, 2023

This morning I focused for the first time on the fact that my little random word program -- the database that suggests Koa roots in need of meanings -- is suggesting roots containing /c/ a disproportionate amount of the time. This in itself wasn't surprising: /c/ only returned to active use about 15 months ago, so it would make sense that more roots containing it would be available. It made me wonder, though, just how much variation there is in consonant phoneme frequency in Koa. I ran some numbers...

This was not quite what I expected! It turns out that as of this morning at 11:30am, of the 840 roots assigned meanings so far, the average number of words containing a given consonant phoneme is 135.5. That puts /h m n s t/ right in the middle with approximately equal frequency. My expectation about /c/ was correct, with roots containing it only representing 46% of average...but who knew that /p/ is way down there too at only 69%? I knew I had a bit of anti-bilabial-stop bias -- Seadi didn't even have those phonemes originally, explaining them away via some extremely convenient historical change -- but I certainly was not aware of its having been working so effectively in the background of Koa word creation.

On the other end of things, /k/ and /l/ are significantly overrepresented at nearly 150% of average! ...Which also kind of makes sense because they're also favorites of mine.
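
For anyone who wants to run the same kind of numbers, here is a minimal sketch of one way to do it. It assumes each consonant phoneme is written with a single letter, that a root counts once per consonant it contains, and that the inventory is passed in by the caller; the function name `consonant_share` and the six sample roots are illustrative only, not the actual database.

```python
def consonant_share(roots, consonants):
    """Count roots containing each consonant and compare with the average.

    Returns {consonant: (root_count, percent_of_average)}.
    """
    counts = {c: sum(1 for root in roots if c in root) for c in consonants}
    average = sum(counts.values()) / len(counts)
    if average == 0:
        raise ValueError("no root contains any of the given consonants")
    return {c: (n, 100 * n / average) for c, n in counts.items()}


# Tiny illustrative sample, not the real 840-root lexicon.
sample_roots = ["kane", "hapi", "kecu", "tana", "koma", "halu"]
for consonant, (count, percent) in consonant_share(sample_roots, "chklmnpt").items():
    print(f"/{consonant}/ {count} roots, {percent:.0f}% of average")
```

With the real lexicon as input, a consonant sitting at 100% would be exactly average, which is how figures like 46%, 69% and 150% can be read.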

I guess it just hadn't occurred to me that my own personal aesthetics would have figured so prominently in root choice with respect to phoneme frequency! I must have expected that each consonant would appear approximately equally, as odd as that would have been cross-linguistically?

That raises a really interesting point, though, which I also had never considered: the particular character of Koa as it has always existed manifests these frequency biases. Like any language, the phonemes are represented unequally, and that gives it an important part of its unique phonological character. As such, moving towards greater uniformity -- as my random picker would automatically tend to do -- would, over time, actually alter the feel of Koa.

And if I like the phonological aesthetics as they've been up to this point -- which it turns out I do -- I may actually not want to continue generating words this way! I'm not sure yet exactly how I'll do this, but what we really want is for the randomness to be weighted -- towards words with Koa's favorite phonemes, and away from words with those it prefers less -- such that a random sample of suggested words would tend to show the same frequency distribution as the language as a whole.
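
One possible way to weight the randomness is sketched below. It is an assumption-laden illustration rather than the actual program: each candidate root gets a weight equal to the product of the relative frequencies of its consonants, and `random.choices` then draws suggestions in proportion to those weights. The names `root_weight` and `suggest_roots` are hypothetical, and the example weights simply reuse the percentages quoted above (0.46 for /c/, 0.69 for /p/, 1.5 for /k/ and /l/), with every other letter left at 1.0.

```python
import random


def root_weight(root, consonant_weights):
    """Multiply the relative frequency of every weighted consonant in the root."""
    weight = 1.0
    for letter in root:
        weight *= consonant_weights.get(letter, 1.0)
    return weight


def suggest_roots(candidates, consonant_weights, k=5, seed=None):
    """Draw k suggestions (with replacement), favouring preferred consonants."""
    rng = random.Random(seed)
    weights = [root_weight(root, consonant_weights) for root in candidates]
    return rng.choices(candidates, weights=weights, k=k)


# Relative frequencies taken from the percentages discussed above.
koa_weights = {"c": 0.46, "p": 0.69, "k": 1.5, "l": 1.5}
unassigned = ["kalo", "capo", "lika", "pece", "mesa"]  # made-up candidates
print(suggest_roots(unassigned, koa_weights, k=3, seed=1))
```

A product of per-consonant weights is only one choice; it nudges suggestions towards the existing distribution but does not guarantee an exact match, so the frequencies would still be worth rechecking from time to time.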

I almost wonder if I should go back to an earlier version of the file, run these numbers again, and use those statistics; the program potentially had a noticeable impact on the frequencies with those 200+ words in the past couple months. Though...on the other hand I was still vetting the choices so my aesthetics were still probably in force, even if being nudged. I could figure out the statistics of the recent additions on their own just to be sure.

Anyway this is certainly an interesting little surprise for me to ponder.

Art & Anxiety: Conlanging through imposter syndrome

February 1st, 2023

Jessie Sams is a Professor of Linguistics at Stephen F. Austin State University. She generally teaches courses rooted in linguistic analysis of English, though one of her favorite courses to teach is her Invented Languages course, where students construct their own languages throughout the semester (she was even able to get Invented Languages officially on the books at SFA with its own course number). Her research primarily focuses on syntax and semantics, especially the intersection of the two within written English quotatives; constructed languages; and history of the English language and English etymology. Since 2019, she’s worked as a professional conlanger on the Freeform series Motherland: Fort Salem. In her free time, she enjoys reading, hosting game nights with friends, baking (especially cupcakes), and, of course, conlanging.


In this essay, Jessie Sams discusses some of the major personal hurdles she has to overcome as a conlanger, and introduces a new personal conlang she’s working on, Zhwadi.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License

Etymology statistics

January 29th, 2023

Just a point of interest as I continue to organize my lexicon...

Of the 790 predicate roots assigned so far:

* 163 (21%) are derived from Finnish
* 57 (7%) are derived from Hawai'ian, Sāmoan, Tongan or Māori
* 75 (9%) are derived from other languages (Arabic, Basque, Bislama, Chinese, Doraja, Esperanto, French, Greek, Icelandic, Irish, Japanese, Latin, Lapine, Latvian, Malay, Nahuatl, Polish, Proto-World [ha ha], Quechua, Quenya, Russian, Seadi, Spanish, Swahili, Tagalog, Turkish, or broad international usage)
* 34 (4%) are internally-derived

This means 295 (37%) of the current Koa root stock was derived in some way from other languages, compared with 495 (63%) that was either randomly generated, internally derived, or selected/created in some way (unfortunately there's no good way to distinguish randomness from intention reliably at this point). I find these figures a little surprising: it was my impression that the significant majority of Koa words was based in something -- to the point that I was stymied for a long time in creating more vocabulary when I couldn't find enough existing linguistic inspiration. Also, again, let's just pause for a second to acknowledge that Finnish has provided a fifth of Koa vocabulary.

Worthy of special mention are 6 roots (1%) that were created by friends or family members -- I'd love to swell that number moving forward!
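
The percentages above can be rechecked with a few lines of Python. This is just a sanity-check sketch: the counts are copied from the list, "Polynesian" is shorthand introduced here for the Hawai'ian, Sāmoan, Tongan and Māori group, and the helper name `percent` is hypothetical.

```python
TOTAL_ROOTS = 790

# Counts copied from the list above.
sources = {
    "Finnish": 163,
    "Polynesian": 57,
    "other languages": 75,
    "internally derived": 34,
}


def percent(count, total=TOTAL_ROOTS):
    """Share of the root stock, rounded to a whole percent."""
    return round(100 * count / total)


# Roots taken from other languages, and everything else.
borrowed = sources["Finnish"] + sources["Polynesian"] + sources["other languages"]
remaining = TOTAL_ROOTS - borrowed

for name, count in sources.items():
    print(f"{name}: {count} ({percent(count)}%)")
print(f"from other languages: {borrowed} ({percent(borrowed)}%)")
print(f"everything else: {remaining} ({percent(remaining)}%)")
```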

A first Koa publication

January 27th, 2023

In response to my children's repeated requests, I decided late last year that I would do my best to assemble a printed, kid-oriented, concise Koa dictionary in time for the Solstice. As the project took shape it grew beyond my original intention, eventually including a phrasebook and mini-grammar as well, and in the end I was pretty pleased with it as a snapshot in time of the development of this language.

It was also an opportunity to buckle down to some serious vocabulary creation, which had been languishing a bit in recent years; I'm pretty happy to finally have words like siki "particle," mohi "predicate," lelo "sentence," cóepo "alphabet" and címihale (or halecimi...more on that soon) "grammar."

In fact the process of creating needed vocabulary for the Úputusi Énasi sort of unblocked me and I've been on a bit of a rampage since then, coining around 200 new words over the past two months. What's been amazing is discovering that all that toiling in the syntactical, pragmatic and morphological mud I was doing in 2021 moved the structure of Koa to a place where now vocabulary is its primary need. Suddenly having all these words available, I'm finding that the language is much more speakable than I had previously expected, and with surprising expressive power.

As of today at noon, 774 of Koa's 3330 possible predicates have been defined. Emotional vocabulary has been my focus of late, but I'm starting to wonder what other thematic categories deserve some attention. Materials? Science? Botany? Civics? Geography? I've always been so intensely focused on word-worthiness and concerned about running out, but after 23 years I've still only used up a quarter of my possible roots!

Anyway, this was such a fun project that really jump-started some major progress after a pretty slow year. Unfortunately the dictionary doesn't seem to have inspired my girls toward total Koa fluency yet, but surely it's only a matter of time...

...And if you'd like to download the whole thing, a PDF is available here.

Ni Ceso

January 23rd, 2023

This is a difficult moment in my life. It's not the first such moment that's passed since I started this blog, but it is the first time that I was actively working on Koa enough to have something to say. What this means, of course, is that I'm now going to make you read sad love poetry.

Seriously, though, this is the first native-Koa artistic composition since Aika Konuku in 2012, and the first non-translated work of poetry of any kind. It feels significant! It features Koa’s growing collection of emotive vocabulary, particularly "ceso" which means something like "incurious" but less highbrow: the opposite of "curious," desiring not to know, feeling pulled away from rather than towards understanding of something; it also makes heavy use of modal particles and clause nominalization.

As with the previous poem, I find myself really liking how compact, elegant and balanced Koa can be for poetry. I didn't expect this but it makes me happy! My shot at an English translation definitely loses some or all of that particular aesthetic sense of the Koa original.

Ni Ceso

Ni ceso
Noia na vi sano ni
Ka se cu nike he tana
Ka so cu ete mo kune
Ka ne se simo he ko meti pe to níkete
Ka ne se simo he ko meti pe to mehe
Ka ne se simo he ko meti pe ka kecu.

Ni ceso
Kelo se na te lu tai me ni
Kemo sisu ve se ca ma tala ko halu ni
Ka ma lolo se simo mo iule o ni
Ni na te koma ka natepakoma.

Ni na lu koma
Ni pavasu lo ko tala
Ni cu te hitui la hete to lise mo cali.
Ni na lu koma ka natepakoma
Ni na lu koma kelo se na te halu ni
Noia na vi sano ni
Ni ceso.

-Váhumaa, 2023-01-22


I'm Not Curious

I'm not curious
Please don't tell me
Who you're seeing today
What you're doing together
What's in your heart when you think about that meeting
What's in your heart when you think about that person
What's in your heart when you think about the future.

I'm not curious
Why you can't want to be with me
How hard you're still trying to want me
What's holding your heart back from me
I can't understand the incomprehensible.

I don't want to understand
I'm worn out with trying
I could smash myself against that wall forever.
I don't want to understand the incomprehensible
I don't want to understand why you can't want me
Please don't tell me
I'm not curious.

-Portland, 1/22/23