Home For Fiction – Blog

for thinking people


July 18, 2022

Structuring Language for Automatic Text

Programming, Writing

creativity, imagination, language, programming, writing

If you’ve used Planet Generator, you must’ve noticed how it offers, among other things, what I refer to as “civilizational data”. These also include trivia for the imaginary cultures of the program, in the style of “Arranging a visit to Orphne? Avoid Flöchixäwu — and its rather ferocious beasts”. Such phrases are examples of automatic text.

To be clear, these examples are not entirely automatic – in the sense that they’re not made out of thin air, perhaps using AI or at least combining texts from other sources. Rather, they’re based on syntactic patterns I’ve offered the program, together with sets of words to choose from.

But it’s precisely this simplicity that makes this strategy attractive. It’s trivial to use, and the number of possible combinations it can come up with is staggeringly high.

So, in this post, I’m offering you a look under the hood of Planet Generator, showing you how it generates its automatic text. It’s easy, educational (in terms of teaching us how language operates), and revealing.

“Natives of Damon are considered potentially obstinate” – or that’s what the automatic text of Planet Generator tells me…

Automatic Text Structure

It all begins with structure. Imagination and creativity are essential here, and a good command of the English language helps, too.

So, the first thing I came up with was some basic types of sentences I wanted the program to create. For instance:

Many inhabitants of Jupiter dislike eating raw meat.

The next step was to identify all parts of speech and phrases in the sentence and offer alternatives wherever possible. It will be easier to visualize with brackets:

{Many} {inhabitants of} {Jupiter} {dislike} {eating} {raw} {meat}.

Then, I prepared arrays of possible substitutions. For example:

var x = ["many", "some", "a few", "all", "a number of", "most"];
var y = ["inhabitants of", "people on", "visitors to"];

You get the idea. It’s then trivial to simply pick a random item from each array and combine them all together.
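Here is a minimal sketch of that pick-and-combine step. Only the first two arrays come from this post (renamed here for readability); the other arrays, the `buildSentence` name, and the body of `pickRandFromArray` are assumptions made for illustration, not the program's actual code.

```javascript
// Slot arrays. The first two appear in the post; the rest are hypothetical.
var quantifiers = ["many", "some", "a few", "all", "a number of", "most"];
var subjects = ["inhabitants of", "people on", "visitors to"];
var verbs = ["dislike", "avoid", "enjoy"];
var activities = ["eating", "cooking", "selling"];
var qualities = ["raw", "uncooked", "dry", "overcooked", "warm"];
var nouns = ["meat", "fish", "fruit"];

// Return one random element. rng defaults to Math.random and must
// return a number in [0, 1); passing it in makes the function testable.
function pickRandFromArray(arr, rng) {
    rng = rng || Math.random;
    return arr[Math.floor(rng() * arr.length)];
}

// Build one sentence by picking a word or phrase for every slot.
function buildSentence(planet, rng) {
    var parts = [
        pickRandFromArray(quantifiers, rng),
        pickRandFromArray(subjects, rng),
        planet,
        pickRandFromArray(verbs, rng),
        pickRandFromArray(activities, rng),
        pickRandFromArray(qualities, rng),
        pickRandFromArray(nouns, rng)
    ];
    var sentence = parts.join(" ") + ".";
    // The arrays hold lowercase words, so capitalize the first letter.
    return sentence.charAt(0).toUpperCase() + sentence.slice(1);
}

console.log(buildSentence("Jupiter"));
```

Keeping each slot in its own array means that adding a new option is a one-word change, which is a big part of why this approach scales so easily.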

Wildly Many Combinations

At this point, you might think that these options are quite limited. As in, the number of possible combinations can’t be that high.

It’s true that many of those will be similar. I mean “Many inhabitants of [planet]” and “Some people on [planet]” are fairly similar. But before we get there, let’s see how many different combinations we can actually get. To do that, we need to multiply the array lengths.

Not taking into consideration the planet name, the sentence above has six slots to fill. Assuming arrays of five items on average, that gives 5 × 5 × 5 × 5 × 5 × 5 = 15,625, so a sentence structure like this one can return something like 16,000 different combinations.
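If you'd rather let the computer do the multiplication, a small sketch like this will do. The `countCombinations` name and the placeholder arrays are assumptions made for illustration; the real arrays differ in length.

```javascript
// Multiply the lengths of all slot arrays to count the distinct sentences.
function countCombinations(slots) {
    return slots.reduce(function (total, options) {
        return total * options.length;
    }, 1);
}

// Six slots (the planet name is excluded), each with five placeholder options.
var sixSlotsOfFive = [];
for (var i = 0; i < 6; i++) {
    sixSlotsOfFive.push(["a", "b", "c", "d", "e"]);
}

console.log(countCombinations(sixSlotsOfFive)); // 15625
```

Because the total is a product, adding a single option to one array increases the count far more than it might seem at first glance.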

That’s a substantial number. And guess what? That’s only the beginning.

Enriching Variety with makeWord()

makeWord() is the name I gave to a simple function in the program that, as its name reveals, makes random words. The name “Flöchixäwu” you read in the introductory paragraph is one such example. The function has several ways of producing words like that, and here’s an easy way to use it to add even more variety to the sentence above:

var z = ["raw", "uncooked", "dry", "overcooked", "warm", makeWord()];

See what can happen here? We can get a sentence like:

Most people on Jupiter avoid eating Flöchixäwu meat.
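The post doesn't show how makeWord() works internally, so the following is only a rough stand-in: it glues together random consonant-plus-vowel syllables. The syllable lists and the word length are assumptions, and the real function is more elaborate (among other things, it relies on the RiTa library).

```javascript
// Hypothetical building blocks; the real makeWord() is not shown in the post.
var ONSETS = ["fl", "ch", "x", "w", "d", "m", "k", "r"];
var VOWELS = ["a", "e", "i", "o", "u", "ä", "ö"];

// Return one random element; rng must return a number in [0, 1).
function pickRandFromArray(arr, rng) {
    rng = rng || Math.random;
    return arr[Math.floor(rng() * arr.length)];
}

// Assemble a capitalized word from two to four onset + vowel syllables.
function makeWord(rng) {
    rng = rng || Math.random;
    var syllableCount = 2 + Math.floor(rng() * 3);
    var word = "";
    for (var i = 0; i < syllableCount; i++) {
        word += pickRandFromArray(ONSETS, rng) + pickRandFromArray(VOWELS, rng);
    }
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// The made-up word simply becomes one more option in the adjective slot.
var z = ["raw", "uncooked", "dry", "overcooked", "warm", makeWord()];
console.log(z);
```

Note that makeWord() is called once, when the array is created. To get a fresh made-up word for every sentence, the array would have to be rebuilt each time.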

In the context of the program, which is meant also as a worldbuilding tool for authors, such combinations can be extremely useful.


How Such Automatic Text Can Become Less Repetitive

As I noted above, you might rightly wonder whether phrases like “Many inhabitants of [planet]” and “Some people on [planet]” are all that different. It’s true, they’re not. And that’s why the program uses many different sets of structures. Also note that it’s trivial to scale up the sets – as well as the arrays of options themselves.

Currently, the program uses around 30 different sentence structures. Most of them are more complex than the example I used in this post, but even if we assumed similar complexity (16,000 combinations for each structure), that would mean 480,000 possible phrases, or 1 chance in 480,000 of getting any particular one, assuming each is equally likely. It goes without saying that this doesn’t include made-up words, which multiply the possibilities much further.

Another possibility is to randomly add (or leave out) additional clauses. Take a look at this:

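var add1 = ["However", "Interestingly", "Peculiarly", "Amazingly"];
// Roughly half of the time, append a connector plus an extra clause.
if (Math.random() < 0.5) {
    // The leading space assumes phrase ends with a period, not a space.
    phrase += " " + pickRandFromArray(add1) + ", " + subclause;
}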

You can guess the idea. We could get something like:

The Hebe language has 9 types of verbs. Interestingly, the oldest, ‘häxoi’, is understood by just 400 inhabitants of Hebe.
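For a version you can run on its own, here is a sketch that wraps the same idea in a function. The `maybeAddClause` name, the `probability` parameter, and the body of `pickRandFromArray` are assumptions made for illustration; the sample sentence is the one from above.

```javascript
var add1 = ["However", "Interestingly", "Peculiarly", "Amazingly"];

// Return one random element; rng must return a number in [0, 1).
function pickRandFromArray(arr, rng) {
    rng = rng || Math.random;
    return arr[Math.floor(rng() * arr.length)];
}

// Append a connector and an extra clause with the given probability;
// otherwise return the phrase unchanged.
function maybeAddClause(phrase, subclause, probability, rng) {
    rng = rng || Math.random;
    if (rng() < probability) {
        phrase += " " + pickRandFromArray(add1, rng) + ", " + subclause;
    }
    return phrase;
}

var base = "The Hebe language has 9 types of verbs.";
var extra = "the oldest, ‘häxoi’, is understood by just 400 inhabitants of Hebe.";
console.log(maybeAddClause(base, extra, 0.5));
```

Turning the probability into a parameter makes it easy to tune how often the extra clause shows up, so the output doesn't feel formulaic.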

Further Randomization for Such Automatic Text

If we wanted to leave the very simple confines of such examples, we could use some tools to make this way, way more diverse. Indeed, Planet Generator already uses the awesome RiTa library as part of its random word creation process, and for a couple of other uses. Another thing to try would be using the Datamuse API to add, for instance, suitable adjectives before nouns. For an example of this, check out my… “Horoscope” Generator.
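To give a rough idea of the Datamuse approach, here is a sketch that asks the API for adjectives commonly used with a noun and puts one of them in front of it. The function names are made up for illustration, and this is not how Planet Generator or the “Horoscope” Generator actually does it. The `rel_jjb` and `max` query parameters are part of the public Datamuse API; the network call needs an environment with a global `fetch`.

```javascript
// Build a Datamuse query for adjectives often used to describe a noun.
function buildAdjectiveUrl(noun, max) {
    return "https://api.datamuse.com/words?rel_jjb=" +
        encodeURIComponent(noun) + "&max=" + max;
}

// Put one adjective from a Datamuse-style result list before the noun.
// Falls back to the bare noun when the list is empty.
function decorateNoun(noun, results, rng) {
    rng = rng || Math.random;
    if (!results || results.length === 0) {
        return noun;
    }
    var choice = results[Math.floor(rng() * results.length)];
    return choice.word + " " + noun;
}

// Fetch suggestions and decorate the noun (requires network access).
async function describeNoun(noun) {
    var response = await fetch(buildAdjectiveUrl(noun, 10));
    var results = await response.json(); // array of { word, score } objects
    return decorateNoun(noun, results);
}

// Example (not run here): describeNoun("meat").then(console.log);
```

Separating the URL building and the word picking from the network call keeps the random part easy to test without hitting the API.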

And of course, for more serious use, playing with AI/ML can be a pretty interesting exercise – one that, I confess, I’m too lazy to try.

Note: Interested in a program that creates an entire language for you? Try my Fantasy Language Generator!