Home For Fiction – Blog

for thinking people

August 18, 2020

Vocabulary Richness Ratio: a JavaScript Analyzer for Fiction Texts

I’ve been reading a lot of Yukio Mishima’s fiction lately. In awe of the richness of his vocabulary, I decided to code a little program that analyzes a novel and tells us how diverse the author’s word choices are. Enter Vocabulary Richness Ratio, my newest JavaScript experiment!

Just like my Fiction Complexity Index attempt, the Vocabulary Richness Ratio is:

I’ve combined my expertise in literature and creative writing with my interest in coding, and this little program is the result. Again, I must emphasize that it’s only a work in progress. It can give you a hint of how rich the vocabulary of your book is, but it’s not an exact science.

Vocabulary Richness Ratio is a JavaScript program that finds unique words in a text and, together with other factors, estimates how diverse the book’s vocabulary is

Vocabulary Richness Ratio: the Program

Let’s start with the program right away, so you can try it for yourself. Afterwards, I’ll talk about the hows and whys of it.

The code only runs locally – i.e. in your own browser, as you visit this page. It doesn’t send or save your file anywhere. If you’re interested in the code, you can find it on my GitHub page.

Note: The program is hosted on raw.githack. Since it’s a free service, 100% uptime cannot be guaranteed. If the program doesn’t appear below, please try again later. The demo texts might also not work (for developers: it’s a CORS issue that I’m too bored to fix).

The key number to keep in mind is, unsurprisingly, the Vocabulary Richness Ratio. As the instructions in the program indicate, typical values range between 8% and 13%. Values over 13% indicate a particularly rich vocabulary, whereas values below 8% indicate a repetitive, less varied vocabulary.
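To make those thresholds concrete, here is a minimal sketch of how a result could be read. The 8% and 13% cut-offs come from the text above; the function name and the labels are purely illustrative and are not taken from the program itself.

```javascript
// Illustrative only: maps a Vocabulary Richness Ratio (in percent)
// to one of the three bands described in the text.
function interpretRichness(ratioPercent) {
  if (!Number.isFinite(ratioPercent) || ratioPercent < 0) {
    throw new RangeError("ratioPercent must be a non-negative number");
  }
  if (ratioPercent > 13) return "particularly rich";
  if (ratioPercent < 8) return "repetitive";
  return "typical"; // anything from 8% to 13%, boundaries included
}

console.log(interpretRichness(10.4));
```

Treating exactly 8% and exactly 13% as "typical" is my own assumption; the text doesn’t say which side the boundaries fall on.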

How It Works

The code goes through the entire narrative and counts several elements, including the word count, the number of adjectives, and the number of unique words and unique adjectives.
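The counting step described above can be sketched roughly as follows. This is not the program’s actual code: the regex tokenizer is a simplification, and `DEMO_ADJECTIVES` is a hypothetical stand-in for a real part-of-speech tagger, included only so the example runs on its own.

```javascript
// HYPOTHETICAL adjective list standing in for a real part-of-speech tagger.
const DEMO_ADJECTIVES = new Set(["pale", "cold", "silent", "golden"]);

// Lowercase the text and keep runs of letters, allowing internal apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/\p{L}+(?:['’]\p{L}+)*/gu) || [];
}

// Count words, adjectives, and the unique members of each group.
function countElements(text, isAdjective = (w) => DEMO_ADJECTIVES.has(w)) {
  const words = tokenize(text);
  const adjectives = words.filter(isAdjective);
  return {
    wordCount: words.length,
    uniqueWords: new Set(words).size,
    adjectiveCount: adjectives.length,
    uniqueAdjectives: new Set(adjectives).size,
  };
}

const counts = countElements("The pale moon rose. The cold, pale sea was silent.");
console.log(counts);
// { wordCount: 10, uniqueWords: 8, adjectiveCount: 4, uniqueAdjectives: 3 }
```

Using a `Set` for the unique counts keeps the sketch short. A real analyzer would likely also want to fold inflected forms together (e.g. "walk" and "walked"), which this example doesn’t attempt.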

Afterward, it returns the Vocabulary Richness Ratio, calculated as the average of several other ratios. Briefly, factors that affect the result are:
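As for the averaging itself, a minimal sketch might look like this. I’m assuming a plain, unweighted mean here, and the three input values are made-up placeholders rather than the ratios the program really computes.

```javascript
// Average several ratios (each a fraction between 0 and 1)
// and express the result as a percentage.
function averageRatios(ratios) {
  if (ratios.length === 0) {
    throw new RangeError("at least one ratio is required");
  }
  const sum = ratios.reduce((acc, r) => acc + r, 0);
  return (sum / ratios.length) * 100;
}

// HYPOTHETICAL component ratios, for illustration only.
const richness = averageRatios([0.12, 0.08, 0.10]);
console.log(richness.toFixed(1) + "%");
```

A weighted average would be an easy variation if some factors should count more than others.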

Moreover, the program returns a list of uncommon adjectives that are used more than once. The idea is to detect tendencies to overuse such adjectives.
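Here is a rough sketch of how such a list could be built. It assumes the adjectives have already been extracted from the text, and it treats "uncommon" as "not in a list of common words". Both the function name and the tiny common-word list are hypothetical, not the program’s own.

```javascript
// Return adjectives that are not in `commonWords` and occur more than once,
// most frequent first (ties broken alphabetically).
function repeatedUncommonAdjectives(adjectives, commonWords) {
  const counts = new Map();
  for (const adj of adjectives) {
    const key = adj.toLowerCase();
    if (commonWords.has(key)) continue; // skip everyday adjectives
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n > 1)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([word, n]) => ({ word, count: n }));
}

// HYPOTHETICAL sample data.
const sampleAdjectives = [
  "diaphanous", "good", "lugubrious", "diaphanous",
  "good", "lugubrious", "diaphanous", "pellucid",
];
const sampleCommon = new Set(["good", "big", "old"]);

console.log(repeatedUncommonAdjectives(sampleAdjectives, sampleCommon));
// [ { word: 'diaphanous', count: 3 }, { word: 'lugubrious', count: 2 } ]
```

In this sample, "good" is ignored because it’s common, and "pellucid" is left out because it appears only once.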

Like many of my other programs, this one also uses the wonderful RiTa library.

Caveats and Observations

There are some caveats about this Vocabulary Richness Ratio program that you should keep in mind.

Bottom line, feel free to experiment with the program, but don’t take the results too seriously! Genre is a weird concept anyway, and programs dealing with genre can be off the mark.

Vocabulary Richness Ratio: What’s Next

The obvious next thing to do would be to improve accuracy. In other words, I plan to continue testing it with a variety of texts, to see what tweaks I still need to implement.

As I mentioned earlier, if you’re interested in the code, feel free to take a look at the dedicated Vocabulary Richness Ratio page on GitHub.