Home For Fiction – Blog

for thinking people

There are no ads, nor any corporate masters
How to show support


June 17, 2020

Fiction Complexity Index: Calculate Your Novel’s Genre Positioning

Literature, Programming

fiction, genre, literature, programming

4 comments

Romance novels aren’t as complex as literary fiction. Similarly, historical fiction is more complex than, say, young-adult fantasy. I’ve been thinking, we need a Fiction Complexity Index. Moreover, we need a Fiction Complexity Index by genre; a number that can give us a rough estimate of whether our novel is “about right” in terms of complexity.

And so, I decided to make one!

The good news is, this little Fiction Complexity Index is something you, too can benefit from. Because I decided to code it in a way that allows anyone to upload their novel (as a .txt file) and immediately see the results.

Fiction complexity index
A Fiction Complexity Index can help us see whether a text is suitable for the intended genre

A Fiction Complexity Index (in alpha stage)

As the instructions below emphasize, only .txt files are accepted. If your novel is in e.g. .docx form, simply copy & paste the text into Notepad, then save it as .txt. Page breaks, formatting, and other such details don’t matter.

The code (available at the end of this post) doesn’t send or save your file anywhere. It’s JavaScript, which means it only runs locally – i.e. on your own browser, as you visit this page.

Click to run the program

As I said, this is an entirely “alpha stage” piece of code, that I put together in one afternoon.

The code part is fairly simple – my input, in any case. I decided to start by measuring the ratio between punctuation and the length of the novel, as it’s a sign of short structures. At the same time, I made a differentiation between, on the one hand, periods and commas, and, on the other, exclamation and question marks.

The reason?

My experience as a writer, reader, and editor, has helped me discover that genre fiction (particularly of the lower end of complexity) deploys a lot of such punctuation.

The code also incorporates a much more sophisticated readability analysis by Simo Ahava – a great resource for JavaScript developers working with texts; highly recommended.

How to Interpret the Results

Or, in other words, why is a Fiction Complexity Index useful?

As you notice from the little table I included with the program, each genre has different requirements. More complex does not necessarily mean it’s “better” in some way.

Rather, complexity needs to be in accordance with generic expectations. A text that is too simplistic for literary fiction or hard science fiction, might be exactly right for romance or young adult fantasy.

Conversely, a text that is complex, lyrical, and highbrow, might indeed be great for literary fiction or some historical and Gothic fiction, but utterly unsuitable for more accessible genres.

It’s a Tool; not Panacea

Like all tools, this Fiction Complexity Index (which, I remind you, is only an “alpha” product and the result of afternoon boredom) is only that: a tool. It’s up to you to take it and ponder on its results.

If the code claims your novel is less complex than the reference range, this could mean (take it with a pinch of salt) that your descriptions might be too short, that your characters might be too simple, and that your novel might feature little reflection.

Conversely, if the code claims your novel is more complex than the reference range, you might have too lengthy descriptions or meandering inner dialogues.

Still, I must emphasize it once more: no program – not even mine 😉 – can be any kind of authority on your novel. You are the only such authority. True, experience and knowledge can help you reach better conclusions. But the proverbial buck stops with you.

Fiction Complexity Index: The Code

If you’re looking for the code, here it is.

<html>
<style>
body {
	background-color:black;
	color:cornsilk;
	font-family:Verdana;
	margin:10px;
	padding:10px;
}
p {
	font-size:1.4em;
	margin-top: 10px;

}
@media only screen and (max-width: 600px) {
    	p{
	font-size:1em;
	margin-top: 5px;
}
.button {
	font-size:1em;
}
.button2 {
	font-size:1em;
	margin-top:10px;
}
#select {
	display:none;
}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<div id="select">
	<p>Select the file containing your novel<span style="font-size:0.7em"><br><br>Please note that only .txt files are accepted. For best results, remove unnecessary details, e.g. acknowledgments, table of contents, etc.</span>
	</p>
	<input type="file" accept=".txt" id="file_input" class="button" />
</div>
<div id="results" style="display:none">
	<p id="ci">
		The Complexity Index will appear here…
	</p>
	<p style="font-size:1em;">
		Indicative Values:<br><br>
		15-22: Middle Grade, Young Adult<br>
		20-26: Romance Fiction<br>
		22-32: Crime Fiction, Fantasy, Horror<br>
		30-45: Literary Fiction, Science Fiction, Gothic Fiction<br>
		40-55: Historical Fiction<br><br>
		<span style="font-size:0.9em">Values lower than 15 or higher than 55 are atypical, and would respectively indicate a too simplistic or too complex structure.</span><br><br><span style="font-size:0.9em;">Furthermore, keep in mind that there can be significant genre overlap. A young-adult science fiction novel should return far lower values compared to a so-called hard science-fiction novel.</span>
	</p>
	<button class="button2" onClick="location.reload();">Start Over</button>
</div>
<script>
var text;
var array = [];
var counter = 0;
var emotive = 0;
var wordcount = 0;
var read;

$(document).ready(function(){
$("#select").fadeIn(1000);
$('#file_input').on('change', function(e){
    	if (this.files[0].type.match('text/plain')) {
            readFile(this.files[0], function(e) {
		    text = e.target.result;
		    array = text.split(" ");
		    wordcount = array.length;
		    $("#select").fadeOut(1000);
		    setTimeout(
			    function() {
				    $("#results").fadeIn(3000);
		    }, 1000);
		    playMe();
	})
        } else {
            alert("Only .txt files are allowed!");
        }
})
})
function readFile(file, callback){
	var reader = new FileReader();
	reader.onload = callback
	reader.readAsText(file);
}

function checkReadability() { 
	readabilityFunction = new readabilityFunction();
	read = readabilityFunction(text).automatedReadabilityIndex;
}

function playMe() {
	
	for (var i = 0; i < array.length; i++) {
		if (array[i].match(/[.]/)) {
			counter = counter+3;
		}
		else if (array[i].match(/[!?]/)) {
			counter = counter+2;
			emotive = emotive+1;
		}
		else if (array[i].match(/[,;]/)) {
			counter = counter+1;
		}
		else if (array[i].match(/[…]/)) {
			emotive = emotive+1;
		}
	}
	
	checkReadability();
	document.getElementById("ci").innerHTML = ("Complexity Index:" + (Math.round(read*(wordcount/(emotive+counter)))));
	
	

}

function readabilityFunction () { <!-- by Simo Ahava, https://www.simoahava.com/analytics/calculate-readability-scores-for-content/ -->
  return function(text) {
  	var sampleLimit = 10000;
    var sentenceRegex = new RegExp('[.?!]\\s[^a-z]', 'g');
    var syllableRegex = new RegExp('[aiouy]+e*|e(?!d$|ly).|[td]ed|le$', 'g');
    var freBase = {
      base: 206.835,
      sentenceLength: 1.015,
      syllablesPerWord: 84.6,
      syllableThreshold: 3
    };
    var cache = {};
    
    var punctuation = ['!','"','#','$','%','&','\'','(',')','*','+',',','-','.','/',':',';','<','=','>','?','@','[',']','^','_','`','{','|','}','~'];
    
    var legacyRound = function(number, precision) {
      var k = Math.pow(10, (precision || 0));
      return Math.floor((number * k) + 0.5 * Math.sign(number)) / k;
    };
    
    var charCount = function(text) {
      if (cache.charCount) return cache.charCount;
      if (sampleLimit > 0) text = text.split(' ').slice(0, sampleLimit).join(' ');
      text = text.replace(/\s/g, '');
      return cache.charCount = text.length;
    };
    
    var removePunctuation = function(text) {
      return text.split('').filter(function(c) {
        return punctuation.indexOf(c) === -1;
      }).join('');
    };
    
    var letterCount = function(text) {
      if (sampleLimit > 0) text = text.split(' ').slice(0, sampleLimit).join(' ');
      text = text.replace(/\s/g, '');
      return removePunctuation(text).length;
    };
    
    var lexiconCount = function(text, useCache, ignoreSample) {
      if (useCache && cache.lexiconCount) return cache.lexiconCount;
      if (ignoreSample !== true && sampleLimit > 0) text = text.split(' ').slice(0, sampleLimit).join(' ');
      text = removePunctuation(text);
      var lexicon = text.split(' ').length;
      return useCache ? cache.lexiconCount = lexicon : lexicon;
    };
    
    var getWords = function(text, useCache) {
      if (useCache && cache.getWords) return cache.getWords;
      if (sampleLimit > 0) text = text.split(' ').slice(0, sampleLimit).join(' ');
      text = text.toLowerCase();
      text = removePunctuation(text);
      var words = text.split(' ');
      return useCache ? cache.getWords = words : words;
    }
    
    var syllableCount = function(text, useCache) {
      if (useCache && cache.syllableCount) return cache.syllableCount;
      var count = 0;
      var syllables = getWords(text, useCache).reduce(function(a, c) {  
        return a + (c.match(syllableRegex) || [1]).length;
      }, 0);
      return useCache ? cache.syllableCount = syllables : syllables;
    };
    
    var polySyllableCount = function(text, useCache) {
      var count = 0;
      getWords(text, useCache).forEach(function(word) {
        var syllables = syllableCount(word);
        if (syllables >= 3) {
          count += 1;
        }
      });
      return count;
    };
    
    var sentenceCount = function(text, useCache) {
      if (useCache && cache.sentenceCount) return cache.sentenceCount;
      if (sampleLimit > 0) text = text.split(' ').slice(0, sampleLimit).join(' ');
      var ignoreCount = 0;
      var sentences = text.split(sentenceRegex);
      sentences.forEach(function(s) {
        if (lexiconCount(s, true, false) <= 2) { ignoreCount += 1; }
      });
      var count = Math.max(1, sentences.length - ignoreCount);
      return useCache ? cache.sentenceCount = count : count;
    };
    
    var avgSentenceLength = function(text) {
      var avg = lexiconCount(text, true) / sentenceCount(text, true);
      return legacyRound(avg, 2);
    };
    
    var avgSyllablesPerWord = function(text) {
      var avg = syllableCount(text, true) / lexiconCount(text, true);
      return legacyRound(avg, 2);
    };
    
    var avgCharactersPerWord = function(text) {
      var avg = charCount(text) / lexiconCount(text, true);
      return legacyRound(avg, 2);
    };
    
    var avgLettersPerWord = function(text) {
      var avg = letterCount(text, true) / lexiconCount(text, true);
      return legacyRound(avg, 2);
    };
    
    var avgSentencesPerWord = function(text) {
      var avg = sentenceCount(text, true) / lexiconCount(text, true);
      return legacyRound(avg, 2);
    };
    
    var fleschReadingEase = function(text) {
      var sentenceLength = avgSentenceLength(text);
      var syllablesPerWord = avgSyllablesPerWord(text);
      return legacyRound(
        freBase.base - 
        freBase.sentenceLength * sentenceLength - 
        freBase.syllablesPerWord * syllablesPerWord,
        2
      );
    };
    
    var fleschKincaidGrade = function(text) {
      var sentenceLength = avgSentenceLength(text);
      var syllablesPerWord = avgSyllablesPerWord(text);
      return legacyRound(
        0.39 * sentenceLength +
        11.8 * syllablesPerWord -
        15.59,
        2
      );
    };
    
    var smogIndex = function(text) {
      var sentences = sentenceCount(text, true);
      if (sentences >= 3) {
        var polySyllables = polySyllableCount(text, true);
        var smog = 1.043 * (Math.pow(polySyllables * (30 / sentences), 0.5)) + 3.1291;
        return legacyRound(smog, 2);
      }
      return 0.0;
    };
    
    var colemanLiauIndex = function(text) {
      var letters = legacyRound(avgLettersPerWord(text) * 100, 2);
      var sentences = legacyRound(avgSentencesPerWord(text) * 100, 2);
      var coleman = 0.0588 * letters - 0.296 * sentences - 15.8;
      return legacyRound(coleman, 2);
    };
    
    var automatedReadabilityIndex = function(text) {
      var chars = charCount(text);
      var words = lexiconCount(text, true);
      var sentences = sentenceCount(text, true);
      var a = chars / words;
      var b = words / sentences;
      var readability = (
        4.71 * legacyRound(a, 2) +
        0.5 * legacyRound(b, 2) -
        21.43
      );
      return legacyRound(readability, 2); 
    };
    
    var linsearWriteFormula = function(text) {
      var easyWord = 0;
      var difficultWord = 0;
      var roughTextFirst100 = text.split(' ').slice(0,100).join(' ');
      var plainTextListFirst100 = getWords(text, true).slice(0,100);
      plainTextListFirst100.forEach(function(word) {
        if (syllableCount(word) < 3) {
          easyWord += 1;
        } else {
          difficultWord += 1;
        }
      });
      var number = (easyWord + difficultWord * 3) / sentenceCount(roughTextFirst100);
      if (number <= 20) {
        number -= 2;
      }
      return legacyRound(number / 2, 2);
    };
    
    var rix = function(text) {
      var words = getWords(text, true);
      var longCount = words.filter(function(word) {
        return word.length > 6;
      }).length;
      var sentencesCount = sentenceCount(text, true);
      return legacyRound(longCount / sentencesCount, 2);
    };
    
    var readingTime = function(text) {
      var wordsPerSecond = 4.17;
      return legacyRound(lexiconCount(text, false, true) / wordsPerSecond, 2);
    };
    
    var grade = [];
    var obj = {};
    (function() {

      var fre = obj.fleschReadingEase = fleschReadingEase(text);
      if (fre < 100 && fre >= 90) {
        grade.push(5);
      } else if (fre < 90 && fre >= 80) {
        grade.push(6);
      } else if (fre < 80 && fre >= 70) {
        grade.push(7);
      } else if (fre < 70 && fre >= 60) {
        grade.push(8);
        grade.push(9);
      } else if (fre < 60 && fre >= 50) {
        grade.push(10);
      } else if (fre < 50 && fre >= 40) {
        grade.push(11);
      } else if (fre < 40 && fre >= 30) {
        grade.push(12);
      } else {
        grade.push(13);
      }
      
      // FK
      var fk = obj.fleschKincaidGrade = fleschKincaidGrade(text);
      grade.push(Math.floor(fk));
      grade.push(Math.ceil(fk));
      
      // SMOG
      var smog = obj.smogIndex = smogIndex(text);
      grade.push(Math.floor(smog));
      grade.push(Math.ceil(smog));
      
      // CL
      var cl = obj.colemanLiauIndex = colemanLiauIndex(text);
      grade.push(Math.floor(cl));
      grade.push(Math.ceil(cl));
      
      // ARI
      var ari = obj.automatedReadabilityIndex = automatedReadabilityIndex(text);
      grade.push(Math.floor(ari));
      grade.push(Math.ceil(ari));
      
      // LWF
      var lwf = obj.linsearWriteFormula = linsearWriteFormula(text);
      grade.push(Math.floor(lwf));
      grade.push(Math.ceil(lwf));
      
      // RIX
      var rixScore = obj.rix = rix(text);
      if (rixScore >= 7.2) {
        grade.push(13);
      } else if (rixScore < 7.2 && rixScore >= 6.2) {
        grade.push(12);
      } else if (rixScore < 6.2 && rixScore >= 5.3) {
        grade.push(11);
      } else if (rixScore < 5.3 && rixScore >= 4.5) {
        grade.push(10);
      } else if (rixScore < 4.5 && rixScore >= 3.7) {
        grade.push(9);
      } else if (rixScore < 3.7 && rixScore >= 3.0) {
        grade.push(8);
      } else if (rixScore < 3.0 && rixScore >= 2.4) {
        grade.push(7);
      } else if (rixScore < 2.4 && rixScore >= 1.8) {
        grade.push(6);
      } else if (rixScore < 1.8 && rixScore >= 1.3) {
        grade.push(5);
      } else if (rixScore < 1.3 && rixScore >= 0.8) {
        grade.push(4);
      } else if (rixScore < 0.8 && rixScore >= 0.5) {
        grade.push(3);
      } else if (rixScore < 0.5 && rixScore >= 0.2) {
        grade.push(2);
      } else {
        grade.push(1);
      }
      
      // Find median grade
      grade = grade.sort(function(a, b) { return a - b; });
      var midPoint = Math.floor(grade.length / 2);
      var medianGrade = legacyRound(
        grade.length % 2 ? 
        grade[midPoint] : 
        (grade[midPoint-1] + grade[midPoint]) / 2.0
      );
      obj.medianGrade = medianGrade;
      
    })();
    
    obj.readingTime = readingTime(text);
      
    return obj;
  };
}
</script>
</html>

Looking for other cool similar projects? Why don’t you take a look at my rhyming anapest generator or my Mansion Escape text adventure.

4 Comments

  1. Igor Livramento Igor Livramento

    Curious how it resulted in 18. Was it because it was a novella? Erotic fiction, but regarded as literary fiction by all that read it. Maybe because it’s in Portuguese? Anyways, this is interesting, but from reading the first paragraphs of the post, I thought you were going to ask for what type of fiction it was in order to figure out where the work fell in that genre’s complexity index.

    1. Chris🚩 Chris

      Inspired by your result, I tried it with something in Greek and, lo and behold, it returned something far less than I expected, considering its genre.

      You’re right, it definitely must relate to language. Perhaps English allows far more complicated secondary clauses, that are more awkward in other languages.

      Also keep in mind that the readability function I’ve incorporated into the code (by Simo Ahava) is specifically meant for English, which means results in other languages are skewed.

      I should try it in Finnish, where you can find a word such as “lento­kone­suihku­turbiini­moottori­apu­mekaanikko­ali­upseeri­oppilas”, meaning “aeroplane jet turbine motor assistant mechanic, non-commissioned officer, in training” 😀

      Edit: Speaking of complicated clauses in English – and since we mentioned David Bohm the other day – how about this one, from his Wholeness and the Implicate Order: 😛

      1. Igor Livramento Igor Livramento

        Although the sentence is long, in Portuguese it would demand some punctuation (e.g. a comma always precedes an adversative/opposition conjunction, “but” is such a conjunction). Remember that Portuguese writing is legally ruled by international treaties, so there is just so much innovation one can do before the text gets refused by any and every publishing resource.
        I think the sentence is fairly straightforward and clean, though. It is easy to understand, it moves at a reasonable pace, ideas are presented linearly and logically. But have you tried reading Pierre Bourdieu’s “Language and Symbolic Power”? Well, fuck him well fucked. It is just unbearable. I don’t know if it’s the translation, but damn some sentences are just horrid. I had that text at a class during grad. and I had to rewrite almost a whole chapter to understand it.

        1. Chris🚩 Chris

          No, I haven’t read Pierre Bourdieu, but I now feel some compulsive need to go to the library and find it— it’s like when you know something will smell bad, but you still wanna try 😛


Punning Walrus shrugging

Comments are closed for posts older than 90 days