Imagine if publishers had the ability to tell whether a manuscript had the potential to become a bestseller within minutes of receiving it… without even reading it. It sounds ridiculous, unfathomable, but, incredibly, computer scientists have already taken a big step towards making it a reality.
Researchers at an American university have published a paper explaining the workings of what they claim is a fully tried and tested algorithm, which can take a manuscript and predict – with 84 percent accuracy – whether it is likely to become a commercial and/or critical success should it be published.
I guess the most obvious place to begin breaking this study down is by asking: how does the algorithm actually work? Well, the program takes the first 1000 words of a book and analyses five predefined elements of writing style: lexical choices, distribution of word categories, distribution of grammar rules, distribution of phrasal and/or clausal tags, and distribution of sentiment and connotation.
Of course, this would be the process when a ‘new book’, unseen before, is placed into the system and the algorithm is allowed to run loose on it. What should really interest writers is how the researchers came up with the algorithm and the qualities they found in ‘successful novels’, for this will reveal what you need to do as an artist to land yourself in that ‘84% chance of commercial success’ bracket (based on these scientists’ criteria). Bear with us, because the findings may well surprise you, as a number of them go against conventional wisdom:
Regarding lexical choices, less successful books rely on verbs that are explicitly descriptive of actions and emotions (e.g., “wanted”, “took”, “promised”, “cried”, “cheered”, etc.), while more successful books favor verbs that describe thought-processing (e.g., “recognized”, “remembered”), and verbs that serve the purpose of quotes and reports (e.g., “say”). Also, more successful books use discourse connectives and prepositions more frequently, while less successful books rely more on topical words that could be almost cliché (e.g., “love”), typical locations, and more extreme (e.g., “breathless”) and negative words (e.g., “risk”).
In terms of word categories, prepositions, nouns, pronouns, determiners and adjectives are predictive of highly successful books, whereas less successful books are characterized by a higher percentage of verbs, adverbs, and foreign words. Additionally, successful books make heavy use of conjunctions like “and” and “but”.
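For the curious, here is a rough idea of what measuring this sort of thing might look like in code. To be clear, this is only a minimal sketch and not the researchers' program: the word lists are hypothetical, hand-picked from the examples quoted above, and the function names (first_words, style_profile) are made up for illustration. The real study looks at far richer features, such as grammar rules and sentiment, which this toy version ignores.

```python
import re
from collections import Counter

# Hypothetical word lists, taken only from the examples quoted in the article.
THOUGHT_VERBS = {"recognized", "remembered"}
ACTION_EMOTION_VERBS = {"wanted", "took", "promised", "cried", "cheered"}
CONJUNCTIONS = {"and", "but"}


def first_words(text, limit=1000):
    """Return the first `limit` lowercase word tokens of the text."""
    return re.findall(r"[a-z']+", text.lower())[:limit]


def style_profile(text, limit=1000):
    """Share of each hypothetical word group within the opening words."""
    words = first_words(text, limit)
    total = len(words) or 1  # avoid dividing by zero on empty input
    counts = Counter(words)

    def share(vocab):
        return sum(counts[w] for w in vocab) / total

    return {
        "thought_verbs": share(THOUGHT_VERBS),
        "action_emotion_verbs": share(ACTION_EMOTION_VERBS),
        "conjunctions": share(CONJUNCTIONS),
    }


sample = ("She remembered the house and recognized the door, "
          "but he cried and took the key.")
profile = style_profile(sample)
print(profile)
```

A real system would feed numbers like these, for thousands of books, into a classifier; the sketch stops at counting because that is the only part the findings above actually describe.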
Useful as this study is so far, Stephen King told us much of what you’ve just read years ago in his biography/writing guide, On Writing:
[T]he less successful books also rely on verbs that explicitly describe actions and emotions (“wanted”, “took”, “promised”, “cried”, “cheered”), while more successful books favor verbs that describe thought-processing (“recognized”, “remembered”) and verbs that simply serve the purpose of quotes (“say”).
One of the more interesting findings was that “readability” and the success of novels are negatively correlated. It sounds strange, but researchers explain that they don’t think it’s necessarily that readers are attracted to complicated language; rather they think that successful books tend to deal with more complicated ideas and, therefore, need more complex syntax to express those ideas.
I’m going to presume that if you are a writer this study has got your attention. Imagine if this algorithm really does work as well as the scientists claim. Imagine a piece of software that could analyse your newly-finished manuscript and tell you that there is an 84% chance that this book would be a critical and commercial success. Essentially, the software would be telling you that a publisher would be crazy not to pick up your manuscript, right?
As someone who writes and, one day, wishes to publish a novel, I have to say that the above also worries me. Imagine finishing a 140,000 word novel, chucking it into the computer and having it come back and say: ‘unlikely to be a commercial success’. What do you do then? Do you give up and start a new book, or do you say ‘screw it, there is a 16% chance the computer is wrong’?
There are certainly high-profile exceptions to the rule: the researchers recognised that The Lost Symbol by Dan Brown was stylistically flawed, and the algorithm could have predicted the poor response from critics (which it did indeed get), but, as we know with hindsight, the book was a huge commercial success. The scientists explain:
There are potentially many influencing factors, some of which concern the intrinsic content and quality of the book, such as interestingness, novelty, style of writing, and engaging storyline, but external factors such as social context and even luck can play a role. As a result, recognizing successful literary work is a hard task even for experts working in the publication industries. Indeed, even some of the best sellers and award winners can go through several rejections before they are picked up by a publisher.
Looking at it from a publishing house’s point of view, it could be a very productive way to vastly reduce their workload – if they were brave and trusted in the algorithm. Let us say the publishing house gets 10,000 manuscripts a year. Perhaps they bring in a policy that, because their list is almost full, a book has to have a unique story-line, be relevant to social context or – if familiar in terms of story-line/tropes – be absolutely incredibly well-written to be published in 2015. Rather than having to read the synopsis and a couple of chapters of each book, they can simply read the synopsis and place each manuscript in a ‘unique story-line/relevant to social context’ pile or a ‘refused unless incredibly well-written’ pile. The first pile, say 100 out of the 10,000, could be read by people to cover the publishing house on the story-line and social-context rule, whilst the remaining 9,900 could be chucked into the ‘will it be a commercial success?’ algorithm with its claimed 84% accuracy. When you consider statistics such as only 3 of every 10,000 manuscripts submitted being published (keep in mind that I personally consider this hugely exaggerated), this kind of system could save a publishing house a heck of a lot of time… in SFF, especially, ‘slush-piles’ are legendary.
We should take a moment to say that – even if it works as the scientists claim – this algorithm is, of course, limited to an extent. The system carries a risk for the publisher: turning down a book (having never read it) only for a rival publisher to pick it up and make millions upon millions on it. That said, how many times was Harry Potter turned down by a human being? Also consider that, depending on how successful the system is, there are massive productivity gains to be had too, through fewer hours being invested in reading manuscripts that will never see the light of day.
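To put rough numbers on that scenario, here is a back-of-the-envelope calculation. It is only a sketch under a big assumption: that the paper's 84% accuracy figure would carry over unchanged to a real publisher's slush pile, which nobody has claimed. The function name triage and its parameters are made up for illustration.

```python
def triage(total, human_read, accuracy):
    """Split a slush pile between human readers and the algorithm.

    Returns the number of manuscripts left to the algorithm and the
    number of those we would expect it to misjudge, ASSUMING the stated
    accuracy applies evenly to every manuscript.
    """
    machine_screened = total - human_read
    expected_wrong = machine_screened * (1 - accuracy)
    return machine_screened, expected_wrong


# Figures from the scenario above: 10,000 submissions, 100 read by people.
machine_screened, expected_wrong = triage(10_000, 100, 0.84)
print(machine_screened, round(expected_wrong))
```

In other words, the publisher would be spared reading 9,900 manuscripts, at the price of roughly 1,584 of them being judged wrongly in one direction or the other.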
Could a computer use this algorithm to write a book?
Yes. And, it’s happening already.
Essentially, what the above study has revealed is that – to some extent at least – writing is based on a formula. You give a computer a formula to follow and it can follow it.
You may or may not have heard of a man named Philip M. Parker. Philip M. Parker is the author of over 100,000 titles (700,000 if you include those listed as written by his company); all of which can be found on Amazon priced from £18.19 up to £795.00.
Well, although Mr Parker is credited with all these titles, in actuality each of these books was generated by his algorithm, which draws upon huge content-specific databases and mimics the thought process that an expert would go through when writing about that specific topic. Now, don’t write this off as simply a computer copying and pasting a couple of facts about a topic based on a word you enter into a search. The algorithm’s creator claims that it can write 200+ page research papers in comprehensible prose, include graphs that it makes using its own decisions on what data is likely to be best represented in such a way, and format the whole thing with titles and contents too. Here’s a video explaining the process in a bit more detail for those interested:
Although each of Mr Parker’s published books has so far been non-fiction, he is convinced that fiction is not too far off and has even developed a prototype:
Essentially, Mr Parker feels that because novels (and, indeed, genres) lend themselves to formulas they can easily be created. If you watched the above video you will see that in addition to a huge number of story-line and character variables, he has come up with ways to characterise and select a certain kind of writing style (e.g. influenced by Robin Hobb or full of wit, etc). Most of you will likely know Arthur Quiller-Couch’s theory that there are only seven basic plots (all a series of conflicts): well, if you’ve got an algorithm that can write stylistically excellent prose and generate one of the seven plot devices, create unique characters based on stock attributes, and follow the conventions of a chosen genre: just how much is left? Mr Parker firmly believes that ‘any creative work produced by artificial intelligence will be “successful” if it reads like a human being wrote it, or more precisely, like a human intelligence is behind the work’.
Sadly, it’s now time to pull our heads from the clouds and consider that despite the many claims looked at in this article it’s easy to let ourselves get carried away. I decided to get in touch with Mark Lawrence, author of The Broken Empire trilogy and a scientist working in the field of Artificial Intelligence, to ask his thoughts on the paper and Mr Parker’s work.
Mark began with a warning, stating that ‘it’s in the nature of scientists to treat any extraordinary claim with a degree of skepticism’. It’s a good thing really, because it’s in the nature of a Fantasy and Science-Fiction Geek – such as myself, and probably you guys too? – to jump at the idea of robots penning novels and take it to our hearts. Mark continued by telling me that based on his experience ‘With a database containing chunks of meaningful text you might be able to construct a page that makes vague sense (if the chunks are large and you work at it) but it will be meaningless garbage once you read past the average chunk size. The computer doesn’t ‘understand’ the sentences. The analysis on the paper is statistical – there’s no meaning extracted.’ Certainly, it’s hard to argue that Mark is wrong, because, despite there being over 100,000 published titles on Amazon, not one of Mr Philip M. Parker’s books has more than one review and, also, none have ‘see inside’ enabled.
Essentially, as wonderful as it all sounds, we should remember that Mr Parker is promoting a product that he is selling for profit (both his books and his mass book producing algorithm). It’d be like going to a car garage, asking them how a certain car drives and taking what they say as fact… of course, they are going to skim-over/withhold the shortcomings.
What about the paper though? Well, Mark sees a flaw in the logic there too: ‘The paper is interesting but it doesn’t give sufficient information to properly examine its claims and in a refereed journal you would expect that it would … As they say: there are lies, damn lies, and statistics.’ And, even if the paper is able to predict the quality of your writing style, Mark still has his doubts as to how useful it would really be: ‘to an author who has comments like “this was too dark for me” or “I want a like-able character” as far more common reasons for putting my debut down than “I didn’t like the writing” I find it unlikely that these factors could be extracted through the techniques described.’
I guess we’d expect an author to be a bit hesitant about supporting the paper and the software, right? I mean, if it is correct, we could do away with them, each buy ourselves some writing-algorithm-based software – that is soon to be made available by companies such as Mr Parker’s – and have our computers write books for us… But what about publishers? Do they see the paper or Mr Parker’s work more favourably?
Sadly not. John Wordsworth, Commissioning Editor at Headline, agrees firmly with Mark Lawrence. He feels that because this study focuses extensively on prose (as opposed to plot), it ‘spectacularly misses the point as what readers really respond to is a great story. It’s like judging a film solely by its set design.’ By this point I’m starting to think both John and Mark have good points; however brilliantly a novel is written, it is nothing without characters and plot, and I struggle to believe that software drawing upon even the most sophisticated database would be able to create a plot as intricate as A Game of Thrones, with characters as memorable.
John reinforced my thoughts with some advice to me and everyone else looking for a shortcut to getting published: ‘while I suggest new authors check out books by structure gurus like Syd Field and Robert McKee, they won’t write your novel for you. I think they really come into their own when you have a bunch of scenes, characters and ideas which have been percolating in your noggin, but you just can’t get to work as a cohesive narrative.’
Over the last few years, with the arrival of ebooks and ereaders, a lot has been said about the huge changes in how books are produced and distributed. Today we’ve taken a look at how the process manuscripts go through to find themselves in the lap of a commissioning editor has the potential to change in the future too. Not only that, but there is even potential for books – especially non-fiction books – to no longer require an author. We’ve also heard from an author and a publisher who feel that this just isn’t very likely and, when we consider novels like A Game of Thrones, we’ve had to agree that it seems a long way off.
So, much as I hate to say it, I feel that John’s bottom-line opinion on these recent studies and claims deserves to be the conclusion of this article (I guess I can always edit it in 2050 should he be proven wrong and the robot Government threaten to lock us up for treason?): ‘Computers might be able to kick our arses at chess, but I really can’t see them ever appearing on the bestseller lists.’
But is replication an act of creation? Onceing always trounces twiceing as far as I’m concerned.
Part of the beauty of fiction is the impossibility of the reader matching the writer’s original vision to the images conjured from the words. If The Hobbit had been rendered from an algorhythmic wundertool, there’s no doubt I’d have seen precisely the same Shire ‘n’ dwarf combo as I did when I read the Real McCoigh, only without the unwitnessable imaginings of the author lurking behind the words. Fiction is non-existent nonsense, but unless there is a transfer of figment and guff from one brain to another then the words are kind of about nothing.
A well-written article, but I believe the skepticism of Mark Lawrence and John Wordsworth is completely justified.
The algorithm described in the referenced paper was tested on books available on Project Gutenberg, which is a well known database of public domain works. Now this need not be a problem, had they used historical sales numbers for the books in question, but instead they used download counts.
While many of the books in the “top downloads” category are well-known works, this information does not reflect actual sales. To further complicate matters, the top ten downloaded books include works such as Beowulf, The Prince by Machiavelli and the Kama Sutra, none of which I’d consider bestsellers, or a proper analogy for them.
As such, I believe the paper has too much of a selection bias to be credible.
Yeah, as I say in the article, as much as I’d love it to be true and as exciting as it is in many ways: the paper and the YouTube videos focus on the ‘what is possible’ as opposed to ‘what is not’. It is very cool that robots and computers can do as much as they can, but computers by nature are always going to draw upon databases and that goes against creativity in the way we know and love (although, I guess there is an argument to be had that even human creativity and dreams are built on experience and therefore ‘stock’ to some extent? CAN OF WORMS – ARGH! 😉 )