Why you should care about generative text
Generative text gets some press. Unfortunately, like many other technical fields that attract shallow coverage, generative text has been the subject of minor variations on the same article every few months since the mid-1950s.
The typical article on anything related to generative text (particularly generative fiction) has the following pattern:
“[Insert quote here].”
You would think that was written by a human, wouldn’t you? Well, reader, you are a dumbass, because that was written by a machine!
But don’t feel too bad, because here is an example of the machine writing something comically terrible and absurd: “[insert quote here]”.
“[Insert quote here],” says one of the three researchers in the field whose quotes we crib from previous interviews every six months.
In conclusion, I’d like to reassure you that computers won’t be writing all the novels soon. Or will they? Shock!
If you haven’t read articles about generative fiction before, no matter; they’re all like that. If you have, you know what I mean. This article is not that article. I’m going to tell you all of the interesting things about generative text that those articles didn’t cover.
You should care about generative text if you care about:
Video games have been trying to expand their clout as a narrative medium. While there are a lot of ways to do narrative in an interactive medium, one of the most obvious is to have interactive dialogue. After all, dialogue usually drives narrative in films, books, comics, and TV.
Unfortunately, dialogue trees don’t scale well: they grow exponentially with the depth of the conversation, and inconsistent or inadequately varied dialogue is extremely obvious to players. As a result, dialogue-driven interactive narrative is typically limited to big game studios, and truly complex dialogue trees are rare even in genres whose primary technical focus is dialogue (such as visual novels and adventure games).
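To put a rough number on that growth, here is a minimal sketch. It assumes the worst case: every line offers the same number of replies and no two branches ever merge back together. Real games merge branches aggressively, so treat this as an upper bound; the function name is made up for illustration.

```python
def dialogue_tree_nodes(choices_per_turn: int, depth: int) -> int:
    """Count the lines of dialogue in a full tree with no merged branches.

    Level 0 is the opening line. Every line offers `choices_per_turn`
    replies, and each reply leads to its own distinct follow-up line.
    """
    return sum(choices_per_turn ** level for level in range(depth + 1))


if __name__ == "__main__":
    # Three replies per line, at increasing conversation depths.
    for depth in (2, 5, 10):
        print(depth, dialogue_tree_nodes(3, depth))
```

With three replies per line, a conversation ten exchanges deep already needs tens of thousands of hand-written lines, which is why nobody writes dialogue this way.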
Five years ago, indie game developers started looking to procedural world generation as a way of creating maps large enough to compete in scale with large development houses’ sandbox games. In the same way, [indie game developers are now beginning to look to the field of text generation](http://forum.makega.me/t/procedural-narrative-dialogue/91) for ways to automatically generate varied dialogue and dialogue trees from models of character and narrative. The next Minecraft might derive its scale from procedural expansion of NPC dialogue instead of procedural expansion of the map.
Right now, because content is monetized through advertising, a lot of internet ‘journalism’ is clickbait: content optimized for page view counts rather than for sustained attention. Clickbait is generated at low cost by content farms; it’s poor-quality because being high-quality is a net loss, and it’s short because that’s cheaper than being long. Clickbait optimizes for two things: the number of ads on a page and the number of people who will click a link.
Generative text is extremely promising for content farm owners: even if you’re paying content farmers pennies an hour, you’re still paying more for humans than you would for machines, which can generate far more content far more quickly with only slightly lower average quality. Furthermore, the kind of A/B testing that content farms use to optimize their headlines is a perfect match for existing methods by which relatively simple AIs use feedback to improve their headline generation, and AIs are already pretty good at generating clickbait headlines. In other words, machines might well replace the lowest end of internet journalism.
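For a sense of how simple that feedback loop can be, here is a sketch of an epsilon-greedy bandit choosing among candidate headlines. This is one standard technique picked for illustration, not a description of what any content farm actually runs; the class name, the headlines, and the click probabilities in the simulation are all made up.

```python
import random


class HeadlineBandit:
    """Epsilon-greedy choice among candidate headlines, using clicks as feedback."""

    def __init__(self, headlines, epsilon=0.1, seed=None):
        self.headlines = list(headlines)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {h: 0 for h in self.headlines}
        self.clicks = {h: 0 for h in self.headlines}

    def click_rate(self, headline):
        shown = self.shows[headline]
        return self.clicks[headline] / shown if shown else 0.0

    def choose(self):
        # Explore a random headline some of the time;
        # otherwise exploit the best observed click rate so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.headlines)
        return max(self.headlines, key=self.click_rate)

    def record(self, headline, clicked):
        self.shows[headline] += 1
        if clicked:
            self.clicks[headline] += 1


def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against made-up click probabilities and return it."""
    bandit = HeadlineBandit(true_rates, epsilon=0.1, seed=seed)
    reader = random.Random(seed + 1)  # stands in for real readers
    for _ in range(rounds):
        headline = bandit.choose()
        bandit.record(headline, reader.random() < true_rates[headline])
    return bandit


if __name__ == "__main__":
    # Hypothetical headlines with hypothetical true click probabilities.
    rates = {
        "Ten facts about ducks": 0.05,
        "You won't believe these ducks": 0.10,
        "Doctors hate this one duck": 0.30,
    }
    result = simulate(rates)
    for headline in rates:
        print(headline, result.shows[headline], round(result.click_rate(headline), 3))
```

Swap the fixed list of headlines for the output of a generator and the same loop starts steering the generator, which is the part that should worry human clickbait writers.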
At the same time, text generation is already beginning to supplant the lowest end of traditional paper journalism, with various organizations automatically generating minor financial and sports stories. This frees up human writers to be put on more interesting stories or, alternatively, to be fired, depending upon the skill of the writer and the financial situation of the news agency.
Both of these effects are truly huge potential economic shifts in these industries. And they have the potential to be truly positive, as well. Consider Buzzfeed, which makes its money off inane clickbait content and then turns around and funds wonderfully deep serious journalism by serious journalists about interesting subjects with all that shit-click money: if they automated their low-quality high-lucre content, they could shift more of their workforce toward high-quality journalism. Alternatively, the slightly lower quality of machine-written articles versus content-farmed articles might accelerate the devaluation of clickbait and cause [alternatives to ad-based monetization](https://medium.com/@enkiv2/alternatives-to-advertising-7af0e32b8a8e#.r7943d2xy) to become more popular more quickly.
The effectiveness of generative text is the result of an interplay between the design of the generator and the human audience. The best generators lean heavily on the human element, using rich associations and loaded structures to convince the reader to project meaning onto the text, which itself is very often structurally simple. All of this is to say that a large part of the design of text generators is psychology. Invert this, and it’s not at all surprising that text generators are being used by experimental psychologists to probe the human mind.
Just recently, there’s been press coverage of a study using a new-age BS generator to study personality traits associated with the projection of meaning, as well as of a [group](http://www.nytimes.com/2013/01/06/opinion/sunday/can-computers-be-funny.html) [of](https://www.abdn.ac.uk/ncs/departments/computing-science/standup-315.php) older studies using joke generators to study the mechanics of humor. Text generation allows psychology experiments to scale up and to have extremely fine control over the material they use. When the text generators used in psychology experiments have their source made available, later experimenters can tweak the generator to easily test variations on the original experiments. Text generators can also be hooked up directly to systems like Mechanical Turk, which allow experimenters to expand their studies outside the college campus.
The spam industry is the only segment of the tech world to really take text generation seriously. Spammers have been using techniques like text spinning to trick both humans and AI filters for more than a decade. As technology improves, spam will get better. Any advances in text generation will probably be employed by spammers first.
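If you haven’t seen text spinning before, the core trick is just synonym substitution: every copy of a message comes out slightly different, so exact-match filters never see the same text twice. Here is a toy sketch; the synonym table and the `spin` function are invented for illustration, and real spinners use far larger tables.

```python
import random
import re

# Hypothetical synonym table, keyed by lowercase word.
SYNONYMS = {
    "cheap": ["affordable", "low-cost", "inexpensive"],
    "buy": ["purchase", "order", "get"],
    "great": ["excellent", "fantastic", "superb"],
}


def spin(text, rng):
    """Replace every word that has known synonyms with a randomly chosen one."""

    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if not options:
            return word  # no synonyms known, leave the word alone
        replacement = rng.choice(options)
        # Preserve a leading capital so sentence starts still look right.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(spin("Buy cheap watches at great prices", rng))
```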
[A few years ago, Amazon had a problem with machine-generated reference books of poor quality](http://www.theguardian.com/technology/2011/jun/23/ebook-spam-problem-growing). These reference books would be produced based on search queries, which were fed into Wikipedia and the resulting pages combined, initially into a print-on-demand book but later into ebooks. It’s a little unclear to me how Amazon fixed this problem, but it doesn’t seem to be such a big deal anymore.
However, this is not the only con in Amazon’s ebook ecosystem. Today, the big money is in creating hyper-targeted erotica [1, [2](http://thehustle.co/part-2-confessions-from-the-scammy-underground-world-of-kindle-ebooks), [3](http://thehustle.co/part-3-confessions-from-the-scammy-underground-world-of-kindle-ebooks)]. Consumers of erotic fiction often don’t care very much about prose quality, or are prevented by fear of social stigma from being vocal in their criticism of poor-quality prose in erotica. Erotic fiction is extremely popular, and hyper-specific subgenres have their own categories on Amazon, which means that it’s fairly easy to get to bestseller status in a single category (and thus have a boost in sales resulting from a listing on bestseller pages).
Again, text generation is a very good fit for erotica. It is easy to generate arbitrary amounts of poor-quality erotica. Erotica is either effective or comical: even bad erotica is good. While existing cons for Amazon erotica ebooks often involve taking public domain erotica and publishing it with a few easily automated changes, there’s no reason that brand new erotic fiction couldn’t be generated in various categories at the kind of rate that only machines can keep up with.
Partly because of the hype behind Slack and Siri, conversational UIs have become [trendy recently](https://medium.com/@enkiv2/this-is-not-no-ui-it-s-text-based-ui-c4e5991b7edb#.cn9qgdc8u). Those of my generation will recall the failure modes of conversational UIs. (Remember SmarterChild?) Understanding how to perform good text generation and take advantage of the Eliza Effect will help future conversational UIs feel less static and more lively.
Determining whether or not forming an emotional attachment to an AI-driven corporate mascot is a good thing is left as an exercise for the reader.
[There’s a long history](http://mathesonmarcault.com/index.php/2015/12/15/randomly-generated-title-goes-here/) of musicians and authors employing ‘writing machines’ to help draw inspiration from old content. These ‘writing machines’ vary in details, from forms of traditional bibliomancy to Dadaist or Surrealist writing games to Oulipo-style constrained writing to the use of computer programs for scrambling text. [Philip K. Dick used the I Ching to determine the plot of his award-winning novel The Man in the High Castle](http://www.philipkdickfans.com/literary-criticism/frank-views-archive/vertex-interview-with-philip-k-dick/); William S. Burroughs, Thom Yorke, and [David Bowie](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjpxs-roKfKAhUEJR4KHVpbDWMQqQIIHzAA&url=http%3A%2F%2Fwww.bbc.com%2Fnews%2Fentertainment-arts-35281247&usg=AFQjCNFYxGWyrcdjoq05de-xRTteiZS0Wg&sig2=Vsc5K3OnLiucThMFM7KkvA&bvm=bv.111396085,d.dmo) all used cutups to inspire their work (going so far as to sometimes use the text produced by cutups directly); Dr. Seuss’s unique style was largely determined by [heavy constraints on his vocabulary](http://jamesclear.com/dr-seuss).
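The computerized version of a cutup is about as simple as a writing machine gets. The sketch below is a loose approximation of the scissors-and-paper technique: it slices the text into fixed-length runs of words and shuffles them. The function name, the fragment length, and the sample text are arbitrary choices for illustration.

```python
import random


def cut_up(text, fragment_len=4, seed=None):
    """Slice text into runs of `fragment_len` words and shuffle the runs."""
    words = text.split()
    fragments = [
        words[i:i + fragment_len] for i in range(0, len(words), fragment_len)
    ]
    random.Random(seed).shuffle(fragments)
    return " ".join(word for fragment in fragments for word in fragment)


if __name__ == "__main__":
    source = (
        "the machine wrote a poem about the sea and the sea "
        "wrote back a letter about the quiet hum of the machine"
    )
    print(cut_up(source, fragment_len=3))
```

Nothing here understands the text. Whatever meaning comes out is supplied by the reader, which is exactly why the technique works.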
Text generation technology presents a new set of ‘writing machines’ for authors, poets, and musicians to collaborate with and build upon. The difference is that, where previous mechanisms primarily created stylistic affectations or merely inspired narrative tangents, more recent text generation technologies are capable of producing a variety of engaging and interesting narratives by themselves.
Outside of the more traditional forms, pure text generation has come into its own in the form of text-generator-driven Twitter bots. A variety of pitch bots use simple templates to produce amusing and evocative ‘pitches’ by combining familiar forms with mismatched corpora. Bots exist that automatically generate biting satire of overused, shallow, or damaging trends.
[Canonical link](https://medium.com/@enkiv2/why-you-should-care-about-generative-text-52496cb74beb)
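A pitch bot of this kind fits in a couple dozen lines. The sketch below is not taken from any particular bot: the templates and both word lists are invented, with one list drawn from classic literature and the other from startup jargon so that the mismatch does the comedic work.

```python
import random

# Familiar pitch forms (hypothetical templates).
TEMPLATES = [
    "It's {work} meets {other}, but {twist}.",
    "Imagine {work}, except {twist}.",
]

# Two deliberately mismatched corpora (both invented for this example).
WORKS = ["Moby-Dick", "Hamlet", "Frankenstein", "The Great Gatsby"]
TWISTS = [
    "as a subscription service",
    "on the blockchain",
    "with a freemium tier",
    "disrupted by an app",
]


def pitch(rng):
    """Fill a randomly chosen template from the two corpora."""
    template = rng.choice(TEMPLATES)
    work, other = rng.sample(WORKS, 2)  # two distinct titles
    # str.format ignores keyword arguments the template does not use.
    return template.format(work=work, other=other, twist=rng.choice(TWISTS))


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(pitch(rng))
```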
Exported from Medium on September 18, 2020.