Do Androids Dream of Becoming Writers?

AI, MT, NLG and other acronyms that may threaten the translation industry


Artificial intelligence (AI) is often portrayed as a portent of doom for pretty much anything that we humans do to make a living for ourselves. As a translator, I am always being asked whether I see machine translation (think Google Translate) as a threat to my own livelihood. Fortunately, I don’t think it is, or will be anytime soon. But what about automated content generation?

Machine translation (MT) isn’t much of a threat mainly because the quality of the text generated is still pretty poor, but the same can’t be said of natural language generation (NLG). Content generators today are pumping out billions of pieces of content each year, anything from financial analysis to recaps of sporting events, and it's all content of pretty decent quality. So is this a greater threat to writers and translators than machine translation?

A recent blog post by the people over at Automated Insights, for example, makes the case that their Wordsmith product doesn’t take writing jobs away, but actually makes our jobs better. Essentially, the argument is that content generators write content that (1) wouldn’t exist without being automated or (2) is “grunt work” that can now be offloaded from overworked writers, leaving them free to write higher-value content.

In my translation work, I’ve translated a ton of financial reports. The “quality” of these reports has actually given me good reason to wonder how long it can be before they are all machine generated, because surely a content generator can already do a better job than the actual humans writing this dreck now. And if a report can be auto-generated in one language, why not auto-generate it in all of the languages the company needs to publish it in? There would be no need for a translator at all, human or otherwise.

This would certainly be a threat to financial translators, unless, by auto-generating the boring bits of the report, companies redirect their efforts onto actually improving the quality of their reports by adding the type of content that can’t be machine generated. In essence, both machine translation and automated content generation suffer from the same limitation: they have no grasp of the context and subtext that aren’t expressed in the words (in the case of translation) and data (in the case of content generation) alone.

But will this always be the case?

 

Writers, artists, and other creators tend to write all this AI business off on the grounds that a computer could never be truly “creative”, but we're beginning to see that this isn't necessarily true either. We're seeing computers that write poetry or create other works of “art”. And, as alluded to above, content generators can already pump out more content at higher quality than we humans, collectively, are doing today. But the two brief sports reports shown here, for example, point to that context limitation of automated content generation.

Image from “sports reports” article referenced above

Image from “sports reports” article referenced above

One was machine generated (not machine translated), and the other was written by a human. Can you tell which is which?

Exhibit A, of course, is the machine-generated content, but it isn't linguistic errors or awkward, unnatural word choice that tells us this (as would likely be the case with machine translation). Taken individually, each sentence looks and sounds like it could have been written by a human. What tips us off to this being automated content is the type of information provided and the manner in which it is conveyed.

The content-generation software took the raw data from the basketball game and presented a selection of facts from that game in an essentially logical sequence. The human-authored story, on the other hand, shows deeper and broader analysis of the context surrounding the game, such as pointing out interesting comparisons in the performance of particular players or describing the significance of a certain player's performance over time.

Now, Wordsmith can make comparisons like these, too, but it needs to be fed the right data, and an actual person must then configure the system to make the comparisons that will be most interesting to the reader. This still points to the human advantage of context. With human input, Wordsmith can generate some really good content, but a human writer can take it a step further to better explain the significance of what has happened.
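To make the idea concrete, here is a minimal Python sketch of template-based content generation: raw game data goes in, a fixed sequence of facts comes out, and any richer comparison has to be supplied by a person as an extra rule. This is only an illustration of the general approach, not a description of how Wordsmith works internally. The function names, the data layout, and the teams and players are all invented for the example.

```python
# Hypothetical box-score data; every name and number here is made up.
GAME = {
    "home": {"name": "Riverside", "score": 78, "players": [
        {"name": "A. Example", "points": 24, "season_avg": 18.5},
        {"name": "B. Sample", "points": 12, "season_avg": 14.0},
    ]},
    "away": {"name": "Lakeview", "score": 71, "players": [
        {"name": "C. Placeholder", "points": 20, "season_avg": 21.0},
    ]},
}


def recap(game, comparisons=()):
    """Turn a box-score dict into a recap, one templated sentence per fact."""
    home, away = game["home"], game["away"]
    # Sketch only: a tie is not handled and would fall through to "away".
    winner, loser = (home, away) if home["score"] > away["score"] else (away, home)
    top = max(winner["players"], key=lambda p: p["points"])
    sentences = [
        f"{winner['name']} beat {loser['name']} {winner['score']}-{loser['score']}.",
        f"{top['name']} led {winner['name']} with {top['points']} points.",
    ]
    # Anything beyond the bare facts comes from rules a human chose to add.
    for rule in comparisons:
        extra = rule(game, top)
        if extra:
            sentences.append(extra)
    return " ".join(sentences)


def above_season_average(game, player):
    """A human-configured comparison: mention an above-average night."""
    avg = player.get("season_avg")
    if avg is not None and player["points"] > avg:
        diff = player["points"] - avg
        return f"That is {diff:.1f} more than the player's season average of {avg:.1f}."
    return None


print(recap(GAME))
print(recap(GAME, [above_season_average]))
```

The first call reports only what is in the data. The second adds context, but only because someone decided that a season average was worth comparing against and made sure that number was in the data to begin with.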

We’ve heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that it is not true.
— Robert Wilensky

While AI is becoming “creative” to a certain degree, this should be mostly a good thing for the human creators among us. There can be no doubt, though, that content generation will be disruptive to the writing and translation industries as a whole. Right now, for example, massive quantities of dreadfully boring text are being written and translated, but as the bulk of this gets automated, both writers and translators will need to up their games in order to keep pace with the changing needs of their industries. As a result, hack writers and translators may find it more difficult to find enough decent-paying work to continue making a living by just bottom-feeding.

 

(UPDATE: Based on feedback from Automated Insights, the article has been updated to reflect the fact that Wordsmith can be configured by a human to make more complex analyses of specific aspects of the data it receives.)
