I write pretty fast, for a human. But last night, I accomplished my greatest blogging feat ever: seven different, accurate blog posts, written in the span of 2.5 seconds. Posts like “Home prices in San Diego Metro Area rise,” and “Home prices in Greater Boston fell.”
And you know what? If I’d felt like it, I could have written many more posts—say, 100,000 more—without breaking a sweat.
I’m not bragging. Anyone could do this. The software I used to write these posts is called Wordsmith; it’s an automated writing platform created by a North Carolina-based company called Automated Insights that uses large data sets to write natural-language stories. For the last several years, big media organizations have been using Automated Insights’ software to generate millions of stories: corporate earnings reports, fantasy football recaps, and the like. If you’ve read lots of news in the past year or two, you’ve probably read a Wordsmith story, perhaps unaware that it was written by a bot.
This week, Automated Insights opened up Wordsmith to the public for the first time. Users can sign up to take part in a beta test, and a wider rollout is scheduled for January.
“It’s a new way to write,” Automated Insights CEO Robbie Allen told me. “The writing process has largely been unchanged for a long time.”
Writing news stories using automated software has its risks. And not every type of story can be bot-written. (Basically, in order to work, the story has to be about a structured, patterned data series. So weather reports and stock price stories can be automatically written, but not, say, stories about Demi Lovato conspiracy theories.) But the price for cash-crunched newsrooms is right, and for certain types of formulaic, by-the-numbers stories, it makes sense to farm out the writing to software, rather than have humans peck away at it.
Automated Insights gave me early access to the beta test for Wordsmith. So naturally, I hopped in, and started writing some stories.
The first step to creating a story with Wordsmith is finding the right data set. Wordsmith generates stories from structured data supplied as a CSV (comma-separated values) file. CSVs are basically spreadsheets containing numbers and organized, patterned text, and finding good ones can be a little tricky. (I tried using a CSV of local crime data from the St. Louis Police Department, but the data wasn’t properly formatted for use in Wordsmith, and I gave up after a half-hour of failed attempts to fix it.)
However, Automated Insights provided me with a sample CSV, containing some made-up home prices for several large metropolitan areas around the U.S. Using that data set, I was able to generate seven different, accurate stories with a single click.
After finding your data set, the next step to writing stories with Wordsmith is to make a template. If you want to generate stories about housing prices in Los Angeles, using the percent change as a variable, you might start your story:
“Good news, homeowners: in the last month, home prices in Los Angeles have risen <percent>.”
But, of course, if housing prices fell in the last month, that wouldn’t work. You’d want your story to say “Bad news” instead of “Good news,” and “have fallen” instead of “have risen.”
To make that happen, Wordsmith uses what’s called “branch logic.” Click “add branch,” and it pulls up a menu that lets you use simple mathematical formulas to tell the software which words to write. In my template, I’m telling Wordsmith to write “Good” if the home sales from the current month are greater than the home sales from the previous month, and “Bad” if the previous month’s sales were better. (I could also give it more flowery word choices, like “Fantastic” or “Terrible,” but I stuck to the basics.)
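For the curious, here is roughly what that kind of branch boils down to in code. This is a minimal Python sketch of the idea, not Wordsmith’s actual internals: the function name `opening_sentence`, its parameters, and the prices are all made up by me for illustration, and the “held steady” case for a month with no change is my own assumption, since the simple Good/Bad branch doesn’t cover it.

```python
def opening_sentence(region: str, current_price: float, previous_price: float) -> str:
    """Pick the wording for a story's first line from two months of prices."""
    # Percent change relative to last month, shown as a positive number
    # because the verb ("risen" / "fallen") already carries the direction.
    percent = abs(current_price - previous_price) / previous_price * 100

    if current_price > previous_price:
        mood, verb = "Good", "risen"
    elif current_price < previous_price:
        mood, verb = "Bad", "fallen"
    else:
        # Assumption: a flat month gets its own sentence (not from the article).
        return f"No news, homeowners: in the last month, home prices in {region} have held steady."

    return (
        f"{mood} news, homeowners: in the last month, "
        f"home prices in {region} have {verb} {percent:.1f}%."
    )


# Hypothetical prices, just to exercise the "risen" branch.
print(opening_sentence("Los Angeles", 510_000, 500_000))
```

Swap the two prices and the same function takes the “Bad news” branch instead, which is all a branch really is: one comparison deciding which words get written.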
Then, after you’ve got all of your branches set up, you end up with a finished template.
In this example, I’m telling the software to create a short story using 10 different categories of data: the region, the short name of the region, the current month’s sales, the previous month’s sales, the top subregion, the number of sales in the top subregion, the current month’s inventory, the previous month’s inventory, the median sales price in the current month, and the median sales price in the previous month.
I’m also giving the software some limited choices, so it doesn’t sound totally robotic. (In the second paragraph, for example, I gave it the choice of saying “fell,” “dropped,” or “sank.”)
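That kind of limited choice is easy to picture in code, too. The sketch below is my own guess at the idea rather than anything Wordsmith exposes: `FELL_SYNONYMS` and `vary` are hypothetical names, and the only thing taken from my template is the list of three verbs.

```python
import random

# The three interchangeable verbs from the template's second paragraph.
FELL_SYNONYMS = ["fell", "dropped", "sank"]


def vary(options: list[str]) -> str:
    """Return one of the given wordings at random."""
    return random.choice(options)


# Each run may pick a different verb, so the sentence varies from story to story.
sentence = f"The median sale price in Phoenix {vary(FELL_SYNONYMS)} to $424,000."
print(sentence)
```

Every option has to fit the sentence grammatically, which is why the choices stay limited: the software is picking from a list, not finding its own words.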
And then I clicked “generate.” In seconds, the software spit out 7 stories, using all the data it had. (I could have given it much, much more data, but the sample I had was limited.)
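Under the hood, “generate” presumably amounts to running every row of the CSV through the template. Here is a small Python sketch of that loop under my own assumptions: the column names (`region`, `current_price`, `previous_price`) and every number in `SAMPLE_CSV` are invented for illustration, and they are not the columns or values from the sample file Automated Insights gave me.

```python
import csv
import io

# Made-up rows standing in for a real CSV file; the column names are hypothetical.
SAMPLE_CSV = """region,current_price,previous_price
Phoenix,424000,431000
San Diego,512000,498000
"""


def story_for(row: dict) -> str:
    """Fill a one-sentence template from a single CSV row."""
    rose = float(row["current_price"]) > float(row["previous_price"])
    # Note: a month with no change lands in the "Bad" branch here;
    # a real template would want a third case.
    mood = "Good" if rose else "Bad"
    verb = "risen" if rose else "fallen"
    return (
        f"{mood} news, homeowners. In the last month, "
        f"home prices in {row['region']} have {verb}."
    )


def generate_stories(csv_text: str) -> list[str]:
    """Produce one story per data row in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [story_for(row) for row in reader]


for story in generate_stories(SAMPLE_CSV):
    print(story)
```

Two rows in, two stories out. Feed it 100,000 rows and you get 100,000 stories, which is why the number of posts is limited by the data and not by the writing.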
So, were these stories convincingly human? Well, you tell me. If you skimmed this story in a local Phoenix newspaper, would you know it was generated by a bot?
Bad news, homeowners. In the last month, home prices in Phoenix Metro Area have fallen. Overall, 3,214 houses were sold in Phoenix over the last 30 days, with Phoenix County leading the way with 3,032 sales.
Potential buyers take note: the median sale price in Phoenix fell to $424,000, while the available housing inventory rose.
There are now 3 months of home inventory left in Phoenix.
Go find a bargain, buyers!
I probably wouldn’t. And that’s the point—for certain types of basic stories, it makes sense to let bots do the work, and free up humans to do more creative types of reporting. You can imagine types of news organizations that might do well with automation: financial blogs, sports sites, local news sites that cover crime and weather, food blogs that cover restaurant openings and closings, sites that report the results of auctions or drafts.
Wordsmith (which will cost an unspecified amount; Allen told me they’re “still working out the details”) won’t be putting reporters out of work any time soon. It’s still pretty primitive, as auto-writing software goes. It can’t do sentiment analysis, apply machine learning to target specific readers, or pull off any other fancy tricks. All it can do is push data through pre-written templates, and use rudimentary branch logic to produce some limited variability. If you know how to do even basic coding, you could make something very similar for free.
And it doesn’t really make sense to say that Wordsmith “wrote” the stories it produced. After all, I still had to type out all of the words in the template, create the branch logic to make sense of the data, and fact-check the output. Perhaps someday, Wordsmith will be able to create certain kinds of templates on its own. But for now, it’s a labor-intensive automation process.
But Wordsmith is an accessible product, even if you can’t code, and it represents an important step toward the future of outsourcing some very basic types of reporting to software, and freeing up reporters from some of the mundane parts of their job. Allen told me he envisions all kinds of people using Wordsmith in the future—a Little League coach who wants to send his players’ parents automated recaps of games using the box scores, a company whose managers want to send custom sales recaps to each of their direct reports every day or week without having to type it all out.
That kind of stuff shouldn’t be written by humans—it’s a waste of our unique talents. And soon, it won’t be. Unless you’re a Little League reporter who’s happy reporting on play-by-play recaps, the robot invasion is good news.