What can we learn from 5 million books?

There’s a popular saying that “a picture is worth a thousand words.” But two Harvard researchers set out to test the validity of this saying and found that some pictures can be worth much more than a thousand words.

“Ladies and gentlemen, a picture is not worth a thousand words. In fact, we found some pictures that are worth 500 billion words,” said Erez Lieberman Aiden of Harvard and Google.

Aiden’s statement drew laughter from the audience, as did the colorful images explaining how he and fellow researcher Jean-Baptiste Michel, also of Harvard and Google, came to such a conclusion.

“So many books actually have been written over the years,” Michel explained. “So we were thinking, well the best way to learn from them is to read all of these millions of books.” Reading that many books would take far too much time, so it was not an option for Aiden and Michel.

Faced with the impossibility of reading that many books, Michel and Aiden turned to Google, which to this point has scanned 15 million books. With Google’s help, they were able to statistically evaluate how often certain strings of text appear in the books. With this data, they were able to see such things as how popular a certain event or phrase was at a given time.

“That gives us a time series of how frequently this particular sentence was used over time,” Michel said. “We do that for all the words and phrases that appear in those books, and that gives us a big table of two billion lines that tell us about the way culture has been changing.”
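The table Michel describes can be illustrated with a small sketch. This is not the researchers' actual pipeline; the function names (`ngrams`, `frequency_table`) and the three-book corpus are invented for illustration, and the sketch assumes that an n-gram's frequency in a given year is its count divided by the total number of n-grams published that year.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Yield each run of n consecutive lowercased words in text."""
    words = text.lower().split()  # naive split; punctuation is not stripped
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def frequency_table(books, n):
    """Map each n-gram to {year: share of that year's n-grams}."""
    counts = defaultdict(Counter)  # year -> Counter of n-grams
    for year, text in books:
        counts[year].update(ngrams(text, n))

    table = defaultdict(dict)  # n-gram -> {year: relative frequency}
    for year, counter in counts.items():
        total = sum(counter.values())
        for gram, count in counter.items():
            table[gram][year] = count / total
    return table


# Hypothetical toy corpus: (publication year, text)
books = [
    (1900, "the cat sat on the mat"),
    (1950, "the dog and the cat"),
    (1950, "a dog is a dog"),
]

table = frequency_table(books, 1)
# Each entry is a small time series for one word
print(sorted(table["cat"].items()))
print(sorted(table["dog"].items()))
```

Each entry of `table` is one line of the kind of table Michel mentions: a word or phrase paired with its frequency in each year. Plotting one entry against the year gives a graph of how usage changed over time.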

Using this data, Michel and Aiden showed graphs that explained much about history, including what was important to different generations. Some of the graphs drew laughter from the audience, especially the one tracking the usage of “argh, aargh, aaargh,” which showed that the various levels of frustration had different levels of popularity over the last century.

It was not all fun and games. As Aiden explained, “There are more sobering notes among the n-grams.” The researchers found these by looking at inconsistencies within the graphs: certain people should have been more famous at a certain time but, due to various factors, were not. Michel explained that doctors, physicists and biologists don’t get mentioned much in history despite their contributions to society, while actors and actresses are mentioned a lot.

“There are many usages of this data, but the bottom line is that the historical record is being digitized,” Michel said. That digitization allows us to look at important aspects of human culture. It is a way for us to gaze into the past and understand what people were thinking at the time, so that we may better understand our history.


