Reblogged from: https://tekhnologic.wordpress.com/2014/11/06/google-ngrams-the-highs-and-the-lows/

Google Ngram Viewer can be used to produce interesting charts for class, check word frequency, and look at parts of speech and collocations.

I enjoy using the Ngram viewer and I think it is a useful tool for teachers and students. It is a site that I have bookmarked for those occasions when I am not sure about a word. It is definitely part of my Digital Teaching Toolkit.

The Highs

Or ‘A Study Aid for Teachers and Students’

‘The Highs’ covers general use of the Ngram viewer.

Interesting Charts

The Ngram viewer can be used to compare how often words and phrases appear in Google’s collection of over 5 million books.

If I type two words separated by a comma, for example:

love,hate

The Google Ngram Viewer produces a chart.

‘Love’ is the blue line, and ‘hate’ is the red line. Now we have an interesting chart we can examine and use to practice the language of explaining charts and graphs.

Source: http://books.google.com/ngrams
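The same comparison can also be shared as a link. Here is a minimal Python sketch that builds one from a list of terms. It assumes the chart lives at https://books.google.com/ngrams/graph and that the parameter names (content, year_start, year_end, smoothing) match what appears in the browser's address bar. These are assumptions rather than a documented API, and the name build_ngram_url is only for illustration.

```python
from urllib.parse import urlencode

# Assumed address of the viewer's chart page (not a documented API endpoint).
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2000, smoothing=3):
    """Return a chart URL for a list of words or phrases.

    The parameter names are an assumption based on the URLs the viewer
    shows in the address bar.
    """
    query = {
        # The viewer takes the terms as one comma-separated string.
        "content": ",".join(term.strip() for term in terms),
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    return NGRAM_GRAPH_URL + "?" + urlencode(query)


print(build_ngram_url(["love", "hate"]))
```

Students could paste the printed link into a browser, or a teacher could prepare a handful of links before class.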

Students can discuss steady increases and rapid declines, a sharp rise and a dramatic fall. More importantly, we can use Ngrams to practice inference.

There are three things we can infer from this graph.

  1. People write about love much more than hate, which gives me hope.
  2. People wrote more about love in the past than they do today, though this may prove to be a false conclusion.
  3. There was a marked decrease in the number of times the word ‘love’ appeared in the written record in 1918 and 1940. A sobering thought as we approach Remembrance Day.

It is an example, however, of how the Ngram viewer can sometimes provide cultural and historical insights.

Comparing Words

The Ngram viewer is designed to compare words and their frequency. This is useful for helping us to determine which word or phrase has become more common.

Let’s type in the following words:

global warming,climate change

The Google Ngram Viewer produces a chart.

Source: http://books.google.com/ngrams

Both terms seem to appear around 1985 and there doesn’t appear to be much difference until the mid-nineties when there is a marked increase in the use of ‘climate change’.

The phrase ‘global warming’ always suggested an increase in temperature, whereas ‘climate change’ could include unusual weather patterns.

British vs. American English

Try typing:

colour,color

The Google Ngram Viewer produces a chart.


We can now see how the American English spelling came from almost nothing to dominate over the British spelling in terms of frequency.

However, we can also see in more detail which phrases are more popular in each individual version of English.

Try typing:

at school, in school

Then change the ‘from the corpus’ box from ‘English’ to either ‘British English’ or ‘American English’. The Google Ngram Viewer produces two charts.


We can see that although both ‘at school’ and ‘in school’ are used in both versions of English, ‘at’ is more frequent in Britain and ‘in’ is more common in North America.

Teachers and students now have the tools to check which word is in common usage.

The Lows

Or ‘Advanced Features’

The Lows aren’t negative points; they are more advanced features of the Ngram viewer that are worth considering before presenting data.

Parts of Speech

Words don’t always have the same job. ‘Love’ is both a noun and a verb. The Ngram viewer will count all instances of the word ‘love’ unless we specifically tell it to search for nouns or verbs.

Let’s type in the following:

effect_NOUN,effect_VERB,affect_VERB,affect_NOUN

The Google Ngram Viewer produces a chart.

Source: http://books.google.com/ngrams

By typing underscore + part of speech (_NOUN), we are able to separate words by their different functions. A complete list of tags is available on the Ngram viewer’s information page.

The chart shows that ‘effect’ is usually used as a noun and ‘affect’ is usually used as a verb, and it shows how often each occurs in the written record.
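For a class handout, tagged searches like the one above can be generated rather than typed by hand. This is a small Python sketch under a few assumptions: the tag set listed in POS_TAGS is assumed to match the list on the viewer’s information page, and the helper names tag and pos_comparison are made up for this example.

```python
# Part-of-speech tags; assumed to match the list on the viewer's information page.
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}


def tag(word, pos):
    """Attach a part-of-speech tag to a word, e.g. tag("effect", "NOUN")."""
    pos = pos.upper()
    if pos not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {pos}")
    return f"{word}_{pos}"


def pos_comparison(words, tags):
    """Build one comma-separated query covering every word/tag pair."""
    return ",".join(tag(word, pos) for word in words for pos in tags)


print(pos_comparison(["effect", "affect"], ["noun", "verb"]))
# prints: effect_NOUN,effect_VERB,affect_NOUN,affect_VERB
```

The printed line can be pasted straight into the search box. Checking the tag against a fixed set catches typos such as _NUON before they produce an empty chart.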

However, the Ngram viewer doesn’t always account for human error. It’s important to be aware that the Ngram viewer is an analytical tool, not an intuitive one. Accuracy is discussed on the Ngram viewer’s information page.

Collocations

Let’s type in the following:

a bottle of *

The asterisk (*) represents a word that follows the phrase, and the Google Ngram Viewer produces a chart of the most common words associated with the phrase ‘a bottle of’.

Source: http://books.google.com/ngrams

‘A bottle of wine’ was the most common by far, but other drinks such as champagne, water, rum and whiskey are shown on the chart.

By searching for collocations, we are able to put the phrases into more context than if we just searched for the word ‘wine’.

A Final Thought

The Ngram viewer can be fun, it can be informative, and it can encourage students to think critically about vocabulary. It does have some limitations, but overall I think it is a useful tool to refer to.

Thanks for taking the time to read this.

Take care!


Google’s Ngram viewer is best explained in a great TEDx video by two of its creators, Jean-Baptiste Michel and Erez Lieberman Aiden. Subtitles are available in over 30 languages if you download the video from TED.com. You can also read Google’s information page about the Ngram viewer.

Other Links

This isn’t the first post written about Google Ngram Viewer, and it probably won’t be the last. Here are some links you might be interested in.

Larry Ferlazzo talks about Chronicle, the NY Times’ version of Google’s Ngram Viewer. (24/6/2014)

NOTE: I didn’t realise it at the time, but Larry also produced a love/hate chart using Chronicle. It is interesting to compare the differences in the data representation.

Larry Ferlazzo’s collection of posts that discuss Google’s Ngram viewer. (17/12/2010)

____________________________________________

Reblogged from: http://www.theatlantic.com/technology/archive/2013/10/googles-ngram-viewer-goes-wild/280601/

Google’s Ngram Viewer Goes Wild

With the addition of wildcard search-term capabilities, Google’s fabulous language-analysis tool gets even more powerful.

It’s been nearly three years since Google rolled out its Ngram Viewer, allowing armchair historians to plot the trajectories of words and phrases over time based on an enormous corpus of data extracted from the Google Books digitization project. Since then, there have been numerous studies seeking to glean some cultural significance from the graphs of falling and rising word usage. And the graphs themselves have inspired imitators: Recently, the engineering team behind Rap Genius introduced Ngram-style graphing of historical word frequency in rap lyrics, and, more bizarrely, New York Times wedding announcements. (You can even compare the hiphop and matrimonial datasets.)

As the Ngram model extends its influence, Google continues to tinker, making improvements to the Ngram Viewer’s already slick interface. Last year saw a major upgrade, with a sizable increase in the underlying data spanning English and seven other languages, as well as the introduction of part-of-speech tagging and mathematical operators that allowed for more sophisticated searches. Today, meet Ngram Viewer 3.0. While the corpus itself hasn’t expanded in this version, the search features have become even more useful, especially now that wildcards are in the mix.
Anyone who has spent time delving into databases knows how much flexibility you can get with wildcards: use an asterisk to stand in for any word, and suddenly your search horizons have expanded. In the new Ngram Viewer, using the asterisk as a wildcard will display the top ten most frequently appearing words that fill the slot over the range of time you have selected. The asterisk can be combined with parts of speech, too, so “*_NOUN” will find only the nouns that could appear in the sequence of words you’re searching on.
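To make the idea concrete, here is a rough Python sketch of what a wildcard lookup amounts to: find every phrase that matches the pattern everywhere except the asterisk, add up the counts for each word in that slot, and keep the most frequent. The counts are invented for illustration, the name top_fillers is hypothetical, and nothing here describes how Google actually implements the feature.

```python
from collections import Counter

# Invented counts for illustration only; these are not Google Books figures.
TOY_COUNTS = {
    "media mogul": 50,
    "movie mogul": 30,
    "Hollywood mogul": 10,
    "media empire": 40,
}


def top_fillers(pattern, counts, k=10):
    """Return the k most frequent words that fill the single '*' slot."""
    pattern_words = pattern.split()
    if pattern_words.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    slot = pattern_words.index("*")
    fillers = Counter()
    for phrase, count in counts.items():
        words = phrase.split()
        if len(words) != len(pattern_words):
            continue
        # Every position except the wildcard slot has to match exactly.
        if all(w == p for i, (w, p) in enumerate(zip(words, pattern_words)) if i != slot):
            fillers[words[slot]] += count
    return fillers.most_common(k)


print(top_fillers("* mogul", TOY_COUNTS, k=2))
# prints: [('media', 50), ('movie', 30)]
```

The default of k=10 mirrors the top ten described above; the slot can sit anywhere in the pattern, so "ragtag *" works the same way.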

Now if you type “*_NOUN 's theorem” into the Ngram Viewer, you will see a graph with the ten most common names (which count as nouns) that have spawned eponymous theorems, with names like Gödel, Bayes, and Euler. (Right-clicking will toggle back and forth between a view tracking the different variants and one showing a single line encompassing all the variants.)

When the Google project team (Jon Orwant, Slav Petrov, and Dipanjan Das) gave me a sneak peek at the new version of the Ngram Viewer, I had no shortage of wildcard searches to test out. On Twitter, I’ve fielded questions like “Besides media moguls, what other moguls are there?” and “What can be ragtag other than a bunch?” It’s possible to answer these questions using the publicly available corpora compiled by Mark Davies at Brigham Young University, but the peculiar interface can be off-putting to casual users. With the Ngram Viewer, you just need to enter a search like “*_NOUN mogul” or “ragtag *_NOUN” and select a year range. It turns out that in 20th-century sources, media moguls are joined by movie moguls, real estate moguls, and Hollywood moguls, while the most likely things to be ragtag are armies, groups, and bands.

You can also compare different slices of the overall dataset. Let’s say you want to know the most typical prepositions that precede “the street” in American and British varieties of English. (Any American who has puzzled over Madness singing “Our house, in the middle of the street” will know that prepositions work a bit differently on the other side of the pond.) It turns out that “in the street” is the most frequent prepositional phrase in British English, while “on the street” currently leads the pack in American English.

All of this wildcard goodness isn’t restricted to the English section of the corpus, either. In English, you can discover that the nouns that most often serve as the object of the verb drink include water, wine, coffee, beer, and tea. But you can do the same search on the German verb trinken to find a different ranking of beverages: Kaffee (coffee) and Bier (beer) are on top, followed by Wein (wine), Wasser (water), and Tee (tea).

I expect that one salutary effect of the new wildcard searches will be to encourage more nuanced searching, instead of simply running the numbers on individual words and phrases devoid of context. Some of the scholarly work in the burgeoning field of “culturomics” has relied on Ngram data without bothering to dig much deeper than relative frequencies of single words. For instance, an article appearing earlier this year in the journal Psychological Science purported to demonstrate that “individualistic and materialistic values” are on the rise simply by looking at the changing fortunes of word pairs like give vs. get.

While get has become more frequent relative to give, does that mean we’re becoming more selfish? As Mark Liberman suggested on Language Log, the rise in get usage could be due to phrasal patterns that have nothing to do with acquiring material possessions, since get can be used with adjectives (get sick) or passive verbs (get acquainted). And sure enough, with wildcard searching we can quickly see increases in “get + adjective” (like get better, get ready, and get drunk) and “get + verb” (like get married, get involved, and get started).

In addition to wildcards, the new Ngram Viewer introduces a couple of other welcome changes: variation in capitalization and inflection can be accounted for. The previous version was always case-sensitive, but now you can check a “case-insensitive” box if you want to look at forms with varying capitalization all at once. Right-clicking on the line will then display the most common case variants, each on its own line in the graph. So, for instance, a case-insensitive search on aids will expand to show the rapid rise of AIDS (as opposed to aids or Aids) since 1980.
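A case-insensitive search can be pictured as grouping every capitalization of a word under one key while still remembering the individual variants, which is what lets the chart switch between one combined line and separate lines. The Python sketch below shows that grouping on invented counts; the function name merge_case_variants is hypothetical and this is not Google’s implementation.

```python
from collections import defaultdict

# Invented counts for illustration only; not real Ngram Viewer data.
TOY_COUNTS = {"AIDS": 900, "aids": 150, "Aids": 50, "aides": 70}


def merge_case_variants(counts):
    """Group counts by lowercased form, keeping each variant's own count."""
    merged = defaultdict(dict)
    for form, count in counts.items():
        merged[form.lower()][form] = count
    return {
        key: {"total": sum(variants.values()), "variants": variants}
        for key, variants in merged.items()
    }


result = merge_case_variants(TOY_COUNTS)
print(result["aids"]["total"])     # combined line: 1100
print(result["aids"]["variants"])  # separate lines: AIDS, aids, Aids
```

Note that “aides” stays in its own group, since only capitalization is folded here, not spelling or inflection.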
Similarly, different inflections, such as variants of a verb with -s, -ed, -ing, or no ending, can be split out into different lines or summed together, all with a right-click. For the interactivity of the graphs, we can thank one of the three college interns that Google hired to work on the project this summer. Jason Mann of Columbia, David Zhang of USC, and Lu Yang of Cornell all worked on the Ngram Viewer, and Mann came up with the idea of using interactive charts to break out and merge lines for different variants.

While the Ngram Viewer remains one of Google’s “20 percent time” projects, meaning that it isn’t a high priority for the engineers working on it, it is heartening to see continued improvements to satisfy all of us Ngram-heads. It’s also notable that Google’s dictionary, which pops up in search results when you’re looking to define a word (like, say, literally), now includes a graph of relevant Ngram results. Clicking on the graph in the dictionary entry takes the user to the Ngram Viewer, which should guarantee a steady stream of new devotees to the addictive world of Ngrams. Now more than ever, it’s a glorious time suck for professional and amateur researchers alike.
