People are afraid of wines. They’re uncertain about them, and the entire wine industry is geared towards exploiting those negative emotions and playing upon the consumer’s lack of confidence. When my wife and I started a small wine business, I was trying to figure out a way of helping people better understand wines and afford them an opportunity to explore them in a non-threatening way.

So I started out by thinking about the language used by wine critics. We’ve all heard people slam pretentious wine reviewers and their florid language, talking about how a certain wine was redolent with ridiculous flavors and scents, like in this Slate article, which quotes Robert Parker (the archbishop of wine critics) as saying a wine had “notes of graphite, black currant liqueur, incense, and camphor.”  Personally I’m only vaguely familiar with camphor, and I’m not sure if having graphite notes is good or bad.

But I actually like (some!) wine critics, and rather than condemning them I wondered: what if they actually know what they’re talking about?  People have a tendency to use similar words for similar things. That should be especially true of critics, who (if ethical and/or sincere) would use the same words for the same kinds of wines, although some words would change over time as they grow older and more experienced, the times change, etc. They also speak in a sort of reviewer’s vocabulary that the industry has developed over time. While I started out with adjectives only, adding nouns was a big help as well. And other parts of speech… which turns out to be really complicated (thank god for NLTK!)

So in short I rounded up a hell of a lot of reviews and did some word clouds (which I usually detest, but they seemed actually well-suited for my project) based on the nouns and adjectives used in a whole bunch of reviews, and found… well, hell.  Here’s one for Merlot:


(If you’re unfamiliar with word clouds, here I’m illustrating the more frequently chosen words by size and color intensity; the cloud was done with the nice software from

I mean, sure, it’s not the perfect description of Merlots… but heck, yeah, fruit, cherry, sweet, black… sure, I can buy that.  Here’s one for Zinfandel, that grapey hammer of the gods:


You might quibble about some of the words, but it’s big, black, full of alcohol, that peppery thing shows up, etc., etc.  I’d say it’s not a bad description.  One more, for good luck, a white this time: Sauvignon Blanc.


To me the most amazing thing is not how well it works… but that it works at all!  I mean… all I did was count word frequencies, and you end up with fun and colorful descriptions of wines.  Are there better ways of doing this?  Sure, but few as simple and easy to implement, I’d wager.  Of course this is quite probably old news to many, but it was fun putting them together.
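For the curious, here is a minimal sketch of the word-frequency idea. It is not the code I actually ran: the reviews are made up, `descriptor_counts` is a hypothetical helper name, and instead of NLTK part-of-speech tagging to pick out nouns and adjectives it swaps in a tiny stop-word list so it runs with nothing but the standard library.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for real part-of-speech
# filtering (keeping only nouns and adjectives).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "this", "is", "it", "has", "on", "in"}

def descriptor_counts(reviews):
    """Count how often each non-stop word appears across review strings."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z]+", review.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up reviews, for illustration only.
merlot_reviews = [
    "Sweet black cherry fruit with a soft finish.",
    "Ripe fruit and cherry on the nose, sweet and round.",
    "Black fruit, plum and a hint of oak.",
]

counts = descriptor_counts(merlot_reviews)
top_words = counts.most_common(5)  # (word, count) pairs, most frequent first
print(top_words)
```

The counts are what a word cloud program would turn into font size and color intensity.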

What really made a believer out of me was when I made a grid of the wines and counted up the similarities between them. Here’s a pair of charts showing cross-wine similarities: the one on the left is more of a heat map, where darker means more words in common in their reviews, while the one on the right is a simpler one that shows some of the starker similarities (things like these and others are at


heatish-map of wine similarities


coarser matrix of wine similarities

The pockets of similarities most often showed up in wines from similar regions. That makes sense: not only do people use similar terms for wines of an area or market, but the grapes themselves typically change a fair bit from place to place.
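The grid itself is nothing fancy. Here is a rough sketch of counting the words two wines have in common, assuming each wine has already been boiled down to a set of its top descriptors. The sets below are invented and `shared_word_counts` is a hypothetical name, not the original script.

```python
from itertools import combinations

def shared_word_counts(top_words_by_wine):
    """Return {(wine_a, wine_b): number of descriptors the two wines share}."""
    similarity = {}
    # Sorting the names gives each pair a stable (alphabetical) order.
    for a, b in combinations(sorted(top_words_by_wine), 2):
        similarity[(a, b)] = len(top_words_by_wine[a] & top_words_by_wine[b])
    return similarity

# Invented descriptor sets, for illustration only.
top_words_by_wine = {
    "merlot": {"fruit", "cherry", "sweet", "black", "soft"},
    "zinfandel": {"big", "black", "pepper", "alcohol", "fruit"},
    "sauvignon blanc": {"citrus", "crisp", "grass", "fruit", "acid"},
}

grid = shared_word_counts(top_words_by_wine)
for pair, shared in grid.items():
    print(pair, shared)
```

Feed those counts to any plotting tool as cell shades and you get something like the heat map; threshold them and you get the coarser matrix.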

Of course, the more I thought about it, armed with my hammer, the more I believed it might well work for just about anything… cars, music, etc. I can’t prove this yet (and getting data can be challenging). This is one place where I think it’s important *not* to crowdsource: professional reviewers are great because they’re both more precise in their words and say more than most people do, and getting a mass of words to analyze is important. But that’s my gut; I’m trying to put it into more formal or accessible terms so people won’t think I’m more of a loon than they already do.

My non-statistical tests also seemed to illustrate that good reviewers are more consistent with their language than other reviewers, which made sense to me… the aforementioned Robert Parker is famous for a reason, and his palate is pretty legendary.  While he might lean towards purply grape prose, he presumably means something when he says graphite, and that something is present in the other graphite-friendly wines (whatever they are) whose reviews mention graphite.

To me, looking at that, the next obvious thing to do is to allow people to click on the words that they like, and have each word list the wines that are most like it… so I did that, and put up more clouds… but it’d also be good to just drag the words you want, and don’t want, and have it suggest some wines.  So if you say that you like wines that are complex and have a cherry flavor but aren’t earthy, it tells you… pinot noir, grenache, tempranillo, and merlot.  OK, that works.  Of course, after I implemented a proof of concept of this, two near-simultaneous crashes of my hard drive and backup blew it away, which is why it’s taken me a few years to come back to this.
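Since the proof of concept is gone, here is a sketch of how the want/don’t-want suggestion could work, not how it did. The scoring rule (add up the share of a wine’s words that are liked, subtract the share that are disliked) is my assumption for illustration, `suggest_wines` is a hypothetical name, and the frequencies are invented.

```python
def suggest_wines(word_freqs, liked, disliked, limit=4):
    """Rank wines by (liked word count - disliked word count) / total words."""
    scores = {}
    for wine, freqs in word_freqs.items():
        total = sum(freqs.values())
        if total == 0:
            continue  # no review words at all, nothing to score
        raw = sum(freqs.get(w, 0) for w in liked) - sum(freqs.get(w, 0) for w in disliked)
        scores[wine] = raw / total
    # Highest score first; ties broken alphabetically. Drop non-positive scores.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked if scores[w] > 0][:limit]

# Invented word frequencies, for illustration only.
word_freqs = {
    "pinot noir": {"complex": 8, "cherry": 10, "earthy": 2, "silky": 5},
    "merlot": {"cherry": 9, "complex": 3, "sweet": 8, "earthy": 1},
    "syrah": {"earthy": 10, "complex": 4, "pepper": 6, "cherry": 1},
}

picks = suggest_wines(word_freqs, liked=["complex", "cherry"], disliked=["earthy"])
print(picks)
```

Dividing by the total keeps a wine with thousands of reviews from winning just by being reviewed more often.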

For those who are still interested, you can see charts, graphs, toys, etc. over at

But I did put up a little toy to play with: it starts with a wine, lists the top words, and then when you click on a descriptor it lists the top wines that match that descriptor (again, using word frequencies). There are also grids, maps, etc.

I thought an app or playing with this more would be fun… but time has not been kind to my side projects. I wrote a command line tool that lets you say things like “give me wines with cherry, but no tannins” and such; fun things.
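To give a flavor of the command line idea, here is a small sketch of turning a request like that into “want” and “don’t want” lists. The grammar it accepts (commas, “and”, “but”, with “no”, “not”, or “without” marking the unwanted words) is an assumption made up for this example, and `parse_query` is a hypothetical name rather than the tool’s real interface.

```python
import re

def parse_query(query):
    """Split a plain-English request into (wanted, unwanted) descriptor lists."""
    text = query.lower().strip()
    # Drop an optional lead-in such as "give me wines with".
    text = re.sub(r"^.*?\bwines?\s+(with|that are)\s+", "", text)
    wanted, unwanted = [], []
    # Clauses are separated by commas, "and", or "but".
    for clause in re.split(r",|\band\b|\bbut\b", text):
        clause = clause.strip()
        if not clause:
            continue
        negated = re.match(r"(?:no|not|without)\s+(.+)", clause)
        if negated:
            unwanted.append(negated.group(1).strip())
        else:
            wanted.append(clause)
    return wanted, unwanted

wanted, unwanted = parse_query("give me wines with cherry, but no tannins")
print(wanted, unwanted)
```

The two lists can then go straight into whatever scoring you use to rank wines by word frequency.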

It was fun doing the analysis with some scrapin’ and scriptin’, and the NLTK provided the backbone for the analysis.  Who knew what meronyms, holonyms, hyponyms, and the like were before all that?


© 2012 trouble