Anti-Buzz: Data Mining

Anti-Buzz has been running as a weekly column (more or less) for three years. We have decided to give Andrew a few weeks off and run some “best of” columns from the past. This column was originally published in 2012 and has some basic info that is still relevant, especially in light of the recent NSA data mining revelations.


The Buzz (Word): Data Mining

This is where our two worlds collide. A buzzword, if you need a reminder, is a technical term that gets taken out of its original context and used more broadly by the general public, often at the expense of accuracy. Such is the case here: a computer science term, with specific meaning, has for outsiders become the banner for more general concepts, and this banner seems to cover all statistical analysis.

However, contrary to the theme of this column, I don’t see any merit in de-buzzing this buzzword. First of all, we asked for it; the term “data mining” sounds more like a marketing concept than anything, and the imagery evoked by the term – miners with headlamps lost deep in a mountain of information – is poetic enough that it will always resonate with “normal” people.

I could pedantically correct the common misuse of the term, but this would be at once futile and self-defeating. Futile because you won’t side with me against a legion of colleagues who use the term loosely, and self-defeating because I think it’s in your best interest to be interested in all those things under your data mining umbrella anyway.

Still, to understand what is at stake for you, it helps to separate general statistical analysis from “real” data mining. There is also another dichotomy that usually goes unaddressed for dentists: there is data mining as it applies to your patients, and there is data mining as it applies to your business. Both are relevant, but they draw from different wells and have different motivations. So here is a broad overview of what I think you should understand about all of these perspectives.

Data mining as statistical analysis: From a theoretical point of view, the field of Statistics has long been sufficiently developed that one could pore over all the records in your practice and glean a lot of useful information. The problem until recently was that doing so would require a lot of work by hand – paper records are hard to mine. Now we have a wealth of digitally recorded data on just about everything, and the ability to automate the statistical analysis of large quantities of data is a huge boon to the field of statistics – no longer do we have to rely on cash-starved undergraduates to perform our psychological surveys.

However, none of this is new from a mathematical point of view. Your practice would have benefited from large-scale statistical analysis in 1955 just as much as it will now; the difference is that doing so will soon be fast, easy, and inexpensive. This type of analysis isn’t “true” data mining. It’s just statistics made more accessible by computing technology, in the same way that many other things have become more accessible by computing technology. However, getting the public excited about “old statistics, new accessibility” can be difficult. Statistics had a large impact on the last century, but its impact on this one could be enormous, and getting people behind that concept is hard … unless you dress it up with a particularly evocative buzzword such as “data mining.”

Data mining as discovery: Classic statistics is prepared to answer questions. You might want to understand the relationship between age and cost-of-visit, the probability that symptom X indicates disease Y, or which treatment has been the most successful for a certain condition, and classic statistics can answer these questions for you. The ask-crunch-answer model is incredibly useful, but it’s not data mining. The key problem with this classic model is that it is biased by what questions you think to ask.
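To make the ask-crunch-answer model concrete, here is a minimal sketch of two of the questions above being asked and answered. Everything in it is an assumption for illustration: the visit records, the symptom counts, and the helper name `pearson` are made up, and a real practice would pull these figures from its own records.

```python
from math import sqrt

# Hypothetical visit records: (patient age, cost of visit in dollars).
visits = [(25, 120.0), (34, 150.0), (41, 180.0),
          (52, 240.0), (63, 310.0), (70, 330.0)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Question 1: how are age and cost-of-visit related?
ages = [age for age, _ in visits]
costs = [cost for _, cost in visits]
age_cost_r = pearson(ages, costs)

# Question 2: how often does symptom X indicate disease Y?
# Hypothetical counts taken from past charts.
patients_with_symptom = 60
patients_with_symptom_and_disease = 18
p_disease_given_symptom = (
    patients_with_symptom_and_disease / patients_with_symptom
)

print(f"age vs. cost correlation: {age_cost_r:.2f}")
print(f"P(disease Y | symptom X): {p_disease_given_symptom:.2f}")
```

Notice that both numbers exist only because someone decided to ask for them, which is exactly the bias described above.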

Probably the best way to understand “real” data mining is to understand that it does not answer questions, it discovers interesting facts. To get that exciting, 21st century, counter-intuitive innovation thing going, you need a system that finds correlations you never thought to ask about. You need a system that might suggest that the best treatment for condition X is one that’s never been tried before. Data mining is about discovery. What can an unbiased machine infer from our heaps of data that we can’t?
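As a toy illustration of the difference, the sketch below does not ask about any particular relationship. It scores every pair of fields in a small table and reports the strongest ones first. This is only a stand-in for the idea of discovery, not a real data mining algorithm, and the table, its field names, and the helper `rank_pairs` are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical per-patient table: one list per field, same patient order.
records = {
    "age": [25, 34, 41, 52, 63, 70],
    "visits_per_year": [1, 2, 2, 3, 2, 4],
    "cost_per_visit": [120, 150, 180, 240, 310, 330],
    "minutes_late": [10, 0, 5, 15, 0, 5],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(table):
    """Score every pair of fields, strongest absolute correlation first."""
    scored = []
    for a, b in combinations(table, 2):
        scored.append((abs(pearson(table[a], table[b])), a, b))
    return sorted(scored, reverse=True)


ranked = rank_pairs(records)
for strength, a, b in ranked:
    print(f"{a} vs. {b}: {strength:.2f}")
```

With four fields there are only six pairs to check. With hundreds of fields the list becomes far too long to think up by hand, and that is where letting the machine look starts to pay off. A strong correlation found this way is still only a lead to investigate, not a conclusion.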

There is a bit of hand-waving here, and also the unsatisfying truth that “What will data mining do for me?” is best answered with, “I don’t know yet but it will probably be really cool.” A currently understood application of the field, however, is diagnostic agents. Machines taking a list of symptoms, making a diagnosis, and coming up with a treatment plan is a reality that we are already facing, and if you want something concrete to hang your hat on, this is the product of genuine data mining.

Patient-centric versus business-centric statistics: Another thing that gets lost in this discussion is that you have two separate interests here. Statistics can be used purely for medicine (symptoms, impacts of treatments, cause-effect relationships, etc.), or purely for business (How much does something cost you per day or per visit? Does a particular insurance provider cause you to lose money?), or you can blend the two (How much do patients with this symptom pay per visit?). As statistical tools become readily available, your use of them will be determined by your own goals and interests.

This isn’t meant to be some morality lesson about sacrificing your bottom line for the sake of your patients, just a friendly reminder to keep both sides of the coin in mind. Depending on who you are, you might be more inclined to ask questions on one side of the spectrum or the other. If you use data analysis to improve your treatment of patients, but fail to recognize how to easily trim some fat from your business, then you are helping no one if your practice goes under. And if you use data analysis to become a more savvy businessman, but fail to improve your performance as a dentist, then you are ignoring the most fundamental method of earning and retaining patients: by providing quality dental care.
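For a sense of how small these questions can be once the data is digital, here is a minimal sketch of one business question and one blended question. The insurer names, dollar amounts, and function names are invented for illustration and do not come from any real practice or software package.

```python
from collections import defaultdict

# Hypothetical visits: (insurer, amount paid, cost to deliver, had symptom X).
visits = [
    ("InsurerA", 200.0, 150.0, True),
    ("InsurerA", 180.0, 150.0, False),
    ("InsurerB", 120.0, 140.0, True),
    ("InsurerB", 110.0, 130.0, False),
]


def margin_by_insurer(rows):
    """Business question: total (paid - cost) per insurance provider."""
    totals = defaultdict(float)
    for insurer, paid, cost, _ in rows:
        totals[insurer] += paid - cost
    return dict(totals)


def average_paid_with_symptom(rows):
    """Blended question: average payment for visits with symptom X."""
    paid = [p for _, p, _, has_symptom in rows if has_symptom]
    return sum(paid) / len(paid)


margins = margin_by_insurer(visits)
blended = average_paid_with_symptom(visits)

for insurer, margin in margins.items():
    print(f"{insurer}: net {margin:+.2f}")
print(f"average paid per visit with symptom X: {blended:.2f}")
```

In this made-up data the second insurer comes out negative, which is the kind of answer the "does a particular insurance provider cause you to lose money?" question is looking for.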
