Big Data: Tools are not a substitute for thought
- Albert Shar

Would you be surprised if you flipped a coin seven times and got seven heads? You should be, since the probability of that occurring with a fair coin is less than 1% (0.0078125). Should you still be surprised if I told you that that streak of seven heads was part of a sequence of 1,000 coin tosses? You shouldn't be, since it's virtually a certainty that you'd get that result. And therein lies one of the problems with trying to make decisions around big data (not that 1,000 data points have much to do with big data).
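The arithmetic behind both claims can be checked with a small dynamic program that tracks the length of the current heads streak (the function name and parameters here are my own, just for illustration):

```python
def prob_heads_run(run_len, n_flips):
    """Probability that a fair coin shows at least one run of
    `run_len` consecutive heads somewhere in `n_flips` tosses."""
    # state[k] = probability the current trailing heads streak is
    # exactly k, given no run of run_len has occurred yet
    state = [0.0] * run_len
    state[0] = 1.0
    hit = 0.0  # probability a run of run_len has already occurred
    for _ in range(n_flips):
        new = [0.0] * run_len
        for k, p in enumerate(state):
            new[0] += p * 0.5          # tails resets the streak
            if k + 1 < run_len:
                new[k + 1] += p * 0.5  # heads extends the streak
            else:
                hit += p * 0.5         # streak reaches run_len
        state = new
    return hit

print(prob_heads_run(7, 7))     # 0.0078125 — surprising in 7 tosses
print(prob_heads_run(7, 1000))  # well above 0.95 — expected in 1,000
```

Seven heads in seven tosses has probability (1/2)^7, but a streak of seven somewhere in 1,000 tosses is the overwhelmingly likely outcome.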

I recently wrote about my concern about people misinterpreting what superficially appear to be actionable results that come from mining big data. That is only part of the problem, since big data increasingly relies on complex analysis and extraction methods that can obscure the logic behind the results. There is a school of thought that argues that big data results eliminate the need for theory. This is evident in the search for new drugs via high-throughput screening, where candidates are discovered by testing for reactions in literally millions of micro-experiments. While this may be valuable in some circumstances, I believe there are some inherent dangers that must be addressed.

  1. As alluded to in the first paragraph, it is important to understand the probabilities at play in big data experiments: in a large enough sample, rare events will occur with high probability. It is important not to stop at superficial results.
  2. Big data techniques are exceptionally good for developing hypotheses. They are not, in and of themselves, proofs.
  3. In some highly sensitive situations, it is not sufficient to demonstrate that something is true. It is critical to provide an underlying logic to that demonstration. This is most often the rule in health interventions. Sophisticated big data tools most often fail along this dimension.

The best course of action is to think carefully about the purpose of your work and act in a way that’s consistent with that purpose. If you’re looking for interesting hypotheses, think about the methods for testing them. If you’re looking to test and implement an intervention, consider the costs of being wrong. The degree of certainty needed to act should consider the consequences of all possible outcomes. If you’re using complex tools, you may need to develop simpler models to justify the logic of even valid findings. These are not insurmountable obstacles; they are reasonable activities that go hand in hand with big data and their associated tools.

In short, powerful tools that one can unleash on interesting data repositories are not substitutes for thinking. A few years ago I was working in a facility with huge technology resources. I carefully set up an experiment to test a hypothesis, using essentially every tool we had to work overnight on the data. When I came in the next morning and scanned the results, I realized that had I thought about the problem for five minutes, the answer was obvious and easily articulated. Just because we have cannons doesn't mean they are the right tool.

About the Author
Albert Shar, PhD, is a principal at QERT (qertech.com), a technology consultancy, and serves as vice president at the Robert Wood Johnson Foundation. Shar previously worked as director for IT Research and Architecture at the RWJ Pharmaceutical Research Institute, a J&J company. He has also served as CIO at the University of Pennsylvania School of Medicine. He holds a patent in medical imaging and is the author of more than 50 articles in medical informatics, computer science, and pure and applied mathematics.
