What is data science? From the hype in the IT press right now, you might think that it is something excitingly new, destined to determine the future prosperity of a whole swathe of companies big and small.
Exciting it might be. New it is not.
Let’s cut to the chase here. This is a philology rant, not a technology rant. If you are already tired of the term “big data,” but not yet tired of the term “data science,” let me help you get there as swiftly as possible.
There is no such thing as “data science.” It is a solecism. These two words, when conjoined, are utterly misleading. We can start with the word “science.” Science is the systematic study of the world through observation and experiment. The scientific method is well known and reasonably well understood by most people. It involves inquiry based on empirical and measurable evidence. Scientists formulate theories on the basis of their observations, and they test them empirically. If the evidence supports whatever assertion they made, it becomes a credible theory – and it usually has predictive value.
If you’re asking, “Hey, isn’t that what a ‘data scientist’ does?” then in fact it may be exactly what someone with the newly minted title does. But still, there is no such thing as “data science.”
Science studies a particular domain, whether it be chemical, biological, physical or whatever. This gives us the sciences of chemistry, biology, physics, etc. Those who study such domains will gather data in one way or another, often by designing experiments and taking readings. In other words, data is the means by which they study the domain, not the thing being studied.
If there were a particular activity devoted to studying data, then there might be some virtue in the term “data science.” And indeed there is such an activity, and it already has a name: it is a branch of mathematics called statistics. It doesn’t need a name upgrade, or if it does, we should call it Statistics 2.0.
In the IT industry we are used to marketeers bending, folding, spindling and generally mutilating our language. That’s what they often do; in fact, that’s what they’re paid to do, and some do it well. In response we are supposed to discover the real meaning behind the words and behave rationally, despite the communication travesty.
I can live with that. What I have problems with is when the sabotage of meaning gets so egregious that people are very liable to misunderstand and subsequently exhibit suboptimal or even outright foolish behavior because of it.
8 Steps for Getting it Straight
So here’s a shot from the hip:
- There is nothing new at all about what is being called “data science.” It is the application of statistics to specific activities.
- We name sciences according to what is being studied, and the behavior involved is (or should be) along the lines of the scientific method. If what is being studied is business activity, and that’s usually the case, then it is not “data science,” it is business science. That is simply the naming convention.
- This statistical activity is identical to what we also call data analysis.
- If you are trying to work out what the budget for such activity should be, then you should not be thinking in terms of the usual ROI metrics that IT often relies upon. This scientific business activity is what we have known for decades as R&D – research and development of the business. The amount of money devoted to R&D is pretty much always a board-level decision. If someone wants to initiate data analysis activity within a business, then they should talk to the board about the business’ approach to R&D.
- None of this depends upon whether the data is big or not. Of course, if the data is big, then the IT resources required to carry out the scientific activity will be more expensive.
- You will not be hiring a “data scientist” to carry this out. Here’s why: the combination of skills required to carry out these business science projects rarely resides in one person. Someone could indeed have attained extensive knowledge in all three areas: what the business does, how to use statistics, and how to manage data and data flows. If so, he or she could indeed claim to be a business scientist (a.k.a. “data scientist”) in a given sector. But such individuals are almost as rare as hen’s teeth.
- If you wish to develop such a capability, the sensible way to proceed is to put together a multi-disciplinary team of individuals who collectively possess the required skills, and to give that team a set of well-defined goals. The one in charge should have a title like Project Director or Research Director. He or she is not obliged to wear a white coat.
- In some organizations, the results of R&D are poorly implemented. This is an organizational problem. Think Xerox and Xerox PARC – truly great research leading to truly great products, except Xerox didn’t actually make those products. If you carry out wonderful business science and it doesn’t get implemented, it eventually will be. By your competitors.
I have said enough.