
Data Scientist or Business Scientist?

A common refrain in the data science community is that existing business analysts are too burdened by the assumptions and biases of their current work, which relies predominantly on internal data, usually at some level of aggregation, to be effective with big data. Data scientists, the argument goes, are more “scientific” and use machine learning algorithms to allow the data to speak for itself. Neither of these arguments is particularly valid. No matter how much machine learning is employed, at some point the data scientist has to build a model, and at that point their experience and “bias” come into play. The business analysts either will or won’t adapt to a more data-driven point of view.

Hear Author Neil Raden in this archive of The Briefing Room with John Santaferraro of Actian.

Example: The marketing department has always used regression analysis for forecasting. A “data scientist” would regard this as a potentially misleading bias and would prefer a model-less solution such as machine learning to let the data “speak for itself.” This represents a shift from statistics to algorithmics. But bias and assumptions creep into every model. Machine learning is not a crystal ball; it still requires the data scientist to trim the number of variables or to interpret intermediate results before starting another iteration.

Many of the influencers writing and speaking about big data are coming from an industry that is quite different from most. Because much of their work has been done by hand (writing programs for Hadoop or the R framework), they are unfamiliar with the vast amount of technology already in place from the data warehousing and business intelligence providers. In fact, many of those tools are far superior and more appropriate for an enterprise environment. In many cases, it may make more sense to add some capabilities to those tools than to abandon them in favor of a universe of open source and do-it-yourself programs.

One obvious and straightforward way to do that is for analytical platform vendors to add quantitative tools and applets to their products. By making these capabilities accessible through a familiar interface (SQL) and packaging them for use by those with excellent analytical skill and domain knowledge (we call these Type III and Type IV analytic types, as shown in Figure 1), organizations can reduce the effort required of the rare data scientists and ease the problem of staffing more of them.

Figure 1: Analytic Types

Training the Go-To Guys

People in organizations tasked with providing analysis (or the reports and dashboards for analysis and reporting) work with business intelligence tools. Within the group that learns these tools are specialists and experts, the “go-to guys” with subject matter knowledge in the various functions of the organization. They extract data, work with the data warehouse, build cubes and produce useful information for the rest of the organization. The go-to guy has some skill writing SQL (which is, by the way, rapidly becoming a standard for big data as well) and has a good to excellent grasp of the business domain, such as finance, marketing, distribution or pricing.

Go-to guys provide an indispensable service to the organization. These are the Type IIIs in our model. What do they lack to move into a Type II role, defined in Figure 1 as Type II-B: light data scientist? Essentially, they need to learn about data sources such as Twitter feeds, weblogs and other sources, and they need to learn how to use (but not program) analytical, statistical and algorithmic models. There is no reason to assume that many of them cannot. Training can be provided in house, through self-study and especially online, as there are many options today.

A lack of a Ph.D. should not be a barrier for a Type III shifting into Type II. Type II data scientists do not need Ph.D.s. They do not (except in rare circumstances) produce original research, publish in academic journals or share and collaborate with their peers in other organizations. A Ph.D. requires a broad range of learning in the discipline, most of which is not applicable to “data science” in a commercial organization. The last few years of a Ph.D. candidate’s life are consumed with the dissertation, which is original research and is pinpoint-focused on a topic that may have nothing to do with analyzing big data.

There is an excellent opportunity for the BI analyst to move into Type II with on-the-job and/or distance learning. For those with some math/engineering/physics background (or those willing to acquire it in online training), moving into Type II roles over time is also possible. This movement requires employers to understand that locking BI analysts into their role will likely result in turnover, as they will surely be recruited elsewhere. The organization should support those wishing to make this move with at-work study time, mentoring and an environment where people can enhance or even change their careers without danger of dismissal.

For many, learning math is painful. Part of the problem is that math without an application is too abstract for many people to grasp. But learning to use quantitative functions, and where they apply to the business a person is in, is much easier. There is no need for analysts at what we refer to as the Type II-B level to learn to differentiate a moment-generating function. Instead, it is necessary to understand what kind of model is appropriate for the problem at hand, how to run it and how to evaluate the results. However, these investigations should rarely be put into production or relied upon without being vetted by more senior staff.
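
As a rough illustration of "run it and evaluate the results," the sketch below fits a simple regression on part of a data set and checks its error on the rest. It is only a minimal example under stated assumptions: the ad spend and sales figures are invented, and the helper names `fit_line` and `mean_abs_error` are hypothetical, not part of any BI product or of the analytic types model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    intercept, slope = model
    return mean(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


# Hypothetical monthly ad spend (in $1000s) and unit sales; made-up numbers.
spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 15, 15, 19, 20, 21, 25, 26]

# Fit on the first six months, then evaluate on the two months held back.
train_x, test_x = spend[:6], spend[6:]
train_y, test_y = sales[:6], sales[6:]

model = fit_line(train_x, train_y)
holdout_error = mean_abs_error(model, test_x, test_y)
print(f"intercept={model[0]:.2f} slope={model[1]:.2f} holdout MAE={holdout_error:.2f}")
```

The point of the held-back months is the evaluation step: an analyst who can read the holdout error has a basis for judging whether the model is usable before passing it to senior staff for vetting.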

Moving Type IVs to Type III

For the most part, Type IV analysts were confined to the structured data culled from various internal systems within the organization. In the best of cases, this data was carefully modeled and extracted from other systems with a high degree of quality and reliability into data warehouses and BI tools. However, in too many organizations, the Type IV analyst had to learn the structure and semantics of various operational systems and perform their own extraction and transformation into various personal databases and spreadsheets. For that reason, their skills were more heavily weighted toward data and structure, and much less so toward analysis. For the purposes of management and external reporting, performance management and certain analytical endeavors, this was sufficient but tedious.

In our experience, the Type IV analyst has potential that has been throttled by technology. Their skills at manual data management can be easily transformed into more useful Type III analytics with the proper training and encouragement. Type IVs who work with BI may either continue, as there is still a need for their work, or learn to work more analytically than their current role requires.

The big data technology market is evolving at an extremely rapid rate. All sorts of tools and products are emerging to ease the burden of managing, analyzing and explaining big data, making it more likely that Type IV analysts can move up to Type III or even Type II.

A New Type: Type V

It is commonly assumed that the majority of people in organizations are not involved in analytics at all, as we have described it. This is not true. The most widely used analytical tool, even for data scientists, is Microsoft Excel. The use of Excel by people in professional positions is nearly universal. But if Type IV analysts begin to move into Type III roles (and, unfortunately, this movement may be by position rather than by person as the result of attrition, retirement, etc.), their skills are still needed. No one has sufficiently answered the question of why only 15-20% of knowledge workers use BI, but the answer is likely to be the ease of use, relevance and understanding of the tools. This is clearly improving and, despite all of the attention big data gets, the BI industry (which does not include Excel) is about $10-12 billion per year and new vendors are emerging every day.

The Internet connects the world; the Web offers the means to tap into those connections. Search capabilities let people find things; big data gives people the opportunity to understand what they find.

Data scientists in large digital companies like Google have the luxury of being able to explore and experiment, often doing things by hand. This is not a necessary model for other organizations. What your analytics provider should offer is a set of advanced analytical models and tools to eliminate the Not Invented Here (NIH) syndrome. Your analytics provider should offer a high-performance, high-availability analytics service one level of detail down from raw big data. Eliminating the handwork of preparing and moving data and results makes data scientists more efficient.

Neil Raden

About Neil Raden

Neil Raden is the founder and Principal Analyst at Hired Brains Research. He is the co-author, with James Taylor, of “Smart (Enough) Systems: How To Deliver Competitive Advantage by Automating Hidden Decisions.” With 30 years of experience, he is a widely published writer, well-known speaker, analyst and consultant, having personally designed and implemented dozens of large analytical applications in finance, marketing, distribution, logistics, actuarial, intelligence, scientific, statistical and consumer products. As an industry analyst, he has published over 40 white papers, hundreds of articles, blogs and research reports.

6 Responses to "Data Scientist or Business Scientist?"

  • Geoffrey Malafsky
    September 3, 2013 - 10:02 am

    Excellent article and very good points. I am a Type 1 according to your scale since I was a practicing research scientist in Nanotechnology with a PhD in Chemistry. As I always write, the problems with data analysis leading to decision making stemming from inaccurate data semantics (aka poor quality, incongruous data system schemas, ..) is rather easy to fix with the right organizational culture. Unfortunately, the overwhelming culture in all things data mgmt is not centered on producing accurate, auditable, updateable enterprise data, but throwing partial tools and methods at problems and leaving the pieces on the table waiting to see if the C-execs really care. The answer is usually no.

    Thus, I disagree with the premise that a new suite of tools, or adding better analysis functions to existing tools, will in any way dent the problem experienced by the majority of large organizations. This problem is exacerbated by the consulting and training promulgated by the experts of yesteryear holding dearly on to outdated methods as long as they can. This will dissipate in another generation of workers but the question is whether organizations can continue to operate with such poor data for that long.

    If there is desire for better enterprise data without causing inter-group friction, there are ways drawn from real science which practices this everyday, to do so rather quickly and cheaply.

    • Neil Raden
      September 3, 2013 - 2:14 pm

      Geoffrey,

      Good post and I agree with you. I often say, what laws of physics did big data change so that machines can ingest crappy data and make reliable inferences? I’ll write more about this after the session today. Thanks for the note.

      -NR

  • MarcH
    September 3, 2013 - 2:09 pm

    Great article. I share the same views. So I allow myself to translate the following commentary, which I wrote on a French site on July 5 but which was of course less structured and serious than this article.

    A wind of change is coming from America which might feed the bemoaning of the IT employers over skill shortages, all the more overestimated as unemployment continues to grow, but is also related to lacking analysis of the labour market and of the permanent reorganization of the digital services market. The new myth is the one of the “data scientist”. Its skills are supposed to cover computing, statistics and a (true :)) “trade” expertise. What better way to make savings while the balance of jobs in the digital sector is at least Darwinian (which is, however, probably different in the USA)?
    Somehow, it looks very much like a socio-political project in the sense that the labor market could tighten for the benefit of super-elites activating the levers of automation of an increasing number of tasks.

    In reality, IT workers adapt themselves to “big data” technologies as they always did for other technological waves. More and more business trainings integrate “business intelligence”, basis of the “big data”, because a large part of managers and experts have been manipulating data for decades. Unfortunately, there are even less job opportunities in BI today than in 2009. Statistics also spread more and more in trainings and local public institutions, devoid of any fame, even succeed in recruiting statisticians in very ordinary terms.
    Moreover, we always repeat that France has an advantage for training engineers and statisticians. Haven’t they populated trading rooms for years? It would even be possible that a part of them turn towards more productive activities like the industry where “big data” will appear as a tool among others, as well as for public works engineers, electricians, etc.
    It is in fact especially the marketing where we would suddenly make a quantum jump. But contrary to what is permanently trumpeted, “big data” spread there at the speed of the fiber optics and the startups abound while the added value is not so high in a stagnant economy, where multiplying techniques to incite to consume has little effect if the purchasing power is declining as it is the case.

    The 2nd problem lies in the contradiction between the speech of vendors which emphasizes the increasing ease of use of their tools, for “big data” in particular (documented statistical methods…), and the need for super-specialists. But maybe the logic is not an essential skill for “big data”. Since “that correlates” in the buzz…

  • Dorothy Hewitt-Sanchez
    September 3, 2013 - 2:12 pm

    Great article!

  • Dorothy Hewitt-Sanchez
    September 4, 2013 - 10:36 pm

    The webinar was great. However, the career path was not necessary because I think it was limiting the employee’s potential by trying to set boundaries on his/her career path. Business Analytics is used everywhere in the business. So, if someone wants to go into management, programming, business scientist, or data scientist from a Business analyst role, I do not see a reason as to why they should not be allowed to do so.
