Inside Analysis

How the Data Explosion Changes the Way We Do Business

On August 25, 2014, Bloor Group CEO Eric Kavanagh sat down to chat with Sami Akbay, cofounder of WebAction, an end-to-end platform that enables real-time, data-driven applications by acquiring, processing and delivering structured and unstructured data. This is the audio and transcript of the insightful conversation.

 

Eric Kavanagh:  Ladies and gentlemen, hello and welcome back once again to Inside Analysis, the eponymous show here at insideanalysis.com where we talk to industry leaders and try to figure out what’s going on out there in the world of data and applications and enterprise software and all the fun stuff that these tools and technologies make happen. I’m Eric Kavanagh and with me today is Sami Akbay of WebAction. I’ve known Sami for quite a few years now. He’s a real thought leader in the space, always coming up with new ideas, and he’s got a fairly new company which is doing some cool stuff. With that, Sami Akbay welcome to the show.

Sami Akbay:  Thank you Eric, it’s always good to be talking to you.

Eric:  Sure thing, so let’s talk first about how we got here. Maybe you can give me your impression on what you saw a few years ago when you started working on WebAction. What did you see in the marketplace that led you to think that there was an opportunity for something here?

Sami: I think the change started more than a few years ago. In the mid-2000s, we started talking about the data explosion, and at that time our focus was more around structured data. People were discussing semi-structured data and unstructured data, and these other types of data were emerging. What really happened is we started becoming addicted to our mobile devices and then to social media, and suddenly the explosion of data really took place. We went from data growing a couple of x every year to a point where these new types of data became so prominent that they started to change how we handle, or need to handle, the information that's flowing through enterprise infrastructures as well as our daily lives.

For example, my older car had barely any sensors in it. My newer car probably generates gigabytes of data every hour without me knowing and this has permeated every aspect of our life. There is machine-generated data, social-media-generated data, device-generated data. As we become more connected, the data explosion that we used to talk about almost ten years ago has become a reality and challenges the infrastructures that we have built over the years.

Eric: Exactly, so companies seem to be doing a variety of things. One angle is to kind of turn a blind eye and pretend it's not happening, but the other, more thought-provoking and beneficial angle seems to be to embrace this new wealth of data and find new and creative ways to do that. For those companies it then becomes a question of build or buy or amend. One of the trends I'm seeing is in the core applications that run businesses, like ERP systems. It's very difficult to change those in any significant way without lots of upheaval. I see a lot more activity around the ERPs, taking bits and pieces of data from them as needed, but building solutions that don't circumvent the ERP but run adjacent to it. Is that a fair assessment?

Sami: I think that's a fair assessment. Again, this is partially driven by the types of data that are currently emerging. Let me rewind a little bit: ten years ago we were looking at data that was explicitly captured inside an enterprise as transactions – payments, inventory info, resource management information, CRM systems and sales management info. These were in the realm of megabytes and gigabytes, and it was mostly transactions. Then we started to track interactions between companies and customers. Not just transactional data but sales records and CRM data; every phone call started getting recorded, transcribed and processed; and the clickstream that we captured from our web interactions, web servers and online sales became part of the single view of the customer.

Then we started adding these social interactions and support requests and everything else into the mix. That took us into the terabytes realm. And once we started getting this smart data from devices like RFID tags, system logs, sensor feeds, social graphs, and location and geospatial information, observations that were almost exclusively captured by humans and then entered into the systems were now automatically getting recorded and pushed into the infrastructures.

Now, when you add all these different types of data, which takes us to multi-terabytes and into the petabytes realm, the entire ecosystem around the original ERP solutions, the CRM solutions or the transactional systems that we had starts to get seriously challenged.

Eric: I can see that. Those systems were not designed to handle the magnitude of data that is coming from these sensors, for example – this new era of machine-generated data, in addition of course to social media data, which is its own entity altogether. What did you see as the opportunity as you were watching all this activity come down the pike and you were thinking of WebAction? How did you see your space being carved out in this new world?

Sami: There were a few things that started to change. One of them was that enterprises had less control over the structure of the data that started to arrive. I remember in the early to mid-2000s, an enterprise architect or a data architect had a lot of control over the structures of the data, the database schemas and how the relationships in a database schema worked – all of these things were within the control of the enterprise.

Suddenly, these things started to get outside the control of the enterprise. If I know the structure of the transactions, if I know the structure of the customer record and whether it's going to change or not, and I can control every aspect of how this is captured in my enterprise, the way I handle it is different than if I'm co-dependent in an ecosystem that has many people generating these pieces of information. I don't really have much control over data that comes to me from outside of my enterprise, right?

This created an emerging need to be able to handle data that had loose structures as opposed to strict structures, which meant that we had to handle variety. As opposed to data that we understood well, there was now data that we understood less, and we had to be more flexible.

Then of course there was the volume. It wasn't just a matter of the size of the data that's coming in; it was also data that arrived at very high frequencies. I mean, getting one million 80-byte device records is very different than getting a single 80-megabyte file, because how you have to process it changes quite a bit.

When you put all of these different things together, you have a lot of different types of data: "variety," meaning different structures; "velocity," meaning it might be very small pieces of information arriving at a very high frequency; as well as "volume," the sheer size of the data you pull together. The old style was for organizations to get all this information, put it into a centralized enterprise data warehouse and start running analytics from it. That of course introduces tremendous amounts of latency. The question becomes: When do you need the information? Maybe in a day, maybe in two or three hours – or what do you need to do in order to leverage this information so you can use it instantly?

Eric: Right, and instantly – that’s the key. Instantaneous actions, the real-time aspect to business needs. Can you talk about that value proposition and that driver because that’s an architecture driver? The fact that you want to use real-time data, you want to deliver apps that can respond in real time or near-real time to take care of very specific business needs – that was a major design point for your solution, right?

Sami: Absolutely, and I think in one of our earlier conversations you were using the term "customer time." I really like that term because obviously the term real time means different things in different industries and to different people. Our goal is to reduce the latency from the time the information or data is generated until it becomes useful in the context of the customer interaction, to a level that is not perceived as a delay by the end user. If I'm engaged with a company on an Internet site, and that company can leverage the information they captured about me while I'm still on the site, then that infrastructure has a lot of value. Whereas if they have to reach out to me and send an email or some other communication after I stop my interaction or after I leave the site, the value of that information, and consequently the value of the infrastructure behind it, drops significantly.

Eric: This is a really good point: they don't want to lose any of the value at customer touchpoints where they want to make an offer that is going to make sense. You also don't want customers to wind up feeling that they have to take some other action afterwards, because first of all, that's an unhappy customer, and second of all, that's an added expense for the business. When the customer picks up the phone – let's say they're trying to get something done on their app or their mobile phone and they can't – they get frustrated and call your call center. Well, you just incurred more cost and you have an unhappy customer on your hands. The point you're making is that there is an opportunity cost to not getting the job done at that touchpoint, right?

Sami: That's right, and it's not just an opportunity cost in terms of the dollar value to the enterprise. It's also annoying to the end customers. Here's an example: yesterday I purchased a program online (video over IP) to watch a soccer game. Today, they called me from that same company, probably to market an offering related to what I did yesterday. Now, after annoying me and disrupting my time today, the next time they call I'm going to tell them to take my cell phone off their list.

At the same time, while I was paying for this, while I was interacting with them, if they had made a cross-sell marketing offer such as "since you are buying this one game, you might want to subscribe to the entire season of the soccer league," I probably would have been a lot more receptive to their offer. It would have cost them less, they would have gotten a better response, and on top of that, my experience as a customer would have been substantially improved.

Eric: What I like about what WebAction has done is that you’ve really embraced the importance of having an entire environment for capturing this kind of data we’re talking about, then being able to build applications around it but also having the in-memory capability to process it all. The in-memory component of your architecture is very important because that’s what gets you that hardcore speed, right?

Sami: Absolutely. I think we’re always chasing the real-time aspect of customer experience and interactions, but what enables that is the underlying technology that allows us to become more and more real-time interactive. Our traditional computing approach has been to acquire the data, store it in a centralized location where all applications can access it, then build the applications/processes on top of it in order to make sense out of it.

Now, because of the challenges we talked about earlier regarding the variety and velocity of the data, that approach is no longer viable. The moment that you store the data to disk, you start losing the battle against time. Our approach is to get that data and, while all the data is moving through the infrastructure, process it before you persist it. There is a lot of value in persisted data – that's learned information. At the same time, there is also a temporal value to that data while it's going through the infrastructure. You have to start making decisions before you have the luxury to write it to disk and run your traditional application on it. Now, this obviously requires a lot of in-memory processing, and it is somewhat non-trivial in many aspects, because when you start looking at the multiple things happening in an enterprise, you start processing this real-time data within these time windows.

You may have multiple windows of data coming from different systems that require you to analyze a correlated set of data in conjunction with each other. For example, I could be collecting data from my clickstream as well as my CRM system at the same time and correlating those two while that data is going through the infrastructure, all in memory. I'm sure when you call your cable company or your airline, most of the time you are now in front of the computer, so you're probably logged onto the system. You're working with their customer service rep; they are helping you out, but you also have a second channel going: you're generating clicks and doing online activity as well. I know that when I call my favorite airline carrier, I'm always logged onto my account. I'm having a conversation with the customer service rep, where I have access to a certain set of information and I'm making changes to the account, while she's doing the same thing on her side. This means that data is generated in multiple systems, and within the last 5, 10 or 15 minutes, or whatever the time window is, everything that we do from two separate channels has to be correlated in memory in real time in order to create a seamless and rich customer experience for me.
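The time-windowed, in-memory correlation Sami describes here can be sketched roughly as follows. This is a minimal illustration, not WebAction's actual API: the channel names, the five-minute window, and the event shape are all assumptions made for the example.

```python
from collections import defaultdict, deque
from time import time

WINDOW_SECONDS = 300  # correlate only events from the last 5 minutes

# Per-customer buffers of recent events from each channel, kept in memory
clickstream = defaultdict(deque)
crm_events = defaultdict(deque)

def _expire(buffer, now):
    """Drop events that have aged out of the correlation window."""
    while buffer and now - buffer[0]["ts"] > WINDOW_SECONDS:
        buffer.popleft()

def on_event(channel, customer_id, payload, now=None):
    """Ingest one event and return cross-channel correlations in-window.

    channel is either "click" (clickstream) or "crm" (call-center system).
    Correlation happens before anything is persisted to disk.
    """
    now = now if now is not None else time()
    own = clickstream if channel == "click" else crm_events
    other = crm_events if channel == "click" else clickstream
    own[customer_id].append({"ts": now, "payload": payload})
    _expire(own[customer_id], now)
    _expire(other[customer_id], now)
    # Pair this event with every still-fresh event from the other channel
    return [(payload, e["payload"]) for e in other[customer_id]]
```

So a "viewed-fare" click followed a minute later by an "agent-call" CRM event for the same customer would be paired up, while events more than five minutes apart would not. A real streaming platform would add sharding, fault tolerance and back-pressure on top of this idea.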

Eric: That's very interesting stuff, and I think it speaks to just how significant the transformation we're going through right now is in terms of software. You and I have talked about the changing nature of applications as they get rolled out now. On one side of the pendulum swing – I always use Microsoft Word as my whipping boy – it started off as a word processor, and then they added so much other stuff into it, including tables and storing numerical data, and of course doing layout and then publishing to the web. It becomes this all-things-to-all-people sort of monstrosity that is very unwieldy and hard to deal with.

One of the things I joke about is if you want to see what I’m talking about, just open up a Microsoft Word document, the latest version you have, type “hello world” and then save as HTML. Then open that HTML document in a text editor and it will be about five pages long, for two words basically. It speaks to throwing so much time and effort at a problem that doesn’t require that much horsepower.

Now you're starting to see a lot more of these apps, certainly on phones and on devices like your iPad, that are much more targeted to specific business use cases. I see WebAction as building a platform to enable companies to build these very specific, very precise and very agile little applications that are going to generate significant business value for them. It's a different mechanism for getting something done. That's how I read it, but what do you think?

Sami: I love the Microsoft Word analogy, and if you really want to get daunted, try it in Excel. In reality, most enterprise applications have gone through that phase, because the economics of enterprise applications have forced the enterprises as well as the ISVs to build these very large, sophisticated applications. Now, why is that? Because the time to develop and deploy one of these applications could be six months to multiple years. You don't deploy an ERP system in two weeks. It takes a lot of time and energy, which means you are going to make a significant investment.

If I'm going to make a significant investment, I have to service a larger number of users; otherwise, I'm not going to get my return on the investment. I mean, if I spend six months of two full-time employees' effort in order to service five people in one department, that never gets approved from a financial perspective. What that leads to is building applications that require a larger number of users for the return on investment, which means that I have to service different profiles of people with different needs. That creates this growth in features and functionality that are used only by a few, here and there. Then of course usability becomes harder and more complex and requires a lot of training. You need to train the operators and end users as well as the people who maintain these applications, ultimately leading to a return on investment that usually takes years.

Now, we have kind of bypassed that in the consumer application space. The laptop that I use doesn't really have half the applications that I used to have. I used to go to these stores and see all this packaged software. It was kind of exciting to unbox the software and take out the floppies and the CDs and DVDs. We would put them in a DVD drive or CD drive and load them on our computers. We don't have that any more. As a matter of fact, I just recently realized that the laptop I have been using for over a year doesn't have a DVD drive. I've never attached one to it, because it's a different age, right?

So consumer apps have gone through that evolution, and it's been rather seamless because the underlying platform, the underlying infrastructure, the operating system as well as the back-end application stores are there. The iTunes store or Google Play create such a seamless experience that the consumer's experience is a few clicks for a two-dollar app that provides entertainment or fixes my problem.

Granted, there are some sophisticated apps out there like Microsoft Office and Adobe, but the majority are small surgical apps. There is no such thing as reading a manual to figure out how to use them. As a matter of fact, the majority of the applications running on my cell phone my five-year-old could competently use in a matter of minutes. That's the experience that enterprises need to get to, because computing and processing data has become a utility. We need to move toward these data-driven apps because we already have the interactions, not just the transactions, and because we already know what the systems are doing and what our customers are doing.

Then, once the data is running through a centralized core platform infrastructure, we need to be able to build these data-driven apps quickly, in a matter of days or maybe weeks – apps that are viable for a few users, rather than needing a hundred users to make them worth it. What that means is that the apps must be purpose-specific and address a well-defined need. I call them surgical applications, or surgical apps. That reduces the complexity, so usability and training requirements shrink, giving you time-to-value in a much shorter time frame than the traditional ERP or CRM type applications you get in traditional infrastructures.

Eric: Yes indeed. You know, one of the really interesting things that I see happening in the marketplace – and this is just kind of how I view what's going on out there – is that the Hadoop movement is giving data management a second chance. We did a lot of good things in the past, and we did some things that maybe weren't so intelligent in terms of using data and thinking about where we would store data. With this whole concept of the information architecture, it seems like there are just a few people who really pay a lot of heed to it; most people just ignore it. I see this whole Hadoop movement as at least giving us all another chance to rethink how we're doing things – to rethink information architecture and application architecture – and I see WebAction as one example of a company that sees this window opening. It's still open, but I'm guessing at a certain point it's going to close again, at least metaphorically. What do you think about the theory that Hadoop has kind of given this whole industry a second chance?

Sami: That's definitely an interesting perspective, and I think what Hadoop has done is allow us to capture and store a lot of data – amounts of data that the infrastructure just could not support in the past. In the first phase of it, people would just collect all the data they could and dump it onto HDFS so that at some point, when they hired enough data scientists, they could make sense out of it – you know, figure out what that data means. Some of it was driven by compliance requirements, so that if someone comes back and asks a question, you can go back to your file on Hadoop six months later and reconstruct what went down back then. Others decided they didn't know what that data meant yet, but captured it and put it somewhere so that they could start improving the customer experience, making more sense out of it, and getting some value out of that information.

Now, this is very different from what we used to do with the data warehouse. Remember, the data warehouse had to be subject-oriented. You had to know the question you were going to ask of the data warehouse before you built it. With Hadoop, you put the data on top of this cheap disk, and later, when you know what you need to do, you apply processing on top of it. I think this mindset shift is driving a lot of new-found value, but also a little bit of confusion: now that I have the data, how can I turn that data into information or applications or apps or what not? Because I know a lot more – I have a lot more data than I ever used to have.

With the data warehouse, the challenge was: how do I change dimensions or add a new project? Because you knew the questions you were going to answer beforehand. Now, with Hadoop, you have the data but you don't have the questions, and the questions arise rapidly. When that happens, you need to build new applications or new ways of making sense of that data. That is creating a bit of a challenge for the enterprise, partially because they can't get enough Java experts to write the MapReduce jobs, and they can't get enough data scientists to make sense of that data.
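The contrast Sami draws is often called schema-on-write (the warehouse: structure is enforced before data lands) versus schema-on-read (Hadoop: store raw, apply structure when a question arises). A tiny sketch of the schema-on-read side, with an in-memory list standing in for files on HDFS and a made-up clickstream question:

```python
import json

raw_store = []  # stands in for raw files dumped onto HDFS

def ingest(line):
    """Schema-on-write would validate here; schema-on-read just keeps the bytes."""
    raw_store.append(line)

def ask_later(parse):
    """Apply a parser (the 'schema') at read time, skipping records it can't handle."""
    results = []
    for line in raw_store:
        try:
            results.append(parse(json.loads(line)))
        except (ValueError, KeyError):
            continue  # unparseable records stay in the raw store for future questions
    return results

ingest('{"user": "a", "page": "/home"}')
ingest('{"user": "b", "sensor": 21.5}')  # a differently shaped record is still accepted
ingest('not json at all')                # even this is kept, in case it matters later

# Months later, a new question arrives: which pages were visited?
pages = ask_later(lambda rec: rec["page"])  # -> ["/home"]
```

The point of the sketch is the ordering: nothing was rejected at write time, so the sensor record and even the malformed line survive for whatever question comes next, which is exactly why the questions can "arise rapidly" after the fact.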

One of the things that we are really pushing for – now that we have the data and the means to get that data into a centralized place – is to give you a platform that allows you to build these data-driven apps, rich apps that allow you to make sense out of it. It's almost like an app server: back in the 1990s we had the web application servers – WebLogic, NetDynamics, WebSphere. It's almost like history is repeating itself, this time on top of the data, on top of the streaming data structures. If you are going to build applications, you need a platform that acts like an app server on top, so that you can build these applications very quickly.

Eric: Let's close with advice on how people can wrap their heads around what WebAction offers. I'm guessing that when you talk to clients, there's something of an educational component to that. Once they understand what you're enabling, I'm sure it's kind of off to the races in terms of figuring out what the low-hanging fruit is and where to go with all that. Maybe you can just give us a couple of examples of interesting use cases where companies have been able to use your platform, get something up and running very quickly, and make a difference.

Sami: One of the things I notice is that people are trying to get data into their Hadoop/Big Data infrastructures with the lowest latency possible. When they try to do this, they use a collection of open-source technologies and some commercial technologies such as in-memory grids and memory caches, things like that. When they start cobbling together multiple solutions into a single architecture diagram and have five or six different pieces that allow them to reduce latency and start building applications on top of the big data that's flowing into the enterprise, that's when they should probably start thinking about a platform like the WebAction Real-Time Application Platform, because the number of moving parts in an architecture that you put together from all these different pieces can become a hassle a few months or a few years down the road.

One of the challenges that many of our prospective customers are facing is that with the first application, the second, the third, they were pretty happy with what they were putting together. As they add more and more applications, the complexity starts to become rather challenging. This is similar to back in the Web days, the first coming of the Web, Web 1.0, when we used to write scripts in Perl, or C or C++ in some cases.

Back then, you knew you could handle five visitors to your website in an hour. It became a problem when you had five hits per second. Then you had to write your own database connectivity, session pooling and runtime management. Before you knew it, two years down the road you had a really complex and hard-to-maintain infrastructure.

Then came the application servers, and we started standardizing and automating some of the things that we repeatedly did underneath. We are entering a very similar phase in managing the data flow going through our infrastructures. For those people who think the complexity is getting pretty substantial: don't think that you're alone. Companies everywhere are dealing with streaming data – that's happening. We try to take the pain out of your day-to-day menial tasks, not to give you something that you could not do yourself with enough resources and engineering talent at hand. WebAction takes away 80-90% of the work you would otherwise repeat for each app, so you can focus on the business logic and the things that truly add value to your enterprise.

Eric: It makes a lot of sense. You really are enabling this new sort of proving ground for data-driven apps. It seems to me that innovation has to occur somewhere, so what forward-thinking companies do is figure out how to create environments that are conducive to that kind of innovation. When you consider the amount of inertia that actually exists in these traditional enterprise applications, at a certain point we kind of have to go around that. You have to find some other way to be a bit more nimble and agile. I see WebAction as a straw in the wind for new ways to open up the world of big data, and small data, and mix these things together, because it's going to be that union, that nexus, that generates the real value. I see you guys as being at the forefront of this new kind of application.

Sami: Yes, these are really exciting times. For many years, data was kind of boring, but now in every conversation I have with our customers or people in the industry, I'm finding that people are proud to talk about their new data projects, which just gets them more and more excited. These are great times if you are a data professional, a database professional, a Hadoop professional or anyone who handles big data – so yes, this is a time of big change, it's happening out there, so love it.

Eric: We've been talking to Sami Akbay of WebAction. This is very, very interesting stuff. Hop online to webaction.com or insideanalysis.com and check it out. Well, Sami, thanks for your time today.

Sami: Thank you.

Eric: Okay folks, thanks again for another episode of Inside Analysis. We'll talk to you next time, take care. Bye.
