Bloor Group CEO Eric Kavanagh chatted with ParStream CEO Peter Jensen about a wide range of subjects from the exponential increase of data as a result of the Internet of Things, to the need for sub-second response times, the ability to handle constant data streams from a wide variety of devices and the choice to do analytics at the edge. If you think data size is only going to increase and data latencies are going to fall, you’ll want to read this insightful conversation.
Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to Inside Analysis, the show where we talk to industry leaders and try to figure out what is going on out there in the bigger and better world of data management. Today, we’re talking about big data. It’s a hot topic for all kinds of reasons. One of the major sources of big data is this entity called the Internet of Things. We’re going to be speaking with a thought leader in that space, Peter Jensen, CEO of ParStream, which is one of the faster moving companies out there in the world of analytics today. Peter, welcome to the show.
Peter Jensen: Thank you.
Eric: Why don’t you start off for those who don’t know ParStream and give us a quick overview of what you have built and why you built it.
Peter: What we have built is a platform that allows our customers to build applications so that they can analyze the vast amount of data that arrives from sensors and devices out on the Internet. So in short, it’s analytics for big Internet of Things (IOT) data. The reason we really developed this product and this company is that, as we’ve seen over many years, there is value in data when you capture it and analyze it. Of course, as we all know, the amount of data that is being generated from these sensors and devices is going to be huge. Therefore, there’s a lot of value to be had by everyone.
Eric: Sure thing. Obviously the Internet of Things refers to all of these devices, many of which have very specific kinds of sensors built into them. It’s in every industry you can imagine these days – it was in manufacturing quite some time ago, but now you’ve got all these smartphones, which is a huge source I’m quite sure, and all the different sensors on other kinds of enterprise-caliber machinery. What did you do specifically to cater to this kind of market? In other words, what’s unique about your architecture that allows you to leverage all of that stuff very, very quickly?
Peter: You’re totally right. Let me take this in two steps. One, people have been doing IOT applications for at least the past 15 years within the B2B space. What most people think about when they think about the IOT space is the consumer side, with the Nest devices and so on. But where we see the biggest potential right now is B2B. Yes, there are some interesting consumer plays and there’s huge potential in analyzing that data. But right now, we’re focusing on the industrial Internet and some of those specific requirements. The challenge is that the requirements for analytics are a little bit different because the use cases are different. If you look at the needs for IOT analytics, one, you need a platform that can store a lot of data.
When we talk about a lot of data, we say billions of rows, terabytes of data, and so on. That’s one requirement. But the second is customers and use cases that require sub-second response time. In the old days with some use cases, for example, in retail, it may be okay to come in Friday morning and have a report that shows the sales data from the previous day, which is a 12- to 24-hour delay. Well, if you’re doing analytics of a wind turbine to optimize electricity production, you need to be able to analyze the data within seconds or in real time and adjust the wind blades accordingly. You need to be able to store large amounts of data. You need to provide sub-second response time. Then the third requirement is that you need to be able to handle the constant flow and stream of data from the devices.
Now you have the combination of large data, you have users who want sub-second response time, and you need to be able to import all that data without affecting performance; that’s the third requirement. The last one, which we see more and more, is what we call edge analytics.
People want to analyze data at the source of the data. It’s simply not economically feasible or technically possible to move all the data into a central place to analyze it. Take, for example, an oil rig. If you have 20 oil rigs in the Mexican Gulf connected by a satellite link, it just is not technically possible to move all the data to do adequate analysis. Or, if you take a cell phone tower, it’s not economically feasible to move all those billions of rows from all these cell phone towers back and analyze them.
Instead, what the telcos want to do is have a central view, but a decentralized analysis, and really have a database in every single cell phone tower. Then instead, sell the bandwidth that they save back to their customers.
When we say IOT analytics, it’s a combination of these four factors, and that’s what we have done. If you ask, and you did, how do we differentiate or what is unique about our platform, it is that we have set ourselves a goal that we want to achieve these four objectives: store lots of data, ingest it fast, deliver sub-second response time and analyze either centrally or out at the edge.
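To make the edge-analytics point concrete: the bandwidth argument is that an edge node can pre-aggregate raw sensor rows locally and ship only compact summaries upstream. The sketch below is purely illustrative (it is not ParStream’s actual implementation; the function and field names are hypothetical), showing how a million raw readings can collapse into a handful of summary rows before anything crosses the uplink.

```python
from collections import defaultdict

def aggregate_locally(readings):
    """Pre-aggregate raw (sensor_id, value) readings at an edge node.

    Produces one summary row per sensor (count, sum, min, max) so that
    only summaries, not raw readings, travel over the uplink.
    """
    summary = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                   "min": float("inf"), "max": float("-inf")})
    for sensor_id, value in readings:
        s = summary[sensor_id]
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    return dict(summary)

# One million raw readings from two sensors collapse to two summary rows.
raw = [("temp", 20.0 + (i % 10)) for i in range(500_000)] + \
      [("rpm", 1500.0 + (i % 100)) for i in range(500_000)]
summaries = aggregate_locally(raw)
print(len(raw), "raw rows ->", len(summaries), "summary rows")
```

A real edge database would of course keep the raw rows queryable locally as well, which is the "central view, decentralized analysis" model described above.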
Eric: That’s really sharp. I think you just spoke to one of the under-sung aspects of high-quality enterprise software, which is the focus on the kernel of the technology. What I mean is that some of the older technologies, although they may be very robust and very good at what they do, simply were not designed to handle the throughput capabilities necessary for IOT analytics. I’m guessing that’s why you saw an opportunity and you decided to focus on those four things you just described as the kernel of your platform. Is that right?
Peter: Yes, it’s totally right. In fact, we had to do some trade-offs, too. You always have to do trade-offs when you develop technology. I used to work for one of the big database companies. The typical relational databases that are in use out there were designed 30 years ago when the relational model came out. There was no Internet and there definitely was no IOT. Databases were designed as general-purpose databases that could do transactions to a certain extent, and they could do some analytics or at least they could do some data warehousing as long as the data amounts didn’t become too large. They were just not designed for the needs that I just covered. That is why we see so many other new data models or databases and database products and companies out there.
Another decision we made was we wanted to have 100% full control over all our code. We were looking at basing ourselves on Open Source out there, but really these Open Source codes were not designed to address the four needs that I just talked about. We built everything from scratch and that has allowed us to create a very tight database, if you will, with super low footprint. That is what allows us to deploy the databases out in the network.
Eric: Sure, that makes a lot of sense. I’m guessing what you’ve got is a sort of tiered structure of how the application works, right? Because again, you’ve got these edge analytics taking place and that requires a very thoughtful approach to the architecture of the product itself and how it operates, right?
Peter: Yes, exactly, I mean it’s not trivial to do that. It’s not trivial to have thousands of databases in cell phone towers importing data and making all that data available for global queries. Yes, we had to design it for that specific purpose.
Eric: Let’s talk about some of the challenges facing this world of IOT, because as you say, there are windows of opportunity. If you miss the window, then you miss the value. You’re talking about customers who want, in some cases, sub-second response times. It seems to me that you have to have all of these nodes that are out in the cell towers to go with this example. They all have to be capable of not only ingesting the data, but also running analysis on that data, and those analytics have to be designed by someone. So how do the people who actually implement and run your software figure out ways to tell those sensors, those nodes basically, what to look for and what to report, what’s a higher priority, what’s a lower priority?
Peter: We provide the platform. The center of our platform is our columnar database on which we have provided and built additional functionalities, specifically to run IOT analytics on. But what we do require are some of the data platform engines out there. You can look at Axeda and all of those companies. They really are the ones who are doing all of the interaction with the actual physical devices. What they do is stream their data directly into our platform. What the customers then do is build applications on top of our platform. They either build their own applications, custom made, or they use some of the tools out there. Datawatch is one that’s very strong within IOT. They can use products like Qlik or Tableau or stuff like that.
Eric: That’s great. Basically, they take a combination of tools and they leverage your platform to be able to see what’s happening in their environment. Let’s say again it’s a cell phone company or some company associated with cell phone usage. They use this combination of tools to recognize when, for example, there’s a gathering that was unexpected in some area and lots of people show up, and those cell towers just get hammered. A company using your software with these partners could figure out very quickly what’s happening, could address that to reroute traffic and do other things to be able to handle this new source of input. Is that right?
Peter: Yes. In this specific example, what this customer wants to know is very simple, and it’s very real for most of us. They want to answer a very simple question: What is the main reason for dropped calls at this moment in my network? There you have actual people, and so if an application has been built, they can query it in California or wherever they want, figure it out for a certain geographical region, and then act upon that.
Another example is a more automated approach. One of our customers is a manufacturer of wind turbines. They have built an automated engine that constantly looks at the data coming in from the sensors, including wind and so on, and then automatically, without human intervention of course, adjusts the blades and the direction of the wind turbines. Sometimes it’s people that are involved, and sometimes it’s 100% automated.
Eric: There’s really no limit in terms of the application of the technology, but what the customer really should have is a distributed environment, a need to stay on top of what’s happening in near real time and an ability to design applications that are going to affect this environment, right?
Peter: That is true.
Eric: That’s interesting. This is good stuff. Let’s talk about how you deal with edge analytics and what kinds of things you look for. You just gave me a really good example right there in the mobile world. What would be an example in this renewable energy field? What would happen if a storm comes in, or what are some use cases that would be very well suited for what you’re providing?
Peter: As I mentioned, we have this specific customer in the wind turbine scenario. When you drive through the landscapes out there, you see pools of wind turbines. Let’s say at one hill you may have 10 different wind turbines and then you have a point there, a box really, and you have a database in there that constantly collects all the data. Maybe two miles up the road, you’ll have another pool of 10 or 20 wind turbines. But this company is trying to do two things. They’re trying to constantly optimize and get more kilowatt hours out of the wind turbine. Then the other thing is that they want to use the data for predictive maintenance.
They’re really trying to constantly predict which part is going to break and then go out and service it on a day when there’s no wind, as opposed to when it breaks on the windiest day. What they’re saying is they expect to be able to generate 15% more power per wind turbine by analyzing the data at a very granular level. They have been analyzing data before, but they were not able to analyze it to the level they wanted, because the current platform simply couldn’t handle the four needs that I mentioned in the beginning. Also, I do want to mention this whole thing about edge computing. Not every single use case needs that. There’s nothing wrong in shipping data back to a central location and then analyzing it.
It just doesn’t work for all use cases. We’ve been discussing a few of those where it actually does not make sense. But it’s going to be a combination in the future, and we believe that smart customers are going to choose platforms that give them both options.
Eric: Right, and one of the things you talk about on your Web site is time series analytics. It seems to me this is a really wonderful area for being able to understand patterns that matter to a particular company. One of the challenges is always trying to figure out what are the markers in time where you want to analyze something. Is it every minute, is it every hour, is it every day, is it once a week? What are the parameters for that analysis? What’s your take on how organizations can optimize their ability to figure out exactly where to put those markers and where the value is in the technologies that they have?
Peter: First, as we talked about earlier, having the capability to do time series analytics and special functionality is very important. That’s why we have built these modules on top of our database. Whether you want to collect data by the minute or by the second is dependent on the use case. In general, most users want to have the data at a super-granular level. They’ve been told in the past, “Well, you can’t have that, because if you want to store data every single second and we have a million sensors, the amounts become so big that if you also want a decent response time when we issue a query, it’s just not possible.” But now it is.
Our advice is, of course you have to think about what makes sense, but in general if you think that collecting data every minute makes sense, you should probably consider what it would mean to you if you had the data at a much finer granularity, i.e., every second. Would it really make a difference? Now as I said, it depends on the use case. It’s hard for me to say specifically what makes sense in general.
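The granularity trade-off Peter describes can be sketched in a few lines. This is a hypothetical illustration (not ParStream code): per-second readings are rolled up into per-minute averages, the kind of downsampling you are forced into when the platform cannot store the raw data, and exactly what you lose if you ever need finer-grained answers.

```python
def downsample(points, bucket_seconds=60):
    """Roll up (timestamp_seconds, value) points into coarser buckets.

    Keeps only the per-bucket average; the raw per-second points are
    discarded, which is the trade-off discussed above.
    """
    buckets = {}
    for ts, value in points:
        key = ts - (ts % bucket_seconds)   # start of the bucket
        total, count = buckets.get(key, (0.0, 0))
        buckets[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sorted(buckets.items())}

# 180 per-second readings collapse to 3 per-minute averages.
per_second = [(t, float(t % 60)) for t in range(180)]
per_minute = downsample(per_second)
print(per_minute)   # {0: 29.5, 60: 29.5, 120: 29.5}
```

Note how any spike inside a minute vanishes into the average; a platform that can keep the per-second rows lets you skip this step entirely.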
Eric: No, that makes complete sense. Let me throw this one at you. You cater to a number of different industries. We’ve talked about a couple of them already. What about supply chain? It seems to me that very large organizations, especially those who are heavily reliant on supplies being there on time (I’m thinking about car companies, any kind of manufacturer that’s dealing with very expensive products) that something of this nature could be extremely valuable to them. Can you talk about what you do for the supply chain world and how that works?
Peter: Yes, definitely. It makes a lot of sense, and this is happening already. You have sensors in containers, shipping containers, on trains, in trucks, and so on. It allows you to do many things, right? It allows you to, of course, predict when the goods are going to be there, which is fairly simple. It allows you to monitor, for example, the temperature in every single shipping container. If you have perishable goods, you can predict if there are going to be problems, how long the goods can stay on the shelves, and so on. There are many, many, many good examples of that. Something that’s related to this, it’s not necessarily logistics but it’s close enough, is the whole telematics space, where companies are putting GPS devices and sensors into cars and trucks and using them for all sorts of purposes.
One that we are specifically involved in is usage-based insurance, and many people have read about insurance companies moving toward that model. Which basically means that if you keep speeding and doing all the bad things, your insurance rate will go up and if not, it’ll go down. That is one area where this technology is being used. It does require these companies to constantly analyze the data. If they can analyze it to a granular level, they can calculate the right premium, they can get the right customers and get rid of the wrong customers. It really makes a difference.
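As a toy illustration of the usage-based insurance idea (purely hypothetical; the event names and weights are invented for this sketch, not any insurer’s actual model): each telematics event nudges a premium multiplier up or down, and the granular event stream is what makes the calculation possible.

```python
def premium_multiplier(events, base=1.0):
    """Toy usage-based-insurance score: each recorded driving event
    adjusts the premium multiplier, clamped to a sane range.
    """
    # Hypothetical per-event adjustments (illustrative values only).
    weights = {
        "speeding": 0.05,        # raises the premium
        "hard_braking": 0.02,
        "smooth_month": -0.03,   # an incident-free month lowers it
    }
    multiplier = base + sum(weights.get(e, 0.0) for e in events)
    return max(0.5, min(2.0, multiplier))   # clamp between 0.5x and 2.0x

print(premium_multiplier(["speeding", "speeding", "smooth_month"]))  # ≈ 1.07
```

A real system would, as Peter says, run continuously over the incoming sensor stream rather than over a small list, but the principle is the same: granular data in, differentiated premium out.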
Eric: You’re reminding me of one of my favorite parts about the Internet of Things; my big sort of half joke is that machines don’t lie. Of course, people can remember things incorrectly, they can give false information, but machines are not going to choose to deceive anyone. I view this Internet of Things as really a whole new green field for better understanding the dynamics of many of these industries. You talk about the car industry and the insurance industry, for example; think of how many cars are out there and how many insurance contracts. We’ve really been playing a guessing game for many years now in trying to understand what the chances are of this person getting in an accident, or what the chances are of something else happening.
Now once you get the actual data and can map that against people, regions and different things like lifestyle, it seems that the picture is going to get much clearer in terms of what the opportunity cost is, for example, of offering a certain plan, or what the likelihood of some kind of event is going to be. It really does change the game for how any kind of insurance company or other manufacturer can predict their required construction costs and so forth. It seems to me this is a new era that’s just now beginning to unfold. What do you think?
Peter: I totally agree. It is very early days with the Internet of Things. I look at this as…you remember this little term called e-commerce from 10, 15 years ago. We could all see the opportunities, yet it took some time to get it implemented everywhere, and there’s still a lot of development there. I think the same thing is going to happen here. We’re talking about people, we’re talking about companies, we’re talking about identifying where to focus and how to do it. As much as it makes sense for everybody, it’s just going to take 10 years to do it, just like it did with e-commerce, but boy, we’ve come a long way.
Because we’ve seen this story play out before, what’s going to be interesting if you are a CEO of a company is that you can choose to embrace this and say, “I’m going to take a lead on this, and I want to use the Internet of Things as a competitive differentiator to get ahead of my competitors.” There are some who are already doing that. I was talking about that wind turbine manufacturer. If they can generate 15% better yield per wind turbine, that matters, right? Or there will be companies that should do it and want to do it, but it’s not driven from top management down, and they will just fall behind. Those are the companies that suddenly will have a competitive disadvantage. What I think is so sad is, why wait five years until you’re behind your competitors? You know you’re going to have to do it sooner or later, anyway. Of course, as the head of one of the vendors, I think people should be doing it sooner.
Eric: Right. I agree. I think you’ve hit the nail on the head here. I’ll throw one last question over to you. Just based upon what I’m hearing, what I know about you already, and what I see on your Web site and from your customers, I’m guessing that you really paid a great deal of attention to, as we talked about at the top of this interview, the kernel itself — that’s your database, this columnar database. You’re not going to be able to predict where things are going to move next, because we are on a fairly long path, it seems to me, before this whole paradigm reaches its fruition, which will happen. I mean, e-commerce, like you said, 10, 15 years ago was pretty nascent, and now it’s much more polished and you have all these different players who partner together to deliver solutions.
The end result is just good news for the customer. I’m guessing that you saw far enough into the future to know that you’re not going to be able to predict which direction things turn at any given moment in time, and so you tried to make sure that the kernel itself was rock solid. Is that right?
Peter: Yes. A couple of things that we’re betting on: we’re betting on the fact that there’s going to be a lot of data out there. In fact, we’re betting that there’s going to be much more than anybody thinks today. For example, I compare this to when I get a new iPhone; it says, do you want 16, or 64 or 128 gigabytes? I remember a couple of years ago, I wanted to save a few hundred bucks and I chose the middle one. Then we all know what happens, right? After a year it says you’re out of space. It’s the same when you buy your backup plan.
Half of our customers today are customers who come to us and say, “Look, we wanted to analyze all of this great IOT data, we standardized on a platform, and we thought we knew how this was going to develop, and it just developed much faster and much bigger than we thought, and we chose the wrong platform, because it can’t keep up.”
So half of our customers replaced whatever they chose a couple of years ago because they didn’t predict the development. That’s what we’re betting on. If you want to have a racecar and you don’t have a good engine, it doesn’t matter what bells and whistles you have around it. That’s what we did. We wanted to build the best engine for this specific purpose and, of course, continue to add functionality on top of it, like the time series analytics we were talking about, but also other things.
Eric: That’s brilliant. Well, folks, we’ve been talking to Peter Jensen, CEO of ParStream. I have to say watch out for these guys, they are onto something, and I think they’re going to be a company to watch going forward. Thank you for your time today.
Peter: Thank you very much.