Inside Analysis

A New Database Paradigm: Speed at Scale

Eric Kavanagh, Bloor Group CEO, chatted with Peter Goldmacher, VP of Strategy for Aerospike, on February 8, 2016. Here is a transcript of the interview.

Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to Inside Analysis. My name is Eric Kavanagh, I will be your host for today’s conversation with Peter Goldmacher. He is VP of Strategy for a company called Aerospike, an open source database company doing some very interesting things. Peter, welcome to the show.

 

Peter Goldmacher: Hey Eric, thanks for having us.

Eric: Sure thing. Why don’t you start us off with just a few quick words about Aerospike, where you see yourselves in the market and what’s going on with you?

Peter: Thanks for the opportunity to talk to you today. Aerospike is an open source, NoSQL database specializing in speed at scale. If you look at the database world today, we’ve transitioned databases from a product to a category and there are many, many databases for application developers to choose from. We don’t consider ourselves a general-purpose database; we are quite specific. We are a key value store and our expertise is speed at scale, and I’m happy to tell you more about it as we go into the conversation.

Eric: Okay, good. The serendipity of the timing here is interesting because as it happens, I’m working on an article right now that’s going into the evolution of database over the last 15-20 years or so and I thought back to interviews I did back in 2005 to 2007 with people who are, without question, visionaries in the field of database. One of those folks was Doctor Michael Stonebraker, I’m sure you know of him. At the time he was promoting Vertica.

It’s interesting because you have mentioned this whole concept of how database is now not a product anymore; it’s really a category. What I find so fascinating is that’s precisely what Doctor Stonebraker was saying back then. His mantra at the time was focused on the one size fits all storyline which he was challenging, and his point was that IBM and Oracle and Microsoft were pushing their standard relational database technology as suitable for any kind of use case.

I recall that he even specifically mentioned, because I asked him why, that part of the problem was that it was so difficult to articulate the need of a different database type to marketing people, to PR people, to the media and then of course, even to prospective customers out there. But I thought that was interesting when I came across Aerospike at the Strata Conference and you talked about this database not being a product anymore but a category. I thought to myself, I remember the earliest days of people making that pitch but what do you think about that?

Peter: Yes, I agree 100%. It takes guys like Mike Stonebraker to have the vision, probably five or 10 years before anyone else, and actually build a product and show people what he means through the products he’s built. When you look back on the history of the database, IBM built IMS in the 1970s, which was a hierarchical database, and it had a 15-year run before Oracle got involved and started to disrupt the database world with the relational database. Then there was another, arguably 25-year run before the traditional relational database world started to get disrupted by some of these new, open source document databases, key-value stores, time-series and graph databases. So the database world, as guys like you and me who are a little bit older historically think about it, has been an IBM and Oracle world. What’s happened to catalyze this change is that the nature of data has changed dramatically.

When you think about IMS and about DB2 and Oracle 7 which was Oracle’s big database, they really kind of did the same thing which was they were addressing specifically business data. Right? This is data about your business. It’s very rigid, very clearly defined, and they were building very concise applications, rigid applications on top of a rigid data set and they didn’t want flexibility and they didn’t want a whole lot of variability because if you want to build an accounts payable app, you know how that works.

If you want to build order entry or manufacturing or bill of materials, you need a very, very tight and highly functional database to support these applications and these workflows. But what you start to see probably in 2005 or 2006, and this is when Mike Stonebraker started to proselytize his vision, is the nature of data changes because the cost of compute starts getting so cheap that we can all walk around with essentially PCs in our pockets. All of a sudden, my ability to create data goes from specific business data (did somebody order something from me, did I pay them, did I deliver it) to browsing the Internet all the time and giving out GPS data, and there’s all this what I’ve heard called data exhaust, all this other data that’s not really about the business but could potentially influence the business. If that’s coming to the fore, the opportunity to capture that data and make sense of that data and analyze that data and then feed that back into a transactional system creates an entirely new opportunity.

You really have a couple of problems when you think about these data sets and the exponential growth they’ve gone through. One is, how am I going to capture it all technically and two is, how am I going to capture it all affordably?

The venture community gave everybody a gift when they said, “We’re going to fund big data companies,” and depending on which data you look at, it looks like the venture community has put about $25 billion into about 300+ big data companies that are all going after different angles of the same problem. They’re all going after: How do we capture this data? How do we leverage this data? How do we make sense of this data? How do we bring this data back into a system that lets us differentiate ourselves from our competitors and become a better business?

We look at the Googles of the world and the LinkedIns of the world and the Facebooks of the world and these guys really are the big data companies. Companies like Aerospike, we’re only enablers. It’s the application guys that use the infrastructure technology like Aerospike that are really going to create the next big thing. We’re the pick and shovel guys in the gold rush.

Eric: I’m glad you brought up this whole concept of competition because I’m a big believer in all these different clichés over the years, one of which being: competition breeds excellence. When you think about the amount of competition in the database space these days, that tells you how much activity is going on and let’s face it, even though venture capital does come in to support these ideas, venture capitalists are in business to make money. They’re placing bets in areas that they think are going to win.

Obviously, they don’t all win but that’s what they’re thinking and what this tells me is that there is a tremendous variety of needs for using data these days which is why, to your point, database is now becoming a category. As I think about other metaphors, I think about how pressure turns coal into diamonds. That’s what is happening right now with some of the leaders in the database space meaning there is so much pressure, there is so much data just bearing down on companies and organizations, and these needs that companies have to make use of that data are what’s driving the innovation creating the kinds of architectures that you at Aerospike use to fulfill very specific business needs which as you say is that speed at scale, right?

Peter: Yes. Let me give you my perspective on how the landscape for databases is shaking out. There are two opportunities. One is cost avoidance and one is creating new opportunities. In the cost avoidance category, there are a lot of pretty well-funded, pretty well-known databases in the market that are NoSQL databases that do a lot of the things that people have historically done in relational databases but they do them a little bit better and they do them a little more cheaply.

I look at Mongo, MongoDB, which is a fantastic document database and what you can do in MongoDB is so much more robust and useful and lower cost than the way you would have had to do it in an Oracle database. Mongo’s got a great opportunity. It’s always hard to replace a database but for new opportunities where you might have thought of using Oracle and you can now use Mongo, it’s great. These are things like product catalogs or content management systems or just fairly simple websites where Mongo’s calling card is ease of use. They are really capitalizing on a very large opportunity and that’s not our market. Our market is a little bit different. We see our market as helping very forward-thinking companies build applications that have never been built before. Our early success in AdTech really makes this point.

If you think about AdTech, the value in the AdTech world is for these application providers to write applications that make sure that they’re serving up the most relevant banner ads to the right people at the right time. If you understand the dynamics of that market, what you have to understand is they’ve got a gigantic profile store of a couple hundred million Internet users. They know Eric Kavanagh is a 30-something male who lives in New Orleans and who likes these things, and they know this because they’ve collected your cookies over the past 10 years and they’ve built a profile of you.

That profile might change a little bit day to day but it doesn’t really change very much. What changes day to day is your activity on the Internet and what things is Eric interested in today. Our AdTech customers have built applications that say: “I have written an algorithm that is incredibly sophisticated, and the life blood of that algorithm is the data I’m collecting every second of every day that gives me Eric’s profile and also Eric’s activities day to day.” We are an integral part of that so the way it works simply is these AdTech guys load your profile and 300 million of your closest friends’ profiles into Aerospike every morning.

They load Aerospike with here’s Eric, here’s his profile, here’s what we know about him and then over the course of the day, they collect web traffic from everybody. If you go online at the beginning of the day and you show up, they say, “We know Eric lives in New Orleans, it’s going to be raining in New Orleans. Serve him an ad for an umbrella or something simple.” But over the course of the day, if you go on the Adidas website or the Nike website and the Puma website, they know, “We know Eric through his profile enjoys exercise. He’s showing us he’s shopping for something. What if we serve him an ad for a discount at Sports Authority or Modell’s?”
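As a rough illustration of the flow Peter describes (a morning bulk load of profiles, intraday writes of browsing activity, and a single-key read at ad-serving time), here is a minimal Python sketch. It is not Aerospike's API: a plain in-memory dict stands in for the key-value store, and every name in it (`ProfileStore`, `choose_ad`, `SPORTS_SITES`, the ad labels) is a hypothetical chosen for illustration.

```python
from collections import defaultdict

# Hypothetical list of sites that signal interest in sporting goods.
SPORTS_SITES = {"adidas.com", "nike.com", "puma.com"}


class ProfileStore:
    """Toy in-memory key-value store keyed by user ID (stand-in only)."""

    def __init__(self):
        self._profiles = {}                 # user_id -> long-lived profile
        self._activity = defaultdict(list)  # user_id -> today's site visits

    def load_profiles(self, profiles):
        """Morning bulk load: replace profiles and clear yesterday's activity."""
        self._profiles = dict(profiles)
        self._activity.clear()

    def record_visit(self, user_id, site):
        """Intraday write: append one observed site visit."""
        self._activity[user_id].append(site)

    def get(self, user_id):
        """Single-key read: return (profile, today's visits) for one user."""
        profile = self._profiles.get(user_id, {})
        visits = list(self._activity.get(user_id, []))
        return profile, visits


def choose_ad(store, user_id, raining_in):
    """Pick an ad from the stored profile plus today's activity."""
    profile, visits = store.get(user_id)
    # Today's behavior outranks the static profile when both apply.
    if "exercise" in profile.get("interests", ()) and SPORTS_SITES & set(visits):
        return "sporting-goods discount"
    if profile.get("city") in raining_in:
        return "umbrella"
    return "generic"
```

With a profile loaded for a user in a rainy city, `choose_ad` returns the umbrella ad first thing in the morning and switches to the sporting-goods ad once a visit to one of the listed sites has been recorded. In a production system each `get` and `record_visit` would be a network round trip to the database, which is where read and write latency starts to matter.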

All of a sudden, the value that that AdTech provider is giving to their customers is enormous because they can say, “I know who Eric is. I know what he’s interested in today and if you serve him this banner ad, the probability that he clicks through and does a deal is good,” and that’s how they differentiate themselves from the competition. That’s a new, new thing. That business and that business model didn’t exist five or 10 years ago because the technology didn’t exist to support it, so now we think about databases not as innovation inhibitors but as innovation enablers because if you can think of it, you can actually execute it because the core infrastructure technology is there.

Eric: You made a bunch of really good points there, and I’ll throw out this one too. It seems to me with a technology such as yours, these companies that are born in the data world, that are truly data driven, like these AdTech companies, they simply would not have been able to build their solutions on some of this data technology. They need a very purpose-built system to underpin the speed and scale and demands of velocity of data that they deal with, right?

Peter: Yes, exactly. Let me share another observation and guiding light at Aerospike. Any company that’s going to build a product, as they spec out the product, whether explicitly or implicitly, makes a decision: are we going for product depth or are we going for product breadth? Do we want to attract a lot of buyers with a general-purpose product or do we want to go after a specific market? We’ve chosen depth over breadth. We only do speed at scale. If your problem is not speed at scale, then we’re probably not a good fit for you.

That is working really, really well for us because when we go into sales engagements and we talk to our customers about their problem and they say, “I’ve got this problem,” as long as it’s speed at scale, we can demonstrate to them exactly what we do and why no one else can do it. We help application developers cut through a lot of the noise and that’s proving to be very productive for us.

Eric: There’s a concept that I’ll throw in here, which of course is near and dear to the hearts of developers everywhere which is design point. If you try to build something in the software world, that design point is the epicenter of the functionality that you are trying to create. Now that we live in this world of big data where we have so much information that we can leverage, as you said data exhaust earlier in the call, forward-thinking companies can really start to wrap their heads around how to use all this new information. If your design point, as you suggest for Aerospike, is speed at scale that means you are enabling the kinds of organizations that are looking to leverage this vast reservoir of multivariant data types to do things like you suggested for the AdTech business but not just AdTech.

There are other spaces where that works. I’m thinking off the top of my head certainly financial services and telco. Obviously there are use cases across the board but you guys have had some success in financial services and telco I believe too, right?

Peter: Yes, we have. They’re our two biggest verticals where we’re focusing the most. Let me give you a couple of customer use stories because I think they help illustrate the point. We’ve got two big financial services customers and unfortunately we don’t have permission to use their names. I wish we could because I know you’d be impressed. We have two pretty different use cases but they’re both speed at scale. One is a fraud use case and one is almost a, if I could just be as simple as possible, almost a cache replacement use case.

Let me give you the fraud use case first because I think it’s most interesting. When an electronic payments customer is scoring fraud, the biggest problem they’re solving for is a time horizon. They have 750 milliseconds, which is about three quarters of a second, to determine whether or not the electronic payment is fraudulent and if it is, they deny it and if it isn’t, they let it through. One of the biggest problems in fraud is false positives, which is denying a valid claim. It’s a problem because it upsets your customers, they get annoyed and it’s also keeping good money out of the pockets of the electronic payment provider.

The reason we get false positives is because the fraud algorithms that have historically been written on traditional databases can’t really take advantage of all the data that exists because the databases are too slow. Historical fraud algorithms ask pretty simple questions: Who is this person? Are they current on their payments? Are they over their credit limit? Is the amount under $50? Let it through. That’s not really enough diligence on every charge to know whether or not that’s a valid charge.

Our customer, who is a very forward-thinking electronic payments customer, said, “Look, we want to write better algorithms and we want to feed those algorithms with more data. We want to include questions in our algorithm such as: Is this charge with a vendor we know and trust? Is this charge coming from a device or an IP address that we know and trust? Is there something in this person’s charge history that would indicate that this charge is valid?” Very, very simply, if I bought a plane ticket to London and I booked a hotel in London and I landed at Heathrow and I try and buy a cup of coffee from Coffee Republic, you should know that that’s me and you should let that charge go through.

The opportunity is to write a fraud algorithm that enables these electronic payment providers to ask a lot more questions that can only be supported by a database that can ingest a lot more data and run transactions on that database much, much faster than you can in a traditional database paradigm. That’s a great use case for us, that’s where speed at scale really matters. We can collect a ton more data and you can write an algorithm that just hits that database, reads and writes much more quickly than others.
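To make the time-budget idea concrete, the following Python sketch runs a list of fraud checks against a single charge until either the checks are exhausted or the 750 millisecond window quoted in the interview is spent. Everything beyond that figure is an assumption made for illustration: the check functions, the field names on the `charge` dict, the risk points and `RISK_THRESHOLD` are hypothetical, and a real scorer would back each check with database reads instead of in-memory fields.

```python
import time

BUDGET_SECONDS = 0.750  # decision window quoted in the interview
RISK_THRESHOLD = 2      # hypothetical: deny at or above this many risk points


def over_limit(charge):
    """Traditional check: is the amount above the credit limit?"""
    return 2 if charge["amount"] > charge["credit_limit"] else 0


def unknown_vendor(charge):
    """Richer check: have we seen and trusted this vendor before?"""
    return 0 if charge["vendor"] in charge["trusted_vendors"] else 1


def unknown_device(charge):
    """Richer check: is the charge coming from a trusted device?"""
    return 0 if charge["device"] in charge["trusted_devices"] else 1


def unexplained_location(charge):
    """Richer check: does recent history (e.g. a flight) explain the city?"""
    explained = {charge["home_city"], *charge["recent_travel"]}
    return 0 if charge["city"] in explained else 2


CHECKS = [over_limit, unknown_vendor, unknown_device, unexplained_location]


def score_charge(charge, checks=CHECKS, budget=BUDGET_SECONDS, clock=time.monotonic):
    """Run checks until done or out of time; return (approved, checks_run)."""
    deadline = clock() + budget
    risk = 0
    ran = 0
    for check in checks:
        if clock() >= deadline:
            break  # budget spent: decide on the evidence gathered so far
        risk += check(charge)
        ran += 1
    return risk < RISK_THRESHOLD, ran
```

In this sketch, a coffee purchase in London from an unfamiliar vendor is approved when the charge history includes recent travel to London, and denied when it does not. The faster each lookup behind a check returns, the more checks fit inside the same budget, which is the point being made about speed at scale.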

Let me give you a second example and this is another high-profile customer whose name we can’t use yet and I wish we could. This is an online trading platform and the challenge these guys have is one of their big initiatives has been mobile. They’ve built pretty good mobile technology so that anybody with an iPhone or an Android can access their account quickly and easily and they can trade or they can look up stocks or they can check balances, and this has effectively put an enormous load on the back end of this system, so they’ve got a mainframe as its system of record.

Then they had to put a cache in front of that system of record where they were populating that cache. So everything coming over the web or mobile devices was hitting that cache and that cache was going down and it was incredibly expensive. They saw their mobile data growth doubling and tripling over the next few years and the costs were going to double and triple, not to mention the complexity of managing that system. What we’ve essentially done for these guys is eliminated that cache.

That cache is gone and what they’ve done now is they populate Aerospike every day with mainframe data and everything that comes in during the day hits Aerospike. It’s real time and it’s a lot less expensive because one of the elements of Aerospike that we don’t talk about as a primary attribute but which is a really, really important secondary attribute is we run on sometimes as few as half or a tenth as many servers as our competitors do and this is Cassandra or Oracle or pick your favorite database. We have dramatic total cost of ownership advantages.

It’s not just fewer servers but it’s less admin and that doesn’t mean anything if you can’t demonstrate speed at scale but when we talk about speed at scale and it works and it’s better and it’s half the cost or a tenth the cost, people tend to notice that.

Eric: That’s a really good point and here I am thinking of analogies again. The one popping into my head is in the world of audio engineering, they always tell you that the fewer components you have strung together, the better. I know for sure that’s true just in terms of how much resistance you get on the line and how much static you’ll hear on the line and other different artifacts that come in. This is actually one of the reasons why VoIP is not nearly as good in terms of sound quality as old-fashioned copper wire, because you have some of these larger networks at large organizations and if there’s a spike in web traffic, if someone is downloading a large file or something, it’s actually going to impact your phone call.

When you talk about being able to deploy with 10%, 20%, 30% of the servers as some of these other solutions, that’s a seriously big deal for a number of reasons. One of which, and I think this is kind of the dirty little secret of Hadoop in general, is that all that stuff goes over the network. That’s all over TCP/IP and that will in fact slow you down. That’s an overhead, it’s a tax, so when you need speed at scale, which is what you’re talking about, you want to have those simpler environments, right?

Peter: You absolutely do. It’s funny, one of the things we find is we do a pretty brisk business on AWS. We have a lot of engineers that download the product and they want to play around with it and when they’re successful, they almost always bring it on premises because they want that speed. One of the things that we make sure we address early in every sales cycle with every customer is how important is speed to you because if it’s only kind of important, then there are other options for you in the market. If it’s really important, then you really don’t even need to talk to anyone else, we’re the guys.

Eric: It’s pretty funny, reminding me of one of my favorite vendors from days gone by. I spent some time working with a company called Razza. They had a Razza dimension server, which was actually bought by Hyperion and then bought by Oracle. There’s a characteristic of good technology people in the enterprise software space that I’ve noticed and it goes like this: a really good, honest, trustworthy software executive won’t just tell you what they can do, they’ll also tell you what they don’t do.

I used to hear this guy, Doug Goncozzi was his name, I would hear him talking to the major banks, Wells Fargo, Chase, different people, different conferences and they’d say, “Here’s what we do and here’s what we don’t do.” They would come right out and put that on the table and that’s really useful information because you’re telling people, “Look, if you don’t have these needs, you probably should explore some other option.” I think that just engenders trust and really builds your reputation too, right?

Peter: Yes. It’s interesting you bring that up. That’s something we really embrace here. What you see in the market today is with so many databases and so many database companies that are so well funded, PR departments and PR activities are completely hyperactive. You’ve seen a lot of what have historically been kind of bits and bytes transmogrified by PR into these high-level concepts and ideas that if you try to look back and understand what the real feature or benefit is, it’s hard to understand.

You have a lot of confusion out in the market. One of the things we realized pretty early on is, we have an explicit technical advantage, almost an unfair advantage. One of the things we’ve done is we’ve said, “Let’s not be all things to all people. Let’s just hammer on the one thing we do really, really well, speed at scale.” I’m not sure if you’ve been to our website in the last couple of weeks but we’ve published what we call a benchmark manifesto.

Eric: Yes.

Peter: What we’re trying to do is to take benchmarks back from the marketing department and put them where they belong which is in product. We’re trying to shine light on benchmarks and help developers understand what a good benchmark looks like. What elements does a good benchmark need to capture and how should you look at a benchmark? In that process, what we’ve also done is we’ve said, “If you are building an application, your infrastructure decision is one of the most important decisions you’re going to make and we want to help you understand your database choices because we don’t really want to be misappropriated. We don’t want you to build your application on our database if our database isn’t the best match but if you are going to build an application and our database is the best match, you better make sure you build it on Aerospike or else you’re going to be in a lot of trouble later.”

We have taken it upon ourselves with this benchmark manifesto to say, “This is what a good benchmark looks like,” and now we’re going after a number of benchmarks that have been published in the market. Redis just did a benchmark, Cassandra published a benchmark and we’re going after these benchmarks and saying, “Look, here’s where they fall short,” or, “Here’s how we compare to these benchmarks,” and we’re really trying to make sure that we’re spreading information and not misinformation and bringing benchmarks back to what they’re supposed to be I think is going to just be a credit to the entire development community.

Eric: That’s, of course, music to our ears here at the Bloor Group too because the whole idea from our perspective in everything we do is to help bring clarity to the communication that people receive about technologies. If you’re out there in the real world trying to solve some problem, trying to build a solution, you really do need to get that brass tacks information about how things work, what they do, why they work a certain way.

Back to that earlier issue, what the design point was and for folks listening to this podcast, they can check out the webcast (Tectonic Shift: A New Foundation for the Information Economy) we did with Aerospike late last year to really get an understanding for the architecture because that’s really one of the hallmarks I think of the tools and technologies that will be successful going forward, is architecture really matters these days. If you think 20 years ago, with the exception of some really high-velocity environments like banking, most organizations could get by with a database that just did okay.

I think of when you go to the store and someone’s like, “Our computers are running slow,” or you’re on the phone with someone at a call center, “I’m sorry, our computers are running slow.” A lot of times that’s a database issue underneath or it’s a network issue or some combination. These days, especially if you are a cutting-edge organization, good enough is just not going to cut it. If you need that speed at scale like you talked about, if you’re in financial services or if you’re in AdTech or you talked about telco a bit too, we should get into that a bit more as well, if you’re in these spaces then you cannot compromise performance. You have to focus on that, you must get it right.

Peter: Right. A lot of our customers look at performance as a priority but they don’t always immediately start with performance or speed at scale. A lot of times, they’re looking for opportunities to create applications that give them competitive advantage. Let me give you a story about one of our customers, Alcatel-Lucent. Alcatel-Lucent is a telecom services provider. They write a lot of the applications that the carriers like AT&T and Verizon use, such as billing applications and usage applications.

When you get a text from AT&T and it says, “Peter, you’re about to go over your data usage limit. Do you want to buy a gig for $15?” You buy it. What happens occasionally is you had two days left in your billing cycle, you ended up using 50 megs, not a gig and you really overpaid and that makes you annoyed and it makes you unhappy with your carrier and it increases the probability that you churn from that carrier and I leave AT&T and go to Verizon. Why does that happen? Is it that AT&T wants to monetize every transaction? Absolutely not.

It happens because the databases that track billing and the databases that track usage are two different applications. AT&T doesn’t know that my billing cycle is almost over, they just see my data usage spiking. What Alcatel-Lucent did is they said, “Look, let’s rewrite the application. Combine billing and usage into one application on one database and it’s got to be a really, really, really big database because usage data is enormous and it’s also got to be really fast because we’ve got to let people know in the moment when they’re about to go over limits or when they’re about to trigger certain things in their billing cycle that are going to charge them more money.”

Now, what AT&T can do is they can say, “Hey Peter, you have two days left in your billing cycle. We know you use about five gig a month. It looks like you’re going to use about 5.1 gig this month, do you want to buy another two hundred megs for three bucks?” Of course I do because I don’t want to pay the overage fees so I do that transaction. I feel like AT&T is looking out for me, everyone’s happy and if you think about AT&T’s business, I’m paying them $150 a month. They don’t care about the occasional extra $15 from me. What they care about is that I don’t churn. If they can reduce the churn from their install base, that is extremely valuable to them and much more valuable than the occasional $10 or $15 transaction.
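The arithmetic behind that kind of offer is simple once billing-cycle position and usage sit in the same place. The Python sketch below is only an illustration under stated assumptions: it uses a straight-line projection of usage to the end of the cycle, and the function names, the pack sizes and the sample figures are hypothetical, not anything described by Alcatel-Lucent or AT&T.

```python
def project_usage_gb(used_gb, day_of_cycle, cycle_days):
    """Straight-line projection of end-of-cycle usage from usage so far."""
    return used_gb * cycle_days / day_of_cycle


def top_up_offer(used_gb, plan_gb, day_of_cycle, cycle_days,
                 pack_sizes_gb=(0.2, 1.0)):
    """Return the smallest pack (in GB) covering the projected overage.

    Returns None when the customer is projected to stay within the plan.
    Pack sizes are hypothetical; if no pack is large enough, the largest
    one is offered.
    """
    shortfall = project_usage_gb(used_gb, day_of_cycle, cycle_days) - plan_gb
    if shortfall <= 0:
        return None
    for size in sorted(pack_sizes_gb):
        if size >= shortfall:
            return size
    return max(pack_sizes_gb)
```

For example, a customer on a 5 GB plan who has used 4.76 GB by day 28 of a 30-day cycle is projected to finish at roughly 5.1 GB, so the function offers the smaller pack instead of a full gigabyte. A system that sees only the usage spike, without the billing-cycle position, has no way to make that distinction.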

What Alcatel-Lucent did is they said, “Look, let’s do something that we’ve done historically. Let’s reimagine it, let’s do it a lot better and let’s really provide value to our customers,” and that’s helping them be a much better provider and add much more value to their customers which is always good for business.

Eric: That’s a really great story. I think it’s a good closing story too because it ties together a lot of the themes that we’ve been talking about. To me, one of the real keys here is maintaining an open mind and an open perspective on how to solve problems because these new purpose-built technologies enable a whole different range of solutions for dealing with some of the historical challenges that we’ve had across industries. The more people can educate themselves about why architecture is important and how these new and hardened technologies, for some of them that have been around for a while, can come together. That’s where the big successes are going to come in the future, right?

Peter: Absolutely. We believe very strongly that there are going to be trillions of dollars in market cap created by big data companies. It’s already happened. LinkedIn is a big data company, Google is big data, Facebook is big data. Every single company, no matter what vertical you’re in, if you make tractors or you drill for oil, it doesn’t matter what you do, if you’re not using data to help you compete, your competitors are and it’s going to be a problem for you. What we want to do is we want to help people that know that when they build their applications, where they need speed at scale, they come to us and they build those applications on top of Aerospike.

Eric: I love it. This is exciting stuff. This is a great time and just as a reminder of how we got to the point we’re at today. Like I said at the beginning of this call, a lot of these technologies have been in play for a long time. The open source movement in many ways dates back to the 1970s if you look at Python or R, the roots of these different languages and it’s just taken off and it’s only going to get better. I have to commend you at Aerospike for really focusing your attention on, as you suggest, speed at scale. Good job.

Folks, we’ve been talking to Peter Goldmacher of Aerospike. Thanks for your time.

Peter: Okay, all right, thanks a lot.

Eric: All right, take care, bye bye.

Peter: Okay, bye bye.
