I recently attended a MapR analyst conference – their first, as it happens. I came away with the confirmed opinion that MapR is very likely to be the winner in the three-way Hadoop technology race that has been in progress for at least five years. I could justify this opinion in different ways, but I’ll mention the one that catches my attention most. About 40% of MapR’s customers began their projects on other distributions. They migrated to MapR because they encountered problems – usually production problems – that other Hadoop distributions couldn’t handle. And just in case you think there’s a kind of square dance going on where Hadoop users regularly change their distros in time with the music, you should also know that 99% of MapR’s customers stay put. No doubt there are reasons for that.
A migration that MapR is only too pleased to discuss is its participation in the Indian biometric project. You may have read about this. The Indian government decided to take biometric readings for every member of its population – roughly 1.2 billion people. The idea is for the biometric reading to serve as an identity card (a remarkably foolproof one), allowing citizens swift access to all the government services they are entitled to. The project was started using the Cloudera distro but migrated to MapR when scalability issues arose. The project is now nearing completion of the data gathering phase, with over 800 million biometric readings already taken. In terms of sheer scale, this easily falls within the definition of Big Data.
If you are wondering what’s so different about the MapR distribution, in my opinion, the biggest differentiator is MapR-FS, MapR’s POSIX-compliant file system. Vanilla HDFS is strangely constrained to an append-only write capability, whereas MapR-FS is a true read-write file system that lets applications update data in place. In some applications that makes a huge difference. If you couple MapR-FS with MapR Streams (which is similar to Kafka, but with some added bells and whistles) you get some obvious advantages. They include superior scaling in very large clusters and, additionally, a certain economy of resource usage.
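To make the append-only point concrete, here is a minimal sketch in Python. It is not MapR or HDFS code and uses neither product's API. It runs against an ordinary local file and simply contrasts the two write models: a POSIX-style in-place overwrite versus the rewrite-a-new-copy workaround that an append-only store tends to force on you. The function names and the `.v2` suffix are my own, invented for illustration.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data):
    """POSIX-style update: seek to an offset and overwrite bytes there.

    Only the bytes being changed are written; the rest of the file is untouched.
    """
    with open(path, "r+b") as f:  # open for reading and writing, no truncation
        f.seek(offset)
        f.write(data)


def rewrite_as_new_copy(path, offset, data):
    """Workaround when in-place writes are unavailable (hypothetical helper).

    Reads the whole file, patches it in memory, and writes a complete new
    copy next to the original. The cost grows with file size, not patch size.
    """
    with open(path, "rb") as src:
        content = src.read()
    patched = content[:offset] + data + content[offset + len(data):]
    new_path = path + ".v2"  # illustrative naming only
    with open(new_path, "wb") as dst:
        dst.write(patched)
    return new_path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    record_file = os.path.join(workdir, "records.bin")
    with open(record_file, "wb") as f:
        f.write(b"AAAABBBBCCCC")

    overwrite_in_place(record_file, 4, b"XXXX")
    with open(record_file, "rb") as f:
        print(f.read())

    copy_path = rewrite_as_new_copy(record_file, 0, b"ZZ")
    with open(copy_path, "rb") as f:
        print(f.read())
```

The first function touches four bytes; the second has to read and rewrite everything to change two. On a file of a few bytes nobody cares, but on large files that are updated often, the difference in I/O is the kind of thing that shows up in production.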
There’s a good deal more I could add, but the point is that of the three Hadoop distros, MapR was the one that paid attention to the technical details and formulated a genuine engineering vision for the evolution of the Hadoop ecosystem. As I was sitting on the airplane back to Austin, I was reminded of Sun Tzu’s words: “Strategy without tactics is the slowest route to victory, tactics without strategy is the noise before defeat.” MapR did the strategic work and has forged a genuine technical and market strategy, and it’s beginning to have an impact.