I spent a good part of the last four months writing a book, entitled The Algebra of Data. It was a lot more effort than I imagined when I took on the task but, it has to be said, it had been decades since I had done anything serious with mathematics. The book is unashamedly a mathematics book, as the title proclaims, but it has been written to make the subject as approachable as possible. It is no textbook.
The point about data algebra is that it genuinely represents data in a software-compatible manner – any data. There is a back story to why this algebra was created. It was not a small effort, and it was years in gestation.
In fact, Algebraix Data Corporation, founded by software engineers who believed a mathematical approach to data was possible, spent over six years creating, enriching and proving data algebra’s applicability. This was an extensive research activity that primarily involved using data algebra directly in a variety of data management activities: defining data, organizing data, querying data and optimizing the queries for performance.
This was the focus of the research partly because it was decided that the best way to prove data algebra was to use it to manipulate and transform data in applications that do little else: database optimizers for data in both tables and graphs. I had no involvement in that research activity, by the way, although I was briefed on it a couple of times prior to completion of the mathematical work.
Algebraix decided early this year to reveal the mathematical side of its activities to the world, and thus I spent a good deal of time co-authoring The Algebra of Data with Professor Gary Sherman, the mathematician who created and refined data algebra while working for Algebraix. With regard to the book, he articulated the mathematics, and I did most of the writing in between.
The uniqueness of data algebra
The book launch consisted of a short keynote at the NoSQL conference in San Jose earlier this month. One of the messages of that presentation was that “there has never previously been a data algebra.” This is true, although mathematically speaking, data algebra does not actually include any new mathematics. It is an application of set theory to data.
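To give a feel for what "an application of set theory to data" can mean, here is a minimal sketch in plain Python. It is not the book's formal definitions and not the Algebraix library's API. It simply assumes, for illustration, that a record is modeled as a set of (attribute, value) pairs and a table as a set of such records. The helper names (row, select, project) and the sample data are hypothetical.

```python
# Illustrative only: plain Python sets, not the Algebraix library's API.

def row(**fields):
    """Model one record as a set of (attribute, value) pairs."""
    return frozenset(fields.items())

# Under this assumption, a table is just a set of rows (made-up sample data).
people = {
    row(name="Ada", city="London"),
    row(name="Kurt", city="Vienna"),
    row(name="Emmy", city="Erlangen"),
}
more_people = {
    row(name="Kurt", city="Vienna"),
    row(name="Alan", city="London"),
}

def select(table, attribute, value):
    """Keep the rows that contain the pair (attribute, value)."""
    return {r for r in table if (attribute, value) in r}

def project(table, attributes):
    """Restrict every row to the given attributes."""
    return {frozenset(p for p in r if p[0] in attributes) for r in table}

everyone = people | more_people   # ordinary set union; duplicates vanish
in_both = people & more_people    # ordinary set intersection
londoners = select(everyone, "city", "London")
cities = project(everyone, {"city"})
```

Because tables are ordinary sets here, combining and filtering them needs nothing beyond union, intersection and set comprehension, which is the flavor of the idea, if not its full rigor.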
I have met some people in the past few weeks who were under the impression that so-called “relational algebra” was set theory. It is not, for a variety of reasons. It started out as an attempt to apply set theory to data, but it went badly wrong. However, it was successful enough to be employed, more or less imperfectly, by a whole generation of relational databases – and it’s the main reason why the mathematical effort that Algebraix put in was not carried out long ago. There was an incorrect but dominant assumption that mathematics had been tried, and that the relational database was all it could achieve.
Now that the fundamentals of data algebra are out there – available to anyone who cares to “do the math” – it’s likely, if not inevitable, that it will spawn new efforts to get mathematical with data. Whether that interest leads to new database products, new Hadoop components or products as yet undreamed of is difficult to say. Time will tell.
In the meantime, if you want to get familiar with data algebra, the book can be found on Amazon. There’s also a Python library that Algebraix created for Python programmers who want to mess with the math.
The release of the book has already provoked a few articles on the web:
Forbes.com: Only algebra can save us now
Forbes.com: Semantic Technology: Building The HAL 9000 Computer? (just a couple of paragraphs on the third page of Jason Bloomberg’s article from the NoSQL conference)
BusinessInsider.com: This company is using insanely complicated math to save its customers tons of cash (I wouldn’t characterize the math as “insanely complicated” unless you think all math is insanely complicated)
We’ll add to that list if it grows, of course. Currently, I’m in the process of preparing training courses for people who want to get to know the math. We presume there will be some valiant IT people who choose to explore the concepts. We’ll let you know when we can deliver.