Inside Analysis

The Algebra of Data Says ‘Hello World’

I spent a good part of the last four months writing a book entitled The Algebra of Data. It was a lot more effort than I imagined when I took on the task, but then it had been decades since I did anything serious with mathematics. The book is unashamedly a mathematics book, as the title proclaims, but it has been written to make the subject as approachable as possible. It is no textbook.

The point about data algebra is that it genuinely represents data in a software-compatible manner – any data. There is a back story to why this algebra was created. It was not a small effort, and it was years in gestation.

In fact, Algebraix Data Corporation, founded by software engineers who believed a mathematical approach to data was possible, spent over six years creating, enriching and proving data algebra’s applicability. This was an extensive research activity that primarily involved using data algebra directly in a variety of data management activities: defining data, organizing data, querying data and optimizing the queries for performance.

This was the focus of the research partly because it was decided that the best area to prove data algebra was in using it to manipulate and transform data in applications that did little else: database optimizers for data in both tables and graphs. I had no involvement in that research activity, by the way, although I was briefed on it a couple of times prior to completion of the mathematical work.

Algebraix decided early this year to reveal the mathematical side of its activities to the world, and thus I spent a good deal of time co-authoring The Algebra of Data with Professor Gary Sherman, the mathematician who created and refined data algebra while working for Algebraix. As for the book, he articulated the mathematics, and I did most of the writing in between.

The uniqueness of data algebra

The book launch consisted of a short keynote at the NoSQL conference in San Jose earlier this month. One of the messages of that presentation was that “there has never previously been a data algebra.” This is true, although mathematically speaking, data algebra does not actually include any new mathematics. It is an application of set theory to data.
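
To give a flavor of what applying set theory to data can look like, here is a minimal Python sketch. It uses only Python's built-in sets. It is not the notation used in the book, nor the API of the Algebraix library, and the names record, select, project and people are hypothetical, chosen purely for illustration.

```python
# Illustrative only: plain Python sets, NOT the Algebraix library API.
# Assumption: a record is modeled as a frozenset of (attribute, value) pairs,
# and a table is modeled as a set of such records.

def record(**fields):
    """Build one record as an immutable set of (attribute, value) pairs."""
    return frozenset(fields.items())

def select(table, attribute, value):
    """Keep the records that contain the pair (attribute, value)."""
    return {r for r in table if (attribute, value) in r}

def project(table, attributes):
    """Restrict every record to the named attributes."""
    return {frozenset(p for p in r if p[0] in attributes) for r in table}

people = {
    record(name="Ada", city="London"),
    record(name="Grace", city="New York"),
    record(name="Alan", city="London"),
}

londoners = select(people, "city", "London")  # a subset of people
cities = project(people, {"city"})            # duplicates collapse, as in any set
```

Because everything here is a set, the results of a query are themselves sets, so ordinary operations such as union, intersection and subset tests apply to them directly.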

I have met some people in the past few weeks who were under the impression that so-called “relational algebra” was set theory. It is not, for a variety of reasons. It started out as an attempt to apply set theory to data; it just went badly wrong. However, it was successful enough to be employed, more or less imperfectly, by a whole generation of relational databases – and it’s the main reason why the mathematical effort that Algebraix put in was not carried out long ago. There was an incorrect but dominant assumption that mathematics had been tried, and that the relational database was all it could achieve.

Now that the fundamentals of data algebra are out there – available to anyone who cares to “do the math” – it’s likely, if not inevitable, that it will spawn new efforts to get mathematical with data. Whether that interest leads to new database products, new Hadoop components or products as yet undreamed of is difficult to say. Time will tell.

In the meantime, if you want to get familiar with data algebra, the book can be found on Amazon. There’s also a Python library that Algebraix created for Python programmers who want to mess with the math:

GitHub project page
PyPI project page
Documentation

The release of the book has already provoked a few articles on the web:

Forbes.com: Only algebra can save us now

Forbes.com: Semantic Technology: Building The HAL 9000 Computer? (just a couple of paragraphs on the third page of Jason Bloomberg’s article from the NoSQL conference)

BusinessInsider.com: This company is using insanely complicated math to save its customers tons of cash (I wouldn’t characterize the math as “insanely complicated” unless you think all math is insanely complicated)

Tech.co: Algebraix Library Brings Algebra to Data Usage

We’ll add to that list if it grows, of course. Currently, I’m in the process of preparing training courses for people who want to get to know the math. We presume there will be some valiant IT people who choose to explore the concepts. We’ll let you know when we can deliver.

Robin Bloor

About Robin Bloor

Robin is co-founder and Chief Analyst of The Bloor Group. He has more than 30 years of experience in the world of data and information management. He is the creator of the Information-Oriented Architecture, which is to data what the SOA is to services. He is the author of several books, including The Electronic B@zaar, From the Silk Road to the eRoad (a book on e-commerce) and three IT books in the Dummies series on SOA, Service Management and The Cloud. He is an international speaker on information management topics. As an analyst for Bloor Research and The Bloor Group, Robin has written scores of white papers, research reports and columns on a wide range of topics, from database evaluation to networking options and comparisons to the enterprise in transition.

3 Responses to "The Algebra of Data Says ‘Hello World’"

  • Steve
    August 31, 2015 - 11:17 am

    Paperback only?

  • A Peterson
    August 31, 2015 - 12:34 pm

    Sorry, Bloor Group, but belittling the work by E.F. Codd on relational algebra almost 50 years ago is not a good start. While sound, relational databases have their limits.
