Every now and then something occurs that tears up your expectations of how technology will evolve. This happened when I looked at the graph shown below:
The VectorWise results are jaw-dropping.
[Figure: chart of TPC-H benchmark results]
This is a set of TPC-H benchmarks. TPC-H is the standard benchmark for BI databases. It comprises a collection of business-oriented ad-hoc queries coupled with concurrent data modifications. The queries and the data were chosen to be typical of the traffic for most large-volume BI business usage. The queries include some that are complex. All in all, it’s a reasonable benchmark. The QphH@x figures you see in the above chart refer to the TPC-H Composite Query-per-Hour Performance Metric, which is the “0-60 figure” that is distilled from the benchmark.
So this is like a set of 0-60 mph figures with four cars scoring somewhere in the 5-7 second range and the final one scoring below 2 seconds. It’s astonishing. Something inside you says, “that’s no car, it’s a rocket.”
What Is VectorWise?
I could write a book on the limitations of benchmarks for judging database performance. Benchmarks do not reflect real-world product usage. Vendors do whatever they can, short of actually cheating, to score the highest possible figure. Benchmark teams spend weeks studying the details of the benchmark and determining what features they can switch off to improve the final figure.
Nevertheless, this VectorWise figure is extraordinary.
This is the first VectorWise benchmark. Ingres doesn’t have a seasoned team of benchmarkers who have honed their skills over years. It may even be that the team was not particularly skilled at doing benchmarks, but if so, it doesn’t show. You simply cannot achieve a result like this on this kind of benchmark by accident.
The massive performance advantage of VectorWise begs for explanation…
What Is Vector Processing?
I knew about vector processing way back, because I’d had the curiosity to research “what made supercomputers so super.” One feature of supercomputers is a devotion to parallelism at every level, so that as much of the processing as possible is sped up by parallelism. A specific kind of parallelism that supercomputers use is the very fine-grained parallelism of vector processing. The idea is simple:
Quite often you end up processing one-dimensional arrays (i.e. columns) of data. If you want to do something with the whole column, like add it to another column, you can do this serially by adding one item at a time. Alternatively (with the use of vector registers) you can do it with vector instructions, each of which adds several items in parallel. This is a good deal faster.
This activity is sometimes referred to as Single Instruction, Multiple Data (SIMD). The use of SIMD and vector registers is one of the techniques that gives VectorWise its blistering performance. Indeed, as I understand it, the product was designed specifically to exploit this capability.
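To make the column-addition idea concrete, here is a minimal sketch in C. It is not VectorWise code; the function names are hypothetical, and it assumes a GCC or Clang compiler, since the `vector_size` attribute is a compiler extension rather than standard C. The first function adds two columns one item at a time; the second adds four floats per step using a 16-byte vector type, which the compiler can map onto a SIMD register where the hardware has one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang extension (an assumption of this sketch):
   four 32-bit floats packed into one 16-byte vector. */
typedef float v4f __attribute__((vector_size(16)));

/* Serial version: one addition per loop iteration. */
void add_columns_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vector version: four additions per loop iteration. */
void add_columns_simd(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4f va, vb, vc;
        memcpy(&va, a + i, sizeof va);  /* load 4 items, alignment-safe */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                   /* one vector add covers 4 lanes */
        memcpy(out + i, &vc, sizeof vc);
    }
    for (; i < n; i++)                  /* leftover 0 to 3 items */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same output column. The vector version simply needs about a quarter as many add operations for long columns, which is why storing data column by column suits this style of processing.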
If you are wondering why nobody did this before, then there are several factors to consider:
- SIMD capabilities did not become available on commodity chips (Intel & AMD) until around 1999.
- There were early problems with SIMD slowing down the main processor activity, which took a while to iron out.
- Databases are not an obvious area of application for SIMD, whereas 3D vector graphics, for example, is. Nor is it an easy area in which to apply SIMD techniques. So few developers thought of using SIMD in database development.
- It wasn’t until about 2004 that column store databases emerged and CPUs started to add processor cores. Both of these developments are positive for the use of SIMD.
The Impact Of VectorWise
We have come to expect a doubling in CPU power every 18 months or so, but this has never translated into a doubling in database performance, even though all the components of a computer (except the disk read heads) seem to double in speed or capacity at roughly the same pace. In practice, we tend to get a performance increase at the rate of 30-40% each year. That’s how it goes.
So, in simple mathematical terms, this rate of improvement puts VectorWise about 4 years ahead of the competition in terms of performance, and it will remain 4 years ahead until some competitor finds a way to catch up at a software level. This is unprecedented.
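The arithmetic behind a "years ahead" estimate can be sketched as follows. This is an illustration only: it assumes the 30-40% annual gains compound year over year, and the function names are hypothetical. Under that assumption, four years of improvement corresponds to a performance multiple of roughly 2.9x (at 30%) to 3.8x (at 40%).

```c
/* Performance multiple reached after `years` of compounding
   at `annual_rate` (e.g. 0.30 for 30% per year). */
double compounded_gain(double annual_rate, int years)
{
    double gain = 1.0;
    for (int y = 0; y < years; y++)
        gain *= 1.0 + annual_rate;
    return gain;
}

/* Whole years of compounding needed to reach `target` times the
   starting performance. Returns -1 if the rate is not positive. */
int years_to_match(double annual_rate, double target)
{
    if (annual_rate <= 0.0)
        return -1;
    int years = 0;
    double gain = 1.0;
    while (gain < target) {
        gain *= 1.0 + annual_rate;
        years++;
    }
    return years;
}
```

For example, `compounded_gain(0.30, 4)` is about 2.86 and `compounded_gain(0.40, 4)` is about 3.84, so a competitor improving at the usual pace would need around four years to close a gap of roughly three to four times.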
VectorWise is clearly going to make a huge impact.
I sense a disturbance in the force.
February 19, 2011 - 3:08 pm
What makes this benchmark even more amazing is that they *didn’t* game it. I’ve been getting similar results myself, using standard hardware. Anyone can get this level of performance. See http://www.rationalcommerce.com/uploads/V16-Benchmark-20100630.pdf and http://community.ingres.com/forum/blogs/rhann/79-vectorwise-tpc-h-results-posted.html.
February 20, 2011 - 3:15 am
Robin,
Thanks for this glowing review. VectorWise is indeed a revolutionary product, and we are very proud to introduce this new query execution technology to the database market.
I need to take exception to the statement:
“Ingres doesn’t have a seasoned team of benchmarkers who have honed their skills over years. It may be even be that the team was not particularly skilled at doing benchmarks – but if so, it doesn’t show. You simply cannot achieve a result like this on this kind of benchmark by accident.”
The Ingres Performance Engineering team is second to none. John Galloway is our senior performance architect, with over 30 years of distinguished service in the OS, storage and database industries. In addition to his duties at Ingres, John is the chair of the TPC-H Technical Sub-Committee, which defines, develops and maintains the TPC-H benchmark.
Rilson Oscar do Nascimento is another database performance veteran. He builds our TPC benchmark kits, as well as kits for other benchmarks, e.g. SSB and DBT-x. Rilson is one of very few performance engineers to have developed kits for all the TPC benchmarks!
Best regards,
Dan Koren
Director, Performance Engineering
Ingres Corporation
February 20, 2011 - 6:31 pm
What I was pointing out is that Ingres doesn’t do TPC benchmarks and publish the results on a regular basis. Other companies that do, game the benchmarks (all of them do as far as I know). My experience in this area suggests that it takes a few iterations before a company gets good at achieving exaggerated results.
I cannot remember ever seeing Ingres publish a TPC benchmark. I searched the web for evidence of any and found none, until VectorWise. Please correct me about this if I’m wrong.
February 21, 2011 - 12:22 am
Robin,
Thanks for the prompt follow up.
May I respectfully point out that two different ideas got mixed up?
It is indeed true that Ingres has never published a TPC benchmark
result — until now! The irony of it is that Ingres was one of the
companies that founded the TPC in 1988, and was a full member until
sometime in the early 2000s when CA management decided to no longer
sign the checks. We rejoined the TPC in 2006 when I joined Ingres to
set up a performance engineering program — unlike other database
companies Ingres never had a performance engineering team until
then!
One should not infer however that the Ingres performance engineering
team is inexperienced! We simply acquired our experience working for
other database and/or system vendors — Sun, MIPS, SGI, Informix,
Sequent, Veritas, Oracle, Versant, Portal, Data General and Itautec!
We all published groundbreaking benchmark results for our respective
employers many times over.
There is no better performance engineering team in the entire database
industry, and our results prove it.
Best regards,
Dan Koren
Director, Performance Engineering
Ingres Corporation
March 9, 2011 - 4:27 pm
DeepCloud are using Vectorwise in our new MPP engine. We are seeing fantastic speed with massive data scaling.
On 3 nodes DeepCloud is nearly 4 times the speed of Vectorwise.
See: http://deepcloud.co/web1/?p=163