There should be an obvious answer to this, but once you step past “it’s the stuff the programs store on disk”, things get a little bit murky. So let’s try to shine some light on it. Here, slightly adapted, is the Merriam-Webster on-line definition of data:
- Data is factual information used as a basis for reasoning, discussion, or calculation
- Data is information output by a sensing device or [biological] organ
- Data is information in numerical form that can be digitally transmitted or processed
What’s clear from this attempt to nail down data is that the first two definitions are technically surplus to our digital requirements. All factual information can be digitized and all sensory output, whether biological or otherwise, can probably be digitized too. Nevertheless, these definitions usefully point out that data can be analyzed (definition 1), experienced (definition 2) or moved around and processed (definition 3).
From The Dawn of Computing
In the early years, the commercial pay-off from computers was in doing calculations on “structured data,” often in accounting systems, banking systems and insurance systems. Computer data was simple: numeric or alphabetic. There was structured data and there were programs. Programs, which “did all the thinking”, were data too. And the relationship between programs (the smart stuff) and ordinary data was horribly intimate. The program knew the structure of the data, and the data files usually held no trace of their own structure.
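To make that intimacy concrete, here is a minimal Python sketch of a fixed-width record of the kind described above. The layout (a 6-byte account id, a 20-byte name and a 4-byte balance in cents) and the names RECORD_LAYOUT, write_record and read_record are invented for illustration; they are not taken from any real system.

```python
import struct

# Hypothetical layout, known only to the program:
# 6-byte account id, 20-byte name, 4-byte signed integer balance (in cents).
RECORD_LAYOUT = struct.Struct("<6s20si")


def write_record(account_id: str, name: str, balance_cents: int) -> bytes:
    """Pack one record into raw bytes; short strings are padded with NUL bytes."""
    return RECORD_LAYOUT.pack(
        account_id.encode("ascii"), name.encode("ascii"), balance_cents
    )


def read_record(raw: bytes) -> tuple:
    """Unpack one record. This only works because the code knows the layout."""
    account_id, name, balance_cents = RECORD_LAYOUT.unpack(raw)
    return (
        account_id.decode("ascii"),
        name.rstrip(b"\x00").decode("ascii"),
        balance_cents,
    )


raw = write_record("A00042", "J SMITH", 125050)
record = read_record(raw)
```

The 30 bytes in raw carry no field names, types or lengths. Any other program that wants to read them has to be told the layout separately, which is exactly the weakness the paragraph describes.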
That was such a bad idea for so many reasons that it soon got fixed. The solution was called a “database.” This introduced a third kind of data, called metadata: the data that defined the structure of a data file (or table). And that was the only data we had, until those inconvenient IT users barged into the world of computing. We had kept them quiet with endless reams of printed reports, using structured data exclusively, but once the PC appeared and acquired a GUI, the game was up.
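The metadata idea can be sketched with SQLite, which ships in Python’s standard library. This is only an illustration under assumed names (the accounts table and its columns are hypothetical); the point is that the database itself can be asked what structure its data has.

```python
import sqlite3

# An in-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id TEXT, name TEXT, balance_cents INTEGER)"
)
conn.execute(
    "INSERT INTO accounts VALUES (?, ?, ?)", ("A00042", "J SMITH", 125050)
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). We keep just name and type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(accounts)")]

# The ordinary data, fetched separately from the metadata.
rows = conn.execute("SELECT * FROM accounts").fetchall()
```

Here columns holds the metadata and rows holds the data. A program that has never seen this table can discover the column names and types at run time instead of having them baked into its code.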
The Empowered Ones
Those greedy PC users were mad for every kind of data you can imagine: text, graphics, photos, sound, video. They didn’t just want to update databases, they wanted to write letters, manage spreadsheets, draw diagrams, mess with photos and even play video games. There was no holding them back. You might think this created many other kinds of data, and it did, but technically, it simply created more data types or – a mild innovation – collections of data types.
Sound is digitized differently from an image, and an image differently from text. When you digitize information in a new way, you create a new data type. If you bag several data types together, you create a collection. So with the PC there was a massive growth in data for direct consumption by people, but technically, there wasn’t much that was new in the way of data. But something new and important had been added: mark-up instructions. These were instructions on how to present data, on a screen or on the printed page. They were program instructions of a kind, ones that lived with the data. So the raw data, whether it was text or an image, was accompanied by both metadata (defining the data types) and usage information.
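As a rough sketch of raw data travelling with both metadata and mark-up, consider the Python below. The Document class, its fields and the render_html function are hypothetical names made up for this example, and the mark-up is reduced to a list of bold spans that are assumed not to overlap.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Document:
    raw_text: str                    # the raw data
    media_type: str = "text/plain"   # metadata: what kind of data this is
    # Mark-up: (start, end) character offsets to present in bold.
    bold_spans: list = field(default_factory=list)


def render_html(doc: Document) -> str:
    """Apply the presentation instructions to the raw text."""
    parts = []
    pos = 0
    for start, end in sorted(doc.bold_spans):
        parts.append(escape(doc.raw_text[pos:start]))
        parts.append("<b>" + escape(doc.raw_text[start:end]) + "</b>")
        pos = end
    parts.append(escape(doc.raw_text[pos:]))
    return "".join(parts)


doc = Document("Dear Ms Smith, payment is overdue.", bold_spans=[(26, 33)])
html_output = render_html(doc)
```

The text itself is never changed. The bold span is an instruction that lives alongside it, and a different renderer (for print, say) could interpret the same instruction in its own way.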
Passing It Around
We can complete the whole story by adding in communications, which really means communication between one program and another. Physically, data may be a stream of photons when it is passed as light down a fibre optic cable and it may be waves of electromagnetic radiation when it flies through the ether, but it’s always a stream of electrons that defines digitized data when it eventually meets a program.