The content strategy community – and other digital service communities – have repeatedly debated the meanings of the terms data, information, content and metadata. A core difficulty is that any definition of one almost invariably uses another as its starting point. Such cross-referencing inherently confuses.
The following definitions aim to remove that ambiguity by giving each term its own starting point.
Data are any values, attributes or meanings that can be identified or measured in some form. The datum is the natural state of this value – its existence. A line has length, whether measured or not. Everything is made up of data.
The representation of basic data – the form in which they are stored so we can use them – while technically metadata, can be treated as raw data.
Information is a set of data.
But information is not absolute. In his excellent work, The Information, author James Gleick describes information as subjective. What is, and what is not, information depends on the person (or system) receiving the data. This definition of information derives from its etymology: that which informs.
Information is that portion of a data set that provides its recipient with relevant meaning or knowledge. Any data that do not achieve this goal are not information: they are noise. To qualify as information, the provided data must have value to the recipient.
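Because the definition depends on the recipient, the same data set can divide into information and noise differently for different people. A minimal sketch of that idea follows; the function name, the sample data and the two recipients are hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable, Set, Tuple


def split_information(
    data: Iterable[str],
    is_relevant: Callable[[str], bool],
) -> Tuple[Set[str], Set[str]]:
    """Partition data into (information, noise) for one recipient.

    `is_relevant` stands in for the recipient's own judgement of value.
    """
    information: Set[str] = set()
    noise: Set[str] = set()
    for datum in data:
        if is_relevant(datum):
            information.add(datum)
        else:
            noise.add(datum)
    return information, noise


# Hypothetical data set and recipients.
data = {"opening hours", "street address", "founder biography"}


def visitor(datum: str) -> bool:
    return datum in {"opening hours", "street address"}


def journalist(datum: str) -> bool:
    return datum == "founder biography"


visitor_info, visitor_noise = split_information(data, visitor)
journalist_info, journalist_noise = split_information(data, journalist)
```

The data never change between the two calls; only the recipient does, and with it the boundary between information and noise.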
As with information, content is a set of data. Unlike information, content is defined objectively: it is the set of data used to deliver a particular message.
Every aspect of the message we deliver to our audience is content. This includes the words, images and other media we present, the visual packaging of those elements, and the behaviour and timing of that delivery.
While content’s basic scope is easily defined, there are further distinctions:
- The content pool is the set of all content we can make available to the audience, whether used or not.
- Delivered content describes the subset of the content pool used in a particular communication.
Subject matter
Subject matter describes any data that could constitute content or information, within the scope of a particular communication. The subject matter’s scope depends on both the communicator’s perception of significance, and the recipient’s.
Metadata is data about data. The recording of existing data, usually in a form that makes it more useful, technically constitutes metadata. This complete definition is not particularly useful.
For practical purposes within digital communications, the term metadata is generally restricted to two sub-types:
- Data used in the process of serving content, to determine what to serve; and
- Data served as part of the content to be processed by machines, either to display the content or to understand it.
Overlapping information and content
Effective communication requires that one understand the distinction between information and content, and use that knowledge to optimise the content provided to one’s audience.
Our aim is to communicate meaningfully with our audiences. We want the content we deliver in each transaction to align as closely as possible with the specific audience’s definition of information. The alternatives are equally unappealing:
- To fail to serve the audience’s desire for information, leaving the recipient of our content unfulfilled, or
- To contaminate the information we serve to the audience with noise.
What we think the audience should or should not receive is irrelevant to the discussion. As the party defining what constitutes information, what is noise, and what is missing, the audience is the dominant actor in this transaction.
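Treating delivered content and the audience's information as two sets makes the two failure modes easy to see. The sketch below is an illustration under that assumption, not a measurement method; the function name and the sample items are hypothetical.

```python
from typing import Dict, Set


def compare_with_audience(
    delivered_content: Set[str],
    audience_information: Set[str],
) -> Dict[str, Set[str]]:
    """Compare what we delivered with what the audience counts as information."""
    return {
        # Content that the audience values: the overlap we are aiming for.
        "aligned": delivered_content & audience_information,
        # Information the audience wanted but did not receive.
        "unmet": audience_information - delivered_content,
        # Content the audience received but does not value.
        "noise": delivered_content - audience_information,
    }


# Hypothetical transaction.
delivered = {"price", "delivery time", "company history"}
wanted = {"price", "delivery time", "returns policy"}

result = compare_with_audience(delivered, wanted)
```

Here `result["unmet"]` holds the returns policy and `result["noise"]` holds the company history. Note that `audience_information` is supplied by the audience, not by the communicator, which matches the audience's role as the party that defines information.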