Saturday, August 9, 2008

Data Book: Data, Information, Knowledge, and Wisdom

Data, Information, Knowledge, and Wisdom
In the computer field, we sometimes use data and related terms like information or knowledge loosely, as if they were interchangeable synonyms. Yes, the terms are related; but no, they don't mean the same thing at all. You'll get the most out of this book if we make a point of clearing this up right here and now before going any further.

Let's start with data. Data is nothing more than a series of symbols or bits of electronic storage. That storage can live in a few different places: It can live in the memory of a computer, where it is being consumed, produced, or merely held in storage on behalf of a running program. Data can also live in persistent storage, such as on a disk drive, DVD, or backup tape. Data can also be represented in messages flowing over a network connection from one computer to another. Lastly, data can also be represented in non-electronic form, such as a printed page, bar code label, or engravings in marble. From these examples it can be seen that data has a lifetime, which can range from fleeting microseconds to centuries.

A key thing to understand about data is that there is nothing--absolutely nothing at all--in those symbols or bits of computer storage that has any kind of intrinsic meaning. A series of bits in computer storage is much like a code or cipher: both the creator of the data and the consumer of the data must have the same expectations of how to understand the data in order for it to convey any meaning. Data, then, has no measurable meaning in and of itself, but there is potential meaning to be derived from the data if it comes into the hands of a recipient with the right expectations.
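To make the point concrete, here is a minimal Python sketch in which the same four bytes are read under several different expectations. The byte values are arbitrary and chosen only for illustration; nothing in the bytes themselves says which reading is the "right" one.

```python
import struct

# Four arbitrary bytes; nothing in them says how they should be read.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# The same bits under four different sets of expectations:
as_big_endian_int = int.from_bytes(raw, "big")        # unsigned integer, most significant byte first
as_little_endian_int = int.from_bytes(raw, "little")  # unsigned integer, least significant byte first
as_float = struct.unpack(">f", raw)[0]                # big-endian 32-bit IEEE 754 float
as_text = raw[:2].decode("ascii")                     # first two bytes as ASCII characters

print(as_big_endian_int)     # 1109917696
print(as_little_endian_int)  # 10306
print(as_float)              # 42.0
print(as_text)               # B(
```

Only a consumer that shares the producer's expectations (say, "this is a big-endian float") recovers the intended value.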

If data itself has no intrinsic meaning, the best way to think of it is as a vehicle to convey information from one entity to another. Like speech, data is just a way for a sender to encode information. Like writing, persistent data is a way for a sender to encode information for longer periods of time and for multiple recipients.

Notice the word information has crept into the above paragraph as we start to talk about how data is used. If two or more human beings want to exchange information, and they lack telepathy, they are forced to use a means of encoding information as data that both parties are familiar with. While we have multiple means of doing this (speech, writing, sign language, e-mail), all serve the same purpose: turning information into a data representation using some encoding method, and transmitting that data to a recipient who reverses the process, gleaning information from data.
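The encode, transmit, decode cycle described above can be sketched in a few lines of Python. The choice of JSON over UTF-8 as the agreed encoding, the function names, and the sample temperature reading are all assumptions made for illustration; any scheme works as long as both sides share it.

```python
import json

def encode(information: dict) -> bytes:
    """Sender side: turn information into data using the agreed scheme."""
    return json.dumps(information).encode("utf-8")

def decode(data: bytes) -> dict:
    """Recipient side: reverse the process to glean the information."""
    return json.loads(data.decode("utf-8"))

reading = {"outside_temperature_c": 21.5}  # the information to share
wire_data = encode(reading)                # the data that actually travels
received = decode(wire_data)               # the information, recovered

print(wire_data)
print(received == reading)
```

What crosses the wire is only the byte string; the information reappears only because the recipient applies the same encoding in reverse.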

But what, exactly, is information? If we're just thinking about human beings, it can be easy to jump the gun and start talking about all the things that go on in a person's mind. If we keep in mind that the producers and consumers of data can also be machines, we need to be more careful. If data is a way of sharing something, and the entities that send and receive data may or may not be people, what is it that is being shared? Facts or claims, which is what we mean by information. Information includes things like the current outside temperature; the content of a novel; the code for a software program; and content such as images, audio, or video.

We've already established that the encoding scheme for data needs to be understood by both the producers and consumers in order to successfully share information. However, understanding the encoding scheme is not sufficient in and of itself. Three major factors that affect one's ability to turn data into information are awareness, context, and precision.

Awareness refers to whether or not the intended recipient of the data becomes aware that the data is available. An important message on your answering machine will never come to your attention if you never check your messages.

Context means that understanding the encoding scheme for data is not necessarily all you need to reconstruct the information. While some data does provide all of the contextual information you need, that is not always the case. You might know enough to confidently decode some data as a series of numbers, but is 4-8-15-16-23-48 a bank account number in Switzerland or the winning numbers for a lottery? The context you need to put meaning to data may require out-of-band information, such as knowing who delivered the data or when the data arrived.
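As a rough sketch of the number example, the Python below separates decoding from interpretation. The function names and the "bank" and "lottery" source labels are hypothetical, invented here only to stand in for out-of-band context such as who delivered the data.

```python
def decode_numbers(data: str) -> list[int]:
    """Knowing the encoding scheme is enough to recover the numbers."""
    return [int(part) for part in data.split("-")]

def interpret(numbers: list[int], source: str) -> str:
    """Hypothetical: the out-of-band context (who sent the data) selects the meaning."""
    if source == "bank":
        return "account number " + "".join(str(n) for n in numbers)
    if source == "lottery":
        return "winning numbers " + ", ".join(str(n) for n in numbers)
    return "unknown: context missing"

numbers = decode_numbers("4-8-15-16-23-48")

print(interpret(numbers, "bank"))     # account number 4815162348
print(interpret(numbers, "lottery"))  # winning numbers 4, 8, 15, 16, 23, 48
print(interpret(numbers, ""))         # unknown: context missing
```

The decoding step succeeds every time; it is the missing `source` argument that leaves the recipient with numbers but no information.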

Precision refers to how accurately information survives being encoded into data and decoded back. A notation such as the English language can be rife with ambiguity; does "the seal" refer to a stamp or an animal? It may or may not be clear from the rest of the data and its known context. Precision can also refer to the amount of detail. One could render the mathematical constant pi as a word or symbol, or as a decimal number. Any finite decimal rendering is necessarily imprecise, since pi's decimal expansion never ends.
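The pi example can be illustrated with a short Python sketch. The variable names are illustrative only, and note that Python's `math.pi` is itself a 64-bit floating-point approximation, so the "errors" below are measured against that approximation rather than against the true value.

```python
import math

symbol = "pi"               # exact as a name, but carries no digits at all
rough = round(math.pi, 2)   # two decimal places of detail
finer = round(math.pi, 10)  # ten decimal places of detail

# Distance of each rendering from the float approximation math.pi
error_rough = abs(math.pi - rough)
error_finer = abs(math.pi - finer)

print(rough)                      # 3.14
print(error_finer < error_rough)  # True
```

More digits buy more precision, but no finite number of digits closes the gap entirely.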

Putting this all together, information is a fact or claim. People (or other entities) share information by encoding it, conveying it and decoding it in the form of data. The success of this process is dependent on awareness, context, and precision.

The next step up from information is knowledge. Knowledge is the ability to take certain kinds of information and produce something new from it. For example, a chef can take a list of available ingredients as input and produce as output one or more recipes for dishes that can be made using those ingredients. A weatherman can take information on temperature, wind speed, and weather patterns and attempt a prediction of future behavior. A calculator can turn a sequence of numbers and operators into an arithmetic result. A medical expert system can suggest diagnoses based on patient symptoms. Credit agency software can detect fraudulent patterns in credit card usage information.

Knowledge, then, is an ability to start with information that fits a known pattern and produce or derive something more from it. A software program is a way to equip a computer with knowledge.
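The calculator example gives a compact illustration of a program as embedded knowledge. The sketch below is an assumption-laden toy, not a real calculator design: the `calculate` function and `OPS` table are hypothetical names, and evaluation runs strictly left to right with no operator precedence, much like a simple pocket calculator.

```python
import operator

# The "knowledge": what each operator symbol means.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def calculate(tokens):
    """Evaluate tokens such as [2, '+', 3, '*', 4] strictly left to right."""
    result = tokens[0]
    # Tokens alternate: number, operator, number, operator, number, ...
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], tokens[i + 1]
        result = OPS[op](result, operand)
    return result

print(calculate([2, "+", 3, "*", 4]))  # 20, because (2 + 3) * 4
print(calculate([10, "/", 4]))         # 2.5
```

The input is information that fits a known pattern (numbers alternating with operators); the output is something new derived from it.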

Wisdom is an understanding of underlying principles, the principles from which knowledge is derived.

Machines can't have wisdom; it is uniquely the province of human beings and higher beings. However, wisdom can be exported to the lower levels of the data-information-knowledge-wisdom ladder in the form of rules. The Ten Commandments in the Bible are one of the greatest examples of this. A software program containing business rules is another example: it may have taken a great amount of wisdom to come up with the rules, but once established, they can be put to use broadly.

Integrating Data, Information, Knowledge, and Wisdom
These four terms are different but related.
  • Data is the lowest level of these terms: it's merely a sequence of symbols that serves as a way for parties to share information.
  • Information is facts or claims, which can be shared among parties by encoding them as data.
  • Knowledge is an ability to make something more from a pattern of information. Knowledge can be innate or learned.
  • Wisdom is an understanding of underlying principles; a computer can't experience wisdom, but it can execute rules or logic that embody wisdom.
How is all of this relevant to software developers? For one thing, you won't make the mistake of confusing what your software programs can accomplish with what human beings should be responsible for. All software systems are actually a partnership between human beings and computers, where the roles of the two and the relationship between them are defined by where they share common ground and where they do not. Thus, we expect wisdom from human beings to shape the knowledge we embed in computer programs. Programs must be able to apply their knowledge to information, and to convert information to and from data in order to share it.


I like a lot of what this article has to say about defining data, information, knowledge, and wisdom and how they relate to each other.
