A nightmare scenario

You are at a large library to find a book you need to finish an important term paper. You are feeling tense.

Image courtesy of Surachai at FreeDigitalPhotos.net

You have not visited this particular library before, so you go to the desk to ask for help finding your book. The librarian stares at you blankly, listens to your request, then says: “How am I supposed to know where it is? You’ll have to find it yourself.” The librarian turns away. You look around and realize there are no computer search terminals and no card files. As you walk over to the first shelf, you notice that no sections are marked and there are no category signs. You start scanning the titles of the books on the first shelf: “Moby Dick” by Herman Melville, “Think and Grow Rich” by Napoleon Hill, “100 Vegetarian Recipes”. You realize with a growing sense of panic that the books are shelved randomly. You look around and the shelves seem to stretch to infinity in all directions – hundreds of thousands, millions of books. You awake with a start, in a cold sweat.

Which came first, the data or the database?

When I taught database design, I liked to start my first class with the library story, because a library is a good metaphor for a database. As the nightmare scenario makes clear, a library is not just a collection of books. It is a system for categorizing and filing books so that they can be found easily and then returned to their correct location on the shelf. That system – whether it is the Dewey Decimal System or the Library of Congress Classification – is in place before the first book hits the shelf, and it is used from then on to locate and reshelve books and to categorize and place every new arrival.

In our example, the books represent the data and the library’s filing system represents the database. For this reason, database software is often referred to by the acronyms DBMS (Database Management System) or RDBMS (Relational Database Management System). The database system must be in place before the first piece of data is entered.

Lots of data: good and bad

Just as a large book collection can be nice to have, so can lots of data. Problems arise when your personal book collection grows to the point where you cannot remember where all the books are. If you cannot find the book you need quickly enough, it may as well be lost.

Many organizations start off with relatively small amounts of data: customers, suppliers, employees. Once the organization grows past a certain point, it possesses too much data to search through manually or to recall from memory. It then needs a system that stores the data so it can be retrieved quickly later: a database. Existing data in various formats can be imported, along with new data as it arrives, and categorized and indexed in the process. Queries and stored procedures can then be added to support reporting and analysis.
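To make the import, index, and query steps concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer rows, the table and column names, and the helper functions load_customers and customers_in are all invented for illustration. SQLite has no stored procedures, so a plain Python function wrapping a query stands in for one here.

```python
import csv
import io
import sqlite3

# Hypothetical customer data, standing in for an exported spreadsheet.
CSV_DATA = """name,city
Acme Ltd,Toronto
Birch & Co,Halifax
Cedar Inc,Toronto
"""


def load_customers(conn, csv_text):
    """Import CSV rows into a customers table, then index them by city."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    rows = [(r["name"], r["city"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO customers (name, city) VALUES (?, ?)", rows)
    # The index plays the role of the library's catalogue for the city column.
    conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
    conn.commit()
    return len(rows)


def customers_in(conn, city):
    """A reusable query: names of all customers in one city, sorted."""
    cur = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in cur]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
count = load_customers(conn, CSV_DATA)
toronto = customers_in(conn, "Toronto")
print(count, toronto)
```

The same pattern applies to a server database such as PostgreSQL or MySQL, where the reusable query could live in the database itself as a stored procedure or view.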

Databases are not without costs, even when the software is open source. Just as a growing local library eventually needs an addition, storage space must eventually be found for your growing collection of data, whether that space is rented from a cloud provider or purchased as disk space in your own data centre. Also, once your data collection grows past a certain point, you will probably have to hire “librarians” to keep everything in order.

Too much information

There are times when it makes no sense to store information in a database, at least not in your own. Often the information you need – stock prices, currency exchange rates, weather forecasts – can be looked up elsewhere, either live or stored in someone else’s database.

We are at the very beginning of the era of Big Data and the Internet of Things. New ways are being devised to connect to and search huge amounts of stored and live data for predictive analysis.

Where to from here?

Hopefully the library example has given you a new or different view of databases and what they are. You may be ready to start organizing your existing data in more efficient ways, which may include database software. That is just the beginning. Even after it has been imported into a database table, data that is unindexed, and therefore not efficiently searchable, is still referred to as a “heap”.
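The difference an index makes can be seen by asking a database how it plans to run a query. The sketch below again uses Python's built-in sqlite3 module, and the books table and the query_plan helper are invented for illustration. One caveat: SQLite stores every table as a B-tree keyed by row ID, so it is not a heap in the strict sense that the term has in some other systems, but a lookup on an unindexed column still has to scan every row, which is the point being illustrated.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [
        ("Moby Dick", "Herman Melville"),
        ("Think and Grow Rich", "Napoleon Hill"),
        ("100 Vegetarian Recipes", None),  # no author given
    ],
)

lookup = "SELECT title FROM books WHERE author = ?"

# Without an index, the only option is to walk the whole table, shelf by shelf.
before = query_plan(conn, lookup, ("Herman Melville",))

# With an index on author, SQLite can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_books_author ON books (author)")
after = query_plan(conn, lookup, ("Herman Melville",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan of the books table and the second a search using idx_books_author.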