Author Archives: admin

Dr. Michael Stonebraker

Just finished listening to an interview with MIT professor Dr. Michael Stonebraker explaining why traditional databases are obsolete.

The interview is available here on the Software Engineering Radio site:

http://www.se-radio.net/episode-199-michael-stonebraker/?goback=%2Egde_73235_member_5829977148491862019#%21

Seventeen years ago, as a newbie professor at a private technical college, I was thrown into teaching database fundamentals, a theory course that no other professor appeared to want (an attitude reflected in many of my students, it seemed). Preparing for the course, I pored over my own undergrad notes and reviewed the seminal work of Dr. E. F. Codd and his paper from the early 1970s that eventually formed the foundation of relational databases as we know them today. Everything that I have learned about databases since then has adhered to Dr. Codd’s model. Until now.

Dr. Stonebraker is a Dr. Codd for the 21st century, and in his hour-long interview he explains why databases have to move on from their traditional row-based model to one that more easily handles the vast amounts of data being generated and used today. Dr. Stonebraker is not out to tear down Dr. Codd’s work, but rather to build on it as someone who has spent decades working with something that was just a theory in Dr. Codd’s day. During the interview he uses solid practical examples drawn from modern business models. He comments on the main players in what is coming to be known as NewDB. He also plugs his own start-up company, VoltDB.

Overall, the effect is both exciting and scary for traditional database practitioners.

Starting small with Big Data

It’s difficult to read a tech blog or IT newsletter lately that does not mention “Big Data”. Definitions of what Big Data really is can be hard to come by. One of the best I have found is by Patrick Schwerdtfeger and is available on YouTube at http://www.youtube.com/watch?v=c4BwefH5Ve8.

A simple explanation is that Big Data is the third wave of computer data.

The first data wave was data entered manually into relational databases and then queried later. This is still going on, judging by the number of “data entry” positions available.

The second data wave is also still with us and is what most people think of when they think of data and databases. This is data that accumulates as a by-product of other computerized processes such as accounting or CRM. It may be imported from text files or spreadsheets. It might be the result of adding the sender of an email to your contacts list or posting something to your Facebook account. The amount of data accumulated can be enormous and often must be stored in “data warehouses” for reporting purposes.

The third data wave is a veritable tsunami of automatically generated “machine data”: cell phone logs, online reservations, GPS locations, CCTV records. This data is all around us all the time and is constantly changing. It is “live” data that is too big to store in any one place but must be accessed and utilized on the fly in other ways.

If you have a website and want to begin taking advantage of Big Data, a good start would be setting up and using Google Analytics. It is a free service that provides amazingly in-depth information about who is visiting your site, how long they stay, whether they return, where they are from, what browser and operating system they use, even what their screen resolution is. If the standard reports and dashboard views are not enough, with some practice you can create custom reports that let you judge the efficacy of your individual web pages and marketing campaigns.

Your company does not have to be big to use Big Data; in fact Big Data can empower smaller firms that do not possess the resources for large data centres and data warehouses.

Database Application Fundamentals: ANSI/SPARC architecture

ANSI/SPARC architecture is a design strategy for relational database applications that separates database applications into three components:

– Physical Data Storage
– A Database Management System
– User Views

Physical Data Storage refers to the actual hard data: the bits and bytes of digital information written to a device such as the hard drive in the server. This storage is managed by the computer’s operating system and is typically arranged in “files” and other types of pre-allocated space on the disk.

Data stored in digital format would be useless without a means of cataloguing it for later retrieval. The database management system, or DBMS, is equivalent to the card-file system at a library, allowing users to quickly search for and find information based on various search criteria. The system also lets the librarian perform the all-important task of filing new or modified information in the appropriate place so that it can be easily located later.

Finally, User Views are the part of your database application that users see and interact with — the so-called “front-end” or graphical user interface (GUI). Graphical user interfaces could include windows and menus on the computer screen as well as output sent to other devices such as printers and fax modems.

The purpose of maintaining this functional separation is flexibility and scalability. Any of these three components can be modified or replaced with minimal disruption to the others. When an application is created based on the principles of ANSI/SPARC architecture, it is fairly easy to create a new interface for existing data, or even to use more than one interface simultaneously. For example, at the office you could connect to your data using an application running on your network, while your salespeople, and even customers and suppliers, could be connecting to the same data using a laptop and a modem from a hotel room, or a web page in a browser at another office on the other side of the world.

Similarly, the actual stored bits and bytes of data can be transferred to a larger, more powerful server that may even (depending on your choice of DBMS) run on a different operating system, with minimal disruption to the application the user sees on the monitor.
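To make the separation a little more concrete, here is a minimal sketch in Python. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the DBMS, and the customers table, the two views and the function names are all invented for illustration. Two "front ends" read the same stored data through their own views, and the data is then copied to a second database without either front end changing.

```python
import sqlite3


def build_database():
    """Storage and DBMS layers: SQLite decides how the rows are physically kept."""
    conn = sqlite3.connect(":memory:")  # in-memory here; a file path works the same way
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT, city TEXT, credit_limit REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city, credit_limit) VALUES (?, ?, ?)",
        [("Acme Ltd", "Toronto", 5000.0), ("Birch Co", "Ottawa", 1200.0)],
    )
    # User views: each audience sees only the columns it needs.
    conn.execute("CREATE VIEW sales_view AS SELECT name, city FROM customers")
    conn.execute("CREATE VIEW credit_view AS SELECT name, credit_limit FROM customers")
    conn.commit()
    return conn


def sales_screen(conn):
    """Hypothetical front end for salespeople."""
    return conn.execute("SELECT name, city FROM sales_view ORDER BY name").fetchall()


def credit_report(conn):
    """Hypothetical front end for the accounting department."""
    return conn.execute(
        "SELECT name, credit_limit FROM credit_view ORDER BY name"
    ).fetchall()


def move_to_new_server(conn):
    """Copy the whole database elsewhere; the front ends above do not change."""
    new_conn = sqlite3.connect(":memory:")
    conn.backup(new_conn)  # Connection.backup needs Python 3.7 or later
    return new_conn
```

Calling sales_screen() and credit_report() on the connection returned by move_to_new_server() gives the same rows as before the move, which is the whole point of keeping the layers apart.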

Business Software: Rent, Buy or Build

My older sister worked for many years as a costume designer in the movie business (or “film” as they prefer it to be called).

Despite the artsy veneer, the movie business is a business, and a very bottom-line-driven one at that. In the costume department they had a very simple rule:

“If you can rent it, then rent it. If you can’t rent it, buy it. If you can’t rent or buy it, then build it.”

The same thinking can be applied to the software you use to run your business.

Custom-coded software is hands down the most expensive and time-consuming way to go. Even larger organizations can seldom justify re-inventing the wheel when the same idea has been covered by dozens or even hundreds of others.

Even buying software tools can seem questionable when the licensing costs are as high as the obsolescence rate.

“Renting” software on a monthly basis can make a lot of sense. Monthly billing satisfies accounting’s “time principle” and requires no capital cost allowance calculations at tax time. If you need more, rent more. If you need less or the product turns out to be crap, then simply stop renting. Many software rental plans include cloud storage by definition and allow access from anywhere and any machine with just a web connection.

Something to think about the next time you consider shelling out hundreds or even thousands of dollars for software…

XBRL

This wasn’t even on my radar until the senior accountant at work asked about creating XBRL files to submit information to an Ontario government agency. Seems that XBRL has been around since 1998. Where have I been?

XBRL stands for eXtensible Business Reporting Language. It is an XML-based language designed to standardize financial report submission. It is already required by many government agencies and financial bodies around the world, including the SEC in the US and just about every government department in Australia.

Although there is software available to aid in the creation of XBRL files or “instance documents”, most tools seem to be geared more toward the creation of custom “taxonomies” (basically XSD schema files).

I plan to see if I can create XBRL output files directly from SQL Server, in a manner not unlike outputting data-driven HTML source code from MySQL using PHP’s “echo” statement.
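As a rough sketch of what that idea might look like, here it is in Python rather than PHP. Everything specific in it is an assumption made for illustration: the facts table with its concept and amount columns, the example.com taxonomy namespace, the entity identifier scheme, and the use of any DB-API connection (sqlite3 works for trying it out) in place of SQL Server. It also leaves out the schemaRef link to a real taxonomy, so the output is only the skeleton of an instance document, not something an agency would accept as filed.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"   # standard XBRL instance namespace
ISO4217 = "http://www.xbrl.org/2003/iso4217"  # standard currency-code namespace
TAXONOMY = "http://example.com/taxonomy"      # hypothetical taxonomy namespace


def fetch_facts(conn):
    """Read (concept, amount) rows from a hypothetical 'facts' table."""
    return conn.execute(
        "SELECT concept, amount FROM facts ORDER BY concept"
    ).fetchall()


def build_instance(rows, entity_id, period_end):
    """Return a simplified XBRL instance document as a string."""
    ET.register_namespace("xbrli", XBRLI)
    ET.register_namespace("ex", TAXONOMY)

    root = ET.Element(f"{{{XBRLI}}}xbrl")
    # The unit below refers to "iso4217:CAD" in its text, so the prefix has to
    # be declared by hand; ElementTree only declares namespaces used in tags.
    root.set("xmlns:iso4217", ISO4217)

    # Context: who is reporting, and as of what date.
    context = ET.SubElement(root, f"{{{XBRLI}}}context", id="c1")
    entity = ET.SubElement(context, f"{{{XBRLI}}}entity")
    identifier = ET.SubElement(
        entity, f"{{{XBRLI}}}identifier",
        scheme="http://example.com/entity-ids",  # hypothetical scheme
    )
    identifier.text = entity_id
    period = ET.SubElement(context, f"{{{XBRLI}}}period")
    ET.SubElement(period, f"{{{XBRLI}}}instant").text = period_end

    # Unit: all amounts are assumed to be Canadian dollars.
    unit = ET.SubElement(root, f"{{{XBRLI}}}unit", id="cad")
    ET.SubElement(unit, f"{{{XBRLI}}}measure").text = "iso4217:CAD"

    # One fact element per database row.
    for concept, amount in rows:
        fact = ET.SubElement(
            root, f"{{{TAXONOMY}}}{concept}",
            contextRef="c1", unitRef="cad", decimals="2",
        )
        fact.text = f"{amount:.2f}"

    return ET.tostring(root, encoding="unicode")
```

Building the document with an XML library instead of echoing strings means special characters in the data get escaped automatically, which is the one real advantage over the echo approach.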

I’ll keep you posted…