Just finished listening to an interview with MIT professor Dr. Michael Stonebraker explaining why traditional databases are obsolete.
The interview is available here on the Software Engineering Radio site:
Seventeen years ago, as a newbie professor at a private technical college, I was thrown into teaching database fundamentals — a theory course that no other professor appeared to want (an attitude reflected in many of my students, it seemed). Preparing for the course, I pored over my own undergrad notes and reviewed the seminal work of Dr. E. F. Codd, whose 1970 paper eventually formed the foundation of relational databases as we know them today. Everything that I have learned about databases since then has adhered to Dr. Codd’s model — until now.
Dr. Stonebraker is a Dr. Codd for the 21st century, and in his hour-long interview he explains why databases have to move on from their traditional row-based model to one that more easily handles the vast amounts of data being generated and used today. Dr. Stonebraker is not out to tear down Dr. Codd’s work, but rather to build on it, as someone who has spent decades working with what was just a theory in Dr. Codd’s day. During the interview he uses solid practical examples drawn from modern business models, comments on the main players in what is coming to be known as NewSQL, and plugs his own start-up company, VoltDB.
Overall, the effect is both exciting and scary for traditional database practitioners.
It’s difficult to read a tech blog or IT newsletter lately that does not mention “Big Data”. Definitions of what Big Data really is can be hard to come by. One of the best I have found is by Patrick Schwerdtfeger and is available on YouTube at http://www.youtube.com/watch?v=c4BwefH5Ve8.
A simple explanation is that Big Data is the third wave of computer data.
The first data wave was data entered manually into databases and then queried later. This is still going on, judging by the number of “data entry” positions available.
The second data wave is also still with us and is what most people think of when they think of data and databases. This is data that accumulates as a by-product of other computerized processes such as accounting or CRM. It may be imported from text files or spreadsheets. It might be the result of adding the sender of an email to your contacts list or posting something to your Facebook account. The amount of data accumulated can be enormous and often must be stored in “data warehouses” for reporting purposes.
The third data wave is a veritable tsunami of automatically generated “machine data”: cell phone logs, online reservations, GPS locations, CCTV records. This data is all around us all the time and is constantly changing. It is “live” data, often too big to store in any one place, that must be accessed and utilized on the fly.
If you have a website and want to begin taking advantage of Big Data, a good start would be setting up and using Google Analytics. It is a free service that provides amazingly in-depth information about who is visiting your site, how long they stay, whether they return, where they are from, what browser and operating system they use, even what their screen resolution is. If the standard reports and dashboard views are not enough, with some practice you can create custom reports that let you judge the efficacy of your individual web pages and marketing campaigns.
Your company does not have to be big to use Big Data; in fact Big Data can empower smaller firms that do not possess the resources for large data centres and data warehouses.
ANSI/SPARC architecture is a design strategy for relational database applications that separates an application into three components:
– Physical Data Storage
– A Database Management System
– User Views
Physical Data Storage refers to the actual hard data — the bits and bytes of digital information written to a device like the hard-drive in the server. This storage is managed by the computer’s operating system and typically arranged in “files” and other types of pre-allocated space on the disk.
Data stored in digital format would be useless without a means of cataloging it for later retrieval. The database management system, or DBMS, is equivalent to the card-file system at a library, allowing users to quickly search for and find information based on various search criteria. The system also allows the librarian the all-important task of filing new or modified information in the appropriate place so that it can be easily located later.
Finally, User Views are the part of your database application that users see and interact with — the so-called “front-end” or graphical user interface (GUI). Graphical user interfaces could include windows and menus on the computer screen as well as output sent to other devices such as printers and fax modems.
The purpose of maintaining this functional separation is flexibility and scalability. Any of these three components can be modified or replaced with minimal disruption to the others. When an application is created based on the principles of ANSI/SPARC architecture, it is fairly easy to create a new interface for existing data, or even to use more than one interface simultaneously. For example, at the office you could connect to your data using an application running on your network, while your salespeople, and even customers and suppliers, connect to the same data from a laptop and a modem in a hotel room, or through a web page in a browser on the other side of the world.
Similarly, the actual stored bits and bytes of data can be transferred to a larger, more powerful server that may even (depending on your choice of DBMS) be running using a different operating system with minimal disruption to the application the user sees on the monitor.
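The three-layer separation described above can be sketched in a few lines of Python. This is only an illustrative sketch, using the standard library’s sqlite3 module as a stand-in DBMS; the table, column, and function names are made up for the example, not taken from any real system. The point is that both “user views” share the same DBMS and storage, so either can be replaced without touching the data underneath:

```python
import sqlite3

# Physical storage + DBMS: sqlite3 manages the bits and bytes for us.
# (":memory:" keeps the example self-contained; a file path would be
# real on-disk storage managed by the operating system.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Acme Ltd')")

# User views: two different front-ends over the same stored data.
def text_view(conn):
    """A plain-text 'report' view, e.g. for a terminal or printer."""
    rows = conn.execute("SELECT id, name FROM customer").fetchall()
    return "\n".join(f"{i}: {n}" for i, n in rows)

def html_view(conn):
    """An HTML view for a browser, built from the very same query."""
    rows = conn.execute("SELECT id, name FROM customer").fetchall()
    return "<ul>" + "".join(f"<li>{n}</li>" for _, n in rows) + "</ul>"

print(text_view(conn))   # the network-application style of access
print(html_view(conn))   # the web-page style of access, same data
```

Adding a third view, or moving the data to a different server, would not require changing the other layers — which is exactly the flexibility ANSI/SPARC is after.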
My older sister worked for many years as a costume designer in the movie business (or “film”, as those in the business prefer to call it).
Despite the artsy veneer, the movie business is a business and a very bottom line driven one at that. In the costume department they had a very simple rule:
“If you can rent it, then rent it. If you can’t rent it, buy it. If you can’t rent or buy it, then build it.”
The same thinking can be applied to the software you use to run your business.
Custom-coded software is hands down the most expensive and time-consuming way to go. Even large organizations can seldom justify reinventing the wheel when the same ground has already been covered by dozens or even hundreds of others.
Even buying software tools can seem questionable when licensing costs are high and the pace of obsolescence is rapid.
“Renting” software on a monthly basis can make a lot of sense. Monthly billing satisfies accounting’s “time principle” and requires no capital cost allowance calculations at tax time. If you need more, rent more. If you need less, or the product turns out to be crap, simply stop renting. Many software rental plans include cloud storage and allow access from anywhere, on any machine with a web connection.
Something to think about the next time you consider shelling out hundreds or even thousands of dollars for software…
It seems that not everyone likes the idea of Microsoft’s new Report Builder 3.
Sure, it’s loaded with features like shared datasets, an Office-like interface, and wizards that make it easy for power users to make the move from spreadsheets to ad hoc reports that they design themselves. But it seems this can cause some job-security angst amongst uneasy IT people used to holding the keys to the kingdom and doling out half-baked reports to the data-starved masses.
There’s really no need for concern, as the shared datasets still require the skills of a DBA or developer familiar with the back-end data and all its esoteric table names, column names, and datatypes. In fact, freedom from the chore of mundane report creation might actually mean that DBA and developer skills can be applied where they are needed: in more creative database applications and automation projects.
Here’s hoping that requests from consumers of data for Report Builder 3 grow in intensity and persistence until all the political roadblocks at your workplace are flushed away.
This wasn’t even on my radar until the senior accountant at work asked about creating XBRL files to submit information to an Ontario government agency. It seems that XBRL has been around since 1998. Where have I been?
XBRL is an acronym for eXtensible Business Reporting Language. It is an XML variant designed to standardize financial report submission. It is already required by many government agencies and financial bodies around the world, including the SEC in the US and just about every government department in Australia.
Although there is software available to aid in the creation of XBRL files, or “instance documents”, most tools seem to be geared more toward the creation of custom “taxonomies” (basically XSD schema files).
I plan to see if I can create XBRL output files directly from SQL Server, in a manner not unlike outputting data-driven HTML from MySQL using PHP’s echo function.
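As a rough proof of the “echo” idea, here is a minimal Python sketch that turns query rows into an XBRL-style instance document. To be clear about assumptions: the element names, context reference, and values below are invented for illustration and are not taken from any real XBRL taxonomy, and the hard-coded rows stand in for a result set that would actually come from a SQL Server query. A real instance document would also need proper namespace declarations and context elements:

```python
import xml.etree.ElementTree as ET

# Stand-in for rows returned by a SQL Server query:
# (element_name, context_ref, value) -- all values are made up.
rows = [
    ("Assets", "FY2011", "1500000"),
    ("Liabilities", "FY2011", "900000"),
]

def rows_to_xbrl(rows):
    """Build a bare-bones XBRL-style document from query rows."""
    root = ET.Element("xbrl")  # a real instance needs xmlns declarations
    for name, ctx, value in rows:
        fact = ET.SubElement(root, name, contextRef=ctx)
        fact.text = value
    return ET.tostring(root, encoding="unicode")

print(rows_to_xbrl(rows))
```

The same loop-and-emit pattern would work whether the markup being generated is HTML, XBRL, or any other XML dialect; only the element names and the validation requirements change.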
I’ll keep you posted…
I’m just discussing my sister’s website with her, and we have decided to try a “blended” approach using normal “test and FTP” content alongside newer features such as WordPress.
I’m guessing that once she starts blogging she won’t stop.
It seems that Microsoft has gone FLAC (Four Letter Acronym Crazy) with its enterprise database products. What used to be Enterprise Manager for SQL Server has become SQL Server Management Studio (SSMS). In addition there is SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS).
In my new position I spend most of my time using the first two: SSMS to test queries and create database objects, and SSRS to create reports based on those queries. We are in the process of moving everything from Seagate Crystal Reports 7 into SSRS, and comparing the two side by side shows how far things have progressed in the last few years. One of my favourite features is report parameters: if you create a date parameter, for example, the report automatically presents the user with a calendar control for selecting a date. Exporting to Excel (the most common user request) is complicated by the XML format of the reports, but in general it works well.
SSIS has a bit more of a learning curve. The last time I used its predecessor, DTS, was in SQL Server 7. The scripting capabilities are much improved, allowing VB.NET and C# coding complete with IntelliSense. There is a wealth of information available online to help me along as I gradually automate all the mundane daily processes so I have more time for reporting.
Soon, I’ll get to tackle the last of the four letter acronyms: SSAS (SQL Server Analysis Services). I’ll let you know how it goes.
I have a week off before starting a new job as a database analyst for a credit union in Kitchener.
I have spent the last couple of weeks reviewing Microsoft SQL Server Reporting Services and am very impressed so far. I used a crude version of Reporting Services many years ago with VB 6, and I must say that the most recent version rivals or surpasses any reporting tool I have used to date, from Access reports to PowerBuilder DataWindows to Crystal Reports.
I am really looking forward to working with data instead of students for a change.
I just finished delivering a four-week course on using ASP.NET for e-commerce. I was a little bit rusty; the last time I taught ASP.NET was three years ago at Centennial College. Thankfully, Visual Studio came to the rescue as usual, and I was soon creating connection strings and dragging and dropping with ease. The inventor of Microsoft’s IntelliSense feature should win the Nobel Prize for computing, if there were such a thing.
I can still remember coding an entire desktop database application using VB 4 and having to memorize all variable and object names. The trick then was to name things in mixed case, type all your code in lower case, and watch the case change as you moved to each new line, verifying that object and variable names were spelled correctly and would be recognized during compilation. Now C# code almost writes itself, complete with drop-down argument lists for overloaded functions. Eclipse has a similar feature and Dreamweaver has “code hinting”, but neither seems to work as seamlessly as IntelliSense does for me. I have taken to using Visual Studio to write and edit HTML and XML files because of it.
The course used the 2008 version of Visual Studio. I haven’t had a chance to try 2010 yet, but it should be more of the same…