For over 30 years the good old RDBMS has been the workhorse of data management applications. Traditional RDBMSs have excelled at managing data for vastly different applications, ranging from high-volume transaction systems to analytics and business intelligence to scientific research. Herein lie their strengths and also their potential weaknesses. In the same way that a dump truck (albeit very useful in construction) will not win a Formula 1 race, an RDBMS has difficulty excelling at specialized data management problems. Massive growth in the volume of data that needs to be managed, in the types of data (structured vs. unstructured vs. multimedia content) and in the sophistication of analyses and advanced statistical algorithms has led to a new wave of innovation in data management. This entry is the first in a series of posts in which I will introduce different architectural approaches to data management and examine the vendors, products and approaches that are emerging on the market. I will compare their usage patterns, strengths and weaknesses and link these technologies back to the business capabilities they enable. I will also keep an eye on new announcements in the data management space and provide a hopefully usable digest of features and benefits.
So let's begin in this post by laying out the landscape of commercial offerings and their basic tenets:
Traditional RDBMS – Oracle, DB2, SQL Server, MySQL
Very mature technologies that excel at mixed workloads, high-volume transactional systems and medium-volume analytical systems. They work best when managed well, but response times can be unpredictable because different query types have different I/O and CPU requirements. They can be deployed on a wide range of hardware architectures and, as a consequence, often fail to take advantage of the strengths of specialized hardware. They were traditionally developed for SMP (symmetric multiprocessing) systems and have evolved to support MPP (massively parallel processing) deployments, although these require very careful planning and operations to work well.
Data Warehouse Appliances – Teradata, Netezza, DATAllegro (now Microsoft), HP Neoview
Usually a highly integrated bundle of storage, MPP processing nodes, fast I/O buses, a query coordinator / access server and the underlying OS to run it all. Appliances excel at analyzing high-volume tables such as fact tables in large data warehouses. The all-in-one design, the use of specialized hardware such as Field Programmable Gate Arrays (think super-fast stream processing for filtering and projecting) and the ability to parallelize queries optimally over I/O and processing nodes let them achieve throughput on large data streams that can be orders of magnitude higher than that of an RDBMS. Another advantage for analytical processing is that the no-index design can reduce I/O and storage requirements and makes the appliance much easier to administer. The classic disadvantage is that when a typical query returns only a small fraction of a very large table, the full-scan approach of appliances is very inefficient.
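To make the full-scan trade-off a little more concrete, here is a toy sketch in Python. It is not how any of the products above are implemented; the table, the function names (partition, scan_node, parallel_full_scan, index_lookup) and the node count are all made up for illustration. It only counts how many rows each approach has to read: a scan that is split across nodes and filters and projects as it goes, versus a lookup through an index.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact table: (order_id, region, amount). Purely illustrative data.
ROWS = [(i, "EU" if i % 4 == 0 else "US", i * 10) for i in range(1000)]

def partition(rows, n_nodes):
    """Spread rows round-robin over n_nodes slices (stand-ins for MPP nodes)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def scan_node(rows, predicate, columns):
    """Filter and project one node's slice; returns (result_rows, rows_read)."""
    result = [tuple(row[c] for c in columns) for row in rows if predicate(row)]
    return result, len(rows)

def parallel_full_scan(rows, predicate, columns, n_nodes=4):
    """Scan every slice and merge the results, with no index involved.

    Threads are used only to mirror the structure of a parallel scan;
    they do not make this toy example faster.
    """
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        outputs = list(pool.map(lambda p: scan_node(p, predicate, columns), parts))
    result = [r for out, _ in outputs for r in out]
    rows_read = sum(n for _, n in outputs)
    return result, rows_read

def index_lookup(index, key, columns):
    """Fetch a single row through a key index; returns (result_rows, rows_read)."""
    row = index.get(key)
    result = [tuple(row[c] for c in columns)] if row is not None else []
    return result, len(result)

# A simple dict stands in for the kind of index a traditional RDBMS would keep.
INDEX = {row[0]: row for row in ROWS}

# Analytical query: total amount for one region. Reading every row is unavoidable.
eu_rows, eu_read = parallel_full_scan(ROWS, lambda r: r[1] == "EU", columns=(2,))
eu_total = sum(amount for (amount,) in eu_rows)

# Point query: one order out of the whole table.
scan_hit, scan_read = parallel_full_scan(ROWS, lambda r: r[0] == 42, columns=(0, 2))
index_hit, index_read = index_lookup(INDEX, 42, columns=(0, 2))

print("analytical query read", eu_read, "rows")
print("point query via scan read", scan_read, "rows; via index read", index_read)
```

Both point queries return the same single row, but the scan reads the whole table to find it while the index lookup reads one row. For the regional total the scan does no wasted work, which is the kind of query the appliances are built for.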
Special Use Data Management
Special-use databases cover a wide range, including text and document search engines, XML processors, scientific and statistical applications, real-time and streaming databases, pattern matching and intelligence applications, multimedia storage, search and streaming, and many more. Each of these applications requires its own specialized approach, architecture and processing. I will cover these areas of data management on an ad hoc basis, when something comes up that I think should be discussed in terms of business applicability or as an opportunity to create breakthrough service levels that were formerly unattainable with more traditional approaches.
Now that we have a high-level overview of the space, I am looking forward to going into more detail about the different products, architectures and use cases in the many posts to follow.

