Data Layout and Access Patterns:
Column stores physically partition the database vertically, into columns, rather than horizontally, into rows. In the employee example above, a row-oriented database stores all the data for one entity together, whereas a column-oriented database stores all the names together, all the ids together, and so on. Storing data this way reduces processing, because a query reads only the attributes it needs instead of reading every attribute and then discarding the unwanted ones.
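To make the two layouts concrete, here is a minimal Python sketch. The employee records, field names and helper functions are illustrative assumptions, not part of any real database engine; in-memory lists stand in for on-disk storage.

```python
# Hypothetical employee records; names and values are illustrative only.
employees = [
    {"emp_id": 1, "name": "Asha", "salary": 52000},
    {"emp_id": 2, "name": "Ben", "salary": 48000},
    {"emp_id": 3, "name": "Chen", "salary": 61000},
]

# Row-oriented layout: all values of one record sit next to each other.
row_store = [(e["emp_id"], e["name"], e["salary"]) for e in employees]

# Column-oriented layout: all values of one attribute sit next to each other.
column_store = {
    "emp_id": [e["emp_id"] for e in employees],
    "name": [e["name"] for e in employees],
    "salary": [e["salary"] for e in employees],
}


def total_salary_rows(rows):
    """Row layout: every full record is visited and two fields are discarded."""
    return sum(salary for _emp_id, _name, salary in rows)


def total_salary_columns(columns):
    """Column layout: only the salary column is read."""
    return sum(columns["salary"])


def fetch_record_columns(columns, position):
    """Rebuilding one record needs one lookup in every column."""
    return tuple(columns[name][position] for name in ("emp_id", "name", "salary"))
```

Both `total_salary_*` functions return the same total, but the column version touches a single list, while `fetch_record_columns` shows the opposite cost: a single record has to be gathered from every column.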
Trade Offs:
Along with these benefits, there are some interesting trade-offs. For example, if a query accesses only a single record, a row-based system will be faster, since it needs to read only one record, whereas the column approach must read from every column to gather all the data for that record. But as the number of records involved increases, the column-oriented model becomes the better fit. For the same reason, the column model is best for analytical systems, where a lot of data needs to be accessed for analysis.

Column Architecture:
Beyond vertical partitioning, column stores have many other architectural features. The architecture is designed to maximize performance on analytical workloads.
· Virtual IDs: Every value in a column is identified by a distinct id that ties it to its record and distinguishes it from the other values; this id is typically just the value's position in the column.
· Block-oriented and vectorized processing: Data is passed between operators in cache-sized blocks, which are easy to iterate through.
· Late materialization: Column stores not only store data in the form of columns, they also process it column by column, combining columns into rows as late as possible.
· Column-specific compression: Because data from the same attribute is stored together, column stores can achieve higher compression ratios.
· Direct operation on compressed data: Working directly on compressed data saves a lot of processing time.
· Efficient join implementations: More efficient joins can be designed when data is stored in the form of columns.
· Redundant representation of individual columns in different sort orders: Sorting takes less effort on a per-column basis, so copies of a column can be kept in different sort orders.
· Database cracking and adaptive indexing: A column is incrementally rearranged or indexed whenever a query accesses it, and the data is left better organised for the next query.
· Efficient loading architectures: One concern about column databases is that loading and updating may be slow, since the data is stored in compressed form; efficient loading architectures are designed to address this.

Figure 1: Performance of C-Store versus a commercial database system

Contrast with standard row based storage

Figure 2

A row-oriented database is write-optimised, whereas a column-oriented database is read-optimised. In large databases, writes are performed less often than reads. Figure 1 demonstrates the physical differences in architecture between the column-based and the traditional row-based database models. Row-oriented databases may be faster for systems that involve a lot of transactions, i.e. OLTP: because they store all the data for a record together, a whole record is easy to retrieve. Column databases provide better solutions for analytical systems, i.e. OLAP.
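The column-specific compression and direct operation on compressed data listed among the architectural features above can be illustrated with run-length encoding. This is a minimal sketch that assumes a sorted `rank` column with hypothetical values; real column stores choose among several encodings, and the function names here are invented for illustration.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def count_equal(runs, target):
    """Count matching rows directly on the runs, without decompressing."""
    return sum(length for value, length in runs if value == target)


def rle_decode(runs):
    """Expand the runs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical 'rank' column, sorted so that equal values are adjacent.
rank_column = ["analyst"] * 4 + ["manager"] * 2 + ["director"]
runs = rle_encode(rank_column)
```

Because a single column holds values of one attribute, long runs of equal values are common, which is what makes this encoding effective. `count_equal` answers a query by looking at three runs instead of seven values.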
Column databases store datasets column by column, rather than row by row as in row-based databases. Research by Abadi, Madden and Hachem showed that column-stores were faster than row-stores when reading large datasets optimised for analysis. This conclusion rested on four main advantages that revealed themselves in experiments: late materialisation, block iteration, compression and invisible joins.

The differentiating feature of this database model is that the data is stored in columns in place of rows. Data is stored in long columns, and each value carries a serial number (its position) that links it to the corresponding values in the other columns.
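The positional link between columns is also what makes late materialisation possible. The following Python sketch assumes hypothetical columns and helper names; it filters each column separately, combines the resulting position lists, and builds output tuples only at the end.

```python
# Hypothetical columns; values at the same position belong to the same record.
columns = {
    "emp_id": [1, 2, 3, 4],
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "salary": [52000, 48000, 61000, 75000],
    "age": [31, 45, 38, 52],
}


def select_positions(column, predicate):
    """Scan a single column and return the positions that satisfy the predicate."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def late_materialize(cols, positions, wanted):
    """Build output tuples only for qualifying positions and requested columns."""
    return [tuple(cols[name][pos] for name in wanted) for pos in positions]


# Step 1: filter each predicate column on its own, producing position lists.
high_salary = select_positions(columns["salary"], lambda s: s > 50000)
over_35 = select_positions(columns["age"], lambda a: a > 35)

# Step 2: intersect the position lists; no tuples have been built yet.
matching = sorted(set(high_salary) & set(over_35))

# Step 3: only now stitch the requested columns together.
result = late_materialize(columns, matching, ("emp_id", "name"))
```

Only the `salary` and `age` columns are scanned in full; `emp_id` and `name` are read only at the two positions that survive both filters.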
So if a query needs data from only one or two columns, it has to read only the relevant columns. A row-oriented database system, on the other hand, would have to read far more data to gather the same result, because every row is read in full.

The column-oriented approach is read-optimised. Suppose an employee table has 10 columns, namely: emp_id, name, rank, post, salary, age, experience, qualification, marital_status, no_children.
But we frequently access only three columns (e.g. emp_id, name, rank), so there is no need to read the data in all the irrelevant columns. We can get the required information by querying just those 3 columns instead of all 10. This saves a lot of processing, especially when working with huge datasets.

Since data is stored and read in blocks, a single block that holds data for the employee table contains the data for a single column in a column-oriented database system, and the data for a row or entity in a row-oriented database system. In the above example, since we assumed that three columns (emp_id, name and rank) are accessed repeatedly, a column-oriented database system will access only the blocks that contain those columns.

Present status
In recent years there has been continuous interest in column-oriented databases, also called column stores. Initial efforts came from academic systems such as MonetDB and VectorWise. The commercial industry followed this trend: by the end of 2013, all major vendors (IBM, Microsoft and Oracle, to name a few) had adopted column-oriented technology. Other products, such as Aster Data, Greenplum, Infobright and ParAccel, have also benefited from the column approach to storing data.
Oracle Exadata has also incorporated the column approach in some form. These products have shown benefits over the traditional row-based approach in data analytics and compression.

Future prospects
Despite the success of column-based systems in the commercial and academic worlds, there is huge scope for future research in many interesting directions. There may be work on hybrid systems that are partially column-oriented.
Systems that choose between row- and column-oriented storage by adapting to the requirements will become really important. Such systems will help users decide which approach is best suited to their requirements. Microsoft announced columnar storage with advanced query processing for its SQL Server product. It does not yet have full functionality, but it is a step in this direction. It is expected that these ideas will be implemented on the other platforms in the near future.