and Access Patterns:
Column stores physically partition a database table vertically, storing each
column separately rather than storing whole rows together. In the employee
example above, the row-oriented approach stores all the data for a single
entity together, whereas the column-oriented approach stores all the names
together, all the ids together, and so on. Storing data this way reduces
processing, because a query reads only the attributes it needs rather than
reading every attribute and then discarding the unwanted ones.
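The difference between the two layouts can be sketched in a few lines of
Python. This is only an illustration, not how any particular database stores
its data: the records, the field names and the helper to_columns are all
hypothetical.

```python
# Hypothetical employee records; field names and values are illustrative only.
rows = [
    {"emp_id": 1, "name": "Asha", "salary": 52000},
    {"emp_id": 2, "name": "Ben", "salary": 61000},
    {"emp_id": 3, "name": "Chen", "salary": 58000},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per attribute."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited in full, although only salary is used.
total_from_rows = sum(record["salary"] for record in rows)

# Column layout: only the salary column is touched.
total_from_columns = sum(columns["salary"])
```

Both totals are the same; what differs is how much unrelated data (names,
ids) has to be passed over to compute them.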
Along with these benefits, there are also some interesting trade-offs. For
example, if a query accesses only a single record, a row-based system will
usually be faster, since it needs to read only one record. The column
approach, by contrast, has to visit every column to collect the values for
that record. As the number of records involved increases and only a few
attributes are needed, however, the column-oriented model becomes a better
fit. For the same reason, the column model suits analytical systems, where a
large amount of data must be read in order to analyse it.
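A rough way to see this trade-off is to count the storage blocks each layout
has to touch. The sketch below uses a deliberately simplified cost model that
is an assumption, not a measurement: all values have the same width, a block
holds a fixed number of values, and the position of the wanted record is
already known. The function names and the numbers are hypothetical.

```python
import math


def row_store_blocks(n_rows_read, n_cols_total, values_per_block):
    """Blocks touched when whole rows are stored contiguously."""
    return math.ceil(n_rows_read * n_cols_total / values_per_block)


def column_store_blocks(n_rows_read, n_cols_read, values_per_block):
    """Blocks touched when each column lives in its own run of blocks."""
    return n_cols_read * math.ceil(n_rows_read / values_per_block)


N_COLS = 10    # attributes in the table (assumed)
BLOCK = 1000   # values per block (assumed)

# Fetching one complete record.
point_row = row_store_blocks(1, N_COLS, BLOCK)
point_col = column_store_blocks(1, N_COLS, BLOCK)

# Reading one attribute for a million records.
scan_row = row_store_blocks(1_000_000, N_COLS, BLOCK)
scan_col = column_store_blocks(1_000_000, 1, BLOCK)
```

Under these assumptions the single-record fetch touches 1 block in the row
layout and 10 in the column layout, while the one-attribute scan touches
10,000 blocks in the row layout and 1,000 in the column layout.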
Beyond vertical partitioning, column stores have several other architectural
features, all designed to maximize performance on analytical workloads.
· Virtual Ids
Each value in a column is identified by its position (offset) in that column,
so no explicit record id needs to be stored alongside it.
· Block Oriented and Vectorized processing
Values are passed between operators in cache-sized blocks rather than one
tuple at a time, which makes them easy to iterate over in tight loops.
· Late materialization
Column stores not only store data in the form of columns, they also process
it column by column, delaying the assembly of full tuples for as long as
possible.
· Column specific compression
By storing data from the same attribute together, column stores can achieve
higher compression ratios.
· Direct operation on compressed data
Working with compressed data saves a lot of
· Efficient join implementations
More efficient join algorithms can be designed when data is stored in the
form of columns.
· Redundant representation of individual columns in different sort orders
Keeping copies of a column sorted on different attributes lets queries that
filter on those attributes run faster.
· Database cracking and adaptive indexing
Instead of building a full index up front, the column is incrementally
reorganised whenever a query accesses it. The data is left usable for the next
· Efficient loading architectures
One concern about column databases is that they
may have speed issues when loading or updating since the data is stored in
Figure 1 Performance of C-Store versus a commercial database system
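Two of the features listed above, column-specific compression and direct
operation on compressed data, can be illustrated with run-length encoding
(RLE). The sketch below is a minimal example under assumptions: RLE is only
one of several possible compression schemes, the rank values are made up, and
the helper names rle_encode, rle_sum and rle_count_equal are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum the column without decompressing it."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """Count the rows equal to target, one run at a time."""
    return sum(length for value, length in runs if value == target)


# A sorted column with few distinct values compresses well (made-up data).
rank_column = [1, 1, 1, 2, 2, 3, 3, 3, 3]
runs = rle_encode(rank_column)

column_total = rle_sum(runs)
rows_with_rank_3 = rle_count_equal(runs, 3)
```

Here nine values shrink to three runs, and both aggregates are computed from
the runs alone. The benefit depends on the data: a column with no repeated
neighbouring values would not shrink at all under this scheme.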
Contrast with standard row based storage
Row-oriented databases are write-optimised, whereas column-oriented databases
are read-optimised. In large analytical databases, writes are typically less
common than reads. Figure 1 demonstrates the physical differences in
architecture of both the column-based and traditional row-based database
models.
Row-oriented databases may be faster for systems that involve a lot of
transactions, i.e. OLTP, since they store all the data for an entity together
and so make it easy to retrieve. Column databases provide better solutions
for analytical systems, i.e. OLAP.
Column-oriented databases store datasets column by column, rather than row by
row as in row-based databases. Abadi, Madden and Hachem conducted research
showing that column stores were faster than row stores when reading large
datasets optimised for analysis. This conclusion rested on four main
advantages that revealed themselves in experiments: late materialisation,
block iteration, compression and invisible joins.
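Of these four, late materialisation is perhaps the least intuitive, so a
small sketch may help. It is not the implementation used in the experiments
mentioned above; the table contents, the query and both function names are
assumptions made for illustration. The query is "names of employees with a
given rank whose salary exceeds a threshold".

```python
# Hypothetical column-wise employee table.
columns = {
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "rank": [2, 1, 2, 3],
    "salary": [52000, 61000, 58000, 75000],
}


def early_materialization(cols, min_salary, rank):
    """Stitch full tuples together first, then filter them."""
    tuples = list(zip(cols["name"], cols["rank"], cols["salary"]))
    return [name for name, r, s in tuples if s > min_salary and r == rank]


def late_materialization(cols, min_salary, rank):
    """Filter on positions first; fetch names only for surviving positions."""
    salary_hits = [i for i, s in enumerate(cols["salary"]) if s > min_salary]
    positions = [i for i in salary_hits if cols["rank"][i] == rank]
    return [cols["name"][i] for i in positions]


early = early_materialization(columns, 55000, 2)
late = late_materialization(columns, 55000, 2)
```

Both functions return the same names. The late variant reads the name column
only at the qualifying positions and never builds tuples for rows that are
filtered out.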
The differentiating feature of this database model is that data is stored in
columns instead of rows. Data is stored in long columns, with position
numbers linking each value to its counterparts in the other columns. So if a
query retrieves data from only one or two columns, it needs to read only the
relevant columns. With a row-oriented database system, on the other hand, the
same query would have to read far more data to gather the required values.
The column-oriented approach is read-optimised. Suppose an employee table has
10 columns, namely emp_id, name, rank, post, salary, age, experience,
qualification, marital_status and no_children, but we frequently access only
three of them (e.g. emp_id, name, rank). There is no need to read the data in
the irrelevant columns; we can get the required information by reading just
those 3 columns instead of all 10. This saves a lot of processing, especially
when working with huge datasets.
Since data is stored in and read from storage in blocks, a single block of
the employee table holds data for a single column in a column-oriented
database system, and data for a row or entity in a row-oriented one. In the
above example, since we assumed that three columns, i.e. name, id and rank,
need to be accessed repeatedly, a column-oriented database system will only
access the blocks that contain the mentioned columns.
In recent years there has been continuous interest in column-oriented
databases, also called column stores. Initial efforts came from academic
systems such as MonetDB, VectorWise, etc. This trend was followed by the
commercial industry. In fact, by the end of 2013 all major vendors (IBM,
Microsoft and Oracle, to name a few) had shipped column-oriented
implementations in their database products. Other products such as Aster
Data, Greenplum, Infobright, ParAccel, et cetera have benefited from the
column approach to storing data. Oracle Exadata has also incorporated the
column approach in some way. These products have shown benefits over the
traditional row-based approach in data analytics and compression.
Despite the success of column-based systems in commercial and academic
settings, there is huge scope for future research in many interesting
directions. There may be work on hybrid systems that are partially
column-oriented. Systems that choose between row- and column-oriented
storage, adapting to the requirements, will become really important. Such
systems will help users decide which approach is best suited to their
requirements. Microsoft announced columnar storage with advanced query
processing for its SQL Server product. It doesn’t yet have full
functionality, but it is a step in this direction. It is expected that these
ideas will be implemented on other platforms in the near future.