Companies today will have data coming from multiple sources that can be used to a better management of their business and processes, if they can combine, ready and understand the data clearly and fast.
The sources of data can be transactional data that are been created in the ERP system for example, sensors in equipment’s and machines providing several events and stream data, real-time analytics from reporting, planning and simulation and text data from log files, interface systems, IoT and etc.
There are two different types of work that a database can do to your company’s application: Transactional and Analytical. The transactional and analytical queries they are different in some spheres but they are nearly the same in other characteristics. Transactional Queries in the OLTP (Online Transaction Processing) can be entering sales orders in the system, billing documents, accounts receivable, displaying a single document entered in the system or checking a single master data while the OLAP (Online Analytical Processing) it is all about analysis of the transactional data like reporting, sales forecast, payment reminders, potential cross selling (a sales person can identify opportunities to selling additional services or products to a customer based on information of his Social Network interests for example).
The data system landscape was built and designed to handle the OLTP and OLAP queries separately as the execution time of OLAP queries are higher than OLTP. Then analytical processing tends to affect performance of daily business that are core processes to the company, like manufacturing and selling. So the enterprise management systems are optimized either to OLTP or OLAP workload because of performance.
It was always considered that the workload on OLTP was write intensive while OLAP was ready-only, however, recently research in multiple companies shows that this is not true.
Those researches showed also that the lookup rate in an OLTP system is only 10% higher than OLAP and the number of inserts is a little higher on OLTP, but OLAP systems also deals with inserts as they need to keep the data updated based on what is been performed in the transactional system. It is also known now that the number of updates in OLTP system is not too high as it did believe. In most of the industries with an updated management system the updated on OLTP represents 12% which means, 88% of the rows stored in OLTP are never updated. Those information leads to a new approach on deleting and inserting records in the database and also combine the two system in one.
83 % on OLTP are read queries which makes sense, you don’t just enter data, but you want to do something with the data, like reading the invoice that you will send to customer.
In OLAP system you have a bit more read queries but the difference is very low. So, why not using the optimization done in OLAP for read optimize, and use that in OLTP system as well? And that’s where the Column Oriented system comes in place, which was possible due to the modernization on hardware area, creating then a single source of truth for real-time analytics, doing everything on the fly and removing aggregates.
The main difference between OLTP and OLAP is that in the OLTP system, a single select deals with more queries returning only few rows results while OLAP calculate aggregations for few columns of a table but returning a large number of rows. In order to keep the OLAP and OLTP systems updated and synchronized, it is required a huge ETL process which takes time and it is complex to extract the data and make it fit in the analytical format.
The Disadvantage on separate OLAP from OLTP
It is true that keeping OLAP and OLTP in separated system optimize workload performance to both of them, but in other hands, there are some disadvantages, such as:
- OLAP will never have the latest updated data as the time to refresh data between the two systems can range from minutes to hours, so many decisions taken based on reports will have to rely on “old” data instead real-time data. Think about in a marketing campaign that needs to be analyzed based on the mood of your customers analyzing what they are saying on Twitter. The number of tweets goes over 350,000 tweets per minute, so, what is happening out there can change in minutes.
- In order to have an acceptable performance, OLAP system needs to use materialized aggregated, reducing the flexibility of the reporting to different user’s needs.
- OLTP and OLAP schemas are very different, then the applications that use both of them requires a complex ETL process to keep the data synchronized.
What can be changed?
It was possible to identify through research on companies’ systems behavior that many fields in a table is not used while those tables get wider and wider. Roughly, more than 50% of the columns are not used in tables where hundreds of columns exist. Among of the columns used there are few distinct values in them, and the other columns are full of NULL or default values, they are not used even once, so those tables contents are very low, near to zero.
By identifying those characteristics, it is possible to think about compression techniques which results in lower memory / data consumption.