Cloud-based enterprise data warehouses have flexibility and capabilities that are usually not present in on-premises solutions, but require a strategic approach.
Let’s get one thing out of the way upfront: today it’s not a matter of ‘if’ you go to the cloud to manage enterprise data, but rather, when.
Migrating a data warehouse/data lake from a legacy environment typically requires a massive upfront investment in resources and time. It doesn’t have to be that way, but it is important to understand that there is much to consider prior to the migration.
Businesses need to take a strategic approach if they are to streamline this process and ultimately reduce costly man hours through rapid deployment.
On-premises vs cloud
Let’s clarify some issues first. On-premises data warehouses/data lakes collect, store and analyse data on on-site servers, which means the business must manage its own hardware infrastructure. However, keeping data on-premises is not always a viable option.
Over the past couple of years, organisations have started moving their data warehouses to the cloud. Here’s why:
Security and data protection: This is the top driver for cloud migration.
Data modernisation: The next biggest driver, this primarily involves moving data from legacy to modern databases.
Upfront costs: On-premises data warehouses require upfront expenses on hardware infrastructure. When working with cloud data warehouses, these expenses are not necessary.
Ongoing costs: The cloud offers a low, pay-as-you-go model, while businesses with traditional data warehouses must deal with upgrade and maintenance costs.
Performance: Cloud-based data warehouse architectures use the ‘extract, load, transform’ (ELT) process, which loads raw data first and transforms it inside the warehouse. This can make data processing much faster than on-premises options.
Flexibility: Cloud data warehouses are designed to work with big data formats and structures. Traditional relational options are simply designed to integrate similarly structured data.
Scale: The elasticity of the cloud enables companies to scale up to handle big datasets. In addition, cloud-based data warehousing options can scale down as needed. Enterprises can’t easily do this with traditional approaches.
Enterprise data warehouse (EDW) migration challenges
Migrating data from an on-premises warehouse to a cloud-based environment creates several challenges – the important thing is to understand them.
When migrating to a cloud-based EDW platform, companies need to consider critical migration and design implications. To ensure downtime is kept to a minimum, a seamless integration strategy must be designed for all EDW functionalities being migrated to the cloud.
In this way it becomes possible to migrate data to the new environment without interrupting remaining on-premises processes.
Migrating data to the cloud
Migrating large volumes of data from one repository to another can often be time-consuming. When attempting to migrate an EDW, it is critical to accurately define data sets and volumes early in the process.
Knowing the data sets and volumes makes it possible to choose the optimal connectivity for data movement. An accurate project schedule for data migration time frames is also important.
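As a rough illustration of how defined volumes feed into a schedule, the sketch below estimates transfer time from data volume and link speed. The data set names, sizes, link speed and the 70% efficiency factor are all hypothetical assumptions, not measurements; real throughput depends on the network and the provider's ingestion tooling.

```python
def estimate_transfer_hours(volume_gb: float, link_mbps: float,
                            efficiency: float = 0.7) -> float:
    """Estimate hours needed to move volume_gb over a link of link_mbps.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention); 0.7 is a guess.
    """
    if volume_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("volume must be >= 0, link > 0, efficiency in (0, 1]")
    megabits = volume_gb * 8 * 1000          # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical inventory of data sets and their sizes in GB.
datasets_gb = {"sales": 2000, "finance": 500, "hr": 50}

for name, size_gb in datasets_gb.items():
    hours = estimate_transfer_hours(size_gb, link_mbps=1000)
    print(f"{name}: about {hours:.1f} hours over a 1 Gbps link")
```

Running the same estimate against a few candidate link speeds gives a quick sense of whether the available connectivity fits the planned migration window.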
Data integration and access
Enabling the flow of data from on-premises repositories to the cloud would traditionally require an extract, transform and load (ETL) process.
Different ETL tools need to be validated to ensure proper cloud operation and support, as well as the features required for integration with cloud-native EDW technologies. The last step is to recreate the transformations that produce final data models in the new cloud environment.
Although the upfront investment in time, labour and money might be steep, it is far better in the long term to follow an ELT rather than ETL paradigm, as this will allow for faster deployments of cloud-based EDW.
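To make the difference concrete, here is a minimal sketch of the ELT pattern: raw rows are loaded unchanged, and the transformation then runs as SQL inside the warehouse. An in-memory SQLite database stands in for the cloud EDW, and the table names, columns and sample rows are hypothetical.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

    # Extract + Load: land the source rows untouched in a staging table.
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: clean and aggregate where the data now lives.
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT UPPER(TRIM(region)) AS region,
               SUM(amount_cents) / 100.0 AS total_amount
        FROM raw_orders
        GROUP BY UPPER(TRIM(region))
        """
    )
    result = conn.execute(
        "SELECT region, total_amount FROM sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


# Hypothetical source rows with inconsistent region formatting.
rows = [(1, " emea", 1050), (2, "EMEA ", 950), (3, "apac", 500)]
print(run_elt(rows))
```

In an ETL flow, the clean-up and aggregation would instead happen in a separate tool before the load; here the staging table keeps the raw data available, so the final model can be rebuilt without extracting again.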
Consider investigating tools that assist in the building of the cloud-based EDW, as these tools will allow the company to be up and running in record time. This is important because the cost of migrating data can quickly add up if the movement of data is not efficient.
As cloud service providers offer cost-effective data storage prices, firms don’t want to waste this opportunity by having to migrate again. It is also a good idea to maximise data compression before transmitting.
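The compression point can be sketched with Python's standard gzip module. The sample payload is hypothetical and deliberately repetitive, so the ratio it prints should not be read as typical; real savings depend on the data and on the file formats the target platform accepts.

```python
import gzip


def compress_for_transfer(payload: bytes, level: int = 9) -> bytes:
    """Gzip a payload before sending it; level 9 favours size over speed."""
    return gzip.compress(payload, compresslevel=level)


# Hypothetical CSV extract: one row repeated many times.
sample = b"2023-01-01,EMEA,19.99\n" * 10_000
packed = compress_for_transfer(sample)

print(f"raw: {len(sample)} bytes, compressed: {len(packed)} bytes")

# The round trip must be lossless before the compressed file is trusted.
assert gzip.decompress(packed) == sample
```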
Developer experience vs newer toolsets
Cloud-based EDWs come with flexibility and capabilities that are usually not present in on-premises solutions. These can often prove challenging for developers, who are required to keep pace with new features and constantly learn the new functionalities added by the cloud service provider.
While it is true that migration of any IT service can be a daunting prospect, EDWs pose an added risk because migrating them may interrupt business continuity.
When migrating to a cloud EDW, some key factors need to be analysed and planned before migration. With this in mind, the advent of newer, rapid deployment technologies means cloud DW/lakes can now be created in the space of hours and tweaked over a couple of weeks to start delivering meaningful insights in record time.
This massively reduces the total cost of ownership of the cloud DW/lake when compared to traditional methods, which typically took a year or two to get up and running before meaningful insights could be gleaned from the data. The traditional approach has been shown to add another 75% in costs.
With businesses driving the demand for real-time data, the next step is setting up an ongoing change data capture pipeline that will constantly feed the cloud DW/lake with timely data.
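One simple way to picture such a pipeline is timestamp-based polling, sketched below. This is a simplified, query-based form of change data capture rather than the log-based capture that dedicated tools provide. Two in-memory SQLite databases stand in for the source system and the cloud DW/lake, and the customers table, its columns and the integer timestamps are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return conn


def sync_changes(source, target, last_seen):
    """Copy rows changed after last_seen to the target; return the new watermark."""
    changed = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row in changed:
        # Upsert: replace the target row if the primary key already exists.
        target.execute(
            "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
            row,
        )
    target.commit()
    return changed[-1][2] if changed else last_seen


source, target = make_db(), make_db()
source.execute("INSERT INTO customers VALUES (1, 'Acme', 100)")
source.execute("INSERT INTO customers VALUES (2, 'Globex', 110)")

watermark = sync_changes(source, target, 0)          # initial load
source.execute("UPDATE customers SET name = 'Acme Ltd', updated_at = 120 WHERE id = 1")
watermark = sync_changes(source, target, watermark)  # picks up only the change

print(target.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
```

Polling on a timestamp column does not see deleted rows and relies on the source keeping that column accurate, which is one reason log-based capture is often preferred for production pipelines.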
The bottom line
A typical data warehouse contains a large amount of data covering many business areas. Migrating all the data at once would almost guarantee failure.
Organisations need to take incremental steps to successfully migrate their data warehouse to the cloud, especially when undertaking significant design changes. An incremental approach enables the company to keep operating the on-premises data warehouse, while the cloud data warehouse comes online.
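The incremental idea can be sketched as moving one subject area at a time and only marking it as cut over once a basic check passes. In-memory SQLite databases stand in for the on-premises and cloud warehouses, the table names and rows are hypothetical, and a row-count comparison is only the simplest possible validation.

```python
import sqlite3


def migrate_table(source, target, table):
    """Copy one table and report whether source and target row counts match.

    The table name is interpolated into SQL, so it must come from a
    trusted, hard-coded list, never from user input.
    """
    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    target.commit()
    src_count = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    tgt_count = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return src_count == tgt_count


# Hypothetical subject areas, each with the same schema on both sides.
schema = {
    "sales": "CREATE TABLE sales (id INTEGER, amount REAL)",
    "hr": "CREATE TABLE hr (id INTEGER, name TEXT)",
}
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for ddl in schema.values():
    source.execute(ddl)
    target.execute(ddl)
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
source.execute("INSERT INTO hr VALUES (1, 'Thandi')")

cut_over = []
for table in ["sales", "hr"]:          # one subject area per step
    if migrate_table(source, target, table):
        cut_over.append(table)         # the source copy stays live until this point

print("validated and ready for cut-over:", cut_over)
```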
By Carl Butler, Qlik Data Integration sales executive, iOCO.