OpenBI, a leading professional services firm specializing in open source business intelligence solutions, announced today that it has completed a review of Pentaho for Apache Hadoop and found several key benefits.
Hadoop’s Relevance for BI
Organizations often process “sub-transactional” data – data that occurs within or between business transactions. Examples include web data, device monitoring, network performance, call detail, online gaming, and social media interactions. Sub-transactions dramatically increase the volume of information to be integrated into a business intelligence (BI) environment. And today's ETL platforms struggle to process this volume in a reliable, fault-tolerant manner. Hadoop is designed specifically to address these ETL shortcomings of “big data” processing by providing distributed and fault-tolerant processing.
The use of Hadoop has grown in recent years, though several roadblocks limit its use for BI: a steep technical learning curve, a lack of support and graphical user interfaces for typical ETL data transformation functions, and high latency when accessing data. Pentaho Corporation, a commercial open source business intelligence company, has attempted to overcome these limitations through its new products, Pentaho Data Integration for Hadoop and Pentaho BI Suite for Hadoop.
OpenBI has found significant benefits in Pentaho for Hadoop, including the ability to process big data more easily, efficiently and reliably, and to perform ad hoc query, reporting, analysis and dashboarding.
Benefits of Pentaho’s Hadoop Integration
In the absence of Pentaho, developing Hadoop MapReduce (MR) jobs to perform BI data transformations requires deep Java programming skills, and challenging ETL tasks end up as complex chains of MR steps. Pentaho for Hadoop enables use of its developer-friendly graphical ETL platform, Pentaho Data Integration, to create and manage MR jobs – greatly simplifying development and increasing the semantic expressiveness of the MR data processing paradigm. Without the Pentaho integration, several MR jobs would have to be programmatically linked to perform a parallelized ETL transformation. Now a single MR job that embeds more powerful data transformation logic can be used, making the MR code easier to manage while significantly reducing Hadoop processing time.
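The difference between linking several MR jobs and running one job with richer embedded logic can be illustrated with a minimal sketch. It is a toy, in-memory stand-in for MapReduce written in plain Python; it does not use Hadoop or any Pentaho API, and the `map_reduce` helper, the `clicks` data, and all function names are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce pass: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Hypothetical sub-transactional data: (user, page, milliseconds on page).
clicks = [
    ("alice", "/home", 120),
    ("alice", "/cart", 300),
    ("bob", "/home", 80),
    ("bob", "/home", 40),
    ("alice", "/home", 60),
]


# Approach 1: two linked jobs.
# Job 1 totals the time per (user, page).
def job1_map(record):
    user, page, ms = record
    yield (user, page), ms


def job1_reduce(key, values):
    user, page = key
    return (user, page, sum(values))


# Job 2 reads job 1's output and picks each user's top page.
def job2_map(record):
    user, page, total = record
    yield user, (total, page)


def job2_reduce(user, values):
    total, page = max(values)
    return (user, page, total)


def chained(records):
    intermediate = map_reduce(records, job1_map, job1_reduce)
    return map_reduce(intermediate, job2_map, job2_reduce)


# Approach 2: one job whose reducer embeds both steps.
def single_map(record):
    user, page, ms = record
    yield user, (page, ms)


def single_reduce(user, values):
    totals = defaultdict(int)
    for page, ms in values:
        totals[page] += ms
    # Same tie-break as job 2: highest total, then page name.
    page, total = max(totals.items(), key=lambda item: (item[1], item[0]))
    return (user, page, total)


def single(records):
    return map_reduce(records, single_map, single_reduce)


if __name__ == "__main__":
    print(chained(clicks))
    print(single(clicks))
```

Both approaches produce the same answer, but the single job avoids writing and re-reading an intermediate data set, which on a real cluster is one plausible source of the processing-time savings described above.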
Pentaho has also introduced the concept of the “Data Lake Architecture,” whereby Hadoop can be used as a historic archive for big data, while Pentaho Data Integration can be used to quickly extract data subsets to relational data stores for speed-of-thought BI processing. In addition, Pentaho’s enhancements to the Apache Hive data warehouse infrastructure for Hadoop enable Pentaho products to access Hadoop data using a SQL-based query language, greatly enhancing an organization’s ability to access archived information for ad hoc inquiries or scheduled reports.
The ability to create MR jobs using Pentaho Data Integration significantly simplifies development while improving the performance of big data processing. And the use of Hive enables exploratory, ad hoc, big data queries that were previously impractical and costly. Where several MR jobs previously had to be chained by hand to perform a BI transformation, a single MR job that embeds more powerful data transformation logic can now be used, making it easier to manage code while improving performance.
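The archive-then-extract pattern described here can be sketched in a few lines. This is only an illustration under stated assumptions: a Python list stands in for the Hadoop archive, an in-memory SQLite database stands in for the relational data store, and the `archive` records, table name, and function names are all hypothetical. It does not use Hive or Pentaho Data Integration.

```python
import sqlite3

# Hypothetical stand-in for a large historic archive of device readings.
archive = [
    {"day": "2010-10-01", "device": "router-1", "latency_ms": 10},
    {"day": "2010-10-01", "device": "router-1", "latency_ms": 20},
    {"day": "2010-10-01", "device": "router-2", "latency_ms": 30},
    {"day": "2010-10-02", "device": "router-1", "latency_ms": 50},
]


def extract_subset(records, day):
    """Pull only the rows for one day out of the archive."""
    return [
        (r["day"], r["device"], r["latency_ms"])
        for r in records
        if r["day"] == day
    ]


def load_and_query(rows):
    """Load the subset into a relational store and run an ad hoc SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE latency (day TEXT, device TEXT, latency_ms INTEGER)"
    )
    conn.executemany("INSERT INTO latency VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT device, AVG(latency_ms) FROM latency "
        "GROUP BY device ORDER BY device"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    subset = extract_subset(archive, "2010-10-01")
    print(load_and_query(subset))
```

The point of the split is that the full history stays in the archive, while interactive queries run against a small relational copy that can answer quickly.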
Because there's significant overhead and infrastructure required to use the Hadoop engine, the benefits will most likely come with big data. OpenBI believes that Pentaho’s integration approach provides a significant opportunity for the BI marketplace to evolve to big data. As summarized by OpenBI founder and partner Dave Reinke, “Clients who are concerned about processing, managing and querying large quantities of sub-transactional data should seriously consider Pentaho’s Hadoop offering.”
Contact OpenBI to find out how we can help your organization integrate Pentaho for Hadoop.
OpenBI (openbi.com) is a professional services and outsourcing firm that provides open source business intelligence (OSBI) solutions to commercial enterprise IT executives, managers and architects. Our goal: help you bring actionable intelligence to your business strategies and operations through best practice data warehouse integration, analytics, and performance management services – all while attaining the economic and architectural benefits of commercial open source software.