<?php
/*******************************************************************************
* Copyright (c) 2015 Eclipse Foundation and others.
* All rights reserved. This program and the accompanying materials
* are made available under the terms of the Eclipse Public License v1.0
* which accompanies this distribution, and is available at
* http://eclipse.org/legal/epl-v10.html
*
* Contributors:
* Eric Poirier (Eclipse Foundation) - Initial implementation
*******************************************************************************/
?>
<h1 class="article-title"><?php echo $pageTitle; ?></h1>
<p>According to Wikipedia, big data is a collection of data sets so
	large and complex that they are difficult to process using on-hand
	database management tools or traditional data processing
	applications. Big data usually includes data sets whose sizes are
	beyond the ability of commonly used software tools to capture,
	manage, and process within a tolerable elapsed time.</p>
<p>It’s a good thing BIRT is not your traditional data processing
application!</p>
<p>Eclipse BIRT was built with data source extensibility in mind.
	It achieves this by building on the Eclipse Data Tools Platform
	(DTP) and, more specifically, its Open Data Access (ODA) framework.
	This framework allows new data sources, such as recent big data
	sources, to be added to BIRT as needed. This post walks through
	creating a connection to Hadoop in order to visualize the data
	within BIRT.</p>
<h4>Using HQL to query Hadoop data</h4>
<p>BIRT provides an out-of-the-box driver that gives access to
	Hadoop data through Hive. Hive is a data warehouse infrastructure
	built on top of Hadoop that provides data summarization, query, and
	analysis. To retrieve data from Hadoop, you write a query in Hive
	Query Language (HQL). HQL supports many of the same keywords as SQL,
	for example SELECT, WHERE, GROUP BY, ORDER BY, JOIN, and UNION.</p>
<p>A Hive query is executed by a series of automatically generated
MapReduce jobs. Alternatively, you can use the TRANSFORM statement
to specify scripts that translate into MapReduce functions in
Hadoop. These scripts can be written in virtually any programming
language. For example, the following HQL query specifies the
script file mytest.py, written in the Python programming language.</p>
<pre class="prettyprint lang-xtend">
SELECT
TRANSFORM (userid, movieid, rating, unixtime)
USING 'python mytest.py'
AS (userid, movieid, rating, weekday)
FROM u_data
</pre>
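<p>For illustration, a script such as mytest.py might look like the
	following minimal sketch. It assumes that Hive streams each input
	row to the script on standard input as one tab-separated line and
	reads tab-separated lines back from standard output, which is the
	default behavior of TRANSFORM. The function names and the choice of
	an ISO weekday number (1 for Monday through 7 for Sunday, computed
	in UTC) are assumptions made for this example; the actual script is
	not shown in this article.</p>

```python
import sys
from datetime import datetime, timezone


def map_line(line):
    """Replace the trailing unixtime field of a tab-separated row with a weekday number."""
    userid, movieid, rating, unixtime = line.rstrip("\n").split("\t")
    # Assumption: unixtime is in seconds and the weekday is computed in UTC.
    moment = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return "\t".join([userid, movieid, rating, str(moment.isoweekday())])


def main(rows=None, out=None):
    """Read rows (standard input by default) and write the mapped rows to out."""
    rows = sys.stdin if rows is None else rows
    out = sys.stdout if out is None else out
    for row in rows:
        if row.strip():  # skip blank lines
            out.write(map_line(row) + "\n")
```

<p>To use the sketch as the script named in the query, the saved
	file would end with a call to main(), so that each row Hive sends is
	mapped and written back.</p>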
<br />
<h4>Creating an HQL query in BIRT</h4>
<p>To create a new query, select Hive Datasource from the New Data
Source wizard and enter the connection properties, as shown in the
figure below.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_1.jpg"
alt="New Data Source" />
<p>Next, choose Manage Drivers and add the Hive client JAR files.
You only need to do this once.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_2.jpg"
alt="JAR Files" />
<p>Now you can create a data set by writing an HQL query. If your
	query uses TRANSFORM statements that reference script files, use
	the Add File Statement property to add those files to the Hadoop
	distributed cache by typing a semicolon-separated list of Add File
	commands. This property can be overridden by the data source or
	data set, using property binding or script. Type the HQL query in
	the query text area of the data set editor, as shown below.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_3(2).jpg"
alt="Edit Data Set" /><br /> <i>Complex HQL Subquery Example</i><br />
<br /> <img
src="/community/eclipse_newsletter/2013/april/images/article1_4(2).jpg"
alt="Edit Data Set" /><br /> <i>Get JSON Object Example</i><br />
<br /> <img
src="/community/eclipse_newsletter/2013/april/images/article1_5(2).jpg"
alt="Edit Data Set" /><br /> <i>Regular Expression Example</i><br />
<br /> <img
src="/community/eclipse_newsletter/2013/april/images/article1_6(2).jpg"
alt="Edit Data Set" /><br /> <i>HQL Hints Example</i><br />
<br />
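<p>The value typed into the Add File Statement property, described
	earlier in this section, can be pictured with a small sketch. It
	assumes that each command takes the form ADD FILE followed by a
	path, as in the Hive command line, and that the commands are joined
	with semicolons. The helper name and the file paths are
	hypothetical.</p>

```python
def add_file_statements(paths):
    """Build a semicolon-separated list of Hive ADD FILE commands."""
    commands = []
    for path in paths:
        path = path.strip()
        if not path:
            continue  # ignore empty entries
        if ";" in path:
            # A semicolon inside a path would be read as a command separator.
            raise ValueError("path must not contain ';': " + path)
        commands.append("ADD FILE " + path)
    return ";".join(commands)


# Hypothetical script locations, for illustration only.
print(add_file_statements(["/tmp/scripts/mytest.py", "/tmp/scripts/helper.py"]))
```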
<h4>Getting the Important Data to Stand Out</h4>
<p>You can create multiple data sets using the steps above and even
	join data sets within BIRT. Once you have your big data
	connections and queries defined, you can start using the data to
	define your report within BIRT. At this point, you can simply drag
	your data sets onto the report canvas and start formatting.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_7(2).jpg"
alt="Report Canvas" />
<p>Being able to store more data brings its own challenges, however.
	More collected data typically means more data to analyze and
	display, so the important data really needs to stand out. BIRT
	supports this with several out-of-the-box features.</p>
<h4>Highlighting</h4>
<p>With the Highlighting feature in BIRT, you can set up formatting
rules that are based on expressions. You can create simple to
quite complex expressions in order to highlight the data.
Highlighting can be added to grids, tables, columns, rows, data
elements, labels, charts, and images.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_8(2).jpg"
alt="Highlight List" /><br />
<br />
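<p>Conceptually, a highlight rule pairs a condition on the current
	row with the formatting to apply when that condition holds. The
	sketch below models this idea in Python purely for illustration. It
	is not BIRT's API: in the designer, rules are defined through the
	highlight dialog and its expression builder. The rule structure,
	the column name, the style keys, and the choice to let later rules
	override earlier ones are all assumptions.</p>

```python
def apply_highlights(row, rules, base_style=None):
    """Return the style for a row after applying every rule whose condition matches."""
    style = dict(base_style or {})
    for condition, formatting in rules:
        if condition(row):
            # Assumption: later matching rules override earlier ones.
            style.update(formatting)
    return style


# Hypothetical rules on a 'rating' column.
rules = [
    (lambda row: row["rating"] >= 4, {"font-weight": "bold"}),
    (lambda row: row["rating"] == 1, {"color": "red"}),
]

print(apply_highlights({"movieid": 242, "rating": 5}, rules))
```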
<h4>Visibility</h4>
<p>The Visibility feature lets you use expressions to decide which
	areas of a report are visible. This is useful for letting certain
	groups of people see only their own data, and it can also hide
	whole areas based on the data discovered. Like highlighting,
	visibility can be applied to grids, tables, columns, rows, data
	elements, labels, charts, and images. You can also define separate
	visibility rules for each final output format, such as PDF or
	HTML.</p>
<img
src="/community/eclipse_newsletter/2013/april/images/article1_9.jpg"
alt="Visibility" /><br />
<br />
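<p>The per-output behavior can be modeled in a few lines. This is a
	conceptual sketch only, not BIRT's API: it assumes a hide condition
	can be registered either for all outputs or for one named output
	format, and that an element is hidden as soon as any applicable
	condition is true. The rule keys and column names are hypothetical.</p>

```python
def is_visible(row, output_format, hide_rules):
    """Return False when a hide rule for all outputs or for this format matches."""
    for key in ("all", output_format):
        condition = hide_rules.get(key)
        if condition is not None and condition(row):
            return False
    return True


# Hypothetical rules: hide other regions everywhere, and hide detail rows in PDF only.
hide_rules = {
    "all": lambda row: row["region"] != "EMEA",
    "pdf": lambda row: row["is_detail"],
}

row = {"region": "EMEA", "is_detail": True}
print(is_visible(row, "html", hide_rules), is_visible(row, "pdf", hide_rules))
```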
<h4>Try it Yourself</h4>
<p>The Hive/Hadoop data source has been available since BIRT 3.7 and
can be downloaded as part of the Eclipse BIRT Designer from
eclipse.org/birt or BIRT Exchange.</p>
<script
src="http://www.eclipse.org/xtend/google-code-prettify/prettify.js"
type="text/javascript"></script>
<script
src="http://www.eclipse.org/xtend/google-code-prettify/lang-xtend.js"
type="text/javascript"></script>
<script type="text/javascript">
prettyPrint();
</script>
<div class="bottomitem">
<h3>About the Authors</h3>
<div class="row">
<div class="col-sm-12">
<div class="row">
<div class="col-sm-8">
<img class="author-picture"
src="/community/eclipse_newsletter/2013/april/images/virgildodson1.jpg"
alt="Virgil Dodson" />
</div>
<div class="col-sm-16">
<p class="author-name">
Virgil Dodson <br />
<a target="_blank" href="http://www.actuate.com/home/">Actuate</a>
</p>
<ul class="author-link">
<li><a target="_blank" href="<?php echo $original_url; ?>">Original
Article</a></li>
</ul>
</div>
</div>
</div>
</div>
</div>