Title Data-intensive distributed applications through Hadoop coding, Talend integration and benchmarking
OW2 project Talend
OW2 project URL http://forge.ow2.org/projects/talend/
Other OW2 projects and URL (optional)

Keywords Java, Hadoop, Talend, benchmark, distributed applications, optimization, petabytes of data, thousands of nodes, Google's MapReduce, Google File System, data integration
Description {{html clean="false" wiki="false"}}

Apache Hadoop, a top-level Apache project [1], is a Java software framework that supports data-intensive distributed applications under an open source license.
Talend is a recognized market leader in open source data integration, and we have completed a first step of integrating Hadoop technologies (Hive, HDFS) [2]. The goal of this project is to extend Talend's Hadoop support by writing optimized Hive programs and benchmarking them on a grid of servers. You will gain in-depth knowledge of Hadoop by the end of this project.
You will work daily with the local Chinese team leader at Talend's Beijing office (50 developers) and report to an English-speaking project manager.
[1] http://hadoop.apache.org/
[2] http://cn.talend.com/products-data-integration/talend-integration-suite-mpx.php#feature

Main Topic Contact Person Name Cedric Carbone
Main Topic Contact Person e-mail ccarbone@REMOVETHIStalendREMOVETHIS.com
Other Topic Contact Person(s) Name(s) (optional) Michael Hirt
Other Topic Contact e-mail(s) (optional) mhirt(at)talend(dot)com
Estimated Workload (total, in manmonths) 4
Targeted Contestants master/PhD