Title Data-intensive distributed applications through Hadoop coding, Talend integration and benchmarking
OW2 project Talend
OW2 project URL http://forge.ow2.org/projects/talend/
Keywords Java, Hadoop, Talend, benchmark, distributed applications, optimization, petabytes of data, thousands of nodes, Google's MapReduce, Google File System, data integration
Apache Hadoop, a top-level Apache project [1], is a Java software framework that supports data-intensive distributed applications under an open source license.
Talend is a recognized market leader in open source data integration, and we have already completed a first step in integrating Hadoop technologies (Hive, HDFS, Pig, Sqoop...) [2].
The goal of this project is to extend Talend's Hadoop support by coding optimized Hive SQL programs (HIVE Templates) and Pig scripts, and by benchmarking them on a grid of servers. By the end of the project, you will have gained in-depth knowledge of Hadoop.
You will work daily with the local Chinese team lead at Talend's Beijing office (80 developers) and report to an English-speaking project manager based in France.
[1] http://hadoop.apache.org/
[2] http://cn.talend.com/products-data-integration/talend-integration-suite-mpx.php#feature

Main Topic Contact Person Name Michael Hirt
Main Topic Contact Person e-mail mhirt@talend.com
Other Topic Contact Person(s) Name(s) (optional) Cedric Carbone / Remy Dubois
Other Topic Contact e-mail(s) (optional) ccarbone@talend.com / rdubois@talend.com
Estimated Workload (total, in man-months) 4
Targeted Contestants Master's/PhD students