Title WebLabDC
OW2 project WebLab
OW2 project URL weblab.ow2.org
Other OW2 projects and URL (optional) http://forge.ow2.org/projects/weblab/
Keywords Content Management, Hadoop, SOA, Amazon Web Services
Description {{html clean="false" wiki="false"}}

WebLab is an open source platform (licensed under LGPL 2.1) for building intelligence systems that need to process multimedia data. A system based on WebLab thus tackles the problem of “unstructured document processing”, in particular the analysis of documents coming from the Internet. One of its typical applications is media monitoring, which can serve many different business needs.

One of the problems faced when processing information from the Web is the large amount of data that is created. The WebLab architecture makes it easy to distribute processing power and duplicate services. This project will therefore focus on the processing part and explore the possibility of integrating distributed storage, in particular Hadoop technologies and related projects (HDFS and Cassandra). The goal is to study the integration of such storage capabilities as a WebLab service for the multiple data types encountered: raw data from the Web (text, audio, images, video), XML, and RDF triples.

Main Topic Contact Person Name Arnaud Saval
Main Topic Contact Person e-mail arnaud.saval@gmail.com
Other Topic Contact Person(s) Name(s) (optional) weblab user mailing list
Other Topic Contact e-mail(s) (optional) user@weblab-project.org
Estimated Workload (total, in manmonths) 6
Targeted Contestants master/PhD