The WebLab is an open source platform (under LGPL 2.1) for building intelligence systems that need to process multimedia data. A system based on WebLab thus tackles the problem of "unstructured document processing", in particular the analysis of documents coming from the Internet. One of its typical applications is media monitoring, which can serve many different business needs.
One of the problems faced when processing information from the Web is the large amount of data it generates. The WebLab architecture makes it easy to distribute processing power and to duplicate services. This project will therefore focus on the processing side and explore the integration of distributed storage, in particular Hadoop technologies and related projects (HDFS and Cassandra). The goal is to study how such storage capabilities can be integrated as WebLab services for the various data types encountered: raw data from the Web (text, audio, images, video), XML, and RDF triples.
This wiki is licensed under a Creative Commons 2.0 license.