Novertur

Date
2014-2015
Competence group
Data analysis

Development of an online social media platform for business networking that aims to help businesses internationalize their activities by matching them with partners worldwide.

Context and challenges

The Novertur Platform is an online social media platform dedicated to business networking. Partner matching is provided by the Novertur Matchmaking Tool, a distributed processing chain implemented on the Hadoop framework using MapReduce techniques that scale to large data sets (Big Data).
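To illustrate the Map and Reduce pattern mentioned above, here is a minimal single-process sketch in Python. It is not the project's Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample business descriptions are hypothetical, and a keyword count is used only because it is the simplest job that shows the three phases.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (keyword, 1) pair per token of a business description."""
    for token in text.lower().split():
        yield token, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce in sequence over a dict of documents."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample data, for illustration only.
profiles = {
    "business_a": "steel export logistics",
    "business_b": "steel import",
}
counts = run_job(profiles)
```

Because each map call depends on a single document and each reduce call on a single key, both phases can be spread across many machines. The Novertur Matchmaking Tool relies on this property by running on Hadoop rather than in a single process.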

It is composed of the following steps:

  • Textual data upload. We use Hadoop Sqoop to upload data from the Novertur platform's SQL databases to Amazon HDFS buckets.
  • Textual data preprocessing. We use the Lucene and Mahout libraries to chunk and lemmatize the text and to extract keywords.
  • Indexing and vector space model construction. We use Mahout and other Java libraries to create the vector space model of keywords.
  • Dimension reduction. We use Mahout's scalable Latent Dirichlet Allocation algorithm to reduce the dimension of the vector space and retain only the most relevant keywords.
  • Clustering and classification. We apply value chain clustering and industry classification to businesses using Mahout's scalable machine learning algorithms.
  • Partner matchmaking. We suggest partnerships to businesses based on their industry and value chain characteristics.
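The vector space and matching steps above can be sketched on a single machine as follows. This is an assumption-laden illustration, not the Mahout implementation: it uses a smoothed TF-IDF weighting and cosine similarity as a stand-in for the tool's matching logic, it skips the Latent Dirichlet Allocation and clustering steps entirely, and the names `tf_idf_vectors`, `cosine`, `best_match` and the sample businesses are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (dict keyword -> weight) per business.

    docs maps a business name to its list of extracted keywords.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for keywords in docs.values():
        doc_freq.update(set(keywords))  # count each keyword once per business

    vectors = {}
    for name, keywords in docs.items():
        term_freq = Counter(keywords)
        vectors[name] = {
            # Smoothed IDF, always positive (an assumed weighting scheme).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(name, vectors):
    """Return (partner_name, similarity) for the most similar other business."""
    candidates = (
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    )
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical keyword lists, for illustration only.
businesses = {
    "steel_maker": ["steel", "casting", "export"],
    "steel_trader": ["steel", "export", "distribution"],
    "bakery": ["bread", "pastry", "retail"],
}
vectors = tf_idf_vectors(businesses)
partner, score = best_match("steel_maker", vectors)
```

With these sample keywords, `steel_maker` is paired with `steel_trader`, since they share two keywords, while its similarity to `bakery` is zero. In the actual tool, suggestions are driven by industry and value chain characteristics rather than by raw keyword overlap.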

Objectives

The goal is to help businesses internationalize their activities by matching them with partners worldwide.

Partners and funding

Novertur International SA – industrial partner

Funded by the Commission for Technology and Innovation (CTI)

Results

All the steps have been implemented and tested on a set of 10,000 businesses (more than 40 GB of data). Evaluations of the tool reported satisfactory performance for industry classification (75% success rate) and value chain classification (51%).

Project Manager