Node.js web crawler + Apache Lucene + Data mining

A project for a Data Mining course.

I wanted to build an application that detects the most popular articles online during a given period of time. The application takes as input a list of websites from a specific domain. For example, in the sport domain, the app would take as input a list of 50 sport blogs, sport news websites, etc.
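The input could be represented as a topic domain plus a list of seed URLs. The sketch below is an illustration under assumptions: the `input` object shape, the `normalizeSeeds` helper and the example URLs are hypothetical, not taken from the project. It parses each seed with Node's built-in WHATWG `URL` class, skips anything that is not an `http`/`https` URL, and keeps only the first seed per website (ignoring a leading `www.`) so the crawler does not visit the same site twice.

```javascript
// Hypothetical input format: one topic domain plus a list of seed websites.
// Field names and URLs are illustrative assumptions.
const input = {
  domain: "sport",
  seeds: [
    "https://www.example-sport-news.com/",
    "http://example-sport-blog.org/football",
    "https://WWW.EXAMPLE-SPORT-NEWS.COM/latest", // same site as the first seed
    "not a url",
  ],
};

// Keep the first valid seed per website. URL parsing lowercases the host,
// and a leading "www." is ignored when comparing websites.
function normalizeSeeds(seeds) {
  const byHost = new Map();
  for (const raw of seeds) {
    let url;
    try {
      url = new URL(raw);
    } catch {
      continue; // not a parseable absolute URL
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    const host = url.hostname.replace(/^www\./, "");
    if (!byHost.has(host)) byHost.set(host, url.href);
  }
  return [...byHost.values()];
}

console.log(normalizeSeeds(input.seeds));
```

Keeping the full `href` of the first seed (rather than only the site origin) preserves section paths such as `/football`, which matter when a general site is only partly about the chosen domain.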

It crawls these websites and saves the information in a database. Apache Lucene then reads from the database, analyzes the data, and decides which article is the most popular at that time. The factors taken into consideration for determining popularity include the number of shares on social networks, the number of views, the publication date, etc.
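The description above does not say how the factors are combined, so the following is only one possible scoring scheme, written in plain JavaScript rather than with Lucene. It assumes log-dampened counts of shares and views, hypothetical weights (`WEIGHTS`), and an exponential time decay with an assumed 24-hour half-life (`HALF_LIFE_HOURS`); the article fields (`shares`, `views`, `publishedAt`) and the sample data are also hypothetical.

```javascript
// Hypothetical weights and half-life: assumptions, not project values.
const WEIGHTS = { shares: 2.0, views: 1.0 };
const HALF_LIFE_HOURS = 24; // score halves for every 24 hours of article age

// Popularity = weighted log counts, multiplied by an exponential age decay.
function popularity(article, now = new Date()) {
  const ageMs = now - new Date(article.publishedAt);
  const ageHours = Math.max(0, ageMs / 36e5); // 36e5 ms per hour; clamp future dates
  // log1p dampens very large counts so one viral metric does not dominate
  const raw =
    WEIGHTS.shares * Math.log1p(article.shares) +
    WEIGHTS.views * Math.log1p(article.views);
  const decay = Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
  return raw * decay;
}

// Rank crawled articles (e.g. rows loaded from the database) by score.
function rankArticles(articles, now = new Date()) {
  return articles
    .map((a) => ({ ...a, score: popularity(a, now) }))
    .sort((x, y) => y.score - x.score);
}

// Hypothetical sample data.
const now = new Date("2024-05-10T12:00:00Z");
const articles = [
  { url: "https://example.com/a", shares: 1200, views: 50000, publishedAt: "2024-05-10T06:00:00Z" },
  { url: "https://example.com/b", shares: 5000, views: 200000, publishedAt: "2024-05-07T12:00:00Z" },
  { url: "https://example.com/c", shares: 40, views: 900, publishedAt: "2024-05-10T11:00:00Z" },
];

for (const a of rankArticles(articles, now)) {
  console.log(a.url, a.score.toFixed(2));
}
```

With these assumed parameters, the three-day-old article `b` ranks last despite having the most shares and views, which reflects the "most popular at that time" goal. Tuning the half-life controls how strongly recency outweighs raw counts.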

You can see the source code for the crawler project here: