Mahout Review by iletken

– This is a post I copied from iletken’s blog


Taste is an open-source recommendation library that makes it easy to create a basic recommender.

In this post, we describe some key features of Mahout’s Taste implementation and discuss some problems associated with it. Because Taste implements standard textbook algorithms, we will not go into accuracy details; our main focus is performance. As a recommender system provider, we also struggle with scalability issues, and we are investigating to what extent a Hadoop implementation can help. We therefore tested Mahout’s latest release in order to answer this question:


Does Taste in Mahout somehow solve scalability issues in recommender systems?

Sadly, the answer was not a yes. Taste is a great API that is very easy to develop with; however, even with Hadoop’s scalability features, Taste is very limited. Taste’s original implementation is not scalable, and it does not fit naturally into Hadoop’s map-reduce logic.

After providing a detailed review of Mahout-Taste, we describe some of the methodologies and technologies we rely on at iletken to overcome our performance and resource management problems, and we recommend some basic implementation tips.

Download the complete review in PDF format

Summary
Pros:

  • Mahout manages to scale the training phase by combining the Slope One method with a Hadoop implementation
  • Mahout is Open Source

Cons:

  • Does not scale as expected
  • Mahout achieves scalability only through the Slope One method
  • The standard Slope One recommender is not very accurate (Netflix test: Success -3% (0.98 RMSE)) compared to other algorithms not included in Mahout, such as Matrix Factorization (Netflix test: Success 8.4% (0.87 RMSE))
  • Inefficient implementation
  • High memory and resource consumption
  • Only collaborative filtering
  • Only standard algorithms
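
Since Slope One comes up in both lists, here is a minimal sketch of the weighted Slope One idea in plain Java. It is not Mahout's code: the class name `SlopeOneSketch`, the in-memory `Map` layout and the toy ratings are all assumptions made for illustration. Training stores the rating difference for every pair of items that some user rated together, and a prediction adds those average differences to the ratings the user already gave.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative weighted Slope One; a hypothetical class, not part of Mahout. */
final class SlopeOneSketch {
    // diffSum.get(i).get(j): sum over users of (rating of i - rating of j)
    private final Map<Long, Map<Long, Double>> diffSum = new HashMap<>();
    // count.get(i).get(j): number of users who rated both i and j
    private final Map<Long, Map<Long, Integer>> count = new HashMap<>();

    /** Training: accumulate pairwise differences over each user's co-rated items. */
    SlopeOneSketch(Map<Long, Map<Long, Double>> ratingsByUser) {
        for (Map<Long, Double> ratings : ratingsByUser.values()) {
            for (Map.Entry<Long, Double> a : ratings.entrySet()) {
                for (Map.Entry<Long, Double> b : ratings.entrySet()) {
                    if (a.getKey().equals(b.getKey())) {
                        continue; // an item is never compared with itself
                    }
                    diffSum.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                           .merge(b.getKey(), a.getValue() - b.getValue(), Double::sum);
                    count.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                         .merge(b.getKey(), 1, Integer::sum);
                }
            }
        }
    }

    /** Predicted rating of {@code item} for a user, or NaN if no co-rated item exists. */
    double predict(Map<Long, Double> userRatings, long item) {
        Map<Long, Double> sums = diffSum.get(item);
        Map<Long, Integer> counts = count.get(item);
        if (sums == null || counts == null) {
            return Double.NaN; // item never co-rated with anything
        }
        double numerator = 0.0;
        long weight = 0;
        for (Map.Entry<Long, Double> rated : userRatings.entrySet()) {
            long j = rated.getKey();
            Integer c = counts.get(j);
            if (j == item || c == null) {
                continue;
            }
            double avgDiff = sums.get(j) / c;
            // Weight each estimate by how many users support the (item, j) pair.
            numerator += (avgDiff + rated.getValue()) * c;
            weight += c;
        }
        return weight == 0 ? Double.NaN : numerator / weight;
    }

    public static void main(String[] args) {
        // Toy data (made up): user ID -> (item ID -> rating)
        Map<Long, Map<Long, Double>> data = new HashMap<>();
        data.put(1L, Map.of(101L, 5.0, 102L, 3.0, 103L, 2.0));
        data.put(2L, Map.of(101L, 3.0, 102L, 4.0));
        data.put(3L, Map.of(102L, 2.0, 103L, 5.0));

        SlopeOneSketch model = new SlopeOneSketch(data);
        // User 3 has not rated item 101; the estimate is 13/3, about 4.33.
        System.out.println(model.predict(data.get(3L), 101L));
    }
}
```

Note that the two difference tables hold an entry for every pair of co-rated items, so in the worst case they grow with the square of the number of items. That growth is one plausible source of the memory pressure listed above, although this sketch says nothing about how Mahout itself stores the differences.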

Edit: Some readers have told me that Mahout has improved since this review was written. Please take note of what they say:

haltux:

“Many criticizes are not true anymore.

The complete netflix dataset can be processed by the mahout slope one implementation with few hundreds megabytes.
Other algorithms are now available, included distributed ones.”

