– This is a post I copied from iletken’s blog –
Taste is an open-source recommendation library, part of Apache Mahout, with which you can build a basic recommender.
In this post, we describe some key features of Mahout's Taste implementation and discuss some of its problems. Because Taste implements standard textbook algorithms, we will not go into accuracy details; our main focus is performance. As a recommender system provider, we are also struggling with scalability issues, and we are investigating to what extent a Hadoop implementation can help. We therefore tested Mahout's latest release to answer this question:
Does Taste in Mahout somehow solve scalability issues in recommender systems?
Sadly, the answer was no. Taste is a great API that is very easy to develop with; however, even with Hadoop's scalability features, it is very limited. Taste's original implementation is not scalable, and it does not fit naturally into Hadoop's map-reduce model.
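To illustrate why a classic single-machine recommender is awkward to map onto map-reduce, here is a minimal user-based collaborative filtering sketch in Python. It is not Taste's code, although it is similar in spirit to Taste's user-based recommenders, which combine a user similarity with a nearest-N neighbourhood. The functions `pearson` and `recommend`, the neighbourhood size and the toy ratings are hypothetical. The point is structural: every recommendation compares the target user with every other user, so the whole rating matrix has to be reachable at query time rather than split into independent map tasks.

```python
# Minimal user-based collaborative filtering sketch (hypothetical names; not Taste's code).
# Assumption: all ratings sit in one in-memory dict {user: {item: rating}}.
from math import sqrt


def pearson(a, b):
    """Pearson correlation over items both users rated; 0.0 when undefined."""
    common = [i for i in a if i in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    den = sqrt(var_a * var_b)
    return num / den if den else 0.0


def recommend(ratings, user, n_neighbors=2, top_k=3):
    """Score items `user` has not rated, using its n most similar users.

    Every call compares `user` with *every* other user, so the whole
    rating matrix must be reachable when a recommendation is requested.
    """
    mine = ratings[user]
    neighbors = sorted(
        ((pearson(mine, theirs), other) for other, theirs in ratings.items() if other != user),
        key=lambda pair: pair[0],
        reverse=True,
    )[:n_neighbors]
    scores, weights = {}, {}
    for sim, other in neighbors:
        if sim <= 0:  # ignore neighbours with no positive similarity
            continue
        for item, rating in ratings[other].items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(
        ((scores[i] / weights[i], i) for i in scores),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:top_k]


ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
    "dave": {"a": 5, "b": 3, "c": 4, "e": 5},
}
print(recommend(ratings, "alice"))
```

Distributing this means either replicating the rating data to every worker or precomputing all user-user similarities first, neither of which is a single natural map-reduce pass.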
After a detailed review of Mahout-Taste, we will describe some of the methodologies and technologies we rely on at iletken to overcome our performance and resource-management problems. We will also recommend some basic implementations.
Pros:
- Mahout manages to scale training with the slope one method and its Hadoop implementation
- Mahout is open source
Cons:
- Does not scale as expected
- Scalability is achieved only with the slope one method
- The standard slope one recommender is not very accurate (Netflix test: -3% improvement, 0.98 RMSE) compared with other algorithms not included in Mahout, such as matrix factorization (Netflix test: 8.4% improvement, 0.87 RMSE)
- Inefficient implementation
- High memory and resource consumption
- Collaborative filtering only
- Standard algorithms only
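Slope one is the algorithm with which Mahout scaled in our tests, and its training step decomposes naturally into map-reduce: each user's ratings independently yield per-pair rating differences, and a reducer only has to sum them. The sketch below shows that idea in a single Python process, together with the weighted slope one prediction rule and RMSE, the error metric behind the Netflix figures above (the square root of the mean squared difference between predicted and actual ratings). It is not Mahout's implementation; `map_user`, `reduce_pairs`, `train`, `predict`, `rmse` and the toy ratings are hypothetical.

```python
# Slope one in a map-reduce shape, run in a single process (hypothetical names; not Mahout's code).
from collections import defaultdict
from itertools import permutations


def map_user(user_ratings):
    """Map: for one user, emit ((i, j), (r_i - r_j, 1)) for every ordered pair of rated items."""
    for (i, r_i), (j, r_j) in permutations(user_ratings.items(), 2):
        yield (i, j), (r_i - r_j, 1)


def reduce_pairs(emitted):
    """Reduce: sum differences and counts per item pair, then average."""
    totals = defaultdict(lambda: [0.0, 0])
    for pair, (diff, count) in emitted:
        totals[pair][0] += diff
        totals[pair][1] += count
    return {pair: (s / c, c) for pair, (s, c) in totals.items()}


def train(ratings):
    """Chain map and reduce; on Hadoop, the shuffle would group the emitted pairs by key."""
    emitted = (kv for user_ratings in ratings.values() for kv in map_user(user_ratings))
    return reduce_pairs(emitted)


def predict(model, user_ratings, item):
    """Weighted slope one: average of r_i + dev(item, i), weighted by co-rating counts."""
    num = den = 0.0
    for i, r_i in user_ratings.items():
        if (item, i) in model:
            dev, count = model[(item, i)]
            num += (r_i + dev) * count
            den += count
    return num / den if den else None  # None: no co-rated item to extrapolate from


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return (sum((p - a) ** 2 for p, a in pairs) / len(pairs)) ** 0.5


ratings = {
    "u1": {"a": 5, "b": 3, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"b": 2, "c": 5},
}
model = train(ratings)
print(predict(model, ratings["u3"], "a"))  # (2.5 * 2 + 8.0 * 1) / 3 ≈ 4.333
```

The model keeps one entry per co-rated item pair, so its size grows with the number of such pairs rather than with the number of users, and a user with k ratings makes the mapper emit k(k-1) pairs.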
Edit: Some of our readers have told me that Mahout has improved. Please pay attention to what they say:
“Many of these criticisms are no longer true.
The complete Netflix dataset can be processed by the Mahout slope one implementation with a few hundred megabytes.
Other algorithms are now available, including distributed ones.”