Apache Mahout: Installation on Hadoop cluster
by Vijay Olety • January 31, 2012 • Big Data, Hadoop, Technical • 1 Comment
For an overview of Mahout, please refer to Apache Mahout – Machine Learning for Big Data.
Installation
As mentioned in the overview, Mahout is an open source, scalable machine learning library. When processing large amounts of data, it is recommended to run Mahout on top of Hadoop. Mahout only needs to be installed on the Hadoop master node.
Before installing Mahout, ensure that Hadoop is installed in one of its modes. For the purposes of this post, install Hadoop in cluster mode. The commands below assume that user1 has write permission on /opt. Run them in the order shown:
user1@ubuntu-server:~$ sudo apt-get install maven2
user1@ubuntu-server:~$ cd /opt
user1@ubuntu-server:/opt$ svn co http://svn.apache.org/repos/asf/mahout/trunk
user1@ubuntu-server:/opt$ mv trunk mahout_trunk
user1@ubuntu-server:/opt$ ln -s mahout_trunk/ mahout
user1@ubuntu-server:/opt$ cd mahout
user1@ubuntu-server:/opt/mahout$ mvn install
P.S.: Some tests occasionally fail while building Mahout from source. In that case, skip the tests:
user1@ubuntu-server:/opt/mahout$ mvn -DskipTests install
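The build-then-fall-back routine above can be wrapped in a small shell function. This is only a sketch: build_mahout is a hypothetical helper name (it is not part of Mahout or Maven), and it assumes mvn is on the PATH and that the source lives in /opt/mahout unless another directory is passed.

```shell
# Sketch: build Mahout, retrying without tests if the full build fails.
# build_mahout is a hypothetical helper, not something Mahout ships.
build_mahout() {
    (
        # Run in a subshell so the caller's working directory is untouched.
        src_dir="${1:-/opt/mahout}"
        cd "$src_dir" || exit 1
        if mvn install; then
            echo "Build succeeded with tests."
        else
            echo "Build failed; retrying with -DskipTests." >&2
            mvn -DskipTests install
        fi
    )
}
```

Calling build_mahout with no argument builds /opt/mahout; the function returns a non-zero status only if the directory is missing or the second attempt also fails. Skipping tests hides the failures rather than fixing them, so it is worth reading the first build's output before relying on the result.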
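Edit ~/.bash_profile to add entries for $MAHOUT_HOME and $HADOOP_CONF_DIR, and to extend $PATH. Note that $HADOOP_CONF_DIR is derived from $HADOOP_HOME, so $HADOOP_HOME must already be set to your Hadoop installation directory. The mahout launcher script lives in $MAHOUT_HOME/bin, so that is the directory to put on $PATH.
vim ~/.bash_profile
- export HADOOP_CONF_DIR=$HADOOP_HOME/conf
- export MAHOUT_HOME=/opt/mahout
- export PATH=$PATH:$MAHOUT_HOME/bin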
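If you would rather script this edit than open vim, one possible approach is to append each line only when it is not already present, so the step can be re-run safely. The helper name add_profile_line below is hypothetical and the snippet is a sketch, not the only way to do it.

```shell
# Sketch: append a line to a profile file only if it is not already there.
# add_profile_line is a hypothetical helper name.
add_profile_line() {
    profile="$1"
    line="$2"
    touch "$profile"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -e "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example calls (single quotes keep $HADOOP_HOME unexpanded in the file):
# add_profile_line ~/.bash_profile 'export HADOOP_CONF_DIR=$HADOOP_HOME/conf'
# add_profile_line ~/.bash_profile 'export MAHOUT_HOME=/opt/mahout'
```

Running the same call twice leaves a single copy of the line in the file.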
Log out and log back in for the changes to take effect. After logging in, typing echo $MAHOUT_HOME should print /opt/mahout on the console.
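Beyond echoing a single variable, a quick check of all three can catch an unset $HADOOP_HOME, which would silently turn $HADOOP_CONF_DIR into /conf. The function below is a minimal sketch; check_mahout_env is a hypothetical name, and the check only confirms that each variable is set and points at an existing directory.

```shell
# Sketch: confirm the variables used in .bash_profile are set and valid.
# check_mahout_env is a hypothetical helper name.
check_mahout_env() {
    (
        status=0
        for name in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME; do
            # Look up the variable whose name is stored in $name.
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "$name is not set" >&2
                status=1
            elif [ ! -d "$value" ]; then
                echo "$name=$value is not a directory" >&2
                status=1
            else
                echo "$name=$value"
            fi
        done
        exit "$status"
    )
}
```

Run check_mahout_env after logging back in; it prints one line per variable and returns a non-zero status if any of them is missing or wrong.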
That’s it! Mahout is installed successfully. Let’s play around.
For more details, please refer to the following link:
https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout

Missing the $HADOOP_HOME value