Apache Mahout: Installation on Hadoop cluster

    by  • January 31, 2012 • Big Data, Hadoop, Technical • 1 Comment

    For an overview of Mahout, please refer to Apache Mahout – Machine Learning for Big Data.

    Installation

    As mentioned in the overview, Mahout is an open source, scalable machine learning library. Running Mahout on top of Hadoop is recommended when processing large amounts of data. It is sufficient to install Mahout only on the Hadoop master node.

    Before installing Mahout, ensure that Hadoop is installed in one of its modes (standalone, pseudo-distributed, or fully distributed). For the purposes of this blog, install Hadoop in cluster mode. Then run these commands in the order shown:

    user1@ubuntu-server:~$ sudo apt-get install maven2
    user1@ubuntu-server:~$ cd /opt
    user1@ubuntu-server:/opt$ svn co http://svn.apache.org/repos/asf/mahout/trunk
    user1@ubuntu-server:/opt$ mv trunk mahout_trunk
    user1@ubuntu-server:/opt$ ln -s mahout_trunk/ mahout
    user1@ubuntu-server:/opt$ cd mahout
    user1@ubuntu-server:/opt/mahout$ mvn install

    P.S.: Some tests occasionally fail while building Mahout from source. In such cases, skip them: user1@ubuntu-server:/opt/mahout$ mvn -DskipTests install
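The build-then-fall-back step above can be sketched as a small shell function. This is only an illustration, not part of the Mahout build itself: the function name build_mahout and the MVN override variable are hypothetical names introduced here, and the default source directory is the /opt/mahout symlink created earlier.

```shell
# Sketch: try a full build first, then retry without tests if it fails.
# build_mahout and MVN are hypothetical names used only for this example.
build_mahout() {
    src_dir=${1:-/opt/mahout}   # checkout location from the steps above
    mvn_cmd=${MVN:-mvn}         # override MVN to use a different Maven binary

    # First attempt: full build, tests included.
    ( cd "$src_dir" && "$mvn_cmd" install ) && return 0

    echo "Build failed; retrying with -DskipTests" >&2

    # Second attempt: same build with the test phase skipped.
    ( cd "$src_dir" && "$mvn_cmd" -DskipTests install )
}

# Usage (not run automatically):
#   build_mahout /opt/mahout
```

Note that the retry also happens when the first build fails for a reason unrelated to tests, so it is worth reading the Maven output before relying on the second attempt.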

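    Edit ~/.bash_profile to add entries for $MAHOUT_HOME and $HADOOP_CONF_DIR and to extend $PATH. The entries below assume that $HADOOP_HOME already points to your Hadoop installation directory.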

    vim ~/.bash_profile
    • export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    • export MAHOUT_HOME=/opt/mahout
    • export PATH=$PATH:$MAHOUT_HOME/bin
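A slightly more defensive variant of these profile entries might look like the sketch below. It warns when $HADOOP_HOME is unset instead of silently exporting a broken $HADOOP_CONF_DIR, and it avoids adding the same directory to $PATH on every login. The helper path_append is a hypothetical name, and the sketch assumes the mahout launcher script sits in the bin subdirectory of the checkout.

```shell
# Sketch of ~/.bash_profile entries; path_append is a hypothetical helper.
path_append() {
    # Append $1 to PATH only if it is not already present.
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH="$PATH:$1" ;;
    esac
}

# HADOOP_CONF_DIR is derived from HADOOP_HOME, so check that first.
if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set; set it to your Hadoop install directory" >&2
else
    export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
fi

export MAHOUT_HOME=/opt/mahout
# Assumption: the mahout launcher script lives in $MAHOUT_HOME/bin.
path_append "$MAHOUT_HOME/bin"
export PATH
```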

    Log out and log back in for the changes to take effect. After logging in, typing echo $MAHOUT_HOME should print /opt/mahout on the console.
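The echo check can be extended to all the variables involved. The function below is a minimal sketch; check_mahout_env is a hypothetical name, and it only confirms that each variable is set and points to an existing directory, not that Mahout or Hadoop actually work.

```shell
# Sketch: verify that the environment variables used in this post are sane.
# check_mahout_env is a hypothetical helper, not a Mahout or Hadoop command.
check_mahout_env() {
    status=0
    for name in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}

# Usage (not run automatically):
#   check_mahout_env && echo "Environment looks fine"
```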

    That’s it! Mahout is installed successfully. Let’s play around.

    For more details, please refer to the following link:

    https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout

     

    One Response to Apache Mahout: Installation on Hadoop cluster

    1. Alejandro
      June 6, 2012 at 5:34 am

      Missing the $HADOOP_HOME value