
kiminh / spark_recommender


This project forked from illidanlab/spark_recommender


An example of a recommender system based on Spark.

Shell 1.75% Scala 35.20% JavaScript 0.01% Python 0.54% HTML 62.50%

spark_recommender's Introduction

recsys-spark

A learning-to-rank recommender system built upon Apache Spark. This page shows how to set up the development environment; technical details are documented in the wiki associated with this project.

Technical Requirements

This project uses Scala, the language Spark is written in, as its main development language. To collaborate with the team on the project, you will also need to set up Git on your local machine (to run the program on YARN, Git must also be set up on the gateway machine so that code can be cloned and pulled from GitHub).

Getting Started

The following installations will help you learn Scala and Spark. Note that these two steps are not required to develop and run the project, since the Scala and Spark versions used by the project are downloaded separately.

  1. Download and install Scala on your computer. The Scala version must be compatible with the Spark version installed in the next step; for example, Spark 1.0.0 and Spark 0.9.1 use Scala 2.10. After installation, typing scala at the command line should start the Scala interactive shell, which is the best way to learn Scala.
  2. Download and install Spark. At the time of writing, the latest Spark version is 1.0.0. Spark provides a shell (spark-shell) that predefines sc, an instance of SparkContext, which is the main entry point to Spark. The Spark shell is the best way to learn Spark.
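To show how the shell's sc relates to plain Scala, the sketch below computes the same sum of squares once with a Scala collection and once with an RDD. It assumes spark-core_2.10 1.0.0 is on the classpath; the object and method names are hypothetical, chosen for illustration only. Because it runs as a standalone program it builds its own local SparkContext, which spark-shell would otherwise provide as sc.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical warm-up example (not part of this project),
// assuming spark-core_2.10 1.0.0 on the classpath.
object ShellWarmUp {
  // Plain Scala collections, as you might try first in the `scala` REPL.
  def sumOfSquares(xs: Seq[Int]): Int = xs.map(x => x * x).sum

  // The same computation expressed as an RDD through a SparkContext.
  def sumOfSquaresRdd(sc: SparkContext, xs: Seq[Int]): Int =
    sc.parallelize(xs).map(x => x * x).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // spark-shell creates `sc` for you; a standalone program must build it.
    // "local[2]" runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("ShellWarmUp").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val xs = 1 to 100
      println(s"collection: ${sumOfSquares(xs)}, rdd: ${sumOfSquaresRdd(sc, xs)}")
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```

Both numbers should be 338350. Inside spark-shell, the RDD version reduces to the one-liner `sc.parallelize(1 to 100).map(x => x * x).reduce(_ + _)`.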

Setup Development Environment

The project currently uses Eclipse as its development environment. To use Eclipse as the IDE, follow these steps:

  1. Download and install Eclipse. At the time of writing, the latest release is Kepler (4.3).
  2. Install the following plugins in Eclipse:
  • Scala IDE. The recommended installation is via an update site in Eclipse (Help > Install New Software... > Add...). Note that the Scala IDE version must be consistent with the Scala version used with Spark. For Scala 2.10.4, the Scala IDE 3.0.3/3.0.4 update site is
http://download.scala-ide.org/sdk/helium/e38/scala210/stable/site
  • m2eclipse. The project is managed by the Maven build system. The recommended installation is to drag the m2eclipse installation icon into the Eclipse workspace, which opens the installation dialog.
  • m2eclipse-scala (https://github.com/sonatype/m2eclipse-scala). This plugin lets you work with Maven in a Scala project. It can be installed by cloning https://github.com/sonatype/m2eclipse-scala.git into your Eclipse `dropins` folder.
  3. Verify that Git works with Eclipse (right-click any project; Git operations are available in the `Team` context menu). Since we are using GitHub, you can also use GitHub's native client for Windows (https://windows.github.com/) or Mac (https://mac.github.com/).
  4. Check out the project into your workspace (e.g., `$HOME/workspace`).
  5. In Eclipse, import the checked-out project folder as an existing project (`File` > `Import` > `General` > `Existing Projects into Workspace`).
  6. If this is the first time Maven is used, many libraries will probably be reported as missing. This is normal, because the dependencies have not been downloaded yet. To download them, build the project: right-click the project folder > `Run As` > `Maven Build`, type "install" into `Goals` in the window that pops up, and click `Run`. Maven should then start the build and download all required libraries from the Internet.
  7. In rare cases the Eclipse Maven build fails because of download problems. One solution is then to build outside Eclipse: install Maven for the command line, remove the corrupted downloads from the local Maven repository (`rm -r $HOME/.m2/repository`), navigate to the project folder, and run `mvn install`. The project should then build.
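Because the Scala IDE plugin, the Scala library and spark-core_2.10 all have to agree on Scala 2.10, one quick sanity check once the Maven build succeeds is to run a small object like the one below from Eclipse (Run As > Scala Application). This is a hypothetical helper, not part of the project; it relies on `scala.util.Properties.versionNumberString`, which reports the version of the Scala library actually on the classpath.

```scala
import scala.util.Properties

// Hypothetical sanity check (not part of this project): confirms that the
// Scala library on the classpath belongs to the 2.10 line used by spark-core_2.10.
object ScalaVersionCheck {
  val expectedBinary = "2.10"

  // "2.10.4" -> "2.10": Scala artifacts are suffixed with major.minor only.
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def isCompatible(full: String): Boolean = binaryVersion(full) == expectedBinary

  def main(args: Array[String]): Unit = {
    val actual = Properties.versionNumberString // e.g. "2.10.4"
    if (isCompatible(actual))
      println(s"OK: Scala $actual matches spark-core_$expectedBinary")
    else
      println(s"Mismatch: Scala $actual, but spark-core_$expectedBinary needs Scala $expectedBinary.x")
  }
}
```

Comparing only the major.minor part mirrors how the `_2.10` suffix in `spark-core_2.10` is chosen: any 2.10.x patch release is binary compatible, while 2.11 is not.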

Create Projects from Scratch

If you are creating a Spark project from scratch on GitHub and want to use Eclipse, the most efficient way is as follows (read the entire section before you start, because an alternative approach is described after the first one):

  1. Create a repository on GitHub.
  2. Clone the repository to your local machine with a Git client.
  3. Create a plain Scala (or Python) project using the folder cloned from GitHub. Eclipse should recognize that the folder belongs to a Git repository (the project icon shows an additional cylinder).
  4. Add a Maven dependency on the corresponding version of spark-core. To do this, right-click the project, choose Configure > Convert to Maven Project, and follow the instructions to add dependencies. To use Spark 1.0.0, the Maven coordinates are groupId = org.apache.spark, artifactId = spark-core_2.10, version = 1.0.0.

However, this approach may cause problems for deployment because it may not produce a jar file (according to Yuan Zhang). An alternative is therefore to use a Maven project:

  1. Create a repository on GitHub.
  2. Clone the repository to your local machine with a Git client.
  3. Create a Maven project, using the name of the cloned folder as the artifact ID. Eclipse should recognize that the folder belongs to a Git repository (the project icon shows an additional cylinder).
  4. Enable Scala by adding the Scala nature (Configure > Add Scala Nature). At this point you should be able to compile Scala files.
  5. To use Spark 1.0.0, add the dependency groupId = org.apache.spark, artifactId = spark-core_2.10, version = 1.0.0. Since spark-core_2.10 already pulls in the Scala library, you can remove the Scala Library that Eclipse added in step 4 from the Java Build Path in the project Properties.
  6. The default Maven JVM setting, J2SE-1.5, often does not match the compliance level of the installed JDK, which produces an annoying warning. Follow this page to adjust the compiler compliance level under Java Compiler in the project Properties.
  7. The pom.xml file must be set up according to the project's requirements; the pom.xml in this project is one reference. This may be the most time-consuming part. If it does not work, copy the pom.xml from this project and make minor changes (group ID, artifact ID, etc.).
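The point of the Maven route is that `mvn package` produces a jar that can be deployed. A minimal driver of the kind such a jar might contain is sketched below; it is a hypothetical word-count example, not code from this project, and assumes the spark-core_2.10 1.0.0 dependency from step 5. Note the `import org.apache.spark.SparkContext._`, which Spark 1.x needs to bring `reduceByKey` into scope on RDDs of pairs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (reduceByKey) in Spark 1.x
import org.apache.spark.rdd.RDD

// Hypothetical example driver (not part of this project), assuming spark-core_2.10 1.0.0.
object WordCount {
  // Lower-case a line and split it on runs of whitespace, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Classic word count: one (word, 1) pair per token, summed per word.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(line => tokenize(line)).map(word => (word, 1)).reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <input path> <output path>")
      sys.exit(1)
    }
    // No master is set here, so it can be chosen at launch time (local or YARN).
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    try {
      countWords(sc.textFile(args(0))).saveAsTextFile(args(1))
    } finally {
      sc.stop()
    }
  }
}
```

After `mvn package`, a jar built this way could be launched with Spark 1.0's `spark-submit`, for example `spark-submit --class WordCount --master local[2] target/<your-jar>.jar input.txt output`, where the jar name depends on your pom.xml. Leaving the master out of the code keeps the same jar usable both locally and on the cluster.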

spark_recommender's People

Contributors

jiayuzhou, mohit-shrma, zhouyin, zhangy72

Watchers

James Cloos
