Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well-suited to machine learning algorithms.
Install the apache-sparkAUR package.
Some environment variables are set in /etc/profile.d/apache-spark.sh. You may need to adjust your PATH environment variable if your shell does not source scripts in /etc/profile.d/.
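As a minimal sketch of such an adjustment — assuming the package installs Spark under /opt/apache-spark, which is an assumption here — you could set the variables yourself in your shell's startup file:

```shell
# Assumption: the package installs Spark under /opt/apache-spark.
export SPARK_HOME=/opt/apache-spark
# Prepend Spark's launcher scripts (spark-shell, spark-submit, sparkR) to PATH.
export PATH="$SPARK_HOME/bin:$PATH"
```

Adjust SPARK_HOME to wherever the package actually placed the Spark files on your system.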
Enable R support as described in $SPARK_HOME/R/README.md. You may also wish to build the package documentation following the instructions in $SPARK_HOME/R/DOCUMENTATION.md.
Once the SparkR package has been built, you can connect using the sparkR command.
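A hypothetical invocation might look like the following; the --master value is an illustrative choice (a local master using two cores), not something the package mandates, and the snippet guards on sparkR being on PATH:

```shell
# Hypothetical: start the SparkR interactive shell against a local master.
# 'local[2]' means run Spark locally with two worker threads.
if command -v sparkR >/dev/null 2>&1; then
  sparkR --master 'local[2]'
fi
```

From the resulting R prompt you can then issue SparkR API calls against the running Spark instance.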