Apache Spark

From ArchWiki
[[Category:Distributed computing]]
 
{{Related articles start}}
 
{{Related|Hadoop}}
{{Related articles end}}

Revision as of 14:34, 5 November 2015

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

== Installation ==

Install the {{AUR|apache_spark}} package.

== Configuration ==

Some environment variables are set in /etc/profile.d/apache_spark.sh.

{| class="wikitable"
! ENV !! Value !! Description
|-
| PATH || $PATH:/usr/lib/apache_spark/bin || Spark binaries
|}
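To check whether the profile script has taken effect in the current session, you can inspect PATH directly. This is just a quick sanity check; the grep pattern assumes the install path listed above:

```shell
# Exit status of grep tells us whether the Spark bin directory is on PATH
echo ":$PATH:" | grep -q ':/usr/lib/apache_spark/bin:' \
  && echo 'Spark binaries on PATH' \
  || echo 'Spark binaries NOT on PATH'
```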

You may need to adjust your PATH environment variable if your shell does not source /etc/profile.d:

export PATH=$PATH:/usr/lib/apache_spark/bin
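For a shell that skips /etc/profile.d, the export above can instead go into a startup file such as ~/.bash_profile. The sketch below is one way to make that addition idempotent, so the snippet stays safe if it is sourced more than once (the directory is the one assumed throughout this page):

```shell
# Append the Spark bin directory to PATH only if it is not already there
spark_bin=/usr/lib/apache_spark/bin
case ":$PATH:" in
  *":$spark_bin:"*) ;;                  # already on PATH, nothing to do
  *) export PATH="$PATH:$spark_bin" ;;
esac
```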