{"id":5398,"date":"2015-12-13T09:34:26","date_gmt":"2015-12-13T01:34:26","guid":{"rendered":"http:\/\/rmohan.com\/?p=5398"},"modified":"2015-12-13T09:35:34","modified_gmt":"2015-12-13T01:35:34","slug":"apache-spark","status":"publish","type":"post","link":"https:\/\/mohan.sg\/?p=5398","title":{"rendered":"Apache Spark"},"content":{"rendered":"<p>Apache Spark 1.5.2 release, this version is a maintenance release that includes fixes Spark stability in some areas, mainly: DataFrame API, Spark Streaming, PySpark, R, Spark SQL and MLlib<\/p>\n<p>&nbsp;<\/p>\n<p>Apache Spark is one of the<span class=\"Apple-converted-space\"> hadoop <\/span>open source cluster computing environments similar, but there are some differences between the two, these useful differences make Spark in some workloads behaved more superior, in other words, Spark Enable memory distributed data sets, in addition to providing interactive query, it also can optimize iterative workloads.<\/p>\n<p>Spark is implemented in the Scala language, which will Scala as its application framework.<span class=\"Apple-converted-space\">\u00a0<\/span>And Hadoop different, Spark and Scala can be tightly integrated, which can operate as a local collection Scala objects as easily as operating a distributed data sets.<\/p>\n<p>Although creating Spark iterative job to support distributed data sets, but in fact it is complementary to Hadoop, it can run in parallel Hadoo file system.<span class=\"Apple-converted-space\">\u00a0<\/span>Through third-party clustering framework called Mesos can support this behavior.<span class=\"Apple-converted-space\">\u00a0<\/span>Spark by the University of California, Berkeley AMP Lab (Algorithms, Machines, and People Lab) development, can be used to build large, low-latency data analysis applications.<\/p>\n<p>&nbsp;<\/p>\n<div>Spark (<a href=\"http:\/\/spark-project.org\/\" target=\"_blank\">http:\/\/spark-project.org<\/a>) is developed in the UC Berkeley AMPLab, to make data analytics 
fast. It is open source. Spark is built for in-memory cluster computing, whereas Hadoop MapReduce is disk-based: a job can load data into memory and query it repeatedly, much faster than Hadoop MapReduce. For programmers, Spark provides APIs in both Scala and Java. Spark was developed with a focus on two kinds of applications where keeping data in memory helps:<\/div>\n<ul>\n<li>Iterative algorithms, which are common in machine learning.<\/li>\n<li>Interactive data mining.<\/li>\n<\/ul>\n<div><b>Abstractions Provided by Spark<\/b><\/div>\n<div><\/div>\n<div>The main abstraction Spark provides is the Resilient Distributed Dataset (RDD).<br \/>\nRDDs are fault-tolerant, parallel data structures that let users explicitly persist intermediate results in memory, control their partitioning to optimize data placement, and manipulate them using a rich set of operators.<\/div>\n<div><\/div>\n<div>A second abstraction in Spark is shared variables, which can be used in parallel operations. Spark supports two types of shared variables:<\/div>\n<ul>\n<li>broadcast variables<\/li>\n<li>accumulators<\/li>\n<\/ul>\n<div><b>Driver Program<\/b><\/div>\n<div><\/div>\n<div>At a high level, every Spark application consists of a driver program that runs the user\u2019s main function and executes various parallel operations on a cluster.<\/div>\n<div><\/div>\n<div><b>Operations on RDDs<\/b><\/div>\n<div><\/div>\n<div>Spark exposes RDDs through a language-integrated API. 
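As noted above, Spark lets you manipulate a distributed dataset as easily as a local Scala collection, and the RDD operators deliberately mirror the Scala collection API. The sketch below uses a plain Scala `List` to show the operator shapes; in spark-shell, `sc.parallelize(data)` would give an RDD supporting the same `map`/`filter`/`reduce` calls (the variable names here are illustrative, not from the original post).

```scala
// A local Scala collection; with a live SparkContext `sc`, the distributed
// equivalent would be: val rdd = sc.parallelize(data)
val data = List(1, 2, 3, 4, 5)

// Transformations (lazy on an RDD): build a new dataset from an existing one
val squares = data.map(n => n * n)        // List(1, 4, 9, 16, 25)
val evens   = squares.filter(_ % 2 == 0)  // List(4, 16)

// Action (eager on an RDD): aggregates and returns a value to the driver
val total = evens.reduce(_ + _)           // 20

println(total)
```

On an RDD, the two transformations would merely record lineage; only the `reduce` action would trigger computation on the cluster.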
RDDs support two types of operations.<\/div>\n<ul>\n<li><i>Transformations,<\/i> which create a new dataset from an existing one.<\/li>\n<li><i>Actions,<\/i> which return a value to the driver program after running a computation on the dataset.<\/li>\n<\/ul>\n<div>For example, <i>map<\/i> is a transformation that passes each dataset element through a function and returns a new distributed dataset representing the results. On the other hand, <i>reduce<\/i> is an action that aggregates all the elements of the dataset using some function and returns the final result to the driver program.<\/div>\n<div><\/div>\n<div>More examples of transformation operations are <i>filter(func), flatMap(func), distinct([numTasks]), reduceByKey(func, [numTasks])<\/i><\/div>\n<div><\/div>\n<div>More examples of action operations are <i>collect(), count(), first(),<br \/>\nsaveAsTextFile(path), saveAsSequenceFile(path)<\/i><\/div>\n<div><\/div>\n<div><\/div>\n<p><strong>CentOS 7: install and configure Spark 1.5.2 in standalone mode<\/strong><br \/>\n[root@clusterserver1 ~]# cat \/etc\/hosts<br \/>\n127.0.0.1\u00a0\u00a0 localhost localhost.localdomain localhost4 localhost4.localdomain4<br \/>\n::1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 localhost localhost.localdomain localhost6 localhost6.localdomain6<br \/>\n192.168.1.20 clusterserver1.rmohan.com clusterserver1<br \/>\n192.168.1.21 clusterserver2.rmohan.com clusterserver2<\/p>\n<p>wget --no-cookies --no-check-certificate --header 
\"Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie\" \"http:\/\/download.oracle.com\/otn-pub\/java\/jdk\/8u65-b17\/jdk-8u65-linux-x64.tar.gz\"<br \/>\ntar -zxvf jdk-8u65-linux-x64.tar.gz<br \/>\nmkdir \/usr\/java<br \/>\nmv jdk1.8.0_65 \/usr\/java\/<\/p>\n<p>cd \/usr\/java\/jdk1.8.0_65\/<br \/>\n[root@cluster1 java]# ln -s \/usr\/java\/jdk1.8.0_65\/bin\/java \/usr\/bin\/java<br \/>\n[root@cluster1 java]# alternatives --install \/usr\/bin\/java java \/usr\/java\/jdk1.8.0_65\/bin\/java 2<br \/>\nalternatives --config java<\/p>\n<p>[root@cluster1 java]# alternatives --config java<\/p>\n<p>There is 1 program that provides 'java'.<\/p>\n<p>Selection Command<br \/>\n-----------------------------------------------<br \/>\n*+ 1 \/usr\/java\/jdk1.8.0_65\/bin\/java<\/p>\n<p>Enter to keep the current selection[+], or type selection number: 1<br \/>\n[root@cluster1 java]#<\/p>\n<p>alternatives --install \/usr\/bin\/jar jar \/usr\/java\/jdk1.8.0_65\/bin\/jar 2<br \/>\nalternatives --install \/usr\/bin\/javac javac \/usr\/java\/jdk1.8.0_65\/bin\/javac 2<br \/>\nalternatives --set jar \/usr\/java\/jdk1.8.0_65\/bin\/jar<br \/>\nalternatives --set javac \/usr\/java\/jdk1.8.0_65\/bin\/javac<br \/>\nvi \/etc\/profile.d\/java.sh<\/p>\n<p>export JAVA_HOME=\/usr\/java\/jdk1.8.0_65<br \/>\nexport JRE_HOME=$JAVA_HOME\/jre<br \/>\nexport PATH=$JAVA_HOME\/bin:$JRE_HOME\/bin:$PATH<\/p>\n<p>wget http:\/\/www.apache.org\/dyn\/closer.lua\/spark\/spark-1.5.2\/spark-1.5.2.tgz<\/p>\n<p>gunzip -c spark-1.5.2.tgz | tar xvf -<\/p>\n<p>wget 
http:\/\/mirror.nus.edu.sg\/apache\/spark\/spark-1.5.2\/spark-1.5.2-bin-hadoop1-scala2.11.tgz<br \/>\ngunzip -c spark-1.5.2-bin-hadoop1-scala2.11.tgz | tar xvf -<\/p>\n<p>Download Scala<br \/>\nhttp:\/\/downloads.typesafe.com\/scala\/2.11.7\/scala-2.11.7.tgz<\/p>\n<p>mkdir \/usr\/hadoop<br \/>\nmv spark-1.5.2 \/usr\/hadoop\/<br \/>\nmv scala-2.11.7 \/usr\/hadoop\/<br \/>\nmv spark-1.5.2-bin-hadoop1-scala2.11 \/usr\/hadoop\/<\/p>\n<p>vi \/etc\/profile.d\/scala.sh<br \/>\n#SCALA VARIABLES START<br \/>\nexport SCALA_HOME=\/usr\/hadoop\/scala-2.11.7<br \/>\nexport PATH=$PATH:$SCALA_HOME\/bin<br \/>\n#SCALA VARIABLES END<\/p>\n<p>#SPARK VARIABLES START<br \/>\nexport SPARK_HOME=\/usr\/hadoop\/spark-1.5.2-bin-hadoop1-scala2.11<br \/>\nexport PATH=$PATH:$SPARK_HOME\/bin<br \/>\n#SPARK VARIABLES END<\/p>\n<p>export SPARK_MASTER_IP=localhost<br \/>\nexport SPARK_WORKER_MEMORY=1024m<br \/>\nexport MASTER=spark:\/\/localhost:7077<\/p>\n<p>[root@clusterserver1 spark-1.5.2-bin-hadoop1-scala2.11]# scala -version<br \/>\nScala code runner version 2.11.7 -- Copyright 2002-2013, LAMP\/EPFL<br \/>\nYou have new mail in \/var\/spool\/mail\/root<br \/>\n[root@clusterserver1 spark-1.5.2-bin-hadoop1-scala2.11]#<\/p>\n<p>[root@clusterserver1 sbin]# .\/start-all.sh<br \/>\nstarting org.apache.spark.deploy.master.Master, logging to 
\/usr\/hadoop\/spark-1.5.2-bin-hadoop1-scala2.11\/sbin\/..\/logs\/spark-root-org.apache.spark.deploy.master.Master-1-clusterserver1.rmohan.com.out<br \/>\nlocalhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.<br \/>\nroot@localhost's password:<br \/>\nlocalhost: starting org.apache.spark.deploy.worker.Worker, logging to 
\/usr\/hadoop\/spark-1.5.2-bin-hadoop1-scala2.11\/sbin\/..\/logs\/spark-root-org.apache.spark.deploy.worker.Worker-1-clusterserver1.rmohan.com.out<br \/>\n[root@clusterserver1 sbin]#<\/p>\n<p>[root@clusterserver1 bin]# spark-shell<br \/>\nlog4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups).<br \/>\nlog4j:WARN Please initialize the log4j system properly.<br \/>\nlog4j:WARN See http:\/\/logging.apache.org\/log4j\/1.2\/faq.html#noconfig for more info.<br \/>\nUsing 
Spark's repl log4j profile: org\/apache\/spark\/log4j-defaults-repl.properties<br \/>\nTo adjust logging level use sc.setLogLevel(\"INFO\")<br \/>\n15\/12\/13 08:20:19 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.<br \/>\nSpark context available as sc.<br \/>\n15\/12\/13 08:20:22 WARN Connection: BoneCP specified but not present in 
CLASSPATH (or one of dependencies)<br \/>\n15\/12\/13 08:20:23 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)<br \/>\n15\/12\/13 08:20:29 WARN ObjectStore: Version information not found in metastore. 
hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0<br \/>\n15\/12\/13 08:20:30 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException<br \/>\n15\/12\/13 08:20:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable<br \/>\n15\/12\/13 08:20:31 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)<br \/>\n15\/12\/13 08:20:32 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)<br \/>\n15\/12\/13 08:20:38 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0<br \/>\n15\/12\/13 08:20:38 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException<br \/>\n15\/12\/13 08:20:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable<br \/>\nSQL context available as sqlContext.<br \/>\nWelcome to<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0 ____\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 __<br \/>\n\u00a0\u00a0\u00a0\u00a0 \/ __\/__\u00a0 ___ _____\/ \/__<br \/>\n\u00a0\u00a0\u00a0 _\\ \\\/ _ \\\/ _ `\/ __\/\u00a0 '_\/<br \/>\n\u00a0\u00a0 \/___\/ .__\/\\_,_\/_\/ \/_\/\\_\\\u00a0\u00a0 version 1.5.2<br \/>\n\u00a0\u00a0\u00a0\u00a0\u00a0 \/_\/<\/p>\n<p>Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65)<br \/>\nType in expressions to have them evaluated.<br \/>\nType :help for more information.<\/p>\n<p>scala&gt; 1+2<br \/>\nres0: Int = 3<\/p>\n<p>scala&gt;<\/p>\n<p>\/root\/word.txt<br \/>\nhello world<br \/>\nhello hadoop<br \/>\npls say hello<\/p>\n<p>val readFile = sc.textFile(\"file:\/\/\/root\/word.txt\")<\/p>\n<p>scala&gt; val readFile = sc.textFile(\"file:\/\/\/root\/word.txt\")<br \/>\nreadFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[5] at textFile at &lt;console&gt;:24<\/p>\n<p>scala&gt; 
readFile.count()<br \/>\n15\/12\/13 08:36:25 WARN LoadSnappy: Snappy native library not loaded<br \/>\nres2: Long = 3<\/p>\n<p><a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-001.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5399\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-001.jpg\" alt=\"spark 001\" width=\"1630\" height=\"617\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001.jpg 1630w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001-300x114.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001-1024x388.jpg 1024w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001-150x57.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001-400x151.jpg 400w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-001-900x341.jpg 900w\" sizes=\"(max-width: 1630px) 100vw, 1630px\" \/><\/a> <a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-002.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5400\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-002.jpg\" alt=\"spark 002\" width=\"1914\" height=\"666\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002.jpg 1914w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002-300x104.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002-1024x356.jpg 1024w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002-150x52.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002-400x139.jpg 400w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-002-900x313.jpg 900w\" sizes=\"(max-width: 1914px) 100vw, 1914px\" \/><\/a> <a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-003.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5401\" 
src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-003.jpg\" alt=\"spark 003\" width=\"1913\" height=\"624\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003.jpg 1913w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003-300x98.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003-1024x334.jpg 1024w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003-150x49.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003-400x130.jpg 400w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-003-900x294.jpg 900w\" sizes=\"(max-width: 1913px) 100vw, 1913px\" \/><\/a> <a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-004.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5402\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-004.jpg\" alt=\"spark 004\" width=\"1916\" height=\"569\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004.jpg 1916w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004-300x89.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004-1024x304.jpg 1024w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004-150x45.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004-400x119.jpg 400w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-004-900x267.jpg 900w\" sizes=\"(max-width: 1916px) 100vw, 1916px\" \/><\/a> <a href=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-005.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-5403\" src=\"http:\/\/rmohan.com\/wp-content\/uploads\/2015\/12\/spark-005.jpg\" alt=\"spark 005\" width=\"1888\" height=\"683\" srcset=\"https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005.jpg 1888w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005-300x109.jpg 300w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005-1024x370.jpg 1024w, 
https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005-150x54.jpg 150w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005-400x145.jpg 400w, https:\/\/mohan.sg\/wp-content\/uploads\/2015\/12\/spark-005-900x326.jpg 900w\" sizes=\"(max-width: 1888px) 100vw, 1888px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Apache Spark 1.5.2 is a maintenance release that includes stability fixes in several areas, mainly the DataFrame API, Spark Streaming, PySpark, R, Spark SQL and MLlib.<\/p>\n<p>Apache Spark is an open source cluster computing environment similar to Hadoop, but with some useful differences that make [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[60],"tags":[],"_links":{"self":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/5398"}],"collection":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5398"}],"version-history":[{"count":2,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/5398\/revisions"}],"predecessor-version":[{"id":5405,"href":"https:\/\/mohan.sg\/index.php?rest_route=\/wp\/v2\/posts\/5398\/revisions\/5405"}],"wp:attachment":[{"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mohan.sg\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5
398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}