Monday, September 9, 2019

scala : Basic Setup for Scala and Spark

Basic Setup for Scala

Scala can be run using :
  1. Spark-Shell
  2. SBT

Spark-Shell :Usually this is used as it provides scala along with spark capabilites to create RDD/DataSet and DF

Installation for Linux
  1. java -version
  2. sudo apt-get install scala #scala -version
    1. type $scala
    2. println("hi")
    3. :q
  3. goto - https://spark.apache.org/downloads.html
  4. Download latest tar file
  5. tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
  6. cd spark-2.0.2-bin-hadoop2.7.tgz
    1. ./spark-shell
    2. :q //to quit
    3. open .bashrc
  7. add line SPARK_HOME=~/Downloads/spark-3.0.0-preview2-bin-hadoop2.7
  8. $export PATH=$SPARK_HOME/bin:$PATH
  9. $source ~/.bashrc
  10. $spark-shell
Installation for mac
  1. Run homebrew installer : $/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"#(visit homebrew for more details)
  2. $xcode-select –install
  3. $brew cask install java
  4. $brew install scala
  5. $brew install apache-spark
  6. $spark-shell # to start spark-shell
  7. $brew upgrade apache-spark #to Upgrade 
Start session 
$spark-shell

To Create a Dataframe:
:imports
#copy 4th point as 
#import org.apache.spark.sql.functions._ as  
#import org.apache.spark.sql._
import org.apache.spark.sql._
val mockDF1: DataFrame = Seq((0, "A"), (1, "B"), (0, "C")).toDF("col1", "col2")
mockDF1

TO uninstall 
brew uninstall scala
--------------

SBT shell

Linux
  • Install brew
  • open terminal
  • Enter "/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  • After brew is installed successfully
  • Enter"brew install sbt@1"
  • TO uninstall - brew uninstall scala
Linux :

Getting SBT Started in Terminal:
  1. Open Terminal
  2. enter "sbt"
  3. After sbt shell is opened
  4. Open Scala REPL session inside SBT using - "console" or "consoleQuick"
  5. Type "println("helloworld")
  6. To quit type ":q" or":quit"
  7. And "exit" to exit sbt shell
Note : You can only do basic operations here . But cannot do operations where there is dependency on libraries .For that you need a build tool like sbt or bazel .

To add library into your sbt shell :
  1. Download required jar file Eg:https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.12/2.4.4
  2. Open sbt shell
  3. run cmd ":require spark-sql_2.12.jar" #Added - file.jar
import org.apache.spark.sql.{Column, DataFrame, SaveMode, SparkSession}

Using SBT in IntelliJ :
  1. IntelliJ by default comes with sbt
  2. Install Idea intelliJ
  3. Create a new sbt project
  4. goto project /src/main/test
  5. Create a new file "test1.scala"
  6. copy below code  
  7. set SBT SDT to 2.11
  8. add contents into build file
  9. Run it (rt ck run)
build.sbt:

name := "sbt_test" version := "0.1" scalaVersion := "2.11.8"
libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "2.1.0", "org.apache.spark" %% "spark-sql" % "2.1.0")

libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.0"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.4" % "test"
libraryDependencies += "org.mockito" % "mockito-all" % "1.8.4"
libraryDependencies += "org.scalamock" %% "scalamock" % "4.3.0" % "test"
libraryDependencies += "org.testng" % "testng" % "6.10"

//Basic Class:

object test1 extends App{
  println("Hello World")
}
or 
object test1 { def
            main(args:
              Array[String]): Unit = println("Hello, World!") }
or

import org.apache.spark.sql.SparkSession
import org.scalamock.scalatest.MockFactory
import org.apache.spark.sql.{Column, DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object test1  extends App with MockFactory {
  val spark = SparkSession.builder().master("local").appName("spark-shell").getOrCreate()
  import spark.implicits._
  val df1: DataFrame = Seq((1,1), (2,2), (3,3)).toDF("col1", "col2")
  df1.show()
}

Issues :

1. Error:scalac: Multiple 'scala-library*.jar' files (scala-library.jar, scala-library.jar, scala-library.jar) in
          Scala
            compiler classpath in
          Scala
            SDK scala-sdk-2.11.7`

Solution:
File > Project_Structure > Libraries     
Remove "SBT:org.scala-lang:scala-library:2.11.8:jar"

2. Cannot resolve App
Solution:
Set the current Scala SDT to 2.11.8
--------------------------------------------------------------------------------------------------------
Issues :

1. Error:scalac: Multiple 'scala-library*.jar' files (scala-library.jar, scala-library.jar, scala-library.jar) in
        Scala
          compiler classpath in
        Scala
          SDK scala-sdk-2.11.7`

Solution : Goto File > Project Stucture > Platform Settings > SDKs > Remove Duplicate
Solution 2: Remove "scalaVersion := "2.xx.xx" from build.sbt file
Solution 3:


2. Could not find or load main class in scala in intellij IDE
Solution: Right click on "src folder" and select Mark Directory as -> Sources root
--------------------------------------------------------------------------------------------------------

Ref:

No comments:

Post a Comment