Basic Setup for Scala
Scala can be run using:
- Spark-Shell
- SBT
Spark-Shell: this is the usual choice, since it provides Scala together with Spark's capabilities for creating RDDs, Datasets and DataFrames (DF).
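A minimal sketch of the three abstractions mentioned above, assuming Spark 2.x or later on the classpath. The `Person` case class and the object name are hypothetical. Inside spark-shell, enter it with `:paste` and then call `ThreeAbstractions.main(Array())`; `getOrCreate()` reuses the session the shell already created.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ThreeAbstractions {
  // RDD: a low-level distributed collection created through the SparkContext.
  def doubledRdd(spark: SparkSession): RDD[Int] =
    spark.sparkContext.parallelize(Seq(1, 2, 3)).map(_ * 2)

  // Dataset: typed rows; the encoder for Person comes from spark.implicits._.
  def peopleDs(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    Seq(Person("a", 30), Person("b", 40)).toDS()
  }

  // DataFrame: a Dataset[Row] addressed by column name instead of by field.
  def peopleDf(spark: SparkSession): DataFrame =
    peopleDs(spark).toDF()

  def main(args: Array[String]): Unit = {
    // In spark-shell a session named `spark` already exists and getOrCreate() returns it.
    val spark = SparkSession.builder().master("local[*]").appName("three-abstractions").getOrCreate()
    println(doubledRdd(spark).collect().mkString(","))
    peopleDs(spark).filter(p => p.age > 35).show()
    peopleDf(spark).select("name").show()
  }
}
```

The first line printed by `main` is `2,4,6`; the two `show()` calls print small tables.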
Installation for Linux
- java -version
- sudo apt-get install scala #then check with: scala -version
- $scala #starts the Scala REPL
- println("hi")
- :q
- Go to https://spark.apache.org/downloads.html
- Download the latest .tgz release
- tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
- cd spark-2.0.2-bin-hadoop2.7 #the extracted directory, not the .tgz archive
- ./bin/spark-shell
- :q //to quit
- Open ~/.bashrc and add these two lines (point SPARK_HOME at the directory you extracted):
  export SPARK_HOME=~/Downloads/spark-3.0.0-preview2-bin-hadoop2.7
  export PATH=$SPARK_HOME/bin:$PATH
- $source ~/.bashrc
- $spark-shell
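A small sketch for checking the result of the .bashrc steps above: it reads SPARK_HOME and PATH from the environment and reports whether SPARK_HOME/bin is on PATH and whether the spark-shell launcher exists there. The object name is hypothetical; paste it into the Scala REPL and call `SparkEnvCheck.main(Array())`. A variable assigned in .bashrc without `export` is not passed to programs started from the shell, so it would show up here as not set.

```scala
import java.io.File

object SparkEnvCheck {
  // SPARK_HOME/bin, if SPARK_HOME is set.
  def sparkBin(env: Map[String, String]): Option[String] =
    env.get("SPARK_HOME").map(home => new File(home, "bin").getPath)

  // True when SPARK_HOME/bin appears as an entry of PATH (trailing slashes ignored).
  def binOnPath(env: Map[String, String]): Boolean = {
    val entries = env.getOrElse("PATH", "").split(File.pathSeparator).map(_.stripSuffix("/")).toSet
    sparkBin(env).exists(bin => entries.contains(bin.stripSuffix("/")))
  }

  def main(args: Array[String]): Unit = {
    val env = sys.env
    println(s"SPARK_HOME: ${env.getOrElse("SPARK_HOME", "<not set>")}")
    println(s"SPARK_HOME/bin on PATH: ${binOnPath(env)}")
    // After extracting the tarball, the launcher lives at $SPARK_HOME/bin/spark-shell.
    println(s"spark-shell present: ${sparkBin(env).exists(b => new File(b, "spark-shell").isFile)}")
  }
}
```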
Installation for mac
- Install Homebrew: $/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" #the older Ruby-based installer is deprecated; see https://brew.sh for details
- $xcode-select --install
- $brew install --cask temurin #installs a JDK; older Homebrew versions used: brew cask install java
- $brew install scala
- $brew install apache-spark
- $spark-shell # to start spark-shell
- $brew upgrade apache-spark #to upgrade
Start session
$spark-shell
To Create a Dataframe:
- Type :imports to list what the session already imports (spark-shell imports spark.implicits._, which provides toDF)
- Then enter:
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
val mockDF1: DataFrame = Seq((0, "A"), (1, "B"), (0, "C")).toDF("col1", "col2")
mockDF1.show()
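The import of `org.apache.spark.sql.functions._` above is what brings column helpers such as `col`, `count` and `collect_list` into scope. A sketch of using them on the same three rows, assuming Spark 2.x or later; the object and method names are hypothetical. In spark-shell, enter it with `:paste`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MockDFExample {
  // Same three rows as mockDF1.
  def mockDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((0, "A"), (1, "B"), (0, "C")).toDF("col1", "col2")
  }

  // One row per col1 value: how many rows it had (n) and its col2 values,
  // sorted so the result is deterministic.
  def summarize(df: DataFrame): DataFrame =
    df.groupBy(col("col1"))
      .agg(count(lit(1)).as("n"), sort_array(collect_list(col("col2"))).as("values"))
      .orderBy(col("col1"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("mock-df").getOrCreate()
    val df = mockDF(spark)
    df.filter(col("col1") === 0).show() // rows A and C
    summarize(df).show()
  }
}
```

`summarize` returns two rows: col1 = 0 with n = 2 and values [A, C], then col1 = 1 with n = 1 and values [B].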
To uninstall:
- $brew uninstall scala
--------------SBT shell
Linux - Install brew:
- Open a terminal
- Enter: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- After brew is installed successfully, enter "brew install sbt@1"
- To uninstall: brew uninstall sbt@1
Getting SBT started in a terminal (Linux):
- Open a terminal
- Enter "sbt"
- Once the sbt shell is open:
- Open a Scala REPL session inside sbt with "console" or "consoleQuick" (consoleQuick skips compiling the project's own sources)
- Type println("helloworld")
- To quit the REPL, type ":q" or ":quit"
- Type "exit" to leave the sbt shell
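To try the console with something beyond println, here is a small sketch that uses only the Scala standard library, so it needs no extra jars. The object, case class and values are hypothetical. In the console, type `:paste`, paste the block, then press Ctrl-D.

```scala
// Plain standard-library Scala: nothing here depends on an external library.
object ConsoleWarmup {
  // Hypothetical record type for the exercise.
  case class Item(name: String, qty: Int)

  val items: List[Item] = List(Item("pen", 2), Item("ink", 0), Item("pad", 5))

  // Names of items that have at least one unit in stock.
  def inStock(xs: List[Item]): List[String] = xs.filter(_.qty > 0).map(_.name)

  // Sum of all quantities.
  def totalQty(xs: List[Item]): Int = xs.map(_.qty).sum
}
```

Evaluating `ConsoleWarmup.inStock(ConsoleWarmup.items)` gives `List(pen, pad)`, and `ConsoleWarmup.totalQty(ConsoleWarmup.items)` gives `7`.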
Note: In this plain console you can only do basic operations. Anything that depends on external libraries needs those libraries on the classpath, which is what a build tool such as sbt (through a build.sbt project) or Bazel provides.
To add a library to the sbt console:
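- Download the required jar file, e.g. https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.12/2.4.4 (its _2.12 suffix must match the console's Scala version)
- Open the sbt shell and start "console"
- Run ":require spark-sql_2.12.jar" (the REPL replies that the jar was added to the classpath)
- Then import from it, for example:
import org.apache.spark.sql.{Column, DataFrame, SaveMode, SparkSession}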
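Using SBT in IntelliJ:
- Install IntelliJ IDEA (with its Scala plugin, sbt support comes built in)
- Create a new sbt project
- Go to src/test/scala in the project (scalatest and scalamock below are test-scoped, so code that imports them must live under src/test)
- Create a new file "test1.scala"
- Copy the code below into it
- Set the project's Scala SDK to 2.11
- Add the contents below to build.sbt
- Run it (right-click the file > Run)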
build.sbt:
name := "sbt_test"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql" % "2.1.0"
)
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.0"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.4" % "test"
libraryDependencies += "org.mockito" % "mockito-all" % "1.8.4"
libraryDependencies += "org.scalamock" %% "scalamock" % "4.3.0" % "test"
libraryDependencies += "org.testng" % "testng" % "6.10"
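The test-scoped dependencies above (scalatest, scalamock) are meant for code under src/test/scala. A minimal sketch of a suite that uses both, assuming the pinned versions resolve together (scalatest 3.0.x still provides `org.scalatest.FunSuite`). `RateSource`, `PriceConverter` and the suite name are hypothetical. Run it with `sbt test`.

```scala
import org.scalamock.scalatest.MockFactory
import org.scalatest.FunSuite

// Hypothetical dependency to be mocked.
trait RateSource {
  def rate(currency: String): Double
}

// Hypothetical class under test.
class PriceConverter(source: RateSource) {
  def toEur(amountUsd: Double): Double = amountUsd * source.rate("EUR")
}

class PriceConverterSuite extends FunSuite with MockFactory {
  test("converts using the mocked rate") {
    val source = mock[RateSource]
    // Expect exactly one call with "EUR"; scalamock verifies this when the test ends.
    (source.rate _).expects("EUR").returning(0.5)
    assert(new PriceConverter(source).toEur(10.0) == 5.0)
  }
}
```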
//Basic Class:
object test1 extends App{println("Hello World")}
or
object test1 { def main(args: Array[String]): Unit = println("Hello, World!") }
or
import org.apache.spark.sql.{Column, DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.scalamock.scalatest.MockFactory

object test1 extends App with MockFactory {
  val spark = SparkSession.builder().master("local").appName("spark-shell").getOrCreate()
  import spark.implicits._
  val df1: DataFrame = Seq((1, 1), (2, 2), (3, 3)).toDF("col1", "col2")
  df1.show()
}
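The code above imports `org.apache.spark.sql.types._` without using it. That package is what you need to declare a schema explicitly instead of letting `toDF` infer one. A sketch, assuming Spark 2.x or later; the object name and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object ExplicitSchemaExample {
  // Schema declared up front instead of inferred from tuples.
  val schema: StructType = StructType(Seq(
    StructField("col1", IntegerType, nullable = false),
    StructField("col2", StringType, nullable = true)
  ))

  def build(spark: SparkSession): DataFrame = {
    // Hypothetical sample rows; the null is allowed because col2 is nullable.
    val rows = spark.sparkContext.parallelize(Seq(Row(1, "x"), Row(2, null)))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("explicit-schema").getOrCreate()
    val df = build(spark)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```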
Issues:
1. Error:scalac: Multiple 'scala-library*.jar' files (scala-library.jar, scala-library.jar, scala-library.jar) in Scala compiler classpath in Scala SDK scala-sdk-2.11.7
Solution:
File > Project Structure > Libraries, then remove "SBT:org.scala-lang:scala-library:2.11.8:jar"
2. Cannot resolve App
Solution:
Set the project's Scala SDK to 2.11.8 (matching scalaVersion in build.sbt)
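To confirm which scala-library the project actually runs with (relevant to both issues above), a small sketch; the object name is hypothetical. It prints the runtime Scala version and the binary version that sbt's `%%` appends to artifact names (for example `spark-core_2.11` for Scala 2.11.x).

```scala
object ScalaVersionCheck {
  // Version of the scala-library jar on the runtime classpath, e.g. "2.11.8".
  def runtimeVersion: String = scala.util.Properties.versionNumberString

  // Scala 2 binary version: the first two components, e.g. "2.11.8" -> "2.11".
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    println(s"Scala library on classpath: $runtimeVersion")
    println(s"Artifacts resolved with %% should end in _${binaryVersion(runtimeVersion)}")
  }
}
```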
--------------------------------------------------------------------------------------------------------
Issues:
1. Error:scalac: Multiple 'scala-library*.jar' files (scala-library.jar, scala-library.jar, scala-library.jar) in Scala compiler classpath in Scala SDK scala-sdk-2.11.7
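Solution 2: Remove scalaVersion := "2.xx.xx" from build.sbt (sbt then falls back to its default Scala version, so check that the Spark artifacts are published for that version)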
Solution 3:
2. Error: Could not find or load main class (Scala in IntelliJ IDEA)
Solution: Right-click the "src" folder and select Mark Directory as > Sources Root
--------------------------------------------------------------------------------------------------------
Ref: