Spark: Speeding Up / Optimization
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

import spark.implicits._

// Small example DataFrame with three columns
val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
  .withColumn("c2", lit(8))
  .withColumn("c3", lit(1))

// Lower the number of shuffle partitions from the default of 200
spark.conf.set("spark.sql.shuffle.partitions", 100)
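To see the setting take effect, run any shuffle-producing operation (such as a groupBy) and check the partition count of the result. The snippet below is a minimal sketch using the df2 DataFrame from above; note that on newer Spark versions with adaptive query execution enabled, the final partition count may be coalesced further.

// A groupBy triggers a shuffle, so the result picks up the configured
// number of shuffle partitions (100 here instead of the default 200).
val grouped = df2.groupBy("CURRENCY").count()
println(grouped.rdd.getNumPartitions)  // expect 100, assuming AQE is not coalescing partitions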
With less data, reduce the number of shuffle partitions; otherwise you end up with many partitioned files holding only a few records each, which means many small tasks and a job that runs longer than it should. With a lot of data and too few partitions, you get fewer but longer-running tasks, and sometimes out-of-memory errors.
The default for spark.sql.shuffle.partitions is 200.
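One way to pick a value, not from the original post but a common rule of thumb, is to aim for partitions of roughly 128 MB each. The sketch below assumes you know (or can estimate) the total amount of data being shuffled; the helper name is hypothetical.

// Hypothetical helper: choose a shuffle partition count so that each
// partition handles roughly targetPartitionBytes of data.
def shufflePartitionsFor(totalShuffleBytes: Long,
                         targetPartitionBytes: Long = 128L * 1024 * 1024): Int =
  math.max(1, (totalShuffleBytes / targetPartitionBytes).toInt)

// Example: ~10 GB of shuffled data -> ~80 partitions instead of the default 200.
spark.conf.set("spark.sql.shuffle.partitions", shufflePartitionsFor(10L * 1024 * 1024 * 1024))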