Thursday, June 4, 2020

Scala : Sum of each row and column in a Spark DataFrame


Note :
Mapping over a DataFrame's columns and reducing them with + builds a single Column expression; Spark then evaluates that expression row by row and stores the result as a new column.
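The reduce step behaves like an ordinary Scala reduce over a list. As a plain-Scala sketch of the same pattern (no Spark required; the object name is made up for illustration):

```scala
object ReduceSketch {
  def main(args: Array[String]): Unit = {
    // One row's cell values; reduce folds them pairwise, the same way
    // df.columns.map(col).reduce(_ + _) folds Column expressions.
    val cells = Seq(1, 8, 1)
    val total = cells.reduce(_ + _)
    println(total) // prints 10
  }
}
```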

Method 1 :
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object DataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._ // needed for toDF
    val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
      .withColumn("c2", lit(8))
      .withColumn("c3", lit(1))
    // Reduce all columns into one Column expression, evaluated row-wise
    val sumDF = df2.withColumn("TOTAL", df2.columns.map(col).reduce(_ + _))
    sumDF.show()
  }
}
Method 2 :
df2.withColumn("TOTAL", expr("c2 + c3"))

Method 3 :
df2.withColumn("TOTAL", col("c2") + col("c3"))
Result :
+--------+---+---+-----+
|CURRENCY| c2| c3|TOTAL|
+--------+---+---+-----+
|       1|  8|  1|   10|
|       2|  8|  1|   11|
+--------+---+---+-----+
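One caveat with all three methods: + propagates nulls, so a single null cell makes the whole TOTAL null. A hedged variation (assuming the same df2 as above) wraps each column in coalesce first:

```scala
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Null-safe row total: treat a null cell as 0 before adding.
// Assumes df2 from Method 1 is in scope.
val safeTotal = df2.columns
  .map(c => coalesce(col(c), lit(0)))
  .reduce(_ + _)
df2.withColumn("TOTAL", safeTotal).show()
```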
To calculate the sum of a column:
df2.agg(sum(col("c2"))).show()

Result :
+-------+
|sum(c2)|
+-------+
| 16|
+-------+
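To total every column in one pass, the same map-over-columns idea works with agg, which takes one aggregate plus varargs. A sketch, assuming the df2 from Method 1:

```scala
import org.apache.spark.sql.functions.{col, sum}

// Build one sum(...) aggregate per column; agg takes head + varargs tail.
val sums = df2.columns.map(c => sum(col(c)).as(s"sum_$c"))
df2.agg(sums.head, sums.tail: _*).show()
// one row: sum_CURRENCY = 3, sum_c2 = 16, sum_c3 = 2
```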


