Friday, July 10, 2020

DataFrame: agg

Spark DataFrame: agg()


agg() computes aggregate values (such as max or sum) over the columns of a DataFrame.

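  • ds.agg(...) is shorthand for ds.groupBy().agg(...): it aggregates over the entire DataFrame, without groups.
  • agg() is a DataFrame method that accepts aggregate functions (for example max or sum from org.apache.spark.sql.functions) as arguments.
Example: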

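import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{lit, max, sum}
import org.apache.spark.sql.types.DoubleType
import spark.implicits._ // needed for toDF; `spark` is the active SparkSession (predefined in spark-shell)

val df3: DataFrame = Seq(1.6, 7, 42).toDF("CURRENCY")
  .withColumn("c2", lit(8.6).cast(DoubleType))
  .withColumn("c3", lit(1).cast(DoubleType))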
df3.show()
+--------+---+---+
|CURRENCY| c2| c3|
+--------+---+---+
|     1.6|8.6|1.0|
|     7.0|8.6|1.0|
|    42.0|8.6|1.0|
+--------+---+---+
df3.agg(max("CURRENCY")).show()
+-------------+
|max(CURRENCY)|
+-------------+
|         42.0|
+-------------+
println(df3.agg(max("CURRENCY")).collect()(0))
[42.0]
println(df3.agg(sum("CURRENCY")).collect()(0))
[50.6]
df3.agg(sum("CURRENCY"),sum("c2")).show()
+-------------+------------------+
|sum(CURRENCY)|           sum(c2)|
+-------------+------------------+
|         50.6|25.799999999999997|
+-------------+------------------+
In the example above, agg() receives sum("CURRENCY") as an argument. To call a shortcut such as .sum directly, that method has to exist: it is hardcoded in the API. With .agg you can pass any aggregate function; sum("column") is just one of them.
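As a minimal sketch of that point, the snippet below passes several functions from org.apache.spark.sql.functions to a single agg() call. It assumes a local SparkSession is acceptable for the demonstration. The app name, the variable names, and the column aliases are illustrative and not part of the original example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max, min}

// Assumption: a local session for illustration; in spark-shell, getOrCreate()
// simply returns the session that already exists.
val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
import spark.implicits._

val df = Seq(1.6, 7.0, 42.0).toDF("CURRENCY")

// Several aggregates computed in one agg() call; alias() gives readable column names.
val row = df.agg(
  min("CURRENCY").alias("min_currency"),
  max("CURRENCY").alias("max_currency"),
  avg("CURRENCY").alias("avg_currency"),
  count("CURRENCY").alias("n")
).collect()(0)

println(row)
```

The result is a single Row with one field per aggregate, so values can be read by position, for example row.getDouble(1) for the maximum and row.getLong(3) for the count.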


Same as:
println(df3.groupBy().max("CURRENCY").collect()(0))
[42.0]
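The equivalence can be checked side by side. This is a small sketch, again assuming a local SparkSession; the variable names and app name are made up for the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("agg-equivalence").getOrCreate()
import spark.implicits._

val df = Seq(1.6, 7.0, 42.0).toDF("CURRENCY")

// Three ways to ask for the maximum over the whole DataFrame.
val viaAgg        = df.agg(max("CURRENCY")).collect()(0)           // agg() directly
val viaGroupByAgg = df.groupBy().agg(max("CURRENCY")).collect()(0) // empty groupBy, then agg()
val viaShortcut   = df.groupBy().max("CURRENCY").collect()(0)      // hardcoded shortcut method

println(viaAgg)        // [42.0]
println(viaGroupByAgg) // [42.0]
println(viaShortcut)   // [42.0]
```

All three return one Row holding the same value. The agg() form is the one that generalizes, since it accepts any aggregate function rather than only the shortcuts that exist as methods.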


