Spark DataFrame : agg()
- ds.agg(...) = ds.groupBy().agg(...)
- agg() is a DataFrame method that accepts aggregate functions (such as max("column") or sum("column")) as arguments.
Example :
val df3: DataFrame = Seq(1.6, 7.0, 42.0).toDF("CURRENCY")
  .withColumn("c2", lit(8.6).cast(DoubleType))
  .withColumn("c3", lit(1).cast(DoubleType))
df3.show()
+--------+---+---+
|CURRENCY| c2| c3|
+--------+---+---+
|     1.6|8.6|1.0|
|     7.0|8.6|1.0|
|    42.0|8.6|1.0|
+--------+---+---+
df3.agg(max("CURRENCY")).show()
+-------------+
|max(CURRENCY)|
+-------------+
|         42.0|
+-------------+
println(df3.agg(max("CURRENCY")).collect()(0))
[42.0]
println(df3.agg(sum("CURRENCY")).collect()(0))
[50.6]
In the above example, to call .sum directly (as in df3.groupBy().sum("CURRENCY")) that method has to exist: it is hardcoded in the grouped-data API. With .agg you can pass any aggregate function as an argument; sum("column") is just one of them, and several can be combined in one call.
df3.agg(sum("CURRENCY"), sum("c2")).show()
+-------------+------------------+
|sum(CURRENCY)|           sum(c2)|
+-------------+------------------+
|         50.6|25.799999999999997|
+-------------+------------------+
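Since agg() takes aggregate functions as arguments, different functions can be mixed in one call. The following is a minimal sketch, not taken from the original example: the object name AggSketch, the helper summarize, the alias names, and the local SparkSession settings are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max, min}

object AggSketch {
  // Hypothetical helper: several aggregate functions in a single agg() call,
  // each renamed with alias() so the result columns are easier to read.
  def summarize(df: DataFrame): DataFrame =
    df.agg(
      min("CURRENCY").alias("min_currency"),
      max("CURRENCY").alias("max_currency"),
      avg("CURRENCY").alias("avg_currency")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1.6, 7.0, 42.0).toDF("CURRENCY")
    summarize(df).show() // one row with three aggregated columns

    spark.stop()
  }
}
```

The result is a single-row DataFrame, as with the sum example, because no grouping columns are given.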
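df3.agg(max("CURRENCY")) is the same as df3.groupBy().max("CURRENCY") :
println(df3.agg(max("CURRENCY")).collect()(0))
println(df3.groupBy().max("CURRENCY").collect()(0))
[42.0]
[42.0]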
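The snippets above leave out the imports and the SparkSession they need. A self-contained sketch of the same equivalence might look like the following; the object name AggEquivalence, the helper buildDf3, and the local session settings are assumptions added for illustration and are not part of the original example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, max}
import org.apache.spark.sql.types.DoubleType

object AggEquivalence {
  // Hypothetical helper: rebuilds the df3 DataFrame used in the example.
  def buildDf3(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides toDF on Seq
    Seq(1.6, 7.0, 42.0).toDF("CURRENCY")
      .withColumn("c2", lit(8.6).cast(DoubleType))
      .withColumn("c3", lit(1).cast(DoubleType))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-equivalence")
      .getOrCreate()
    val df3 = buildDf3(spark)

    // Three ways to take the maximum over the whole DataFrame.
    val viaAgg        = df3.agg(max("CURRENCY")).collect()(0)
    val viaGroupByAgg = df3.groupBy().agg(max("CURRENCY")).collect()(0)
    val viaGroupByMax = df3.groupBy().max("CURRENCY").collect()(0)

    println(viaAgg)
    println(viaGroupByAgg)
    println(viaGroupByMax)

    spark.stop()
  }
}
```

Each println should print [42.0], matching the output shown above. groupBy() with no columns treats the whole DataFrame as one group, which is why the three forms agree.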