Friday, June 26, 2020

Scala :Protobuf

Protobuf (Protocol Buffers) is a format for storing and transmitting structured data, similar in purpose to JSON and XML.
Below are some of the advantages of protobuf:

1. Schema is preserved
2. Similar in purpose to json / xml
3. Cross-platform
4. Faster parsing
5. Compact (binary) encoding
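Much of protobuf's compactness comes from varint encoding, where small integers take fewer bytes on the wire. A minimal sketch of the idea in plain Scala (this is not the real protobuf library, just an illustration of the encoding scheme for non-negative values):

```scala
object VarintSketch {
  // Encode a non-negative Int as a protobuf-style varint:
  // 7 data bits per byte, high bit set on all but the last byte.
  def encode(n: Int): Array[Byte] = {
    require(n >= 0, "sketch handles non-negative values only")
    val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
    var v = n
    while (v >= 0x80) {
      out += ((v & 0x7F) | 0x80).toByte
      v >>>= 7
    }
    out += v.toByte
    out.toArray
  }

  // Decode a varint back into an Int.
  def decode(bytes: Array[Byte]): Int =
    bytes.zipWithIndex.foldLeft(0) { case (acc, (b, i)) =>
      acc | ((b & 0x7F) << (7 * i))
    }

  def main(args: Array[String]): Unit = {
    println(encode(300).map(b => f"0x$b%02X").mkString(" ")) // 0xAC 0x02
    println(decode(encode(300)))                             // 300
  }
}
```

So 300 needs only two bytes, while a fixed-width int32 would always take four.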

Example :
Pre-req : Create a new Scala project using sbt

build.sbt
name := "protobuf"
version := "0.1"
scalaVersion := "2.12.7"
PB.targets in Compile := Seq(
  scalapb.gen() -> (sourceManaged in Compile).value
)
PB.protoSources in Compile := Seq(file("src/main/protobuf"))
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"

project_name/project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "0.99.18")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.9.0" // use the version matching your scalapb-runtime

src/main/protobuf/person.proto
syntax = "proto3";
message Person {
    string name = 1;
    int32 age = 2;
    Gender gender = 3;
}
enum Gender {
    MALE = 0;
    FEMALE = 1;
}
src/main/scala/Persontest.scala
// Person and Gender are generated by ScalaPB from person.proto at compile time
import person.{Gender, Person}

object Persontest {
  def main(args: Array[String]): Unit = {
    val person = Person(name = "User1", age = 15, gender = Gender.MALE)
    val bytes: Array[Byte] = person.toByteArray   // serialize to compact binary
    val decoded: Person = Person.parseFrom(bytes) // parse it back
    println(decoded)
  }
}

Redis : Free Open Source Database in Cloud (30 MB)



Pre-req:

Make sure Homebrew is installed on macOS.

Get started

Create an account on the Redis website and select the free plan, which provides up to 30 MB of storage.
1. http://try.redis.io/
2. https://www.tutorialspoint.com/redis/redis_commands.htm

Installation (macOS) :

brew install redis

Check Redis

redis-server --version
redis-cli --version

Connect to a remote instance :

  1. Get the Redis host name from your profile, under your database
  2. Get the Redis password from your profile, under your database
Command:
redis-cli -h SERVER-IP -p PORT -a YOURPASSWORD

example:
  1. redis-cli -h redis-19xxx.c1.ap-southeast-1-1.ec2.cloud.redislabs.com -p 19xxx -a redis_password
  2. exit

Thursday, June 25, 2020

Kafka



Cloud kafka

https://gist.github.com/j-thepac/6714798337374d9c7820163c8998ffe2

Kafka :

https://gist.github.com/j-thepac/58192f87b82f79cc27729cd6645678fd

Monday, June 22, 2020

Scala : Reflections



Reflection is used to convert an input string (a class name) into a Scala object at runtime.
Eg :
In the example below, the class name "testsubclass" is a string in the main object, which is converted into a Scala object.

import scala.language.reflectiveCalls

class testsubclass {
  def hi: String = "newx"
}

object Test {
  def main(args: Array[String]): Unit = {
    // instantiate the class from its name at runtime
    val x: Any = Class.forName("testsubclass").newInstance()
    // cast to a structural type; the method name called must match the one declared
    println(x.asInstanceOf[{ def hi: String }].hi)
  }
}
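An alternative sketch using plain Java reflection, where the method is also looked up by name at runtime instead of through a structural type. The class and method names here are illustrative, not from the post above:

```scala
// a class we pretend to know only by name at runtime
class Greeter {
  def hi: String = "newx"
}

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    // look the class up by its name and instantiate it
    val cls = Class.forName("Greeter")
    val instance = cls.getDeclaredConstructor().newInstance()
    // look the method up by name and invoke it reflectively
    val result = cls.getMethod("hi").invoke(instance)
    println(result) // newx
  }
}
```

This avoids the structural-type cast, at the cost of losing all compile-time checking of the method signature.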




Thursday, June 4, 2020

Spark Dataframe : Speeding / Optimisation / Shuffle Partition





import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

import spark.implicits._

val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
  .withColumn("c2", lit(8))
  .withColumn("c3", lit(1))

spark.conf.set("spark.sql.shuffle.partitions", 100)


With less data, reduce the number of shuffle partitions; otherwise you end up with many partitions holding only a few records each, which makes the job run for a long time.

With a lot of data, too few partitions results in fewer, longer-running tasks, and sometimes out-of-memory errors.

The default is 200.
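The trade-off above can be made concrete with a little arithmetic: each task handles roughly total records divided by partition count. A rough sketch (the numbers are made up for illustration):

```scala
object ShuffleSizing {
  // rough sketch: records each task handles for a given partition count
  def recordsPerTask(totalRecords: Long, partitions: Int): Long =
    math.ceil(totalRecords.toDouble / partitions).toLong

  def main(args: Array[String]): Unit = {
    // small data with the default 200 partitions: tiny partitions,
    // scheduling overhead dominates
    println(recordsPerTask(1000L, 200)) // 5
    // large data with too few partitions: huge tasks, risk of OOM
    println(recordsPerTask(2000000000L, 10)) // 200000000
  }
}
```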

Scala : Sum of each row and Column in a spark dataframe



Note :
map, filter, and reduce over a DataFrame's columns perform a row-wise operation and store the final result as a column.

Method 1 :
object DataFrames {
  val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
    .withColumn("c2", lit(8))
    .withColumn("c3", lit(1))

  def main(args: Array[String]): Unit = {
    val sumDF = df2.withColumn("TOTAL", df2.columns.map(c => col(c)).reduce((c1, c2) => c1 + c2))
    sumDF.show()
  }
}
Method 2:
df2.withColumn("TOTAL", expr("c2+c3"))
Method 3:
df2.withColumn("TOTAL",col("c2")+col("c3"))
+--------+---+---+-----+
|CURRENCY| c2| c3|TOTAL|
+--------+---+---+-----+
|       1|  8|  1|   10|
|       2|  8|  1|   11|
+--------+---+---+-----+
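The trick in Method 1 is just a map/reduce over the list of column names. The same pattern on plain Scala collections, for illustration (the row data mirrors the DataFrame above):

```scala
object RowSumSketch {
  def main(args: Array[String]): Unit = {
    // each row is a map of column name -> value, mirroring df2
    val rows = Seq(
      Map("CURRENCY" -> 1, "c2" -> 8, "c3" -> 1),
      Map("CURRENCY" -> 2, "c2" -> 8, "c3" -> 1)
    )
    val columns = Seq("CURRENCY", "c2", "c3")
    // same shape as df2.columns.map(c => col(c)).reduce(_ + _):
    // map each column name to its value, then reduce with +
    val totals = rows.map(row => columns.map(row).reduce(_ + _))
    println(totals) // List(10, 11)
  }
}
```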
To calculate Sum of Columns:
df2.agg(sum(col("c2"))).show()

Result :
+-------+
|sum(c2)|
+-------+
| 16|
+-------+


You can use Highlight JS. Go to Themes and edit the HTML, then place the codes below in .