Friday, June 26, 2020

Scala :Protobuf

Protobuf (Protocol Buffers) is a format for storing and transmitting structured data, similar in purpose to JSON and XML.
Below are some of the advantages of protobuf:

1. Schema is preserved
2. Similar in purpose to json / xml
3. Cross-platform
4. Faster parsing
5. Compact (binary) encoding
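Much of protobuf's compactness comes from varint encoding, where small integers take fewer bytes on the wire. A minimal sketch of the idea in plain Scala (this is not the real protobuf library, just an illustration of the encoding scheme for non-negative values):

```scala
object VarintSketch {
  // Encode a non-negative Int as a protobuf-style varint:
  // 7 data bits per byte, high bit set on all but the last byte.
  def encode(n: Int): Array[Byte] = {
    require(n >= 0, "sketch handles non-negative values only")
    val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
    var v = n
    while (v >= 0x80) {
      out += ((v & 0x7F) | 0x80).toByte
      v >>>= 7
    }
    out += v.toByte
    out.toArray
  }

  // Decode a varint back into an Int.
  def decode(bytes: Array[Byte]): Int =
    bytes.zipWithIndex.foldLeft(0) { case (acc, (b, i)) =>
      acc | ((b & 0x7F) << (7 * i))
    }

  def main(args: Array[String]): Unit = {
    println(encode(300).map(b => f"0x$b%02X").mkString(" ")) // 0xAC 0x02
    println(decode(encode(300)))                             // 300
  }
}
```

So 300 needs only two bytes, while a fixed-width int32 would always take four.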

Example :
Pre-req : Create a new Scala project using sbt

build.sbt
name := "protobuf"
version := "0.1"
scalaVersion := "2.12.7"
PB.targets in Compile := Seq(
  scalapb.gen() -> (sourceManaged in Compile).value
)
PB.protoSources in Compile := Seq(file("src/main/protobuf"))
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"

project_name/project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "0.99.18")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.9.0" // use the version matching your scalapb-runtime

src/main/protobuf/person.proto
syntax = "proto3";
message Person {
    string name = 1;
    int32 age = 2;
    Gender gender = 3;
}
enum Gender {
    MALE = 0;
    FEMALE = 1;
}
src/main/scala/Persontest.scala
// Person and Gender are generated by ScalaPB from person.proto at compile time
import person.{Gender, Person}

object Persontest {
  def main(args: Array[String]): Unit = {
    val person = Person(name = "User1", age = 15, gender = Gender.MALE)
    val bytes: Array[Byte] = person.toByteArray   // serialize to compact binary
    val decoded: Person = Person.parseFrom(bytes) // parse it back
    println(decoded)
  }
}

Redis : Free Open Source Database in Cloud (30 MB)



Pre-req:

Make sure Homebrew is installed on macOS.

Get started

Create an account on the Redis website and select the free plan, which provides up to 30 MB of storage.
1. http://try.redis.io/
2. https://www.tutorialspoint.com/redis/redis_commands.htm

Installation (macOS) :

brew install redis

Check Redis

redis-server --version
redis-cli --version

Connect to a remote instance :

  1. Get the Redis host name from your profile, under your database
  2. Get the Redis password from your profile, under your database
Command:
redis-cli -h SERVER-IP -p PORT -a YOURPASSWORD

example:
  1. redis-cli -h redis-19xxx.c1.ap-southeast-1-1.ec2.cloud.redislabs.com -p 19xxx -a redis_password
  2. exit

Thursday, June 25, 2020

Kafka



Cloud kafka

https://gist.github.com/j-thepac/6714798337374d9c7820163c8998ffe2

Kafka :

https://gist.github.com/j-thepac/58192f87b82f79cc27729cd6645678fd

Monday, June 22, 2020

Scala : Reflections



Reflection is used to convert an input string (a class name) into a Scala object at runtime.
Eg :
In the example below, the class name "testsubclass" is a string in the main object, which is converted into a Scala object.

import scala.language.reflectiveCalls

class testsubclass {
  def hi: String = "newx"
}

object Test {
  def main(args: Array[String]): Unit = {
    // instantiate the class from its name at runtime
    val x: Any = Class.forName("testsubclass").newInstance()
    // cast to a structural type; the method name called must match the one declared
    println(x.asInstanceOf[{ def hi: String }].hi)
  }
}
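An alternative sketch using plain Java reflection, where the method is also looked up by name at runtime instead of through a structural type. The class and method names here are illustrative, not from the post above:

```scala
// a class we pretend to know only by name at runtime
class Greeter {
  def hi: String = "newx"
}

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    // look the class up by its name and instantiate it
    val cls = Class.forName("Greeter")
    val instance = cls.getDeclaredConstructor().newInstance()
    // look the method up by name and invoke it reflectively
    val result = cls.getMethod("hi").invoke(instance)
    println(result) // newx
  }
}
```

This avoids the structural-type cast, at the cost of losing all compile-time checking of the method signature.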




Thursday, June 4, 2020

Spark Dataframe : Speeding / Optimisation / Shuffle Partition





import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

val spark: SparkSession = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()

import spark.implicits._

val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
  .withColumn("c2", lit(8))
  .withColumn("c3", lit(1))

spark.conf.set("spark.sql.shuffle.partitions", 100)


With less data, reduce the number of shuffle partitions; otherwise you end up with many partitions holding only a few records each, which makes the job run for a long time.

With a lot of data, too few partitions results in fewer, longer-running tasks, and sometimes out-of-memory errors.

The default is 200.
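The trade-off above can be made concrete with a little arithmetic: each task handles roughly total records divided by partition count. A rough sketch (the numbers are made up for illustration):

```scala
object ShuffleSizing {
  // rough sketch: records each task handles for a given partition count
  def recordsPerTask(totalRecords: Long, partitions: Int): Long =
    math.ceil(totalRecords.toDouble / partitions).toLong

  def main(args: Array[String]): Unit = {
    // small data with the default 200 partitions: tiny partitions,
    // scheduling overhead dominates
    println(recordsPerTask(1000L, 200)) // 5
    // large data with too few partitions: huge tasks, risk of OOM
    println(recordsPerTask(2000000000L, 10)) // 200000000
  }
}
```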

Scala : Sum of each row and Column in a spark dataframe



Note :
map, filter, and reduce over a DataFrame's columns perform a row-wise operation and store the final result as a column.

Method 1 :
object DataFrames {
  val df2: DataFrame = Seq(1, 2).toDF("CURRENCY")
    .withColumn("c2", lit(8))
    .withColumn("c3", lit(1))

  def main(args: Array[String]): Unit = {
    val sumDF = df2.withColumn("TOTAL", df2.columns.map(c => col(c)).reduce((c1, c2) => c1 + c2))
    sumDF.show()
  }
}
Method 2:
df2.withColumn("TOTAL", expr("c2+c3"))
Method 3:
df2.withColumn("TOTAL",col("c2")+col("c3"))
+--------+---+---+-----+
|CURRENCY| c2| c3|TOTAL|
+--------+---+---+-----+
|       1|  8|  1|   10|
|       2|  8|  1|   11|
+--------+---+---+-----+
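The trick in Method 1 is just a map/reduce over the list of column names. The same pattern on plain Scala collections, for illustration (the row data mirrors the DataFrame above):

```scala
object RowSumSketch {
  def main(args: Array[String]): Unit = {
    // each row is a map of column name -> value, mirroring df2
    val rows = Seq(
      Map("CURRENCY" -> 1, "c2" -> 8, "c3" -> 1),
      Map("CURRENCY" -> 2, "c2" -> 8, "c3" -> 1)
    )
    val columns = Seq("CURRENCY", "c2", "c3")
    // same shape as df2.columns.map(c => col(c)).reduce(_ + _):
    // map each column name to its value, then reduce with +
    val totals = rows.map(row => columns.map(row).reduce(_ + _))
    println(totals) // List(10, 11)
  }
}
```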
To calculate Sum of Columns:
df2.agg(sum(col("c2"))).show()

Result :
+-------+
|sum(c2)|
+-------+
| 16|
+-------+


You can use Highlight JS. Go to Themes and edit the HTML, then place the codes below in .