The Scala Spark functions library does not have these functions, but the Spark SQL library does. This is why you cannot use them through the Spark functions API.
https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/functions.html
An isNull function does exist, and it can be combined with a when/otherwise clause to set values.
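A minimal sketch of that combination, assuming a local SparkSession and a hypothetical two-column DataFrame (the names c1, c2, flag and IsNullWhenSketch are made up for illustration). It uses only `when`, `otherwise`, `col`, `lit` and `Column.isNull` from the functions API:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object IsNullWhenSketch {
  // Adds a "flag" column: 1 when c2 is null, 2 otherwise.
  def withNullFlag(df: DataFrame): DataFrame =
    df.withColumn("flag", when(col("c2").isNull, lit(1)).otherwise(lit(2)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isnull-when-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; None becomes a SQL NULL in column c2.
    val df = Seq[(String, Option[String])](
      ("1", Some("c")), ("2", None), ("3", Some("d"))
    ).toDF("c1", "c2")

    withNullFlag(df).show()
    spark.stop()
  }
}
```

This stays entirely in the DataFrame API, so no SQL string is needed for the null check.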
I hope it helps.
Answer from Ramdev Sharma on Stack Overflow
This seems to work:
File: tbl1
1 a
2 b
3 c
File: tbl2
1 c
3 d
import sqlContext.implicits._  // needed for .toDF() on Spark 1.3 and later

case class c_tbl1(c1: String, c2: String)
sc.textFile("tbl1").map { row =>
  val parts = row.split("\t")  // fields are tab-separated
  c_tbl1(parts(0), parts(1))
}.toDF().registerTempTable("t_tbl1")

case class c_tbl2(c1: String, c2: String)
sc.textFile("tbl2").map { row =>
  val parts = row.split("\t")
  c_tbl2(parts(0), parts(1))
}.toDF().registerTempTable("t_tbl2")

// IF(t2.c1 is null, 1, 2) yields 1 when the left join found no match in t_tbl2
sqlContext.sql("""select t.c1, t.c2, IF(t2.c1 is null, 1, 2), t2.c2 from t_tbl1 t left outer join t_tbl2 t2 on t.c1 = t2.c1""").collect.foreach(println)
[1,a,2,c]
[2,b,1,null]
[3,c,2,d]
Try a CASE expression (Spark SQL supports the CASE WHEN ... THEN ... ELSE ... END syntax):
select a.x, a.x1,
CASE WHEN b.x1 IS NULL THEN a.x1
ELSE b.x1
END as bx1
from t1 a LEFT OUTER JOIN t2 b on a.x1=b.x1;
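As a rough sketch, the same CASE expression can be run through `spark.sql` from Scala. The tables and rows here are hypothetical, and the join key is assumed to be x rather than x1 so that the fallback to a.x1 is actually visible in the result:

```scala
import org.apache.spark.sql.SparkSession

object CaseWhenSketch {
  // Registers two small, hypothetical tables and returns (x, x1, bx1) ordered by x.
  def run(spark: SparkSession): Seq[(Int, String, String)] = {
    import spark.implicits._

    Seq((1, "a"), (2, "b"), (3, "c")).toDF("x", "x1").createOrReplaceTempView("t1")
    Seq((1, "c"), (3, "d")).toDF("x", "x1").createOrReplaceTempView("t2")

    // Assumption: joining on x (not x1), so unmatched rows fall back to a.x1.
    spark.sql(
      """SELECT a.x, a.x1,
        |       CASE WHEN b.x1 IS NULL THEN a.x1 ELSE b.x1 END AS bx1
        |FROM t1 a LEFT OUTER JOIN t2 b ON a.x = b.x
        |ORDER BY a.x""".stripMargin
    ).collect().map(r => (r.getInt(0), r.getString(1), r.getString(2))).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("case-when-sketch")
      .getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```

Row 2 has no match in t2, so its bx1 falls back to the value from t1.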
You can use the NULLIF() function to replace empty strings in a column with null values.
select nullif(<column_name>,'') from <table-name>
NULLIF returns null if its first expression equals its second expression, and returns the first expression otherwise. Here, if the column holds an empty string, NULLIF returns null.
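A small sketch of this behaviour from Scala, assuming a local SparkSession and a hypothetical table called people with columns id and name (none of these names come from the original answer):

```scala
import org.apache.spark.sql.SparkSession

object NullIfSketch {
  // Returns (id, name) pairs where empty names have been turned into None (SQL NULL).
  def run(spark: SparkSession): Seq[(Int, Option[String])] = {
    import spark.implicits._

    // Hypothetical sample data; row 2 holds an empty string.
    Seq((1, "alice"), (2, ""), (3, "bob")).toDF("id", "name")
      .createOrReplaceTempView("people")

    spark.sql("SELECT id, nullif(name, '') AS name FROM people ORDER BY id")
      .collect()
      .map(r => (r.getInt(0), Option(r.getString(1)))) // Option(null) is None
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullif-sketch")
      .getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```

Non-empty names pass through unchanged; only the exact empty string becomes null.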
I am referring to the documentation here: Databricks Update Reference
Specifically, this syntax:
UPDATE table_name [table_alias]
SET { { column_name | field_name } = { expr | DEFAULT } } [, ...]
[WHERE clause]
The function comes from Databricks Built-In Functions. It is in the last section, titled Miscellaneous functions, and the one we are using is nullif.
nullif(expr1, expr2) : Returns NULL if expr1 equals expr2, or expr1 otherwise.
This allows us to update the table in this way:
update db.table_name
set column_1 = nullif(column_1, '')
The statement above sets column_1 to NULL in every row where column_1 is the empty string ''. Rows where it holds anything else keep their existing value.
In this situation UPDATE with NULLIF is preferable to UPDATE with REPLACE, because REPLACE is a search, delete and insert.
In general REPLACE is less efficient because it performs a search along your specified key, whereas NULLIF is a predicate match that can implicitly filter under the hood.
nullif documentation