Use coalesce or a CASE WHEN ... IS NOT NULL expression to replace null values in the select query.
Try this:
spark.sql("""
    select
        patient_id,
        CASE
            when cough is NOT NULL THEN cough
            else 0
        END as cough,
        CASE
            when feaver is NOT NULL THEN feaver
            else 0
        END as feaver,
        CASE
            when `head ache` is NOT NULL THEN `head ache`
            else 0
        END as `head ache`
    from (
        select * from patient
    )
    PIVOT(
        Count(dx)
        for dx in ('cough','feaver','head ache')
    )
    ;
""").show()
The output will be:
| patient_id | cough | feaver | head ache |
|---|---|---|---|
| Donna | 1 | 0 | 1 |
| Jerry | 1 | 0 | 0 |
| Bob | 1 | 1 | 0 |
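The CASE expressions above and coalesce are two spellings of the same null substitution. A minimal sketch of that equivalence, assuming an in-memory SQLite database as a stand-in for Spark (the table `counts` and its rows are made up for illustration):

```python
import sqlite3

# SQLite stands in for Spark here; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (patient_id TEXT, cough INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?)",
    [("p1", 1), ("p2", None)],  # p2 has no count, so cough is NULL
)

# Both expressions substitute 0 when cough is NULL.
rows = conn.execute("""
    SELECT patient_id,
           CASE WHEN cough IS NOT NULL THEN cough ELSE 0 END AS via_case,
           COALESCE(cough, 0) AS via_coalesce
    FROM counts
    ORDER BY patient_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Both columns agree on every row, so the shorter coalesce form is usually the easier one to read.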
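If you want the pivot values to be dynamic, collect them from the table first:
# Collect the distinct dx values; the result is a single cell holding a list.
dist = spark.sql("select collect_set(dx) from patient;").toPandas()
# Render them as a SQL IN list, e.g. ('cough', 'feaver', 'head ache').
# Joining by hand keeps the list valid SQL even when there is only one value.
in_list = "(" + ", ".join("'" + v + "'" for v in dist.iloc[0, 0]) + ")"
val = spark.sql("""
    select
        patient_id,
        coalesce(cough,0) as `cough`,
        coalesce(feaver,0) as `feaver`,
        coalesce(`head ache`,0) as `head ache`
    from (
        select * from patient
    )
    PIVOT(
        Count(dx)
        for dx in """ + in_list + """
    )
    ;
""")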
Answer from Saibal on Stack Overflow
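In the dynamic pivot query of the answer above, the IN list is built at run time but the coalesce columns are still written out by hand. One way to generate both from the same values is sketched below. The helper `build_pivot_query` is hypothetical, not part of Spark or of the original answer, and it only builds the SQL string, so it runs without a Spark session. It assumes the values contain no quotes or backticks and rejects them otherwise.

```python
def build_pivot_query(dx_values, table="patient"):
    """Build a pivot query whose IN list and coalesce columns share one source.

    Hypothetical helper: dx_values is the list of distinct dx strings,
    for example the result of collect_set(dx).
    """
    values = sorted(dx_values)  # collect_set gives no guaranteed order
    if not values:
        raise ValueError("need at least one value to pivot on")
    for v in values:
        # Refuse values that would break out of the quoting used below.
        if "'" in v or "`" in v:
            raise ValueError(f"unsupported character in pivot value: {v!r}")
    in_list = ", ".join(f"'{v}'" for v in values)
    columns = ",\n    ".join(f"coalesce(`{v}`, 0) as `{v}`" for v in values)
    return (
        "select\n"
        "    patient_id,\n"
        f"    {columns}\n"
        f"from (select * from {table})\n"
        f"pivot (count(dx) for dx in ({in_list}))"
    )


query = build_pivot_query(["head ache", "cough", "feaver"])
print(query)
```

With a Spark session available, the string could then be passed to spark.sql(query).show(), with the values taken from the collect_set result as in the answer.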
You can use the NULLIF() function to replace empty strings in a column with null.
select nullif(<column_name>,'') from <table-name>
NULLIF returns null if its first expression equals its second. Here, if the column holds an empty string, NULLIF returns null; otherwise it returns the column value unchanged.
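A small sketch of that behaviour, assuming an in-memory SQLite database as a stand-in for Databricks (NULLIF is standard SQL, but the table `t` and its rows are made up):

```python
import sqlite3

# SQLite stands in for Databricks here; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("Ann",), ("",), (None,)])

# The empty string becomes NULL; other values (including NULL) pass through.
result = [
    row[0]
    for row in conn.execute("SELECT NULLIF(name, '') FROM t ORDER BY rowid")
]
conn.close()
print(result)
```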
I am referring to the documentation here: Databricks Update Reference
Specifically, this syntax:
UPDATE table_name [table_alias]
SET { { column_name | field_name } = { expr | DEFAULT } } [, ...]
[WHERE clause]
The function comes from the Databricks Built-In Functions reference, in the last section, titled Miscellaneous functions. The one used here is nullif:
nullif(expr1, expr2) : Returns NULL if expr1 equals expr2, or expr1 otherwise.
This allows us to update the table in this way:
update db.table_name
set column_1 = nullif(column_1, '')
The statement above sets column_1 to NULL in every row where column_1 is the empty string ''. If the value is anything else, the row keeps whatever value is already there.
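The effect of that UPDATE can be checked on a toy table. This is a sketch only, again assuming in-memory SQLite in place of Databricks, with made-up rows:

```python
import sqlite3

# SQLite stands in for Databricks here; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, column_1 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "abc"), (2, ""), (3, " ")],
)

# Same shape as the statement above: only exact empty strings become NULL.
conn.execute("UPDATE table_name SET column_1 = NULLIF(column_1, '')")

after = conn.execute(
    "SELECT id, column_1 FROM table_name ORDER BY id"
).fetchall()
conn.close()
print(after)
```

Row 3 holds a single space rather than an empty string, so it is left alone; NULLIF compares the whole value and does no trimming.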
In this situation UPDATE with NULLIF is a better fit than UPDATE with REPLACE. REPLACE searches inside each string and substitutes the matching substrings, whereas NULLIF is a single equality comparison on the whole value.
For the same reason, REPLACE is generally the less efficient choice when all you need is to null out values that match exactly.
nullif documentation