This article shows how to read from and write to an Azure Storage Account Gen2 (Azure Data Lake Storage Gen2) from Databricks.

Links used:

Files under the DBFS root:

https://learn.microsoft.com/zh-cn/azure/databricks/dbfs/root-locations

Script for connecting to the data lake:

https://learn.microsoft.com/zh-cn/azure/databricks/getting-started/connect-to-azure-storage#--step-6-connect-to-azure-data-lake-storage-gen2-using-python

Configure the connection to Data Lake Gen2

service_credential = dbutils.secrets.get(scope="key-vault-scope",key="sp-key")

spark.conf.set("fs.azure.account.auth.type.databriilakke.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.databriilakke.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.databriilakke.dfs.core.windows.net", "07471450-8553-49ed-9d18-c2017eef9b69")
spark.conf.set("fs.azure.account.oauth2.client.secret.databriilakke.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.databriilakke.dfs.core.windows.net", "https://login.microsoftonline.com/c023101b-be0d-4a03-991a-824f9032469a/oauth2/token")
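The five settings above differ only in the storage account name, the client ID, the tenant ID and the secret, so they can be generated from those values. The following is a minimal sketch rather than an official helper: `adls_oauth_conf` and `apply_conf` are hypothetical names introduced here, and `<client-id>` and `<tenant-id>` in the usage comment are placeholders.

```python
def adls_oauth_conf(storage_account, client_id, tenant_id, client_secret):
    """Build the Spark settings for OAuth access to one ADLS Gen2 account.

    Hypothetical helper: it only assembles the same five keys shown above.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_conf(spark_session, settings):
    """Apply every key/value pair to the given Spark session."""
    for key, value in settings.items():
        spark_session.conf.set(key, value)


# In a Databricks notebook, where `spark` and `dbutils` are predefined:
# secret = dbutils.secrets.get(scope="key-vault-scope", key="sp-key")
# apply_conf(spark, adls_oauth_conf("databriilakke", "<client-id>", "<tenant-id>", secret))
```

Keeping the account name in one place makes it harder to mistype it in one of the five keys.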

 

Read the built-in JSON sample file

df = spark.read.json("/databricks-datasets/iot/iot_devices.json")

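Write to Data Lake Gen2. No format is specified, so Spark uses its default data source (Delta on recent Databricks Runtime versions), even though the target folder is named jsondata.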

df.write.save("abfss://iotcontainer@databriilakke.dfs.core.windows.net/iot/jsondata")
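The path passed to `save` follows the pattern `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>`. As a small sketch, a hypothetical helper (`abfss_uri` is not part of Spark or Databricks) can assemble it so the container and account names are written only once:

```python
def abfss_uri(container, storage_account, path=""):
    """Return an abfss:// URI for a path inside an ADLS Gen2 container."""
    clean_path = path.strip("/")
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{clean_path}" if clean_path else f"{base}/"


# Same location as the write above.
target = abfss_uri("iotcontainer", "databriilakke", "iot/jsondata")
print(target)
# prints abfss://iotcontainer@databriilakke.dfs.core.windows.net/iot/jsondata
```

In a notebook the result could then be passed to `df.write.save(target)`.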

 

List the files in the data lake

dbutils.fs.ls("abfss://iotcontainer@databriilakke.dfs.core.windows.net/iot/jsondata")
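`dbutils.fs.ls` returns a list of entries that expose `path`, `name` and `size`. The sketch below shows one way to summarize such a listing. So that it runs outside a notebook, it uses a stand-in `FakeFileInfo` tuple and made-up file names and sizes; these are illustrative assumptions, not real output from the folder above.

```python
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.ls (illustration only).
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])


def summarize_listing(entries, suffix=".parquet"):
    """Count the data files with the given suffix and total their size in bytes."""
    data_files = [e for e in entries if e.name.endswith(suffix)]
    return len(data_files), sum(e.size for e in data_files)


base = "abfss://iotcontainer@databriilakke.dfs.core.windows.net/iot/jsondata/"
# Hypothetical listing: names and sizes are invented for the example.
sample = [
    FakeFileInfo(base + "_delta_log/", "_delta_log/", 0),
    FakeFileInfo(base + "part-00000.snappy.parquet", "part-00000.snappy.parquet", 1000),
    FakeFileInfo(base + "part-00001.snappy.parquet", "part-00001.snappy.parquet", 2500),
]

print(summarize_listing(sample))
# prints (2, 3500)
```

In a notebook, the same function could be called on the real listing, for example `summarize_listing(dbutils.fs.ls(base))`.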

 

Read the files back from the data lake

df2 = spark.read.load("abfss://iotcontainer@databriilakke.dfs.core.windows.net/iot/jsondata")