Spark Cheat Sheet for Scala/Python

Spark Example

Read the parquet file

    scala> val param = spark.read.parquet("s3://file_path_you_put")

Print the parquet file schema

    scala> param.printSchema()
    root
     |-- sha1: string (nullable = true)
     |-- label: string (nullable = true)
     |-- time: long (nullable = true)

Print the parquet content

    scala> new_result.show()
    +--------------------+-----+----------+
    |                uuid|label|      time|
    +--------------------+-----+----------+
    |d8f9ba869c19f25cc...| Hell|1562112000|
    |f8e172cb34d620bbe...|     |1562112000|
    |28eb0ec1e0d549a58...| PUMA|1562112000|
    |145760249908bb4f7...| PUMA|1562112000|
    |e5622270036303a86...| Hell|1562112000|
    +--------------------+-----+----------+
    only showing top 20 rows

Get the number of rows

    scala>