墨冊

Spark Cheat Sheet for Scala/Python

Spark Example

Read the parquet file

scala> val param = spark.read.parquet("s3://file_path_you_put")

Print the parquet file schema

scala> param.printSchema()
root
 |-- sha1: string (nullable = true)
 |-- label: string (nullable = true)
 |-- time: long (nullable = true)

Print the parquet content

scala> new_result.show()
+--------------------+-----+----------+
|                uuid|label|      time|
+--------------------+-----+----------+
|d8f9ba869c19f25cc...| Hell|1562112000|
|f8e172cb34d620bbe...|     |1562112000|
|28eb0ec1e0d549a58...| PUMA|1562112000|
|145760249908bb4f7...| PUMA|1562112000|
|e5622270036303a86...| Hell|1562112000|
+--------------------+-----+----------+
only showing top 20 rows

Get the number of rows

scala>
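
The shell steps in this excerpt could also be collected into a small standalone program. What follows is only a minimal sketch under a few assumptions: the object name `ParquetCheatSheet`, the app name, and the `local[*]` master are hypothetical choices for illustration, and the path is the same placeholder used above. The excerpt is cut off before the row-count command, so the `count()` call here is the standard Dataset method and not necessarily what the original post used. Reading an `s3://` path also assumes the cluster or classpath already provides an S3 filesystem connector.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; in spark-shell, `spark` already exists
// and only the calls on `param` are needed.
object ParquetCheatSheet {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode for experimentation; drop .master(...) when
    // submitting to a real cluster.
    val spark = SparkSession.builder()
      .appName("parquet-cheat-sheet")
      .master("local[*]")
      .getOrCreate()

    // Placeholder path from the excerpt; pass a real one as the first argument.
    val path = args.headOption.getOrElse("s3://file_path_you_put")

    // Read the parquet file into a DataFrame.
    val param = spark.read.parquet(path)

    // Print the column names, types, and nullability.
    param.printSchema()

    // Print the first 5 rows (show() with no argument prints up to 20).
    param.show(5)

    // Count the rows; this triggers a full scan of the data.
    val rowCount: Long = param.count()
    println(s"rows: $rowCount")

    spark.stop()
  }
}
```

Inside `spark-shell` the same calls work directly on `param`, since the session is created for you.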

Vim for Python Development

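I have been writing Python lately, and as a long-time Vim user, I naturally started by looking for Python-related plugins and settings that make development more convenient. Here I have collected the settings I recently adopted to share with everyone. If you know a better approach, feel free to share it with me.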