
Analyzing Beijing points-based hukou (積分落戶) data with Spark: by applicant age

Load the CSV file that was saved earlier from the parsed JSON data.

Analysis by applicant age

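from pyspark.sql import SparkSession

# In the pyspark shell `spark` already exists; getOrCreate() simply returns it
spark = SparkSession.builder.appName("jflh-age").getOrCreate()
# Load the CSV file; its first line holds the column names
df = spark.read.format("csv").option("header", "true").load("jifenluohu.csv")
# df.show()
df.createOrReplaceTempView("jflh")
# Group by age: characters 7-10 of the ID number are the birth year, so age = 2018 - birth year (approximate: month and day are ignored).
# substring() returns a string, which Spark implicitly casts to double for the subtraction, hence ages like 42.0.
# Ordered by count, descending
spark.sql("select 2018-substring(idCard,7,4) as age,count(*) as num from jflh group by age order by num desc").show(30)
# Ordered by age, ascending
spark.sql("select 2018-substring(idCard,7,4) as age,count(*) as num from jflh group by age order by age asc").show(30)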

Total number of records (6019, which equals the sum of the per-age counts below):

+----+
| num|
+----+
|6019|
+----+

Grouped by age, ordered by count (descending):

+----+---+
| age|num|
+----+---+
|42.0|813|
|41.0|799|
|40.0|773|
|43.0|757|
|44.0|586|
|39.0|507|
|45.0|507|
|46.0|378|
|38.0|302|
|47.0|238|
|37.0|162|
|36.0|109|
|35.0| 39|
|34.0| 13|
|49.0|  9|
|54.0|  5|
|48.0|  4|
|51.0|  4|
|52.0|  3|
|33.0|  3|
|53.0|  2|
|50.0|  1|
|60.0|  1|
|58.0|  1|
|59.0|  1|
|57.0|  1|
|55.0|  1|
+----+---+

Grouped by age, ordered by age (ascending):

+----+---+
| age|num|
+----+---+
|33.0|  3|
|34.0| 13|
|35.0| 39|
|36.0|109|
|37.0|162|
|38.0|302|
|39.0|507|
|40.0|773|
|41.0|799|
|42.0|813|
|43.0|757|
|44.0|586|
|45.0|507|
|46.0|378|
|47.0|238|
|48.0|  4|
|49.0|  9|
|50.0|  1|
|51.0|  4|
|52.0|  3|
|53.0|  2|
|54.0|  5|
|55.0|  1|
|57.0|  1|
|58.0|  1|
|59.0|  1|
|60.0|  1|
+----+---+
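You can check the age expression on a few rows without a Spark cluster. The sketch below mirrors the same logic in plain Python: it takes the 4-character birth year, subtracts it from 2018, and counts rows per age. Note that Spark SQL's `substring(idCard, 7, 4)` is 1-based, so the equivalent Python slice is `id_card[6:10]`.

The function names, the tie-breaking rule in the descending sort and the sample ID numbers are all hypothetical. The IDs are made-up placeholders for illustration, not records from `jifenluohu.csv`.

```python
from collections import Counter

REFERENCE_YEAR = 2018  # same reference year as the SQL query above


def age_from_id(id_card: str, reference_year: int = REFERENCE_YEAR) -> int:
    """Approximate age: reference year minus the birth year.

    Spark's substring(idCard, 7, 4) is 1-based (characters 7-10),
    which is id_card[6:10] with Python's 0-based slicing.
    Birth month and day are ignored, exactly as in the SQL.
    """
    return reference_year - int(id_card[6:10])


def count_by_age(id_cards):
    """Equivalent of GROUP BY age with COUNT(*)."""
    return Counter(age_from_id(c) for c in id_cards)


if __name__ == "__main__":
    # Hypothetical, made-up 18-character ID numbers (not real data)
    sample = ["110101197601011234", "110101197605054321", "110101198003031111"]
    counts = count_by_age(sample)

    # Like ORDER BY num DESC; ties broken by age ascending (an arbitrary choice here)
    for age, num in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(age, num)

    # Like ORDER BY age ASC
    for age, num in sorted(counts.items()):
        print(age, num)
```

Because `int(...)` parses the birth year as an integer, this sketch yields ages such as 42 rather than 42.0. In the Spark query, wrapping the substring in `cast(... as int)` would have the same effect.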