PySpark Learning Series (3): Querying with SQL
阿新 • Published: 2019-02-16
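For a DataFrame that already exists in Spark, we can call the .createOrReplaceTempView method to register it as a temporary table (a temporary view).
Once the temporary table has been created, we can run SQL statements against it to query it and compute statistics: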
from pyspark.sql.types import *

# Generate our own CSV data
# This way we don't have to access the file system yet.
stringCSVRDD = sc.parallelize([
    (123, 'Katie', 19, 'brown'),
    (234, 'Michael', 22, 'green'),
    (345, 'Simone', 23, 'blue')
])

# The schema is encoded in a string, using StructType we define the schema using various pyspark.sql.types
schemaString = "id name age eyeColor"
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])

# Apply the schema to the RDD and Create DataFrame
swimmers = spark.createDataFrame(stringCSVRDD, schema)

# Creates a temporary view using the DataFrame
swimmers.createOrReplaceTempView("swimmers")

spark.sql("select * from swimmers").show()
swimmers.select("id", "age").filter("age = 22").show()
spark.sql("select name, eyeColor from swimmers where eyeColor like 'b%'").show()