
Spark in Practice (4): DataFrame Basics, Filtering Data


filter, style 1

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ops').getOrCreate()

df = spark.read.csv('appe_stock.csv', inferSchema=True, header=True)

df.printSchema()

df.show()

# The first way: pass in a condition as a SQL-style string
df.filter("Close < 500").show()
df.filter("Close < 500").select('Open').show()
df.filter("Close < 500").select(['Open','Close']).show()

filter, style 2

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ops').getOrCreate()

df = spark.read.csv('appe_stock.csv', inferSchema=True, header=True)
df.printSchema()
df.show()

# The second way: pass in Column expressions
df.filter(df['Close'] < 500).select('Volume').show()
df.filter(df['Close'] < 200 and df['Open'] > 200).show()   # wrong: Python's `and` cannot combine Column objects
df.filter((df['Close'] < 200) & (df['Open'] > 200)).show() # right: use & with parentheses around each condition

Condition operators

# not operation
df.filter((df['Close'] < 200) & ~(df['Open'] > 200)).show() # right

# equal operation
df.filter(df['Low'] == 197.16).show()

Retrieving results

# if we want to keep the result, we can use collect()
result = df.filter(df['Low'] == 197.16).collect()

# a Row can be converted into other formats, e.g. a plain dict
result[0].asDict()

# and then access a specific attribute
result[0].asDict()['Volume']