Spark in Practice (3): DataFrame Basics, Row/Column Operations and SQL

Row and Column Operations

df['age']  # returns only a Column object (an expression), not the data
df.select('age').show()  # returns a DataFrame with one column, which we can display with show()

# see the first two rows
df.head(2)  # returns a list of Row objects

df.select(['age','name']).show() # get two columns

# create a new column
df.withColumn('double_age', df['age'] * 2).show()  # not in place: returns a new DataFrame and leaves df unchanged

# rename a column
df.withColumnRenamed('age', 'my_new_age').show()

SQL Operations

# very useful if you are already familiar with SQL

# first, register the DataFrame as a temporary view
df.createOrReplaceTempView('people')  # the view is named people

# run a SQL query; the result is a DataFrame
results = spark.sql("SELECT * FROM people")
results.show()

# run another SQL query with a filter
new_results = spark.sql("SELECT * FROM people WHERE age=30")
new_results.show()