Running a Python script from a shell script to connect to Hive and write data into a table

Usage

1. cd /opt/zy
Run the commands below in this directory with root privileges.
2. When querying in SAP:
Tcode: ZMMR0005
Purchase Org: *
PO Creating: 2017/3/1 (start date) to 2017/6/30 (end date)
Vendor: 1000341
Plant: *

A query with these settings returns all records whose shipping date falls between 20170301 and 20170630, no matter which month the arrival date falls in.

Export the data table from SAP and save it as a txt file delimited by "\t".
Upload the exported file to the /opt/zy directory with the rz command.
3. Run the command. Note: the argument must strictly follow the format XXXXXXXXtoYYYYYYYY, meaning startdate to enddate (a validation sketch follows this list).
example:
[root@<host> zy]# bash try2.sh 20170301to20170630
4. Go to Hue and run this query on the analysis result:
SELECT * FROM saplifttime WHERE querypocredatestart='XXXXXXXX' [AND querypocredateend='YYYYYYYY'];
5. To see the raw data, query the pcg.sap table as follows:
SELECT * FROM sap WHERE querypocredatestart='20170301';
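
For illustration, a minimal sketch of how the XXXXXXXXtoYYYYYYYY argument could be validated up front (the check_daterange helper is hypothetical; the scripts below simply split on "to" without any validation):

import sys
from datetime import datetime

def check_daterange(arg):
    # Hypothetical helper; the real scripts just split the argument on "to".
    startdate, enddate = arg.split("to")   # e.g. "20170301to20170630"
    for d in (startdate, enddate):
        datetime.strptime(d, "%Y%m%d")     # raises ValueError for an invalid date
    return startdate, enddate

print(check_daterange(sys.argv[1]))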

Screenshot of the run result (image not shown).

Technical implementation notes

A shell script is used to call the Python script.

Shell script try2.sh

#!/bin/sh
#echo $1
daterange=$1   # assigned to a variable because the substring extraction below needs one
python3 /opt/zy/runtask.py $1   # run the Python script
startdate=${daterange:0:8}   # extract the query start date
#echo $startdate
enddate=${daterange:10:8}   # extract the query end date
#echo $enddate
sed -i '1,3d' /opt/zy/$1.txt   # delete the first three lines, which are blank
sed 's/.\{1\}//' $1.txt > $1regular.txt   # delete the first column, which is empty (strip the first character of each line)
hdfs dfs -put -f /opt/zy/$1regular.txt /user/hive/pcg-data/zhouyi6_files   # upload the local server file to the Hadoop cluster
hive -e "LOAD DATA INPATH '/user/hive/pcg-data/zhouyi6_files/$1regular.txt' INTO TABLE pcg.sap partition(querypocredatestart=$startdate,querypocredateend=$enddate)"   # load the file's data into the table
rm $1.txt   # delete the original local file, keeping only the reformatted one
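
The two sed lines are plain text cleanup: drop the first three (blank) lines, then strip the leading character (the empty first column) from every remaining line. A minimal Python equivalent of that cleanup, for illustration only (the file names are placeholders):

# Sketch of what the two sed commands do; "input.txt" and "input_regular.txt" are placeholder names.
with open("input.txt", encoding="utf-8") as f:
    lines = f.readlines()
lines = lines[3:]                      # sed -i '1,3d': delete the first three lines
lines = [line[1:] for line in lines]   # sed 's/.\{1\}//': delete the first character of each line
with open("input_regular.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)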

Notes:
1. The plain sed command does not modify the file itself, so the result of stripping the first column has to be saved into a new file, the one with the regular suffix.
2. With sed -i, the -i option edits the file in place, so the result of deleting the first three lines is written back to the file instead of being printed on the command line.
3. For hdfs dfs -put -f, the -f option will overwrite the destination if it already exists.
4. The prerequisite for running this script is that the pcg.sap table has already been created; the DDL is as follows:

CREATE TABLE SAP(`PO Cre Date` string,
`Vendor` string, 
`WW Partner` string, 
`Name of Vendor` string,
`PO Cre by` string, 
`Purch Doc Type` string,
`Purch Order` string,
`PO Item` string,
`Deletion Indicator in PO Item` string, 
`Request Shipment Day` string,
`Material` string,
`Short Text` string, 
`Plant` string, 
`Issuing Stor location` string,
`Receive Stor loaction` string, 
`PO item change date` string, 
`Delivery Priority` string,
`PO Qty` string,
`Total GR Qty` string,
`Still to be delivered` string,
`Delivery Note` string,
`Delivery Note Type (ASN or DN)` string, 
`Delivery Note item` string,
`Delivery Note qty` string, 
`Delivery Note Creation Date` string,
`Delivery Note ACK Date` string, 
`Incoterm` string, 
`Part Battery Indicator` string,
`BOL/AWBill` string, 
`Purchase order type` string, 
`Gr Date` string) 
partitioned by (`queryPoCreDateStart` string,`queryPoCreDateEnd` string)
row format delimited fields terminated by "\t" stored as textfile
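
To confirm this prerequisite before running the script, a minimal check via impyla (the host, port, and database are the ones used in the Python script below; adjust them for your cluster):

from impala.dbapi import connect

conn = connect(host='10.100.208.222', port=21050, database='pcg')
cur = conn.cursor()
cur.execute("SHOW TABLES IN pcg LIKE 'sap'")
print(cur.fetchall())   # a non-empty result means pcg.sap exists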

Python script runtask.py

import pandas as pd
import sys

data = pd.read_csv(sys.argv[1] + ".txt", sep="\t")
#print(data.columns)
data['Delivery Note Creation Date'] = pd.to_datetime(data['Delivery Note Creation Date'], format='%d.%m.%Y')
data['Gr Date'] = pd.to_datetime(data['Gr Date'], format='%d.%m.%Y')
data = data.drop(data[data['Delivery Note Creation Date'].isnull()].index.tolist())   # drop rows where this column is null
data = data.drop(data[data['Gr Date'].isnull()].index.tolist())   # drop rows where this column is null
data['delta'] = (data['Gr Date'] - data['Delivery Note Creation Date']).apply(lambda x: x.days)   # the in-transit time in days
print(data['delta'].describe())
#sql_content="insert into table saplifttime values(%,%s,%s,%s,%s,%s,%s,%s,%s,%s)"%\
import hdfs
from impala.dbapi import connect
filename=sys.argv[1]+".txt"
hdfspath='/user/hive/pcg-data/zhouyi6_files'
client = hdfs.Client("http://10.100.208.222:50070")   # 50070 is the HDFS NameNode web UI port
# 8888 is the port I use when logging in to the web UI
#print(client.status("/user/zhouyi", strict=True))   # view info about a path
#print(client.list("/user/zhouyi"))   # list the files under a directory
#client.upload(hdfs_path=hdfspath, local_path="/opt/zy/" + filename, overwrite=True)
# overwrite=True overwrites any existing file at the destination
conn = connect(host='10.100.208.222', port=21050,database='pcg')
cur = conn.cursor()
stdate,edate=sys.argv[1].split("to")
#print(sys.argv[1])
desc = data['delta'].describe()   # compute the summary statistics once and index them by label
cnt = str(desc['count'])
mean = str(desc['mean'])
std = str(desc['std'])
mini = str(desc['min'])
twentyfive = str(desc['25%'])
fifty = str(desc['50%'])
seventyfive = str(desc['75%'])
maxm = str(desc['max'])
args=[stdate,edate,cnt,mean,std,mini,twentyfive,fifty,seventyfive,maxm]
print(args)

# SQL that works:
#sql_content="insert into table saplifttime values("+str(5555)+",'20200607','22','4.2','9.88','1','2','5','10','9999999999999')"
sql_content = "insert into table saplifttime values(?,?,?,?,?,?,?,?,?,?)"
cur.execute(sql_content, args)   # insert the computed statistics into the pcg.saplifttime table

Notes:
1. The prerequisite for cur.execute is that the pcg.saplifttime table has already been created; the DDL is as follows:

CREATE TABLE SAPLifttime(querypocredatestart STRING,
querypocredateend STRING,
cnt STRING,
mean STRING,
std STRING,
minimum STRING,
`25percent` STRING,
`50percent` STRING,
`75percent` STRING,
maxmum STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS TEXTFILE

2. Computation logic (a runnable sketch follows this list):
Step 1: treat the "Delivery Note Creation Date" field as the date the goods were shipped; if it is empty, drop the row.
Step 2: treat the "Gr Date" field as the date the goods arrived; if it is empty, drop the row.
Step 3: in-transit time = Gr Date - Delivery Note Creation Date.
Step 4: compute cnt, mean, std, minimum, 25%, 50%, 75%, maxmum over the in-transit times.
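
A self-contained sketch of these four steps on made-up sample rows (the dates are invented, in the same dd.mm.yyyy format the SAP export uses):

import pandas as pd

data = pd.DataFrame({
    'Delivery Note Creation Date': ['01.03.2017', '05.03.2017', None],
    'Gr Date': ['08.03.2017', None, '10.03.2017'],
})
for col in ['Delivery Note Creation Date', 'Gr Date']:
    data[col] = pd.to_datetime(data[col], format='%d.%m.%Y')
data = data.dropna(subset=['Delivery Note Creation Date', 'Gr Date'])   # steps 1 and 2: drop rows with empty dates
data['delta'] = (data['Gr Date'] - data['Delivery Note Creation Date']).dt.days   # step 3: in-transit days
print(data['delta'].describe())   # step 4: count, mean, std, min, 25%, 50%, 75%, max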

Pitfalls I hit:
1. All of my table fields are STRING. For the VALUES placeholders I first tried %s and %d, but they never matched the format of the corresponding Python values. Using ? as the placeholder fixed it.
2. Writing cur.execute(sql, args) this way has the advantage of being clear to read: there is no need to concatenate an extremely long SQL string, which is very easy to get wrong. A comparison follows below.
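
To make pitfall 2 concrete, a minimal comparison that reuses the cursor cur and the ten-element args list from the script above:

# Error-prone: concatenating every value into one long SQL string by hand.
sql_concat = "insert into table saplifttime values('" + "','".join(args) + "')"
# Clear and safe: ? placeholders, with the driver substituting the values.
sql_param = "insert into table saplifttime values(?,?,?,?,?,?,?,?,?,?)"
cur.execute(sql_param, args)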