Oracle Row/Column Conversion: Splitting Strings

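Oracle offers many ways to split a delimited string into multiple rows. I have collected and organized a number of them, both common and uncommon. This operation is usually classified as a row/column conversion problem.
To make testing easier, I wrap each method in a function that returns a collection of strings.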

0. Create a custom collection type

SQL> create type t_vchars as table of varchar2(4000);
  2  /
Type created

1. The REGEXP_SUBSTR regular-expression method

The plain SQL query for splitting a comma-separated string is:

SQL> select regexp_substr('男,女', '[^,]+', 1, level) as col
  2    from dual
  3  connect by level <= length('男,女')-length(REPLACE('男,女', ',', '')) + 1;
COL
------
男
女

Wrapped in a function:

SQL> create or replace function F_SPLITSTR1
       (V_CHAR_IN VARCHAR2   --string to split
       ,V_DELIMER VARCHAR2   --delimiter
       ) return t_vchars is
  2    FunctionResult t_vchars;
  3  begin
  4  
  5    select regexp_substr(V_CHAR_IN, '[^'||V_DELIMER||']+', 1, level) as col
  6      BULK COLLECT INTO FunctionResult
  7      from dual
  8   connect by level <= length(V_CHAR_IN)-length(REPLACE(V_CHAR_IN, V_DELIMER, '')) + 1;
  9    RETURN FunctionResult;
 10  end F_SPLITSTR1;
 11  /
Function created

SQL> select * FROM TABLE(F_SPLITSTR1('A,B,C,#D,E,F',','));
COLUMN_VALUE
--------------------------------------------------------------------------------
A
B
C
#D
E
F
6 rows selected

SQL> select * FROM TABLE(F_SPLITSTR1('A,B,C,#D,E,F','#'));
COLUMN_VALUE
--------------------------------------------------------------------------------
A,B,C,
D,E,F
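
One caveat with the pattern '[^,]+': it matches only non-empty runs, so an empty element (such as the middle one in 'A,,B') is skipped, the later values shift up, and a NULL row appears at the end. The query below is a minimal sketch of a commonly used alternative and is not part of the original tests. It assumes Oracle 11g or later (for the sixth, subexpression argument of REGEXP_SUBSTR), a comma delimiter, and input without line breaks, since '.' does not match a newline by default.

```sql
-- Sketch: keep empty elements in their original position.
-- '(.*?)(,|$)' lazily captures everything up to the next comma or the end
-- of the string; the final argument 1 returns only the first capture group.
select regexp_substr('A,,B', '(.*?)(,|$)', 1, level, null, 1) as col
  from dual
connect by level <= length('A,,B') - length(replace('A,,B', ',')) + 1;
```

On 'A,,B' this should return three rows: A, an empty (NULL) row, and B.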

2. The plain SUBSTR method

The plain SQL query for splitting a comma-separated string is:

SQL> select substr(t.ca,instr(t.ca,',',1,c.lv)+1,instr(t.ca,',',1,c.lv+1)-(instr(t.ca,',',1,c.lv)+1)) as col
  2    from (select ','||'男,女'||',' as ca,length(','||'男,女'||',')-nvl(length(replace(','||'男,女'||',',',')),0)-1 as cnt from dual) t,
  3         (select level lv from dual connect by level<=10) c
  4  where c.lv<=t.cnt;
COL
----------
男
女

Wrapped in a function:

SQL> create or replace function F_SPLITSTR2(V_CHAR_IN VARCHAR2,V_DELIMER VARCHAR2) return t_vchars is
  2    FunctionResult t_vchars;
  3  begin
  4  
  5    select substr(t_string.str,instr(t_string.str,V_DELIMER,1,t_cnt.lv)+1
  6                 ,instr(t_string.str,V_DELIMER,1,t_cnt.lv+1)-(instr(t_string.str,V_DELIMER,1,t_cnt.lv)+1)) as col
  7      BULK COLLECT INTO FunctionResult
  8      from (select V_DELIMER||V_CHAR_IN||V_DELIMER as str
  9                  ,length(V_DELIMER||V_CHAR_IN||V_DELIMER)-nvl(length(replace(V_DELIMER||V_CHAR_IN||V_DELIMER,V_DELIMER)),0)-1 as cnt
 10             from dual) t_string
 11          ,(select level lv from dual
 12             connect by level<=length(V_DELIMER||V_CHAR_IN||V_DELIMER)-nvl(length(replace(V_DELIMER||V_CHAR_IN||V_DELIMER,V_DELIMER)),0)-1) t_cnt
 13    where t_cnt.lv<=t_string.cnt;
 14  
 15    RETURN FunctionResult;
 16  end F_SPLITSTR2;
 17  /
Function created

SQL> select * FROM TABLE(F_SPLITSTR2('A,B,C,#D,E,F',','));
COLUMN_VALUE
--------------------------------------------------------------------------------
A
B
C
#D
E
F
6 rows selected
Executed in 0.062 seconds
SQL> select * FROM TABLE(F_SPLITSTR2('A,B,C,#D,E,F','#'));
COLUMN_VALUE
--------------------------------------------------------------------------------
A,B,C,
D,E,F
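
Note that the SUBSTR arithmetic above steps past each delimiter with "+1", which is only correct for a single-character delimiter. The query below is a hedged sketch of how the same idea could be adapted to a longer delimiter by using length() of the delimiter instead. The '||' delimiter and the sample string 'A||B||C' are illustrative assumptions, and the sketch assumes every token is non-empty.

```sql
-- Sketch: SUBSTR/INSTR splitting with a multi-character delimiter.
-- t.str is the input wrapped in delimiters; t.d is the delimiter itself.
select substr(t.str,
              instr(t.str, t.d, 1, c.lv) + length(t.d),
              instr(t.str, t.d, 1, c.lv + 1)
                - instr(t.str, t.d, 1, c.lv) - length(t.d)) as col
  from (select '||' || 'A||B||C' || '||' as str, '||' as d from dual) t,
       (select level lv from dual connect by level <= 10) c
       -- number of delimiters in t.str, minus one, is the token count
 where c.lv <= (length(t.str) - nvl(length(replace(t.str, t.d)), 0)) / length(t.d) - 1
 order by c.lv;
```

For the sample string this should yield the three rows A, B and C.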

3. The XML conversion method

The plain SQL query for splitting a comma-separated string is:

SQL> select new_str.extract('/X/text()').getstringVal() name
  2   from table(xmlSequence(extract(XMLType('<DOC><X>'||replace('男,女',',','</X><X>')||'</X></DOC>'),'/DOC/X'))) new_str;
NAME
--------------------------------------------------------------------------------
男
女

Wrapped in a function:

SQL> create or replace function F_SPLITSTR3(V_CHAR_IN VARCHAR2,V_DELIMER VARCHAR2) return t_vchars is
  2    FunctionResult t_vchars;
  3  begin
  4  
  5    select new_str.extract('/X/text()').getstringVal() name
  6      BULK COLLECT INTO FunctionResult
  7      from table(xmlSequence(extract(XMLType('<DOC><X>'||replace(V_CHAR_IN,V_DELIMER,'</X><X>')||'</X></DOC>'),'/DOC/X'))) new_str;
  8  
  9    RETURN FunctionResult;
 10  end F_SPLITSTR3;
 11  /
Function created

SQL> select * FROM TABLE(F_SPLITSTR3('A,B,C,#D,E,F',','));
COLUMN_VALUE
--------------------------------------------------------------------------------
A
B
C
#D
E
F
6 rows selected

SQL> select * FROM TABLE(F_SPLITSTR3('A,B,C,#D,E,F','#'));
COLUMN_VALUE
--------------------------------------------------------------------------------
A,B,C,
D,E,F

4. The custom function method

This approach cuts one token at a time off the front of the string in a loop:

SQL> create or replace function F_SPLITSTR4(V_CHAR_IN VARCHAR2,V_DELIMER VARCHAR2)
  2   return t_vchars PIPELINED is
  3    l_str    varchar2(4000);
  4    l_vchars varchar2(4000):='';
  5    n        number;
  6  begin
  7    l_str:=V_CHAR_IN||V_DELIMER;
  8    loop
  9      n:=instr(l_str,V_DELIMER);
 10      if nvl(n,0)>0 then
 11        l_vchars:=substr(l_str,1,n-1);
 12        l_str:=substr(l_str,n+1);
 13      else
 14        l_vchars:=null;
 15      end if;
 16      pipe row( l_vchars );
 17      exit when l_str is null;
 18    end loop;
 19    RETURN;
 20  end F_SPLITSTR4;
 21  /
Function created

SQL> 
SQL> select * FROM TABLE(F_SPLITSTR4('A,B,C,#D,E,F',','));
COLUMN_VALUE
--------------------------------------------------------------------------------
A
B
C
#D
E
F
6 rows selected

SQL> select * FROM TABLE(F_SPLITSTR4('A,B,C,#D,E,F','#'));
COLUMN_VALUE
--------------------------------------------------------------------------------
A,B,C,
D,E,F
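
Two edge cases in F_SPLITSTR4 are worth noting. If V_DELIMER is NULL, instr returns NULL and l_str never becomes empty, so the loop does not terminate for non-empty input. And because l_str is declared as varchar2(4000), a 4000-character input plus the appended delimiter would overflow it. The variant below is a minimal sketch that guards against both. The name F_SPLITSTR4_SAFE is hypothetical, and the function was not part of the timing tests in this article.

```sql
create or replace function F_SPLITSTR4_SAFE(V_CHAR_IN VARCHAR2, V_DELIMER VARCHAR2)
  return t_vchars PIPELINED is
  l_str varchar2(32767) := V_CHAR_IN;  -- PL/SQL buffer wider than the 4000-char element type
  n     pls_integer;
begin
  -- Without a delimiter there is nothing to split on: return the input as one row.
  if V_DELIMER is null then
    pipe row(l_str);
    return;
  end if;
  loop
    n := instr(l_str, V_DELIMER);
    exit when nvl(n, 0) = 0;                         -- no delimiter left, or l_str is null
    pipe row(substr(l_str, 1, n - 1));               -- token before the delimiter
    l_str := substr(l_str, n + length(V_DELIMER));   -- skip the whole delimiter
  end loop;
  pipe row(l_str);                                   -- last token (may be null)
  return;
end F_SPLITSTR4_SAFE;
/
```

Like the original, it emits a trailing NULL row when the input ends with a delimiter, so the row counts stay comparable. Each individual token must still fit in the 4000-character element type of t_vchars.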

5. The SQL*Loader method

Write the string to a text file first, then load it with SQL*Loader. This method is not covered here.

Performance comparison

Finally, test the functions above on a 2000-character string that uses a single-character delimiter:

SQL> var  v_str varchar2(32767);
SQL> exec :v_str:=rpad('A,',2000,'A,');
PL/SQL procedure successfully completed

SQL> select count(*) from TABLE(F_SPLITSTR1(:v_str,','));
  COUNT(*)
----------
      1001
Executed in 4.222 seconds

SQL> select count(*) from TABLE(F_SPLITSTR2(:v_str,','));
  COUNT(*)
----------
      1001
Executed in 0.05 seconds

SQL> select count(*) from TABLE(F_SPLITSTR3(:v_str,','));
select count(*) from TABLE(F_SPLITSTR3(:v_str,','))
ORA-01489: result of string concatenation is too long
ORA-06512: at "DONGFENG.F_SPLITSTR3", line 5

SQL> select count(*) from TABLE(F_SPLITSTR4(:v_str,','));
  COUNT(*)
----------
      1001
Executed in 0.039 seconds

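The test results show:
1. The custom function is the fastest at 0.039 s; with large data volumes, adding parallelism should widen its advantage.
2. The plain SUBSTR method comes next at 0.05 s.
3. The REGEXP_SUBSTR method is much slower at 4.22 s.
4. The XML conversion method fails with ORA-01489: replacing each 1-character delimiter with the 8-character '</X><X>' grows the 2000-character input to roughly 9000 characters, beyond the default 4000-byte VARCHAR2 limit in SQL.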