<case>:大量資料進行資料比對-透過PLSQL－Eason [資料科學//Python學習/資料庫] & [拍片&剪片]

近期處理到一筆大量數據的case，資料儲存方式是使用excel，user有一萬多筆(15672筆)的資料需要與120個磁區中的備份資料進行比對(約70萬筆的紀錄)，找出這些資料是被儲存在哪個備份磁區中，由於都是用excel儲存資料，因此若要進行人工比對需要耗費相當大的人力以及時間，因此我將這些excel檔匯入oracle中使用plsql程式來進行處理。

匯入的方法是透過sql developer來處理，參考資料如下網址:

http://wazai.net/2711/oracle-%E5%8C%AF%E5%85%A5%E5%8C%AF%E5%87%BA-excel-%E9%80%8F%E9%81%8E-sql-developer/comment-page-1

需比對的一萬多筆資料為了比對方便因此建立連續的編號，方法是透過rownum來處理:

SQL>create table aabc as select rownum rn,id from abc1;

SQL>select * from aabc;

擷取1

然後將每個磁區中的資料分別建立一個table來儲存，共建立了120個table

擷取2

由於每個磁區中的資料格式都是不一樣的(資料格式未統一)，因此若要找到資料為K2XXXXXXXXXXX(rn=1)

是否存在於磁區中則需透過此語法來達成:

SQL>select count(*) from hr.tape001 where id0001 like '%'||(select id from hr.aabc where rn=1)||'%';

此語法是將原本的 like '%XXXX%' 中加入一層子查詢而形成，這樣就可以查到rn=1的資料有幾筆(count>0表示有資料存在於磁區，count=0就表示不存在)

再來:

如果每個磁區要一筆一筆比對勢必會耗費很多人工時間，由於有一萬多筆資料因此我們可以透過撰寫PLSQL來比對資料

針對磁區tape001比對資料方法如下:

**********************************

set serveroutput on

create or replace procedure proc01
is
v_count number;
v_id varchar2(150);
begin
dbms_output.put_line('----------tape001 begin search----------');
for i in 1..15672 loop
select id into v_id from hr.abc where rn=i;
select count(*) into v_count from HR.tape001 where id001 like '%'|| (select id from hr.abc where rn=i )||'%';
if v_count>0 then
dbms_output.put_line('rn='||i||' number='||v_id||' count='||v_count);
end if;
end loop;
end;
/

*********************************

宣告兩個變數v_count v_id來儲存count數與料號，將15672筆資料透過迴圈一筆一筆進入tape001來比對，如果比對結果發現v_count(count數)>0 表示有資料，沒有則不輸出，再將結果output出來，因此就可以找到磁區中有哪些料號是存在的。

結果如下:

匿名區塊已完成
----------tape001 begin search----------
rn=31 number=K2A0XXXXXXXXXXXX count=10
rn=1136 number=K2A00270XXXXXXXXXXXXXX count=10
rn=2203 number=K2A0XXXXXXXXXXXX000PAD count=10
rn=2310 number=K2A002705NXXXXXXXXXX00PAA count=8
rn=2495 number=K2A002705N000000000XXXXXXX count=10
rn=9025 number=K2A001342N000000XXXXXXXXX count=10

因此就可以找到磁區中有哪些料號是存在的!

磁區tape002~0120以此類推

小結:

經由比對後，發現在1萬多筆資料中只有3千多筆資料是存在於備份磁區中的，後來驗證了其他不存在在磁區的料號原來已經被改號了，這就引出了一個問題，若在建立資料時就沒有定義完整的參考限制則後續會發生更多的問題(如:資料已被改號但備份中的資料卻是原始的料號)。

預防此錯誤可寫trigger來將更改過的資料存放在另一張表格中:

********************************

利用old new 記錄異動前後資料
*****************************
drop table table1 purge;
drop table table1_log purge;
create table table1
(
pro_id number primary key,
pro_name varchar2(20) not null,
pro_price number);

create table table1_log
(
old_pro_id number,
old_pro_name varchar2(20),
old_pro_price number,
new_pro_id number,
new_pro_name varchar2(20),
new_pro_price number,
opcode varchar2(20),
m_date date default sysdate);

create or replace trigger trigger4
after insert or update or delete
on table1
for each row
begin
dbms_output.put_line('trigger enable...');
case
when inserting then
insert into table1_log(new_pro_id,
new_pro_name,
new_pro_price,
opcode,
m_date)
values(:new.pro_id,
:new.pro_name,
:new.pro_price,
'insert',
default);
when updating then
insert into table1_log(old_pro_id,
old_pro_name,
old_pro_price,
new_pro_id,
new_pro_name,
new_pro_price,
opcode,
m_date)
values(:old.pro_id,
:old.pro_name,
:old.pro_price,
:new.pro_id,
:new.pro_name,
:new.pro_price,
'update',
default);
when deleting then
insert into table1_log(old_pro_id,
old_pro_name,
old_pro_price,
opcode,
m_date)
values(:old.pro_id,
:old.pro_name,
:old.pro_price,
'delete',
default);
end case;
end;

測試:

********************

set serveroutput on;
insert into table1 values(3,'berry',35);
set serveroutput on;
update table1 set pro_price=20 where pro_id=3;
set serveroutput on;
delete table1;
select * from table1_log;
********************************************
1 apple 50 insert 06-9月 -14
2 orange 10 insert 06-9月 -14
3 berry 35 insert 06-9月 -14
3 berry 35 3 berry 20 update 06-9月 -14
1 apple 50 delete 06-9月 -14
2 orange 10 delete 06-9月 -14