This transformation requires pivoting a column in the incoming row into multiple rows. Input & outputs are file based so no database table structure is presented in this section.
Input Data:
Input file contains two data elements - email address and list of pages (properties) visited by the user. Second field is of variable length and number of properties (e.g. 3 pages for the first email as compared to 1 for the second email address) is each line is not known beforehand. Email & "properties" are separated by "," while individual elements in the second field is separated by semi-colon (";").Example:
Input data |
a@sbcglobal.net,Auto;Mail;Finance |
b@sbcglobal.net,Auto |
Intended Transformation:
The transformation will generate two output files. The format of the output files are described below.First file will contain columns from source pivoted into rows.
Output will look like
Output data |
a@sbcglobal.net,Auto |
a@sbcglobal.net,Mail |
a@sbcglobal.net,Finance |
b@sbcglobal.net,Auto |
Second file will contain count of unique emails per property. So this output file will look like:
Output Data |
Auto,2 |
Mail,1 |
Finance,1 |
Generation of test data:
Oracle PL/SQL block is used to generated random data set with email ids and pages that these users visited.
DECLARE
l_fp UTL_FILE.file_type := UTL_FILE.fopen('DATA_DIR','output.txt','w') ;
BEGIN
FOR c1 IN (SELECT email,
CASE
WHEN rn = 1 THEN TRIM(SUBSTR(property_list,1,INSTR(property_list,';') - 1))
ELSE TRIM(SUBSTR(property_list,INSTR(property_list || ';' ,';',1,rn - 1) + 1, INSTR(property_list || ';' ,';',1,rn ) - INSTR(property_list || ';' ,';',1,rn - 1 ) - 1 ))
END property_name
FROM (SELECT a.* , b.column_value rn
FROM (SELECT a.email, a.property_list, LENGTH(property_list || ';' ) - LENGTH(REPLACE(property_list || ';' ,';','')) no_of_elements
FROM property_visited_ext a) a,
TABLE(generate_rows(no_of_elements)) b
)
)
LOOP
UTL_FILE.put_line(l_fp,c1.email || ',' || c1.property_name) ;
END LOOP ;
UTL_FILE.fclose(l_fp) ;
END ;
No comments:
Post a Comment