File processing is a day-to-day task in the ETL world, and there is a huge need to validate source files: format, headers, footers, column names, data types and so on. Thankfully, the tSchemaComplianceCheck component covers most of these validations, such as:
- Length checking
- Date pattern/format
- Data types
However, it does not validate the number of columns or the column sequence; those checks have to be managed with Java code. In this post I will describe how to validate column names and their sequence.
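Before walking through the job, here is a minimal sketch of the kind of check we will build. It assumes the header line has already been read as a single string (as tFileInputFullRow delivers it); the expected header, the ";" delimiter and the column names are purely hypothetical.

```java
// Minimal sketch of a header check, e.g. inside a tJavaRow on the header flow.
// expectedHeader, the ";" delimiter and the column names are assumptions.
String expectedHeader = "empId;empName;empDept";   // hypothetical reference header
String actualHeader   = input_row.line;            // tFileInputFullRow exposes the whole row as "line"

String[] expected = expectedHeader.split(";");
String[] actual   = actualHeader.split(";");

// Column count must match, and every name must appear in the same position.
boolean headerOk = expected.length == actual.length;
for (int i = 0; headerOk && i < expected.length; i++) {
    headerOk = expected[i].trim().equalsIgnoreCase(actual[i].trim());
}
System.out.println("Header valid: " + headerOk);
```

The job below achieves the same result without hand-written loops, by joining the header row against a reference row in a tMap.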
This is our final job design.
Let’s start by adding the first component to the job designer. Add a tFileList and configure it to pick up the expected files.
Add a tFileInputFullRow component and configure it as shown in the screen below.
- Add a tMap and connect it to the tFileInputFullRow component with a main link.
- Add a tFixedFlowInput, connect it to the tMap with a lookup link, then configure it as follows.
Note: if your reference header row is stored in a file or a database, you can use that source instead of tFixedFlowInput (a sample reference line is sketched after this list).
- Configure the tMap as follows.
- Make an inner join between the reference line and the main input line.
- Add two outputs and name them “matching” and “reject” respectively.
- In the “reject” output, set “Catch lookup inner join reject” to true.
- Add the source line to both flows.
See image for more details.
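For illustration, the single reference row in tFixedFlowInput could simply hold the expected header line (the names below are hypothetical and should match your real source files):

```
empId;empName;empDept
```

The tMap then inner-joins the full header line coming from tFileInputFullRow against this reference line; any file whose header does not match exactly falls into the “reject” output thanks to “Catch lookup inner join reject”.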
- Add a tJava next to the tMap and connect it with the “matching” flow.
- Add another tJava next to the tMap and connect it with the “reject” flow.
- Add a tFileInputDelimited and connect it to the first tJava with an “Iterate” link.
- Configure the tFileInputDelimited as shown in the image below (the file-name expression is sketched after this list).
Add a tLogRow component to see the output read from the file.
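The trick in the tFileInputDelimited is the file name: instead of a hard-coded path, it reads the file currently being iterated by the tFileList. A typical expression, assuming the component is named tFileList_1, would be:

```java
// "File name/Stream" of tFileInputDelimited; the _1 suffix depends on your component name
((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"))
```

The tJava on the “matching” flow does not need any code for this scenario; it only serves as the starting point of the Iterate link once the header has been validated.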
For each file the whole subjob is executed; if the file’s header matches the reference header row, the file is read.
You can use the reject flow to make a note of the rejected file, depending on your requirement.
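If you just want a trace of the skipped files, a couple of lines in the reject-side tJava are enough; a minimal sketch, again assuming the tFileList is named tFileList_1:

```java
// Reject-side tJava: log which file failed the header validation.
String rejectedFile = (String) globalMap.get("tFileList_1_CURRENT_FILE");
System.err.println("Header mismatch, file skipped: " + rejectedFile);
```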
In this post I will describe how to split rows into columns; we will use the sample below as input records.
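As a stand-in for the sample records, a hypothetical input with the ID, Name and Type columns used in the configuration below, plus an assumed Value column to pivot, could look like this:

```
ID;Name;Type;Value
1;John;Phone;1234567
1;John;Email;john@example.com
2;Lisa;Phone;7654321
2;Lisa;Email;lisa@example.com
```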
Create a job, add a tFixedFlowInput component, paste the above input into “Use inline content”, and create the schema as shown in the image.
Add a tPivotToColumnsDelimited component, connect it to the tFixedFlowInput component with a main connection, then configure it as shown in the image below.
- Pivot column = “Type”
- Aggregation function = “last”
- Group by the “ID” and “Name” columns.
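With the hypothetical input above and this configuration, the component would create one output column per distinct value of “Type” (here Phone and Email), keep one row per ID/Name group, and the “last” aggregation would keep the last value seen for each group and type. The output file would then look roughly like:

```
1;John;1234567;john@example.com
2;Lisa;7654321;lisa@example.com
```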
The rest of the configuration is for the output file, where the pivoted result will be written. To read the output file we could use a delimited input component, but for a quick review I’ll use tFileInputFullRow.
Add a tFileInputFullRow below the tFixedFlowInput component, connect it with an “OnSubjobOk” trigger, and provide the previously created file path and the rest of the details.
Add a tLogRow, connect it to the tFileInputFullRow component, and execute the job; you will see the output shown above on the console.
Final Job Design.
Note that this component creates N columns based on your input; if you are dealing with a fixed schema, this can add complexity to further processing.