Blog Archives

How to Add Header and Footer to a File

I will demonstrate you, how to add custom header, footer to file. This is the often required functionality for file processing and validation.

Requirement: Create delimited file with below structure.

Header Part

File Name: Name of the actual File.

PID : process ID.

Header Columns.

Body Part

Text Data. 

Footer Part

File Created Date: Date & Time.

Number of Records: record count.

This is our final job design.

Final Job Design

Final Job Design

Step 1: We`ll create Header part first.

Step 2: Add tFixedFlowInput component and configure as below.

  • Add one column “value” with string type.
  • Select “Use inline table” option and add three rows.
  • Add each line of below code to the each line we created previously.

"File Name: CustomerDetails.csv"

"PID :"+pid

"Id,Name,City,State,Street"

  • See the image for more details.
tFixedFlowInput Setting

tFixedFlowInput Setting

Step 3: Add tFileOutputDelimited and connect with tFixedFlowInput component using Main flow, then configured as follows.

  • Add file path name “C:/home/CustomerDetails.csv”.(you can change this)
  • Row separator=”n”
  • Field Separator=”,”

Step 4: Add tRowGenerator and configured as explained in previous blog post.

Step 5: Copy Paste tFileOutputDelimited_1 component which we have created in Step 3, then connect  tFileOutputDelimited_2 with tRowGenerator using main flow. Configure as follows.

  • Select Option “Append”.
  • Click on  “Main” flow and go to the “Advance setting” tab and select option “Monitor this connection”.
  • Unchecked “Use Input Connection as Label”. it will show you a “Label” and text box to write, write  “Number Of Records:” in same text box.

Step 6: Create context variable named as “NumberOfRows”, then copy and paste tFixedFlowInput_1.

Step 7: Configure newly pasted  tFixedFlowInput_2 component.

  • Delete second last line from list.
  • One second line add this code "File Created Date: "+TalendDate.getDate("dd-MMM-yyyy HH:mm:ss").
  • on third line add the context variable. context.NumberOfRows. 

Step 8: Copy paste tFileOutputDelimited_2 and connect with tFixedFlowInput_2 using main flow. no need to do any additional configuration.

Step 9: Add tFlowMeterCatcher component and add tJavaRow and connect each other using main flow, don`t click on “Synch” button on tJavaRow. Write below code in tJavaRow component.

context.NumberOfRows=input_row.label+input_row.count;

Step 10: Run the job it will create file with Header part, Data part and footer part. below is our final output.

Header and Footer Output

Header and Footer Output

 

 

 

Validate CSV headers in Talend.

File processing is a day to day task in ETL world, and there is huge need of validation regarding source file format, headers, footers, column name, data type and so on, thanks to tSchemaComplianceCheck component which can do most of the validation like.

  • Length checking.
  • Date pattern/format.
  • Data Types

But does not support number of columns and column sequence validation, that which we have to manage using java code, in this post I will describe you how to validate column names and their sequence.

This is our final job design.

Complete CSV Validation Job

Complete CSV Validation Job

Let’s start with adding first component to job designer. Add tFileList and configure to get expected files.

Add tFileInputFullRow component and configure as shown in below screen.

tFileInputFullRow Configuration

tFileInputFullRow Configuration

  • Add tMap and connect with main link from tFileinputFullRow component.
  • Add tFixedFlowInput and connect with lookup link to tMap then configure as follows.

Note: if you have your refrence header row stored in file or database you can use it instead of tFixedFlowInput.

tFixedFlowInput Configuration

tFixedFlowInput Configuration

  • Configure tMap as follows.
    • Make inner join with your reference line and main line of input.
    • Add two outputs and named it as “matching” and “reject” respectively.
    • In the reject output click on setting “catch lookup inner join reject”=true
    • Add source line to both the flows.

See image for more details.

tMap Setting

tMap Setting

  • Add tJava next to tMap and connect with “matching” flow.
  • Add another tJava next to tMap and connect with “reject” flow.
  • Add tFileInputDelimited and connect with first tjava using “iterate” flow.
  • Configure tFileInputDelimited as shown in below image.

Add tLogRow component to see the output from file.

tFileInputDelimited Configuration

tFileInputDelimited Configuration

You can see that for each file whole sub job will be executed if it is matching with header row then it will be used for reading.

You can connect reject row to make a note of rejected file based on your requirement.

Split Rows to Columns

This post I will describe you how to split rows into columns, we will use below sample as input records.

Input Rows.

Input Rows

Input Rows

Expected Output.

out put

Out Put

Create a Job and add tFixedFlowInput component and  put above input as “Use inline content” and create schema as shown in image.

Input Schema

Input Schema

Add tPivotToColumnsDelimited  component and connect with tFixedFlowInput component as main connection then configured this component shown in below image.

tPivot component Configuration

tPivot component Configuration

Configurations :

Pivot Column =”Type”

Aggregation column=”Value”

Aggregation Function =”last”

Group by “ID” and “Name” column.

Rest of the configuration is for output file, where our output will be transferred. to read output file we can use either delimited component but for quick review I`ll use tFileInputFullRow.

Add tFileInputFullRow below the tFixedFlowInput component and connect with “On Sub Job Ok” trigger. and provide previously created file path and rest of the details.

add tLogRow and connect to tFileInputFullRow component and execute the job you will get above out put on console.

Final Job Design.

Job with OutPut

Job with OutPut

This component will create N number of columns based on your input, if you are dealing with fix schema then it will create complexity for further processing.