How to Add Header and Footer to a File

I will demonstrate you, how to add custom header, footer to file. This is the often required functionality for file processing and validation.

Requirement: Create delimited file with below structure.

Header Part

File Name: Name of the actual File.

PID : process ID.

Header Columns.

Body Part

Text Data. 

Footer Part

File Created Date: Date & Time.

Number of Records: record count.

This is our final job design.

Final Job Design
Final Job Design

Step 1: We`ll create Header part first.

Step 2: Add tFixedFlowInput component and configure as below.

  • Add one column “value” with string type.
  • Select “Use inline table” option and add three rows.
  • Add each line of below code to the each line we created previously.

"File Name: CustomerDetails.csv"

"PID :"+pid

"Id,Name,City,State,Street"

  • See the image for more details.
tFixedFlowInput Setting
tFixedFlowInput Setting

Step 3: Add tFileOutputDelimited and connect with tFixedFlowInput component using Main flow, then configured as follows.

  • Add file path name “C:/home/CustomerDetails.csv”.(you can change this)
  • Row separator=”n”
  • Field Separator=”,”

Step 4: Add tRowGenerator and configured as explained in previous blog post.

Step 5: Copy Paste tFileOutputDelimited_1 component which we have created in Step 3, then connect  tFileOutputDelimited_2 with tRowGenerator using main flow. Configure as follows.

  • Select Option “Append”.
  • Click on  “Main” flow and go to the “Advance setting” tab and select option “Monitor this connection”.
  • Unchecked “Use Input Connection as Label”. it will show you a “Label” and text box to write, write  “Number Of Records:” in same text box.

Step 6: Create context variable named as “NumberOfRows”, then copy and paste tFixedFlowInput_1.

Step 7: Configure newly pasted  tFixedFlowInput_2 component.

  • Delete second last line from list.
  • One second line add this code "File Created Date: "+TalendDate.getDate("dd-MMM-yyyy HH:mm:ss").
  • on third line add the context variable. context.NumberOfRows. 

Step 8: Copy paste tFileOutputDelimited_2 and connect with tFixedFlowInput_2 using main flow. no need to do any additional configuration.

Step 9: Add tFlowMeterCatcher component and add tJavaRow and connect each other using main flow, don`t click on “Synch” button on tJavaRow. Write below code in tJavaRow component.

context.NumberOfRows=input_row.label+input_row.count;

Step 10: Run the job it will create file with Header part, Data part and footer part. below is our final output.

Header and Footer Output
Header and Footer Output

 

 

 

Generate sample data in Talend

This post we will generate sample data for further utilisation.  In ETL world, if you want to test some components you must need data, but getting a desire sample data is very difficult.

To generate sample data we will use tRowGenerator component which has built in editor where you can select functions or write your won expressions to get expected sample data.

Spte 1: Start writing on Talend designer canvas “trow…” it will show you list of component, from the list select tRowGenerator component.

Note: This is new feature from Talend wherein you don`t need to search component in pallet and then drag and drop.

See in picture.

Add Component writing  on Talend Designer pan
Add Component writing on Talend Designer pan

Step 2: Double click on component and do the setting in tRowGenerator using it`s editor.

  • Click on [+] sign to add new column, add new column with name =”name”,
  • Select function form “Function” tab on same columns grid.
  • Select “TalendDataGenerator.getFirstName” function from function list.
  • Add following columns and select the relavant function as we did previously.
    • City = TalendDataGenerator.getUsCity
    • State=TalendDataGenerator.getUsState
    • Street=TalendDataGenerator.getUsStreet
  • Now we have 4 columns but we need another one column for Identity number, so add column “ID” with “integer” data type.
  • In function Tab select “” (three dots). You will see function parameters window with single row, down to the columns grid.
  • There are three tabs first one “Parameter” is fixed, with no edit option, second on for “value” and last one for “comment”
  • Click on value tab it will show you “…” dots then click on it, it opens expression builder for edit. you can add your custom logic here.
  • Select “Numeric” routine then select “sequence” keep the default values.
  • In “Number of Rows for RowGenerator” text box write value=10. ( we required only ten rows to be generated)
  • Click on preview button on below window it will show you generated sample data as a result, it will look like below Image.
tRowGenerator setting
tRowGenerator setting

For demonstration we have generated only ten rows, but you are free to generate as many as rows you require.

Step 3: Add tLogRow, and connect with tRowGenerator using main flow.

Step 4: Run the job it will show you below result.

tRowGenerator output sample data
tRowGenerator output sample data

If you want to insert this data to the file or database then use the specific compoent e.g. tFileOutputDelimited to store in delimited file.

How to get latest file from directory

This post I will describe, how to get most resent file from directory based on display date or a date from file name.

We have below sample files in our directory and every file has date in the name of file, based on that we will decide which file is most resent rather than file created date/ modified date.

Sample xls files
Sample xls files

As you can see, we have three list of files.

  1. sales element 11 2014.xls has been modified at 28-01-2015
  2. sales element 02 2015.xls has been modified at 28-01-2015
  3. sales element 12 2014.xls has been modified at 03-03-2015

If we use file created or modified date to get most resent file then we will get ” sales element 12 2014.xls” which is a wrong file.

To get a latest file from directory we will use below steps.

Step 1: Add tFileList component and configure it get all .xls files form directory. see the image for details.

tFileList Configuration
tFileList Settings

Step 2: Add tFileProperties component and connect with tFileList using Iterator link, then provide file path and name from global variable. which looks like this ((String)globalMap.get(“tFileList_1_CURRENT_FILEPATH”)).

tFileProperties Setting
tFileProperties Setting

Step 3: Add tMap after tFileProperties and connect with main link and do the fowling setting in it.

  • Create output name as “FileList”.
  • Add all the source columns to this output.
  • Add new variable in tMap using variable creation, write this code in it.

row9.basename.substring(row9.basename.indexOf(“.”)-7).replace(“.xls”, “”)

  • Create new column in output with the name “DisplayDate” and datatype is Date.
  • Add below code in it.

TalendDate.parseDate(“MM yyyy”, Var.var1)

  • See the image for more details.
tMap Setting
tMap Setting

Step 4: Add tHashOutput component after tMap and connect with main link.

Step 5:  Add tHashInput Component below tFileList and link using “OnSubJobOk” trigger.

Step 6: Copy Schema from tHashOutput to tHashInput.

Step 7: Add tAggregateRow component and connect with tHashInput using main flow link. Do the basic setting like below.

tAggregateRow Setting
tAggregateRow Setting

Step 8: Add tLogRow to check the result. you will see the output as below.

Output Result
Output Result

Step 9: Your job design should be look like in below Image.

Final Job Design
Final Job Design

Note: You can avoid using tHash***** components just use tAggregateRow after tMap and do the setting as is, it will work.

Difference between tJava,tJavarow,tJavaflex

I have listed down some of the common differences between tJava, tJavaRow and tJavaFlex component.

 

Operations tJava tJavaRow tJavaFlex
Use component to integrate your custom Java code Yes Yes Yes
It will be executed first but only once in the subjob. Yes No
It requires Input Flow No Yes No
It Requires Output Flow No If output Schema Defined If output Schema Defined
It can be used as Start of the Job Yes No Yes
It can be used as a separate subjob Yes No Yes
It accepts Main Flow or Iterator Flow Both Only Main Both
It has three Java code parts (start, main, end) No No Yes
It will Auto propagate Data No No Yes

 

tJava advantage: this component can be use as trigger component, At start of the job at end of the job.

tJavaRow : this component required main flow so it can be used at end of the Sub Job but not at the start of subjob.

tJavaFlex: this component holds capabilities of tJava & tJavaRows you can use this component for row generation, or at start of job, or at then end of sub job. or individual  subjob.  It gives you ability to auto propagate data.

Talend DI Interview Questions

You came across here that means it is worth of writing this post. 🙂

Whenever I go for the interview there will be some new questions, so I thought why not to draft all these questions at single place?

It is just attempt to remember all Talend Interview question nothing else.

  1.  Difference between tMap and tJoin component in Talend .
  2. Difference between tAggregaterow and tAggregatesortedrow.
  3. Difference between tJava,tJavarow,tJavaflex.
  4. How to improve the performance of Talend job having complex design?
  5. Difference between built in schema and Repository.
  6. What is the declaration of method which we define in system routine?
  7. What is XMS and XMX parameter in Talend?
  8. How to resolve heap space issue in Talend ?
  9. How to do the exception handling in Talend?
  10. What is Default join for tMap.
  11. What are the different lookup patterns available with Talend?
  12. What is the basic requirement while updating the perticular table?
  13. How to generate surrogate key by using Talend?
  14. What is the use of Expression editor in Talend?
  15. How to debug a particular Talend job.
  16. What is context variable and context group?
  17. How to pass the variables from parent job to child job and from child job to Parent?
  18. How to forcefully exit the job.
  19. Explain the use of tContextload.
  20. How to execute multiple queries by using Talend?
  21. How to do the multithreading while executing the Job?
  22. What is hashmap in Talend and how to use it?
  23. How to do the full join in Talend.Explain the steps.
  24. How to do the right outer join in Talend? Explain the steps?
  25. How the ELT database components are differ from ETL database Components.
  26. How to use the external libraries in Talend?
  27. How to pass data from parent job to child jobs through tRunjob component ?
  28. How to load context variables dynamically?
  29. How to Share DB Connection in Talend?
  30. Skip Header rows and footer rows before load.
  31. What is Incremental Load? Describe using Talend.
  32. How can you pass a value form parent job to child job in Talend.
  33. How to call stored procedure and function in Talend Job

If you have any question which is not listed here please send me i will add it to the list. with your reference.

Read Multi Schema Positional File

To read a Multi schema Positional file we will use tFileInutMSPositional component, this component gives ability to read multiple schema based on certain columns value.

We are using Invoice file which has Invoice Header & invoice Details records in single file.

Sample file.

Header-Details Multi Schema Sample File
Header-Details Multi Schema Sample File

Invoice Header records start with “H” alphabet.

Invoice Details records start with “T’ alphabet.

follow the steps to read these multi schema records.

  • Create a job and add tFileinputMSPositional component.
  • Configure tFileinputMSPositional component with.
    • provide file Name.
    • row separator as per your file defaulted to “n”
    • “Header Filed Position” this is the most important filed by which component will distinguish records whether it is header or detail. in our case first letter of row which is “H” for header and “T” for detail. so we will add value as “0-1” that means start of record to 1 character long.
    • In “Records” section add two rows name it as “header” for Header records and “details”  for Details record.
    • once you add row it will ask you for schema add Header schema like below.
    • Header Schema
      Header Schema
    • Create details schema as follows.
    • Details Schema
      Details Schema
    •  Now add pattern for “header” as “1,8,6,8,3”.
    • Add pattern for details as “1,8,1,6,1,1,*”.
  • now our configuration completed.
  • Add two tLogRow component and from settings tab select “mode>Table”
  • Connect first tLogRow to the tFileInputMSPositional using “Header” link.
  • Connect Second tLogRow to the tFileInputMSPositional using “Detail” link.
  • Synch tLogRow columns.

Execute the job.  It will show you below output.

Header-Detail Output
Header-Detail Output

 

Read Multi Schema Delimited File

In this post i will describe you how to read/process multi schema delimited file using tFileInutMSDelimited component.

here is our sample input data which has header records then its trailer records.

Invoice Header record start with “H” alphabet and Invoice Detail/trailer record start with “T” Alphabet.

Sample header-trailer records
Sample header-trailer records

we will use tFileInutMSDelimited component to parse these records. so create a job and drag and drop tFileInutMSDelimited component and then double click on it, it will show you another screen for metadata creation.

select our source file, then select column separator & row separator, once that is done then flow the below steps.

  • In preview window select first column which has “H” and “T” values.
  • In Schema grid click on add button and provide following values.
    • Schema=H
    • Record=H
    • Separator=;
  • Click on “OK” you will one record created with provided values.
  • Click on fetch Codes button, it fetch all the visible codes from mention file and creates schema based on column delimiter.
  • Our configuration looks like below image.
tFileInputMSDelimited configuration
tFileInputMSDelimited configuration
  • Now go to the component setting tab and click on first schema which may look like row_A_1 click on it.
  • it will show you schema with default column names.
  • modify column name as follows.
Header Schema details
Header Schema details
  • We are done with Header schema now do the same thing with Trailer schema and modify column names.
Trailer Schema details
Trailer Schema details
  • Add tLogRow and connect to the tFileInputMSDelimited component using “row_A_1” schema.
  • Do the Same thing for “row_B_1” schema.
  • Now Execute the job your output will be like this.
Header-Trailer Output
Header-Trailer Output

you can achieve same thing using other ways, but trust me this is the most simple way.

 

Local, Remote, and Local offline Project

When you start Talend you will get wellcome screen with few option to select, create or import projects, but if you are starting Talend very first time then you should be aware of following project and connection type.

Create/Connect Local Project. 

  • Click on three dot button (…) which take to the next step of creating new project.
  • At left side click on [+] button to add new connection.
  • Select Repository as “local”
  • Provide “Name” for connection like “local”, “LocalTalend”
  • Provide Description if required.
  • write email address and password to the respective filed.
  • Select “workspace” area which is more important because all your jobs and projects will be created in this area.
  • Click on “OK” button. here you have created your first connection as local.

By this way you can add many connection as per your requirement.

Create local project in Talend
Create local project in Talend

Create/Connect to Remote Project. 

  • Select “Remote” option from Repository drop-down box.
  • Provide valid name in “Name” field, it will be defaulted with “remote”
  • Description is optional, but recommend you to provide.
  • provide TAC user email address, it try using this email id with your remote repository.
  • provide Password against email id, this should be configured/created prior to this step using TAC.
  • provide your local machines workspace area ( where do want to create local copy of repository?).
  • Web-app url, you need to provide valid repository url wherein repository have been configured, generally you could find it in TAC or ask to your TAC administrator.
  •  check on “check-URL” to verify repository connection/access.
  • Click on “OK” and go back to login window.
Connecting to the Remote Project Talend
Connecting to the Remote Project Talend
  • once came back on previous screen you will able to select remote repository.
  • Now select Project from “project list”.
  • if you don`t see anything then click on “refresh button” it will show you available projects from repository.
  • SVN branch, will populated automatic if you different branch address for project then you select it from available branches.
  • click on “Open” button, it will show you Studio with available project & Jobs.

Offline Project. 

  • Offline project is available in enterprise addition.
  • You don`t need to configure or create, it should be activated from TAC.
  • By using this you can work offline when you are not connected to the “Remote” project. it ill check-in all changes to the repository as soon as you get connected.

You are free to ask your questions.

Read XML with Nested Loops

In this post, I will describe you, how to parse xml having nested loops in Talend. For that I am using below XML as source.

Source XML has list of Items from a retail store,  and <item> node repeated for each item. Inside <item> node we have nested nodes for <batter> and <topping> our task is to read all the items with nested loops in their separate flow.

Image of source XML.

Sample XMl file with Nested Loops

Sample XMl file with Nested Loops

First of all you should create metadata of your XML file, but in this post I am using XSD to populate source schema in tXMLMap.

Create new job and add tFileinputXML component and configured it as shown in Image.

Image of tFileInputXMl component.

We have set XPATH Loop to “/items” as this is the root node of XML file. then I have created one column with “Document” data type. see in Image.

Set tFileInputXML for Nested loops
Set tFileInputXML for Nested loops

tXMLMap and connect with tFileinputXML component using “Main” link.

Open tXMLMap right click on “items” node at right(source) side then select “Import from file”.

provide XSD file it will automatically create all sub nodes. like below image.

Set batter & toppiq as “loop” element by right click  and select ” As loop element”

Configure tXMlMap
Configure tXMlMap

Create two outputs,  “batter” and “tappinq” then drag respective columns to “batter & tappinq”

Click on “set loop function” box it will open a window wherein you have add new row and select respective loop path e.g. for “toppinq” i will select “toppinq” loop like wise batter for “batter” output.

Your Final settings looks like below image.

Configuration tXMLMap output with loop path
Configuration tXMLMap output with loop path

we are ready to get the result add tLogRow components for each  of the output and execute the job it show results shown in image.

Nested XML loop parse Output
Nested XML loop parse Output

 

Talend Tips & Tricks

This post will contain Talend Tips & Tricks which saves your time while Coding, Java Converts, If Conditions, Expression Builder and many more area.

  • Always do Empty string and Null handling before converting.

!Relational.ISNULL(row1.StartDate) && row1.StartDate!=""?TalendDate.parseDate("yyyy-MM-dd", "2015-02-19"): TalendDate.parseDate("yyyy-MM-dd", "1999-01-01")

  •  Comparing with String values always use like below.

"Umesh".equals(row1.UserName) or "Umesh".equalsIgnoreCase(row1.UserName) instead of  "Umesh"=row1.UserName or row1.UserName.equalsIgnoreCase("Umesh")

  • Avoid using Short hand method or class methods for string.

Use StringHandling.TRIM(string); //for Trim

Use String.valueOf(SomeObject) instead .toString()

Use TalendDate.isDate("yyyy-MM-dd", "2015-02-19") for date validation

Use Relational.ISNULL(null) function to check whether string is null or not.

  • Exclude unwanted routines from build.

Remove or add required routines to the job, because unnecessarily it will be exported with job. you can manage routines by following below steps.

  1. Right Click on job from Repository.
  2. Select Option “Setup routine dependencies”.
  3. Now you will have new window with two tabs one for “user routines” and other one for ” System routines” by using [+] or remove button on each tab you can manage routines.
  4. see the Image for more details.
Exclude routines from Talend Build
Exclude routines from Talend Build

adding more….

If you think I am missing some other areas then do let me know.