
How to get latest file from directory

In this post, I will describe how to get the most recent file from a directory based on a display date, that is, a date embedded in the file name.

The sample files below are in our directory. Every file has a date in its name, and we will use that date, rather than the file's created or modified date, to decide which file is the most recent.

Sample xls files

As you can see, the directory contains three files:

  1. sales element 11 2014.xls was modified on 28-01-2015
  2. sales element 02 2015.xls was modified on 28-01-2015
  3. sales element 12 2014.xls was modified on 03-03-2015

If we used the created or modified date to find the most recent file, we would get “sales element 12 2014.xls”, which is the wrong file: going by the dates in the names, the latest one is “sales element 02 2015.xls”.

To get the latest file from the directory, follow the steps below.

Step 1: Add a tFileList component and configure it to get all .xls files from the directory. See the image for details.

tFileList Configuration

Step 2: Add a tFileProperties component and connect it to tFileList using an Iterate link. Then set its file name from the global variable, which looks like this: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
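Steps 1 and 2 come down to listing the .xls files in a folder and reading each file's base name. Below is a rough plain-Java approximation. It is not how tFileList or tFileProperties are implemented, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java approximation of tFileList (mask *.xls) + tFileProperties (basename).
final class XlsFileLister {

    /** True when the name ends in ".xls", ignoring case. */
    static boolean isXlsFile(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".xls");
    }

    /** Base names of the .xls files directly inside dir, sorted alphabetically. */
    static List<String> listXlsBasenames(File dir) {
        List<String> names = new ArrayList<>();
        File[] files = dir.listFiles(f -> f.isFile() && isXlsFile(f.getName()));
        if (files == null) {
            return names; // dir is not a directory, or it could not be read
        }
        for (File f : files) {
            names.add(f.getName()); // comparable to tFileProperties' basename column
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : listXlsBasenames(dir)) {
            System.out.println(name);
        }
    }
}
```

If you pass a folder path as the first argument, the program prints the matching names. Matching the extension case-insensitively is a choice made for this sketch.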

tFileProperties Setting

Step 3: Add a tMap after tFileProperties, connect it with a Main link, and make the following settings in it.

  • Create an output named “FileList”.
  • Add all the source columns to this output.
  • In the tMap variable panel, add a new variable and enter this expression in it (row9 is the name of the incoming link in this job; use your own link name):

row9.basename.substring(row9.basename.indexOf(".")-7).replace(".xls", "")

  • Create a new column in the output named “DisplayDate” with the Date data type.
  • Enter the following expression in it:

TalendDate.parseDate("MM yyyy", Var.var1)

  • See the image for more details.
tMap Setting
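You can check the two tMap expressions outside Talend with the plain-Java sketch below. Like the expression, it assumes that every name ends with a 7-character “MM yyyy” part right before the first dot. It also assumes that TalendDate.parseDate behaves like java.text.SimpleDateFormat for this pattern. DisplayDate and its methods are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Plain-Java check of the Step 3 tMap expressions (a sketch, not Talend-generated code).
// Assumptions: the name ends with " MM yyyy" just before the first '.', and
// TalendDate.parseDate("MM yyyy", ...) behaves like SimpleDateFormat for this pattern.
final class DisplayDate {

    /** Same logic as: row9.basename.substring(row9.basename.indexOf(".")-7).replace(".xls", "") */
    static String extractDatePart(String basename) {
        int dot = basename.indexOf(".");
        if (dot < 7) {
            throw new IllegalArgumentException("No 'MM yyyy' before the extension: " + basename);
        }
        // Keep the 7 characters before the first dot plus the extension, then drop ".xls".
        return basename.substring(dot - 7).replace(".xls", "");
    }

    /** Stand-in for TalendDate.parseDate("MM yyyy", Var.var1). */
    static Date parseDisplayDate(String monthYear) {
        SimpleDateFormat format = new SimpleDateFormat("MM yyyy");
        format.setLenient(false); // reject impossible months such as "13 2014"
        try {
            return format.parse(monthYear);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an 'MM yyyy' date: " + monthYear, e);
        }
    }

    public static void main(String[] args) {
        String var1 = extractDatePart("sales element 02 2015.xls");
        System.out.println(var1); // 02 2015
        Date feb2015 = parseDisplayDate(var1);
        Date dec2014 = parseDisplayDate(extractDatePart("sales element 12 2014.xls"));
        System.out.println(feb2015.after(dec2014)); // true
    }
}
```

indexOf finds the first dot, so a name such as “sales.report 11 2014.xls” would make the original expression throw a StringIndexOutOfBoundsException. Using lastIndexOf(".") instead would be more robust.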

Step 4: Add a tHashOutput component after tMap and connect it with a Main link.

Step 5: Add a tHashInput component below tFileList and link them using an “OnSubjobOk” trigger.

Step 6: Copy the schema from tHashOutput to tHashInput.

Step 7: Add a tAggregateRow component and connect it to tHashInput using a Main link. Configure its basic settings as shown below.

tAggregateRow Setting

Step 8: Add a tLogRow to check the result. You will see the output below.

Output Result

Step 9: Your job design should look like the image below.

Final Job Design

Note: You can skip the tHash* components entirely. Connect tAggregateRow directly after tMap with the same settings, and the job will still work.
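Put together, the job picks the name with the latest embedded date. Here is a compact plain-Java sketch of the same idea, with some differences:

* It uses a regular expression and java.time.YearMonth in place of the tMap expressions.
* It assumes tAggregateRow is set to take the maximum of DisplayDate. The actual settings are only shown in the image.
* The naming rule “<anything> MM yyyy.xls” and the class name are assumptions.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the whole job: pick the file whose *name* carries the latest "MM yyyy".
// Assumed naming rule: "<anything> MM yyyy.xls".
final class LatestByNameDate {

    private static final Pattern NAME_DATE = Pattern.compile("(\\d{2} \\d{4})\\.xls$");
    private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MM uuuu");

    /** The display date taken from the name, or empty if the name does not follow the rule. */
    static Optional<YearMonth> displayDate(String basename) {
        Matcher m = NAME_DATE.matcher(basename);
        return m.find() ? Optional.of(YearMonth.parse(m.group(1), MONTH_YEAR)) : Optional.empty();
    }

    /** Like a max on DisplayDate: the name with the latest display date. */
    static Optional<String> latest(List<String> basenames) {
        return basenames.stream()
                .filter(name -> displayDate(name).isPresent())
                .max(Comparator.comparing(name -> displayDate(name).get()));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "sales element 11 2014.xls",
                "sales element 02 2015.xls",
                "sales element 12 2014.xls");
        System.out.println(latest(files).orElse("no matching file")); // sales element 02 2015.xls
    }
}
```

Names that do not match the rule are ignored rather than causing an error. With tMap you would have to guard the expression yourself to get the same behaviour.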

Read Multi Schema Positional File

To read a multi-schema positional file, we will use the tFileInputMSPositional component. It can read records with different schemas from a single file and chooses the schema for each record based on the value of a specific field.

We are using an invoice file that contains both invoice header and invoice detail records.

Sample file.

Header-Details Multi Schema Sample File

Invoice header records start with the letter “H”.

Invoice detail records start with the letter “T”.

Follow these steps to read the multi-schema records.

  • Create a job and add a tFileInputMSPositional component.
  • Configure the tFileInputMSPositional component as follows:
    • Provide the file name.
    • Set the row separator to match your file; it defaults to “\n”.
    • “Header Field Position” is the most important field: the component uses it to tell whether a record is a header or a detail. In our case that is the first character of the row, “H” for header and “T” for detail. So we set it to “0-1”, meaning from the start of the record, one character long.
    • In the “Records” section, add two rows and name them “header” for header records and “details” for detail records.
    • When you add a row, it asks you for a schema; add the header schema as shown below.
    • Header Schema

    • Create the details schema as follows.
    • Details Schema

    • Now add the pattern for “header” as “1,8,6,8,3”.
    • Add the pattern for “details” as “1,8,1,6,1,1,*”.
  • The component configuration is now complete.
  • Add two tLogRow components and, on the settings tab of each, select Mode > Table.
  • Connect the first tLogRow to tFileInputMSPositional using the “Header” link.
  • Connect the second tLogRow to tFileInputMSPositional using the “Detail” link.
  • Click Sync columns on each tLogRow.
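Here is a plain-Java sketch of what the component does with these settings:

1. Route each line by its first character.
2. Cut the line into fixed-width fields using the patterns above, where “*” takes the rest of the line.

This is not the component's actual implementation. The sample lines and class name are hypothetical, since the real file's contents are only shown as an image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of a multi-schema positional read (not tFileInputMSPositional's own code).
final class MultiSchemaPositionalReader {

    static final String HEADER_PATTERN = "1,8,6,8,3";     // pattern of the "header" record
    static final String DETAIL_PATTERN = "1,8,1,6,1,1,*"; // pattern of the "details" record

    /** Cuts a line into fixed-width fields; "*" takes whatever is left of the line. */
    static List<String> split(String line, String pattern) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (String width : pattern.split(",")) {
            int end = width.equals("*")
                    ? line.length()
                    : Math.min(pos + Integer.parseInt(width), line.length());
            fields.add(line.substring(pos, end));
            pos = end;
        }
        return fields;
    }

    /** Routes each line by its first character, like Header Field Position "0-1". */
    static void read(List<String> lines, List<List<String>> headers, List<List<String>> details) {
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            switch (line.charAt(0)) {
                case 'H':
                    headers.add(split(line, HEADER_PATTERN));
                    break;
                case 'T':
                    details.add(split(line, DETAIL_PATTERN));
                    break;
                default:
                    break; // unknown record type: this sketch simply skips it
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the real file is only shown as an image in this post.
        List<String> lines = Arrays.asList(
                "HINV00001CUST0120150128USD",
                "TINV000011ITM0012NWidget",
                "TINV000012ITM0021YGadget");
        List<List<String>> headers = new ArrayList<>();
        List<List<String>> details = new ArrayList<>();
        read(lines, headers, details);
        System.out.println("header:  " + headers);
        System.out.println("details: " + details);
    }
}
```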

Run the job. It shows the output below.

Header-Detail Output

Read XML with Nested Loops

In this post, I will describe how to parse XML with nested loops in Talend. I am using the XML below as the source.

The source XML holds a list of items from a retail store, with an <item> node repeated for each item. Inside each <item> node are nested <batter> and <topping> nodes. Our task is to read all the items and send each nested loop to its own flow.

Image of source XML.

Sample XML file with Nested Loops

Normally you would first create metadata for your XML file, but in this post I use an XSD to populate the source schema in tXMLMap.

Create a new job, add a tFileInputXML component, and configure it as shown in the image.

Image of the tFileInputXML component.

We set the XPath loop to “/items”, as this is the root node of the XML file. Then we create one column with the “Document” data type, as shown in the image.
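Why loop on the root node? Roughly speaking, tFileInputXML emits one row per node matched by the loop XPath. Looping on “/items” therefore hands the whole tree to tXMLMap as a single Document row. The small plain-Java illustration below uses the JDK's javax.xml.xpath with a hypothetical two-item document and counts the nodes that each loop path matches.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Rough illustration: the loop XPath decides how many rows an XML input produces.
final class LoopXPathDemo {

    static final String SAMPLE_XML = "<items><item id=\"0001\"/><item id=\"0002\"/></items>"; // hypothetical

    /** Number of nodes matched by the given loop XPath, i.e. the number of rows it would yield. */
    static int rowCount(String xml, String loopXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(loopXPath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + loopXPath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rowCount(SAMPLE_XML, "/items"));      // 1: whole tree in one Document row
        System.out.println(rowCount(SAMPLE_XML, "/items/item")); // 2: one row per item
    }
}
```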

Set tFileInputXML for Nested loops

Add a tXMLMap and connect it to the tFileInputXML component using a “Main” link.

Open tXMLMap, right-click the “items” node on the input (source) side, and select “Import from file”.

Provide the XSD file, and it will automatically create all the sub-nodes, as in the image below.

Set batter and topping as loop elements by right-clicking each one and selecting “As loop element”.

Configure tXMLMap

Create two outputs, “batter” and “topping”, then drag the respective columns to each output.

Click the “set loop function” box. In the window that opens, add a new row and select the respective loop path. For the “topping” output, select the “topping” loop; likewise, select “batter” for the “batter” output.
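To show what the two loop paths do, here is a DOM-based plain-Java sketch. It produces one row per <batter> and one row per <topping>, and each row carries its parent item's id. The XML layout (id attributes, a <batters> wrapper) is a hypothetical stand-in for the sample file shown in the image, and the code is not what tXMLMap generates.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// DOM sketch of two loop outputs over one document (not code generated by tXMLMap).
final class NestedLoopReader {

    // Hypothetical stand-in for the sample file; element and attribute names are assumptions.
    static final String SAMPLE_XML =
            "<items>"
            + "<item id=\"0001\"><name>Cake</name>"
            + "<batters><batter id=\"1001\">Regular</batter><batter id=\"1002\">Chocolate</batter></batters>"
            + "<topping id=\"5001\">None</topping><topping id=\"5002\">Glazed</topping>"
            + "</item>"
            + "<item id=\"0002\"><name>Raised</name>"
            + "<batters><batter id=\"1001\">Regular</batter></batters>"
            + "<topping id=\"5005\">Sugar</topping>"
            + "</item>"
            + "</items>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** One row {itemId, nodeId, text} per loopElement found under each <item>. */
    static List<String[]> readLoop(Document doc, String loopElement) {
        List<String[]> rows = new ArrayList<>();
        NodeList items = doc.getDocumentElement().getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList nested = item.getElementsByTagName(loopElement); // any depth below <item>
            for (int j = 0; j < nested.getLength(); j++) {
                Element node = (Element) nested.item(j);
                rows.add(new String[] {item.getAttribute("id"), node.getAttribute("id"), node.getTextContent().trim()});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_XML);
        for (String loop : new String[] {"batter", "topping"}) {
            System.out.println("-- " + loop + " output --");
            for (String[] row : readLoop(doc, loop)) {
                System.out.println(String.join("|", row));
            }
        }
    }
}
```

Each output gets its own loop, so an item with two batters and two toppings yields two rows in each flow rather than a 2 x 2 cross product.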

Your final settings should look like the image below.

Configuration tXMLMap output with loop path

We are now ready to get the result. Add a tLogRow component for each output and run the job; it shows the results in the image below.

Nested XML loop parse Output