
How to get latest file from directory

In this post I will describe how to get the most recent file from a directory based on a display date, that is, a date taken from the file name.

We have the sample files below in our directory. Every file has a date in its name, and we will use that date, rather than the file's created or modified date, to decide which file is the most recent.

Sample xls files


As you can see, we have three files:

  1. sales element 11 2014.xls was last modified on 28-01-2015
  2. sales element 02 2015.xls was last modified on 28-01-2015
  3. sales element 12 2014.xls was last modified on 03-03-2015

If we used the created or modified date to find the most recent file, we would get “sales element 12 2014.xls”, which is the wrong file: going by the dates in the names, the latest is “sales element 02 2015.xls”.

To get the latest file from the directory, follow the steps below.

Step 1: Add a tFileList component and configure it to get all .xls files from the directory. See the image for details.

tFileList Configuration

tFileList Settings
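To see what this step does outside Talend, here is a minimal plain-Java sketch of the same idea: list the regular files directly inside one directory whose names match a “*.xls” glob. It uses only java.nio.file classes; the class and method names (XlsFileLister, matchesMask, listFiles) are hypothetical, and this is not how tFileList is implemented internally.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class XlsFileLister {

    // True if a bare file name matches a glob mask such as "*.xls".
    static boolean matchesMask(String fileName, String globMask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globMask);
        return matcher.matches(Paths.get(fileName));
    }

    // Lists regular files directly inside dir whose names match the glob (no recursion).
    static List<Path> listFiles(Path dir, String globMask) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, globMask)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical directory: pass your own path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listFiles(dir, "*.xls")) {
            System.out.println(p.toAbsolutePath());
        }
    }
}
```

Note that glob case sensitivity follows the platform's file system conventions in this sketch.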

Step 2: Add a tFileProperties component, connect it to tFileList with an Iterate link, and set its file path from the global variable populated by tFileList: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).

tFileProperties Setting

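tFileProperties outputs properties of the current file as columns, such as basename and mtime; the tMap in the next step reads the name as row9.basename (row9 being the link name in the author's job). The sketch below, using plain java.nio.file calls and hypothetical names (FileProps, basename, modifiedTime), shows the two values that matter here: only the base name carries the month we want, while the modification time is what misleads in the example above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

class FileProps {

    // File name without its directory, comparable to the basename column.
    static String basename(String filePath) {
        Path fileName = Paths.get(filePath).getFileName();
        return fileName == null ? "" : fileName.toString();
    }

    // Last-modified time from the file system, comparable in meaning to the mtime column.
    static FileTime modifiedTime(String filePath) throws IOException {
        return Files.getLastModifiedTime(Paths.get(filePath));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path standing in for tFileList_1_CURRENT_FILEPATH.
        String current = args.length > 0 ? args[0] : "sales element 02 2015.xls";
        System.out.println("basename: " + basename(current));
        if (Files.exists(Paths.get(current))) {
            System.out.println("modified: " + modifiedTime(current));
        }
    }
}
```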

Step 3: Add a tMap after tFileProperties, connect them with a Main link, and configure the tMap as follows.

  • Create an output named “FileList”.
  • Add all the source columns to this output.
  • Add a new variable in the tMap variables table (it is referenced below as Var.var1) and give it this expression:

row9.basename.substring(row9.basename.indexOf(".")-7).replace(".xls", "")

  • Create a new column in the output named “DisplayDate” with the Date data type.
  • Set its expression to:

TalendDate.parseDate("MM yyyy", Var.var1)

  • See the image for more details.
tMap Setting

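The two tMap expressions above are ordinary Java, so they can be tried outside Talend. The sketch below reuses the substring expression unchanged and, as an assumption for illustration, replaces TalendDate.parseDate with java.text.SimpleDateFormat using the same “MM yyyy” pattern; the class name DisplayDate is hypothetical. The expression relies on the date being the seven characters just before the first “.” in the file name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DisplayDate {

    // Same logic as the tMap variable: start 7 characters before the first "."
    // (e.g. "02 2015.xls"), then drop the ".xls" extension.
    // Assumes the name contains a "." preceded by at least 7 characters.
    static String extractMonthYear(String basename) {
        return basename.substring(basename.indexOf(".") - 7).replace(".xls", "");
    }

    // Plain-JDK stand-in for TalendDate.parseDate("MM yyyy", ...).
    static Date parseMonthYear(String monthYear) {
        SimpleDateFormat fmt = new SimpleDateFormat("MM yyyy");
        fmt.setLenient(false);
        try {
            return fmt.parse(monthYear);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an 'MM yyyy' date: " + monthYear, e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "sales element 11 2014.xls",
            "sales element 02 2015.xls",
            "sales element 12 2014.xls"
        };
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM");
        for (String name : names) {
            String monthYear = extractMonthYear(name);
            System.out.println(name + " -> " + monthYear + " -> " + out.format(parseMonthYear(monthYear)));
        }
    }
}
```

For the three sample files this yields “11 2014”, “02 2015” and “12 2014”, so February 2015 is the greatest date even though that file is not the most recently modified one.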

Step 4: Add a tHashOutput component after tMap and connect it with a Main link.

Step 5: Add a tHashInput component below tFileList and connect the two subjobs with an “OnSubjobOk” trigger.

Step 6: Copy the schema from tHashOutput to tHashInput.

Step 7: Add a tAggregateRow component, connect it to tHashInput with a Main link, and configure the basic settings as shown below.

tAggregateRow Setting

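The exact tAggregateRow settings are only visible in the screenshot, so the sketch below simply assumes the goal is to keep the row with the greatest DisplayDate (for example through a max operation on that column). It models each FileList row as a small hypothetical FileRow class and picks the maximum with Stream.max; LatestFile, row and latest are illustrative names, not Talend APIs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Optional;

class LatestFile {

    // Hypothetical stand-in for one row of the tMap "FileList" output.
    static final class FileRow {
        final String basename;
        final Date displayDate;

        FileRow(String basename, Date displayDate) {
            this.basename = basename;
            this.displayDate = displayDate;
        }
    }

    // Builds a row, parsing "MM yyyy" text the same way as the DisplayDate column.
    static FileRow row(String basename, String monthYear) {
        SimpleDateFormat fmt = new SimpleDateFormat("MM yyyy");
        fmt.setLenient(false);
        try {
            return new FileRow(basename, fmt.parse(monthYear));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an 'MM yyyy' date: " + monthYear, e);
        }
    }

    // Keeps the single row with the greatest DisplayDate (empty if there are no rows).
    static Optional<FileRow> latest(List<FileRow> rows) {
        return rows.stream().max(Comparator.comparing((FileRow r) -> r.displayDate));
    }

    public static void main(String[] args) {
        List<FileRow> rows = Arrays.asList(
            row("sales element 11 2014.xls", "11 2014"),
            row("sales element 02 2015.xls", "02 2015"),
            row("sales element 12 2014.xls", "12 2014"));
        latest(rows).ifPresent(r -> System.out.println("Latest file: " + r.basename));
    }
}
```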

Step 8: Add a tLogRow to check the result. You should see output like the following.

Output Result


Step 9: Your job design should look like the image below.

Final Job Design


Note: You can skip the tHash* components and place tAggregateRow directly after tMap with the same settings; it will work as well.


tFileList Exclude Mask

In this post I will describe how to exclude files using the tFileList component.

Below are the sample files stored in our folder.

Sample Files


From the file list above, we want to read only the files whose names start with “Orders_” and end with “.csv”, so we use a tFileList file mask to build the file list.

Add a tFileList component and configure it as follows.

tFileList1 Configuration

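A glob mask like “Orders_*.csv” keeps names that start with “Orders_” and end with “.csv”. The sketch below reproduces that filter with the JDK's PathMatcher; the file names are hypothetical (the post's actual sample files appear only in a screenshot), and OrdersMask.include is an illustrative helper, not part of Talend.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrdersMask {

    // Keeps only the names matching a glob file mask such as "Orders_*.csv".
    static List<String> include(List<String> fileNames, String globMask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globMask);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical file names for illustration.
        List<String> names = Arrays.asList(
            "Orders_US.csv", "Orders_USA.csv", "Orders_UK.csv", "Orders_UK.txt", "Customers.csv");
        System.out.println(include(names, "Orders_*.csv"));
    }
}
```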

This returns all matching files from the specified location, but we want to exclude the two files whose names contain “US” or “USA”, so let’s use the Advanced settings of tFileList and configure them as follows.

tFileList configuration


Here I have used a regular expression to exclude the files: "Orders_US.*". After running the job I get only the one file I wanted to process. Here is the output.

tFileList Exclude Mask Output

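Why does one pattern remove both the “US” and the “USA” files? In a regular expression, “.” matches any character and “.*” any run of characters, so "Orders_US.*" matches every name that starts with “Orders_US”, including “Orders_USA.csv”. The sketch below shows this with java.util.regex; it assumes a full-name match, and whether tFileList anchors the pattern the same way internally is an assumption. The file names and the ExcludeMask class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ExcludeMask {

    // Drops every name that fully matches the regular expression.
    static List<String> exclude(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical names: "Orders_US.*" removes both the US and USA files.
        List<String> names = Arrays.asList("Orders_US.csv", "Orders_USA.csv", "Orders_UK.csv");
        System.out.println(exclude(names, "Orders_US.*"));
    }
}
```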

If you want to exclude several patterns, separate them with commas, like this:

"(Orders_US.*),(Orders_UAE.*)"
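A comma-separated exclude mask can be read as a list of independent regular expressions: a file is dropped if any one of them matches. The sketch below models that behaviour under two assumptions: that commas only separate patterns (so a single pattern cannot itself contain a comma, for example a {2,3} quantifier), and that each pattern must match the whole name. MultiExcludeMask and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiExcludeMask {

    // Splits a mask like "(Orders_US.*),(Orders_UAE.*)" into separate regexes.
    // Assumption: commas only separate patterns.
    static List<Pattern> parseMasks(String commaSeparated) {
        List<Pattern> patterns = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // Keeps a name only if none of the patterns fully matches it.
    static List<String> exclude(List<String> fileNames, String commaSeparated) {
        List<Pattern> patterns = parseMasks(commaSeparated);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            boolean excluded = false;
            for (Pattern p : patterns) {
                if (p.matcher(name).matches()) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical file names for illustration.
        List<String> names = Arrays.asList("Orders_US.csv", "Orders_USA.csv", "Orders_UAE.csv", "Orders_UK.csv");
        System.out.println(exclude(names, "(Orders_US.*),(Orders_UAE.*)"));
    }
}
```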