Difference between tJava, tJavaRow and tJavaFlex

Below are some of the common differences between the tJava, tJavaRow and tJavaFlex components.

 

Operation | tJava | tJavaRow | tJavaFlex
Used to integrate your custom Java code | Yes | Yes | Yes
Executed first, but only once in the subjob | Yes | No |
Requires an input flow | No | Yes | No
Requires an output flow | No | If an output schema is defined | If an output schema is defined
Can be used at the start of the job | Yes | No | Yes
Can be used as a separate subjob | Yes | No | Yes
Accepts a Main flow or an Iterate flow | Both | Main only | Both
Has three Java code parts (start, main, end) | No | No | Yes
Auto-propagates data | No | No | Yes

 

tJava: this component can be used with trigger connections, at the start of a job or at the end of a job.

tJavaRow: this component requires a main input flow, so it can be used at the end of a subjob but not at its start.

tJavaFlex: this component combines the capabilities of tJava and tJavaRow. You can use it for row generation, at the start of a job, at the end of a subjob, or as an individual subjob. It also lets you auto-propagate data.
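To make the start/main/end distinction concrete, here is a small plain-Java model of the order in which a tJavaFlex's three code parts run. It is only a sketch under my own assumptions: the class and method names are hypothetical, and Talend generates its own, different code for the component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of tJavaFlex execution order; not Talend-generated code.
class FlexLifecycleSketch {

    static List<String> run(List<String> rows) {
        List<String> log = new ArrayList<>();

        // Start code: runs once, before the first row (comparable to tJava).
        log.add("start");
        int count = 0;

        for (String row : rows) {
            // Main code: runs once per incoming row (comparable to tJavaRow).
            count++;
            log.add("main:" + row);
        }

        // End code: runs once, after the last row.
        log.add("end:" + count);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b")));
    }
}
```

With two rows the start and end parts each run once, while the main part runs twice.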

Validate CSV Headers in Talend

File processing is a day-to-day task in the ETL world, and source files often need validation of their format, headers, footers, column names, data types and so on. The tSchemaComplianceCheck component can handle most of these checks, such as:

  • Length checking.
  • Date pattern/format.
  • Data types.

However, it does not check the number of columns or their sequence, so we have to handle that ourselves. In this post I will describe how to validate column names and their sequence.
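The job below compares the whole header row with a reference row. If you also want to know which column is wrong, the same check can be written position by position in plain Java. This is a sketch, not part of the job: the class name, the validate method and the ";" delimiter in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: compares a CSV header line with the expected header,
// checking the number of columns and the name at each position.
class HeaderCheckSketch {

    static List<String> validate(String headerLine, String expectedHeader, String delimiter) {
        // Pattern.quote so delimiters like "|" are not treated as regex syntax;
        // limit -1 keeps trailing empty column names.
        String[] actual = headerLine.split(Pattern.quote(delimiter), -1);
        String[] expected = expectedHeader.split(Pattern.quote(delimiter), -1);
        List<String> problems = new ArrayList<>();

        // Column count check.
        if (actual.length != expected.length) {
            problems.add("expected " + expected.length + " columns but found " + actual.length);
        }

        // Column name and sequence check, position by position.
        int n = Math.min(actual.length, expected.length);
        for (int i = 0; i < n; i++) {
            String want = expected[i].trim();
            String got = actual[i].trim();
            if (!got.equals(want)) {
                problems.add("column " + (i + 1) + ": expected '" + want + "' but found '" + got + "'");
            }
        }
        return problems; // an empty list means the header is valid
    }

    public static void main(String[] args) {
        System.out.println(validate("id;name;city", "id;name;city", ";"));
        System.out.println(validate("id;city;name", "id;name;city", ";"));
    }
}
```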

This is our final job design.

Complete CSV Validation Job

Let’s start by adding the first component to the job designer. Add a tFileList and configure it to get the expected files.

Add a tFileInputFullRow component and configure it as shown in the screenshot below.

tFileInputFullRow Configuration

  • Add a tMap and connect it with a main link from the tFileInputFullRow component.
  • Add a tFixedFlowInput, connect it to the tMap with a lookup link, then configure it as follows.

Note: if your reference header row is stored in a file or a database, you can use that instead of tFixedFlowInput.

tFixedFlowInput Configuration

  • Configure tMap as follows.
    • Make an inner join between your reference line and the main input line.
    • Add two outputs and name them “matching” and “reject” respectively.
    • In the reject output settings, set “Catch lookup inner join reject” to true.
    • Add the source line to both flows.

See the image for more details.

tMap Setting

  • Add a tJava next to tMap and connect it with the “matching” flow.
  • Add another tJava next to tMap and connect it with the “reject” flow.
  • Add a tFileInputDelimited and connect it with the first tJava using an “Iterate” flow.
  • Configure tFileInputDelimited as shown in the image below.

Add a tLogRow component to see the output from the file.

tFileInputDelimited Configuration

The whole subjob is executed once for each file; if the file's header matches the reference header row, the file is then read.

You can connect the reject flow to make a note of the rejected files, depending on your requirements.
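Outside Talend, the per-file routing the job performs (iterate over files, compare the first line with the reference header, send the file to “matching” or “reject”) looks roughly like this plain-Java sketch. The class and method names are hypothetical, UTF-8 encoding is an assumption, and unreadable files are treated as rejects by my own choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java version of the tFileList + tMap header check.
class HeaderRoutingSketch {

    static String classify(String firstLine, String expectedHeader) {
        // A null first line (empty or unreadable file) never matches.
        return expectedHeader.equals(firstLine) ? "matching" : "reject";
    }

    static Map<String, List<Path>> route(List<Path> files, String expectedHeader) {
        Map<String, List<Path>> result = new LinkedHashMap<>();
        result.put("matching", new ArrayList<>());
        result.put("reject", new ArrayList<>());
        for (Path file : files) {
            String firstLine;
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                firstLine = reader.readLine(); // header row, or null for an empty file
            } catch (IOException e) {
                firstLine = null; // missing or unreadable files are rejected
            }
            result.get(classify(firstLine, expectedHeader)).add(file);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Path.of(arg));
        }
        System.out.println(route(files, "id;name;city"));
    }
}
```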

Create File Name with Date and TimeStamp

In this post I will describe how to name a file with a timestamp in Talend. The file name format depends on your business requirements; for example, you may need to name a file with a timestamp like “Order_yyyyMMdd_HHmmss.dat”. Because the timestamp goes down to the second, you will not easily find the same file again when you want to read it or use it for another purpose, so keep the file name in a variable to access it later.

In our scenario we will create a file with the above name format, read the same file and display the result, and then copy that file into a folder named after the current day.

This is our final job design.

 

File Name With Time Stamp

  • Create a context variable context.FileName of type String to hold the file name.
  • Add a tJava and write the code below in it.

context.FileName=TalendDate.getDate("yyyyMMdd_HHmmss");
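The difference between hh (12-hour clock) and HH (24-hour clock) is easy to miss in timestamp patterns. Assuming TalendDate patterns use the same letters as java.text.SimpleDateFormat, this plain-Java sketch formats one fixed instant, 29 Jan 2015 at 13:43:10, both ways; the class and method names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo: the same instant formatted with HH and with hh.
class TimestampPatternSketch {

    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ENGLISH).format(date);
    }

    static Date sampleDate() {
        // 29 Jan 2015, 13:43:10 in the JVM's default time zone (months are 0-based).
        return new GregorianCalendar(2015, Calendar.JANUARY, 29, 13, 43, 10).getTime();
    }

    public static void main(String[] args) {
        Date d = sampleDate();
        System.out.println(format("yyyyMMdd_HHmmss", d)); // 24-hour clock
        System.out.println(format("yyyyMMdd_hhmmss", d)); // 12-hour clock, loses PM
    }
}
```

The HH version yields 20150129_134310, while hh yields 20150129_014310, which would collide with a file created at 01:43:10 the same night.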

  • Now add a tRowGenerator to generate dummy data (you can use your own source, e.g. a database, a file or anything else) and link it with the tJava using “OnSubjobOk”.
  • Add a tFileOutputDelimited component, connect it with tRowGenerator using a main flow, then configure it as shown in the image below.
Add File Name for Time Stamp

As you can see, we have assigned the file name from the context variable. In the same way, you can build a dynamic file name directly, like "D:/Orders_"+TalendDate.getDate("yyyyMMdd_HHmmss")+".csv".

Our source file was created with the expected name, “Orders_20150129_134310.csv”. Now we want to read the same file, so follow the steps below.

  • Add a tFileInputDelimited and connect it with tRowGenerator using an “OnSubjobOk” link.
  • Configure the tFileInputDelimited component as shown in the image below.
tFileInputDelimited configuration & setting

Notice that we build the file name the same way we did when creating the file. We don't know how long the file creation will take; if it takes more than one second, a newly computed timestamp would no longer match the name of the file created earlier. To avoid that, we store the file name in the context.FileName variable.
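The idea of computing the name once and reusing it can be sketched in plain Java: build the name from a single instant, then pass that same string to every step. This is only a sketch; orderFileName and the "D:/" base directory are hypothetical, and java.text.SimpleDateFormat stands in for TalendDate.getDate.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper mirroring context.FileName: one name, computed once.
class FileNameSketch {

    // Builds "<dir>Orders_yyyyMMdd_HHmmss.csv" from a single instant.
    static String orderFileName(String dir, Date when) {
        String stamp = new SimpleDateFormat("yyyyMMdd_HHmmss", Locale.ENGLISH).format(when);
        return dir + "Orders_" + stamp + ".csv";
    }

    public static void main(String[] args) {
        // Compute the name once, as the tJava does with context.FileName...
        String fileName = orderFileName("D:/", new Date());
        // ...and reuse the stored value for writing, reading and copying,
        // instead of formatting the clock again a few seconds later.
        System.out.println("write to:  " + fileName);
        System.out.println("read from: " + fileName);
    }
}
```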

Now that you have read the file, you want to copy it to another location, into a folder named with the date on which it was created. To do so:

  • Add a tFileCopy component and link it with tFileInputDelimited using an “OnSubjobOk” link.
  • Configure the tFileCopy component as shown in the image.
tFileCopy Configuration

Notice that we use the same file name as in the components above. The only change here is that the destination directory gets a dynamic name, built with "D:/"+TalendDate.getDate("yyyyMMdd"). This component can create the directory if it does not exist yet.
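For readers who want the same "copy into a dated folder" step in plain Java, here is a sketch using java.nio. It is not what tFileCopy does internally; the class and method names are hypothetical, and overwriting an existing target (REPLACE_EXISTING) is my own choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical plain-Java equivalent of copying a file into baseDir/yyyyMMdd.
class DatedCopySketch {

    // Folder name in yyyyMMdd form, like TalendDate.getDate("yyyyMMdd").
    static String folderName(LocalDate day) {
        return day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    static Path copyToDatedFolder(Path source, Path baseDir, LocalDate day) throws IOException {
        Path targetDir = baseDir.resolve(folderName(day));
        Files.createDirectories(targetDir); // does nothing if the folder already exists
        return Files.copy(source, targetDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("orders");
        Path source = Files.createTempFile(base, "Orders_", ".csv");
        System.out.println(copyToDatedFolder(source, base, LocalDate.now()));
    }
}
```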

We have copied the entire file from the source location to the dynamically created folder. In the same way, you can use a dynamic file or directory name as your business needs require.

Loop through start date to end date

Loop Start Date through End Date using tLoop

In this post I will describe how to loop from a start date to an end date. For that we will use the tLoop component, which offers two loop types: a “For” loop and a “While” loop.

Write the code below in a tJava.

java.util.Date start_date=TalendDate.parseDate("yyyy-MM-dd", "2015-01-01");
java.util.Date end_date=TalendDate.parseDate("yyyy-MM-dd", "2015-01-10");
long l=TalendDate.diffDate(end_date, start_date);
context.Days=l;

The code looks as follows.

loop through start and end date

In the lines above, we parse two dates: the first is start_date and the second is end_date.

Then we calculate the number of days between them with the TalendDate.diffDate() method. It returns the number of days as a long, which is stored in the variable “l” and then assigned to the context variable “context.Days” (so context.Days should be defined as a Long).
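For comparison, the same day count can be computed with java.time outside Talend. This sketch swaps in ChronoUnit.DAYS.between instead of TalendDate.diffDate, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical java.time equivalent of the day-difference step.
class DayDiffSketch {

    static long daysBetween(String start, String end) {
        // LocalDate.parse expects ISO yyyy-MM-dd, the same pattern used in the tJava.
        return ChronoUnit.DAYS.between(LocalDate.parse(start), LocalDate.parse(end));
    }

    public static void main(String[] args) {
        System.out.println(daysBetween("2015-01-01", "2015-01-10"));
    }
}
```

For 2015-01-01 to 2015-01-10 this gives 9: the number of steps between the two dates, not the number of dates in the range.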

Drop a tLoop component next to the tJava, link it with an “OnSubjobOk” trigger, then configure tLoop as follows.

Loop through start and end date

Add a tJava component next to tLoop and link it with an “Iterate” flow. tLoop has two global variables that can be used in the flow for calculation or manipulation.

Here are those variables.

CURRENT_VALUE

((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))

CURRENT_ITERATION

((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION"))

We will use CURRENT_VALUE to get each day from the start date through the end date. To print each day on the console, we will use the addDate method from the TalendDate routine. In the code below, we add the current value from the flow to the start date, which moves the date forward by one day on each iteration.

System.out.println(TalendDate.addDate(TalendDate.parseDate("yyyy-MM-dd", "2015-01-01"), ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")), "dd"));
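Assuming tLoop is configured as a For loop from 0 to context.Days with step 1 (the configuration screenshot is not reproduced here, so this is my assumption), the iteration is equivalent to the plain-Java loop below. LocalDate.plusDays replaces TalendDate.addDate, and the names are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical equivalent of tLoop (From 0 To days, Step 1) + addDate.
class DateLoopSketch {

    static List<LocalDate> days(LocalDate start, long days) {
        List<LocalDate> result = new ArrayList<>();
        // i plays the role of tLoop_1_CURRENT_VALUE; the upper bound is inclusive.
        for (long i = 0; i <= days; i++) {
            result.add(start.plusDays(i));
        }
        return result;
    }

    public static void main(String[] args) {
        days(LocalDate.of(2015, 1, 1), 9).forEach(System.out::println);
    }
}
```

With a day difference of 9 the loop prints ten dates, 2015-01-01 through 2015-01-10.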

After running the job, you will see the output below on the console.

Complete job design with output

Convert String To Date

In this post I will describe how to convert a string to a date in Talend, using various string dates to demonstrate.

  • Converting a simple string with a consistent format: “MM/dd/yyyy hh:mm”

12/21/2000 0:00

We will convert the date above with the “MM/dd/yyyy hh:mm” format. For that we will use the built-in function below from the TalendDate routine.

TalendDate.parseDate("MM/dd/yyyy hh:mm", "12/21/2000 0:00")

The function above returns a Date object. If you print it, you get the output below (the time zone shown, here IST, is the JVM's default time zone):

Thu Dec 21 00:00:00 IST 2000

If you want this date in another format, use the function below.

TalendDate.formatDate("dd-MMM-yyyy", TalendDate.parseDate("MM/dd/yyyy hh:mm", "12/21/2000 0:00"))

TalendDate.formatDate(pattern, date) returns the date as a String, here “21-Dec-2000”.

The same method also parses the strings below, whose month and day do not always have two digits:

12/21/2000 0:00
5/11/2007 0:00
5/1/2009 0:00
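Here is the same parse-then-format round trip in plain Java, applied to the three sample strings. It is a sketch with java.text.SimpleDateFormat standing in for TalendDate.parseDate and TalendDate.formatDate; the names are hypothetical, and I use H (hour of day, 0-23) because the sample hours are written as 0:00.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper: parses "M/d/yyyy H:mm" strings and reformats them as dd-MMM-yyyy.
class ParseFormatSketch {

    static String reformat(String input) {
        // One-digit months and days such as "5/1/2009" still parse with MM/dd,
        // because each "/" separator ends the numeric field before it.
        SimpleDateFormat in = new SimpleDateFormat("MM/dd/yyyy H:mm", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        try {
            return out.format(in.parse(input));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12/21/2000 0:00", "5/11/2007 0:00", "5/1/2009 0:00"}) {
            System.out.println(s + " -> " + reformat(s));
        }
    }
}
```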

  • Converting strings with heterogeneous formats to dates. Sample string dates are as follows.

2014/12/21
20140214
2014/12/13
2014/12/23
20141201

We will write a little Java code to remove the “/” characters: the code below replaces “/” with an empty string “”, and then the parseDate function converts the result using the given format.

TalendDate.parseDate("yyyyMMdd", InputString.replaceAll("/", ""))
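The same normalize-then-parse idea in plain Java, as a sketch: java.text.SimpleDateFormat replaces TalendDate.parseDate, the names are hypothetical, and switching lenient parsing off is my own addition so that an impossible value such as month 13 raises an error instead of rolling over.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: accepts both yyyy/MM/dd and yyyyMMdd input.
class NormalizeDateSketch {

    static Date parseMixed(String input) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd", Locale.ENGLISH);
        fmt.setLenient(false); // reject impossible values such as 20141301
        try {
            // Strip "/" so "2014/12/21" and "20141221" share one pattern.
            return fmt.parse(input.replaceAll("/", ""));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    static String toIso(String input) {
        return new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH).format(parseMixed(input));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2014/12/21", "20140214", "2014/12/13", "2014/12/23", "20141201"}) {
            System.out.println(s + " -> " + toIso(s));
        }
    }
}
```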

Converting dates with timestamps.

  • Input String: “2014-11-14T10:41:34-08:00”
  • Format: “yyyy-MM-dd'T'HH:mm:ssXXX”

TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ssXXX", "2014-11-14T10:41:34-08:00")

  • Input String: “2013-09-03T21:54:32.027+02:00”
  • Format: “yyyy-MM-dd'T'HH:mm:ss.SSSXXX” (XXX reads the whole “+02:00” offset, so offsets such as “+05:30” work as well)

TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "2013-09-03T21:54:32.027+02:00")
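To check what instant such strings represent, this plain-Java sketch parses them with java.text.SimpleDateFormat (standing in for TalendDate.parseDate) and prints the result in UTC. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper: parses a timestamp carrying its own UTC offset
// and returns the instant as an ISO-8601 UTC string.
class OffsetParseSketch {

    static String toUtc(String pattern, String input) {
        try {
            // The offset in the input fixes the instant, so the JVM's
            // default time zone does not change the result.
            return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(input).toInstant().toString();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUtc("yyyy-MM-dd'T'HH:mm:ssXXX", "2014-11-14T10:41:34-08:00"));
        System.out.println(toUtc("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "2013-09-03T21:54:32.027+02:00"));
    }
}
```

The two inputs correspond to 2014-11-14T18:41:34Z and 2013-09-03T19:54:32.027Z.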

  • Input String: “Tue May 08 00:00:00 CEST 2012”
  • Format: “EEE MMM dd HH:mm:ss zzz yyyy”

TalendDate.parseDateLocale("EEE MMM dd HH:mm:ss zzz yyyy", "Tue May 08 00:00:00 CEST 2012", "EN")

  • Input String: “30 Aug 2011 07:06:00”
  • Format: “dd MMM yyyy HH:mm:ss”

TalendDate.parseDateLocale("dd MMM yyyy HH:mm:ss", "30 Aug 2011 07:06:00", "EN")
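The locale argument matters because month and day names such as “Aug” or “Tue” depend on the language. In plain Java, the counterpart is passing Locale.ENGLISH to SimpleDateFormat; this sketch assumes that is roughly what the "EN" argument does in TalendDate.parseDateLocale, which I have not verified, and the names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: parses English month names whatever the default locale is.
class LocaleParseSketch {

    static Date parseEnglish(String pattern, String input) {
        try {
            return new SimpleDateFormat(pattern, Locale.ENGLISH).parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    // Re-formats with numeric fields only, in the same default time zone.
    static String toIsoLocal(String pattern, String input) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH)
                .format(parseEnglish(pattern, input));
    }

    public static void main(String[] args) {
        System.out.println(toIsoLocal("dd MMM yyyy HH:mm:ss", "30 Aug 2011 07:06:00"));
    }
}
```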

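  • Input String: “24/02/2015 23:15:37.250000000”
  • Format: “dd/MM/yyyy HH:mm:ss.SSS”, after cutting the string down to millisecond precision (its first 23 characters). S means milliseconds, so “SSSS” would read all nine fraction digits as 250,000,000 milliseconds, which either fails or, with lenient parsing, shifts the time by almost three days.

System.out.println(TalendDate.parseDate("dd/MM/yyyy HH:mm:ss.SSS", "24/02/2015 23:15:37.250000000".substring(0, 23)));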
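If you need the full nine-digit fraction rather than truncating it, java.time can keep it, since java.util.Date only holds milliseconds. In DateTimeFormatter, S means fraction-of-second, so nine S letters read nanoseconds. This sketch works outside TalendDate, and the names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: keeps nanosecond precision using java.time.
class NanoParseSketch {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss.SSSSSSSSS");

    static LocalDateTime parse(String input) {
        return LocalDateTime.parse(input, FMT);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("24/02/2015 23:15:37.250000000");
        System.out.println(t + " (nanos: " + t.getNano() + ")");
    }
}
```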

Parse DateTime-string with AM/PM marker

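  • Input String: “12/20/2012 10:02 PM”
  • Format String: “MM/dd/yyyy hh:mm a”. Use hh (hour 1-12) together with the a marker; with HH (hour 0-23) the PM marker is typically ignored and the time is read as 10:02 in the morning. The AM/PM text is locale-dependent, so on a non-English default locale TalendDate.parseDateLocale with "EN" may be needed.

System.out.println(TalendDate.parseDate("MM/dd/yyyy hh:mm a", "12/20/2012 10:02 PM"));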
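A plain-Java check of the AM/PM handling, using java.text.SimpleDateFormat in place of TalendDate.parseDate: the helper re-formats the parsed value on a 24-hour clock so the effect of the marker is visible. The names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper: parses with the given pattern, prints on a 24-hour clock.
class AmPmParseSketch {

    static String to24h(String pattern, String input) {
        try {
            // Locale.ENGLISH so the "AM"/"PM" markers are recognised.
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ENGLISH);
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ENGLISH);
            return out.format(in.parse(input));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    public static void main(String[] args) {
        // hh (1-12) combined with a: 10:02 PM becomes 22:02.
        System.out.println(to24h("MM/dd/yyyy hh:mm a", "12/20/2012 10:02 PM"));
        // HH (0-23) is used as-is here, so the PM marker has no effect.
        System.out.println(to24h("MM/dd/yyyy HH:mm a", "12/20/2012 10:02 PM"));
    }
}
```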

If you have another format that is not listed here, please send it to us and we will include it in the list.

Keep visiting this page for newer formats.