How to add third-party libraries to a Talend project?

If you want to add or load third-party libraries in a Talend project, you can choose any of the solutions below.

  • Window -> Preferences -> Java -> User Libraries. This makes the jar files available to all jobs in the project.
  • Use the tLibraryLoad component to load a library file in a job.
  • Use the “Edit Routine Libraries” option of a routine:
    • Right-click the routine.
    • Select the “Edit Routine Libraries” option.
    • In the popup window, click the “New” button.
    • Select the “Browse a Library file” option.
    • Browse to and select the required library.
    • Tick “if the library file is required” where applicable, then click “OK”.
  • Another way is to use the “Module” tab to download and install the library.
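
Whichever route you choose, it can be useful to confirm from a tJava component that a class from the library really resolves at run time. The following is a minimal sketch in plain Java, not a Talend API: the class `LibraryCheck`, the method `isClassAvailable` and the made-up class name in the second call are all hypothetical.

```java
// Hypothetical helper (not part of Talend): reports whether a class can be
// loaded, which is a quick way to confirm that a jar is on the classpath.
class LibraryCheck {

    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The jar that provides the class is missing or incomplete.
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList ships with the JDK, so it is always found.
        System.out.println(isClassAvailable("java.util.ArrayList"));
        // A made-up class name that no jar provides.
        System.out.println(isClassAvailable("com.example.missing.NoSuchClass"));
    }
}
```

In a job you would pass the fully qualified name of a class that your own library provides.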

Set Java Job Upper Heap Memory Limit

Talend provides many features for improving the performance of job execution. In this post I will describe how to increase the Java heap space.

You can allocate heap space based on the memory available on your machine.

  1. Using the Run tab.
    1. Go to the Talend Run tab and select “Advanced settings”.
    2. Select the “Use specific JVM arguments” check box; this enables the argument table.
    3. Click the “New” button and add the first parameter, “-Xms256M”, which sets the initial (minimum) heap size.
    4. Do the same to add the second parameter, “-Xmx1024M”, which sets the maximum heap size.
    5. See the image for more details.
Configuration For heapSpace

If you want to modify an existing memory setting, double-click the argument row; a popup opens in which you can edit the value.

  2. Using the Preferences menu.
    1. Go to the Window menu and select the “Preferences” option.
    2. A dialog box opens with various configuration options.
    3. Click the “Talend” node to expand it and show further options.
    4. Inside the “Talend” node, click “Run/Debug”. It shows various options, but we only add the JVM arguments, as we did in the first method above.

See the image for more details.

Configuration For heapSpace

These are the two ways of assigning JVM arguments for the Java heap space. If you want to supply JVM arguments at run time, you have to modify the job's .bat or .sh file, depending on the environment it runs in.

After opening the .bat or .sh file you will see a command like “java -Xms256M -Xmx1024M”. Modify the values and save the file; the job will then use the modified arguments instead of the defaults set in the steps above.
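
To check that a new limit has taken effect, you can print the maximum heap that the JVM reports, for example from a tJava component at the start of the job. This is a small sketch using the standard `Runtime` API; the class and method names are hypothetical, and the reported figure is usually close to, but not exactly, the -Xmx value.

```java
// Hypothetical helper: reads the maximum heap size the running JVM will use.
class HeapInfo {

    static long maxHeapMegabytes() {
        // maxMemory() returns the limit in bytes; convert it to megabytes.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

Running it with different -Xmx values should show the printed figure change accordingly.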

tFileList Exclude Mask

In this post I will describe how to exclude files using the tFileList component.

Below are our sample files, which are stored in a folder.

Sample Files

From the file list above we want to read only the files whose names start with “Orders_” and end with “.csv”, so we use a tFileList file mask to get the file list.

Add a tFileList component and configure it as follows.

tFileList1 Configuration

Now you will get all the matching files from the given location, but we want to exclude the two files whose names contain “US” or “USA”, so let’s use the Advanced settings of tFileList and configure them as follows.

tFileList configuration

Here you can see that I have used the regular expression “Orders_US.*” to exclude files. After running the job I get only the one file that I wanted to process. Here is the output.

tFileList Exclude Mask Output

If you want to exclude multiple types of file, separate the patterns with commas, as below.

"(Orders_US.*),(Orders_UAE.*)"
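
The effect of an include mask combined with comma-separated exclude patterns can be tried out in plain Java. This is only a sketch with `java.util.regex`, not the tFileList implementation: the class name, the file names and the include expression "Orders_.*\\.csv" are assumptions made for illustration, and the masks are split naively on commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics an include mask plus comma-separated
// exclude masks with java.util.regex; it is not the tFileList implementation.
class FileMaskSketch {

    static List<String> select(List<String> names, String includeRegex, String excludeMasks) {
        Pattern include = Pattern.compile(includeRegex);
        // Assumption: masks are separated by plain commas, so a mask must not
        // contain a comma itself.
        List<Pattern> excludes = new ArrayList<>();
        for (String mask : excludeMasks.split(",")) {
            excludes.add(Pattern.compile(mask.trim()));
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            boolean excluded = false;
            for (Pattern exclude : excludes) {
                if (exclude.matcher(name).matches()) {
                    excluded = true;
                    break;
                }
            }
            // Keep a file only if it matches the include mask and no exclude mask.
            if (include.matcher(name).matches() && !excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for the sample folder.
        List<String> names = Arrays.asList(
                "Orders_IN.csv", "Orders_US.csv", "Orders_USA.csv", "Orders_UAE.csv", "Customers.txt");
        System.out.println(select(names, "Orders_.*\\.csv", "(Orders_US.*)"));
        System.out.println(select(names, "Orders_.*\\.csv", "(Orders_US.*),(Orders_UAE.*)"));
    }
}
```

With these made-up names, the first call keeps Orders_IN.csv and Orders_UAE.csv, and the second keeps only Orders_IN.csv.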

Validate CSV headers in Talend.

File processing is a day-to-day task in the ETL world, and there is a great need to validate the source file format, headers, footers, column names, data types and so on. The tSchemaComplianceCheck component can do most of this validation, such as:

  • Length checking
  • Date pattern/format
  • Data types

However, it does not support validating the number of columns or the column sequence, which we have to manage ourselves using Java code. In this post I will describe how to validate column names and their sequence.

This is our final job design.

Complete CSV Validation Job

Let’s start by adding the first component to the job designer. Add a tFileList and configure it to get the expected files.

Add a tFileInputFullRow component and configure it as shown in the screen below.

tFileInputFullRow Configuration

  • Add a tMap and connect it with a main link from the tFileInputFullRow component.
  • Add a tFixedFlowInput, connect it to the tMap with a lookup link, and configure it as follows.

Note: if your reference header row is stored in a file or database, you can use that instead of tFixedFlowInput.

tFixedFlowInput Configuration

  • Configure tMap as follows.
    • Make an inner join between your reference line and the main input line.
    • Add two outputs and name them “matching” and “reject” respectively.
    • In the settings of the reject output, set “catch lookup inner join reject” to true.
    • Add the source line to both flows.

See the image for more details.

tMap Setting

  • Add a tJava next to the tMap and connect it to the “matching” flow.
  • Add another tJava next to the tMap and connect it to the “reject” flow.
  • Add a tFileInputDelimited and connect it to the first tJava using an “iterate” link.
  • Configure tFileInputDelimited as shown in the image below.

Add a tLogRow component to see the output from the file.

tFileInputDelimited Configuration

You can see that the whole subjob is executed for each file; a file is read only if its first line matches the reference header row.

You can connect the reject flow to record the rejected files, depending on your requirements.
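
The comparison that the job performs can also be written directly in Java, which helps when you want to see exactly why a header is rejected. The job compares the whole first line with the reference line; the sketch below compares column by column instead, which checks the same two things (number of columns and their order). The class name, the column names and the ";" delimiter are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: checks that a header line has exactly the expected
// column names in exactly the expected order.
class HeaderCheck {

    static boolean headerMatches(String headerLine, List<String> expected, String delimiter) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty columns.
        String[] actual = headerLine.split(Pattern.quote(delimiter), -1);
        if (actual.length != expected.size()) {
            return false; // wrong number of columns
        }
        for (int i = 0; i < actual.length; i++) {
            if (!actual[i].trim().equals(expected.get(i))) {
                return false; // wrong name or wrong position
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up reference header for illustration.
        List<String> expected = Arrays.asList("id", "name", "amount");
        System.out.println(headerMatches("id;name;amount", expected, ";"));
        System.out.println(headerMatches("name;id;amount", expected, ";"));
    }
}
```

A header with the right names in the wrong order, or with a missing column, is reported as a mismatch.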

Create File Name with Date and TimeStamp

In this post I will describe how to name a file with a timestamp in Talend. The file name format depends on your business requirement. Suppose the requirement is to name the file with a timestamp, like “Order_yyyyMMdd_hhmmss.dat”, so that it carries a timestamp down to the second. If you later want to fetch the same file for reading or any other purpose, you will not find it easily, so you can keep the file name in a variable and access it later.

In our scenario we will create a file with the name format above, read the same file and display the result, and then copy the file to a folder named after the same day.

This is our final job design.

File Name With Time Stamp

  • Create a context variable context.FileName of type String to hold the file name.
  • Add a tJava and write the code below in it.

context.FileName=TalendDate.getDate("yyyyMMdd_hhmmss");

  • Now add a tRowGenerator to generate dummy data (you can use your own source, e.g. a database or a file) and link it to the tJava using “OnSubJobOk”.
  • Add a tFileOutputDelimited component, connect it to the tRowGenerator using a main flow, and configure it as shown in the image below.
Add File Name for Time Stamp

As you can see, we have assigned the file name from the context variable. In the same way you can build a dynamic file name directly, like "D:/Orders_"+TalendDate.getDate("yyyyMMdd_hhmmss")+".csv".

Our source file is created with the expected file name, “Orders_20150129_134310.csv”. Now we want to read the same file, so follow the steps below.

  • Add a tFileInputDelimited and connect it to the tRowGenerator using an “OnSubJobOk” link.
  • Configure the tFileInputDelimited component as shown in the image below.
tFileInputDelimited configuration & setting

You can observe that we use the file name in the same way as for file creation. We do not know how long file creation will take; if it takes more than one second, a freshly computed timestamp will no longer match the name of the file created earlier. To avoid that, we store the file name in the context.FileName variable.

Now you have read the file and want to copy it to another location, where it should be stored in a folder named after the date on which it was created. To do so:

  • Add a tFileCopy component and link it to the tFileInputDelimited using an “OnSubJobOk” link.
  • Configure the tFileCopy component as shown in the image.
tFileCopy Configuration

Notice that we use the same file name as in the components above. The only change is that we create a directory with a dynamic name by using "D:/"+TalendDate.getDate("yyyyMMdd"). This component can create the directory with the given name if it does not exist.

We have copied the entire file from the source location to the dynamically created folder. In the same way you can use dynamic file or directory names as your business needs require.
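
The idea of computing the timestamp once and reusing it everywhere can be sketched outside Talend with the standard `SimpleDateFormat` class. This is an illustration, not the job's generated code: the class and method names are hypothetical, and the pattern uses “HH” so that plain `SimpleDateFormat` reproduces the 24-hour value seen in the sample file name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper: derives the file name and the day folder from ONE
// captured Date, so every later step refers to the same file.
class TimestampedName {

    static String fileName(Date when) {
        return "Orders_" + new SimpleDateFormat("yyyyMMdd_HHmmss", Locale.ROOT).format(when) + ".csv";
    }

    static String dayFolder(Date when) {
        return "D:/" + new SimpleDateFormat("yyyyMMdd", Locale.ROOT).format(when);
    }

    public static void main(String[] args) {
        // Fixed moment for the demonstration: 29 Jan 2015, 13:43:10 local time
        // (months are zero-based in GregorianCalendar).
        Date created = new GregorianCalendar(2015, 0, 29, 13, 43, 10).getTime();
        String name = fileName(created);
        System.out.println(name);
        System.out.println(dayFolder(created) + "/" + name);
    }
}
```

Because both names come from the same `Date`, it does not matter how many seconds pass between writing, reading and copying the file.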

Loop through start date to end date

Loop Start Date through End Date using tLoop

In this post I will describe how to loop from a start date to an end date. For that we will use the tLoop component, which gives us two loop options: a “for” loop and a “while” loop.

Write the code below in a tJava component.

// Parse the first and the last day of the range.
java.util.Date start_date=TalendDate.parseDate("yyyy-MM-dd", "2015-01-01");
java.util.Date end_date=TalendDate.parseDate("yyyy-MM-dd", "2015-01-10");
// Number of days between the two dates, kept for tLoop.
long l=TalendDate.diffDate(end_date, start_date);
context.Days=l;

The code looks as follows.

loop through start and end date

In the lines above you can see that we parse two dates: the first is start_date and the second is end_date.

Then we calculate the number of days with the TalendDate.diffDate() method. It returns the number of days as a long, which is stored in the variable “l” and then assigned to the context variable context.Days.

Drop a tLoop component next to the tJava, link it with an “OnSubJobOk” trigger, and configure tLoop as follows.

Loop through start and end date

Add a tJava component next to tLoop and link it with an “iterate” link. tLoop has two global variables that can be used in the flow for calculation or manipulation.

Here are those variables.

CURRENT_VALUE

((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))

CURRENT_ITERATION

((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION"))

We will use CURRENT_VALUE to get each day from the start date through the end date. To print each day on the console we use the addDate method from the TalendDate routine. In the code below, we add the current value from the loop to start_date to move the date forward by that many days.

TalendDate.addDate(TalendDate.parseDate("yyyy-MM-dd","2015-01-01"), ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")), "dd");

After the job runs, you will see the output below on the console.

Complete job design with output
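
The same loop can be sketched in plain Java with the `java.time` API, which makes the arithmetic easy to check. This is not what the job generates; the class and method names are hypothetical, and the loop bounds (from 0 up to and including the day difference, in steps of 1) are an assumption about the tLoop settings.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every day from start to end, both inclusive.
class DateLoop {

    static List<LocalDate> daysBetween(LocalDate start, LocalDate end) {
        // Same role as the day difference stored in the context variable.
        long days = ChronoUnit.DAYS.between(start, end);
        List<LocalDate> result = new ArrayList<>();
        // Assumed loop bounds: offset runs from 0 to 'days' inclusive.
        for (long offset = 0; offset <= days; offset++) {
            // Same role as adding the current loop value to the start date.
            result.add(start.plusDays(offset));
        }
        return result;
    }

    public static void main(String[] args) {
        for (LocalDate day : daysBetween(LocalDate.of(2015, 1, 1), LocalDate.of(2015, 1, 10))) {
            System.out.println(day);
        }
    }
}
```

For the two dates used in this post the difference is 9 days, so the loop visits 10 dates, from 2015-01-01 to 2015-01-10.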

Convert String To Date

In this post I will describe how to convert a string to a date in Talend. I will use various date strings to demonstrate.

  • Converting a simple string with a consistent format: “MM/dd/yyyy hh:mm”

12/21/2000 0:00

We will convert the date above with the “MM/dd/yyyy hh:mm” format, using the built-in function below from the TalendDate routine.

TalendDate.parseDate("MM/dd/yyyy hh:mm","12/21/2000 0:00")

The function above returns a Date object. If you print it, the output is:

Thu Dec 21 00:00:00 IST 2000

If you want this date in another format, use the function below.

TalendDate.formatDate("dd-MMM-yyyy", TalendDate.parseDate("MM/dd/yyyy hh:mm", "12/21/2000 0:00"))

TalendDate.formatDate(pattern, date) returns the date as a string: “21-Dec-2000”.
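
If you want to try patterns outside the Studio, the same parse-then-format round trip can be sketched with the standard `SimpleDateFormat` class. The class and method names here are hypothetical, and `Locale.ENGLISH` is fixed so that the month abbreviation does not depend on the machine's locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: parses a string with one pattern and prints it with another.
class DateReformat {

    static String reformat(String input, String inPattern, String outPattern) {
        try {
            Date parsed = new SimpleDateFormat(inPattern, Locale.ENGLISH).parse(input);
            return new SimpleDateFormat(outPattern, Locale.ENGLISH).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + input + "' with " + inPattern, e);
        }
    }

    public static void main(String[] args) {
        // Consistent format from this post.
        System.out.println(reformat("12/21/2000 0:00", "MM/dd/yyyy hh:mm", "dd-MMM-yyyy"));
        // Single-digit month and day are accepted by the same pattern.
        System.out.println(reformat("5/1/2009 0:00", "MM/dd/yyyy hh:mm", "dd-MMM-yyyy"));
        // Mixed "yyyy/MM/dd" and "yyyyMMdd" inputs: strip the slashes first.
        System.out.println(reformat("2014/12/21".replaceAll("/", ""), "yyyyMMdd", "dd-MMM-yyyy"));
    }
}
```

The first call yields 21-Dec-2000, matching the result described for TalendDate.formatDate.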

We can parse the inconsistently formatted strings below using the same method.

12/21/2000 0:00
5/11/2007 0:00
5/1/2009 0:00

  • Converting strings with heterogeneous formats to dates. The sample date strings are as follows.

2014/12/21
20140214
2014/12/13
2014/12/23
20141201

We will write some Java code to remove the “/” characters: the code below replaces “/” with the empty string "" and then parseDate converts the result using the given format.

TalendDate.parseDate("yyyyMMdd", InputString.replaceAll("/", ""))

Convert dates with time stamp.

  • Input String: "2014-11-14T10:41:34-08:00"
  • Format: "yyyy-MM-dd'T'HH:mm:ssXXX"

TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ssXXX","2014-11-14T10:41:34-08:00")

  • Input String: "2013-09-03T21:54:32.027+02:00"
  • Format: "yyyy-MM-dd'T'HH:mm:ss.SSSXXX" (“XXX” reads the whole “+02:00” offset, so offsets with non-zero minutes also parse)

TalendDate.parseDate("yyyy-MM-dd'T'HH:mm:ss.SSSXXX","2013-09-03T21:54:32.027+02:00")

  • Input String: "Tue May 08 00:00:00 CEST 2012"
  • Format: "EEE MMM dd HH:mm:ss zzz yyyy"

TalendDate.parseDateLocale("EEE MMM dd HH:mm:ss zzz yyyy", "Tue May 08 00:00:00 CEST 2012", "EN")

  • Input String: "30 Aug 2011 07:06:00"
  • Format: "dd MMM yyyy HH:mm:ss"

TalendDate.parseDateLocale("dd MMM yyyy HH:mm:ss","30 Aug 2011 07:06:00","EN")

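  • Input String: "24/02/2015 23:15:37.250000000"
  • Format: "dd/MM/yyyy HH:mm:ss.SSS", applied to the first 23 characters only. “S” means milliseconds, so the nine fractional digits would otherwise be read as 250000000 milliseconds and shift the date by several days.

System.out.println(TalendDate.parseDate("dd/MM/yyyy HH:mm:ss.SSS", "24/02/2015 23:15:37.250000000".substring(0, 23)));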

Parse DateTime-string with AM/PM marker

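  • Input String: "12/20/2012 10:02 PM"
  • Format String: "MM/dd/yyyy hh:mm a" (use the 12-hour “hh” with the AM/PM marker; with the 24-hour “HH” the marker is not applied to the hour)

System.out.println(TalendDate.parseDate("MM/dd/yyyy hh:mm a","12/20/2012 10:02 PM"));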
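
A quick way to confirm that an AM/PM string was read correctly is to print the parsed date in 24-hour form. The sketch below does this with the standard `SimpleDateFormat` class and the 12-hour pattern letter “hh” together with “a”; the class and method names are hypothetical, and `Locale.ENGLISH` is fixed so that “AM” and “PM” are recognised regardless of the machine's locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: parses a 12-hour time with AM/PM marker and prints it
// on the 24-hour clock.
class AmPmParse {

    static String to24Hour(String input) {
        try {
            // "hh" is the 12-hour field that the "a" marker applies to.
            Date parsed = new SimpleDateFormat("MM/dd/yyyy hh:mm a", Locale.ENGLISH).parse(input);
            return new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ENGLISH).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(to24Hour("12/20/2012 10:02 PM"));
        System.out.println(to24Hour("12/20/2012 10:02 AM"));
    }
}
```

The PM input comes back as 22:02 and the AM input as 10:02, which shows that the marker has been taken into account.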

If you have a format that is not listed here, please send it to us and we will include it in the list.

Keep visiting this page for newer formats.

Read XML with Optional Elements

In this post I will describe how to parse XML with an optional element.

We will use the source XML file below, which holds details of three customers along with their awards; <CUSTOMERAWARDS> is an optional XML element.

Sample XML file

We will parse this file using the tXMLMap component. First of all, add a tFileInputXML and configure it as below.

  • Assign the source file path.
  • Create a single column named CUSTOMERS with the “Document” data type in the schema.
  • Set the Loop XPath query to "/CUSTOMERS".
  • In the Mapping section, set the XPath query to ".".
  • Select the “Get Nodes” check box.

Add a tXMLMap component, connect it to the tFileInputXML component using a Main link, and create the source tree structure as shown in the image.

Note: you can create the sub-elements manually, or they can be populated from an XSD file or from the repository.

Add two outputs and drag and drop the relevant source columns to each output (refer to the image).

tXMLMap Configuration

Click the “set loop function” shortcut menu of the first output, add one sequence, and select the customerid XPath as its xpath. See the image for more details.

tXml Map First Output

Our first output is ready. Now configure the second output: follow the same steps as for the first output, and select the customerawards XPath. See the image for more details.

tXml Map Second Output

Add a tLogRow for each output and execute the job; you will see output like that below. Notice that no awards are extracted for customer id 1236, whereas the awards for customer ids 1234 and 1235 are extracted completely.

Output
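
To see how an optional element behaves outside Talend, here is a small sketch that reads a similar document with the JDK's DOM parser. It does not reproduce tXMLMap. The sample file in this post is only shown as an image, so the element names CUSTOMER, CUSTOMERID and AWARD and the award values are assumptions; only CUSTOMERS, CUSTOMERAWARDS and the three customer ids come from the post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch: collects the awards of each customer and returns an
// empty list when the optional <CUSTOMERAWARDS> element is absent.
class OptionalXmlSketch {

    // Hypothetical sample; the nested element names are assumptions.
    static final String SAMPLE =
            "<CUSTOMERS>"
            + "<CUSTOMER><CUSTOMERID>1234</CUSTOMERID>"
            + "<CUSTOMERAWARDS><AWARD>Gold</AWARD><AWARD>Silver</AWARD></CUSTOMERAWARDS></CUSTOMER>"
            + "<CUSTOMER><CUSTOMERID>1235</CUSTOMERID>"
            + "<CUSTOMERAWARDS><AWARD>Bronze</AWARD></CUSTOMERAWARDS></CUSTOMER>"
            + "<CUSTOMER><CUSTOMERID>1236</CUSTOMERID></CUSTOMER>"
            + "</CUSTOMERS>";

    static Map<String, List<String>> awardsByCustomer(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList customers = doc.getElementsByTagName("CUSTOMER");
            for (int i = 0; i < customers.getLength(); i++) {
                Element customer = (Element) customers.item(i);
                String id = customer.getElementsByTagName("CUSTOMERID").item(0).getTextContent().trim();
                // If the optional block is missing, this node list is simply empty.
                NodeList awardNodes = customer.getElementsByTagName("AWARD");
                List<String> awards = new ArrayList<>();
                for (int j = 0; j < awardNodes.getLength(); j++) {
                    awards.add(awardNodes.item(j).getTextContent().trim());
                }
                result.put(id, awards);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(awardsByCustomer(SAMPLE));
    }
}
```

Customer 1236 still appears in the result, with an empty award list, which mirrors what the two outputs of the job show.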

Difference between tMap and tJoin

tMap is a frequently used component for joins and lookups, and it is also used for a variety of operations and transformations, whereas tJoin is used for joins and lookups only.

  • Inputs
    • tMap: accepts more than one input; one is the main flow and the rest are lookups.
    • tJoin: accepts only two inputs; one is the main flow and the other is the lookup.
  • Outputs
    • tMap: we can create more than one output.
    • tJoin: has two default outputs, “Main” and “Inner join reject”.
  • Join model
    • tMap: offers the “inner join” and “left outer join” join models.
    • tJoin: offers only “inner join”.
  • Match model
    • tMap: offers three match models: Unique Match, First Match and All Matches.
    • tJoin: defaults to Unique match.
  • Storing lookup data on disk
    • tMap: allows lookup data to be stored in a file during processing.
    • tJoin: does not offer this feature.
  • Filtering
    • tMap: you can filter data using a filter expression.
    • tJoin: does not offer this feature.
  • Column-level transformations
    • tMap: you can write transformations using the expression builder at each column level.
    • tJoin: does not offer this feature.
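
For readers who are less familiar with the two join models mentioned above, the difference can be sketched with a lookup map in plain Java. This only illustrates the semantics (inner join with a reject list versus left outer join); it is not how either component is implemented, and the class name and sample keys are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of join semantics using a lookup map.
class JoinSketch {

    // Inner join: unmatched main rows go to the reject list.
    static List<String> innerJoin(List<String> mainKeys, Map<String, String> lookup, List<String> rejects) {
        List<String> matched = new ArrayList<>();
        for (String key : mainKeys) {
            String value = lookup.get(key);
            if (value != null) {
                matched.add(key + "=" + value);
            } else {
                rejects.add(key);
            }
        }
        return matched;
    }

    // Left outer join: every main row is kept; missing lookups become null.
    static List<String> leftOuterJoin(List<String> mainKeys, Map<String, String> lookup) {
        List<String> rows = new ArrayList<>();
        for (String key : mainKeys) {
            rows.add(key + "=" + lookup.get(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up main keys and lookup entries.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("A", "Apple");
        lookup.put("B", "Banana");
        List<String> mainKeys = Arrays.asList("A", "B", "C");

        List<String> rejects = new ArrayList<>();
        System.out.println(innerJoin(mainKeys, lookup, rejects) + " rejects=" + rejects);
        System.out.println(leftOuterJoin(mainKeys, lookup));
    }
}
```

Key “C” has no lookup entry, so it lands in the reject list of the inner join and appears with a null value in the left outer join.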

Split Rows to Columns

In this post I will describe how to split rows into columns. We will use the sample below as input records.

Input Rows.

Input Rows

Expected Output.

Output

Create a job, add a tFixedFlowInput component, enter the input above under “Use inline content”, and create the schema as shown in the image.

Input Schema

Add a tPivotToColumnsDelimited component, connect it to the tFixedFlowInput component with a main connection, and configure it as shown in the image below.

tPivot component Configuration

Configuration:

Pivot column = “Type”

Aggregation column = “Value”

Aggregation function = “last”

Group by the “ID” and “Name” columns.

The rest of the configuration is for the output file, to which the output is written. To read the output file we could use a delimited input component, but for a quick review I will use tFileInputFullRow.

Add a tFileInputFullRow below the tFixedFlowInput component and connect it with an “OnSubJobOk” trigger. Provide the path of the previously created file and the rest of the details.

Add a tLogRow, connect it to the tFileInputFullRow component, and execute the job; you will get the output above on the console.

Final Job Design.

Job with OutPut

This component creates as many columns as there are distinct pivot values in your input, so if you are working with a fixed schema it adds complexity to further processing.
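
The pivot itself can be sketched in a few lines of Java, which also shows why the number of output columns depends on the data. This is not the component's implementation: the class name, the sample rows and the ";" delimiter are assumptions, since the input in this post is only shown as an image. Rows are {ID, Name, Type, Value}, the pivot column is Type, and the “last” aggregation is modelled by letting later values overwrite earlier ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: pivots {ID, Name, Type, Value} rows so that each
// distinct Type becomes a column, grouped by ID and Name.
class PivotSketch {

    static List<String> pivot(List<String[]> rows, String delimiter) {
        Set<String> types = new LinkedHashSet<>(); // distinct pivot values, first-seen order
        Map<String, Map<String, String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            String groupKey = row[0] + delimiter + row[1]; // group by ID and Name
            types.add(row[2]);
            // put() overwrites an earlier value, which gives the "last" aggregation.
            groups.computeIfAbsent(groupKey, k -> new HashMap<>()).put(row[2], row[3]);
        }
        List<String> lines = new ArrayList<>();
        lines.add("ID" + delimiter + "Name" + delimiter + String.join(delimiter, types));
        for (Map.Entry<String, Map<String, String>> group : groups.entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey());
            for (String type : types) {
                // A group without this pivot value gets an empty cell.
                line.append(delimiter).append(group.getValue().getOrDefault(type, ""));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up input rows for illustration.
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Ann", "Phone", "111"},
                new String[] {"1", "Ann", "Email", "a@x.com"},
                new String[] {"2", "Bob", "Phone", "222"},
                new String[] {"1", "Ann", "Phone", "333"});
        for (String line : pivot(rows, ";")) {
            System.out.println(line);
        }
    }
}
```

Adding a row with a new Type value adds a new column to the header, which is the schema instability described above.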