If you want to add or load third-party libraries in a Talend project, you can choose any of the solutions below.
- Go to Window -> Preferences -> Java -> User Libraries. This makes the jar files available to all jobs in the project.
- Use the tLibraryLoad component to load a library file in a job.
- Use the routine's “Edit Routine Libraries” option:
  - Right-click the routine.
  - Select the “Edit Routine Libraries” option.
  - In the popup window, click the “New” button.
  - Select the “Browse a Library file” option.
  - Browse to and select the required library.
  - Check “if the library file is required” if it applies, then click “Ok”.
- Another way is to use the “Modules” tab to download and install the library.
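Whichever option you use, a quick way to confirm that a library is really visible to a job is to try loading one of its classes. The following is a minimal sketch in plain Java, not a Talend API; the class name `com.example.SomeLibraryClass` is hypothetical and should be replaced with a class from your own jar.

```java
class ClasspathCheck {
    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // "false" means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name; use a class that ships in the jar you added.
        String name = "com.example.SomeLibraryClass";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```

The body of `isOnClasspath` could also be pasted into a tJava component to run the same check inside a job.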
Talend provides many features for improving the performance of job execution. In this post I will describe how to increase the Java heap space.
You can allocate heap space based on the memory available on your machine.
- Using the Run tab.
  - Go to the Talend Run tab and select “Advanced settings”.
  - Select the “Use specific JVM arguments” checkbox; this enables the argument table.
  - Click the “New” button and add the first parameter, “-Xms256M”, for the minimum memory allocation.
  - Do the same to add the second parameter, “-Xmx1024M”, for the maximum memory allocation.
  - See the image for more details.
If you want to modify an existing memory size, double-click the argument row; a popup opens where you can edit it.
- Using the Preferences menu.
  - Go to the Window menu and select the “Preferences” option.
  - A new dialog box opens with various configuration options.
  - Click the “Talend” node to expand it and show its options.
  - Inside the “Talend” node, click the “Run/Debug” option. It shows various options, but we will only add the JVM arguments, as we did in the first method above.
See the image for more details.
These are the two ways of assigning JVM arguments for the Java heap space. If you want to supply JVM arguments at run time instead, you have to modify the job's .bat or .sh file, depending on the environment it runs in.
After opening the .bat or .sh file you will see a command like “java -Xms256M -Xmx1024M”. You can modify and save it; the job will then use the modified arguments instead of the defaults we set in the steps above.
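To check which heap settings a job actually received, you can print the JVM's own view of its memory. This is a small sketch using the standard `java.lang.Runtime` class; the class name `HeapCheck` and the helper `toMiB` are only illustrative. Note that `maxMemory()` roughly reflects -Xmx and may report slightly less than the configured value.

```java
class HeapCheck {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper limit the heap may grow to (roughly the -Xmx value).
        System.out.println("Max heap:        " + toMiB(rt.maxMemory()) + " MiB");
        // Heap currently reserved by the JVM (starts near the -Xms value).
        System.out.println("Current heap:    " + toMiB(rt.totalMemory()) + " MiB");
        // Unused part of the currently reserved heap.
        System.out.println("Free in current: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

The three `println` lines can also be placed in a tJava component at the start of a job to log the values for that run.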
In this post I will describe how to exclude files using the tFileList component.
Below are our sample files, which are stored in a folder.
From the file list above we want to read only the files whose names start with “Orders_” and end with “.csv”, so we use a tFileList file mask to get the list.
Add a tFileList component and configure it as follows.
Now you will get all the matching files from the given location, but we want to exclude the two files whose names contain “US” or “USA”, so let's use the Advanced settings of tFileList and configure them as follows.
Here I have used a regular expression to exclude files: “Orders_US.*”. After running the job I get only the one file I wanted to process. Here is the output.
If you want to exclude multiple types of file, separate the patterns with commas, as below.
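The include-then-exclude logic can be sketched in plain Java as follows. This is not how tFileList is implemented internally; it only illustrates the idea with `java.util.regex`. The file names are hypothetical stand-ins for the sample folder, and the include mask is written as a regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FileMaskSketch {
    /** Keeps names that match the include regex and none of the exclude regexes. */
    static List<String> filter(List<String> names, String include, String... excludes) {
        Pattern includePattern = Pattern.compile(include);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!includePattern.matcher(name).matches()) {
                continue; // not selected by the file mask
            }
            boolean excluded = false;
            for (String exclude : excludes) {
                if (Pattern.matches(exclude, name)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical file names.
        List<String> files = Arrays.asList(
                "Orders_IN.csv", "Orders_US.csv", "Orders_USA.csv", "Customers.csv");
        // Prints [Orders_IN.csv]
        System.out.println(filter(files, "Orders_.*\\.csv", "Orders_US.*"));
    }
}
```

Passing several exclude patterns to `filter` corresponds to the comma-separated list in the exclude mask.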
File processing is a day-to-day task in the ETL world, and there is a real need to validate the source file format: headers, footers, column names, data types and so on. The tSchemaComplianceCheck component can do much of this validation, such as:
- Length checking.
- Date pattern/format.
- Data types.
However, it does not support validating the number of columns or the column sequence, so we have to manage that ourselves. In this post I will describe how to validate column names and their sequence.
This is our final job design.
Let's start by adding the first component to the job designer. Add tFileList and configure it to get the expected files.
Add a tFileInputFullRow component and configure it as shown in the screen below.
- Add tMap and connect it with a main link from the tFileInputFullRow component.
- Add tFixedFlowInput, connect it to tMap with a lookup link, then configure it as follows.
Note: if your reference header row is stored in a file or database, you can use that instead of tFixedFlowInput.
- Configure tMap as follows.
  - Make an inner join between your reference line and the main input line.
  - Add two outputs and name them “matching” and “reject” respectively.
  - In the settings of the reject output, set “catch lookup inner join reject” to true.
  - Add the source line to both flows.
See the image for more details.
- Add a tJava next to tMap and connect it to the “matching” flow.
- Add another tJava next to tMap and connect it to the “reject” flow.
- Add tFileInputDelimited and connect it to the first tJava using an “iterate” link.
- Configure tFileInputDelimited as shown in the image below.
Add a tLogRow component to see the output from the file.
You can see that the whole subjob is executed for each file; if the file matches the header row, it is used for reading.
You can connect the reject row to record rejected files, based on your requirement.
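The check that the tMap inner join performs, comparing the first line of the file with a reference header, can also be written directly in Java. The sketch below is an alternative illustration rather than the job's actual code: it compares column by column and trims spaces, which the plain string join does not do. The header `id;name;amount` and the `;` delimiter are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class HeaderCheck {
    /** Splits a header line on the delimiter and trims each column name. */
    static List<String> columns(String headerLine, String delimiter) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty columns.
        String[] parts = headerLine.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return Arrays.asList(parts);
    }

    /** True only if both headers have the same column names in the same order. */
    static boolean matches(String actual, String reference, String delimiter) {
        return columns(actual, delimiter).equals(columns(reference, delimiter));
    }

    public static void main(String[] args) {
        String reference = "id;name;amount"; // hypothetical reference header
        System.out.println(matches("id;name;amount", reference, ";")); // same names, same order
        System.out.println(matches("name;id;amount", reference, ";")); // wrong sequence
        System.out.println(matches("id;name", reference, ";"));        // wrong column count
    }
}
```

Because `List.equals` compares both size and order, a missing column and a swapped column are both rejected.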
Loop Start Date through End Date using tLoop
In this post I will describe how to loop from a start date through an end date. For that we will use the tLoop component, which offers two loop types: a “for” loop and a “while” loop.
Write the code below in a tJava component.
java.util.Date start_date = TalendDate.parseDate("yyyy-MM-dd", "2015-01-01");
java.util.Date end_date = TalendDate.parseDate("yyyy-MM-dd", "2015-01-10");
long l = TalendDate.diffDate(end_date, start_date);
context.days = l; // assumes a context variable "days" of type Long; cast to int if yours is an Integer
The code looks as follows.
In the code above, we parse two dates: the first is start_date and the second is end_date.
Then we calculate the number of days with the TalendDate.diffDate() method, which returns the number of days as a long. The result is stored in the variable “l” and then assigned to the “context.days” context variable.
Drop a tLoop component next to the tJava, link it with an “OnSubjobOk” trigger, then configure tLoop as follows.
Add a tJava component next to tLoop and link it with an “iterate” link. tLoop has two global variables which can be used in the flow for calculation or manipulation.
Here are those variables.
We will use CURRENT_VALUE to get each day from the start date through the end date. To print each day on the console we will use the addDate method from the TalendDate routine. See the code below, in which we add the current value from the flow to start_date to increment the start date by one day at a time.
After the job runs you will see the output below on the console.
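For readers who want to try the same loop outside Talend Studio, here is a minimal sketch using `java.time`. It is an equivalent in plain Java, not the job's code: `ChronoUnit.DAYS.between` stands in for TalendDate.diffDate, `plusDays` for TalendDate.addDate, and the loop counter for tLoop's CURRENT_VALUE. It assumes the tLoop runs from 0 to the number of days in steps of 1.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class DateLoopSketch {
    /** Returns every day from start through end, both inclusive. */
    static List<LocalDate> daysBetween(LocalDate start, LocalDate end) {
        // Number of whole days between the two dates (role of diffDate).
        long days = ChronoUnit.DAYS.between(start, end);
        List<LocalDate> result = new ArrayList<>();
        // "current" plays the role of tLoop's CURRENT_VALUE.
        for (long current = 0; current <= days; current++) {
            result.add(start.plusDays(current)); // role of addDate
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.parse("2015-01-01");
        LocalDate end = LocalDate.parse("2015-01-10");
        for (LocalDate day : daysBetween(start, end)) {
            System.out.println(day); // one line per day, 2015-01-01 through 2015-01-10
        }
    }
}
```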
In this post I will describe how to convert a string to a date in Talend. I will use various date strings to demonstrate.
- Converting a simple string with a consistent format: “MM/dd/yyyy HH:mm”
We will convert the date above using the “MM/dd/yyyy HH:mm” format, with the built-in function below from the TalendDate routine. Use HH (hour 0-23) here, since the input hour is 0; hh is the 12-hour clock (1-12).
TalendDate.parseDate("MM/dd/yyyy HH:mm", "12/21/2000 0:00")
The function above returns a Date object. If you print it, the output looks like this (the time zone shown depends on your JVM's default):
Thu Dec 21 00:00:00 IST 2000
If you want this date in another format, use the function below.
TalendDate.formatDate("dd-MMM-yyyy", TalendDate.parseDate("MM/dd/yyyy HH:mm", "12/21/2000 0:00"))
TalendDate.formatDate(pattern, date) returns the date as a string: “21-Dec-2000”.
We can parse the inconsistently formatted strings below using the same method.
- Converting heterogeneously formatted strings to dates. Sample date strings are as follows.
We will write some Java code to remove the “/” characters: the code below replaces “/” with an empty string and then the parse function converts the result using the given format.
TalendDate.parseDate("yyyyMMdd", InputString.replaceAll("/", ""))
Converting dates with a timestamp.
- Input string: “2014-11-14T10:41:34-08:00”
- Format: “yyyy-MM-dd'T'HH:mm:ssXXX”
- Input string: “2013-09-03T21:54:32.027+02:00”
- Format: “yyyy-MM-dd'T'HH:mm:ss.SSSXXX”
- Input string: “Tue May 08 00:00:00 CEST 2012”
- Format: “EEE MMM dd HH:mm:ss zzz yyyy”
TalendDate.parseDateLocale("EEE MMM dd HH:mm:ss zzz yyyy", "Tue May 08 00:00:00 CEST 2012", "EN")
- Input string: “30 Aug 2011 07:06:00”
- Format: “dd MMM yyyy HH:mm:ss”
TalendDate.parseDateLocale("dd MMM yyyy HH:mm:ss", "30 Aug 2011 07:06:00", "EN")
- Input string: “24/02/2015 23:15:37.250000000”
- Format: “dd/MM/yyyy HH:mm:ss.SSS”. The pattern letter S means milliseconds, so keep only the first three fractional digits (the first 23 characters of the input) before parsing.
System.out.println(TalendDate.parseDate("dd/MM/yyyy HH:mm:ss.SSS", "24/02/2015 23:15:37.250000000".substring(0, 23)));
Parsing a date-time string with an AM/PM marker.
- Input string: “12/20/2012 10:02 PM”
- Format: “MM/dd/yyyy hh:mm a” (use hh, the 12-hour clock, together with the AM/PM marker)
System.out.println(TalendDate.parseDate("MM/dd/yyyy hh:mm a", "12/20/2012 10:02 PM"));
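The TalendDate routine is only available inside Talend Studio, so the patterns above can be tried out in plain Java with `java.text.SimpleDateFormat`, on the assumption that TalendDate accepts the same pattern letters. The sketch below is not TalendDate itself; the helper names `parse` and `show` are illustrative, and everything is pinned to UTC and the English locale so the results do not depend on the machine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateParseSketch {
    /** Parses strictly with the given pattern; UTC is used when the input has no zone. */
    static Date parse(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false); // reject out-of-range values instead of rolling them over
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + value + "' with " + pattern, e);
        }
    }

    /** Formats a date in UTC so results can be compared across machines. */
    static String show(Date date) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) {
        // 24-hour clock: prints 2000-12-21 00:00:00
        System.out.println(show(parse("MM/dd/yyyy HH:mm", "12/21/2000 0:00")));
        // ISO 8601 offset, converted to UTC: prints 2014-11-14 18:41:34
        System.out.println(show(parse("yyyy-MM-dd'T'HH:mm:ssXXX", "2014-11-14T10:41:34-08:00")));
        // 12-hour clock with AM/PM marker: prints 2012-12-20 22:02:00
        System.out.println(show(parse("MM/dd/yyyy hh:mm a", "12/20/2012 10:02 PM")));
        // English month name: prints 2011-08-30 07:06:00
        System.out.println(show(parse("dd MMM yyyy HH:mm:ss", "30 Aug 2011 07:06:00")));
    }
}
```

A new formatter is created on every call because `SimpleDateFormat` is not thread-safe and parsing a zone offset can change the formatter's internal calendar.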
If you have a format that is not listed here, please send it to us and we will include it in the list.
Keep visiting this page for newer formats.
In this post I will describe how to parse XML with an optional element.
We will use the source XML file below, which holds details for three customers along with their awards; <CUSTOMERAWARDS> is an optional XML element.
We will parse this file using the tXMLMap component. First of all, add tFileInputXML and configure it as below.
- Assign the source file path.
- Create a single column in the schema, named CUSTOMERS, with the “Document” data type.
- Set the loop XPath query to “/CUSTOMERS”.
- In the Mapping section, set the XPath query to “.”.
- Select the “Get Nodes” check box.
Add a tXMLMap component, connect it to the tFileInputXML component using a Main link, and create the source tree structure as shown in the image.
Note: You can create sub-elements manually, or populate them from an XSD file or from the repository.
Add two outputs and drag and drop the relevant source columns to each output (refer to the image).
Click the first output's “set loop function” shortcut menu, add one sequence, then select the customerid XPath as the loop XPath; see the image for more details.
Our first output is ready. Now configure the second output by following the same steps as for the first, selecting the customerawards XPath; see the image for more details.
Add a tLogRow for each output and then execute the job; you will see output like that below. Notice that no awards are extracted for customer id 1236, while the awards for customer ids 1234 and 1235 are extracted completely.
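The behaviour of an optional element can also be seen in a few lines of plain Java with the JDK's DOM parser. This is only a sketch of the idea, not what tXMLMap does internally. The inline XML is a hypothetical sample: the element names CUSTOMER, CUSTOMERID and AWARD and the award values are assumptions; only CUSTOMERS, CUSTOMERAWARDS and the three customer ids come from the post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class OptionalElementSketch {
    // Hypothetical sample data; customer 1236 has no CUSTOMERAWARDS element.
    static final String XML =
          "<CUSTOMERS>"
        + "<CUSTOMER><CUSTOMERID>1234</CUSTOMERID><CUSTOMERAWARDS>"
        + "<AWARD>Gold</AWARD><AWARD>Silver</AWARD></CUSTOMERAWARDS></CUSTOMER>"
        + "<CUSTOMER><CUSTOMERID>1235</CUSTOMERID><CUSTOMERAWARDS>"
        + "<AWARD>Bronze</AWARD></CUSTOMERAWARDS></CUSTOMER>"
        + "<CUSTOMER><CUSTOMERID>1236</CUSTOMERID></CUSTOMER>"
        + "</CUSTOMERS>";

    /** Returns one "id:award" entry per award; customers without awards add nothing. */
    static List<String> awards(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> rows = new ArrayList<>();
            NodeList customers = doc.getElementsByTagName("CUSTOMER");
            for (int i = 0; i < customers.getLength(); i++) {
                Element customer = (Element) customers.item(i);
                String id = customer.getElementsByTagName("CUSTOMERID").item(0).getTextContent();
                // When the optional element is absent this list is empty and the loop is skipped.
                NodeList awardNodes = customer.getElementsByTagName("AWARD");
                for (int j = 0; j < awardNodes.getLength(); j++) {
                    rows.add(id + ":" + awardNodes.item(j).getTextContent());
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints [1234:Gold, 1234:Silver, 1235:Bronze]
        System.out.println(awards(XML));
    }
}
```

As in the job's second output, looping over the award nodes means a customer without the optional element simply produces no award rows.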
tMap is a frequently used component for joins and lookups, and it is also used for a variety of other operations and transformations, whereas tJoin is used for joins and lookups only.
- Inputs: tMap accepts more than one input; one is the main flow and the rest are lookups. tJoin accepts only two inputs; one is the main flow and the other is the lookup.
- Outputs: tMap can create more than one output. tJoin has two default outputs: “Main” and “Inner join reject”.
- Join model: tMap has “inner join” and “left outer join” join models. tJoin offers only “inner join”.
- Match model: tMap offers three match models. tJoin defaults to unique match.
- Lookup storage: tMap can store lookup data on disk during processing. tJoin doesn't offer this feature.
- Filtering: in tMap you can filter data using a filter expression. tJoin doesn't offer this feature.
- Transformations: in tMap you can write transformations with the expression builder at column level. tJoin doesn't offer this feature.
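To make the join terms concrete, here is a minimal sketch of an inner join with a unique-match lookup and a reject output, written in plain Java. It illustrates the concept both components share and is not code from either of them. The data (order rows keyed by a customer id, and a customer lookup) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupJoinSketch {
    /**
     * Inner join with unique match: each main row {key, value} is looked up by key.
     * Matched rows go to "matched", unmatched rows to "rejected" (inner join reject).
     */
    static void innerJoin(List<String[]> mainRows, Map<String, String> lookup,
                          List<String> matched, List<String> rejected) {
        for (String[] row : mainRows) {
            String found = lookup.get(row[0]); // a map holds one value per key: unique match
            if (found != null) {
                matched.add(row[0] + "|" + row[1] + "|" + found);
            } else {
                rejected.add(row[0] + "|" + row[1]);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical main flow: {customerId, amount}
        List<String[]> orders = Arrays.asList(
                new String[] {"C1", "100"}, new String[] {"C2", "250"}, new String[] {"C9", "75"});
        // Hypothetical lookup flow: customerId -> name
        Map<String, String> customers = new HashMap<>();
        customers.put("C1", "Ann");
        customers.put("C2", "Bob");

        List<String> matched = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        innerJoin(orders, customers, matched, rejected);
        System.out.println("Main:   " + matched);  // [C1|100|Ann, C2|250|Bob]
        System.out.println("Reject: " + rejected); // [C9|75]
    }
}
```

A left outer join would differ only in the else branch: the unmatched row would still go to the main output, with an empty lookup value.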
In this post I will describe how to split rows into columns. We will use the sample below as input records.
Create a job, add a tFixedFlowInput component, put the input above in “Use inline content”, and create the schema as shown in the image.
Add a tPivotToColumnsDelimited component, connect it to the tFixedFlowInput component with a main connection, then configure it as shown in the image below.
Pivot column = “Type”
Aggregation function = “last”
Group by the “ID” and “Name” columns.
The rest of the configuration is for the output file, where our output will be written. To read the output file we could use a delimited input component, but for a quick review I'll use tFileInputFullRow.
Add tFileInputFullRow below the tFixedFlowInput component and connect it with an “On Subjob Ok” trigger, then provide the previously created file path and the rest of the details.
Add tLogRow, connect it to the tFileInputFullRow component, and execute the job; you will get the output above on the console.
Final job design.
This component creates N columns based on your input; if you are dealing with a fixed schema, this adds complexity to further processing.
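The pivot itself can be sketched in plain Java to show why the number of output columns depends on the data. This is an illustration of the technique, not the component's implementation. The input rows are hypothetical, and the fourth column (called "value" here) is an assumed name for the column being aggregated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PivotSketch {
    /** Rows are {id, name, type, value}. Returns a header line plus one line per (id, name). */
    static List<String> pivot(List<String[]> rows) {
        Set<String> types = new LinkedHashSet<>();                      // one output column per distinct type
        Map<String, Map<String, String>> groups = new LinkedHashMap<>(); // group by id and name
        for (String[] row : rows) {
            types.add(row[2]);
            // put() overwrites earlier values, which gives the "last" aggregation.
            groups.computeIfAbsent(row[0] + ";" + row[1], k -> new HashMap<>()).put(row[2], row[3]);
        }
        List<String> out = new ArrayList<>();
        out.add("ID;Name;" + String.join(";", types));
        for (Map.Entry<String, Map<String, String>> group : groups.entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey());
            for (String type : types) {
                line.append(';').append(group.getValue().getOrDefault(type, ""));
            }
            out.add(line.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input records.
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Ann", "Phone", "111"},
                new String[] {"1", "Ann", "Email", "ann@example.com"},
                new String[] {"2", "Bob", "Phone", "222"},
                new String[] {"1", "Ann", "Phone", "333"});
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
    }
}
```

Every new value in the type column adds another column to the header, which is the schema complexity mentioned above.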