Monthly Archives: February 2015
To read a Multi schema Positional file we will use tFileInutMSPositional component, this component gives ability to read multiple schema based on certain columns value.
We are using Invoice file which has Invoice Header & invoice Details records in single file.
Invoice Header records start with “H” alphabet.
Invoice Details records start with “T’ alphabet.
follow the steps to read these multi schema records.
- Create a job and add tFileinputMSPositional component.
- Configure tFileinputMSPositional component with.
- provide file Name.
- row separator as per your file defaulted to “n”
- “Header Filed Position” this is the most important filed by which component will distinguish records whether it is header or detail. in our case first letter of row which is “H” for header and “T” for detail. so we will add value as “0-1” that means start of record to 1 character long.
- In “Records” section add two rows name it as “header” for Header records and “details” for Details record.
- once you add row it will ask you for schema add Header schema like below.
- Create details schema as follows.
- Now add pattern for “header” as “1,8,6,8,3”.
- Add pattern for details as “1,8,1,6,1,1,*”.
- now our configuration completed.
- Add two tLogRow component and from settings tab select “mode>Table”
- Connect first tLogRow to the tFileInputMSPositional using “Header” link.
- Connect Second tLogRow to the tFileInputMSPositional using “Detail” link.
- Synch tLogRow columns.
Execute the job. It will show you below output.
In this post i will describe you how to read/process multi schema delimited file using tFileInutMSDelimited component.
here is our sample input data which has header records then its trailer records.
Invoice Header record start with “H” alphabet and Invoice Detail/trailer record start with “T” Alphabet.
we will use tFileInutMSDelimited component to parse these records. so create a job and drag and drop tFileInutMSDelimited component and then double click on it, it will show you another screen for metadata creation.
select our source file, then select column separator & row separator, once that is done then flow the below steps.
- In preview window select first column which has “H” and “T” values.
- In Schema grid click on add button and provide following values.
- Click on “OK” you will one record created with provided values.
- Click on fetch Codes button, it fetch all the visible codes from mention file and creates schema based on column delimiter.
- Our configuration looks like below image.
- Now go to the component setting tab and click on first schema which may look like row_A_1 click on it.
- it will show you schema with default column names.
- modify column name as follows.
- We are done with Header schema now do the same thing with Trailer schema and modify column names.
- Add tLogRow and connect to the tFileInputMSDelimited component using “row_A_1” schema.
- Do the Same thing for “row_B_1” schema.
- Now Execute the job your output will be like this.
you can achieve same thing using other ways, but trust me this is the most simple way.
When you start Talend you will get wellcome screen with few option to select, create or import projects, but if you are starting Talend very first time then you should be aware of following project and connection type.
Create/Connect Local Project.
- Click on three dot button (…) which take to the next step of creating new project.
- At left side click on [+] button to add new connection.
- Select Repository as “local”
- Provide “Name” for connection like “local”, “LocalTalend”
- Provide Description if required.
- write email address and password to the respective filed.
- Select “workspace” area which is more important because all your jobs and projects will be created in this area.
- Click on “OK” button. here you have created your first connection as local.
By this way you can add many connection as per your requirement.
Create/Connect to Remote Project.
- Select “Remote” option from Repository drop-down box.
- Provide valid name in “Name” field, it will be defaulted with “remote”
- Description is optional, but recommend you to provide.
- provide TAC user email address, it try using this email id with your remote repository.
- provide Password against email id, this should be configured/created prior to this step using TAC.
- provide your local machines workspace area ( where do want to create local copy of repository?).
- Web-app url, you need to provide valid repository url wherein repository have been configured, generally you could find it in TAC or ask to your TAC administrator.
- check on “check-URL” to verify repository connection/access.
- Click on “OK” and go back to login window.
- once came back on previous screen you will able to select remote repository.
- Now select Project from “project list”.
- if you don`t see anything then click on “refresh button” it will show you available projects from repository.
- SVN branch, will populated automatic if you different branch address for project then you select it from available branches.
- click on “Open” button, it will show you Studio with available project & Jobs.
- Offline project is available in enterprise addition.
- You don`t need to configure or create, it should be activated from TAC.
- By using this you can work offline when you are not connected to the “Remote” project. it ill check-in all changes to the repository as soon as you get connected.
You are free to ask your questions.
In this post, I will describe you, how to parse xml having nested loops in Talend. For that I am using below XML as source.
Source XML has list of Items from a retail store, and <item> node repeated for each item. Inside <item> node we have nested nodes for <batter> and <topping> our task is to read all the items with nested loops in their separate flow.
Image of source XML.
Sample XMl file with Nested Loops
First of all you should create metadata of your XML file, but in this post I am using XSD to populate source schema in tXMLMap.
Create new job and add tFileinputXML component and configured it as shown in Image.
Image of tFileInputXMl component.
We have set XPATH Loop to “/items” as this is the root node of XML file. then I have created one column with “Document” data type. see in Image.
tXMLMap and connect with tFileinputXML component using “Main” link.
Open tXMLMap right click on “items” node at right(source) side then select “Import from file”.
provide XSD file it will automatically create all sub nodes. like below image.
Set batter & toppiq as “loop” element by right click and select ” As loop element”
Create two outputs, “batter” and “tappinq” then drag respective columns to “batter & tappinq”
Click on “set loop function” box it will open a window wherein you have add new row and select respective loop path e.g. for “toppinq” i will select “toppinq” loop like wise batter for “batter” output.
Your Final settings looks like below image.
we are ready to get the result add tLogRow components for each of the output and execute the job it show results shown in image.
This post will contain Talend Tips & Tricks which saves your time while Coding, Java Converts, If Conditions, Expression Builder and many more area.
- Always do Empty string and Null handling before converting.
!Relational.ISNULL(row1.StartDate) && row1.StartDate!=""?TalendDate.parseDate("yyyy-MM-dd", "2015-02-19"): TalendDate.parseDate("yyyy-MM-dd", "1999-01-01")
- Comparing with String values always use like below.
"Umesh".equals(row1.UserName) or "Umesh".equalsIgnoreCase(row1.UserName) instead of "Umesh"=row1.UserName or row1.UserName.equalsIgnoreCase("Umesh")
- Avoid using Short hand method or class methods for string.
StringHandling.TRIM(string); //for Trim
TalendDate.isDate("yyyy-MM-dd", "2015-02-19") for date validation
Relational.ISNULL(null) function to check whether string is null or not.
- Exclude unwanted routines from build.
Remove or add required routines to the job, because unnecessarily it will be exported with job. you can manage routines by following below steps.
- Right Click on job from Repository.
- Select Option “Setup routine dependencies”.
- Now you will have new window with two tabs one for “user routines” and other one for ” System routines” by using [+] or remove button on each tab you can manage routines.
- see the Image for more details.
If you think I am missing some other areas then do let me know.
If you want to add/load third party libraries in Talend Project, then you can choose any of the solution below.
- Window -> Preferences -> Java -> User Libraries This will include jar files for all the project jobs.
- Use the tLibraryLoad component to load a lib file in a job.
- Use Routines “Edit Routine Libraries” option
- Right click on Routine
- Select Option “Edit Routine Libraries“
- On popup window click on “New” Button.
- Select “Browse a Library file” option.
- Browse and select required Library.
- Click on “if the library file is required” click “Ok”.
- Another ways is use “Module” Tab to download and installed Library.
Talend provides many features for improving performance of Talend Job execution; this post I will describe how to increase Java Heap space.
You can allocate Heap space based on available memory on your machine.
- Using Run Tab.
- Go to the Talend Run Tab and then select “ Advance Setting”
- Select checkbox “Use specific JVM argument” option, it will enable argument Table.
- Click on “New” button to add first parameter for minimum memory allocation by writing “-Xms256M” this code.
- Do the same for to add second parameter for maximum memory allocation by writing “-Xmx1024M” this code.
- See the image for more details.
if you want to modify existing assigned memory size then double click on argument row it will open a pop up to edit there you can modify it.
- Using Preference menu.
- Go to Window menu and select “Preference” option.
- It will open new dialog box with various configuration options.
- Click on “Talend” node it will expand and show you other options.
- Inside “Talend” node click on “Run/Debug” option it will show you various options but we will add only JVM arguments as we did in step one above.
See the image for more details.
These are the two ways of assigning JVM arguments for Java Heap space but if you want to supply JVM argument at run time then you have to modify your .bat or .sh file based on running environment.
After opening .bat or .sh file you can see code like “java -Xms256M -Xmx1024M” this can be modify and saved, once that is done then job will use modified arguments instead default one which we have set on above steps.
This post I will describe you how to exclude files using tFileList component.
Below are our sample files which stored in folder.
From above file list we want to read only files with name starts with “Orders_” and ends with “.csv” therefore we are using tFileList mask to get the file list.
Add tFileList component and configure as follows.
Now you will get all the files from mentioned location but we want to exclude two files which contains “US or USA” so let’s use Advance setting of tFileList and configure as follows.
Here you can see I have use regular expression to exclude files and the regular expression is “Orders_US.*” after running job I will get only one file which I wanted to process here is the output.
If you want to exclude multiple types of file then use comma to separate each pattern like below.
File processing is a day to day task in ETL world, and there is huge need of validation regarding source file format, headers, footers, column name, data type and so on, thanks to tSchemaComplianceCheck component which can do most of the validation like.
- Length checking.
- Date pattern/format.
- Data Types
But does not support number of columns and column sequence validation, that which we have to manage using java code, in this post I will describe you how to validate column names and their sequence.
This is our final job design.
Let’s start with adding first component to job designer. Add tFileList and configure to get expected files.
Add tFileInputFullRow component and configure as shown in below screen.
- Add tMap and connect with main link from tFileinputFullRow component.
- Add tFixedFlowInput and connect with lookup link to tMap then configure as follows.
Note: if you have your refrence header row stored in file or database you can use it instead of tFixedFlowInput.
- Configure tMap as follows.
- Make inner join with your reference line and main line of input.
- Add two outputs and named it as “matching” and “reject” respectively.
- In the reject output click on setting “catch lookup inner join reject”=true
- Add source line to both the flows.
See image for more details.
- Add tJava next to tMap and connect with “matching” flow.
- Add another tJava next to tMap and connect with “reject” flow.
- Add tFileInputDelimited and connect with first tjava using “iterate” flow.
- Configure tFileInputDelimited as shown in below image.
Add tLogRow component to see the output from file.
You can see that for each file whole sub job will be executed if it is matching with header row then it will be used for reading.
You can connect reject row to make a note of rejected file based on your requirement.