Read Multi Schema Positional File

To read a Multi schema Positional file we will use tFileInutMSPositional component, this component gives ability to read multiple schema based on certain columns value.

We are using Invoice file which has Invoice Header & invoice Details records in single file.

Sample file.

Header-Details Multi Schema Sample File
Header-Details Multi Schema Sample File

Invoice Header records start with “H” alphabet.

Invoice Details records start with “T’ alphabet.

follow the steps to read these multi schema records.

  • Create a job and add tFileinputMSPositional component.
  • Configure tFileinputMSPositional component with.
    • provide file Name.
    • row separator as per your file defaulted to “n”
    • “Header Filed Position” this is the most important filed by which component will distinguish records whether it is header or detail. in our case first letter of row which is “H” for header and “T” for detail. so we will add value as “0-1” that means start of record to 1 character long.
    • In “Records” section add two rows name it as “header” for Header records and “details”  for Details record.
    • once you add row it will ask you for schema add Header schema like below.
    • Header Schema
      Header Schema
    • Create details schema as follows.
    • Details Schema
      Details Schema
    •  Now add pattern for “header” as “1,8,6,8,3”.
    • Add pattern for details as “1,8,1,6,1,1,*”.
  • now our configuration completed.
  • Add two tLogRow component and from settings tab select “mode>Table”
  • Connect first tLogRow to the tFileInputMSPositional using “Header” link.
  • Connect Second tLogRow to the tFileInputMSPositional using “Detail” link.
  • Synch tLogRow columns.

Execute the job.  It will show you below output.

Header-Detail Output
Header-Detail Output

 

Read Multi Schema Delimited File

In this post i will describe you how to read/process multi schema delimited file using tFileInutMSDelimited component.

here is our sample input data which has header records then its trailer records.

Invoice Header record start with “H” alphabet and Invoice Detail/trailer record start with “T” Alphabet.

Sample header-trailer records
Sample header-trailer records

we will use tFileInutMSDelimited component to parse these records. so create a job and drag and drop tFileInutMSDelimited component and then double click on it, it will show you another screen for metadata creation.

select our source file, then select column separator & row separator, once that is done then flow the below steps.

  • In preview window select first column which has “H” and “T” values.
  • In Schema grid click on add button and provide following values.
    • Schema=H
    • Record=H
    • Separator=;
  • Click on “OK” you will one record created with provided values.
  • Click on fetch Codes button, it fetch all the visible codes from mention file and creates schema based on column delimiter.
  • Our configuration looks like below image.
tFileInputMSDelimited configuration
tFileInputMSDelimited configuration
  • Now go to the component setting tab and click on first schema which may look like row_A_1 click on it.
  • it will show you schema with default column names.
  • modify column name as follows.
Header Schema details
Header Schema details
  • We are done with Header schema now do the same thing with Trailer schema and modify column names.
Trailer Schema details
Trailer Schema details
  • Add tLogRow and connect to the tFileInputMSDelimited component using “row_A_1” schema.
  • Do the Same thing for “row_B_1” schema.
  • Now Execute the job your output will be like this.
Header-Trailer Output
Header-Trailer Output

you can achieve same thing using other ways, but trust me this is the most simple way.

 

Local, Remote, and Local offline Project

When you start Talend you will get wellcome screen with few option to select, create or import projects, but if you are starting Talend very first time then you should be aware of following project and connection type.

Create/Connect Local Project. 

  • Click on three dot button (…) which take to the next step of creating new project.
  • At left side click on [+] button to add new connection.
  • Select Repository as “local”
  • Provide “Name” for connection like “local”, “LocalTalend”
  • Provide Description if required.
  • write email address and password to the respective filed.
  • Select “workspace” area which is more important because all your jobs and projects will be created in this area.
  • Click on “OK” button. here you have created your first connection as local.

By this way you can add many connection as per your requirement.

Create local project in Talend
Create local project in Talend

Create/Connect to Remote Project. 

  • Select “Remote” option from Repository drop-down box.
  • Provide valid name in “Name” field, it will be defaulted with “remote”
  • Description is optional, but recommend you to provide.
  • provide TAC user email address, it try using this email id with your remote repository.
  • provide Password against email id, this should be configured/created prior to this step using TAC.
  • provide your local machines workspace area ( where do want to create local copy of repository?).
  • Web-app url, you need to provide valid repository url wherein repository have been configured, generally you could find it in TAC or ask to your TAC administrator.
  •  check on “check-URL” to verify repository connection/access.
  • Click on “OK” and go back to login window.
Connecting to the Remote Project Talend
Connecting to the Remote Project Talend
  • once came back on previous screen you will able to select remote repository.
  • Now select Project from “project list”.
  • if you don`t see anything then click on “refresh button” it will show you available projects from repository.
  • SVN branch, will populated automatic if you different branch address for project then you select it from available branches.
  • click on “Open” button, it will show you Studio with available project & Jobs.

Offline Project. 

  • Offline project is available in enterprise addition.
  • You don`t need to configure or create, it should be activated from TAC.
  • By using this you can work offline when you are not connected to the “Remote” project. it ill check-in all changes to the repository as soon as you get connected.

You are free to ask your questions.

Read XML with Nested Loops

In this post, I will describe you, how to parse xml having nested loops in Talend. For that I am using below XML as source.

Source XML has list of Items from a retail store,  and <item> node repeated for each item. Inside <item> node we have nested nodes for <batter> and <topping> our task is to read all the items with nested loops in their separate flow.

Image of source XML.

Sample XMl file with Nested Loops

Sample XMl file with Nested Loops

First of all you should create metadata of your XML file, but in this post I am using XSD to populate source schema in tXMLMap.

Create new job and add tFileinputXML component and configured it as shown in Image.

Image of tFileInputXMl component.

We have set XPATH Loop to “/items” as this is the root node of XML file. then I have created one column with “Document” data type. see in Image.

Set tFileInputXML for Nested loops
Set tFileInputXML for Nested loops

tXMLMap and connect with tFileinputXML component using “Main” link.

Open tXMLMap right click on “items” node at right(source) side then select “Import from file”.

provide XSD file it will automatically create all sub nodes. like below image.

Set batter & toppiq as “loop” element by right click  and select ” As loop element”

Configure tXMlMap
Configure tXMlMap

Create two outputs,  “batter” and “tappinq” then drag respective columns to “batter & tappinq”

Click on “set loop function” box it will open a window wherein you have add new row and select respective loop path e.g. for “toppinq” i will select “toppinq” loop like wise batter for “batter” output.

Your Final settings looks like below image.

Configuration tXMLMap output with loop path
Configuration tXMLMap output with loop path

we are ready to get the result add tLogRow components for each  of the output and execute the job it show results shown in image.

Nested XML loop parse Output
Nested XML loop parse Output

 

Talend Tips & Tricks

This post will contain Talend Tips & Tricks which saves your time while Coding, Java Converts, If Conditions, Expression Builder and many more area.

  • Always do Empty string and Null handling before converting.

!Relational.ISNULL(row1.StartDate) && row1.StartDate!=""?TalendDate.parseDate("yyyy-MM-dd", "2015-02-19"): TalendDate.parseDate("yyyy-MM-dd", "1999-01-01")

  •  Comparing with String values always use like below.

"Umesh".equals(row1.UserName) or "Umesh".equalsIgnoreCase(row1.UserName) instead of  "Umesh"=row1.UserName or row1.UserName.equalsIgnoreCase("Umesh")

  • Avoid using Short hand method or class methods for string.

Use StringHandling.TRIM(string); //for Trim

Use String.valueOf(SomeObject) instead .toString()

Use TalendDate.isDate("yyyy-MM-dd", "2015-02-19") for date validation

Use Relational.ISNULL(null) function to check whether string is null or not.

  • Exclude unwanted routines from build.

Remove or add required routines to the job, because unnecessarily it will be exported with job. you can manage routines by following below steps.

  1. Right Click on job from Repository.
  2. Select Option “Setup routine dependencies”.
  3. Now you will have new window with two tabs one for “user routines” and other one for ” System routines” by using [+] or remove button on each tab you can manage routines.
  4. see the Image for more details.
Exclude routines from Talend Build
Exclude routines from Talend Build

adding more….

If you think I am missing some other areas then do let me know.

How to add third party libraries to Talend project?

If you want to add/load third party libraries in Talend Project, then you can choose any of the solution below.

  • Window -> Preferences -> Java -> User Libraries This will include jar files for all the project jobs.
  • Use the tLibraryLoad component to load a lib file in a job.
  • Use Routines “Edit Routine Libraries” option
    • Right click on Routine
    • Select Option “Edit Routine Libraries
    • On popup window click on “New” Button.
    • Select “Browse a Library file” option.
    • Browse and select required Library.
    • Click on “if the library file is required” click “Ok”.
  • Another ways is use “Module” Tab to download and installed Library.

Set Java Job Upper Heap Memory Limit

Talend provides many features for improving performance of Talend Job execution; this post I will describe how to increase Java Heap space.

You can allocate Heap space based on available memory on your machine.

  1. Using Run Tab.
    1. Go to the Talend Run Tab and then select “ Advance Setting”
    2. Select checkbox “Use specific JVM argument” option, it will enable argument Table.
    3. Click on “New” button to add first parameter for minimum memory allocation by writing “-Xms256M” this code.
    4. Do the same for to add second parameter for maximum memory allocation by writing “-Xmx1024M” this code.
    5. See the image for more details.
Configuration For heapSpace
Configuration For heapSpace

if you want to modify existing assigned memory size then double click on argument row it will open a pop up to edit there you can modify it.

  1. Using Preference menu.
    1. Go to Window menu and select “Preference” option.
    2. It will open new dialog box with various configuration options.
    3. Click on “Talend” node it will expand and show you other options.
    4. Inside “Talend” node click on “Run/Debug” option it will show you various options but we will add only JVM arguments as we did in step one above.

See the image for more details.

Configuration For heapSpace
Configuration For heapSpace

These are the two ways of assigning JVM arguments for Java Heap space but if you want to supply JVM argument at run time then you have to modify your .bat or .sh file based on running environment.

After opening .bat or .sh file you can see code like “java -Xms256M -Xmx1024M”  this can be modify and saved, once that is done then job will use modified arguments instead default one which we have set on above steps.

tFileList Exclude Mask

This post I will describe you how to exclude files using tFileList component.

Below are our sample files which stored in folder.

Sample Files
Sample Files

From above file list we want to read only files with name starts with “Orders_” and ends with “.csv” therefore we are using tFileList mask to get the file list.

Add tFileList component and configure as follows.

tFileList1 Configuration

tFileList1 Configuration

Now you will get all the files from mentioned location but we want to exclude two files which contains “US or USA” so let’s use Advance setting of tFileList and configure as follows.

tFileList configuration
tFileList configuration

Here you can see I have use regular expression to exclude files and the regular expression is “Orders_US.*” after running job I will get only one file which I wanted to process here is the output.

tFileList Exclude Mask Output
tFileList Exclude Mask Output

If you want to exclude multiple types of file then use comma to separate each pattern like below.

“(Orders_US.*),(Orders_UAE.*)”

Validate CSV headers in Talend.

File processing is a day to day task in ETL world, and there is huge need of validation regarding source file format, headers, footers, column name, data type and so on, thanks to tSchemaComplianceCheck component which can do most of the validation like.

  • Length checking.
  • Date pattern/format.
  • Data Types

But does not support number of columns and column sequence validation, that which we have to manage using java code, in this post I will describe you how to validate column names and their sequence.

This is our final job design.

Complete CSV Validation Job
Complete CSV Validation Job

Let’s start with adding first component to job designer. Add tFileList and configure to get expected files.

Add tFileInputFullRow component and configure as shown in below screen.

tFileInputFullRow Configuration

tFileInputFullRow Configuration

  • Add tMap and connect with main link from tFileinputFullRow component.
  • Add tFixedFlowInput and connect with lookup link to tMap then configure as follows.

Note: if you have your refrence header row stored in file or database you can use it instead of tFixedFlowInput.

tFixedFlowInput Configuration
tFixedFlowInput Configuration
  • Configure tMap as follows.
    • Make inner join with your reference line and main line of input.
    • Add two outputs and named it as “matching” and “reject” respectively.
    • In the reject output click on setting “catch lookup inner join reject”=true
    • Add source line to both the flows.

See image for more details.

tMap Setting
tMap Setting
  • Add tJava next to tMap and connect with “matching” flow.
  • Add another tJava next to tMap and connect with “reject” flow.
  • Add tFileInputDelimited and connect with first tjava using “iterate” flow.
  • Configure tFileInputDelimited as shown in below image.

Add tLogRow component to see the output from file.

tFileInputDelimited Configuration
tFileInputDelimited Configuration

You can see that for each file whole sub job will be executed if it is matching with header row then it will be used for reading.

You can connect reject row to make a note of rejected file based on your requirement.

Create File Name with Date and TimeStamp

This post I will describe you how to name a file with Timestamps in Talend. File name format depends upon your business requirement, for example your business requirement is to name file with time stamp like “Order_yyyyMMdd_hhmmss.dat” so it will have time stamp up to seconds hence you want to get the same file for reading or any other purpose you will not find out easily therefore you can maintain file names in variable to access it later.

In our scenario we will create a file with above name format and then same file will be used to read and display result after reading will use same file copy to the same day folder name.

This is our final job design.

 

File Name With Time Stamp
File Name With Time Stamp
  • Create context variable context.FileName as string to hold the file name.
  • Add tJava and write below code in it.

context.FileName=TalendDate.getDate(“yyyyMMdd_hhmmss”);

  • Now add tRowGenerator to generate dummy data ( you can use your source e.g. database, file or anything) and link with tJava using “OnSubJobOk”.
  • Add tFileOutputDeimited component and connect with tRowGenerator using main flow then configure as shown in below image.
Add File Name for Time Stamp
Add File Name for Time Stamp

As you can see we have assign file name from context variable, in the same way you can add dynamic file name like “D:/Orders”+TalendDate.getDate(“yyyyMMdd_hhmmss”)+”.csv” this.

Our source file created with expected file name that is “Orders_20150129_134310.csv”, now we want to read same file so follow below steps.

  • Add tFileInputDelimited and connect with tRowGenerator using “OnSubJobOk” link.
  • Configure tFileInputDelimited component as shown in below image.
tFileInputDelimited
tFileInputDelimited configuration & setting

You can observe that we are using file name in same way we did for file creation, because we don`t know how long file creation will take if it exceed in more than one second then you will miss the file name previously created to avoid that we are storing file name in context.FileName variable.

Now you have read the file and want to copy it to some other location, but it should be stored in folder with today’s date when it was created to do so,

  • Add tFileCopy component and link with tFileInputDelimited using “OnSubJobOk” link.
  • Configure tFileCopy component as shown in image.
tFileCopy Configuration
tFileCopy Configuration

You can notice that we are using same file name as we are using for rest of the component above. And the only change here is we are creating directory with Dynamic name by using “D:/”+TalendDate.getDate(“yyyyMMdd”) this code. This component allows us to create directory if it is not exist with provided name.

We have moved entire file from source location to dynamically created folder. Same you can use dynamic file or directory name as per business need.