Category Archives: Configuration

This is a generic category for configuration of any component, tool, or database.

Talend DI Interview Questions

If you have found your way here, this post was worth writing. 🙂

Whenever I go for an interview there are some new questions, so I thought: why not collect all of them in a single place?

It is simply an attempt to remember all the Talend interview questions, nothing else.

  1. Difference between the tMap and tJoin components in Talend.
  2. Difference between tAggregateRow and tAggregateSortedRow.
  3. Difference between tJava, tJavaRow, and tJavaFlex.
  4. How do you improve the performance of a Talend job with a complex design?
  5. Difference between a built-in schema and a Repository schema.
  6. What is the declaration of a method defined in a system routine?
  7. What are the Xms and Xmx parameters in Talend?
  8. How do you resolve a heap space issue in Talend?
  9. How do you handle exceptions in Talend?
  10. What is the default join type for tMap?
  11. What lookup patterns are available in Talend?
  12. What is the basic requirement when updating a particular table?
  13. How do you generate a surrogate key using Talend?
  14. What is the Expression Editor used for in Talend?
  15. How do you debug a particular Talend job?
  16. What are context variables and context groups?
  17. How do you pass variables from a parent job to a child job, and from a child job back to the parent?
  18. How do you forcefully exit a job?
  19. Explain the use of tContextLoad.
  20. How do you execute multiple queries using Talend?
  21. How do you use multithreading when executing a job?
  22. What is a hashmap in Talend and how do you use it?
  23. How do you do a full join in Talend? Explain the steps.
  24. How do you do a right outer join in Talend? Explain the steps.
  25. How do ELT database components differ from ETL database components?
  26. How do you use external libraries in Talend?
  27. How do you pass data from a parent job to child jobs through the tRunJob component?
  28. How do you load context variables dynamically?
  29. How do you share a DB connection in Talend?
  30. How do you skip header and footer rows before loading?
  31. What is an incremental load? Describe it using Talend.
  32. How can you pass a value from a parent job to a child job in Talend?
  33. How do you call a stored procedure and a function in a Talend job?

If you have a question that is not listed here, please send it to me and I will add it to the list with your reference.

Local, Remote, and Local offline Project

When you start Talend you get a welcome screen with a few options to select, create, or import projects. If you are starting Talend for the very first time, you should be aware of the following project and connection types.

Create/Connect Local Project. 

  • Click the three-dot button (…), which takes you to the next step of creating a new project.
  • On the left side, click the [+] button to add a new connection.
  • Select “local” as the Repository.
  • Provide a “Name” for the connection, such as “local” or “LocalTalend”.
  • Provide a Description if required.
  • Enter your email address and password in the respective fields.
  • Select the “workspace” location. This is the most important setting, because all your jobs and projects will be created there.
  • Click the “OK” button. You have now created your first local connection.

In this way you can add as many connections as you need.

Create local project in Talend


Create/Connect to Remote Project. 

  • Select the “Remote” option from the Repository drop-down box.
  • Provide a valid name in the “Name” field; it defaults to “remote”.
  • Description is optional, but providing one is recommended.
  • Provide the TAC user's email address; Studio uses this account to access the remote repository.
  • Provide the password for that email address. The user must be configured/created in TAC before this step.
  • Provide the workspace location on your local machine (where do you want to create the local copy of the repository?).
  • Web-app URL: provide the valid URL where the repository has been configured. You can usually find it in TAC, or ask your TAC administrator.
  • Click “check-URL” to verify the repository connection/access.
  • Click “OK” to return to the login window.
Connecting to the Remote Project Talend


  • Once you are back on the previous screen, you will be able to select the remote repository.
  • Now select a project from the “project list”.
  • If you don't see anything, click the “refresh” button; it will show the projects available in the repository.
  • The SVN branch is populated automatically. If your project uses a different branch, select it from the available branches.
  • Click the “Open” button; Studio opens with the available project and jobs.

Offline Project. 

  • Offline projects are available in the Enterprise edition.
  • You don't need to configure or create one; it must be activated from TAC.
  • With it you can work offline when you are not connected to the “Remote” project. All changes are checked in to the repository as soon as you reconnect.

Feel free to ask your questions.

Talend Tips & Tricks

This post collects Talend tips and tricks that save you time when coding: Java conversions, if conditions, the Expression Builder, and more.

  • Always handle empty strings and nulls before converting.

!Relational.ISNULL(row1.StartDate) && !"".equals(row1.StartDate) ? TalendDate.parseDate("yyyy-MM-dd", row1.StartDate) : TalendDate.parseDate("yyyy-MM-dd", "1999-01-01")
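Outside Studio the TalendDate and Relational routines are not on the classpath, so the standalone sketch below reproduces the same null/empty guard with plain java.text.SimpleDateFormat. SafeDateParse and parseOrDefault are hypothetical names used only for illustration; inside a tMap expression you would keep using the routine calls shown above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class SafeDateParse {
    // Returns the parsed value, or the parsed fallback when the value is null or empty.
    static Date parseOrDefault(String value, String pattern, String fallback) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setLenient(false); // reject impossible dates such as 2015-02-30
        // Test for null before calling isEmpty(); otherwise a null value throws NullPointerException.
        String input = (value == null || value.isEmpty()) ? fallback : value;
        try {
            return fmt.parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + pattern + " date: " + input, e);
        }
    }

    public static void main(String[] args) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");
        System.out.println(out.format(parseOrDefault("2015-02-19", "yyyy-MM-dd", "1999-01-01"))); // 2015-02-19
        System.out.println(out.format(parseOrDefault("", "yyyy-MM-dd", "1999-01-01")));           // 1999-01-01
        System.out.println(out.format(parseOrDefault(null, "yyyy-MM-dd", "1999-01-01")));         // 1999-01-01
    }
}
```

The order of the two checks matters: the null test must come first, exactly as in the tMap expression.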

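  • When comparing String values, always call the method on the literal:

Use "Umesh".equals(row1.UserName) or "Umesh".equalsIgnoreCase(row1.UserName) instead of "Umesh"==row1.UserName or row1.UserName.equalsIgnoreCase("Umesh"). The == operator compares object references rather than text, and calling a method on row1.UserName throws a NullPointerException when it is null.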

  • Avoid calling instance methods directly on values that may be null; prefer the null-safe routine or static methods.

Use StringHandling.TRIM(string); //for Trim

Use String.valueOf(SomeObject) instead of SomeObject.toString()

Use TalendDate.isDate("yyyy-MM-dd", "2015-02-19") for date validation

Use Relational.ISNULL(value) to check whether a value is null.
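The standalone sketch below uses plain Java (the Talend routines are only available inside Studio) to show why the literal-first and null-safe forms matter. sameText and trimOrNull are hypothetical helpers, not Talend routines; new String("Umesh") stands in for a value read from a file or database, which is normally a different object from the literal.

```java
class NullSafeStrings {
    // Literal-first comparison: returns false instead of throwing when value is null.
    static boolean sameText(String literal, String value) {
        return literal.equals(value);
    }

    // Hypothetical helper: trims, but passes null through instead of throwing.
    static String trimOrNull(String s) {
        return s == null ? null : s.trim();
    }

    public static void main(String[] args) {
        // Same text as the literal, but a different object.
        String userName = new String("Umesh");
        String missing = null;
        Object amount = null;

        System.out.println("Umesh" == userName);                  // false: == compares references
        System.out.println(sameText("Umesh", userName));           // true: equals compares characters
        System.out.println(sameText("Umesh", missing));            // false, no exception
        System.out.println(String.valueOf(amount));                // prints the text "null", no exception
        System.out.println("[" + trimOrNull("  Umesh  ") + "]");   // [Umesh]

        try {
            missing.equalsIgnoreCase("Umesh");                     // method call on a null reference
        } catch (NullPointerException e) {
            System.out.println("NPE from missing.equalsIgnoreCase(...)");
        }
        try {
            amount.toString();                                     // same problem with toString()
        } catch (NullPointerException e) {
            System.out.println("NPE from amount.toString()");
        }
    }
}
```

Note that String.valueOf on a null object returns the four-character text "null", which can end up in a target column; if you need an empty string instead, check for null explicitly.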

  • Exclude unneeded routines from the build.

Remove routines the job does not need and add only the required ones; otherwise every routine is exported with the job. You can manage routines with the following steps.

  1. Right-click the job in the Repository.
  2. Select the option “Setup routine dependencies”.
  3. A new window opens with two tabs, one for “User routines” and one for “System routines”. Use the [+] and remove buttons on each tab to manage routines.
  4. See the image for more details.
Exclude routines from Talend Build


More tips coming….

If you think I am missing other areas, let me know.

How to add third party libraries to Talend project?

If you want to add or load third-party libraries in a Talend project, you can use any of the solutions below.

  • Window -> Preferences -> Java -> User Libraries: this makes the JAR files available to all jobs in the project.
  • Use the tLibraryLoad component to load a lib file in a job.
  • Use the routine option “Edit Routine Libraries”:
    • Right-click the routine.
    • Select the option “Edit Routine Libraries”.
    • In the pop-up window, click the “New” button.
    • Select the “Browse a Library file” option.
    • Browse to and select the required library.
    • Select “if the library file is required” and click “OK”.
  • Another way is to use the “Modules” tab to download and install the library.

Set Java Job Upper Heap Memory Limit

Talend provides many features for improving job performance; in this post I describe how to increase the Java heap space.

You can allocate heap space based on the memory available on your machine.

  1. Using the Run tab.
    1. Go to the Talend Run tab and select “Advanced settings”.
    2. Select the checkbox “Use specific JVM arguments”; this enables the argument table.
    3. Click the “New” button and add the first parameter, for minimum memory allocation: “-Xms256M”.
    4. Do the same to add the second parameter, for maximum memory allocation: “-Xmx1024M”.
    5. See the image for more details.
Configuration For heapSpace


If you want to modify an existing memory size, double-click the argument row; a pop-up opens where you can edit it.

  2. Using the Preferences menu.
    1. Go to the Window menu and select “Preferences”.
    2. A dialog box opens with various configuration options.
    3. Click the “Talend” node; it expands to show further options.
    4. Inside the “Talend” node, click “Run/Debug”. Of the options shown, we only add the JVM arguments, as in the first method above.

See the image for more details.

Configuration For heapSpace


These are the two ways of assigning JVM arguments for Java heap space in Studio. If you want to supply JVM arguments at run time for an exported job, modify its .bat or .sh file, depending on the environment it runs in.

In the .bat or .sh file you will see code like “java -Xms256M -Xmx1024M”. Modify and save it; the job will then use the modified arguments instead of the defaults set in the steps above.
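To confirm which limits a JVM actually received, you can print the heap figures reported by java.lang.Runtime. The sketch below is a standalone class (HeapCheck is a hypothetical name); the same Runtime calls should also work in a tJava component if you want to log them from inside a job.

```java
class HeapCheck {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to, governed by -Xmx
        // (often reported slightly below the -Xmx value, depending on the garbage collector).
        System.out.println("Max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM; starts around -Xms and grows as needed.
        System.out.println("Total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): the unused part of totalMemory().
        System.out.println("Free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it as java -Xms256M -Xmx1024M HeapCheck should report a max heap close to 1024 MiB.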

tFileList Exclude Mask

In this post I describe how to exclude files using the tFileList component.

Below are the sample files stored in the folder.

Sample Files


From the file list above we want to read only files whose names start with “Orders_” and end with “.csv”, so we use a tFileList file mask to get the file list.

Add a tFileList component and configure it as follows.

tFileList1 Configuration


Now you will get all matching files from the given location, but we want to exclude the two files whose names contain “US” or “USA”, so let's use the Advanced settings of tFileList, configured as follows.

tFileList configuration


Here I have used a regular expression to exclude files: “Orders_US.*”. Because it matches every name starting with “Orders_US”, it covers both the US and the USA files. After running the job I get only the one file I wanted to process; here is the output.

tFileList Exclude Mask Output


If you want to exclude multiple file patterns, separate each pattern with a comma, as below.

"(Orders_US.*),(Orders_UAE.*)"
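The sketch below approximates in plain Java what this configuration does: keep names that fully match an include pattern, then drop any that match one of the comma-separated exclude patterns. It uses java.util.regex full matches; Talend's own matching (glob or regex mode, case sensitivity) may differ. The file names and the FileMaskFilter class are hypothetical, since the sample folder is only shown as an image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FileMaskFilter {
    // Keeps names that match the include regex and none of the comma-separated exclude regexes.
    static List<String> filter(List<String> names, String include, String excludeMasks) {
        Pattern in = Pattern.compile(include);
        List<Pattern> excludes = new ArrayList<>();
        for (String mask : excludeMasks.split(",")) {
            excludes.add(Pattern.compile(mask.trim()));
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!in.matcher(name).matches()) {
                continue; // not selected by the include mask at all
            }
            boolean excluded = false;
            for (Pattern p : excludes) {
                if (p.matcher(name).matches()) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Orders_US_2015.csv", "Orders_USA_2015.csv", "Orders_UAE_2015.csv",
                "Orders_UK_2015.csv", "Orders_UK_2015.txt", "Customers_2015.csv");
        System.out.println(filter(names, "Orders_.*\\.csv", "(Orders_US.*),(Orders_UAE.*)"));
        // prints [Orders_UK_2015.csv]
    }
}
```

Splitting on a plain comma keeps the sketch simple, but it means an exclude pattern cannot itself contain a comma (for example a quantifier such as {1,3}).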

Useful Configuration and Log Variables

Talend has various global variables that can be used for logging and configuration purposes. I have listed them below; you can use them as they are in expressions or in custom code components.

  • pid
  • rootPid
  • fatherPid
  • clientHost
  • defaultClientHost
  • contextStr
  • startTime
  • isChildJob

The variable names themselves suggest what each one is used for.

You can use them like this:

context.ProcessID=pid;

 

Set/change Workspace default location

To set the workspace location, edit the “config.ini” file in the “configuration” folder in the root of the Talend installation folder.
In this file you can use the variable “osgi.instance.area” or “osgi.instance.area.default” to specify your workspace location.
Follow the steps below to complete this activity.

Step 1. Go to the Talend installation directory, then to the configuration folder, e.g. C:\Talend\TOS_DI-Win32-……..\configuration.

Step 2. Open config.ini file.

Step 3. Add the following line at the top of the file.

osgi.instance.area=<your workspace path>

e.g.

osgi.instance.area=C:/Talend/TOS_DI-Win32…../workspace

Step 4. Save the config.ini file and keep it open.

Step 5. Restart Talend. Once you see the first window, remove the line added above, then save and close config.ini.

 

Syntax error on tokens, delete these tokens

You may face this error when you start developing with Talend. It tends to disappear once you get hands-on experience, because it is caused by typos or by misconfiguration while propagating a schema from one component to another. If you follow a few simple rules in your working environment, you will not face it any more.

  • All strings set in any component must be enclosed in double quotes, for example:
    • File name and paths
    • line feed string
    • column separator
    • Host names
    • URL
    • Table name
    • Database name and many more
  • If a component uses a schema, make sure the same schema is assigned to the next component; otherwise the generated code fails with this error. Typical causes:
    • tJavaRow is the most common culprit, because its schema differs from the column list used in its generated code.
    • Variables are used where column names are expected.
    • An empty row (with no column name) is left in the schema.

If you follow these rules you will not see this error again.

How to solve “GC overhead limit exceeded” error

As the name suggests, the JVM keeps trying to reclaim unused objects but fails: the code produced by the Talend code generator creates too many objects too fast, and the standard Java garbage collector (at least on Java 1.6) cannot keep up. The error is raised when the JVM spends almost all of its time in garbage collection while recovering very little memory.

This error may occur while compiling a job or while running it, so there are two ways to fix it.

For the run-time solution:

  • Open the Run tab.
  • Click the Advanced settings tab.
  • Double-click the -Xms argument and increase the size, e.g. -Xms2G.
  • Double-click the -Xmx argument and increase the size, e.g. -Xmx4G.

Note that the JVM parameters used to run jobs are separate from the Studio's own JVM startup parameters.

For the compile-time case, add or customize the JVM parameters in the .ini file that matches the Studio binary in your <TIS Install> directory. The .ini files affect the Studio (including compilation of jobs) but not the running of jobs.
For example, if you run TOS_DI-win-x86_64.exe, you need to modify TOS_DI-win-x86_64.ini.

These two fixes should resolve the “GC overhead limit exceeded” error.