Monthly Archives: March 2015
In this post we will generate sample data for later use. In the ETL world, you need data to test a component, but getting suitable sample data is often difficult.
To generate sample data we will use the tRowGenerator component, which has a built-in editor where you can select functions or write your own expressions to get the sample data you expect.
Step 1: Start typing “trow…” on the Talend designer canvas. It will show a list of components; select tRowGenerator from the list.
Note: This is a newer Talend feature, so you don’t need to search for the component in the palette and then drag and drop it.
See the picture.
Step 2: Double-click the component and configure tRowGenerator using its editor.
- Click the [+] sign to add a new column and name it “name”.
- Select a function from the “Function” tab on the same columns grid.
- Select the “TalendDataGenerator.getFirstName” function from the function list.
- Add the following columns and select the relevant function for each, as we did previously.
- City = TalendDataGenerator.getUsCity
- Now we have 4 columns, but we need one more column for the identity number, so add a column “ID” with the “integer” data type.
- In the Function tab select “…” (three dots). You will see the function parameters window with a single row, below the columns grid.
- There are three tabs: the first one, “Parameter”, is fixed with no edit option; the second one is for “value” and the last one is for “comment”.
- Click on the value tab; it will show “…” dots. Click on them to open the expression builder for editing. You can add your custom logic here.
- Select the “Numeric” routine, then select “sequence” and keep the default values.
- In the “Number of Rows for RowGenerator” text box enter 10 (we need only ten rows to be generated).
- Click the preview button in the window below; it will show the generated sample data, which will look like the image below.
For demonstration we have generated only ten rows, but you are free to generate as many rows as you require.
Step 3: Add a tLogRow and connect it to tRowGenerator using a main flow.
Step 4: Run the job; it will show the result below.
If you want to write this data to a file or database, use the appropriate component, e.g. tFileOutputDelimited to store it in a delimited file.
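To make the idea concrete outside Talend Studio, here is a minimal plain-Java sketch of what the job above produces: a sequential ID plus a randomly picked name and city, written as semicolon-delimited rows. It does not call the Talend routines; the class name, the value pools and the `;` delimiter are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SampleRowGenerator {
    // Hypothetical value pools standing in for Talend's built-in generator lists.
    static final String[] FIRST_NAMES = {"Andrew", "Grace", "Harry", "Martha", "Thomas"};
    static final String[] US_CITIES = {"Albany", "Boston", "Denver", "Phoenix", "Salem"};

    // Builds `count` delimited rows: a sequential ID starting at 1,
    // followed by a randomly chosen name and city.
    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);
        List<String> rows = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            String name = FIRST_NAMES[random.nextInt(FIRST_NAMES.length)];
            String city = US_CITIES[random.nextInt(US_CITIES.length)];
            rows.add(id + ";" + name + ";" + city);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Print ten rows to the console, much like tLogRow would.
        for (String row : generate(10, 42L)) {
            System.out.println(row);
        }
    }
}
```

The fixed seed only keeps the output repeatable between runs; drop it if you want different data each time.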
In this post I will describe how to get the most recent file from a directory based on a display date, that is, a date taken from the file name.
We have the sample files below in our directory, and every file has a date in its name. Based on that date we will decide which file is the most recent, rather than using the file created or modified date.
As you can see, we have a list of three files.
- sales element 11 2014.xls was modified on 28-01-2015
- sales element 02 2015.xls was modified on 28-01-2015
- sales element 12 2014.xls was modified on 03-03-2015
If we use the file created or modified date to get the most recent file, we will get “sales element 12 2014.xls”, which is the wrong file.
To get the latest file from the directory we will use the steps below.
Step 1: Add a tFileList component and configure it to get all .xls files from the directory. See the image for details.
Step 2: Add a tFileProperties component and connect it to tFileList using an Iterate link, then provide the file path and name from the global variable, which looks like this: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
Step 3: Add a tMap after tFileProperties, connect it with a main link and apply the following settings in it.
- Create an output named “FileList”.
- Add all the source columns to this output.
- Add a new variable in tMap using variable creation, and write this code in it.
- Create a new column in the output with the name “DisplayDate” and the data type Date.
- Add the code below in it.
TalendDate.parseDate("MM yyyy", Var.var1)
- See the image for more details.
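The tMap expression above expects Var.var1 to hold the “MM yyyy” part of the file name. As a rough illustration in plain Java, the sketch below pulls that part out of a name and parses it. The regular expression, the class and the method names are assumptions for this example (the post does not show the variable code), and SimpleDateFormat stands in for TalendDate.parseDate.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DisplayDateParser {
    // Assumed naming convention: "<anything> MM yyyy.xls",
    // e.g. "sales element 11 2014.xls".
    private static final Pattern MONTH_YEAR = Pattern.compile("(\\d{2} \\d{4})\\.xls$");

    // Plays the role of the tMap variable: the "MM yyyy" part of the name.
    static String extractMonthYear(String fileName) {
        Matcher matcher = MONTH_YEAR.matcher(fileName);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No 'MM yyyy' suffix in: " + fileName);
        }
        return matcher.group(1);
    }

    // Plain-Java stand-in for parsing the extracted text with the "MM yyyy" pattern.
    static Date toDisplayDate(String fileName) {
        try {
            return new SimpleDateFormat("MM yyyy").parse(extractMonthYear(fileName));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date in: " + fileName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractMonthYear("sales element 02 2015.xls"));
        System.out.println(toDisplayDate("sales element 02 2015.xls"));
    }
}
```

Because only month and year are given, the parsed date falls on the first day of that month, which is enough for comparing files.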
Step 4: Add a tHashOutput component after tMap and connect it with a main link.
Step 5: Add a tHashInput component below tFileList and link it using an “OnSubJobOk” trigger.
Step 6: Copy the schema from tHashOutput to tHashInput.
Step 7: Add a tAggregateRow component and connect it to tHashInput using a main flow link. Apply the basic settings as shown below.
Step 8: Add a tLogRow to check the result. You will see the output as below.
Step 9: Your job design should look like the image below.
Note: You can avoid the tHash***** components: just use tAggregateRow directly after tMap with the same settings, and it will work.
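The whole job boils down to “parse the date in each name, then take the maximum”. The following plain-Java sketch shows that logic on the three sample file names; it is not Talend-generated code, and the class name, the method names and the use of java.time.YearMonth are assumptions chosen for brevity.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LatestFileByName {
    private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MM yyyy");

    // Assumes the name ends with "MM yyyy" right before the extension.
    static YearMonth displayDate(String fileName) {
        String base = fileName.substring(0, fileName.lastIndexOf('.'));
        return YearMonth.parse(base.substring(base.length() - 7), MONTH_YEAR);
    }

    // Equivalent in spirit to aggregating with "max" on the DisplayDate column.
    static String latest(List<String> fileNames) {
        return fileNames.stream()
                .max(Comparator.comparing(LatestFileByName::displayDate))
                .orElseThrow(() -> new IllegalArgumentException("No files given"));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "sales element 11 2014.xls",
                "sales element 02 2015.xls",
                "sales element 12 2014.xls");
        System.out.println(latest(files)); // prints "sales element 02 2015.xls"
    }
}
```

Note that the modified dates never enter the comparison, which is why the file touched last (12 2014) is not the one selected.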
I have listed some of the common differences between the tJava, tJavaRow and tJavaFlex components.
|Feature|tJava|tJavaRow|tJavaFlex|
|---|---|---|---|
|Use component to integrate your custom Java code|Yes|Yes|Yes|
|It will be executed first but only once in the subjob|Yes|No|–|
|It requires an input flow|No|Yes|No|
|It requires an output flow|No|If output schema defined|If output schema defined|
|It can be used as the start of the job|Yes|No|Yes|
|It can be used as a separate subjob|Yes|No|Yes|
|It accepts main flow or iterate flow|Both|Only main|Both|
|It has three Java code parts (start, main, end)|No|No|Yes|
|It will auto-propagate data|No|No|Yes|
tJava advantage: this component can be used as a trigger component, at the start of the job or at the end of the job.
tJavaRow: this component requires a main flow, so it can be used at the end of a subjob but not at the start of a subjob.
tJavaFlex: this component combines the capabilities of tJava and tJavaRow. You can use it for row generation, at the start of a job, at the end of a subjob, or as an individual subjob. It also gives you the ability to auto-propagate data.
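To picture the three code parts of tJavaFlex, think of an ordinary loop: the start part runs once before the first row, the main part runs for every row, and the end part runs once after the last row. The sketch below is plain Java written for illustration, not code generated by Talend; the class name and the log format are made up.

```java
import java.util.Arrays;
import java.util.List;

public class ThreePartCodeSketch {
    static String run(List<String> rows) {
        // Start part: runs once, before the first row (initialise state here).
        StringBuilder log = new StringBuilder("start;");
        int count = 0;

        for (String row : rows) {
            // Main part: runs once for every incoming row.
            log.append(row.toUpperCase()).append(';');
            count++;
        }

        // End part: runs once, after the last row (report or clean up here).
        log.append("end:").append(count);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b"))); // prints "start;A;B;end:2"
    }
}
```

In the same picture, tJavaRow corresponds to the loop body only, and tJava to a block that runs once with no rows at all.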
If you have come across this page, then this post was worth writing. 🙂
Whenever I go for an interview there are some new questions, so I thought, why not collect all these questions in a single place?
It is just an attempt to remember all the Talend interview questions, nothing else.
- Difference between the tMap and tJoin components in Talend.
- Difference between tAggregateRow and tAggregateSortedRow.
- Difference between tJava, tJavaRow and tJavaFlex.
- How to improve the performance of a Talend job with a complex design?
- Difference between a built-in schema and a repository schema.
- What is the declaration of a method that we define in a system routine?
- What are the Xms and Xmx parameters in Talend?
- How to resolve a heap space issue in Talend?
- How to do exception handling in Talend?
- What is the default join for tMap?
- What are the different lookup patterns available in Talend?
- What is the basic requirement when updating a particular table?
- How to generate a surrogate key using Talend?
- What is the use of the expression editor in Talend?
- How to debug a particular Talend job?
- What are a context variable and a context group?
- How to pass variables from a parent job to a child job and from a child job to the parent?
- How to forcefully exit a job?
- Explain the use of tContextLoad.
- How to execute multiple queries using Talend?
- How to use multithreading while executing a job?
- What is a hashmap in Talend and how do you use it?
- How to do a full join in Talend? Explain the steps.
- How to do a right outer join in Talend? Explain the steps.
- How do the ELT database components differ from the ETL database components?
- How to use external libraries in Talend?
- How to pass data from a parent job to child jobs through the tRunJob component?
- How to load context variables dynamically?
- How to share a DB connection in Talend?
- How to skip header rows and footer rows before loading?
- What is an incremental load? Describe it using Talend.
- How can you pass a value from a parent job to a child job in Talend?
- How to call a stored procedure or function in a Talend job?
If you have any question that is not listed here, please send it to me and I will add it to the list with a reference to you.