Blog Archives

Generate sample data in Talend

In this post we will generate sample data for later use. In the ETL world you need data to test components, but getting suitable sample data is often difficult.

To generate sample data we will use the tRowGenerator component, which has a built-in editor where you can select functions or write your own expressions to get the sample data you expect.

Step 1: Start typing “trow…” on the Talend designer canvas. It will show you a list of components; select the tRowGenerator component from the list.

Note: This is a new feature in Talend, so you don't need to search for the component in the palette and then drag and drop it.

See the picture below.

Adding a component by typing on the Talend Designer canvas


Step 2: Double-click the component and configure tRowGenerator using its editor.

  • Click the [+] sign to add a new column and name it “name”.
  • Select a function from the “Function” tab of the same columns grid.
  • Select the “TalendDataGenerator.getFirstName” function from the function list.
  • Add the following columns and select the relevant function for each, as we did previously.
    • City = TalendDataGenerator.getUsCity
    • State = TalendDataGenerator.getUsState
    • Street = TalendDataGenerator.getUsStreet
  • Now we have 4 columns, but we need one more column for an identity number, so add a column “ID” with the “integer” data type.
  • In the “Function” tab select “…” (three dots). You will see the function parameters window with a single row, below the columns grid.
  • There are three tabs: the first one, “Parameter”, is fixed with no edit option, the second one is for “value” and the last one is for “comment”.
  • Click on the value tab; it will show “…”. Click on it to open the expression builder for editing. You can add your custom logic here.
  • Select the “Numeric” routine, then select “sequence” and keep the default values.
  • In the “Number of Rows for RowGenerator” text box enter 10 (we need only ten rows to be generated).
  • Click the preview button in the lower window. It will show the generated sample data, which will look like the image below.
tRowGenerator setting


For demonstration we have generated only ten rows, but you are free to generate as many rows as you require.
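
For readers who want to see the same idea outside the Talend editor, here is a minimal plain-Java sketch of what the configuration above produces: four randomly picked text columns plus a sequential ID. The class name SampleRowSketch, the method names and the small hard-coded value lists are assumptions made for illustration; they are not Talend routines or Talend's built-in data, and the sequence is assumed to start at 1 with a step of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the tRowGenerator setup described above; not Talend code.
class SampleRowSketch {
    // Tiny illustrative value lists (assumed, not Talend's built-in data).
    static final String[] NAMES = {"Andrew", "Lyndon", "Grover", "Martin"};
    static final String[] CITIES = {"Austin", "Denver", "Boston", "Salem"};
    static final String[] STATES = {"Texas", "Colorado", "Massachusetts", "Oregon"};
    static final String[] STREETS = {"Main Street", "Oak Avenue", "Lake Road", "Hill Lane"};

    // Picks one random element from a list of candidate values.
    static String pick(String[] values, Random random) {
        return values[random.nextInt(values.length)];
    }

    // Builds rowCount rows in the column order name|City|State|Street|ID.
    static List<String> generate(int rowCount, long seed) {
        Random random = new Random(seed);
        List<String> rows = new ArrayList<>();
        for (int id = 1; id <= rowCount; id++) { // sequential ID: 1, 2, 3, ...
            rows.add(pick(NAMES, random) + "|" + pick(CITIES, random) + "|"
                    + pick(STATES, random) + "|" + pick(STREETS, random) + "|" + id);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Ten rows, matching the "Number of Rows for RowGenerator" value used above.
        for (String row : generate(10, 42L)) {
            System.out.println(row);
        }
    }
}
```

Changing the first argument of generate plays the same role as changing the row count in the editor.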

Step 3: Add tLogRow and connect it to tRowGenerator using a main flow.

Step 4: Run the job. It will show the result below.

tRowGenerator output sample data


If you want to write this data to a file or a database, use the appropriate component, e.g. tFileOutputDelimited to store it in a delimited file.
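
To make "delimited file" concrete, the following plain-Java sketch joins the fields of each row with a separator and writes one row per line. It only illustrates the file layout and is not how tFileOutputDelimited is implemented; the class name DelimitedWriterSketch, the ";" separator and the temporary file are assumptions for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a delimited file layout; not Talend code.
class DelimitedWriterSketch {
    // Turns each row (an array of fields) into one delimited line.
    static List<String> toDelimitedLines(List<String[]> rows, String delimiter) {
        List<String> lines = new ArrayList<>();
        for (String[] row : rows) {
            lines.add(String.join(delimiter, row));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Andrew", "Austin", "Texas", "Main Street", "1"});
        rows.add(new String[] {"Martin", "Denver", "Colorado", "Oak Avenue", "2"});

        // A temporary file is used here only so the sketch runs anywhere.
        Path target = Files.createTempFile("sample_rows", ".csv");
        Files.write(target, toDelimitedLines(rows, ";"), StandardCharsets.UTF_8);
        System.out.println("Wrote " + target);
    }
}
```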

Create File Name with Date and TimeStamp

In this post I will describe how to name a file with a timestamp in Talend. The file name format depends on your business requirement. For example, suppose the requirement is to name the file with a timestamp, like “Order_yyyyMMdd_hhmmss.dat”. Because the timestamp goes down to the second, you will not easily find the same file again when you want to read it or use it for any other purpose. Therefore you can keep the file name in a variable and access it later.

In our scenario we will create a file with the above name format, then read the same file and display the result. After reading, we will copy the same file to a folder named after the same day.

This is our final job design.

 

File Name With Time Stamp


  • Create a context variable context.FileName as a string to hold the file name.
  • Add tJava and write the code below in it.

context.FileName=TalendDate.getDate("yyyyMMdd_hhmmss");

  • Now add tRowGenerator to generate dummy data (you can use your own source, e.g. a database, a file or anything else) and link it with tJava using “OnSubJobOk”.
  • Add a tFileOutputDelimited component, connect it to tRowGenerator using a main flow, then configure it as shown in the image below.
Add File Name for Time Stamp


As you can see, we have assigned the file name from the context variable. In the same way you can add a dynamic file name like "D:/Orders_"+TalendDate.getDate("yyyyMMdd_hhmmss")+".csv".
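
As a rough illustration of how such a name is assembled, here is a plain-Java sketch that can be run outside Talend. It uses java.time instead of the TalendDate routine, and the class name TimestampedNameSketch and the method fileName are hypothetical. Note that in plain Java date patterns "HH" is the 24-hour field and "hh" the 12-hour one, so the sketch uses "HH" to reproduce a 24-hour value such as 134310.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper showing how a timestamped file name is assembled; not Talend code.
class TimestampedNameSketch {
    // Plain java.time pattern: "HH" is the 24-hour field.
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Builds prefix + "_" + timestamp + extension for the given moment.
    static String fileName(String prefix, LocalDateTime moment, String extension) {
        return prefix + "_" + moment.format(STAMP) + extension;
    }

    public static void main(String[] args) {
        // Compute the name once and keep it in a variable for later steps.
        String name = fileName("Orders", LocalDateTime.now(), ".csv");
        System.out.println(name);
    }
}
```

Passing the moment in as a parameter keeps the name reproducible, which is handy when you want to check the format against a known date.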

Our source file is created with the expected file name, that is “Orders_20150129_134310.csv”. Now we want to read the same file, so follow the steps below.

  • Add tFileInputDelimited and connect it to tRowGenerator using an “OnSubJobOk” link.
  • Configure the tFileInputDelimited component as shown in the image below.

tFileInputDelimited configuration & setting

You can observe that we use the file name in the same way we did for file creation. We don't know how long file creation will take; if it takes more than one second, a newly computed timestamp will no longer match the name of the file created earlier. To avoid that, we store the file name in the context.FileName variable.
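
The effect can be shown with a small plain-Java sketch using fixed times, so the result does not depend on when it runs. The class name StoredNameSketch and the method nameAt are hypothetical, and java.time stands in for the TalendDate routine.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demonstration of why the name is stored once; not Talend code.
class StoredNameSketch {
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Builds the file name for a given moment.
    static String nameAt(LocalDateTime moment) {
        return "Orders_" + moment.format(STAMP) + ".csv";
    }

    public static void main(String[] args) {
        LocalDateTime writeTime = LocalDateTime.of(2015, 1, 29, 13, 43, 10);

        String stored = nameAt(writeTime);                    // computed once, then reused
        String recomputed = nameAt(writeTime.plusSeconds(1)); // computed again one second later

        System.out.println(stored);                    // Orders_20150129_134310.csv
        System.out.println(recomputed);                // Orders_20150129_134311.csv
        System.out.println(stored.equals(recomputed)); // false
    }
}
```

Reading with the stored value always finds the file that was written, while the recomputed value points to a file that does not exist.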

Now you have read the file and want to copy it to some other location, but it should be stored in a folder named with the date on which it was created. To do so:

  • Add a tFileCopy component and link it with tFileInputDelimited using an “OnSubJobOk” link.
  • Configure the tFileCopy component as shown in the image.
tFileCopy Configuration


You can notice that we use the same file name as in the rest of the components above. The only change here is that we create the directory with a dynamic name by using "D:/"+TalendDate.getDate("yyyyMMdd"). This component allows us to create the directory with the provided name if it does not exist.
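
For comparison, the same copy step could be sketched in plain Java as follows. This is not what tFileCopy runs internally; the class name DatedFolderCopySketch, its method names and the temporary paths are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical plain-Java counterpart of the copy step; not Talend code.
class DatedFolderCopySketch {
    // Computes baseDir/<yyyyMMdd>/<original file name>.
    static Path datedTarget(Path baseDir, Path source, LocalDate day) {
        String folder = day.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
        return baseDir.resolve(folder).resolve(source.getFileName());
    }

    // Creates the dated folder if needed and copies the file into it.
    static Path copyToDatedFolder(Path baseDir, Path source, LocalDate day) throws IOException {
        Path target = datedTarget(baseDir, source, day);
        Files.createDirectories(target.getParent()); // no error if the folder already exists
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary paths are used only so the sketch runs on any machine.
        Path source = Files.createTempFile("Orders_", ".csv");
        Path baseDir = Files.createTempDirectory("archive");
        System.out.println(copyToDatedFolder(baseDir, source, LocalDate.now()));
    }
}
```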

We have copied the entire file from the source location to the dynamically created folder. In the same way you can use dynamic file or directory names as per your business needs.