Monthly Archives: March 2013

Pulling Twitter Updates Using Talend Open Studio - Part II

In the previous post you saw how to get user IDs from the Twitter API. In this post we will see how to get each user's details.

Requirements for the demo:
Talend Open Studio. 
Twitter API Access. 
JDK installed. 

Access to the following APIs:

followers/ids
users/show
statuses/friends
statuses/followers

To get a user's details we have to use the users/show API, so make sure you are able to access it. Calling this API returns an XML file with many fields. To keep the example simple we will take only the fields listed below; you can take all of them if you want.

Create a job in Talend Open Studio and follow the steps below to create the mapping/schema for the users/show XML.


Click on the Metadata node, right-click on the File XML node, then click the Create File XML option in the pop-up menu.

Provide a mapping name and the users/show XML file.

Once done, go to the next tab, configure all the properties as in the screen below, and select the fields listed below for display.



Our XPath loop expression is: /user/status
Select the fields listed below using Ctrl+click, drag and drop them onto “Fields to extract”, and click the “Refresh Preview” button to make sure the XML is parsed properly.

created_at
description
statuses_count
favourites_count
followers_count
following
friends_count
id
location
name
screen_name 
time_zone
url
verified
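
For readers who want to check the extraction outside Talend, here is a minimal Python sketch of the same idea. It assumes the fields sit as direct children of a <user> root element; the sample XML, the FIELDS subset, and the function name extract_user are made up for illustration and are not taken from the real API response.

```python
import xml.etree.ElementTree as ET

# A subset of the field names listed above.
FIELDS = [
    "created_at", "description", "favourites_count", "followers_count",
    "following", "friends_count", "id", "location", "name",
    "screen_name", "time_zone", "verified",
]

# Hypothetical, heavily shortened users/show response (an assumption,
# not real API output).
SAMPLE_XML = """<user>
  <id>12345</id>
  <name>Example User</name>
  <screen_name>example_user</screen_name>
  <location>Pune</location>
  <followers_count>10</followers_count>
  <friends_count>20</friends_count>
  <verified>false</verified>
</user>"""


def extract_user(xml_text, fields=FIELDS):
    """Return one dict per user, with an empty string for missing fields."""
    root = ET.fromstring(xml_text)
    # findtext returns None when the element is absent, so fall back to "".
    return {name: (root.findtext(name) or "").strip() for name in fields}


if __name__ == "__main__":
    print(extract_user(SAMPLE_XML))
```

Fields that are absent from the file come back as empty strings, which is roughly what an empty column in the Talend preview would look like.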

Now our sample file with Twitter user details is ready. We have to store this information in a CSV file, so drag and drop a tFileOutputDelimited component onto the designer. Then drag and drop the XML schema mapping we just created onto the designer and select tFileInputXML. Connect tFileInputXML to tFileOutputDelimited using a Main connector, and sync the source schema to the tFileOutputDelimited component.

Give the output file path and name, and set the other configuration options. Once done, execute the job to make sure everything is working fine. Your final job will look like the one below, with its output.
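
The output step can be pictured as writing one delimited line per user. The sketch below is only an approximation in Python, not what Talend generates; the function name write_delimited and the ";" separator are assumptions chosen for the example.

```python
import csv
import io


def write_delimited(rows, out, fieldnames, delimiter=";"):
    """Write dict rows to an open text file, header first."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter=delimiter,
        extrasaction="ignore",   # silently skip keys not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)     # missing keys are written as empty fields


if __name__ == "__main__":
    # Hypothetical rows; in a real run they would come from the parsed XML.
    users = [
        {"id": "1", "screen_name": "alice", "location": "Pune"},
        {"id": "2", "screen_name": "bob"},
    ]
    buffer = io.StringIO()
    write_delimited(users, buffer, ["id", "screen_name", "location"])
    print(buffer.getvalue())
```

To write a real file, pass a handle opened with open("users.csv", "w", newline="", encoding="utf-8") instead of the in-memory buffer.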




Output



Here we have completed two parts of the Twitter API series: one to get the user IDs and the other to get user details.

In the next part we will integrate both jobs into a single one to retrieve each user ID and its details into a CSV file.





Pulling Twitter Updates Using Talend Open Studio - Part I

Twitter is the most popular microblogging site, and people like to get details of users, events, elements, and followers. We will see how Talend Open Studio helps us automate Twitter user detail scraping.

I have split this post into 4 parts, so readers can go to the specific topic.




Requirements for the demo:
Talend Open Studio. 
Twitter API Access. 
JDK installed. 

Access to the following APIs:

followers/ids
users/show
statuses/friends
statuses/followers

The above 4 APIs will be used to get followers, friends, and user details. Our sample Twitter user name is “pubscode”.

First we will call the API to get all the follower IDs. Because no other information is associated with each ID, we then use these IDs to get the detailed information for each one, and each user's details are stored in a .CSV file.


The IDs API returns all the follower IDs in the XML format below, so before moving ahead we will create the mapping/schema.

It's a simple XML file, so I am skipping the part about creating the XML schema in Talend and jumping directly to how we can call the API through Talend and store the details in an XML file.

There are various ways to make an API call in Talend; I am explaining the simplest way, which I use.


Create a job named “Twitter_API” and drop a tFileFetch component onto it. Select tFileFetch and click on the “Component” tab to see all of its properties; the screen below will help you configure them.

To make sure everything is configured properly, run the job and check that the file “pubscode.xml” appears in the directory you specified in the “Destination directory” text box.
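
In plain terms, tFileFetch in this job downloads whatever the URI returns and saves it under a given file name in a destination directory. A rough Python sketch of that behaviour follows; the function name fetch_to_file and the example URI in the comment are hypothetical, and no authentication or error handling is attempted.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to_file(uri, dest_dir, filename):
    """Download `uri` and store it as dest_dir/filename; return the path."""
    dest = Path(dest_dir) / filename
    # Create the destination directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(uri) as response, open(dest, "wb") as out:
        # Stream the response body to disk without loading it all in memory.
        shutil.copyfileobj(response, out)
    return dest


# Example call (placeholder URI, shown as a comment so nothing is fetched on import):
# fetch_to_file("https://example.com/followers_ids.xml", "downloads", "pubscode.xml")
```

Checking that the returned path exists afterwards is the same sanity check as looking for “pubscode.xml” in the destination directory.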

Now our sample file is ready to be processed using tFileInputXML.
Drop a tFileInputXML component from the Palette, select it, and click on the “Component” tab. Configure all the properties required to read the XML file as shown in the screen below.


Once you have configured the tFileInputXML component, connect tFileFetch to tFileInputXML using the “OnComponentOK” trigger. The final job will look like the screen below.
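
Reading the IDs back out of the downloaded file can be sketched in a few lines of Python. The XML layout used here (an <id_list> root with <id> elements under <ids>) is an assumption, since the original screenshot of the format is not reproduced; the sample data and the function name read_follower_ids are also made up.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the downloaded follower IDs file (hypothetical sample).
SAMPLE_IDS_XML = """<id_list>
  <ids>
    <id>1001</id>
    <id>1002</id>
  </ids>
  <next_cursor>0</next_cursor>
</id_list>"""


def read_follower_ids(xml_text):
    """Return the text of every <id> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the exact nesting depth does not matter.
    return [el.text.strip() for el in root.iter("id") if el.text and el.text.strip()]


if __name__ == "__main__":
    print(read_follower_ids(SAMPLE_IDS_XML))  # ['1001', '1002']
```

As with the trigger in the job, this parsing step only makes sense once the download has finished successfully.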




We have downloaded the user IDs in XML format. In the next post we will see how to get the details of each user and store them in a .CSV file.

Parse XML from Google Drive Using Talend Open Studio

In this part we will see how to load XML from Google Drive. I have the XML below stored in my own Google Drive account, where it is available to download and view. So first of all, please check that you are able to download it.

Required Things For Demo.
Sample XML (Download)
Talend Open Studio Installed

Here is our sample XML; you can download it from the link above.


    
<?xml version="1.0" encoding="UTF-8"?>
<Itmes>
  <item id="111" clientName="SB">
    <details>
      <detail child_id="1">
        <name>Pen Drive</name>
        <amount>2</amount>
      </detail>
      <detail child_id="2">
        <name>Flash Drive</name>
        <amount>20</amount>
      </detail>
    </details>
    <tags>
      <tag tag_id="1">
        <name>CD</name>
      </tag>
    </tags>
  </item>
  <item id="112" clientName="GJ">
    <details>
      <detail child_id="1">
        <name>Flopy</name>
        <amount>1</amount>
      </detail>
    </details>
    <tags>
      <tag tag_id="1">
        <name>USB Drive</name>
      </tag>
      <tag tag_id="2">
        <name>USB 2.0</name>
      </tag>
    </tags>
  </item>
</Itmes>

       
   

  • Create a new job named “XML_From_GDrive_2_CSV”.
  • Create metadata for the above sample XML file using the Metadata repository wizard; see the screen below.
XML Metadata
  • Once you have created the schema/mapping, drop it from the metadata onto the job as a tFileInputXML component and make it Built-In.
  • Drop tFileFetch and tFileOutputDelimited.
  • Configure the tFileFetch properties as in the screen below.
  • Set the above XML download path in the URI property.
  • Now configure tFileInputXML with our download directory path and file name as below.
  • Drop tFileOutputDelimited and configure it as shown in the picture below.
  • Your final job will look like this. Run the job and check the result; it should match the output shown here.
  • Output
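
To get a feel for what the job produces, the parsing half can be sketched in Python against the sample XML from this post. The choice to emit one row per <detail> element, the column names, and the function name flatten_items are assumptions, because the loop expression used in the screenshots is not shown; the download step is left out and the XML is inlined.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The sample XML from this post (tags section omitted for brevity).
SAMPLE = """<Itmes>
  <item id="111" clientName="SB">
    <details>
      <detail child_id="1"><name>Pen Drive</name><amount>2</amount></detail>
      <detail child_id="2"><name>Flash Drive</name><amount>20</amount></detail>
    </details>
  </item>
  <item id="112" clientName="GJ">
    <details>
      <detail child_id="1"><name>Flopy</name><amount>1</amount></detail>
    </details>
  </item>
</Itmes>"""

COLUMNS = ["id", "clientName", "child_id", "name", "amount"]


def flatten_items(xml_text):
    """Return one dict per <detail>, repeating the parent item's attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        for detail in item.findall("details/detail"):
            rows.append({
                "id": item.get("id"),
                "clientName": item.get("clientName"),
                "child_id": detail.get("child_id"),
                "name": detail.findtext("name", default=""),
                "amount": detail.findtext("amount", default=""),
            })
    return rows


if __name__ == "__main__":
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(flatten_items(SAMPLE))
    print(out.getvalue())
```

The sketch looks up <item> children of whatever the root element is, so the root element's name does not affect the result.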