TABLUM.IO
.:.

Turning RSS Feeds into a SQL Database Without Coding

This article provides an alternative way to ingest XML and JSON feeds into a SQL database and conduct ad-hoc reporting with minimal effort. As an example, we’ll examine the ranking and rating of App Store applications from the iTunes RSS feed, which will be turned into a relational database and queried using SQL.

The App Store provides an RSS feed with a list of Apps, Tracks, Books, and other entities in XML and JSON formats. However, this data is unstructured and can’t be easily analyzed. To make it more suitable for ad-hoc reporting, it needs to be ingested into a SQL database and queried with a SQL console.

Data analysts have to do a lot of tedious work to get external feed data into a SQL database. This includes using an HTTP client to fetch the feed content, parsing it into a structured form, cleaning and transforming the nested structure into a 2-D array, setting up a database, and creating a table with a schema that matches the feed content or specific fields from it. All of this is necessary to make sure that the data is ready for analysis and requires either a combination of tools and services or scripting in Python.

However, there is a simpler way that can save time and make the process more enjoyable. In this post, I’ll show you how to quickly get external feed data into a SQL database without writing a single line of code. To demonstrate, I will be using iTunes feed to compare the top apps in the “meditation” category by their rankings and ratings in two countries of my interest — Spain and Portugal.

Data Source

I found a publicly available iTunes feed on the internet that returns a JSON output containing a list of up to 200 apps for the search term, e.g. “meditation”. You’re able to fetch the list for specific country by adding the “&country=” parameter. This way, you can get up to 200 results for each country the app is available in.

For TOP-200 apps in Spain: https://itunes.apple.com/search?term=meditation&entity=software&limit=200&country=ES

Getting Data Ready for Analysis

I use TABLUM.IO to quickly load data from CSV, XML, and JSON files, and get the data instantly ready for ad-hoc reporting.

How it works

Spain (View #1)

I’ll use the following URL and parameters to fetch data:
It will return dozens of fields for each app. So I’m going to provide extra parameters to the “URL Downloader” to specify the columns I need to keep in the resulting output:
cols="trackId,trackName,averageUserRating,userRatingCount,averageUserRatingForCurrentVersion,userRatingCountForCurrentVersion"

I will end up with:

  • trackId — App ID (can be used in a request for getting a list of comments),
  • trackName — App’s name,
  • averageUserRating — the average user rating (by country),
  • userRatingCount — the number of user ratings an app has received,
  • averageUserRatingforCurrentVersion — average user rating for the current app version,
  • userRatingCountForCurrentVersion — the total number of ratings submitted by users for the current version of the app.

TABLUM.IO returns a SQL table with the top 200 most popular meditation apps from the Spanish App Store:

At the bottom of the page, you can now see a blue rectangle. This is the View #1 (think of it as a “tab” in GoogleSheets) for the top-200 meditation apps in Spain:


Portugal (View #2)

Now we are going to create the second view with Portiguese top 200 apps same way we did for Spanish feed. Click the “+” button to create a new View:

Insert the URL, set the filter parameters for columns to return, and click the “Run Query” button:

https://itunes.apple.com/search?term=meditation&entity=software&limit=200&country=PT
cols="trackId,trackName,averageUserRating,userRatingCount,averageUserRatingForCurrentVersion,userRatingCountForCurrentVersion"

Here we’ve got the TOP-200 apps from the “meditation” category in Portugal:


Spain+Portugal, “Joined” data (View #3)

Now we’re going to create the third view with an “Ingested Data” type so we could qeury results from already ingested data via SQL commands.
Click the “+” button to create a new View, as we did before. Then, from the drop-down menu select the “Ingested Data” item to be able to query joined data from both View 1 and View 2:

I’ll put the following query into the SQL Console to do this:
-- Spanish Application TOP 
WITH raw_es as (
  SELECT rowNumberInAllBlocks() + 1 as es_rank, * -- all columns from V_1001 + rank
  FROM DS_.V_1001
),

-- Portugese Application TOP 
raw_pt as (
  SELECT rowNumberInAllBlocks() + 1 as pt_rank, *  -- all columns from V_1001 + rank 
  FROM DS_.V_1002
)

-- Mix them together by their app_id and calculate the diff in rate
SELECT trackId as app_id, 
       trackName as appName,
       es_rank,
       pt_rank,
       es_rank - pt_rank as rank_diff,
       averageUserRating as es_rating,
       raw_pt.averageUserRating as pt_rating,
       es_rating - pt_rating as rating_diff
FROM raw_es
INNER JOIN raw_pt
ON raw_es.trackId = raw_pt.trackId

As a side note: when the data gets imported by TABLUM.IO, it becomes a ClickHouse SQL table. So any further queries to the ingested data is done using ClickHouse SQL syntax.

The output results will contain the following set of columns:

  • app_id
  • appName — App’s name,
  • es_rank—App’s ranking in the Spanish App Store,
  • pt_rank — App’s ranking in the Portuguese App Store,
  • rank_diff — the difference between app’s ranking in Spain and Portugal,
  • es_ratnig — App’s rating in the Spanish App Store,
  • pt_ratnig — App’s rating in the Portuguese App Store,
  • rating_diff — the difference between app’s rating in Spain and Portugal.

This query produces a final SQL table that shows the difference in ranking and rating for the same apps between Spanish and Portuguese. It is based on data from both View 1 and View 2, which are enriched with rankings and joined by application ID from the App Store:


Result

I ingested the iTunes feed to create a SQL table that is ready for ad-hoc reporting and further manipulation. By comparing the top apps in the “meditation” category, I was able to identify the differences in rankings and ratings between Spanish and Portuguese App Stores. The third and sixth columns of the table illustrate this difference.
Product Use Cases
Made on
Tilda