The following is a simple program for building data sets using the United Nation’s COMTRADE API.
Note: this code does not apply to the “Bulk download” API, which allows users to download entire chunks of annual or monthly data. Instead, this program is for making specific, repeated queries for import (export) data between certain countries.
1. First Things First: Some Prepation.
My script loops over lists of reporting countries and their counterparts. I have these lists saved as CSV files.
The UN Comtrade system uses its own coding scheme. Thus, we do not make queries based on country names, but using their identifiers found here coding scheme.
I have list of countries and their corresponding codes saved as .csv files. I have a trade partner list (uncomtrade_partners.csv) and list of reporting countries (uncomtrade_reporting.csv), both of which have the following layout:
The countries I wish to query have a column (data.frame$V3) with a 1 indicating the country I wish to include in my query.
The following code sets takes these .csv files, and creates two lists of UN COMTRADE codes that correspond to the reporting countries I want and the partner countries I want.
2. The Meat: Querying and Saving Data Chunks
Next I define two programs.
The first program is a file for retreiving the UN COMTRADE data in (.csv) form from the UN COMTRADE’s bulk download API.
The following code is based loosely on COMTRADE’s own R code
One important distiction between this code and mine is that I found that on non-Windows computers, the UN’s code didn’t handle HTTPS very well. In fact, using their code I tended to get connection errors, in particular,
Yup, an HTTPS related error.
Thus, the line
set_config( config( ssl_verifypeer = 0L ) ) deals with R’s HTTPS issues (As reported here).
read.csv( text = content( GET(string), as="text" , type = "text/csv" , header = TRUE )) properly retrieves a concatenated HTTPS URL stored in the
string, and parsing the retrieved text content into a coherent .CSV table. (The function deals with .JSON-based queries in similar ways, but I just prefer .CSV data.)
The second program simply saves each date query.
Specifically, within the
get.Comtrade() function above, we dumped our queried .CSV tables into a local directory. (As opposed to holding them in memory, given R’s funky memory issues). This was done using our function,
save.Comtrade(), which takes an individual data.frame and saves it using
3. Making Multiple Queries Using mapply()
mapply() function runs the
get.Comtrade() UN COMTRADE retrieval program, doing so for each combination of countries from the reporting_list and the partners_list generated in Part 1 – holding a list of parameters constant with
mapply() is like a loop, applying the API querying function to each reporting+partner country combination. However, we cannot directly use the reporting_list and the partners_list as the two argument lists in
mapply() (doing so won’t generate a complete list of combinations). Instead, we use the
expand.grid() function to generate all reporting+partner country combinations; and this table gives us two columns,
argument_array$partner_arguments, which are the actual country argument lists used by
mapply()’s main value added is cycling over many argument lists, we want to hold some
get.Comtrade() arguments constant. To do so, we supply a list of arguments to
MoreArgs. The parameters we hold costant are in the list,
list( ps = "all" , fmt = "csv" , rg = "1" , px = "h0" , cc = "all").
4. Stacking Saved Data Chunks into a Dataset
get.Comtrade() repeatedly queried and saved each chunk of queried data to a directory . However, 100s of data chunks are hardly useful as a dataset. We was to assemble the pieces into a single coherent dataset.
Thus, the following program assembles our saved queries into a coherent dataset.
The above code makes a list of all the files in /your/file/path that match our naming scheme,
_trade_h0_all_[REPORTING COUNTRY]_[PARTNER COUNTRY]_raw.csv_.
Next, we efficiently open all these files and stack them into a coherent dataset. We use
lapply() to loop over the list of .csv files and open each using nimble
fread() function (from the wonderful __data.table package). The list of opened datasets are then “stacked” into a single dataset using
Finally, we we save the assembled dataset to our local direction, /your/file/path. You can use your preferred method of course, especially if you run into memory problems writing large .CSV files with R (I prefer using the experimental
fwrite() function in the data.table package, which I discuss here.).