The insee package gathers tools to easily download data and metadata from INSEE's BDM database.
It uses SDMX queries under the hood. Have a look at the detailed SDMX webservice page on insee.fr.
The first version of the package was published on CRAN on 2020-07-29.
To use insee behind a proxy server, set the following system environment variables:
Sys.setenv(http_proxy = "my_proxy_server")
Sys.setenv(https_proxy = "my_proxy_server")
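As a quick sanity check, the values can be read back with base R before calling any insee function. This is only a sketch: "my_proxy_server" is the same placeholder as above, and Sys.setenv only affects the current R session.

```r
# "my_proxy_server" is a placeholder, as in the lines above
Sys.setenv(http_proxy = "my_proxy_server")
Sys.setenv(https_proxy = "my_proxy_server")

# read both variables back as a named character vector
proxy_vars <- Sys.getenv(c("http_proxy", "https_proxy"))
print(proxy_vars)

# TRUE when both variables are non-empty
all_set <- all(nzchar(proxy_vars))
```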
If the proxy requires further configuration (download method, port, authentication protocol), the following environment variables can also be set:
Sys.setenv(INSEE_download_option_method = "mymethod")
Sys.setenv(INSEE_download_option_port = "1234")
Sys.setenv(INSEE_download_option_extra = "-U : --proxy-myprotocol --proxy myproxy:1234")
Sys.setenv(INSEE_download_option_proxy = "myproxy")
Sys.setenv(INSEE_download_option_auth = "myprotocol")
This section will give you an overview of what you can do with insee.
Each series has two identifiers: the SDMX identifier and the so-called idbank. Both can be used to download data.
The INSEE BDM database offers more than 200 datasets. The get_dataset_list() function returns the dataset catalogue:
insee_dataset = get_dataset_list()
The INSEE BDM database currently offers more than 150,000 series. The get_idbank_list function returns the series catalogue for a given dataset name:
idbank_list = get_idbank_list('BALANCE-PAIEMENTS')
The best way to download data is to find the right series key (idbank), but how? In some cases it is hard to tell series apart, especially for non-French speakers. To make the search easier, call the get_idbank_list function with a dataset name, then filter on columns such as FREQ, NATURE or UNIT. The insee package also provides the add_insee_title function to retrieve titles from idbanks, in either English or French. Running it on the whole idbank dataset is not advised: each SDMX query is limited to 400 idbanks, so add_insee_title splits the list into several lists of 400 idbanks each. The user should therefore filter the idbank dataset before calling the function to limit this bottleneck, as the following example shows. After retrieving the data, the split_title function can be applied to the data frame to get more readable titles that are easy to use in plots, and add_insee_metadata adds the metadata to the data.
library(insee)
library(dplyr)   # provides %>% and filter
library(stringr) # provides str_detect

idbank_list_selected =
  get_idbank_list("IPI-2015") %>% # industrial production index dataset
  filter(FREQ == "M") %>% # monthly
  filter(NATURE == "INDICE") %>% # index
  filter(CORRECTION == "CVS-CJO") %>% # working day and seasonally adjusted (SA-WDA)
  # automotive industry and overall industrial production
  filter(str_detect(NAF2, "^29$|A10-BE")) %>%
  add_insee_title()
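The 400-idbank batching mentioned above can be pictured with a few lines of base R. This is a minimal sketch, not the package's actual implementation: split_into_chunks is a hypothetical helper and the identifiers are made up.

```r
# Hypothetical helper (not part of the insee package):
# split a vector of identifiers into consecutive chunks of at most chunk_size
split_into_chunks <- function(ids, chunk_size = 400) {
  if (length(ids) == 0) return(list())
  groups <- ceiling(seq_along(ids) / chunk_size)
  unname(split(ids, groups))
}

# 1000 made-up identifiers, formatted as 9-digit strings
ids <- sprintf("%09d", 1:1000)

chunks <- split_into_chunks(ids)
length(chunks)   # 3
lengths(chunks)  # 400 400 200
```

Under this assumption, 1000 idbanks would need three queries, whereas a list filtered down to fewer than 400 idbanks would need only one.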
Another way to find a series key is to run a keyword-based search with the search_insee function. Beware that this function relies on the package's internal data, which might not be fully up to date. See the following examples:
# search multiple patterns
dataset_survey_gdp = search_insee("Survey|gdp")
# data about paris
data_paris = search_insee('paris')
# all data
data_all = search_insee()
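To illustrate what a multi-pattern search such as "Survey|gdp" means, here is a base R sketch on a made-up catalogue. It is not how search_insee is implemented: the data frame, its columns and the case-insensitive matching are all assumptions made for the example.

```r
# Made-up catalogue standing in for the package's internal data
catalogue <- data.frame(
  id = c("A", "B", "C"),
  title = c("Business survey", "GDP growth", "Population of Paris"),
  stringsAsFactors = FALSE
)

# "|" separates alternative patterns in a regular expression;
# ignore.case = TRUE is an assumption made for this sketch
hits <- catalogue[grepl("Survey|gdp", catalogue$title, ignore.case = TRUE), ]
hits$id  # "A" "B"
```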
The get_insee_idbank function accepts up to 1200 idbanks by default. It is therefore advised to narrow down the list of idbanks passed to the function. Otherwise, set the limit argument to FALSE to bypass this limit.
library(insee)
library(dplyr) # provides %>%, filter, pull and slice
# the user can make a manual list of idbanks to get the data
# example 1
data =
get_insee_idbank("001558315", "010540726") %>%
add_insee_metadata()
# using a list of idbanks extracted from the insee idbank dataset
# example 2 : households' confidence survey
df_idbank =
get_idbank_list("ENQ-CONJ-MENAGES") %>% #monthly households' confidence survey
add_insee_title() %>%
filter(CORRECTION == "CVS") #seasonally adjusted
list_idbank = df_idbank %>% pull(idbank)
data =
get_insee_idbank(list_idbank) %>%
split_title() %>%
add_insee_metadata()
# example 3 : get more than 1200 idbanks
idbank_dataset = get_idbank_list()
df_idbank =
idbank_dataset %>%
slice(1:1201)
list_idbank = df_idbank %>% pull(idbank)
data = get_insee_idbank(list_idbank, firstNObservations = 1, limit = FALSE)
For some datasets, such as IPC-2015 (inflation), a filter is required.
insee_dataset = get_dataset_list()
# example 1 : full dataset
data = get_insee_dataset("CLIMAT-AFFAIRES")
# example 2 : filtered dataset
# the user can filter the data
data = get_insee_dataset("IPC-2015", filter = "M+A.........CVS.", startPeriod = "2015-03")
# in the filter, the + selects several values in one dimension (here M or A), like an "or" statement
# an empty dimension means "all" available values
# example 3 : only one series
# by filtering with the full SDMX series key, the user will get only one series
data =
get_insee_dataset("CNA-2014-CPEB",
filter = "A.CNA_CPEB.A38-CB.VAL.D39.VALEUR_ABSOLUE.FE.EUROS_COURANTS.BRUT",
lastNObservations = 10)
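The filter strings used above follow a simple layout: dimensions are separated by dots, several values in one dimension are joined with +, and an empty dimension selects all values. A minimal sketch of how such a string could be assembled in base R is shown below. build_sdmx_filter is a hypothetical helper, not an insee function, and the four-dimension layout is made up for illustration.

```r
# Hypothetical helper (not part of the insee package):
# each list element holds the values wanted for one dimension;
# character(0) means "all values" for that dimension
build_sdmx_filter <- function(values) {
  parts <- vapply(values, function(v) paste(v, collapse = "+"), character(1))
  paste(parts, collapse = ".")
}

# made-up dataset with four dimensions
filter_string <- build_sdmx_filter(
  list(c("M", "A"), character(0), "CVS", character(0))
)
filter_string  # "M+A..CVS."
```

The real number and order of dimensions depend on the dataset, so the resulting string should be checked against the dataset's structure before being passed to get_insee_dataset.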
Feel free to open an issue with any question about this package on the GitHub repository: https://github.com/pyr-opendatafr/R-Insee-Data