---
title: "Patterns of convergence and divergence within the *convergEU* package"
author: "Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
    toc: true
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Patterns of convergence and divergence within the *convergEU* package}
  %\VignetteEncoding{UTF-8}
  \usepackage[utf8]{inputenc}
  % \VignetteDepends{ggplot2,dplyr,tidyverse,eurostat,purrr,tibble,tidyr,formattable,kableExtra,caTools,gridExtra,knitr,magrittr,readr,readxl,utf8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup,include = FALSE}
library(ggplot2)
library(dplyr)
library(tidyverse)
library(eurostat)
library(purrr)
library(tibble)
library(tidyr)
library(formattable)
library(kableExtra)
library(caTools)
library(readxl)
library(convergEU)
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.width = 4,
  fig.height = 5,
  dpi = 200
)
```

# Finding patterns of convergence and divergence within the convergEU package

The *convergEU* package makes it possible to obtain patterns of change over time for indicators in the European Union (**EU**) by invoking the *ms_pattern_ori* function:

```{r, eval=FALSE}
help(ms_pattern_ori)
```

The *ms_pattern_ori* function returns patterns for both *lowBest* and *highBest* types of indicators. More specifically, the function defines the following patterns through numerical labels and corresponding string labels:

* 1: Catching up
* 2: Flattening
* 3: Inversion
* 4: Outperforming
* 5: Slower pace
* 6: Diving
* 7: Defending better
* 8: Escaping
* 9: Falling away
* 10: Underperforming
* 11: Recovering
* 12: Reacting better
* 13: Paralleling better over
* 14: Paralleling equal over
* 15: Paralleling worse over
* 16: Paralleling worse under
* 17: Paralleling equal under
* 18: Paralleling better under
* 19: Crossing
* 20: Crossing reversed
* 21: Other (Inspection)

It is important to note that, when finding patterns for indicators of type "low is better", we assume that the higher the indicator value, the worse the considered socio-economic feature in a given member state (**MS**). Instead of creating new labels to tag patterns for this class of indicators, we transform the original indicator, after noting that the absolute positioning of values is not relevant when judging the presence of a given pattern. Thus, indicators of type "low is better" are transformed by calculating, for each original observation, its distance from the maximum value. If the original index decreases then the transformed value increases, and the pattern recognition scheme applies in the same way as for indicators of type "high is better".
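The transformation described above can be illustrated on made-up numbers. This is a minimal sketch, not the package's internal code: the data frame `low_ind` and its member-state codes are invented, and taking one maximum over all values in the dataset (rather than, for example, one maximum per year) is an assumption made here purely for illustration.

```r
# Made-up "low is better" indicator for two hypothetical member states
low_ind <- data.frame(time = 2001:2003,
                      AA = c(8, 7, 5),
                      BB = c(6, 6, 9))

# Keep only the indicator columns as a numeric matrix
vals <- as.matrix(low_ind[, -1])

# Distance from the maximum (assumption: a single global maximum is used)
transformed <- max(vals) - vals

# AA decreases on the original scale, so it increases after the transformation
transformed[, "AA"]
```

After the transformation, a falling original series becomes a rising one, so the same pattern definitions used for "high is better" indicators can be applied.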
The graphical plots for the defined patterns, depending on the type of indicator (*lowBest* or *highBest*), are available by invoking the *patt_legend* function:

```{r, eval=FALSE}
help(patt_legend)
```

When considering indicators of type *highBest*, the graphical representation of the patterns is as follows:

```{r, fig.width = 6, fig.height = 5,out.width="100%"}
highind <- patt_legend(indiType = "highBest")
highind
```

while for the *lowBest* type of indicators the plot of the patterns is:

```{r, fig.width = 6, fig.height = 5,out.width="100%"}
lowhind <- patt_legend(indiType = "lowBest")
lowhind
```

For further details on the defined patterns we refer to the Eurofound report "Monitoring convergence in the European Union Upward convergence in the EU: Concepts, measurements and indicators" (2018, pp. 25-26).

To illustrate the points discussed above in practice, let's consider a first example based on the *emp_20_64_MS* dataset, for which the indicator is of type *highBest*. To obtain the patterns for this type of indicator, we invoke the *ms_pattern_ori* function as follows:

```{r}
myemp <- ms_pattern_ori(emp_20_64_MS, "time", type = "highBest")
```

The output of the *ms_pattern_ori* function consists of the usual three list components: "\$res", which contains the results; "\$msg", which possibly carries messages for the user; and "\$err", which is a string containing an error message, if an error occurs:

```{r}
names(myemp)
```

The "\$res" component of the output contains the numerical labels for the patterns as well as their string labels:

```{r}
mypattemp <- myemp$res$mat_label_tags
mypattempn <- myemp$res$mat_without_summaries
mypattempn
mypattemp
```

Let's illustrate one of the obtained patterns in more detail. To this end, we consider the time period 2006-2007 and the country France ("FR"), for which the obtained pattern is "Slower pace":

```{r}
mypattemp[["2006/2007"]][12]
```

with the following graphical plot of the calculated pattern, where the dashed blue line refers to France and the black solid line refers to the EU:

```{r,echo=FALSE,out.width="70%",fig.height=6,fig.width=7}
matRaw1 <- dplyr::select(emp_20_64_MS, -time)
matRawT <- dplyr::select(emp_20_64_MS, time)
# EU line: unweighted mean across member states for each year
EUavemp <- dplyr::bind_cols(matRawT, EUavempp = apply(matRaw1, 1, mean))
EUavemph <- EUavemp[5:6, ]
avehu <- emp_20_64_MS[5:6, "FR"]
gFR <- ggplot() +
  geom_point(aes(x = EUavemph$time, y = EUavemph$EUavempp), color = 'black') +
  geom_point(aes(x = EUavemph$time, y = avehu$FR), color = 'blue') +
  geom_line(aes(x = EUavemph$time, y = EUavemph$EUavempp), color = 'black') +
  geom_line(aes(x = EUavemph$time, y = avehu$FR), color = 'blue', linetype = 2) +
  ggtitle("Employment rate indicator: 2006-2007") +
  labs(y = "France", x = "Time") +
  theme(axis.text.x = element_blank())
gFR
```

The interpretation of the "Slower pace" pattern is straightforward, as illustrated in the Eurofound report "Monitoring convergence in the European Union Upward convergence in the EU: Concepts, measurements and indicators" (2018, p. 25).

A second example relates to an indicator of type "low is better". To this end, let's consider the indicator "Unemployment rate by sex, age and educational attainment - annual averages", for which the data are stored in the *une_educ_a.xls* file (Subsection 4.2, Tutorial for analyzing convergence with the *convergEU* package). First, we import the data from the *xls* file as explained in detail in the Tutorial (Subsection 4.2):

```{r, echo=TRUE}
# library(readxl)
file_name <- system.file("vign/une_educ_a.xls", package = "convergEU")
myxls2 <- read_excel(file_name, sheet = "Data", range = "A12:AP22", na = ":")
myxls2 <- dplyr::mutate(myxls2, `TIME/GEO` = as.numeric(`TIME/GEO`))
```

where *file_name* holds the path (possibly including the disk unit or folders) at which the "une_educ_a.xls" file is stored.
Then, the cluster *EU27_2020* of **MS** is chosen, the data are checked for unsuitable features, and missing-value imputation is performed as follows:

```{r, echo=TRUE}
EU27estr <- convergEU_glb()$EU27_2020$memberStates$codeMS
myxls <- dplyr::select(myxls2, `TIME/GEO`, all_of(EU27estr))
check_data(myxls)
myxls3 <- dplyr::rename(myxls, time = `TIME/GEO`)
myxlsf <- impute_dataset(myxls3, timeName = "time",
                         countries = convergEU_glb()$EU27_2020$memberStates$codeMS,
                         headMiss = c("cut", "constant")[2],
                         tailMiss = c("cut", "constant")[2])$res
check_data(myxlsf)
```

in order to obtain the final dataset *myxlsf* for calculating patterns. The indicator *une_educ_a* is of type "low is better"; thus, the syntax to find the patterns is as follows:

```{r}
myres <- ms_pattern_ori(myxlsf, "time", type = "lowBest")
```

where the "\$res" component of the output contains the numerical labels for the patterns as well as their string labels:

```{r }
mypattl <- myres$res$mat_label_tags
mypattn <- myres$res$mat_without_summaries
mypattn
mypattl
```

For this indicator, let's take the time period 2015-2016 and the **MS** Finland ("FI"), for which the obtained pattern is again "Slower pace":

```{r}
mypattl$`2015/2016`[14]
```

In this case, given that the indicator is of type "low is better", the plot for the "Slower pace" pattern is:

```{r,echo=FALSE,out.width="70%",fig.height=5,fig.width=5}
matRaw <- select(myxlsf, -time)
# EU line: unweighted mean across member states for each year
EUave <- cbind(select(myxlsf, time), EUave = apply(matRaw, 1, mean))
EUave1 <- EUave[7:8, ]
avef <- matRaw[7:8, "FI"]
gfr <- ggplot() +
  geom_point(aes(x = EUave1$time, y = EUave1$EUave), color = 'black') +
  geom_point(aes(x = EUave1$time, y = avef$FI), color = 'blue') +
  geom_line(aes(x = EUave1$time, y = EUave1$EUave), color = 'black') +
  geom_line(aes(x = EUave1$time, y = avef$FI), color = 'blue', linetype = 2) +
  ggtitle("Unemployment rate indicator: 2015-2016") +
  labs(y = "Finland", x = "Time") +
  theme(axis.text.x = element_blank())
gfr
```

where the dashed blue line refers to Finland and the black solid line refers to the EU.
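In the hidden plotting chunks of this vignette, the EU line is obtained as the unweighted row-wise mean of the member-state columns. The following minimal sketch reproduces that step on made-up data; the data frame `toy` and its member-state codes are hypothetical, and the sketch does not attempt to reproduce how *ms_pattern_ori* assigns pattern labels.

```r
# Made-up indicator values for three hypothetical member states
toy <- data.frame(time = c(2015, 2016),
                  AA = c(9.0, 8.5),
                  BB = c(7.0, 6.0),
                  CC = c(8.0, 6.5))

# Unweighted mean across member states for each year
eu_ave <- data.frame(time = toy$time,
                     EUave = rowMeans(toy[, -1]))

# Change between the two years for one member state and for the average
change_AA <- diff(toy$AA)
change_EU <- diff(eu_ave$EUave)
```

Comparing `change_AA` with `change_EU` gives the two slopes that are drawn as the dashed and solid lines in plots of this kind.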
Recall that, differently from the previous indicator of type "high is better", for this type of indicator the results for the "Slower pace" pattern should be interpreted according to the assumption that the higher the indicator value, the worse the considered socio-economic feature in a member country.

To illustrate other possible patterns, consider again the first example based on the *emp_20_64_MS* dataset (indicator of type "high is better"). For example, let's take the time period 2011-2012 and the **MS** Portugal ("PT"), for which the pattern is "Crossing":

```{r}
mypattemp$`2011/2012`[23]
```

where the corresponding plot for this pattern is:

```{r,echo=FALSE,out.width="70%",fig.height=5,fig.width=5}
EUavemph1 <- EUavemp[10:11, ]
avept <- emp_20_64_MS[10:11, c("PT")]
gpt <- ggplot() +
  geom_point(aes(x = EUavemph1$time, y = EUavemph1$EUavempp), color = 'black') +
  geom_point(aes(x = EUavemph1$time, y = avept$PT), color = 'blue') +
  geom_line(aes(x = EUavemph1$time, y = EUavemph1$EUavempp), color = 'black') +
  geom_line(aes(x = EUavemph1$time, y = avept$PT), color = 'blue', linetype = 2) +
  ggtitle("Employment rate indicator: 2011-2012") +
  labs(y = "Portugal", x = "Time") +
  theme(axis.text.x = element_blank())
gpt
```

Similarly, for the member country Lithuania ("LT") in the same time period, the obtained pattern is "Crossing reversed" and the plot for this pattern is:

```{r,echo=FALSE,out.width="70%",fig.height=5,fig.width=5}
avelt <- emp_20_64_MS[10:11, c("LT")]
glt <- ggplot() +
  geom_point(aes(x = EUavemph1$time, y = EUavemph1$EUavempp), color = 'black') +
  geom_point(aes(x = EUavemph1$time, y = avelt$LT), color = 'blue') +
  geom_line(aes(x = EUavemph1$time, y = EUavemph1$EUavempp), color = 'black') +
  geom_line(aes(x = EUavemph1$time, y = avelt$LT), color = 'blue', linetype = 2) +
  ggtitle("Employment rate indicator: 2011-2012") +
  labs(y = "Lithuania", x = "Time") +
  theme(axis.text.x = element_blank())
glt
```

# Types of convergence/divergence within the convergEU package

Convergence and divergence may be strict or weak, upward or downward.
In the *convergEU* package, the function *upDo_CoDi* is specifically implemented for assessing the type of convergence/divergence occurring for a given indicator, a collection of member states and a period of time:

```{r, eval=FALSE}
help(upDo_CoDi)
```

The interpretation depends on the type of indicator, that is "highBest" or "lowBest". Let's consider a first example for the *emp_20_64_MS* dataset, in which the indicator "Employment rate" is of type *highBest*. Suppose that we wish to determine the type of convergence/divergence by taking the year 2008 as reference time (*time_0*), the year 2010 as target time (*time_t*), and the variance for summarizing dispersion (argument *heter_fun*):

```{r}
Empconv <- upDo_CoDi(emp_20_64_MS,
                     timeName = "time",
                     indiType = "highBest",
                     time_0 = 2008,
                     time_t = 2010,
                     heter_fun = "var")
```

The output of the *upDo_CoDi* function consists of the usual three list components: "\$res", which contains the results; "\$msg", which possibly carries messages for the user; and "\$err", which is a string containing an error message, if an error occurs:

```{r}
names(Empconv)
Empconv$msg
Empconv$err
```

Looking at the "\$res" component in more detail, it contains, for example:

* a statement of whether convergence or divergence has occurred:

```{r}
Empconv$res$declaration_type
```

* a statement of the type of convergence/divergence, i.e. strict or weak, downward or upward:

```{r}
Empconv$res$declaration_strict
Empconv$res$declaration_weak
```

* a list of the labels of the member states for which the differences for a given indicator between the target time and the reference time are greater than zero:

```{r}
Empconv$res$declaration_split$names_incre
```

* a list of the labels of the member states for which the differences for a given indicator between the target time and the reference time are smaller than zero:

```{r}
Empconv$res$declaration_split$names_decre
```

* a list of the differences for a given indicator between the target time and the reference time for each member state:

```{r}
Empconv$res$diffe_MS
```

* the average of such differences:

```{r}
Empconv$res$diffe_averages
```

* the dispersion at the reference time and at the target time respectively, computed with the type of dispersion specified in the argument *heter_fun* (the variance in this example):

```{r}
Empconv$res$dispersions
```

Note that if the argument *heter_fun* is set to *var* (as in this example) or *sd* (i.e. the standard deviation), then those statistics are calculated using $n-1$ as the denominator, i.e. the number of observations decreased by 1. If users prefer to adopt $n$ as the denominator, the function *pop_var* may be used as follows:

```{r}
Empconvpop <- upDo_CoDi(emp_20_64_MS,
                        timeName = "time",
                        indiType = "highBest",
                        time_0 = 2008,
                        time_t = 2010,
                        heter_fun = "pop_var")
```

User-developed functions are also allowed in the argument *heter_fun*, as illustrated in the following example for an indicator of type *lowBest*. To this end, we consider the dataset *myxlsf* introduced in the previous Section for the indicator "Unemployment rate by sex, age and educational attainment". We choose the year 2009 as reference time and the year 2011 as target time.
Moreover, in this case we consider the following user-developed function for summarizing dispersion, defined as the interquartile range divided by the mean:

```{r}
diffQQmu <- function(vettore){
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}
```

This user-developed function *diffQQmu* is specified in the argument *heter_fun* of the function *upDo_CoDi*:

```{r}
unempconvvar <- upDo_CoDi(myxlsf,
                          timeName = "time",
                          indiType = "lowBest",
                          time_0 = 2009,
                          time_t = 2011,
                          heter_fun = "diffQQmu")
unempconvvar
```

According to the obtained results, for this indicator there is evidence of divergence of type "weak upward" in the period 2009-2011.
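The quantities reported by *upDo_CoDi* (differences between target and reference time, their average, and the dispersion at both times) can be reproduced by hand on a small made-up dataset. The sketch below is not the package implementation and does not reproduce its classification into strict or weak, upward or downward: the data frame `toy`, its member-state codes and the helper `pop_var_toy` are assumptions introduced only for illustration. It shows the difference between the $n-1$ and $n$ denominators and how a user-defined summary such as *diffQQmu* is applied at each time.

```r
# Made-up indicator values for three hypothetical member states (not real data)
toy <- data.frame(time = c(2009, 2011),
                  AA = c(10, 12),
                  BB = c(20, 19),
                  CC = c(30, 26))

ref <- unlist(toy[toy$time == 2009, -1])  # values at the reference time
tgt <- unlist(toy[toy$time == 2011, -1])  # values at the target time

# Differences between target and reference time, and their average
diffe <- tgt - ref
mean_diffe <- mean(diffe)

# Sample variance (denominator n - 1), as computed by stats::var()
var_ref <- var(ref)
var_tgt <- var(tgt)

# Population variance (denominator n); hypothetical helper, not the package's pop_var
pop_var_toy <- function(x) mean((x - mean(x))^2)

# User-defined dispersion summary from the text: interquartile range over the mean
diffQQmu <- function(vettore) {
  (quantile(vettore, 0.75) - quantile(vettore, 0.25)) / mean(vettore)
}

# Dispersion at the reference and at the target time
dispersions <- c(reference = unname(diffQQmu(ref)),
                 target = unname(diffQQmu(tgt)))
```

With three observations, the population variance equals the sample variance multiplied by 2/3, so the choice of denominator can matter when the number of member states is small.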

**References**

The following references may be consulted to find further details on convergence:

* Eurofound (2018), Upward convergence in the EU: Concepts, measurements and indicators, Publications Office of the European Union, Luxembourg; by: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe.
* Nedka D. Nikiforova, Federico M. Stefanini, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini (2020), Tutorial: analysis of convergence with the convergEU package. Package vignette, URL https://local.disia.unifi.it/stefanini/RESEARCH/coneu/tutorial-conv.html