Latent Class Analysis (LCA) is a statistical technique used to identify unobserved or latent subgroups within a population based on observed categorical variables. It is a type of finite mixture modelling that assumes the population consists of several distinct groups, each characterized by a unique pattern of responses or probabilities on the observed variables. It provides a valuable tool for understanding heterogeneity within populations and can help researchers gain insights into the characteristics and behaviour of different subgroups.
The goal of LCA is to assign individuals to the most appropriate latent class based on their patterns of responses to a set of categorical variables, which in this setting are the employment barriers. It thereby reveals the underlying structure or typology of a population: here, groups of individuals who face similar combinations of employment barriers.
LCA assumes that the observed categorical variables (i.e., the employment barriers) are indicators of the latent classes and that the relationship between the latent classes and the observed variables can be captured by probabilities.
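To make this assumption concrete, the standard latent class model can be written as a mixture over classes. The notation is ours rather than the source's, and it rests on two assumptions: each barrier is coded as a binary indicator $y_{ij} \in \{0, 1\}$ for individual $i$ and barrier $j$, and the indicators are independent within a class (the usual local independence assumption). Here $\pi_c$ denotes the share of the population in class $c$ and $\rho_{jc}$ the probability that a member of class $c$ faces barrier $j$.

```latex
P(\mathbf{y}_i) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_{ij}} \,(1 - \rho_{jc})^{\,1 - y_{ij}},
\qquad \sum_{c=1}^{C} \pi_c = 1 .
```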
The process of conducting LCA involves several steps. First, the number of latent classes needs to be specified based on theoretical considerations and/or model fit criteria. This process is described further below. Then, the class membership probabilities and the item-response probabilities are estimated by maximum likelihood. Once the model is estimated, each individual can be assigned to the latent class that is most likely given their response pattern.
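The estimation and assignment steps can be sketched as follows. This is a minimal illustration, not the implementation used in the analysis. It assumes binary (0/1) barrier indicators and local independence, and it uses the expectation-maximization (EM) algorithm, one common way to obtain maximum likelihood estimates for mixture models. The names `fit_lca`, `_e_step`, `pi`, `rho` and `post` are hypothetical.

```python
import numpy as np

def _e_step(Y, pi, rho):
    """Return posterior class probabilities (n x C) and the log-likelihood."""
    # Log of P(y_i, class c) for every individual and class.
    log_joint = np.log(pi) + Y @ np.log(rho).T + (1 - Y) @ np.log(1 - rho).T
    # Log of P(y_i), summing over classes in a numerically stable way.
    log_marg = np.logaddexp.reduce(log_joint, axis=1)
    return np.exp(log_joint - log_marg[:, None]), float(log_marg.sum())

def fit_lca(Y, n_classes, n_iter=500, tol=1e-8, seed=0):
    """Fit a latent class model to binary data Y (n x J) with EM (sketch)."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    n_items = Y.shape[1]
    pi = np.full(n_classes, 1.0 / n_classes)                   # class shares
    rho = rng.uniform(0.25, 0.75, size=(n_classes, n_items))   # item-response probabilities
    post, log_lik = _e_step(Y, pi, rho)
    for _ in range(n_iter):
        # M-step: update parameters from the posterior class memberships.
        pi = np.clip(post.mean(axis=0), 1e-12, None)
        pi /= pi.sum()
        rho = (post.T @ Y) / (post.sum(axis=0)[:, None] + 1e-12)
        rho = np.clip(rho, 1e-6, 1 - 1e-6)  # keep the logarithms finite
        # E-step: recompute posteriors and the log-likelihood.
        post, new_log_lik = _e_step(Y, pi, rho)
        converged = new_log_lik - log_lik < tol
        log_lik = new_log_lik
        if converged:
            break
    return pi, rho, post, log_lik
```

Given an n-by-J array `Y` of 0/1 indicators, `pi, rho, post, log_lik = fit_lca(Y, n_classes=3)` returns the estimates, and `post.argmax(axis=1)` assigns each individual to their most likely class. Because EM can stop at a local maximum, it is usually run from several random starts (different `seed` values), keeping the solution with the highest log-likelihood.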
As mentioned above, the number of latent classes needs to be specified based on theoretical considerations and/or model fit criteria. We use both strategies to determine the optimal number of groups. The point of departure is a baseline model with the ten identified barriers as the only inputs. We then estimate 20 models, with the number of groups varying from 1 to 20. As in Fernandez et al. (2016), we calculate the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the classification error for each model.
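The three criteria can be computed from a fitted model's log-likelihood and its matrix of posterior class probabilities, as in the sketch below. It rests on assumptions not stated in the text: the indicators are binary, so a model with C classes and J indicators has (C - 1) + C·J free parameters, and classification error is taken as one minus the average of each individual's highest posterior probability, one common definition. The function names are hypothetical.

```python
import numpy as np

def n_free_params(n_classes, n_items):
    """Free parameters of a binary-indicator LCA: class shares plus item probabilities."""
    return (n_classes - 1) + n_classes * n_items

def information_criteria(log_lik, n_obs, n_classes, n_items):
    """Return (AIC, BIC); lower values indicate a better fit-complexity trade-off."""
    k = n_free_params(n_classes, n_items)
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * np.log(n_obs)
    return aic, bic

def classification_error(post):
    """Expected share misclassified when each row of post (n x C) is assigned to its modal class."""
    post = np.asarray(post, dtype=float)
    return float(1.0 - post.max(axis=1).mean())
```

Evaluating these functions for each of the candidate models gives one row of fit statistics per number of groups. BIC penalizes additional classes more heavily than AIC whenever log(n) exceeds 2, so the two criteria need not select the same model.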