Working with Model
Monodeep Mukherjee
Author : Luca Scrucca, Mohammed Saqr, Sonsoles López-Pernas, Keefe Murphy
Abstract : Heterogeneity has been a hot topic in recent educational literature. Several calls have been voiced to adopt methods that capture different patterns or subgroups within students' behavior or functioning. Assuming that there is an average pattern that represents the entirety of student populations requires the measured construct to have the same causal mechanism, the same development pattern, and to affect students in exactly the same way. Using a person-centered method (finite Gaussian mixture model or latent profile analysis), the present tutorial shows how to uncover the heterogeneity within engagement data by identifying three latent or unobserved clusters. This chapter offers an introduction to model-based clustering that includes the principles of the method, a guide to choosing the number of clusters, evaluation of clustering results, and a detailed guide with code and a real-life dataset. The discussion elaborates on the interpretation of the results, the advantages of model-based clustering, and how it compares with other methods.
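To make the idea of choosing the number of clusters in a finite Gaussian mixture more concrete, here is a minimal sketch in plain Python. It is not the tutorial's own code or dataset: the data are synthetic, the helper names (`fit_gmm_1d`, `bic`) are made up for illustration, the model is one-dimensional, and BIC is used here in its "lower is better" form as one common selection criterion.

```python
import math
import random

def normal_pdf(v, mean, var):
    """Density of a univariate normal distribution at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(x, weights, means, variances):
    """Log-likelihood of the data under a 1D Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(v, m, s2)
                     for w, m, s2 in zip(weights, means, variances)))
        for v in x
    )

def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1D Gaussian mixture with a fixed number of EM steps."""
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    # Start the means at evenly spaced quantiles, with a shared wide variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [sum((v - grand_mean) ** 2 for v in x) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for v in x:
            dens = [w * normal_pdf(v, m, s2)
                    for w, m, s2 in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, 1e-3)  # floor to avoid degenerate components
    return weights, means, variances

def bic(x, k):
    """BIC = -2 * log-likelihood + (number of free parameters) * log(n)."""
    weights, means, variances = fit_gmm_1d(x, k)
    n_params = 3 * k - 1  # (k - 1) weights + k means + k variances
    return -2.0 * log_likelihood(x, weights, means, variances) + n_params * math.log(len(x))

# Synthetic data (an assumption for illustration): three well-separated groups.
rng = random.Random(0)
data = [rng.gauss(mu, 0.7) for mu in (0.0, 5.0, 10.0) for _ in range(100)]

bic_by_k = {k: bic(data, k) for k in range(1, 6)}
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k)
print("number of clusters with the lowest BIC:", best_k)
```

Some software reports BIC with the opposite sign, so that higher is better; the comparison is the same either way. In practice a dedicated mixture-modelling library would handle multivariate data and different covariance structures.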
2. A review on Bayesian model-based clustering (arXiv)
Author : Clara Grazian
Abstract : Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been proven to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by considering the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or directly defined for the partition. Several recent results, however, have shown the difficulties in consistently estimating the number of clusters and, therefore, the partition. The problem of summarising the posterior distribution on the partition itself remains open, given the large dimension of the partition space. This work aims at reviewing the Bayesian approaches available in the literature to perform clustering, presenting the advantages and disadvantages of each of them in order to suggest future lines of research.
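As a small illustration of what "a prior distribution on the partition" can look like, the sketch below draws a random partition from the Chinese restaurant process, the partition prior induced by a Dirichlet process mixture. The abstract does not single out this prior; it is used here only as one well-known example, and the function name `sample_crp_partition` and the concentration value `alpha = 1.0` are assumptions made for illustration.

```python
import random

def sample_crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process prior."""
    labels = []
    counts = []  # counts[c] = number of items currently in cluster c
    for i in range(n):
        # Item i joins existing cluster c with probability counts[c] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for c, size in enumerate(counts):
            cumulative += size
            if u < cumulative:
                chosen = c
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels

rng = random.Random(1)
partition = sample_crp_partition(20, alpha=1.0, rng=rng)
print(partition)
print("number of clusters in this draw:", max(partition) + 1)
```

Each call returns a different partition with a different number of clusters, which is the sense in which the partition is treated as a random object. Larger values of `alpha` favour more clusters a priori.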