Pierre Michel

Faculty: Aix-Marseille Université, Faculté d'économie et de gestion (FEG)

Econometrics, Finance and mathematical methods
Status
Assistant professor
Research domain(s)
Econometrics
Thesis
2016, Aix-Marseille Université
Address

AMU - AMSE
5-9 Boulevard Maurice Bourdet, CS 50498
13205 Marseille Cedex 1

Abstract Background: The γ-metric value is generally used as the importance score of a feature (or a set of features) in a classification context. This study aimed to go further by creating a new methodology for multivariate feature selection for classification, whereby the γ-metric is associated with a specific search direction (and therefore a specific stopping criterion). As three search directions are used, we effectively created three distinct methods. Methods: We assessed the performance of our new methods through a simulation study, comparing them against more conventional methods. Classification performance indicators, number of selected features, stability and execution time were used to evaluate the performance of the methods. We also evaluated how well the proposed methodology selected relevant features for the detection of atrial fibrillation (AF), which is a cardiac arrhythmia. Results: We found that in the simulation study as well as in the AF detection task, our methods were able to select informative features and maintain a good level of predictive performance; however, in cases of strong correlation and large datasets, the γ-metric based methods were less efficient at excluding non-informative features. Conclusions: Results highlighted a good combination of the forward search direction and the γ-metric as an evaluation function. However, with the backward search direction, the feature selection algorithm could fall into a local optimum and can be improved.
Keywords Atrial fibrillation, Classification, Feature selection, Γ-metric
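The search procedure described in the abstract above (an evaluation function paired with a forward search direction and its stopping criterion) can be illustrated with a minimal sketch. The γ-metric formula is not given on this page, so the `separability` function below is a stand-in chosen for illustration only: the squared distance between class centroids divided by the total within-class variance over the selected features. The function names, the two-class 0/1 labelling, and the toy data are all assumptions.

```python
from statistics import mean, pvariance


def separability(rows, labels, features):
    """Stand-in evaluation function (NOT the γ-metric): squared distance
    between the two class centroids, divided by the summed within-class
    variances, both computed over the selected feature subset."""
    gap, spread = 0.0, 1e-12  # tiny constant avoids division by zero
    for j in features:
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        gap += (mean(a) - mean(b)) ** 2
        spread += pvariance(a) + pvariance(b)
    return gap / spread


def forward_select(rows, labels, score=separability):
    """Greedy forward search: add the feature that most improves the score,
    and stop as soon as no remaining feature improves it."""
    remaining = set(range(len(rows[0])))
    selected, best = [], float("-inf")
    while remaining:
        cand, cand_score = max(
            ((j, score(rows, labels, selected + [j])) for j in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if cand_score <= best:  # stopping criterion of the forward direction
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rows = [[0.0, 1.0], [0.2, 3.0], [0.1, 2.0], [-0.1, 4.0],
        [5.0, 1.0], [5.2, 3.0], [4.9, 2.0], [5.1, 4.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(forward_select(rows, labels))
```

On this toy data the noise feature raises the within-class spread without widening the centroid gap, so the search stops after selecting feature 0. A backward search would start from all features and remove them one at a time under the mirrored criterion.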
Abstract We study the impact of socioeconomic factors on two key parameters of epidemic dynamics. Specifically, we investigate a parameter capturing the rate of deceleration at the very start of an epidemic, and a parameter that reflects the pre-peak and post-peak dynamics at the turning point of an epidemic like coronavirus disease 2019 (COVID-19). We find two important results. First, the policies to fight COVID-19 (such as social distancing and containment) have been effective in reducing the overall number of new infections, because they influence not only the epidemic peaks, but also the speed of spread of the disease in its early stages. The second important result of our research concerns the role of healthcare infrastructure, which is just as effective as anti-COVID policies, not only in preventing an epidemic from spreading too quickly at the outset, but also in creating the desired dynamic around peaks: slow spreading, then rapid disappearance.
Abstract Two main nonpharmaceutical policy strategies have been used in Europe in response to the COVID-19 epidemic: one aimed at natural herd immunity and the other at avoiding saturation of hospital capacity by crushing the curve. The two strategies lead to different results in terms of the number of lives saved on the one hand and production loss on the other hand. Using a susceptible–infected–recovered–dead model, we investigate and compare these two strategies. As the results are sensitive to the initial reproduction number, we estimate the latter for 10 European countries for each wave from January 2020 till March 2021 using a double sigmoid statistical model and the Oxford COVID-19 Government Response Tracker data set. Our results show that Denmark, which opted for crushing the curve, managed to minimize both economic and human losses. Natural herd immunity, sought by Sweden and the Netherlands, does not appear to have been a particularly effective strategy, especially for Sweden, both in economic terms and in terms of lives saved. The results are more mixed for other countries, but with no evident trade-off between deaths and production losses.
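For readers unfamiliar with the susceptible–infected–recovered–dead (SIRD) model named in the abstract above, here is a minimal simulation sketch. It is not the authors' specification: the textbook equations, the Euler discretization, the fixed population denominator, and all parameter values are assumptions made for illustration.

```python
def basic_reproduction_number(beta, gamma, mu):
    """R0 of the textbook SIRD model: transmission rate over total exit rate."""
    return beta / (gamma + mu)


def simulate_sird(beta, gamma, mu, s0, i0, days, dt=0.1):
    """Euler integration of the textbook SIRD equations
        dS/dt = -beta*S*I/N
        dI/dt =  beta*S*I/N - (gamma + mu)*I
        dR/dt =  gamma*I
        dD/dt =  mu*I
    with N fixed at the initial population (a simplifying assumption).
    Returns one (S, I, R, D) tuple per day, starting with day 0."""
    s, i, r, d = float(s0), float(i0), 0.0, 0.0
    n = s + i
    steps_per_day = round(1 / dt)
    path = [(s, i, r, d)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infected = beta * s * i / n * dt
            new_recovered = gamma * i * dt
            new_dead = mu * i * dt
            s -= new_infected
            i += new_infected - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        path.append((s, i, r, d))
    return path


# Hypothetical parameter values, chosen only to produce a visible epidemic wave.
path = simulate_sird(beta=0.3, gamma=0.1, mu=0.005, s0=999_000, i0=1_000, days=200)
print("R0 =", basic_reproduction_number(0.3, 0.1, 0.005))
print("final (S, I, R, D) =", path[-1])
```

In this kind of sketch, a policy comparison amounts to changing `beta` over time (for example, lowering it while a containment measure is active) and comparing the resulting death counts.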
Abstract Most patient-reported experience measures (PREMs) are paper-based, leading to a high burden for patients and care providers. The aim of this study was to (1) calibrate an item bank to measure patients’ experience of respect and dignity for adult patients with serious mental illnesses and (2) develop computerized adaptive testing (CAT) to improve the use of this PREM in routine practice. Patients with schizophrenia, bipolar disorder, and major depressive disorder were enrolled in this multicenter and cross-sectional study. Psychometric analyses were based on classical test and item response theories and included evaluations of unidimensionality, local independence, and monotonicity; calibration and evaluation of model fit; analyses of differential item functioning (DIF); testing of external validity; and finally, CAT development. A total of 458 patients participated in the study. Of the 24 items, 2 highly inter-correlated items were deleted. Factor analysis showed that the remaining items met the unidimensional assumption (RMSEA = 0.054, CFI = 0.988, TLI = 0.986). DIF analyses revealed no biases by sex, age, care setting, or diagnosis. External validity testing has generally supported our assumptions. CAT showed satisfactory accuracy and precision. This work provides a more accurate and flexible measure of patients’ experience of respect and dignity than that obtained from standard questionnaires.
Keywords Psychiatry, Mental health, Schizophrenia, Depressive disorders, Bipolar disorders, Patient-reported experience measures, Health services research
Abstract Background: In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial to reduce data noise and increase model accuracy and interpretability. Thus, in the field of clinical decision making, only the most relevant features from a set of medical descriptors should be considered when determining whether a patient is healthy or not. This statistical approach, known as feature selection, can be performed through regression or classification, in a supervised or unsupervised manner. Several feature selection approaches using different mathematical concepts have been described in the literature. In the field of classification, a new approach has recently been proposed that uses the γ-metric, an index measuring separability between different classes in heart rhythm characterization. The present study proposes a filter approach for feature selection in classification using this γ-metric, and evaluates its application to automatic atrial fibrillation detection. Methods: The stability and prediction performance of the γ-metric feature selection approach were evaluated using the support vector machine model on two heart rhythm datasets, one extracted from the PhysioNet database and the other from the database of Marseille University Hospital Center, France (Timone Hospital). Both datasets contained electrocardiogram recordings grouped into two classes: normal sinus rhythm and atrial fibrillation. The performance of this feature selection approach was compared to that of three other approaches, with the first two based on the Random Forest technique and the other on receiver operating characteristic curve analysis. Results: The γ-metric approach showed satisfactory results, especially for models with a smaller number of features. For the training dataset, all prediction indicators were higher for our approach (accuracy greater than 99% for models with 5 to 17 features), as was stability (greater than 0.925 regardless of the number of features included in the model). For the validation dataset, the features selected with the γ-metric approach differed from those selected with the other approaches; sensitivity was higher for our approach, but other indicators were similar. Conclusion: This filter approach for feature selection in classification opens up new methodological avenues for atrial fibrillation detection using short electrocardiogram recordings.
Keywords Γ-metric, Atrial fibrillation detection, Classification, Clinical decision making, Feature selection, Machine learning
Abstract We introduce an approach based on functional data analysis to identify patterns of malaria incidence to guide effective targeting of malaria control in a seasonal transmission area. Using a functional data method, a smooth function (functional data or curve) was fitted from the time series of observed malaria incidence for each of 575 villages in west-central Senegal from 2008 to 2012. These 575 smooth functions were classified using hierarchical clustering (Ward’s method) and several different dissimilarity measures. Validity indices were used to determine the number of distinct temporal patterns of malaria incidence. Epidemiological indicators characterizing the resulting malaria incidence patterns were determined from the velocity and acceleration of their incidences over time. We identified three distinct patterns of malaria incidence: high-, intermediate-, and low-incidence patterns in respectively 2% (12/575), 17% (97/575), and 81% (466/575) of villages. Epidemiological indicators characterizing the fluctuations in malaria incidence showed that seasonal outbreaks started later, and ended earlier, in the low-incidence pattern. Functional data analysis can be used to identify patterns of malaria incidence by considering their temporal dynamics. Epidemiological indicators derived from their velocities and accelerations may help target control measures according to these patterns.
Keywords Malaria dynamic, Malaria patterns, Time series clustering, Functional data analysis
Abstract Introduction: Information and communication technologies gave rise to Web 2.0, characterized by the introduction and use of new collaborative communication tools such as blogs, wikis, RSS feeds and social networks. As these tools were adopted, a participatory medicine developed, based on the sharing of information and experience among professionals, patients and all other health stakeholders. Since June 2012, a medical community has been exchanging on Twitter using the hashtag #DocTocToc, contributing to the emergence of e-health on this social network. The objective of this study is to analyze the main themes of the requests made via the hashtag #DocTocToc by general practitioners between June 2012 and March 2017. Methods: Data collection by web scraping produced a corpus of tweets whose authors were identified manually, so that the corpus could be sampled to retain only tweets posted by general practitioners. A preprocessing step transformed forms that natural language processing software might not recognize. The corpus was examined using two approaches: a lexical approach with the Iramuteq® software, and terminological indexing with the multi-terminology concept extractor (ECMT) of the Catalogue et index des sites médicaux francophones (CISMeF). Results: Of the 12,716 tweets collected, 7,366 were written by general practitioners and were analyzed. The lexical approach identifies two broad lexical worlds, represented as a dendrogram: one related to medico-administrative requests concerning practice management and the social care of the patient, the other related to purely medical requests. The terminological indexing method highlights the medical specialties that generate tele-expertise requests (gynecology, neurology, infectious diseases, pediatrics, cardiology, dermatology) and makes it possible to cross them with the purpose of the request: diagnosis or therapy. Conclusion: On Twitter®, the hashtag #DocTocToc is used by general practitioners as a space for informal sharing of health information, but also for handling administrative and social problems. DocTocToc works as a large-scale practice exchange group in which physicians rely on the opinion of their peers. (Fig. 1)
Keywords Text mining, Twitter, E-health, Communication, Big data
Abstract We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), which is a non-parametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method similar to the one used in Breiman’s classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important or to detect the irrelevant ones. We analyze both stability and efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations.
Keywords Variable importance, Deviance, CUBT, Unsupervised learning, Variables ranking
Abstract Background: Measuring the quality and performance of health care is a major challenge in improving the efficiency of a health system. Patient experience is one important measure of the quality of health care, and the use of patient-reported experience measures (PREMs) is recommended. The aims of this project are 1) to develop item banks of PREMs that assess the quality of health care for adult patients with psychiatric disorders (schizophrenia, bipolar disorder, and depression) and to validate computerized adaptive testing (CAT) to support the routine use of PREMs; and 2) to analyze the implementation and acceptability of the CAT among patients, professionals, and health authorities. Methods: This multicenter and cross-sectional study is based on a mixed method approach, integrating qualitative and quantitative methodologies in two main phases: 1) item bank and CAT development based on a standardized procedure, including conceptual work and definition of the domain mapping, item selection, calibration of the item bank and CAT simulations to elaborate the administration algorithm, and CAT validation; and 2) a qualitative study exploring the implementation and acceptability of the CAT among patients, professionals, and health authorities. Discussion: The development of a set of PREMs on quality of care in mental health that overcomes the limitations of previous works (ie, allowing national comparisons regardless of the characteristics of patients and care and based on modern testing using item banks and CAT) could help health care professionals and health system policymakers to identify strategies to improve the quality and efficiency of mental health care.
Keywords COM, PARIS team
Abstract During the Covid-19 pandemic, the Omicron wave was notable for its highly transmissible and contagious variant of concern, coinciding with the availability of a vaccine that had been rolled out well earlier. In this paper, we address two key questions. First, we seek to design a simple epidemiological model that can best capture the dynamics of Omicron infections. We demonstrate that combining the SIRD and SISD models provides an adequate solution. The second question examines the benefits of vaccination, in terms of both economic activity and lives saved, once the model is implemented. Our results show that without vaccination, the human cost would have been five times higher, and production losses would have doubled, due to stricter confinement measures and a higher death toll. We also quantify the cost of vaccine hesitancy at more than 8,000 extra deaths.
Keywords Compartment models, COVID-19, Omicron wave, Vaccination benefit, Vaccine hesitation
Abstract Anxiety and depression may have serious disabling consequences for health, social, and occupational outcomes for people who are unaware of their actual health status and/or whose mental health symptoms remain undiagnosed by physicians. This article provides a big picture of unrecognised anxiety and depressive troubles revealed by a low score on the Mental Health Inventory-5 (MHI-5) with the help of machine learning methods using the 2012 French National Representative Health and Social Protection Survey (Enquête Santé et Protection Sociale, ESPS) matched with yearly healthcare consumption data from the French Sickness Fund. Compared to people with no latent symptoms who did not declare any depression over the last 12 months, those with unrecognised anxiety or depression were found to be older, more deprived, more socially disengaged, at a higher probability of adverse working conditions, and with higher healthcare expenditures backed, to some extent, by chronic conditions other than anxiety or mood disorder.
Keywords Tree-based methods, SHAP values, Workplace outcomes, Healthcare consumption, Mental health inventory-5 MHI-5, Unrecognised mental disorders
Abstract Originating in China, the global COVID-19 epidemic soon started to spread in Europe. As no medical treatment was available, it became urgent to design optimal non-pharmaceutical policies. With the help of a SIR model, we contrast two policies, one based on herd immunity (adopted by Sweden and the Netherlands), the other based on ICU capacity shortage. Both policies led to the danger of a second wave. Policy efficiency corresponds to the absence or limitation of a second wave. The aim of the paper is to measure the efficiency of these policies using statistical models and data. As a measure of efficiency, we propose the ratio of the size of two observed waves using a double sigmoid model coming from the biological growth literature. The Oxford data set provides a policy severity index together with the observed number of cases and deaths. This severity index is used to illustrate the key features of national policies for ten European countries and to support statistical inference. We estimate basic reproduction numbers, identify key moments of the epidemic and provide an instrument for comparing the two reported waves between January and October 2020. We reached the following conclusions. With a soft but long-lasting policy, Sweden managed to master the first wave for cases thanks to a low R0, but at the cost of a large number of deaths compared to other Nordic countries, with Denmark taken as an example. We predict the failure of the herd immunity policy for the Netherlands. We could not identify a clear sanitary policy for large European countries. What we observed was a lack of control for observed cases, but not for deaths.
Keywords SIR models, Phenomenological models, Double sigmoid models, Sanitary policies, Herd immunity, ICU capacity constraint
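The efficiency measure mentioned in the abstract above (the ratio of the size of two observed waves under a double sigmoid model) can be illustrated with a short sketch. The exact parametrization used in the paper is not given on this page; the form below, a sum of two logistic curves for cumulative counts, is one common choice and is an assumption, as are all function names and parameter values.

```python
import math


def logistic(t, size, rate, midpoint):
    """Single logistic growth curve: rises from 0 to `size`, centred on `midpoint`."""
    return size / (1.0 + math.exp(-rate * (t - midpoint)))


def double_sigmoid(t, size1, rate1, mid1, size2, rate2, mid2):
    """Cumulative count at time t modelled as the sum of two logistic waves
    (an assumed parametrization, not necessarily the one used in the paper)."""
    return logistic(t, size1, rate1, mid1) + logistic(t, size2, rate2, mid2)


def wave_ratio(size1, size2):
    """Size of the second wave relative to the first; a smaller value means
    the second wave was better contained."""
    return size2 / size1


# Hypothetical parameters: a first wave of 500 cases centred on day 50 and
# a second wave of 2000 cases centred on day 250.
params = dict(size1=500.0, rate1=0.1, mid1=50.0, size2=2000.0, rate2=0.1, mid2=250.0)
print(double_sigmoid(50.0, **params))   # roughly half of the first wave
print(double_sigmoid(400.0, **params))  # close to size1 + size2
print(wave_ratio(params["size1"], params["size2"]))
```

In practice the six parameters would be estimated from observed cumulative cases or deaths, and the ratio computed from the two fitted wave sizes.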