
Badih Ghattas

Researcher, Aix-Marseille Université, Faculté d'économie et de gestion (FEG)

Econometrics, finance and mathematical methods
Status
University Professor (Professeur des universités)
Thesis
2000, Université de la Méditerranée
Address

AMU - AMSE
5-9 Boulevard Maurice Bourdet, CS 50498
13205 Marseille Cedex 1

Abstract Objectives 1. To develop a deep-learning segmentation model for automated measurement of the maximal aortic diameter (Dmax) and of the volumes of the aortic dissection components: true lumen (TL), circulating false lumen (CFL), and thrombus (Th) on CT angiography (CTA). 2. To assess the predictive value of these measures for adverse aortic remodeling in residual aortic dissection (RAD). Materials and methods This retrospective study included 322 patients from two centers. The segmentation model was trained on 120 patients (Center 1) and tested on an internal dataset (30 patients, Center 1) and an external dataset (10 patients, Center 2) in terms of Dice Similarity Coefficient (DSC). The model extracted Dmax, the global false-lumen volume (FLGlo = CFL + Th), and the local false-lumen volume (FLLoc, measured 3 cm around the largest diameter). Clinical validation was performed on 83 patients from Center 1 (internal validation, 2-year follow-up) and 79 patients from Center 2 (external validation, 4.5-year follow-up). Results The segmentation model achieved high accuracy (Center 1, DSC: 0.93 TL, 0.93 CFL, 0.87 Th; Center 2, DSC: 0.92 TL, 0.93 CFL, 0.84 Th) with strong agreement between automated and manual measurements. Aortic remodeling occurred in 39/83 patients (46.9%) from Center 1 and 33/79 patients (41.7%) from Center 2. FLLoc outperformed Dmax and FLGlo (Center 1: AUC = 0.83, 0.73, and 0.76; Center 2: AUC = 0.77, 0.64, and 0.70). At optimal thresholds, FLLoc showed good predictive performance (Center 1: sensitivity = 0.87, specificity = 0.68). Conclusion Deep-learning segmentation provides accurate aortic measurements. Local false-lumen volumes predict adverse aortic remodeling in RAD better than diameter and global false-lumen volumes. Key Points Question In residual aortic dissection (RAD) after type-A dissection, early identification of high-risk patients on initial CT angiography is crucial for endovascular treatment decisions. Findings Local false-lumen volumes (3 cm around the aortic dissection maximal diameter), obtained with an automatic deep-learning method, predict adverse remodeling better than diameter or global false-lumen volumes. Clinical relevance A deep-learning segmentation method of aortic dissection components on CTA, enabling automatic measurements of diameters and volumes, is feasible. It provides local false-lumen volumes, a better predictive marker of adverse aortic remodeling than the currently used diameters and global volumes.
Keywords Aortic dissection, Computed tomography angiography, Deep learning, Prognosis, Computer-assisted image processing
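The segmentation accuracy above is reported as the Dice Similarity Coefficient (DSC). As a minimal illustration of that metric only (not of the study's segmentation pipeline), DSC on binary masks can be computed as follows:

import numpy as np

def dice(pred, truth, eps=1e-8):
    # DSC = 2 |A ∩ B| / (|A| + |B|) for boolean masks of equal shape.
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

# Toy example: two overlapping 2-D masks.
a = np.zeros((10, 10), dtype=bool); a[2:7, 2:7] = True
b = np.zeros((10, 10), dtype=bool); b[3:8, 3:8] = True
print(round(dice(a, b), 3))  # 0.64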
Abstract Non-ischemic cardiomyopathies (NICMs) comprise a heterogeneous group of heart muscle disorders characterized by diverse etiologies, variable phenotypic expression, and distinct clinical courses, including dilated cardiomyopathy (DCM) and hypertrophic cardiomyopathy (HCM). Left ventricular (LV) trabeculations have emerged as a specific phenotype in NICMs. We performed a retrospective multicenter analysis including 1048 patients, with 568 cardiac MRI scans from 606 patients with DCM (n=161) or HCM (n=445) from the AP-HM Marseille cohort, and 442 scans from 442 patients with DCM (n=208) and HCM (n=261) from the HCL Lyon cohort. Trabeculated mass (TM), total myocardial mass (TMM), and the TM/TMM ratio, used as a marker of trabecular burden, were automatically quantified from end-diastolic short-axis cine sequences using SmartHeart, a convolutional neural network-based tool. For this cohort, clinical data, genetic testing results, and outcomes were available to explore genotype–phenotype associations. Descriptive results are available for the AP-HM dataset, and automated segmentation was successfully performed on all eligible scans, demonstrating high throughput. Mean TM/TMM ratios were 9.2% in DCM and 6.8% in HCM, with only 0.35% of patients exceeding the 20% threshold. Genetic analyses identified frequent variants in MYBPC3 (HCM) and TTN (DCM). The automated pipeline showed robust and reproducible performance, enabling large-scale cardiac phenotyping. This study demonstrates the feasibility and efficiency of deep learning for large-scale, automated quantification of LV trabeculations in a real-world clinical setting. The integration of this quantitative imaging biomarker with extensive genetic data sets the stage for future genotype–phenotype correlation analyses, potentially enhancing risk stratification and phenotyping in NICMs.
Keywords Cardiac trabeculation, Cardiac MRI, Hypertrophic cardiomyopathy, Dilated cardiomyopathy, Non-ischemic cardiomyopathies
Abstract Transfer learning is a key method in machine learning, allowing knowledge acquired from pre-trained models to be leveraged to solve new tasks or to improve performance on similar tasks. This work focuses specifically on the use of transfer learning in the framework of generalized linear models (GLMs), with an emphasis on the fine-tuning approach. The main objective of this study is to deepen the understanding of the mechanisms underlying transfer learning in GLMs. We concentrate on a gain measure that quantifies the benefit (positive or negative) of a transfer approach. To do so, we exploit the rich links between GLMs, exponential family distributions, and Bregman divergences.
Keywords Generalized linear models, Transfer learning
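For background on the links mentioned above between GLMs, exponential families, and Bregman divergences, a standard identity from the general literature (not a result of this work) is:

% For a natural exponential family with cumulant \psi and mean \mu = \psi'(\theta),
% the negative log-likelihood equals a Bregman divergence generated by the
% convex conjugate \phi = \psi^{*}, up to a term that does not depend on \mu.
\[
p(y \mid \theta) = h(y)\,\exp\{y\theta - \psi(\theta)\}, \qquad \mu = \psi'(\theta),
\]
\[
-\log p(y \mid \theta) = d_{\phi}(y, \mu) + C(y), \qquad
d_{\phi}(y,\mu) = \phi(y) - \phi(\mu) - \phi'(\mu)\,(y - \mu).
\]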
Abstract The growing demand for energy flexibility has led to the development of programs that reduce consumption during peak periods. EDF R&D has launched several initiatives using data from Linky smart meters. However, evaluating their impact remains difficult because of individualized consumption patterns, selection bias due to voluntary participation, and the limitations of classical causal inference methods. We simulate a controlled environment to examine these challenges and compare traditional approaches with machine learning methods such as SyncTwin, which builds synthetic twins to handle hidden variables. Our results show that complex time series challenge classical methods, especially when unobserved variables influence treatment. We explain theoretically why these approaches fail, owing to inaccessible covariates influencing the propensity score. Our work paves the way for new techniques to improve the evaluation of energy-saving programs.
Keywords Artificial intelligence, Synthetic control, Causal inference
Abstract We propose a nonparametric hypothesis test to compare two partitions of the same data set. The partitions may result from two different clustering approaches. The test may be carried out using any comparison index, but we focus in particular on the Matching Error (ME), which is related to the misclassification error in supervised learning. Some properties of the ME, and especially its distribution function in the case of two different partitions, are analyzed. Extensive simulations and experiments show the efficiency of the test.
Keywords Clustering, Comparing partitions, Hypothesis test, Matching error
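As an illustrative sketch only (not the paper's implementation), the Matching Error between two partitions can be computed by optimally matching cluster labels with the Hungarian algorithm and counting the remaining disagreements:

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def matching_error(labels_a, labels_b):
    # ME = 1 - (best achievable agreement after optimally relabelling
    # one partition onto the other).
    cm = confusion_matrix(labels_a, labels_b)
    rows, cols = linear_sum_assignment(-cm)  # maximize matched counts
    return 1.0 - cm[rows, cols].sum() / len(labels_a)

# Toy example: two 3-cluster partitions of 6 points.
p1 = [0, 0, 1, 1, 2, 2]
p2 = [1, 1, 0, 2, 2, 2]
print(matching_error(p1, p2))  # 1 - 5/6 ≈ 0.167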
Abstract Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state of the art of these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and the latent class model (LCM). The aim is to provide insights into the behavior of the different methods across a wide range of scenarios by varying experimental factors such as the number of clusters, cluster overlap, sample size, dimension, proportion of continuous variables in the dataset, and the clusters' distribution. The degree of cluster overlap, the proportion of continuous variables in the dataset, and the sample size have a significant impact on the observed performances. When strong interactions exist between variables alongside an explicit dependence on cluster membership, none of the evaluated methods demonstrated satisfactory performance. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance with respect to the adjusted Rand index (ARI). All the methods are available in R.
Keywords Bayesian networks, Clustering, KAMILA, LCM, Mixed-type data
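As a small illustration of the distance-based family compared above, the following sketch shows the usual k-prototypes-style dissimilarity for mixed-type observations (a textbook formulation, not the benchmarked implementations):

import numpy as np

def kproto_dissimilarity(x_num, x_cat, y_num, y_cat, gamma=1.0):
    # Squared Euclidean distance on the numeric part plus gamma times
    # the number of mismatching categorical values.
    num_part = np.sum((np.asarray(x_num) - np.asarray(y_num)) ** 2)
    cat_part = sum(a != b for a, b in zip(x_cat, y_cat))
    return num_part + gamma * cat_part

# Toy example: two observations with two numeric and two categorical variables.
print(kproto_dissimilarity([1.0, 2.0], ["red", "A"],
                           [1.5, 1.0], ["blue", "A"], gamma=0.5))  # 1.75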
Abstract Background: The aim of this study (EPIDIAB) was to assess the relationship between epicardial adipose tissue (EAT) and the micro- and macrovascular complications (MVC) of type 2 diabetes (T2D). Methods: EPIDIAB is a post hoc analysis of the AngioSafe T2D study, a multicentric study aimed at determining the safety of antihyperglycemic drugs on the retina, including patients with T2D screened for diabetic retinopathy (DR) (n = 7200) and deeply phenotyped for MVC. Included patients who had undergone cardiac CT for CAC (Coronary Artery Calcium) scoring after inclusion (n = 1253) were analyzed with a validated deep learning segmentation pipeline for EAT volume quantification. Results: Median age of the study population was 61 [54;67], with a majority of men (57%), a median disease duration of 11 years [5;18], and a mean HbA1c of 7.8 ± 1.4%. EAT was significantly associated with all traditional CV risk factors. EAT volume significantly increased with chronic kidney disease (CKD vs no CKD: 87.8 [63.5;118.6] vs 82.7 mL [58.8;110.8], p = 0.008), coronary artery disease (CAD vs no CAD: 112.2 [82.7;133.3] vs 83.8 mL [59.4;112.1], p = 0.0004), peripheral arterial disease (PAD vs no PAD: 107 [76.2;141] vs 84.6 mL [59.2;114], p = 0.0005), and elevated CAC score (> 100 vs < 100 AU: 96.8 mL [69.1;130] vs 77.9 mL [53.8;107.7], p < 0.0001). By contrast, EAT volume was associated neither with DR nor with peripheral neuropathy. We further evidenced a subgroup of patients with a high EAT volume and a null CAC score. Interestingly, this group was more likely to be composed of young women with a high BMI, a shorter duration of T2D, a lower prevalence of microvascular complications, and a higher inflammatory profile. Conclusions: Fully automated EAT volume quantification could provide useful information about the risk of both renal and macrovascular complications in T2D patients.
Keywords CAC score, Cardiac computed tomography, Deep learning, Epicardial adipose tissue, Type 2 diabetes
Abstract Some complex models are frequently employed to describe physical and mechanical phenomena. In this setting, we have an input X in a general space and an output Y = f(X), where f is a very complicated function whose computational cost for every new input is very high, and whose evaluation may also be very expensive. We are given two sets of observations of X, of different sizes, such that the corresponding outputs are available only for the first set. We tackle the problem of selecting a subset of smaller size on which to run the complex model f, such that the empirical distribution of the resulting outputs is close to that of the available ones. We suggest three algorithms to solve this problem and show their efficiency using simulated datasets and the Airfoil Self-Noise data set.
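The three algorithms proposed in the paper are not detailed here; as a generic illustration of the subset-selection idea only, a naive greedy baseline can pick candidate points so that the empirical distribution of the selected sample stays close to a reference sample in energy distance:

import numpy as np

def energy_distance(a, b):
    # Multivariate energy distance between two samples (rows = observations).
    d_ab = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
    d_aa = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1).mean()
    d_bb = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1).mean()
    return 2 * d_ab - d_aa - d_bb

def greedy_subset(candidates, reference, k):
    # Greedily pick k rows of `candidates` whose empirical distribution
    # stays close (in energy distance) to that of `reference`.
    selected, remaining = [], list(range(len(candidates)))
    for _ in range(k):
        scores = [energy_distance(candidates[selected + [i]], reference)
                  for i in remaining]
        best = remaining.pop(int(np.argmin(scores)))
        selected.append(best)
    return candidates[selected]

# Toy usage with synthetic 2-D data (an assumption, not the paper's setting).
rng = np.random.default_rng(0)
reference = rng.normal(0, 1, size=(80, 2))
candidates = rng.normal(0.2, 1.2, size=(300, 2))
subset = greedy_subset(candidates, reference, k=15)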
Abstract In the Design of Experiments, we seek to relate response variables to explanatory factors. Response Surface Methodology (RSM) approximates the relation between the output variables and a polynomial transform of the explanatory variables using a linear model. Some researchers have tried to fit other types of models, mainly nonlinear and nonparametric ones. We present a large panel of machine learning approaches that may be good alternatives to the classical RSM approximation. The state of the art of such approaches is given, including classification and regression trees, ensemble methods, support vector machines, neural networks, and also direct multi-output approaches. We survey the subject and illustrate the use of ten such approaches using simulations and a real use case. In our simulations, the underlying model is linear in the explanatory factors for one response and nonlinear for the others. We focus on the advantages and disadvantages of the different approaches and show how their hyperparameters may be tuned. Our simulations show that even when the underlying relation between the response and the explanatory variables is linear, the RSM approach is outperformed by the direct neural network multivariate model, for any sample size.
Keywords Hyperparameter tuning, Multi-output regression, Design of Experiments
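As an illustrative sketch of the comparison described above (the synthetic data and settings are assumptions, not the paper's experimental design), a quadratic response-surface model and a multi-output neural network can be fitted and compared as follows:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))                  # three explanatory factors
Y = np.column_stack([                                   # two responses: one linear, one nonlinear
    1.5 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 200),
    np.sin(3 * X[:, 0]) * X[:, 2] + rng.normal(0, 0.1, 200),
])
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Classical RSM: quadratic polynomial features + linear model.
rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_tr, Y_tr)
# Direct multi-output neural network.
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                   random_state=0).fit(X_tr, Y_tr)

print("RSM test MSE:", mean_squared_error(Y_te, rsm.predict(X_te)))
print("MLP test MSE:", mean_squared_error(Y_te, mlp.predict(X_te)))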
Abstract In pharmaceutical studies, the Quality by Design (QbD) approach is increasingly being implemented to improve product development. Product quality is tested at each step of the manufacturing process, allowing a better process understanding and better risk management, thus avoiding manufacturing defects. A key element of QbD is the construction of a Design Space (DS), i.e., a region in which the specifications on the output parameters should be met. Among the various possible construction methods, Designs of Experiments (DoE), and more precisely Response Surface Methodology, represent a perfectly adapted tool. The DS obtained may have any geometrical shape; consequently, the acceptable variation range of an input may depend on the values of the other inputs. However, experimenters would like to know the variation range of each input directly, so that their variation domains are independent. In this context, we developed a method to determine the "Proven Acceptable Independent Range" (PAIR). It consists of looking for all the hyper-polyhedra included in the multidimensional DS and selecting one of them according to various strategies. We illustrate the performance of our method on different DoE cases.
Keywords Quality by Design (QbD), Design of Experiments (DoE), Response Surface Methodology (RSM), Design Space (DS), Proven Acceptable Independent Range (PAIR)
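As a toy illustration of the independent-range idea only (not the authors' PAIR algorithm, and with an assumed elliptical design-space predicate), one can search for the largest axis-aligned box whose points all lie inside the design space:

import itertools
import numpy as np

def in_design_space(x):
    # Assumed elliptical design space for two inputs scaled to [0, 1]^2.
    return (x[0] - 0.5) ** 2 + 2 * (x[1] - 0.5) ** 2 <= 0.2

def largest_box(grid_pts=11, n_check=5):
    grid = np.linspace(0, 1, grid_pts)
    best, best_vol = None, -1.0
    for lo0, hi0 in itertools.combinations(grid, 2):
        for lo1, hi1 in itertools.combinations(grid, 2):
            # Check a small lattice of points inside the candidate box.
            xs = np.linspace(lo0, hi0, n_check)
            ys = np.linspace(lo1, hi1, n_check)
            if all(in_design_space((x, y)) for x in xs for y in ys):
                vol = (hi0 - lo0) * (hi1 - lo1)
                if vol > best_vol:
                    best, best_vol = ((lo0, hi0), (lo1, hi1)), vol
    return best  # one independent acceptable range per input

print(largest_box())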
Abstract We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), a non-parametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method, similar to the one used in Breiman's classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important, or to detect the irrelevant ones. We analyze both the stability and the efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations.
Keywords Variable importance, Deviance, CUBT, Unsupervised learning, Variable ranking
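CUBT itself is not sketched here; as a rough stand-in for the CART-style importance the abstract refers to (an illustrative assumption, not the CUBT score), the following fits a decision tree to predict cluster labels and uses its impurity-based importances to rank variables:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.column_stack([
    np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 150)]),  # informative variable
    rng.normal(0, 1, 300),                                           # noise variable
    rng.normal(0, 1, 300),                                           # noise variable
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Rank variables by how much they drive the recovered cluster structure.
tree = DecisionTreeClassifier(random_state=0).fit(X, labels)
ranking = np.argsort(tree.feature_importances_)[::-1]
print("importances:", tree.feature_importances_)
print("ranking (best first):", ranking)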