We introduce a new clustering procedure designed for Big Data. Inspired by the work of [1], it applies a MapReduce procedure to any base clustering algorithm: splitting the data set at hand, clustering the subsamples, and combining the intermediate results. We thus use a high-level parallelization that runs a base clustering approach on small samples. We analyse our approach in detail, exploring various alternatives and demonstrating its efficiency through simulations.
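The split, cluster, and combine steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: the base algorithm is a hypothetical one-dimensional k-means (`kmeans_1d`), the combination rule (re-clustering the pooled intermediate centres) is one possible choice among the alternatives the abstract alludes to, and all function names are invented for the example.

```python
import random


def kmeans_1d(points, k, iters=20):
    """Plain Lloyd's k-means on a list of floats (stand-in base algorithm)."""
    pts = sorted(points)
    # Deterministic initialisation: spread the starting centres over the sorted sample.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the previous centre when a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def split_cluster_combine(data, k, n_chunks, seed=0):
    """Assumes n_chunks <= len(data) so that no subsample is empty."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Split: deal the shuffled data into n_chunks subsamples.
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    # "Map": run the base algorithm on each subsample independently.
    intermediate = [c for chunk in chunks for c in kmeans_1d(chunk, k)]
    # "Reduce" (assumed rule): cluster the pooled intermediate centres.
    return sorted(kmeans_1d(intermediate, k))


# Two well-separated groups, around 0.5 and around 10.5.
data = [i * 0.01 for i in range(100)] + [10 + i * 0.01 for i in range(100)]
print(split_cluster_combine(data, k=2, n_chunks=4))
```

Because each subsample is clustered independently, the map step could be distributed across workers; only the small list of intermediate centres has to be gathered for the final step.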

Computerized adaptive testing with decision regression trees: an alternative to item response theory for quality of life measurement in multiple sclerosis
Journal article
Pierre Michel, Karine Baumstarck, Anderson Loundou, Badih Ghattas, Pascal Auquier and Laurent Boyer, Patient Preference and Adherence, Volume 12, pp. 1043-1053, 2018

The aim of this study was to propose an alternative approach to item response theory (IRT) in the development of computerized adaptive testing (CAT) in quality of life (QoL) for patients with multiple sclerosis (MS). This approach relied on decision regression trees (DRTs). A comparison with IRT was undertaken based on precision and validity properties.

Materials and methods:
DRT- and IRT-based CATs were applied to items from a unidimensional item bank measuring QoL related to mental health in MS. The DRT-based approach consisted of CAT simulations based on a minsplit parameter that defines the minimal size of nodes in a tree. The IRT-based approach consisted of CAT simulations based on a specified level of measurement precision. The best CAT simulation showed the lowest number of items and the best levels of precision. Validity of the CAT was examined using sociodemographic, clinical and QoL data.

CAT simulations were performed using the responses of 1,992 MS patients. The DRT-based CAT algorithm with minsplit = 10 was the most satisfactory model, superior to the best IRT-based CAT algorithm. This CAT administered an average of nine items and showed satisfactory precision indicators (R = 0.98, root mean square error [RMSE] = 0.18). The DRT-based CAT showed convergent validity as its score correlated significantly with other QoL scores and showed satisfactory discriminant validity.
Conclusion: We presented a new adaptive testing algorithm based on DRT, which has a level of performance equivalent to that of the IRT-based approach. The use of DRT is a natural and intuitive way to develop CAT, and this approach may be an alternative to IRT.
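To make the idea of a tree-driven adaptive test concrete, here is a small sketch, assuming a toy bank of three binary items and a target score equal to the sum of responses. It is not the authors' algorithm: the multiway split on response categories, the variance-based split criterion, and the names `grow` and `administer` are assumptions made for illustration. The only element taken from the abstract is the role of `minsplit`, read here as the minimum node size required before a split is attempted.

```python
from itertools import product
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(rows, scores, items, minsplit):
    """Grow a regression tree whose internal nodes are items to administer."""
    node = {"estimate": mean(scores), "item": None, "children": {}}
    if len(rows) < minsplit or not items:
        return node  # too few respondents, or no items left: stop here

    def partition(item):
        groups = {}
        for r, s in zip(rows, scores):
            groups.setdefault(r[item], []).append((r, s))
        return groups

    def cost(item):
        return sum(sse([s for _, s in g]) for g in partition(item).values())

    # Pick the item whose answers best separate the scores.
    best = min(items, key=cost)
    groups = partition(best)
    if len(groups) < 2:
        return node  # the item does not discriminate within this node
    node["item"] = best
    rest = [i for i in items if i != best]
    for answer, g in groups.items():
        node["children"][answer] = grow(
            [r for r, _ in g], [s for _, s in g], rest, minsplit
        )
    return node


def administer(tree, answer_fn):
    """Walk the tree, asking one item per level until a leaf is reached."""
    asked, node = [], tree
    while node["item"] is not None:
        asked.append(node["item"])
        answer = answer_fn(node["item"])
        if answer not in node["children"]:
            break  # answer never seen at this node: fall back to its estimate
        node = node["children"][answer]
    return node["estimate"], asked


# Toy calibration sample: every response pattern to three binary items.
rows = [list(p) for p in product([0, 1], repeat=3)]
scores = [sum(r) for r in rows]
patient = [1, 0, 1]

for minsplit in (2, 4):
    tree = grow(rows, scores, [0, 1, 2], minsplit)
    print(minsplit, administer(tree, lambda item: patient[item]))
```

On this toy data, the larger `minsplit` stops the tree one level earlier, so the test asks two items instead of three and returns a coarser score estimate. That is the trade-off between test length and precision that the simulations in the abstract explore.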

Clustering based on unsupervised binary trees to define subgroups of cancer patients according to symptom severity in cancer
Journal article
Pierre Michel, Zeinab Hamidou, Karine Baumstarck, Badih Ghattas, Noémie Resseguier, Olivier Chinot, Fabrice Barlesi, Sébastien Salas, Laurent Boyer and Pascal Auquier, Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, Volume 27, Issue 2, pp. 555-565, 2018

Studies have suggested that clinicians do not feel comfortable with the interpretation of symptom severity, functional status, and quality of life (QoL). Implementation strategies of these types of measurements in clinical practice imply that consensual norms and guidelines regarding data interpretation are available. The aim of this study was to define subgroups of patients according to the levels of symptom severity using a method of interpretable clustering that uses unsupervised binary trees.

The patients were classified using a top-down hierarchical method: Clustering using Unsupervised Binary Trees (CUBT). We considered a three-group structure: "high", "moderate", and "low" level of symptom severity. The clustering tree was based on three stages using the 9-symptom scale scores of the EORTC QLQ-C30: a maximal tree was first developed by applying a recursive partitioning algorithm; the tree was then pruned using a criterion of minimal dissimilarity; finally, the most similar clusters were joined together. Inter-cluster comparisons were performed to test the sample partition and QoL data.

Two hundred thirty-five patients with different types of cancer were included. The three-cluster structure classified 143 patients with "low", 46 with "moderate", and 46 with "high" levels of symptom severity. This partition was explained by cut-off values on Fatigue and Appetite Loss scores. The three clusters consistently differentiated patients based on the clinical characteristics and QoL outcomes.

Our study suggests that CUBT is relevant to define the levels of symptom severity in cancer. This finding may have important implications for helping clinicians to interpret symptom profiles in clinical practice, to identify individuals at risk for poorer outcomes and implement targeted interventions.

Modernizing quality of life assessment: development of a multidimensional computerized adaptive questionnaire for patients with schizophrenia
Journal article
Pierre Michel, Karine Baumstarck, Christophe Lançon, Badih Ghattas, Anderson Loundou, Pascal Auquier and Laurent Boyer, Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, Volume 27, Issue 4, pp. 1041-1054, 2018

OBJECTIVE: Quality of life (QoL) is still assessed using paper-based and fixed-length questionnaires, which is one reason why QoL measurements have not been routinely implemented in clinical practice. Providing new QoL measures that combine computer technology with modern measurement theory may enhance their clinical use. The aim of this study was to develop a QoL multidimensional computerized adaptive test (MCAT), the SQoL-MCAT, from the fixed-length SQoL questionnaire for patients with schizophrenia.
METHODS: In this multicentre cross-sectional study, we collected sociodemographic information, clinical characteristics (i.e., duration of illness, the PANSS, and the Calgary Depression Scale), and quality of life (i.e., SQoL). The development of the SQoL-MCAT was divided into three stages: (1) multidimensional item response theory (MIRT) analysis, (2) multidimensional computerized adaptive test (MCAT) simulations with analyses of accuracy and precision, and (3) external validity.
RESULTS: Five hundred and seventeen patients participated in this study. The MIRT analysis found that all items displayed good fit with the multidimensional graded response model, with satisfactory reliability for each dimension. The SQoL-MCAT was 39% shorter than the fixed-length SQoL questionnaire and had satisfactory accuracy (levels of correlation >0.9) and precision (standard error of measurement <0.55 and root mean square error <0.3). External validity was confirmed via correlations between the SQoL-MCAT dimension scores and symptomatology scores.
CONCLUSION: The SQoL-MCAT is the first computerized adaptive QoL questionnaire for patients with schizophrenia. Tailored for patient characteristics and significantly shorter than the paper-based version, the SQoL-MCAT may improve the feasibility of assessing QoL in clinical practice.

Defining Quality of Life Levels to Enhance Clinical Interpretation in Multiple Sclerosis: Application of a Novel Clustering Method
Journal article
Pierre Michel, Karine Baumstarck, Laurent Boyer, Oscar Fernandez, Peter Flachenecker, Jean Pelletier, Anderson Loundou, Badih Ghattas, Pascal Auquier and on behalf of Group, Medical Care, Volume 55, Issue 1, pp. e1, 2017

To enhance the use of quality of life (QoL) measures in clinical practice, it is pertinent to help clinicians interpret QoL scores.

The aim of this study was to define clusters of QoL levels from a specific questionnaire (MusiQoL) for multiple sclerosis (MS) patients using a new method of interpretable clustering based on unsupervised binary trees and to test the validity regarding clinical and functional outcomes.

In this international, multicenter, cross-sectional study, patients with MS were classified using a hierarchical top-down method of Clustering using Unsupervised Binary Trees. The clustering tree was built using the 9 dimension scores of the MusiQoL in 2 stages, growing and tree reduction (pruning and joining). A 3-group structure was considered, as follows: “high,” “moderate,” and “low” QoL levels. Clinical and QoL data were compared between the 3 clusters.

A total of 1361 patients were analyzed: 87 were classified with “low,” 1173 with “moderate,” and 101 with “high” QoL levels. The clustering showed satisfactory properties, including repeatability (using bootstrap) and discriminancy (using factor analysis). The 3 clusters consistently differentiated patients based on sociodemographic and clinical characteristics, and the QoL scores were assessed using a generic questionnaire, ensuring the clinical validity of the clustering.

The study suggests that Clustering using Unsupervised Binary Trees is an original, innovative, and relevant classification method to define clusters of QoL levels in MS patients.

Clustering nominal data using unsupervised binary decision trees: Comparisons with the state of the art methods
Journal article
Badih Ghattas, Pierre Michel and Laurent Boyer, Pattern Recognition, Volume 67, pp. 177-185, 2017

In this work, we propose an extension of CUBT (clustering using unsupervised binary trees) to nominal data. For this purpose, we primarily use heterogeneity criteria and dissimilarity measures based on mutual information, entropy and Hamming distance. We show that for this type of data, CUBT outperforms most of the existing methods. We also provide and justify some guidelines and heuristics to tune the parameters in CUBT. Extensive comparisons are done with other well known approaches using simulations, and two examples of real datasets applications are given.
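For readers unfamiliar with the quantities named above, the following sketch gives their textbook definitions for nominal data. It only illustrates the building blocks; how CUBT combines them into heterogeneity criteria and dissimilarity measures is not specified in this abstract, and the function names are chosen for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a nominal variable."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two paired nominal variables."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def hamming(a, b):
    """Number of attributes on which two records of equal length disagree."""
    return sum(x != y for x, y in zip(a, b))


colour = ["red", "red", "blue", "blue"]
size = ["small", "large", "small", "large"]

print(entropy(colour))                   # two equally frequent categories
print(mutual_information(colour, size))  # the two variables are unrelated here
print(hamming(("red", "small", "round"), ("red", "large", "round")))
```

A node in which every observation shares the same category has zero entropy, which is why entropy is a natural candidate for measuring how heterogeneous a node is.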

A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire Developed and Validated for Multiple Sclerosis: The MusiQoL-MCAT
Journal article
Pierre Michel, Karine Baumstarck, Badih Ghattas, Jean Pelletier, Anderson Loundou, Mohamed Boucekine, Pascal Auquier and Laurent Boyer, Medicine, Volume 95, Issue 14, pp. e3068, 2016

The aim was to develop a multidimensional computerized adaptive short-form questionnaire, the MusiQoL-MCAT, from a fixed-length QoL questionnaire for multiple sclerosis. A total of 1992 patients were enrolled in this international cross-sectional study. The development of the MusiQoL-MCAT was based on the assessment of between-items MIRT model fit followed by real-data simulations. The MCAT algorithm was based on Bayesian maximum a posteriori estimation of latent traits and Kullback-Leibler information item selection. We examined several simulations based on a fixed number of items. Accuracy was assessed using correlations (r) between initial IRT scores and MCAT scores. Precision was assessed using the standard error of measurement (SEM) and the root mean square error (RMSE). The multidimensional graded response model was used to estimate item parameters and IRT scores. Among the MCAT simulations, the 16-item version of the MusiQoL-MCAT was selected because the accuracy and precision became stable with 16 items with satisfactory levels (r ≥ 0.9, SEM ≤ 0.55, and RMSE ≤ 0.3). External validity of the MusiQoL-MCAT was satisfactory. The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making it less burdensome to patients and better adapted for use in clinical practice.
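The accuracy and RMSE checks described above can be sketched as follows. The scores in the example are made up, the function names are hypothetical, and SEM is left out because it depends on the fitted IRT model; only the thresholds (r ≥ 0.9, RMSE ≤ 0.3) come from the abstract.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between full-length and adaptive scores."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def is_satisfactory(full_scores, adaptive_scores, r_min=0.9, rmse_max=0.3):
    """Apply the accuracy and RMSE thresholds quoted in the abstract."""
    return (
        pearson_r(full_scores, adaptive_scores) >= r_min
        and rmse(full_scores, adaptive_scores) <= rmse_max
    )


# Made-up latent trait scores for five respondents.
full = [-1.2, -0.4, 0.0, 0.7, 1.5]
adaptive = [-1.0, -0.5, 0.1, 0.6, 1.3]
print(is_satisfactory(full, adaptive))
```

In a simulation study of this kind, the same check would be repeated for each candidate test length, and the shortest version that still passes would be retained.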

Development of a cross-cultural item bank for measuring quality of life related to mental health in multiple sclerosis patients
Journal article
Pierre Michel, Pascal Auquier, Karine Baumstarck, Jean Pelletier, Anderson Loundou, Badih Ghattas and Laurent Boyer, Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, Volume 24, Issue 9, pp. 2261-2271, 2015

OBJECTIVE: Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL).
METHODS: Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were based on classical test and item response theories and approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features.
RESULTS: A total of 1992 patients with MS from 15 countries were enrolled to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating a satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics.
CONCLUSION: This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.

How to interpret multidimensional quality of life questionnaires for patients with schizophrenia?
Journal article
Pierre Michel, Pascal Auquier, Karine Baumstarck, Anderson Loundou, Badih Ghattas, Christophe Lançon and Laurent Boyer, Quality of Life Research: An International Journal of Quality of Life Aspects of Treatment, Care and Rehabilitation, Volume 24, Issue 10, pp. 2483-2492, 2015

Purpose: The classification of patients into distinct categories of quality of life (QoL) levels may be useful for clinicians to interpret QoL scores from multidimensional questionnaires. The aim of this study was to define clusters of QoL levels from a specific multidimensional questionnaire (SQoL18) for patients with schizophrenia by using a new method of interpretable clustering and to test its validity regarding socio-demographic, clinical, and QoL information.
Methods: In this multicentre cross-sectional study, patients with schizophrenia have been classified using a hierarchical top-down method called clustering using unsupervised binary trees (CUBT). A three-group structure has been employed to define QoL levels as “high”, “moderate”, or “low”. Socio-demographic, clinical, and QoL data have been compared between the three clusters to ensure their clinical relevance.
Results: A total of 514 patients have been analysed: 78 are classified as “low”, 265 as “moderate”, and 171 as “high”. The clustering shows satisfactory statistical properties, including reproducibility (using bootstrap analysis) and discriminancy (using factor analysis). The three clusters consistently differentiate patients. As expected, individuals in the “high” QoL level cluster report the lowest scores on the Positive and Negative Syndrome Scale (p = 0.01) and the Calgary Depression Scale (p < 0.01), and the highest scores on the Global Assessment of Functioning (p < 0.03), the SF36 (p < 0.01), the EuroQol (p < 0.01), and the Quality of Life Inventory (p < 0.01).
Conclusion: Given the ease with which this method can be applied, classification using CUBT may be useful for facilitating the interpretation of QoL scores in clinical practice.

Statistical challenges of quality of life and cancer: new avenues for future research
Journal article
Laurent Boyer, Karine Baumstarck, Pierre Michel, Mohamed Boucekine, Amelie Anota, Franck Bonnetain, Joel Coste, Bruno Falissard, Alice Guilleux, Jean-Benoit Hardouin, et al., Expert Review of Pharmacoeconomics & Outcomes Research, Volume 14, Issue 1, pp. 19-22, 2014

Statistical modeling conference on the quality of life measurements of the French National Platform of Quality of Life and Cancer
Faculty of Science in Luminy, Marseille, France, 12-13 September 2013
The French National Platform of Quality of Life and Cancer and the statistical team of the Mathematical Institute of Luminy undertook a successful first conference addressing the statistical challenges of measuring the quality of life in the field of oncology. More than 15 presentations were made over a 2-day period by the Faculty of Sciences in Luminy. The conference managed to assemble participants from different disciplines, such as mathematics and statistics, public health, epidemiology and psychology, to debate the key statistical and methodological issues of quality of life measurement and analysis. Three main topics were covered in this conference: the treatment of missing data, the development of item banking and computerised adaptive testing and the detection/understanding of response shift.