Evaluating and Ranking GenAI Chatbots under Uncertainty: A Type-2 Neutrosophic RANCOM–MARCOS MCDM Framework

Hend Ahmed, Faculty of Computers and Informatics, Zagazig University, Zagazig 44519, Sharqiyah, Egypt
Abduallah Gamal, Faculty of Computers and Informatics, Zagazig University, Zagazig 44519, Sharqiyah, Egypt

Authors' ORCIDs

Hend Ahmed: https://orcid.org/0009-0007-9457-5727

Abduallah Gamal: https://orcid.org/0000-0002-3819-0714

Article Type

Research Article

Abstract

Motivated by the growing integration of GenAI chatbots to enhance productivity across various tasks, this research presents the T2NN-RANCOM-MARCOS multi-attribute decision-making model. The model combines three components. Type-2 Neutrosophic Numbers (T2NNs) handle uncertain data. The RANCOM method assigns subjective weights to the criteria; it was chosen because it is easy to apply, highly repeatable, less time-consuming, and better suited to problems with more than five criteria and to errors in expert judgments. The MARCOS method evaluates and ranks eight GenAI chatbots against six main criteria comprising 23 sub-criteria: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, Trust and Confidence, and Economic. These criteria are primarily derived from the QUEST evaluation framework, with a limited number drawn from the AICSQ, AIEPSAM, ISO/IES 2595, and AIMS evaluation frameworks. The ranking results indicate that Claude(O1) is the most suitable GenAI chatbot for marketing companies or departments. A sensitivity analysis was performed by changing the weights of all criteria, and a comparative analysis with 12 MCDM methods was conducted to assess the reliability and robustness of the proposed model. The findings show a high stability percentage of 78.26%; the most sensitive criteria are Accuracy (CT₁), Relevance (CT₂), Consistency (CT₄), Reasoning (CT₁₃), Bias (CT₁₇), Harm (CT₁₈), Data Privacy (CT₂₀), and Data Security (CT₂₁). The correlation between the proposed model's ranking and the rankings of the 12 MCDM methods exceeds 90% and is statistically significant. These results support the robustness and reliability of the proposed T2NN-RANCOM-MARCOS model for evaluating the eight GenAI chatbots against the 23 criteria.
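
The weighting and ranking pipeline summarized above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: it uses crisp (already de-neutrosophied) scores rather than T2NNs, it follows the RANCOM and MARCOS formulas as commonly published, and the three chatbots, three criteria, expert ranks, and scores are all hypothetical.

```python
def rancom_weights(ranks):
    """RANCOM: ranks[i] is the expert position of criterion i (1 = most important).

    Each pairwise comparison scores 1 (more important), 0.5 (tie, including
    the diagonal) or 0; row sums are normalized into weights.
    """
    row_sums = []
    for r_i in ranks:
        total = 0.0
        for r_j in ranks:
            if r_i < r_j:
                total += 1.0
            elif r_i == r_j:
                total += 0.5
        row_sums.append(total)
    grand_total = sum(row_sums)
    return [s / grand_total for s in row_sums]


def marcos_scores(matrix, weights, is_benefit):
    """MARCOS utility for each alternative (row); all values must be positive."""
    columns = list(zip(*matrix))
    # Ideal (AI) and anti-ideal (AAI) reference solutions per criterion.
    ideal = [max(c) if b else min(c) for c, b in zip(columns, is_benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(columns, is_benefit)]

    def weighted_sum(row):
        # Normalize against the ideal value, then apply the criterion weight.
        return sum(
            w * (x / best if b else best / x)
            for x, best, w, b in zip(row, ideal, weights, is_benefit)
        )

    s_ideal = weighted_sum(ideal)
    s_anti = weighted_sum(anti)
    scores = []
    for row in matrix:
        s = weighted_sum(row)
        k_minus = s / s_anti   # utility degree relative to the anti-ideal
        k_plus = s / s_ideal   # utility degree relative to the ideal
        f_minus = k_plus / (k_plus + k_minus)
        f_plus = k_minus / (k_plus + k_minus)
        utility = (k_plus + k_minus) / (
            1 + (1 - f_plus) / f_plus + (1 - f_minus) / f_minus
        )
        scores.append(utility)
    return scores


# Hypothetical data: 3 chatbots scored on accuracy, safety (benefit) and cost (cost).
expert_ranks = [1, 2, 3]
is_benefit = [True, True, False]
decision_matrix = [
    [9.0, 8.0, 2.0],  # chatbot A
    [6.0, 7.0, 4.0],  # chatbot B
    [4.0, 5.0, 6.0],  # chatbot C
]

weights = rancom_weights(expert_ranks)
scores = marcos_scores(decision_matrix, weights, is_benefit)
# Indices of the alternatives, best first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In the full model, the expert judgments would be expressed as T2NNs and converted to crisp values before these steps; that conversion is omitted here.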

Keywords

GenAI chatbots, Neutrosophic sets, Type-2 neutrosophic, T2NN, Multi-criteria decision making, MCDM, RANCOM, MARCOS, Sensitivity analysis, Comparative analysis, QUEST evaluation framework, AICSQ, AIEPSAM, ISO/IES 2595, AIMS

How to Cite

Ahmed, Hend and Gamal, Abduallah (2025) "Evaluating and Ranking GenAI Chatbots under Uncertainty: A Type-2 Neutrosophic RANCOM–MARCOS MCDM Framework," Neutrosophic Systems with Applications: Vol. 25: Iss. 9, Article 5. DOI: https://doi.org/10.63689/2993-7159.1298