Institutional Repository of Université Ferhat ABBAS - Sétif 1 >
Faculty of Sciences >
Department of Computer Science >
Doctoral theses >
Please use this address to cite this document:
http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5322
|
Title: | Dimensionality reduction in machine learning for Arabic text classification |
Author(s): | Louail, Maroua |
Keywords: | Arabic text classification; Natural language processing; Dimensionality reduction |
Publication date: | 1-jui-2025 |
Abstract: | Text classification is the automated process of assigning predefined labels or categories to text based on its content. This process helps organize vast amounts of textual data, simplifies management, enables efficient searches, and extracts valuable knowledge. The computational analysis of the Arabic language plays a crucial role in addressing its growing global significance. As the fourth most widely used language online, Arabic has driven the emergence of Arabic Text Classification (ATC) as a key research area. However, the field of ATC faces considerable challenges, primarily due to the linguistic complexity of the language and the high computational demands of its processing, which can impact the performance of real-time systems. This dissertation aims to bridge the gap between effectiveness and efficiency in ATC, particularly in resource-constrained environments.
The first objective of this research is to review existing ATC techniques, including preprocessing methods, vectorization strategies, dimensionality reduction techniques, and both classical machine learning and deep learning models, in order to provide a comprehensive understanding of current approaches. The second objective is to propose three innovative methods to enhance computational efficiency through dimensionality reduction while improving or at least maintaining high classification effectiveness. These methods are specifically designed for Modern Standard Arabic (MSA) text classification and are evaluated against state-of-the-art methods.
The dissertation presents the use of Principal Component Analysis (PCA), the use of Distance-Based Meta-Features (DBMFs) for feature extraction, and the development of a new hybrid approach called "Tasneef", which addresses computational challenges in Arabic text processing and outperforms state-of-the-art deep learning models and dimensionality reduction techniques. Through these contributions, this dissertation advances the state of the art in ATC by focusing on dimensionality reduction, which improves classification accuracy and reduces memory usage and runtime. |
URI/URL: | http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5322 |
Collection(s): | Doctoral theses
|
File(s) in this document:
|
All documents in DSpace are protected by copyright, with all rights reserved.
|