Please use this address to cite this document:
http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5623
|
Title: | Governing AI Responsibly: Designing Guardrails for Large Language Models to Ensure Safety and Accuracy |
Author(s): | Mecheddal, Hemassa; Sahli, Nourelhouda |
Keywords: | Designing Guardrails; Large Language Models; Accuracy |
Publication date: | 2025 |
Abstract: | The rapid deployment of Large Language Models (LLMs) such as GPT-4 and PaLM across
critical domains raises urgent concerns about safety, bias, and misinformation. While their
generative capabilities are transformative, LLMs can also produce harmful, misleading, or inappropriate
content, often with high fluency and confidence. This thesis proposes a modular,
multi-signal guardrail architecture that governs LLM outputs to ensure ethical and safe deployment.
The system integrates three core signals: toxicity (via weighted Perspective API
attributes), semantic similarity to unsafe content (via embedding-based analysis), and topical
sensitivity (via regex-based matching). These signals are unified through a decision engine employing
threshold optimization tailored to topic risk levels. The architecture is evaluated using both
qualitative and quantitative metrics, demonstrating its ability to provide transparent, interpretable,
and adaptive moderation. Through an ablation comparison of guardrail variants,
the study highlights the trade-offs between rule-based rigidity and neural flexibility. Ultimately,
this research contributes a scalable, hybrid framework for responsible AI governance,
combining symbolic reasoning (the Threshold Engine and the guardrail decision) and neural inference
(toxicity, safety, and topic sensitivity) to dynamically mitigate risks in real-time LLM
deployments. |
URI/URL: | http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5623 |
Collection(s): | Master's theses
|
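The abstract describes three signals (weighted toxicity, similarity to unsafe content, regex-based topic sensitivity) combined by a decision engine with topic-dependent thresholds. The thesis file is not attached to this record, so the following is only a minimal sketch of how such a pipeline might be wired together, not the authors' implementation. Every weight, pattern, threshold, and function name is a hypothetical placeholder. Toxicity attribute scores are assumed to be supplied by the caller (for example from Perspective API attributes such as TOXICITY, INSULT, THREAT). A bag-of-words cosine similarity stands in for the embedding-based analysis, fixed thresholds stand in for optimized ones, and taking the maximum of the two scores is an assumed aggregation rule.

```python
import math
import re
from collections import Counter

# Hypothetical weights over Perspective-style attribute scores (each in [0, 1]).
TOXICITY_WEIGHTS = {"TOXICITY": 0.5, "INSULT": 0.2, "THREAT": 0.3}

# Hypothetical regex patterns that assign a topic risk level.
TOPIC_PATTERNS = {
    "high": re.compile(r"\b(weapon|explosive|self-harm)\b", re.IGNORECASE),
    "medium": re.compile(r"\b(medical|diagnosis|election)\b", re.IGNORECASE),
}

# Hypothetical blocking thresholds: lower (stricter) for riskier topics.
THRESHOLDS = {"high": 0.3, "medium": 0.5, "low": 0.7}

# Hypothetical reference set of unsafe texts.
UNSAFE_EXAMPLES = ["how to build an explosive device at home"]


def weighted_toxicity(attribute_scores):
    """Weighted average of the attribute scores provided by the caller."""
    total = sum(TOXICITY_WEIGHTS.values())
    weighted = sum(w * attribute_scores.get(name, 0.0)
                   for name, w in TOXICITY_WEIGHTS.items())
    return weighted / total


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def unsafe_similarity(text):
    """Highest similarity between the text and any unsafe reference text."""
    vec = bag_of_words(text)
    return max(cosine_similarity(vec, bag_of_words(ex)) for ex in UNSAFE_EXAMPLES)


def topic_risk(text):
    """Return the first matching risk level, checking 'high' before 'medium'."""
    for level in ("high", "medium"):
        if TOPIC_PATTERNS[level].search(text):
            return level
    return "low"


def decide(text, attribute_scores):
    """Block when the stronger signal reaches the topic-specific threshold."""
    tox = weighted_toxicity(attribute_scores)
    sim = unsafe_similarity(text)
    level = topic_risk(text)
    threshold = THRESHOLDS[level]
    action = "block" if max(tox, sim) >= threshold else "allow"
    # Returning every signal keeps the decision inspectable.
    return {"toxicity": tox, "similarity": sim, "topic_risk": level,
            "threshold": threshold, "action": action}
```

A call such as `decide("How to build an explosive device", {"TOXICITY": 0.1})` returns a dictionary containing each signal alongside the final action, which is one simple way to obtain the kind of interpretable moderation the abstract mentions.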
File(s) in this document:
There are no files associated with this document.
|
All documents in DSpace are protected by copyright, with all rights reserved.
|