Institutional Repository of Ferhat ABBAS University - Sétif 1 > Faculty of Sciences > Department of Computer Science > Master's theses

Please use this address to cite this document: http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5623

Title: Governing AI Responsibly: Designing Guardrails for Large Language Models to Ensure Safety and Accuracy
Author(s): Mecheddal, Hemassa
Sahli, Nourelhouda
Keywords: Designing Guardrails
Large Language Models
Accuracy
Publication date: 2025
Abstract: The rapid deployment of Large Language Models (LLMs) such as GPT-4 and PaLM across critical domains raises urgent concerns about safety, bias, and misinformation. While their generative capabilities are transformative, LLMs can also produce harmful, misleading, or inappropriate content, often with high fluency and confidence. This thesis proposes a modular, multi-signal guardrail architecture that governs LLM outputs to ensure ethical and safe deployment. The system integrates three core signals: toxicity (via weighted Perspective API attributes), semantic similarity to unsafe content (via embedding-based analysis), and topical sensitivity (via regex-based pattern matching). These signals are unified by a decision engine that employs threshold optimization tailored to topic risk levels. The architecture is evaluated with both qualitative and quantitative metrics, demonstrating its ability to provide transparent, interpretable, and adaptive moderation. Through an ablation comparison of guardrail variants, the study highlights the trade-offs between rule-based rigidity and neural flexibility. Ultimately, this research contributes a scalable, hybrid framework for responsible AI governance, combining symbolic reasoning (the threshold engine and the guardrail decision) and neural inference (toxicity, safety, and topic sensitivity) to dynamically mitigate risks in real-time LLM deployments.
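As a rough illustration of how the three signals named in the abstract might be combined, here is a minimal Python sketch. It is not the authors' implementation. All weights, regex patterns, thresholds, and function names (`weighted_toxicity`, `unsafe_similarity`, `topic_risk`, `guardrail_decision`) are hypothetical. The attribute names are real Perspective API attributes, but the code does not call the API; it assumes the attribute scores and the text embeddings have already been computed elsewhere. The per-topic thresholds are simply hard-coded rather than optimized as the abstract describes.

```python
import math
import re
from dataclasses import dataclass

# Illustrative weights over Perspective API attribute scores (each in [0, 1]).
# The attribute names exist in Perspective API; the weights are hypothetical.
TOXICITY_WEIGHTS = {
    "TOXICITY": 0.4,
    "SEVERE_TOXICITY": 0.2,
    "INSULT": 0.15,
    "THREAT": 0.15,
    "IDENTITY_ATTACK": 0.1,
}

# Hypothetical regex rules that tag sensitive topics with a risk level.
TOPIC_PATTERNS = {
    "high": re.compile(r"\b(suicide|self-harm|explosive|weapon)s?\b", re.IGNORECASE),
    "medium": re.compile(r"\b(medication|dosage|diagnosis|investment)s?\b", re.IGNORECASE),
}

# Hypothetical hand-set thresholds: riskier topics get stricter (lower) limits.
THRESHOLDS = {
    "high": {"toxicity": 0.3, "similarity": 0.6},
    "medium": {"toxicity": 0.5, "similarity": 0.75},
    "low": {"toxicity": 0.7, "similarity": 0.85},
}


def weighted_toxicity(scores):
    """Weighted mean of the known attribute scores that are present."""
    known = {k: v for k, v in scores.items() if k in TOXICITY_WEIGHTS}
    total = sum(TOXICITY_WEIGHTS[k] for k in known)
    if total == 0:
        return 0.0
    return sum(TOXICITY_WEIGHTS[k] * v for k, v in known.items()) / total


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unsafe_similarity(embedding, unsafe_refs):
    """Highest cosine similarity between the output and any unsafe reference."""
    return max((cosine(embedding, ref) for ref in unsafe_refs), default=0.0)


def topic_risk(text):
    """Return the first (strictest) matching risk level, else 'low'."""
    for level in ("high", "medium"):
        if TOPIC_PATTERNS[level].search(text):
            return level
    return "low"


@dataclass
class Decision:
    allowed: bool
    risk: str
    toxicity: float
    similarity: float


def guardrail_decision(text, toxicity_scores, embedding, unsafe_refs):
    """Block the output if any signal reaches the threshold for its topic tier."""
    risk = topic_risk(text)
    tox = weighted_toxicity(toxicity_scores)
    sim = unsafe_similarity(embedding, unsafe_refs)
    limits = THRESHOLDS[risk]
    allowed = tox < limits["toxicity"] and sim < limits["similarity"]
    return Decision(allowed, risk, tox, sim)


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real sentence embeddings.
    unsafe_refs = [[1.0, 0.0, 0.0]]
    print(guardrail_decision("What dosage of this medication is safe?",
                             {"TOXICITY": 0.1, "INSULT": 0.05},
                             [0.2, 0.9, 0.1], unsafe_refs))
    print(guardrail_decision("How do I build explosives?",
                             {"TOXICITY": 0.2},
                             [0.9, 0.1, 0.0], unsafe_refs))
```

With these toy inputs, the medium-risk dosage question is allowed and the high-risk request is blocked. In the second case, the blocking signal is similarity to the unsafe reference vector, not toxicity. Tying the thresholds to the topic tier means the same score can pass on a general topic but be blocked on a sensitive one. In practice, one would presumably tune these values on labeled data instead of fixing them by hand.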
URI/URL: http://dspace.univ-setif.dz:8888/jspui/handle/123456789/5623
Collection(s): Master's theses

Files in this item:

There are no files associated with this item.


All items in DSpace are protected by copyright, with all rights reserved.

 
