ICLR 2026

SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense

Yiyang Huang, Liang Shi, Yitian Zhang, Yi Xu, Yun Fu

Northeastern University

Abstract

Large Vision-Language Models (LVLMs) excel in diverse cross-modal tasks. However, object hallucination, where models produce plausible but inaccurate object descriptions, remains a significant challenge. In contrast to previous work focusing on LLM components, this paper is the first to trace LVLM hallucinations to the visual encoder, identifying three key issues: statistical bias, inherent bias, and vulnerability. To address these challenges, we propose SHIELD, a training-free framework that mitigates hallucinations through three strategies: re-weighting visual tokens to reduce statistical bias, subtracting noise-derived tokens to counter inherent bias, and applying adversarial attacks with contrastive decoding to address vulnerability. Experiments demonstrate that SHIELD effectively mitigates object hallucinations across diverse benchmarks and LVLM families. Moreover, SHIELD achieves strong performance on general LVLM benchmarks, highlighting its broad applicability.

Motivation: Hallucinations Stem from Visual Encoders

Accurate visual feature extraction is crucial for LVLMs to generate reliable outputs. However, we find that bias and vulnerability in visual encoders distort features and intensify object hallucinations. Most LVLMs adopt visual encoders derived from pretrained CLIP models, which are shaped by the imbalanced distribution of visual concepts in their pretraining data. This gives rise to three key issues.

Three key issues in LVLM visual encoders

We identify three key issues in LVLM visual encoders that cause hallucinations: statistical bias, inherent bias, and vulnerability.

Statistical Bias

The visual encoder over-relies on frequent visual patterns in pretraining data, causing overemphasis on the corresponding tokens with disproportionately high activation values. This distorts fine-grained perception by directing attention to overweighted tokens, often resulting in hallucinations.

Statistical bias analysis

Inherent Bias

The visual encoder is overdependent on dominant objects in pretraining data, leading it to generate erroneous representations regardless of input — even when the input is meaningless random noise. Dominant objects such as cars, chairs, and tables are frequently hallucinated.

Inherent bias analysis

Vulnerability

Visual encoders have limited robustness to noise and subtle perturbations, making them susceptible to constructing inaccurate visual representations. Even a few attack steps cause sharp performance drops, demonstrating that minor perturbations can exploit this weakness.

Vulnerability analysis
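The fragility described above can be illustrated with a minimal FGSM-style sketch. This is a toy example, not the paper's attack: the "encoder" is a single linear map, similarity is a plain dot product rather than cosine similarity, and `eps` is an illustrative step size. Even so, one signed-gradient step provably lowers the image-text similarity.

```python
import numpy as np

def fgsm_perturb(image, W, text_feat, eps=8 / 255):
    """One FGSM-style step that reduces image-text similarity.

    For a linear encoder f(x) = W @ x, the gradient of the similarity
    <f(x), text_feat> w.r.t. x is W.T @ text_feat, so a single signed
    step moves the encoded feature away from the text feature.
    """
    grad = W.T @ text_feat
    return np.clip(image - eps * np.sign(grad), 0.0, 1.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))   # toy "visual encoder"
image = np.full(64, 0.5)            # flat gray "image" in [0, 1]
text_feat = rng.standard_normal(16)

before = (W @ image) @ text_feat
after = (W @ fgsm_perturb(image, W, text_feat)) @ text_feat
# after < before: one small step already degrades the alignment
```

A real attack on a CLIP encoder would backpropagate through the full network for several steps, but the mechanism is the same signed-gradient update.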

Method

Building on these observations, we propose SHIELD, a training-free method to mitigate object hallucinations by addressing statistical bias, inherent bias, and vulnerability in visual encoders. SHIELD integrates three strategies that operate directly on visual tokens and decoding logits — no model retraining required.

SHIELD Framework Pipeline

Overview of the SHIELD framework. Our training-free approach addresses all three encoder issues via token re-weighting, token subtraction, and contrastive decoding.

Token Re-weighting module

Token Re-weighting

Generates a naive caption, encodes it with the CLIP text encoder, and computes a similarity matrix with visual tokens. The resulting weights redistribute attention across tokens relevant to ground-truth objects, alleviating statistical bias from overemphasized tokens.

Token Subtraction module

Token Subtraction

Passes K random noise inputs through the visual encoder and averages the resulting tokens to estimate erroneous representations of dominant objects. These estimates are subtracted from the visual tokens, removing inherent bias at the feature level.
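The noise-averaging idea can be sketched with a toy encoder. The linear "encoder" with an additive bias vector, the value of K, and the `strength` coefficient are all illustrative assumptions; in SHIELD the encoder is the LVLM's actual vision tower.

```python
import numpy as np

def estimate_bias_tokens(encoder, k=64, image_shape=(4,), seed=0):
    """Average the encoder's outputs over K random-noise inputs.

    Whatever structure survives the average is what the encoder emits
    regardless of input, i.e. an estimate of its inherent bias.
    """
    rng = np.random.default_rng(seed)
    outs = [encoder(rng.standard_normal(image_shape)) for _ in range(k)]
    return np.mean(outs, axis=0)

def subtract_bias(visual_tokens, bias_tokens, strength=0.5):
    """Remove the estimated inherent bias from real visual tokens."""
    return visual_tokens - strength * bias_tokens

# toy "encoder" that always adds a fixed bias vector to its input
bias = np.ones(4)
encoder = lambda img: img + bias

est = estimate_bias_tokens(encoder)          # recovers ~bias (noise averages out)
clean = np.array([0.5, -1.0, 2.0, 0.0])      # hypothetical unbiased features
tokens = clean + bias                        # what the biased encoder would emit
corrected = subtract_bias(tokens, est, strength=1.0)
```

With a full `strength` of 1.0 the corrected tokens land close to the unbiased features; SHIELD's partial subtraction corresponds to a smaller coefficient.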

Contrastive Decoding module

Contrastive Decoding

Constructs an adversarial attack tensor by minimizing cosine similarity between perturbed image features and caption features. At each decoding step, contrasts bias-reduced logits with adversarial logits to suppress hallucination-prone outputs.
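The logit-contrast step can be sketched as below. The VCD-style formulation, including the plausibility cutoff on the clean distribution, is an assumption made for illustration; the parameter names mirror `cd_alpha` and `cd_beta` from the integration snippet in the Code section.

```python
import numpy as np

def contrastive_logits(logits_clean, logits_adv, cd_alpha=2.0, cd_beta=0.35):
    """Contrast bias-reduced logits against adversarial-branch logits.

    Amplifies tokens the clean branch prefers over the adversarial
    branch, restricted to a plausibility set: tokens whose clean-branch
    probability falls below cd_beta times the top probability are masked.
    """
    out = (1 + cd_alpha) * logits_clean - cd_alpha * logits_adv
    probs = np.exp(logits_clean - logits_clean.max())
    probs /= probs.sum()
    out[probs < cd_beta * probs.max()] = -np.inf  # drop implausible tokens
    return out

clean = np.array([3.0, 2.9, 0.1])  # clean branch slightly prefers token 0
adv = np.array([2.0, 3.5, 0.1])    # adversarial branch pushes token 1
out = contrastive_logits(clean, adv)
# the contrast widens the gap in favor of token 0 and masks token 2
```

Intuitively, a token the adversarial branch promotes is likely hallucination-prone, so subtracting its logits suppresses it at decoding time.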

Results

CHAIR (LLaVA-1.5) ↓ lower is better

Method    CS     CI
Vanilla   48.8   14.2
VCD       46.8   13.2
OPERA     44.6   12.8
SHIELD    36.6   10.3

POPE Avg (LLaVA-1.5) ↑ higher is better

Method    Acc    F1
Vanilla   81.3   79.6
VCD       84.6   84.4
OPERA     84.7   85.4
SHIELD    87.0   87.4

MME Hallucination ↑ higher is better

Method    LLaVA-1.5   InstructBLIP   Qwen-VL
Vanilla   565.3       380.3          587.3
VCD       604.6       447.6          596.6
OPERA     592.3       384.3          623.3
SHIELD    668.3       461.6          668.3

AMBER Score (LLaVA-1.5) ↑ higher is better

Method    Score   CHAIR ↓   Hal. ↓
Vanilla   82.0    9.2       29.2
VCD       82.9    8.1       28.6
OPERA     86.5    8.3       31.2
SHIELD    88.0    6.4       25.1

POPE COCO — All Splits, All LVLMs

LVLM           Method    Random         Popular        Adversarial
                         Acc    F1      Acc    F1      Acc    F1
LLaVA-1.5      Vanilla   83.2   81.3    81.8   80.0    78.9   77.5
               VCD       87.7   87.1    85.3   85.0    80.8   81.3
               OPERA     89.1   89.0    86.0   86.3    79.1   80.9
               SHIELD    91.3   91.1    87.4   87.6    82.5   83.6
InstructBLIP   Vanilla   80.7   80.4    78.2   78.3    75.8   76.5
               VCD       84.5   83.6    81.4   81.0    79.5   79.5
               OPERA     89.8   89.6    83.4   84.0    80.7   81.8
               SHIELD    88.2   87.6    84.6   84.3    82.2   82.4
Qwen-VL        Vanilla   84.7   82.6    84.1   82.0    82.2   80.3
               VCD       88.6   87.8    87.1   86.4    84.2   83.9
               OPERA     86.1   84.2    85.7   83.8    83.9   82.1
               SHIELD    89.2   88.6    87.6   87.1    84.3   84.2

MME Full (LLaVA-1.5 7B) ↑

Method    Perception   Cognition   Total
Vanilla   1279.2       352.9       1632.1
VCD       1363.9       353.2       1717.1
OPERA     1413.0       304.2       1717.2
SHIELD    1473.0       337.8       1810.8

Efficiency (LLaVA-1.5 7B, CHAIR)

Method    CS     Time (s)   Memory
Vanilla   48.8   2.59       15.69 GB
VCD       46.8   4.89       16.52 GB
OPERA     44.6   24.01      34.88 GB
SHIELD    36.6   7.34       18.17 GB
MME Full radar chart

MME Full benchmark radar chart. SHIELD improves perception across all sub-categories while maintaining cognition performance.

Key takeaway: SHIELD achieves the best hallucination suppression while maintaining strong general-purpose performance (MME Full: 1810.8). It is also 3.3× faster than OPERA with less than half the memory overhead.

Qualitative Examples

Ablation Visualization

Effect of each SHIELD module on attention distribution and generated captions. Each module progressively corrects a different source of hallucination.

Ablation: statistical bias

Mitigating statistical bias

Ablation: inherent bias

Reducing inherent bias

Ablation: vulnerability

Addressing vulnerability

Case Studies

Detailed comparisons between Vanilla LLaVA-1.5 and SHIELD. Hallucinated objects are highlighted in red, correct descriptions in green.

Case study 1

Case study 2

Code

SHIELD works as a non-invasive wrapper; integration takes just a few lines:

import shield
from llava.model.builder import load_pretrained_model

# Load model as usual
tokenizer, model, image_processor, _ = load_pretrained_model(
    "liuhaotian/llava-v1.5-7b", None, "llava-v1.5-7b"
)

# One-line SHIELD setup
shield.wrap(model, tokenizer,
    caption_file="experiments/first_cap/llava15_coco_pope_first_caption.jsonl",
    cd_alpha=2.0, cd_beta=0.35,
)

# Generate with SHIELD
shield_kw = model.shield_prepare(image, image_tensor, "image.jpg", use_cd=True)
output_ids = model.generate(input_ids, **shield_kw, do_sample=True, max_new_tokens=1024)

See the GitHub repository for full installation instructions, data preparation, and evaluation scripts.

BibTeX

If you find this work useful, please cite our paper:

ICLR version:

@inproceedings{huang2026shield,
  title={{SHIELD}: Suppressing Hallucinations In {LVLM} Encoders via Bias and Vulnerability Defense},
  author={Huang, Yiyang and Shi, Liang and Zhang, Yitian and Xu, Yi and Fu, Yun},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=yk7FFLoNcP}
}

arXiv version:

@article{huang2025shield,
  title={SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense},
  author={Huang, Yiyang and Shi, Liang and Zhang, Yitian and Xu, Yi and Fu, Yun},
  journal={arXiv preprint arXiv:2510.16596},
  year={2025}
}