Case Study: Entropy Microdemo

Case Study: Entropy Microdemo — Hallucinations as Entropy Starvation

Hypothesis: Hallucinations might be entropy starvation in disguise. If so, injecting high‑quality entropy should change exploration paths and reduce false pattern reinforcement.

Problem: As models get more accurate, they can hallucinate more.
Hypothesis: Low‑entropy PRNG seeds reinforce stale attractors; true entropy (ERIS) enables healthier exploration.
Essentially: uncertainty isn’t the threat; the threshold is.

Demonstration

A/B testing across identical prompts and params; only the seed source changes.

Models: GPT‑2, EleutherAI/pythia‑410M, TinyLlama‑1.1B‑Chat
Control: Fixed PRNG seed (42)
Variable: ERIS true entropy seed
Params: temperature 0.7, top‑k 50, do_sample true, identical max tokens
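
A minimal sketch of how the two arms could be wired up with Hugging Face `transformers`, using the parameters listed above. ERIS's interface isn't described here, so `os.urandom` stands in for it purely as an assumption. The helper names (`control_seed`, `entropy_seed`, `generate`) and the `max_new_tokens=64` default are hypothetical as well.

```python
import os

CONTROL_SEED = 42  # fixed PRNG seed used by the control arm


def control_seed() -> int:
    """Seed for the control arm: always the same value."""
    return CONTROL_SEED


def entropy_seed() -> int:
    """Seed for the variable arm.

    Assumption: os.urandom is only a stand-in for ERIS here.
    Returns 32 bits drawn from the OS entropy pool.
    """
    return int.from_bytes(os.urandom(4), "big")


def generate(model_name: str, prompt: str, seed: int, max_new_tokens: int = 64) -> str:
    """Sample one completion; the seed is the only thing that differs between arms."""
    # Heavy imports stay local so the seed helpers work without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(seed)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage would look like `generate("gpt2", prompt, control_seed())` for the control arm and `generate("gpt2", prompt, entropy_seed())` for the variable arm. In this sketch the entropy source only picks the starting state of torch's generator; the sampling itself still runs through that generator.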

Methodological Note

NOTE: I forgot to save the inputs I used for the original analysis, so the outputs shown here are reruns from a few hours later with the same configs. Parameters are identical across runs, which leaves the entropy source (PRNG vs ERIS) as the sole variable.
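
To avoid losing inputs on future runs, each generation could be appended to a JSON Lines file along with its prompt, seed, and params. This is only a sketch; the function names (`log_run`, `load_runs`) and the record fields are hypothetical, not part of the original setup.

```python
import json
import time


def log_run(path, model_name, prompt, seed_source, seed, params, output):
    """Append one generation record to a JSON Lines file and return it."""
    record = {
        "timestamp": time.time(),
        "model": model_name,
        "prompt": prompt,
        "seed_source": seed_source,  # "PRNG" or "ERIS"
        "seed": seed,
        "params": params,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_runs(path):
    """Read every logged record back as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Logging the seed itself matters most for the entropy arm, since a seed drawn from a true entropy source can't be regenerated later.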

Observations by Model

GPT‑2

ERIS: more erratic, exploratory, meta‑aware; avoids some PRNG echo patterns
PRNG: safer, echo‑y, sometimes dull; sticks to reinforced grooves
Takeaway: sensitive to novel entropy but architecture limits gains

EleutherAI/pythia‑410M

PRNG: catastrophic loops or off‑topic derailments on some prompts
ERIS: avoids certain failure states; yields semantically adjacent paths
Takeaway: ERIS acts as a pattern interrupt to attractor loops
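
The "loops" and "echo patterns" above are qualitative impressions. One rough way to put a number on them is a distinct-n ratio: the share of n-grams in an output that are unique. This is a sketch of a generic metric, not something used in the runs above; the whitespace tokenization and the function name `distinct_n` are assumptions.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of n-grams in `text` that are unique (1.0 = no repetition).

    Tokens are split on whitespace. Texts shorter than n tokens have no
    n-grams, so they return 1.0 (no repetition observed).
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 1.0
    return len(set(ngrams)) / len(ngrams)
```

A looping output such as `"a b a b a b a b"` scores low (2 unique bigrams out of 7), while non-repeating text scores 1.0. Comparing the score across the PRNG and ERIS arms for the same prompt would give a crude check on the loop observations.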

TinyLlama‑1.1B‑Chat

ERIS: more structured, responsive, pedagogical outputs; better task decomposition
PRNG: simpler, less interactive responses
Takeaway: with capable bases, high‑quality entropy unlocks better behaviors

Pipeline Angle: Rejection Sampling

Filtering pipelines (e.g., Llama 3.1’s rejection sampling) rely on candidate generations produced under a randomness source. If that source is a PRNG, hidden patterns could shrink candidate diversity. Under this hypothesis, ERIS raises candidate quality before filtering, giving the reward model better raw material and improving downstream SFT/DPO.

Reference: Meta Llama 3.1 Post‑Training Pipeline
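
To make the dependency on the randomness source concrete, here is a generic best-of-N sketch of rejection sampling. It is not Meta's pipeline; `generate`, `reward`, and `unique_fraction` are hypothetical placeholders. The control arm would pass `random.Random(42)`, and `random.SystemRandom()` stands in for ERIS as an assumption.

```python
import random
from typing import Callable, List


def rejection_sample(
    generate: Callable[[random.Random], str],
    reward: Callable[[str], float],
    rng: random.Random,
    num_candidates: int = 8,
    keep: int = 1,
) -> List[str]:
    """Draw candidates using `rng`, then keep the `keep` highest-reward ones."""
    candidates = [generate(rng) for _ in range(num_candidates)]
    ranked = sorted(candidates, key=reward, reverse=True)
    return ranked[:keep]


def unique_fraction(candidates: List[str]) -> float:
    """Share of candidates that are distinct: a crude diversity measure."""
    if not candidates:
        return 0.0
    return len(set(candidates)) / len(candidates)


def toy_generate(rng: random.Random) -> str:
    """Toy stand-in for a model call: picks one of three canned strings."""
    return rng.choice(["a", "bb", "ccc"])


# Control arm: fixed seed. Variable arm: OS entropy via SystemRandom.
best_prng = rejection_sample(toy_generate, len, random.Random(42))
best_entropy = rejection_sample(toy_generate, len, random.SystemRandom())
```

The filter can only pick from what the generator produced, so measuring `unique_fraction` on the candidate pool for each arm is one way to test whether the seed source actually changes diversity before the reward model sees anything.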
