🖼️ Image Captioning

Upload a photo and generate a natural-language description using one of three trained models, or compare all three at once.

Upload Image

Model

BLIP + LoRA produces the best captions of the three models

Model details
• Custom 5k / 100k — EfficientNet-V2-S + Transformer, trained from scratch on COCO subsets
• BLIP + LoRA — Salesforce BLIP base fine-tuned with LoRA adapters on COCO 2014
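
For readers unfamiliar with LoRA, the idea behind the adapters can be sketched in a few lines. The snippet below is only an illustration of the standard low-rank update W + (alpha / r) · B · A in plain Python. The function names, matrix sizes, and the alpha value are assumptions made for the example; the rank, scaling, and target layers actually used for this app's BLIP fine-tune are not stated here.

```python
# Illustrative sketch of the LoRA weight update; not this app's training code.
# All names and numbers below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A).

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)                 # adapter rank
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
A = [[1.0, 2.0]]               # rank-1 down-projection (1 x 2)
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is a no-op
B_trained = [[1.0], [0.5]]     # made-up values standing in for a trained B

unchanged = lora_effective_weight(W, A, B_init, alpha=2.0)
adapted = lora_effective_weight(W, A, B_trained, alpha=2.0)
```

Only A and B are trained while W stays frozen, which is why an adapter adds far fewer trainable parameters than full fine-tuning.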

Runtime device: cpu · The first inference with each model is slower because the model is loaded lazily on first use
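
The lazy-loading behaviour mentioned above can be sketched as a small cache that builds each model only when it is first requested. This is a minimal illustration, not the app's actual code: the class name, the model keys, and the stub loaders are all hypothetical, and a real loader would read weights from disk instead of returning a stub.

```python
# Minimal sketch of lazy model loading; names and loaders are hypothetical.
from typing import Callable, Dict


class LazyModelRegistry:
    """Builds each model on first request and reuses it afterwards."""

    def __init__(self, loaders: Dict[str, Callable[[], Callable[[str], str]]]):
        self._loaders = loaders
        self._cache: Dict[str, Callable[[str], str]] = {}
        # Counts how often each loader ran, to show that loading happens once.
        self.load_counts = {name: 0 for name in loaders}

    def get(self, name: str) -> Callable[[str], str]:
        if name not in self._cache:
            # Slow path: only the first call for a given model pays the load cost.
            self.load_counts[name] += 1
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


def make_stub_loader(name: str) -> Callable[[], Callable[[str], str]]:
    """Stand-in for an expensive loader; returns a fake captioning function."""
    def load() -> Callable[[str], str]:
        return lambda image: f"[{name}] caption for {image}"
    return load


registry = LazyModelRegistry(
    {name: make_stub_loader(name) for name in ("custom_5k", "custom_100k", "blip_lora")}
)

first = registry.get("blip_lora")    # triggers the load
second = registry.get("blip_lora")   # served from the cache
```

With this design, startup stays fast and memory is spent only on models the user actually selects, at the cost of a slower first request per model.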