AI Image Captioner

Inventory Manager Demo GitHub

AI Image Captioner is exactly what it sounds like. This model takes in image input and outputs a caption for the image.

AI NLP Vision Transformers Seq2Seq model

How does it work?

Trained on the Flickr8k dataset, the model is a seq2seq model built on ViT and GPT2. The ViT processes images into a sequence of features, which is then mapped as an input into GPT2. GPT2 is then delegated with taking the visual understanding from ViT and generating natural language that describes the image.