IBM MAX Image Caption Library Examples

Riley James
May 23, 2020

I was looking for an image captioning solution for a few images. IBM’s MAX Image Caption Generator was the first one I found. The results were not great; pretty terrible on these kinds of images, really.

IBM’s MAX Image Caption Generator library: https://github.com/IBM/MAX-Image-Caption-Generator

How to use:

$ docker run -it -p 5000:5000 codait/max-image-caption-generator

This downloads the Docker image (on the first run) and starts the container.

The API is then exposed locally at http://0.0.0.0:5000
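On the first start the model can take a while to load before the API answers. Below is a minimal readiness check, assuming only that the server responds to HTTP on the root URL (where the Swagger page is served). `wait_for_api` is a hypothetical helper, not part of the MAX project; it treats any HTTP reply, even an error status, as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_api(base_url: str = "http://localhost:5000",
                 timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll base_url until the server answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response comes back, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful response means the server is listening.
            with urllib.request.urlopen(base_url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status (4xx/5xx) still means the server is listening.
            return True
        except OSError:
            # Connection refused, reset or timed out: not ready yet, so retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_api()` right after `docker run` blocks until the container answers on port 5000, or returns False after two minutes.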

You can use it via curl, like so:

curl -F "image=@samples/surfing.jpg" -X POST http://localhost:5000/model/predict
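To caption more than a couple of images, the same request can be scripted. A minimal, standard-library-only sketch, assuming the endpoint and the `image` form field from the curl command above; `build_multipart` and `predict` are hypothetical helper names. Like curl, it sets the file part's Content-Type from the file extension.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data, like curl -F "field=@file".

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the data
    # Guess the part's MIME type from the extension (e.g. .jpg -> image/jpeg).
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str,
            url: str = "http://localhost:5000/model/predict") -> dict:
    """POST one image to the caption endpoint and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "image", os.path.basename(image_path), f.read())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `predict("samples/surfing.jpg")` should return the parsed reply as a dict with the same `status`/`predictions` shape as the responses shown in this post. Sticking to `urllib` keeps the sketch dependency-free; with the third-party `requests` package the call shrinks to `requests.post(url, files={"image": f})`.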

Or you can test it through the Swagger documentation:

Open your browser to http://0.0.0.0:5000

Some Image & Response Examples:

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a bunch of green bananas hanging from a tree .",
      "probability": 0.004498371145962152
    },
    {
      "index": "1",
      "caption": "a bunch of bananas hanging from a tree .",
      "probability": 0.0026828066468930407
    },
    {
      "index": "2",
      "caption": "a bunch of bananas growing on a tree .",
      "probability": 0.0008265803491343649
    }
  ]
}

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a group of birds standing on top of a beach .",
      "probability": 0.0009019170389011701
    },
    {
      "index": "1",
      "caption": "a group of birds standing on top of a sandy beach .",
      "probability": 0.0008502319491804747
    },
    {
      "index": "2",
      "caption": "a group of birds sitting on top of a beach .",
      "probability": 0.0006056828923520915
    }
  ]
}

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a close up of a pair of scissors",
      "probability": 0.00263403752634752
    },
    {
      "index": "1",
      "caption": "a close up of a pair of scissors on a table",
      "probability": 0.0004045068084737849
    },
    {
      "index": "2",
      "caption": "a close up of a stuffed animal on a table",
      "probability": 0.00013753994480485387
    }
  ]
}

A funny looking pair of scissors

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a close up of a pair of scissors",
      "probability": 0.00032490566885802724
    },
    {
      "index": "1",
      "caption": "a close up of a person holding a kite",
      "probability": 0.00017438483190880002
    },
    {
      "index": "2",
      "caption": "a close up of a pair of scissors on a table",
      "probability": 0.00006401656661012868
    }
  ]
}
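Every response above has the same shape: a `status` field plus a list of `predictions`, each with an `index`, a `caption` and a `probability`. A minimal sketch for pulling the most probable caption out of one reply; `best_caption` is a hypothetical helper, and the sample data is copied from the first response above.

```python
import json
from typing import Optional, Tuple


def best_caption(response: dict) -> Optional[Tuple[str, float]]:
    """Return (caption, probability) of the most probable prediction, or None."""
    if response.get("status") != "ok" or not response.get("predictions"):
        return None
    # Pick by probability rather than trusting the order of the list.
    top = max(response["predictions"], key=lambda p: p["probability"])
    return top["caption"], top["probability"]


sample = json.loads("""
{
  "status": "ok",
  "predictions": [
    {"index": "0", "caption": "a bunch of green bananas hanging from a tree .",
     "probability": 0.004498371145962152},
    {"index": "1", "caption": "a bunch of bananas hanging from a tree .",
     "probability": 0.0026828066468930407},
    {"index": "2", "caption": "a bunch of bananas growing on a tree .",
     "probability": 0.0008265803491343649}
  ]
}
""")

caption, probability = best_caption(sample)
print(caption)  # a bunch of green bananas hanging from a tree .
```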

Conclusion:

Useless. Cool. Back to the drawing board. The dataset the model was trained on is obviously not a good match for the kinds of images I’m trying to caption.
