IBM MAX Image Caption Library Examples

May 23, 2020

I was looking for an image captioning solution for a few images. IBM’s MAX image caption library was the first one I found, and the results were not great. Pretty terrible on these kinds of images, really.

IBM’s MAX Image Caption Generator: https://github.com/IBM/MAX-Image-Caption-Generator

How to use:

$ docker run -it -p 5000:5000 codait/max-image-caption-generator

This downloads the Docker image, if it is not already present locally, and starts the container.

The API is then exposed locally at: http://0.0.0.0:5000

You can use it via curl, like so:

curl -F "image=@samples/surfing.jpg" -X POST http://localhost:5000/model/predict
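
If you would rather script it, the same request can be made from Python. This is only a rough sketch using the standard library, assuming the container started above is listening on localhost:5000. The helper names `build_multipart` and `caption_image` are my own for illustration and are not part of the MAX project.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Endpoint taken from the curl command above.
PREDICT_URL = "http://localhost:5000/model/predict"


def build_multipart(field, filename, data):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def caption_image(path, url=PREDICT_URL):
    """POST the image at `path` to the running container; return the parsed JSON."""
    with open(path, "rb") as f:
        # "image" is the form field name used by the curl example.
        body, content_type = build_multipart("image", os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the container running, `caption_image("samples/surfing.jpg")` should return the same JSON that curl prints. If you have the `requests` package installed, its `files=` argument would replace the hand-rolled multipart encoding.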

Or, you can test it via the swagger documentation:

Open your browser to: http://0.0.0.0:5000

Some Image & Response Examples:

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a bunch of green bananas hanging from a tree .",
      "probability": 0.004498371145962152
    },
    {
      "index": "1",
      "caption": "a bunch of bananas hanging from a tree .",
      "probability": 0.0026828066468930407
    },
    {
      "index": "2",
      "caption": "a bunch of bananas growing on a tree .",
      "probability": 0.0008265803491343649
    }
  ]
}

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a group of birds standing on top of a beach .",
      "probability": 0.0009019170389011701
    },
    {
      "index": "1",
      "caption": "a group of birds standing on top of a sandy beach .",
      "probability": 0.0008502319491804747
    },
    {
      "index": "2",
      "caption": "a group of birds sitting on top of a beach .",
      "probability": 0.0006056828923520915
    }
  ]
}

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a close up of a pair of scissors",
      "probability": 0.00263403752634752
    },
    {
      "index": "1",
      "caption": "a close up of a pair of scissors on a table",
      "probability": 0.0004045068084737849
    },
    {
      "index": "2",
      "caption": "a close up of a stuffed animal on a table",
      "probability": 0.00013753994480485387
    }
  ]
}

{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a close up of a pair of scissors",
      "probability": 0.00032490566885802724
    },
    {
      "index": "1",
      "caption": "a close up of a person holding a kite",
      "probability": 0.00017438483190880002
    },
    {
      "index": "2",
      "caption": "a close up of a pair of scissors on a table",
      "probability": 0.00006401656661012868
    }
  ]
}
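
Each response carries three candidate captions with a probability attached. To keep only the top one, something like the following sketch might do. It assumes the response shape shown in the examples above; `best_caption` is a hypothetical helper name, and stripping the trailing " ." is my own tidy-up, not something the API does.

```python
def best_caption(response):
    """Return (caption, probability) for the most probable prediction, or None."""
    if response.get("status") != "ok" or not response.get("predictions"):
        return None
    top = max(response["predictions"], key=lambda p: p["probability"])
    # Some captions end with a space followed by a period; trim that off.
    caption = top["caption"].rstrip(" .")
    return caption, top["probability"]


# Sample data copied from the first response above.
sample = {
    "status": "ok",
    "predictions": [
        {"index": "0", "caption": "a bunch of green bananas hanging from a tree .",
         "probability": 0.004498371145962152},
        {"index": "1", "caption": "a bunch of bananas hanging from a tree .",
         "probability": 0.0026828066468930407},
        {"index": "2", "caption": "a bunch of bananas growing on a tree .",
         "probability": 0.0008265803491343649},
    ],
}

print(best_caption(sample))
```

Picking by `max` rather than taking the first entry avoids relying on the predictions arriving in sorted order, which the examples suggest but which I have not verified.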

Conclusion:

Useless for my images. Cool. Back to the drawing board. Evidently the dataset used to train the model is a poor match for the images I’m trying to caption.

Written by Riley James