Google has recently announced a new service called “Google Cloud Video Intelligence”. The purpose of this service is to offer tagging and annotation of digital videos.

I will try this service on a trailer of a French parody. This movie is made of several scenes taken from erotic movies of the seventies.

Why this parody?

  • because it is fun
  • because it is composed of a lot of different scenes
  • because it is short (so it won’t cost me a lot)
  • because, as it is related to erotica of the seventies, I am curious about the result!

Caution: this video is not a porn video, but it is definitely not safe for work (#nsfw)

What information can the service find?

Shot change detection

This feature will detect the different scenes and display their time ranges. There is no further analysis of the scene: that is to say, it won’t tell, for now, that the first scene is about a sports competition. It will only report that the first scene runs from the first microsecond to the xxx-th microsecond, and so on.

Label detection

The most interesting feature is label detection.

With this operation, the service returns tags for the elements found in the video, along with the time ranges of the video where they can be seen.

For example, it may tell you that there is a dog in the video between x and y microseconds, as well as between w and z microseconds.

Preparing the video

I downloaded the video with youtube-dl and uploaded it to Google Cloud Storage, as the API expects the video to be there. There may be a way to post the video base64-encoded directly in the request, but that would have been less convenient for my tests.
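The two steps above can be sketched as shell commands. The bucket name comes from the payload used later in this post; the video URL is a placeholder, and `gsutil` comes from the Google Cloud SDK:

```shell
# Download the trailer locally (the URL is a placeholder, not the actual video)
youtube-dl -f mp4 -o trailer.mp4 'VIDEO_URL'

# Create a bucket (if needed) and upload the file to Google Cloud Storage
gsutil mb gs://video-test-blog
gsutil cp trailer.mp4 gs://video-test-blog/trailer.mp4
```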


Querying Google Cloud Video Intelligence

This test uses the plain REST API with curl.

Preparing the request

To actually use the API, we need to perform a POST request. The payload is a simple JSON file where we specify:

  • the URI of the video file to process
  • an array of features to use: shot change detection and/or label detection

Here is my payload. I want both features for my test:

{
    "inputUri": "gs://video-test-blog/trailer.mp4",
    "features": ["SHOT_CHANGE_DETECTION","LABEL_DETECTION"]
}

Launching the request

Authorization

To actually use the service, I need an authorization token, linked to a service account. With this token, we can trigger the analysis by using this curl command:

curl -s -k -H 'Content-Type: application/json' \
      -H 'Authorization: Bearer MYTOKEN' \
      'https://videointelligence.googleapis.com/v1beta1/videos:annotate' \
      -d @demo.json

The call replies with a JSON object containing an operation name: the processing is long and runs asynchronously, and this operation name can be used to get the processing status.

{
   "name": "us-east1.16784866925473582660"
}

Getting the status

To get the status of the operation, we query the service like this:

curl -s -k -H 'Content-Type: application/json' \
      -H 'Authorization: Bearer MYTOKEN' \
      'https://videointelligence.googleapis.com/v1/operations/us-east1.16784866925473582660'

It returns a JSON result in which we can find three important fields:

  • done: a boolean that tells whether the processing of the video is complete or not
  • shotAnnotations: an array of the shot annotations as described earlier
  • labelAnnotations: an array of label annotations

Here is a sample output: (the full result is here)

{
  "response": {
    "annotationResults": [
      {
        "shotAnnotations": [
          // ...
          {
            "endTimeOffset": "109479985",
            "startTimeOffset": "106479974"
          }
        ],
        "labelAnnotations": [
          // ... 
          {
            "locations": [
              {
                "level": "SHOT_LEVEL",
                "confidence": 0.8738658,
                "segment": {
                  "endTimeOffset": "85080015",
                  "startTimeOffset": "83840048"
                }
              }
            ],
            "languageCode": "en-us",
            "description": "Acrobatics"
          }
        ],
        "inputUri": "/video-test-blog/trailer.mp4"
      }
    ],
    //...
  },
  "done": true,
  //...
}
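To process this result in Go, the structure above can be mapped to structs covering only the fields we care about. This is a sketch of my own: the struct names are mine, and the JSON field names are taken from the sample output:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// segment is a time range inside the video, used both for shots and labels.
type segment struct {
	StartTimeOffset string `json:"startTimeOffset"`
	EndTimeOffset   string `json:"endTimeOffset"`
}

// location is one place in the video where a label was detected.
type location struct {
	Level      string  `json:"level"`
	Confidence float64 `json:"confidence"`
	Segment    segment `json:"segment"`
}

// labelAnnotation is one detected label with all its locations.
type labelAnnotation struct {
	Description string     `json:"description"`
	Locations   []location `json:"locations"`
}

type annotationResult struct {
	ShotAnnotations  []segment         `json:"shotAnnotations"`
	LabelAnnotations []labelAnnotation `json:"labelAnnotations"`
}

// operation holds the three important fields of the status response.
type operation struct {
	Done     bool `json:"done"`
	Response struct {
		AnnotationResults []annotationResult `json:"annotationResults"`
	} `json:"response"`
}

func parseOperation(data []byte) (operation, error) {
	var op operation
	err := json.Unmarshal(data, &op)
	return op, err
}

func main() {
	sample := []byte(`{"done": true, "response": {"annotationResults": [{"labelAnnotations": [{"description": "Acrobatics"}]}]}}`)
	op, err := parseOperation(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(op.Done, op.Response.AnnotationResults[0].LabelAnnotations[0].Description)
	// → true Acrobatics
}
```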

Interpreting the results

Tag cloud

I will only look at the label annotations. The API found a lot of labels, described in the description field, each with 1 to N locations where the label appears.

What I can do is manipulate the data to list all the labels with their frequencies.

You can find here a little Go program that displays each label as many times as it occurs.

For example:

Abdomen Abdomen Abdomen Acrobatics Action figure Advertising Advertising ...

This allows me to generate a tag cloud with the help of this website:

So here is the visual result of what the service has found in the video:

tag cloud

Annotated video

To find out where the labels occur, I made a little JavaScript snippet that displays the labels alongside the YouTube video. Just click on the video and the tags will be displayed below.

Conclusion

There is a lot more to do than simply displaying the tags. For example, we could locate an interesting tag, take a snapshot of the video, and use the photo API to find websites related to this part of the video.

In this video, for example, it could be possible to find the original movies where people are dancing.

I will postpone this for another geek-time.

P.S. The JavaScript has been made with gopherjs. It is not optimized at all (I should avoid the encoding/json package, for example). If you are curious about the implementation, the code is here, here and here.