- JSON representation
- VideoAnnotationResults
- LabelAnnotation
- Entity
- LabelSegment
- VideoSegment
- LabelFrame
- ExplicitContentAnnotation
- ExplicitContentFrame
- SpeechTranscription
- SpeechRecognitionAlternative
- WordInfo
Video annotation response. Included in the response field of the Operation returned by the operations.get call of the google::longrunning::Operations service.
JSON representation | |
---|---|
{ "annotationResults" : [ { object( |
Fields | |
---|---|
annotationResults[]
|
Annotation results for all videos specified in |
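As a minimal sketch of walking the top-level structure, the snippet below assumes the response has already been decoded from JSON into a Python dict. The sample payload, the bucket path, and the helper name `iter_video_results` are hypothetical and only illustrate the field names documented on this page.

```python
import json

# Hypothetical sample payload; it uses only field names documented on this page.
SAMPLE_RESPONSE = json.loads("""
{
  "annotationResults": [
    {"inputUri": "/example-bucket/video-a.mp4"},
    {"inputUri": "/example-bucket/video-b.mp4"}
  ]
}
""")


def iter_video_results(response):
    """Yield each per-video result; a missing field is treated as an empty list."""
    yield from response.get("annotationResults", [])


uris = [result.get("inputUri") for result in iter_video_results(SAMPLE_RESPONSE)]
print(uris)
```

Treating a missing `annotationResults` field as an empty list is a design choice of this sketch, not documented behavior.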
VideoAnnotationResults
Annotation results for a single video.
JSON representation | |
---|---|
{ "inputUri" : string , "segmentLabelAnnotations" : [ { object( |
Fields | |
---|---|
inputUri
|
Video file location in Google Cloud Storage. |
segmentLabelAnnotations[]
|
Label annotations at the video level or the user-specified segment level. There is exactly one element for each unique label. |
shotLabelAnnotations[]
|
Label annotations on shot level. There is exactly one element for each unique label. |
frameLabelAnnotations[]
|
Label annotations on frame level. There is exactly one element for each unique label. |
shotAnnotations[]
|
Shot annotations. Each shot is represented as a video segment. |
explicitAnnotation
|
Explicit content annotation. |
speechTranscriptions[]
|
Speech transcription. |
error
|
If set, indicates an error. Note that for a single |
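Because each per-video result carries its own error field, a caller may want to separate failed videos from successful ones before reading any annotations. The sketch below is one way to do that. The helper name `split_by_error` is hypothetical, and the shape of the error object in the sample (a `code` and a `message`) is an assumption, since its type is not described in this section.

```python
def split_by_error(annotation_results):
    """Partition per-video results into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for result in annotation_results:
        # Assumption: a result counts as failed whenever "error" is present
        # and non-empty; the error object's internal shape is not inspected.
        if result.get("error"):
            failed.append(result)
        else:
            succeeded.append(result)
    return succeeded, failed


# Hypothetical sample data for illustration only.
results = [
    {"inputUri": "/example-bucket/ok.mp4", "shotAnnotations": []},
    {"inputUri": "/example-bucket/bad.mp4", "error": {"code": 3, "message": "example"}},
]
ok, bad = split_by_error(results)
print(len(ok), "succeeded;", len(bad), "failed")
```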
LabelAnnotation
Label annotation.
JSON representation | |
---|---|
{ "entity" : { object( |
Fields | |
---|---|
entity
|
Detected entity. |
categoryEntities[]
|
Common categories for the detected entity. E.g. when the label is |
segments[]
|
All video segments where a label was detected. |
frames[]
|
All video frames where a label was detected. |
Entity
Detected entity from video analysis.
JSON representation | |
---|---|
{ "entityId" : string , "description" : string , "languageCode" : string } |
Fields | |
---|---|
entityId
|
Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API. |
description
|
Textual description, e.g. |
languageCode
|
Language code for |
LabelSegment
Video segment level annotation results for label detection.
JSON representation | |
---|---|
{ "segment" : { object( |
Fields | |
---|---|
segment
|
Video segment where a label was detected. |
confidence
|
Confidence that the label is accurate. Range: [0, 1]. |
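The LabelAnnotation, Entity, and LabelSegment objects nest inside one another, so a short example of reading them together may help. This sketch maps each label description to the highest confidence found among its segments. The helper name `label_confidences` and the sample labels are hypothetical.

```python
def label_confidences(label_annotations):
    """Map each entity description to the highest confidence among its segments."""
    best = {}
    for annotation in label_annotations:
        description = annotation.get("entity", {}).get("description", "")
        scores = [seg.get("confidence", 0.0) for seg in annotation.get("segments", [])]
        if scores:  # labels without any segment are skipped
            best[description] = max(scores)
    return best


# Hypothetical sample data for illustration only.
sample_labels = [
    {
        "entity": {"entityId": "example-id-1", "description": "dog"},
        "segments": [{"confidence": 0.4}, {"confidence": 0.9}],
    },
    {"entity": {"entityId": "example-id-2", "description": "park"}, "segments": []},
]
confidences = label_confidences(sample_labels)
print(confidences)
```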
VideoSegment
Video segment.
JSON representation | |
---|---|
{ "startTimeOffset" : string , "endTimeOffset" : string } |
Fields | |
---|---|
startTimeOffset
|
Time-offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by ' |
endTimeOffset
|
Time-offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by ' |
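Since both offsets arrive as strings rather than numbers, they need parsing before any arithmetic. The sketch below assumes the strings end with the suffix `s` (for example `"3.5s"`), which is how the protobuf JSON mapping encodes durations; the text above is cut off before naming the terminator, so treat that suffix as an assumption. The helper names `parse_offset` and `segment_length` are hypothetical.

```python
from decimal import Decimal


def parse_offset(value):
    """Convert a duration string such as "3.5s" to Decimal seconds."""
    # Assumption: the string ends with "s", as in the protobuf JSON
    # encoding of durations.
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return Decimal(value[:-1])


def segment_length(segment):
    """Return the length of a VideoSegment dict in seconds."""
    start = parse_offset(segment["startTimeOffset"])
    end = parse_offset(segment["endTimeOffset"])
    return end - start


length = segment_length({"startTimeOffset": "1.250s", "endTimeOffset": "4s"})
print(length)
```

`Decimal` is used instead of `float` so that all nine fractional digits survive parsing without rounding.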
LabelFrame
Video frame level annotation results for label detection.
JSON representation | |
---|---|
{ "timeOffset" : string , "confidence" : number } |
Fields | |
---|---|
timeOffset
|
Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by ' |
confidence
|
Confidence that the label is accurate. Range: [0, 1]. |
ExplicitContentAnnotation
Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.
JSON representation | |
---|---|
{ "frames" : [ { object( |
Fields | |
---|---|
frames[]
|
All video frames where explicit content was detected. |
ExplicitContentFrame
Video frame level annotation results for explicit content.
JSON representation | |
---|---|
{ "timeOffset" : string , "pornographyLikelihood" : enum( |
Fields | |
---|---|
timeOffset
|
Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by ' |
pornographyLikelihood
|
Likelihood of pornographic content. |
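A common use of these frames is to collect the time offsets whose likelihood meets some threshold. The sketch below does that. The enum's value names are not listed in this section, so the ordered list `LIKELIHOOD_ORDER` is an assumption, and the helper name `flagged_offsets` is hypothetical.

```python
# Assumption: likelihood names ordered from least to most likely. The enum's
# values are not listed in this section, so verify them before relying on this.
LIKELIHOOD_ORDER = [
    "LIKELIHOOD_UNSPECIFIED",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def flagged_offsets(explicit_annotation, threshold="LIKELY"):
    """Return the timeOffset of every frame at or above the given likelihood."""
    minimum = LIKELIHOOD_ORDER.index(threshold)
    flagged = []
    for frame in explicit_annotation.get("frames", []):
        likelihood = frame.get("pornographyLikelihood", "LIKELIHOOD_UNSPECIFIED")
        if LIKELIHOOD_ORDER.index(likelihood) >= minimum:
            flagged.append(frame["timeOffset"])
    return flagged


# Hypothetical sample data for illustration only.
sample_annotation = {
    "frames": [
        {"timeOffset": "0.5s", "pornographyLikelihood": "VERY_UNLIKELY"},
        {"timeOffset": "1.5s", "pornographyLikelihood": "LIKELY"},
        {"timeOffset": "2.5s", "pornographyLikelihood": "VERY_LIKELY"},
    ]
}
offsets = flagged_offsets(sample_annotation)
print(offsets)
```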
SpeechTranscription
A speech recognition result corresponding to a portion of the audio.
JSON representation | |
---|---|
{ "alternatives" : [ { object( |
Fields | |
---|---|
alternatives[]
|
Output only. May contain one or more recognition hypotheses (up to the maximum specified in |
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
JSON representation | |
---|---|
{ "transcript" : string , "confidence" : number , "words" : [ { object( |
Fields | |
---|---|
transcript
|
Output only. Transcript text representing the words that the user spoke. |
confidence
|
Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for |
words[]
|
Output only. A list of word-specific information for each recognized word. |
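To turn a list of SpeechTranscription objects into plain text, one option is to take the highest-confidence alternative from each. The sketch below assumes that an alternative without a confidence value should rank as 0.0, which is a choice made here rather than documented behavior. The helper name `best_transcripts` is hypothetical.

```python
def best_transcripts(speech_transcriptions):
    """Return the transcript of the highest-confidence alternative per result."""
    texts = []
    for transcription in speech_transcriptions:
        alternatives = transcription.get("alternatives", [])
        if not alternatives:
            continue
        # Assumption: a missing confidence ranks as 0.0. On ties, max()
        # keeps the first alternative in list order.
        top = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        texts.append(top.get("transcript", ""))
    return texts


# Hypothetical sample data for illustration only.
sample_transcriptions = [
    {
        "alternatives": [
            {"transcript": "hello world", "confidence": 0.92},
            {"transcript": "hollow world"},
        ]
    },
    {"alternatives": []},
]
transcripts = best_transcripts(sample_transcriptions)
print(transcripts)
```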
WordInfo
Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as enable_word_time_offsets.
JSON representation | |
---|---|
{ "startTime" : string , "endTime" : string , "word" : string } |
Fields | |
---|---|
startTime
|
Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if A duration in seconds with up to nine fractional digits, terminated by ' |
endTime
|
Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if A duration in seconds with up to nine fractional digits, terminated by ' |
word
|
Output only. The word corresponding to this set of information. |
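Word timings can be read out of a SpeechRecognitionAlternative as simple tuples. The sketch below assumes duration strings end with the suffix `s`, as in the protobuf JSON encoding of durations, and returns `None` for any timing field that is absent, since the time fields are only set under certain conditions. The helper names `to_seconds` and `word_timings` are hypothetical.

```python
def to_seconds(value):
    """Convert a duration string such as "1.5s" to float seconds, or None."""
    if value is None:
        return None
    # Assumption: durations carry a trailing "s" suffix.
    return float(value.rstrip("s"))


def word_timings(alternative):
    """Return (word, start_seconds, end_seconds) for each WordInfo entry."""
    rows = []
    for info in alternative.get("words", []):
        rows.append(
            (info.get("word", ""), to_seconds(info.get("startTime")), to_seconds(info.get("endTime")))
        )
    return rows


# Hypothetical sample data for illustration only.
sample_alternative = {
    "transcript": "hello world",
    "words": [
        {"startTime": "0.5s", "endTime": "1s", "word": "hello"},
        {"word": "world"},
    ],
}
timings = word_timings(sample_alternative)
print(timings)
```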