ISO/IEC JTC 1/SC 29 N


ReImgProxy (Remote Image Recognition Registration Proxy)





The MAREC provides a set of target resources, a video source URL and one or multiple ARAF compliant processing server URLs where recognition (and tracking) libraries are available. The ARAF Browser communicates with any of the processing servers provided by the MAREC: it sends the target resources and the video frames, and receives the recognition (and tracking) result. The ARAF Browser composes the server result into the expected format before sending it to the ARAF scene, as described in the Functionality and Semantics below.

Even though in the diagram presented below the augmentation media appears as an input along with the other data provided by the MAREC, the augmentation media is not used by the PROTO implementation, meaning that it is not part of any of the PROTO fields. The augmentation resources are of course required for the scene augmentation, but they do not influence the recognition process.




BIFS Textual Description

EXTERNPROTO ReImgProxy [

exposedField SFString videoSource ""

exposedField MFString processingServerURL []

exposedField MFInt32 frameEncodingType []

exposedField SFInt32 processingType 0

exposedField MFString targetResources []

exposedField MFString targetResourcesTypes []

exposedField SFInt32 enabled 0

exposedField SFInt32 maximumDelay 200 #milliseconds

exposedField SFInt32 optimalDelay 50 #milliseconds

exposedField MFVec2f recognitionRegion []

eventOut MFInt32 onRecognition []

eventOut MFVec3f onTranslation []

eventOut MFRotation onRotation []

eventOut SFInt32 onError

] "org:mpeg:remote_image_recognition_registration"

Functionality and semantics

The MAREC provides one or multiple processing server URLs where recognition (and tracking) libraries are available, along with the target resources to be recognized and the video source on which the recognition process shall be performed. The ARAF Browser uses the provided processing servers as an external resource that is able to perform the recognition (and tracking) of the target resources. The video frames are sent to the processing server in a format specified by the MAREC; otherwise the best suitable format is chosen by the ARAF Browser. The recognition (and tracking) result received by the ARAF scene is an array of integers representing the indexes of the recognized target resources and optionally their pose matrices, as described in the description of the fields below.

An ARAF compliant processing server shall understand the HTTP requests presented in the following table:



Request: GET pServer/alive
Purpose: get the unique key, the server parameters and capabilities.
Response: unique key (64-bit), target resource codes, video frame codes, server capability codes.
The unique key shall be used to identify future requests from the client. The target resource codes specify the formats of the target resources that are supported by the server. The video frame codes specify the formats of the video frames that are supported by the server. The ARAF Browser decides how the video data will be encoded before it is sent to the server by considering this response and the MAREC preferences (see frameEncodingType). The server capability codes inform the ARAF Browser about the type of processing that the server is able to perform.

Request: POST pServer (key & frame_format)
Purpose: inform the processing server about the chosen video frame format.
Response: True/False. The server response is True if the data is correctly received and False otherwise.

Request: POST pServer (key & target & type & id)
Purpose: send each target resource along with its type code and the unique target resource id.
Response: True/False. The server response is True if the data is correctly received and False otherwise.

Request: POST pServer (key & frame)
Purpose: send a new video frame to the server.
Response: identified, rotation, translation. The server response is a list containing the ids of the recognized target resources and optionally their pose matrices (rotation and translation).
















ARAF Browser – Processing Server Communication Workflow

Communication Workflow:

  1. The ARAF Browser interrogates the processing server (GET /alive) in order to detect its status and to receive the server parameters and capabilities. The server returns:

    1. a unique key that shall be transmitted by the ARAF Browser in future requests,

    2. the list of codes describing the supported target resources formats (file types or image descriptors). See table Target Image Formats and table Target Image Descriptors Formats for the supported codes and their meaning.

    3. the list of codes describing the supported video frame formats (file types or image descriptors). See table Target Image Formats and table Target Image Descriptors Formats for the supported codes and their meaning.

    4. the list of codes describing the server capabilities. See table Server Capabilities for the supported codes and their meaning.

The server response must be in the following format:

key=unique key 64-bit

&resource_code=[target resource format codes];

&frame_code=[video frame format codes];

&server_capability_code=[server capability codes]
Example of a possible server response:

key=2e45325f4f&resource_code=0,1&frame_code=0&server_capability_code=0,5
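A client would split such a response into the key and the three code lists before deciding on an encoding. The following Python sketch is illustrative only; the function name is an assumption, not part of the standard:

```python
# Illustrative helper (not normative): parse the /alive response of an
# ARAF processing server into the unique key and the three code lists.

def parse_alive_response(body: str) -> dict:
    fields = {}
    for part in body.split("&"):
        name, _, value = part.partition("=")
        fields[name] = value.strip("[];")  # tolerate the optional ';' / '[]'
    return {
        "key": fields["key"],
        "resource_codes": [int(c) for c in fields["resource_code"].split(",")],
        "frame_codes": [int(c) for c in fields["frame_code"].split(",")],
        "capability_codes": [int(c) for c in fields["server_capability_code"].split(",")],
    }

# Using the example response from the text:
info = parse_alive_response(
    "key=2e45325f4f&resource_code=0,1&frame_code=0&server_capability_code=0,5")
# info["key"] == "2e45325f4f"; the server accepts codes 0 and 1 for
# target resources, code 0 for frames, and capabilities 0 and 5.
```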


  2. Once the key has been received, the ARAF Browser knows that the processing server is ready to perform the recognition (and tracking) process. The ARAF Browser decides on one video frame encoding type, based on the MAREC preference (if specified) and on the server's response, then informs the processing server about the chosen format. The video frames of this session are sent only in the specified encoding type. In addition to the video frame format, the ARAF Browser chooses one of the available capabilities to be performed by the processing server. Therefore, a capability code (see table Server Capabilities) has to be transmitted to the processing server along with the unique key and the chosen video frame format.

The second type of request contains the target resources provided by the MAREC (one request per target resource). Each resource must have an associated unique ID, the one to be returned by the processing server when the resource is recognized, and the file format of the resource. The POST requests are as follows:

    1. Send the chosen video frame format:

key=unique_key

&frame_code=the chosen encoding format code

&server_capability_code=one of the available server capabilities code

    2. Every target resource is sent separately using a POST request:

key=unique_key

&target_resource=the target resource data
&type_code=the target resource encoding type


&id=the id of the target resource uniquely identifying the target resource in the MAR scene

The processing server returns TRUE if the data is correctly received or FALSE otherwise.
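The bodies of the two setup POSTs above can be assembled as in the following sketch. The helper names are illustrative assumptions, and the actual HTTP transport is omitted:

```python
# Sketch (not normative): build the bodies of the two setup POST
# requests of step 2. Sending them over HTTP is left out for brevity.

def build_format_request(key: str, frame_code: int, capability_code: int) -> str:
    # Inform the server of the chosen frame encoding and capability.
    return (f"key={key}&frame_code={frame_code}"
            f"&server_capability_code={capability_code}")

def build_target_request(key: str, target_data: str,
                         type_code: int, target_id: int) -> str:
    # One request per target resource, carrying its unique scene id.
    return (f"key={key}&target_resource={target_data}"
            f"&type_code={type_code}&id={target_id}")

body = build_format_request("2e45325f4f", 0, 1)
# body == "key=2e45325f4f&frame_code=0&server_capability_code=1"
```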



  3. The ARAF Browser sends a video frame to the processing server. The video frame has to be sent in the exact format of which the processing server has been previously informed.

key=unique_key

&frame_data=the camera frame

The processing server's response is a list of IDs of the recognized target resources and optionally their corresponding pose matrixes or FALSE if no target resource is recognized.



identified=comma separated ids of the recognized target resources
&translation=[x,y,z]; //optional


&rotation=[x,y,z,q]; //optional

where:


  • identified is a list of integer values separated by commas

  • translation contains groups of 3 float values; the groups are separated by semicolons and the values within a group by commas.

  • rotation contains groups of 4 float values; the groups are separated by semicolons and the values within a group by commas.

If the processing server does not have tracking capabilities then the response will be composed only from the list of the recognized target resources (identified).
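A client-side parser for this response format might look as follows. This is a sketch under the format described above; the function name is an assumption:

```python
# Illustrative parser (not normative) for the frame response:
# "identified=<ids>&translation=[x,y,z];...&rotation=[x,y,z,q];..."
# Returns (ids, translations, rotations); the pose lists are empty
# when the server performs recognition only.

def parse_frame_response(body: str):
    if body == "FALSE":          # no target resource recognized
        return [], [], []
    fields = {}
    for part in body.split("&"):
        name, _, value = part.partition("=")
        fields[name] = value
    ids = [int(i) for i in fields["identified"].split(",")]

    def parse_groups(raw: str):
        groups = []
        for group in raw.split(";"):
            group = group.strip("[]")
            if group:
                groups.append([float(v) for v in group.split(",")])
        return groups

    translations = parse_groups(fields.get("translation", ""))
    rotations = parse_groups(fields.get("rotation", ""))
    return ids, translations, rotations
```

For example, "identified=0,2&translation=[0,0,1];[1,0,2]&rotation=[0,0,0,1];[0,1,0,0]" yields ids [0, 2] with one translation and one rotation group per recognized resource.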

  4. The loop starts over from step 3 whenever the ARAF Browser has to send new video frame data to the processing server.

videoSource is an SFString specifying the URI/URL of the video on which the recognition (and tracking) process shall be performed. The videoSource can be one of the following:

  1. Live 2D video camera feed

    1. a URI to one of the cameras available on the end user’s device. The possible values are specified in table Camera URIs.

    2. a URL to an external camera providing live camera feed.

  2. A URL to a prerecorded video file stored

  • locally on the end user’s device.

  • remotely on an external repository in the Web.

Based on the MAREC preferences, the video frames are sent to the recognition (and tracking) library every X milliseconds (the ARAF Browser is in charge of computing the frequency), as long as the recognition (and tracking) process is enabled. The video frames are sent as compressed images or raw data (see table Target Image Formats for the supported image file formats), or as descriptor files (see table Target Image Descriptors Formats) depending on the processing server capabilities. The ARAF Browser is in charge of deciding the encoding type of the video frames that shall be sent to the processing server, considering the MAREC preferences and the capabilities of the server.
The accepted video formats are specified in table Video formats.

The accepted communication protocols are specified in table Communication Protocols.





Camera URI          Description
worldFacingCamera   Refers to the primary camera, usually located at the back of the device (back camera)
userFacingCamera    Refers to the secondary camera, usually located at the front of the device (front camera)

Camera URIs

Video file format   Reference
Raw video data      ISO/IEC 14496-1:2010/Amd2:2014 Support for raw audio-visual data
MPEG-4 Visual       ISO/IEC 14496-2:2004 Visual
MPEG-4 AVC          ISO/IEC 14496-10:2012 Advanced Video Coding
Proprietary         See Annex B (ARAF support for proprietary formats)

Video formats

Communication protocol name   Reference
RTP                           RFC 3550 (2003) Real-time Transport Protocol
RTSP                          RFC 2326 (version 2.0, 2013) Real Time Streaming Protocol
HTTP                          RFC 2616 (1999) Hypertext Transfer Protocol
DASH                          ISO/IEC 23009-1:2012 Dynamic Adaptive Streaming over HTTP

Communication Protocols

One or multiple codes presented in this table might be sent by the processing server to the ARAF Browser (see the first step of the Communication Workflow). The purpose is to inform the ARAF Browser about the processing capabilities of the server. On the other hand, the MAREC can use the dedicated prototype field (processingType) to express his preferences related to the processing type that he expects from the server by specifying one of the codes defined in this table. The ARAF Browser finally decides what type of processing the server should perform, based on the server capabilities and the MAREC preferences. The server returns different responses based on the chosen processing type (see Communication workflow and the processingType field description).

Code   Description
0      Recognition only. The processing server is capable of performing image recognition only. This means that the server response contains information about the indexes of the recognized target resources, as defined in the description of the fields.
1      Recognition and tracking. The processing server is capable of performing image recognition and tracking. This means that the server response contains information about the indexes of the recognized target resources along with their computed pose matrices, as defined in the description of the fields.

Server Capabilities

processingServerURL is an MFString used by the MAREC to specify one or multiple web addresses where ARAF compliant processing servers are available. A valid URL is one that points to a processing server that handles at least one target resource type and is able to understand ARAF Browser requests, as defined in the ARAF Browser – Processing Server Communication Workflow. Because a processing server can handle requests from multiple clients at the same time, a unique key is generated by the server and transmitted to the ARAF Browser. The ARAF Browser sends this unique key in each request to the processing server. This way the processing server knows the source of each request and can therefore perform the recognition (and tracking) process on the correct set of target resources.

The frameEncodingType field is an MFInt32 containing an array of video frame type codes. The MAREC has the possibility to specify the desired encoding type of the video frames that are sent to the processing server. If multiple codes are specified by the MAREC, the ARAF Browser chooses the first encoding type that matches the server capabilities. The possible pre-defined codes of the frame encoding types and their meaning are listed in tables Target Image Formats and Target Image Descriptors Formats. If the MAREC does not specify any encoding type code, the ARAF Browser uses a default one. The MAREC should not specify an encoding type unless he knows that the processing server gives better results with a particular one. In any case, the ARAF Browser interrogates the server (see the first step of the Communication Workflow) to retrieve the supported encoding type codes and then decides on one encoding type considering the MAREC preferences. The field is optional.
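The selection rule described above can be sketched as follows. The function name and the browser default are assumptions for illustration only:

```python
# Illustrative sketch (not normative): choose the frame encoding the
# browser will use, given the MAREC's preferred codes (in order of
# preference) and the codes reported by the server in /alive.

def choose_frame_encoding(marec_codes, server_codes, browser_default=0):
    # The first MAREC-preferred code that the server supports wins.
    for code in marec_codes:
        if code in server_codes:
            return code
    # No MAREC preference matches (or none was given): fall back to a
    # browser-chosen default, assumed here to be supported by the server.
    return browser_default

# MAREC prefers J2K (1) then JPEG (0); the server supports JPEG and PNG:
chosen = choose_frame_encoding([1, 0], [0, 2])
# chosen == 0 (JPEG)
```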

The processingType field is an SFInt32 value where the MAREC can specify his preferences related to the tracking capabilities of the server, as follows: if the field is 1, the ARAF Browser requests the processing server to provide the pose matrix of each recognized target resource. If the processing server is not capable of computing the pose matrix, then the related fields (onTranslation and onRotation) will be empty, regardless of the MAREC request. By setting the field to 0, the MAREC indicates that the server should not perform image tracking at all, because the result is not going to be used in the MAR Scene. The ARAF Browser informs the processing server about the MAREC preferences using a POST request, as described in the Communication workflow above.



targetResources is an MFString where the target resources to be recognized (and tracked) within the MAR experience are specified. A URI can point to a local or remote resource file. The accepted communication protocols for the remote resources are the ones specified in table Communication Protocols. Any of the below combinations describes a valid targetResources assignment:

  • URIs pointing to target images. The file formats specified in table Target Image Formats are accepted.

  • URIs pointing to files where target image descriptors are found. A file contains descriptors of one single target image. The file formats specified in table Target Image Descriptor Formats are accepted.

  • URLs pointing to files where multiple target image descriptors are found. One file contains descriptors of multiple target images. The file formats specified in table Target Image Descriptor Formats are the supported ones.

  • any combination of the cases described above can coexist in the same targetResources field.

Target Image File Format   Reference       targetResourceType keyword   targetResourceType code
JPEG                       ISO/IEC 10918   JPEG                         0
JPEG 2000                  ISO/IEC 15444   J2K                          1
PNG                        ISO/IEC 15948   PNG                          2
RAW                        ISO 12234-2     RAW                          3

Target Image Formats

Target Image Descriptor File Format   Description   targetResourceType keyword   targetResourceType code
standard                              CDVA          CDVA                         5
proprietary                           See Annex B   proprietary                  99

Target Image Descriptors Formats

The targetResourcesTypes field is an MFString containing an array which specifies the type of each target resource defined in the targetResources field. Each target resource must have an associated type in order for the ARAF Browser to know how to interpret the data. The possible pre-defined keywords of the targetResourcesTypes and their meaning are listed in tables Target Image Formats and Target Image Descriptors Formats. If the target resource is a proprietary descriptor file and, in addition to the actual image descriptor data, the proprietary recognition (and tracking) library needs some other data (e.g. an XML file), the related content should be stored in a directory that has the same name as the resource type (e.g. target_resource_type_keyword/) in order for the ARAF Browser to know where the required files can be found. The names of the files within the directory must match the corresponding descriptor file name.



enabled is an SFInt32 value indicating whether the recognition (and tracking) process is enabled (running). The MAREC can control the status of the recognition (and tracking) process, or he can let the ARAF Browser decide whether the process should be running or not. The following table specifies the supported integer values of the enabled field.


Code          Description
-1            The ARAF Browser decides when the recognition (and tracking) process is enabled/disabled. If not supported, the recognition process is always disabled unless a value of 0 or 1 is set by the MAREC.
0 (default)   The recognition (and tracking) process is disabled.
1             The recognition (and tracking) process is enabled.

The recognition (and tracking) process is inactive while enabled is 0

While enabled is 1, we differentiate the following cases, based on the video source:


  • local live video camera feed: the frames coming from the local live video camera feed are considered by the ARAF Browser in the recognition (and tracking) process.

  • remote live video camera feed: the frames coming from the remote live video camera stream are considered by the ARAF Browser in the recognition (and tracking) process. Technically the only difference between the first case and the second one is the source of the video frames. In this case, a streaming protocol should be used to fetch the remote video camera stream.

  • local prerecorded video file: as long as enabled is 1, the ARAF Browser plays the video file and the corresponding video frames are used in the recognition (and tracking) process. Whenever enabled is 0, the video playback is paused. When enabled is set back to 1, the video resumes from the point where it was last paused. The playback restarts from the beginning when the end of the video stream is reached while enabled is 1.

  • remote prerecorded video file: the same as the previous case, except that the remote file has to be downloaded first. If a streaming protocol is being used, the ARAF Browser may request (if possible) video frames whenever enabled is 1, as if it were playing back the video remotely.

The MAREC must have the possibility to choose the quality of his MAR experience and, at the same time, indirectly, the processing power consumed by the recognition (and tracking) process. The MAREC controls this by setting a maximum acceptable delay. As described in the graph below, a response time higher than the maximum delay indicates an unacceptable quality of the MAR experience, therefore an ARAF Browser must not present it. Any response time lower than the specified maximum delay produces a MAR experience that is at least acceptable from the point of view of the MAREC, therefore an ARAF Browser should present the MAR experience. The MAREC can also specify an optimal delay constraint, informing an ARAF Browser that there is no need to try to provide the recognition (and tracking) response with a higher frequency (lower delay) because the MAR experience has already reached the targeted quality.

The two fields implementing this functionality are presented below.



maximumDelay is an SFInt32 value measured in milliseconds specifying the maximum acceptable delay of the recognition (and tracking) process in order for the MAR experience to be presented by an ARAF Browser. The MAREC expects an answer from the recognition (and tracking) process at least every maximumDelay milliseconds.

optimalDelay is an SFInt32 value measured in milliseconds specifying the optimal delay of the recognition (and tracking) process. By setting this field, the MAREC indicates that there is no need to try to provide a recognition (and tracking) response with a higher frequency (lower delay) because the MAR experience quality is already the desired one.
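The effect of the two delay constraints can be sketched as follows. The function name, the return labels and the defaults (taken from the PROTO declaration above) are illustrative assumptions:

```python
# Illustrative sketch (not normative): classify a measured response
# delay against the maximumDelay / optimalDelay constraints.

def experience_quality(response_delay_ms: int,
                       maximum_delay_ms: int = 200,
                       optimal_delay_ms: int = 50) -> str:
    if response_delay_ms > maximum_delay_ms:
        return "unacceptable"   # the browser must not present the experience
    if response_delay_ms <= optimal_delay_ms:
        return "optimal"        # no need to increase the request frequency
    return "acceptable"         # present it, and try to reduce the delay

# A 300 ms response exceeds the 200 ms maximum:
quality = experience_quality(300)
# quality == "unacceptable"
```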

recognitionRegion is an MFVec2f field specifying two 2D points that are relative to the center of the video frame on which the recognition (and tracking) algorithm is performed. The first point indicates the lower left corner and the second one the upper right corner of a rectangle. By using this field, the MAREC indicates that only the area inside the rectangle has to be used in the recognition (and tracking) process, not the entire video frame. The recognition (and tracking) process can be sped up by using a video frame region rather than the whole frame, but on the other hand the way the original video frame is pre-processed (e.g. cropped) may introduce delays. The ARAF Browser cannot ensure that using a recognition region improves the overall processing speed.
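Converting the two center-relative points into an absolute pixel rectangle might look like the sketch below. It assumes the region values are expressed in pixels and that the y axis points upward; both are illustrative assumptions, not stated by the standard:

```python
# Illustrative sketch (not normative): turn a recognitionRegion
# (lower-left and upper-right points relative to the frame center)
# into an absolute pixel rectangle the frame can be cropped to.

def region_to_pixels(region, frame_w: int, frame_h: int):
    (llx, lly), (urx, ury) = region          # lower-left, upper-right
    cx, cy = frame_w / 2, frame_h / 2        # frame center
    left, bottom = cx + llx, cy + lly
    right, top = cx + urx, cy + ury
    return int(left), int(bottom), int(right), int(top)

# A 200x100 region centered in a 640x480 frame:
rect = region_to_pixels([(-100, -50), (100, 50)], 640, 480)
# rect == (220, 190, 420, 290)
```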

onRecognition is an output event of type MFInt32 specifying the indexes of the target resources that have been recognized. An index is an integer value representing the position of a target resource in the targetResources array (0 indexed). The index of the first target resource is 0 and it is incremented by one for each subsequent target resource. If a target resource is a file containing descriptors for several images, each target image descriptor within the descriptor file is assigned the next index, as if the images were separately specified.

The following two fields are used to describe the pose matrix of a recognized resource. A pose matrix describes the position of a recognized target resource relative to the camera viewpoint. The pose matrix is described by one rotation and one translation vector, fields which are described below. These two fields are optional, meaning that the functionality of the prototype can be limited to only recognizing target resources, without computing their associated pose matrices. If onTranslation is not used then implicitly onRotation is not used, and vice versa. If a translation is computed then implicitly the corresponding rotation should be computed by the processing server. In other words, if the processing server is capable of computing the pose matrix, the MAREC expects that both of the fields (onTranslation and onRotation) are set.



onTranslation is an output event of type MFVec3f where the translations of the recognized target resources are stored. An SFVec3f vector specifies where the corresponding recognized target resource is, relative to the camera position, within the video frame on which the recognition process has been performed. The default value of a translation vector is <0,0,0>. The MAREC expects an SFVec3f translation for each recognized target resource, or the default value if the translation could not be computed. The MAREC considers that the nth value of onTranslation corresponds to the target resource given by the value found at the nth index of onRecognition. The field is optional.

onRotation is an output event of type MFRotation where the rotations of the recognized target resources are stored. An SFRotation vector specifies how the corresponding recognized target resource is rotated with respect to the camera plane within the video frame. The default value of a rotation vector is <0,0,0,0>. The MAREC expects an SFRotation vector for each recognized target resource, or the default value if the rotation could not be computed. The nth value of onRotation corresponds to the target resource given by the value found at the nth index of onRecognition. The field is optional.

onRotation, onTranslation and onRecognition must have the same length.

onError is an output event of type SFInt32.

Table Error codes below specifies the possible onError values and their meaning.



Error code   Meaning
-1           The video source URL is invalid or not supported.
-2           At least one target resource is invalid or not supported. This error is triggered when the ARAF Browser is not able to read or access any of the target resources, or when the processing server does not support the format of at least one target resource.
-3           None of the available video frame formats are supported by the processing server. In other words, the ARAF Browser is not capable of sending the video frames to the processing server in one of the expected formats.
-4           Unavailable recognition (and tracking) library for at least one target resource that has been specified by the MAREC.
-5           Unknown error.

Error codes
