
ReImgServer (Remote Image Recognition Registration Server)


MAREC provides a video source URL and one or multiple processing server URLs where image recognition (and tracking) libraries are available. A set of target resources (images or descriptors) and the associated augmentation resources are stored in remote databases, on the processing server or anywhere else on the Web. The ARAF Browser sends video frames or video frame descriptors to the processing server, considering the MAREC preferences (if applicable) and the processing server capabilities. The server performs the recognition (and tracking) process on the received video data, searching in the set of target resources stored in the database(s). Depending on the format of the video frame, the server may need to extract the descriptors before performing the recognition (and tracking) process. One of the server's responses is composed of one or multiple augmentation resource URLs associated with the recognized target resource and (optionally) the pose matrix of the recognized object within the video frame. The augmentation resource can be a URL to a media file (see table Augmentation media types) or a proprietary string. The proprietary string is not interpreted by the default prototype implementation; therefore the MAREC has to update the default prototype implementation. The MAREC has the option to let the ARAF Browser track the recognized target resource locally. In this case the processing server must be able to send the recognized target resource to the ARAF Browser, which in turn must be able to perform the tracking. Details are presented below in Functionality and semantics.


BIFS Textual Description

EXTERNPROTO ReImgServer [
  exposedField SFString   videoSource ""          # URI/URL of the video source
  exposedField MFInt32    frameEncodingType []    # preferred video frame encoding codes (optional)
  exposedField SFInt32    processingType 0        # preferred server processing type (see Server Capabilities)
  exposedField MFString   processingServerURL []  # URLs of ARAF-compliant processing servers
  exposedField SFInt32    enabled 0               # -1: browser decides, 0: disabled, 1: enabled
  exposedField SFInt32    maximumDelay 200        # milliseconds
  exposedField SFInt32    optimalDelay 50         # milliseconds
  exposedField MFVec2f    recognitionRegion []    # rectangle relative to the frame centre (optional)
  eventOut     MFString   augmentationMediaURL    # URLs of the augmentation media
  eventOut     MFInt32    augmentationMediaType   # codes of the augmentation media formats
  eventOut     MFString   augmentationString      # proprietary strings returned by the server
  eventOut     MFVec3f    onTranslation           # translation of each recognized target resource
  eventOut     MFRotation onRotation              # rotation of each recognized target resource
  eventOut     SFInt32    onError                 # error codes (see Error codes)
] "org:mpeg:remote_image_recognition_registration_server"

Functionality and semantics

MAREC provides one or multiple processing server URLs where recognition (and tracking) libraries are available, along with a video source on which the recognition process shall be performed and, optionally, the encoding code of the video frames that are sent to the processing server. The ARAF Browser must use the provided processing server URLs as an external resource that is able to perform the recognition (and tracking) of the target resources stored in the external repositories. The recognition (and tracking) result of the processing server is a URL to an augmentation resource, the type of the media file that can be found at the given URL and, optionally, the pose matrix of the recognized object in the video frame. Another valid server response is a proprietary string associated with the recognized object in the video frame. Considering that the processing server knows which was the last recognized target resource for a given client, a tracking algorithm can be performed remotely. In the case that the processing server is only able to perform image recognition, the ARAF Browser can choose to receive the target image descriptors from the processing server in order to perform the tracking locally. This case is considered only if the MAREC sets the correct code in the processingType field (see Server Capabilities) and the ARAF Browser is capable of performing tracking of a target resource. The possible scenarios are described throughout the fields' descriptions below.

An ARAF-compliant processing server shall understand the HTTP requests presented in the following table:



| ARAF Browser request | Request type | Description | Processing server response | Description |
| --- | --- | --- | --- | --- |
| pServer/alive | GET | Get the unique key, the server parameters and capabilities | unique key (64-bit), video frame code, server capability code, image descriptor code | The unique key shall be used to identify future requests from the client. The video frame code specifies the formats of the video frame that are supported by the server. The ARAF Browser decides how the video data is encoded before it is sent to the server by considering this response and the MAREC preferences (see frameEncodingType). The server capability code informs the ARAF Browser about the type of processing the server is able to perform. The image descriptor code informs the ARAF Browser about the data format of the target image descriptors that might be sent to the ARAF Browser (for local tracking). |
| pServer key&frame_format | POST | Inform the processing server about the chosen video frame format | True/False | The server response is True if the data is correctly received and False otherwise. |
| pServer key&frame | POST | Send a new video frame to the server | recognized, rotation, translation | The server response is a URL to an augmentation resource and its pose matrix (rotation and translation), if applicable. |

ARAF Browser – Processing Server Communication Workflow

Communication Workflow:

  1. The ARAF Browser interrogates the Processing Server (GET /alive) in order to detect its status and to receive the server parameters and capabilities. The server returns:

    1. a unique key that shall be transmitted by the ARAF Browser in future requests,

    2. the list of codes describing the supported video frame formats (file types or image descriptors). See table Target Image Formats and table Target Image Descriptors Formats for the supported codes and their meaning.

    3. the list of codes describing the server capabilities. See table Server Capabilities for the supported codes and their meaning.

    4. the code of the image descriptor format that might be sent to the ARAF Browser based on the MAREC preferences (see the processingType field description). The ARAF Browser must know if its local tracking library is capable of interpreting the image descriptor data received from the processing server. See table Target Image Descriptors Formats. If applicable, the processing server sends image descriptors to the ARAF Browser (for local tracking) in the specified format.

The server response must be in the following format:

key=unique key (64-bit)
&frame_code=[video frame format codes]
&server_capability_code=[server capability codes]
&image_descriptor_code=[image descriptor format codes]

Example of a possible server response:

key=2e45325f4f&frame_code=0&server_capability_code=1&image_descriptor_code=5
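
The first step can be illustrated with a short, informative client-side sketch (Python is used for all examples in this section). The base URL, the function name and the form-encoded parsing are assumptions of the sketch, not normative requirements:

from urllib.parse import parse_qs
from urllib.request import urlopen

def interrogate_server(base_url):
    """GET <base_url>/alive and parse the form-encoded server answer."""
    with urlopen(base_url + "/alive") as resp:
        fields = parse_qs(resp.read().decode("utf-8"))
    return {
        "key": fields["key"][0],
        # Repeated parameters (e.g. frame_code=0&frame_code=2) become lists.
        "frame_codes": [int(c) for c in fields.get("frame_code", [])],
        "capability_codes": [int(c) for c in fields.get("server_capability_code", [])],
        "descriptor_codes": [int(c) for c in fields.get("image_descriptor_code", [])],
    }

# Example: params = interrogate_server("http://example.com/pServer")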


  2. Once the key has been received, the ARAF Browser knows that the processing server is ready to perform the recognition (and tracking) process. The ARAF Browser decides on one video frame encoding type, based on the MAREC preference (if specified) and on the server's response, then informs the processing server about the chosen format. The video frames of this session are sent only in the specified encoding type. In addition to the video frame format, the ARAF Browser chooses one of the available capabilities to be performed by the processing server. Therefore, a capability code (see table Server Capabilities) has to be transmitted to the processing server along with the unique key and the chosen video frame format. The POST request is in the following format:

key=unique_key

&frame_code=the chosen encoding format code
&server_capability_code=one of the available server capability codes

The processing server returns TRUE if the data is correctly received or FALSE otherwise.
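
An informative sketch of this handshake, under the same assumptions as the previous example (plain form-encoded HTTP, illustrative names):

from urllib.parse import urlencode
from urllib.request import urlopen

def announce_choice(base_url, key, frame_code, capability_code):
    """POST the chosen frame encoding and the requested server capability."""
    body = urlencode({
        "key": key,
        "frame_code": frame_code,
        "server_capability_code": capability_code,
    }).encode("ascii")
    # urllib performs a POST whenever a request body is supplied.
    with urlopen(base_url, data=body) as resp:
        return resp.read().decode("utf-8").strip().lower() == "true"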




  3. The ARAF Browser sends a video frame to the processing server. The video frame is transmitted in the format that the processing server has previously been informed of.

key=unique_key

&frame_data=the camera frame
The processing server's response can be:

  • a URL to an augmentation resource corresponding to the recognized target image, together with its pose matrix (if applicable), or the recognized target resource descriptors (if applicable), or FALSE if no resource is recognized.

resource=url to an augmentation resource
&descriptor=image descriptor in the previously specified format //optional
&translation=x,y,z //optional
&rotation=x,y,z,q //optional

  • a proprietary string

If the processing server does not implement tracking capabilities, then the response is composed only of the augmentation resource URL, unless the MAREC has specified that he prefers the ARAF Browser to perform the tracking locally (see the description of the processingType field).

The case where the processing server returns a proprietary string is not covered by the standard. The MAREC should know how the string has to be interpreted, because he is the one providing the server URL. In this case the server response is transmitted to the MAR Scene exactly in the form in which it is received.
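
Step 3 can be sketched as follows. The text does not fix how the binary frame is embedded in the POST body, so the base64 form field below is an assumption, as are the function and field names:

import base64
from urllib.parse import parse_qs, urlencode
from urllib.request import urlopen

def send_frame(base_url, key, frame_bytes):
    """POST one encoded video frame and parse the possible server answers."""
    body = urlencode({
        "key": key,
        "frame_data": base64.b64encode(frame_bytes).decode("ascii"),
    }).encode("ascii")
    with urlopen(base_url, data=body) as resp:
        raw = resp.read().decode("utf-8").strip()
    if raw.lower() == "false":
        return None                                # no resource recognized
    if "=" not in raw:
        return {"proprietary": raw}                # proprietary string, passed through as-is
    fields = parse_qs(raw)
    result = {"resource": fields["resource"][0]}
    if "descriptor" in fields:                     # optional, enables local tracking
        result["descriptor"] = fields["descriptor"][0]
    if "translation" in fields:                    # optional pose matrix components
        result["translation"] = [float(v) for v in fields["translation"][0].split(",")]
    if "rotation" in fields:
        result["rotation"] = [float(v) for v in fields["rotation"][0].split(",")]
    return result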



  4. The loop starts over from point 3 whenever the ARAF Browser has to send new video frame data to the processing server.


videoSource is an SFString specifying the URI/URL of the video on which the recognition (and tracking) process shall be performed. The videoSource can be one of the following:

  1. Live 2D video camera feed

    1. a URI to one of the cameras available on the end user’s device. The possible values are specified in table Camera URIs.

    2. a URL to an external camera providing live camera feed.

  2. A URL to a prerecorded video file stored

  • locally on the end user’s device.

  • remotely, on an external repository on the Web.

Based on the MAREC preferences, the video frames are sent to the processing server every X milliseconds (the ARAF Browser is in charge of computing the frequency), as long as the recognition (and tracking) process is enabled. The video frames are sent as compressed images or raw data (see table Target Image Formats for the supported image file formats), or as descriptor files (see table Target Image Descriptors Formats) depending on the processing server capabilities. The ARAF Browser is in charge of deciding the encoding type of the video frames that shall be sent to the processing server, considering the MAREC preferences and the capabilities of the server.
The accepted video formats are specified in table Video formats.

The accepted communication protocols are specified in table Communication Protocols.





| Camera URI | Description |
| --- | --- |
| worldFacingCamera | Refers to the primary camera, usually located at the back of the device (back camera) |
| userFacingCamera | Refers to the secondary camera, usually located at the front of the device (front camera) |

Camera URIs

| Video file format | Reference |
| --- | --- |
| Raw video data | ISO/IEC 14496-1:2010/Amd 2:2014, Support for raw audio-visual data |
| MPEG-4 Visual | ISO/IEC 14496-2:2004, Visual |
| MPEG-4 AVC | ISO/IEC 14496-10:2012, Advanced Video Coding |
| Proprietary | See Annex B (ARAF support for proprietary formats) |

Video formats

| Communication protocol name | Reference |
| --- | --- |
| RTP | RFC 3550 (2003), Real-time Transport Protocol |
| RTSP | RFC 2326 (1998), Real Time Streaming Protocol |
| HTTP | RFC 2616 (1999), Hypertext Transfer Protocol |
| DASH | ISO/IEC 23009-1:2012, Dynamic Adaptive Streaming over HTTP |

Communication Protocols

One or multiple of the codes presented in the following table might be sent by the processing server to the ARAF Browser (see the first step of the Communication Workflow between the ARAF Browser and the processing server). The purpose is to inform the ARAF Browser about the processing capabilities of the server. On the other hand, the MAREC can use the dedicated prototype field (processingType) to express his preferences regarding the processing type that he expects from the server, by specifying one of the codes defined in this table. The ARAF Browser finally decides what type of processing the server should perform, based on the server capability and the MAREC preferences. The server returns different responses based on the chosen processing type (see Communication Workflow and the processingType field description).

| Code | Description |
| --- | --- |
| 0 | Recognition only. The processing server is capable of performing only image recognition. This means that the server response contains information about the indexes of the recognized target resources, as defined in the description of the fields. |
| 1 | Recognition and tracking. The processing server is capable of performing image recognition and tracking. This means that the server response contains information about the indexes of the recognized target resources along with their computed pose matrices, as defined in the description of the fields. |
| 2 | Recognition and image descriptors. The processing server is capable of performing image recognition. In addition to the recognition-related information, the server response contains the descriptors of the recognized resource. This way the tracking can be performed locally by the ARAF Browser. |

Server Capabilities

processingServerURL is an MFString used by the MAREC to specify one or multiple web addresses where ARAF-compliant processing servers are available. A valid URL is one that points to a processing server that handles at least one target resource type and is able to understand the ARAF Browser requests, as defined in table ARAF Browser – Processing Server Communication Workflow. Because a processing server can handle requests from multiple clients at the same time, a unique key is generated by the server and transmitted to the ARAF Browser. The ARAF Browser sends the unique generated key in each request to the processing server. This way the processing server knows the source of the request and can therefore perform the recognition (and tracking) process on the correct set of target resources.
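
The bookkeeping implied by the unique key can be sketched on the server side as follows; the 64-bit key size follows the table above, while the storage layout and all names are illustrative only:

import secrets

sessions = {}

def open_session():
    """Issue a fresh 64-bit key identifying one client session."""
    key = secrets.token_hex(8)        # 8 random bytes = 64 bits, hex-encoded
    sessions[key] = {"frame_code": None, "capability_code": None}
    return key

def session_for(key):
    # An unknown key means the request cannot be matched to a client.
    return sessions.get(key)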

| Target image file format | Reference | targetResourceType keyword | Code |
| --- | --- | --- | --- |
| JPEG | ISO/IEC 10918 | JPEG | 0 |
| JPEG 2000 | ISO/IEC 15444 | J2K | 1 |
| PNG | ISO/IEC 15948 | PNG | 2 |
| RAW | ISO 12234-2 | RAW | 3 |

Target Image Formats

| Target image descriptor file format | Description | targetResourceType keyword | Code |
| --- | --- | --- | --- |
| standard | CDVA | CDVA | 5 |
| proprietary | proprietary | See Annex B | 99 |

Target Image Descriptors Formats

frameEncodingType is an MFInt32 field containing an array of video frame encoding type codes. The MAREC has the possibility to specify the desired encoding type of the video frames that are sent to the processing server. If multiple codes are specified by the MAREC, the ARAF Browser chooses the first encoding type that matches the server capabilities. The possible pre-defined codes of the frame encoding types and their meaning are listed in tables Target Image Formats and Target Image Descriptors Formats. If the MAREC does not specify any encoding type code, the ARAF Browser uses a default one. The MAREC should not specify an encoding type unless he knows that the processing server gives better results with one or another. In any case, the ARAF Browser interrogates the server (see the first step of the Communication Workflow) to retrieve the supported encoding type codes and then decides on one encoding type considering the MAREC preferences. The field is optional.
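
The decision rule described above amounts to a first-match search over the MAREC preferences; the sketch below is informative, and the fallback default of code 0 (JPEG) is an assumption:

def choose_frame_encoding(marec_codes, server_codes, browser_default=0):
    """Pick the first MAREC-preferred encoding supported by the server."""
    for code in marec_codes:
        if code in server_codes:
            return code               # first match with the server capabilities
    if browser_default in server_codes:
        return browser_default
    return server_codes[0]            # last resort: any server-supported code

# choose_frame_encoding([2, 0], [0, 1, 5]) returns 0: code 2 is unsupported.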

The processingType field is an SFInt32 value where the MAREC can specify his preferences related to the recognition and tracking processing performed on the server. The MAREC can choose one of the codes specified in table Server Capabilities, where their descriptions can be found. The MAREC can only suggest what type of processing the server should perform, implicitly affecting the server's response. The ARAF Browser is in charge of informing the processing server what kind of job it should perform, based on the MAREC preferences and the server capabilities (the Communication Workflow explains how):



  • 0: the MAREC suggests that he expects only the recognition-related information from the processing server,

  • 1: the MAREC suggests that he expects the recognition- and tracking-related information from the processing server,

  • 2: the MAREC suggests that the processing server should only perform the recognition process while the tracking should be performed locally by the ARAF Browser. In this case, the ARAF Browser must be provided with a tracking library that is able to understand the descriptor data received from the server. If the ARAF Browser is not capable of performing the tracking of the target resource, an error code should be triggered (see the onError field description).

From the MAR scene point of view, the second and the third cases provide the same data, with the difference that in one case the pose matrix is computed remotely while in the other it is computed locally by the ARAF Browser. The ARAF Browser informs the processing server about the MAREC preferences using a POST request, as described in the Communication Workflow presented above.

If the ARAF Browser cannot perform the image tracking for any reason, an error code is sent to the MAR Scene.
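
The negotiation can be sketched as below; the check against the local tracking library follows the processingType description, while the fallback to recognition-only when the preference is unavailable is an assumption of the sketch:

RECOGNITION_ONLY, RECOGNITION_AND_TRACKING, LOCAL_TRACKING = 0, 1, 2
ERR_LOCAL_TRACKING_UNSUPPORTED = -6   # see table Error codes

def negotiate_processing(preferred, server_capabilities, can_track_locally):
    """Return (capability code to request, error code or None)."""
    if preferred == LOCAL_TRACKING and not can_track_locally:
        return None, ERR_LOCAL_TRACKING_UNSUPPORTED
    if preferred in server_capabilities:
        return preferred, None
    return RECOGNITION_ONLY, None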



enabled is an SFInt32 value indicating whether the recognition (and tracking) process is enabled (running). The MAREC can control the status of the recognition (and tracking) process, or he can let the ARAF Browser decide whether the recognition (and tracking) process should be running or not. The following table specifies the supported integer values of the enabled field.


| Code | Description |
| --- | --- |
| -1 | The ARAF Browser decides when the recognition (and tracking) process is enabled/disabled. If not supported, the recognition process is always disabled unless a value of 0 or 1 is set by the MAREC. |
| 0 (default) | The recognition (and tracking) process is disabled. |
| 1 | The recognition (and tracking) process is enabled. |

The recognition (and tracking) process is inactive while enabled is 0. While enabled is 1, we differentiate the following cases, based on the video source:


  • local live video camera feed: the frames coming from the local live video camera feed are considered by the ARAF Browser in the recognition (and tracking) process.

  • remote live video camera feed: the frames coming from the remote live video camera stream are considered by the ARAF Browser in the recognition (and tracking) process. Technically the only difference between the first case and the second one is the source of the video frames. In this case, a streaming protocol should be used to fetch the remote video camera stream.

  • local prerecorded video file: as long as enabled is 1, the ARAF Browser plays the video file and the corresponding video frames are used in the recognition (and tracking) process. Whenever enabled is 0, the video playback is paused. On 1, the video resumes playing from the point where it was last paused. The video playback starts from the beginning when the end of the video stream is reached while enabled is 1.

  • remote prerecorded video file: the same as the previous case, except that the remote file has to be downloaded first. If a streaming protocol is being used, the ARAF Browser may request (if possible) video frames whenever enabled is 1, as if it were playing back the video remotely.

The MAREC must have the possibility to choose the quality of his MAR experience and, at the same time, indirectly, the processing power consumed by the recognition (and tracking) process. The MAREC can control this by setting a maximum acceptable delay. A response time higher than the maximum delay indicates an unacceptable quality of the MAR experience; therefore an ARAF Browser must not present it. Any response time with a delay lower than the specified maximum delay produces a MAR experience that is at least acceptable from the point of view of the MAREC; therefore an ARAF Browser should present the MAR experience. The MAREC can also specify an optimal delay constraint, informing an ARAF Browser that there is no need to try to provide recognition (and tracking) responses with a higher frequency (lower delay) because the MAR experience has already reached the targeted quality.

The two fields implementing this functionality are presented below.



maximumDelay is an SFInt32 value, measured in milliseconds, specifying the maximum acceptable delay of the recognition (and tracking) process in order for the MAR experience to be presented by an ARAF Browser. The MAREC expects an answer from the recognition (and tracking) process at most every maximumDelay milliseconds.

optimalDelay is an SFInt32 value, measured in milliseconds, specifying the optimal delay of the recognition (and tracking) process. By setting this field, the MAREC suggests that there is no need to try to provide a recognition (and tracking) response with a higher frequency (lower delay) because the MAR experience quality is already the desired one.
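
An informative sketch of how an ARAF Browser might honour the two constraints; the scheduling policy shown (discard late results, sleep up to the optimal delay) is one possible reading, and all names are illustrative:

import time

def recognition_loop(grab_frame, query_server, maximum_delay_ms, optimal_delay_ms):
    while True:
        start = time.monotonic()
        result = query_server(grab_frame())
        elapsed_ms = (time.monotonic() - start) * 1000.0
        # Slower than maximumDelay: unacceptable quality, do not present it.
        if result is not None and elapsed_ms <= maximum_delay_ms:
            yield result
        # Faster than optimalDelay: no benefit in querying more often.
        if elapsed_ms < optimal_delay_ms:
            time.sleep((optimal_delay_ms - elapsed_ms) / 1000.0)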

recognitionRegion is an MFVec2f field specifying two 2D points relative to the center of the video frame on which the recognition (and tracking) algorithm is performed. The first point indicates the lower-left corner and the second one the upper-right corner of a rectangle. By using this field, the MAREC suggests that only the area inside the rectangle has to be used in the recognition (and tracking) process, not the entire video frame. The recognition (and tracking) process can be improved by using a video frame region rather than the whole video frame, but on the other hand the way the original video frame is pre-processed (e.g. cropped) may introduce delays. The ARAF Browser cannot ensure that using a recognition region improves the overall processing speed.
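
Mapping the two centre-relative corners to a pixel rectangle can be sketched as follows; the sketch assumes the points are expressed in pixels with y growing upwards, which the text does not mandate:

def region_to_pixels(region, frame_w, frame_h):
    """Convert recognitionRegion corners into (left, top, right, bottom)."""
    (x0, y0), (x1, y1) = region               # lower-left, upper-right corners
    cx, cy = frame_w / 2.0, frame_h / 2.0
    left, right = int(cx + x0), int(cx + x1)
    top, bottom = int(cy - y1), int(cy - y0)  # image rows grow downwards
    return left, top, right, bottom

# region_to_pixels([(-100, -50), (100, 50)], 640, 480) -> (220, 190, 420, 290)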

The following two fields are used to describe the pose matrix of a recognized resource. A pose matrix describes the position of a recognized target resource relative to the camera viewpoint. The pose matrix is described by one rotation and one translation vector, fields which are described below. These two fields are optional, meaning that the functionality of the prototype can be limited to only recognizing target resources, without computing their associated pose matrices. If onTranslation is not used then implicitly onRotation is not used, and vice versa. If a translation is computed then implicitly the corresponding rotation should be computed by the processing server. In other words, if the processing server is capable of computing the pose matrix, the MAREC expects both fields (onTranslation and onRotation) to be set.



onTranslation is an output event of type MFVec3f where the translations of the recognized target resources are stored. An SFVec3f vector specifies where the corresponding recognized target resource is located relative to the camera position within the video frame on which the recognition process has been performed. The default value of a translation vector is <0,0,0>. The MAREC expects an SFVec3f translation for each recognized target resource, or a default value if the translation could not be computed. The MAREC considers that the nth value of onTranslation corresponds to the target resource whose augmentation resource is given by the nth index of augmentationMediaURL. The field is optional.

onRotation is an output event of type MFRotation where the rotations of the recognized target resources are stored. An SFRotation vector specifies how the corresponding recognized target resource is rotated with respect to the camera plane within the video frame. The default value of a rotation vector is <0,0,0,0>. The MAREC expects an SFRotation vector for each recognized target resource, or a default value if the rotation could not be computed. The nth value of onRotation corresponds to the target resource given by the nth index of augmentationMediaURL. The field is optional.

onRotation, onTranslation and augmentationMediaURL must have the same length.
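
Because the three eventOuts are index-aligned, a scene script can consume them pairwise; this informative helper only illustrates the alignment:

def paired_results(augmentation_urls, translations, rotations):
    """Iterate (URL, translation, rotation) triples for recognized resources."""
    assert len(augmentation_urls) == len(translations) == len(rotations)
    for url, t, r in zip(augmentation_urls, translations, rotations):
        yield url, t, r   # t = (x, y, z); r = (x, y, z, angle); defaults if not computed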

augmentationMediaURL is an output event of type MFString where the augmentation media URLs associated with the recognized target resource are referenced.

augmentationMediaType is an output event of type MFInt32 where the code of each augmentation media format referenced in the augmentationMediaURL field is specified. See table Augmentation media types for the description of the supported types and their associated codes.

| Augmentation resource type description | Augmentation resource keyword | Code |
| --- | --- | --- |
| Image (see table Target Image Formats for supported formats) | image | 0 |
| Video (see table Video formats for supported formats) | video | 1 |
| Audio (.wav, .mid, .mp3, .raw) | audio | 2 |
| BIFS scene (see ISO/IEC 14496-11) | 2d/3d bifs scene (.mp4) | 3 |

Augmentation media types

augmentationString is an output event of type MFString where the processing server can add textual information (e.g. labels, descriptions) associated with the recognized target resources. Because arbitrary data can be transmitted using this field, the MAREC should know the string format and therefore how to interpret the data. The server response is transmitted to the field exactly in the form in which it is received. The proprietary string is not interpreted by the default prototype implementation.

onError is an output event of type SFInt32.

Table Error codes, presented below, specifies the possible onError values and their meaning.



| Error code | Meaning |
| --- | --- |
| -1 | The video source URL is invalid or not supported. |
| -3 | None of the available video frame formats are supported by the processing server. In other words, the ARAF Browser is not capable of sending the video frame to the processing server in one of the expected formats. |
| -5 | Unknown error. |
| -6 | The ARAF Browser is not capable of performing the tracking of the target resource. |

Error codes
