Discovery of Complex Behaviors through Contact-Invariant Optimization

Date: 17.11.2018

4.2.2 Simplified physics model

While the CIO method as described above can be used with standard physics models as implemented in existing physics engines, our implementation relies on a simplified model yielding a favorable trade-off between physical realism and optimization efficiency. Instead of representing the pose $q$ directly and then computing the end-effector positions $p_i(q)$ using forward kinematics, we represent the end-effector positions as well as their orientations directly (i.e. as functions of spline parameters contained in $s$) and then define the pose $q$ using inverse kinematics; see Appendix. All mass is assumed to be concentrated at the root bodies: the torso of each character, as well as any passive objects. Non-smooth movements of the (now massless) limbs are avoided by including an acceleration cost described later. The inverse dynamics are still in the form (4) and the quadratic program defining the contact force and control is still in the form (6), but all computations are now simplified.

Representing the pose in terms of end-effector positions and orientations makes it difficult to enforce kinematic constraints exactly. However, we turn this to our advantage by introducing an additional continuation method that allows limbs to stretch and joint limits to be violated early in optimization. This is done by adding quadratic costs to $L_{\text{Physics}}$ that penalize any deviations of the limb lengths from their reference values as well as any joint limit violations. We also penalize penetration of the character's body parts (approximated for collision by the capsules shown in figure 2) against the environment or other body parts.
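The soft kinematic constraints described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, its arguments, and the single `weight` parameter are hypothetical, and the source does not say how the individual penalties are weighted against each other.

```python
def soft_kinematic_cost(limb_lengths, reference_lengths,
                        joint_angles, joint_limits, weight=1.0):
    """Quadratic penalty on limb stretch and joint-limit violation.

    All names are illustrative; `weight` stands in for whatever weight
    the continuation schedule assigns to the physics cost.
    """
    cost = 0.0
    # Limbs may stretch, but deviation from the reference length is penalized.
    for length, reference in zip(limb_lengths, reference_lengths):
        cost += (length - reference) ** 2
    # Joint limits may be violated, but the amount of violation is penalized.
    for angle, (lower, upper) in zip(joint_angles, joint_limits):
        violation = max(lower - angle, 0.0, angle - upper)
        cost += violation ** 2
    return weight * cost
```

With a small `weight` the optimizer is free to stretch limbs early on; raising it later pushes the solution back toward exact kinematics.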

4.3 High-level goals and task cost

The cost $L_{\text{Task}}(s)$ encodes the high-level goals of the movement. It includes task-specific terms specifying the desired outcome, and generic terms (integrated over time) specifying that the movement should be energy-efficient and smooth:

$$L_{\text{Task}}(s) = \sum_b \ell_b(q_T(s)) + \sum_t \left( \|f_t(s)\|^2 + \|u_t(s)\|^2 + \|\ddot{q}_t(s)\|^2 \right) \qquad (9)$$

Here $\ell_b$ are task-specific terms which depend only on the final pose $q_T$, and $b$ is an index over different tasks. Several tasks can be composed together, such as combining a standing task with the moving-to-target task. We use the above general form of $L_{\text{Task}}$ for all tasks except for kicking/punching. In that case we specify an $\ell_b$ at regular intervals when each target should be hit, and also include dependence on $\dot{q}$ because we want the targets to be hit with a certain end-effector velocity.
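A minimal sketch of how a cost of the form (9) could be assembled is given below. It assumes that the generic running terms are squared norms of the contact forces, controls, and accelerations sampled at discrete times; the function names and the list-based trajectory layout are hypothetical and not taken from the source.

```python
def squared_norm(vector):
    """Sum of squared components."""
    return sum(x * x for x in vector)


def task_cost(terminal_costs, final_pose, forces, controls, accelerations):
    """Terminal task terms plus running effort and smoothness terms.

    terminal_costs : list of callables, one per task b, each taking the final pose
    forces, controls, accelerations : per-time-step vectors (assumed layout)
    """
    terminal = sum(cost(final_pose) for cost in terminal_costs)
    running = sum(
        squared_norm(f) + squared_norm(u) + squared_norm(a)
        for f, u, a in zip(forces, controls, accelerations)
    )
    return terminal + running
```

Composing several tasks, as the text describes, then amounts to passing more than one callable in `terminal_costs`.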

The general procedure for constructing the task-specific costs $\ell_b$ is to identify a vector of positional (and optionally velocity) features $h_b(q)$ that are key to task $b$, define the desired feature values $h_b^*$ at the end of the movement (or at other important points in time such as target hits), and then construct

$$\ell_b(q_T(s)) = \| h_b(q_T(s)) - h_b^* \|^2 \qquad (10)$$

In this way, a final position task $\ell_{\text{pos}}$ can be specified by using $h_{\text{pos}}$ that selects torso position, and setting $h^*_{\text{pos}}$ to the desired position. A final orientation task $\ell_{\text{dir}}$ can be defined similarly for the torso facing direction. A standing task $\ell_{\text{stand}}$ can be expressed by using a combination of $h_{\text{stand}}$ and $h^*_{\text{stand}}$ which specifies that the center of the torso should be between the two feet, the feet should be fully extended, and the torso direction should be aligned with the vertical direction vector.

The relative importance of the different features can be adjusted by scaling the corresponding elements of $h$.
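The feature-based construction in (10), together with the per-feature scaling just mentioned, can be illustrated with a short sketch. The function name and the `scales` argument are hypothetical, and the sketch assumes that scaling a feature means multiplying both the feature and its target by the same factor.

```python
def feature_cost(features, targets, scales=None):
    """Squared distance between features and their desired values.

    Scaling an element of the feature vector (and its target) by w
    multiplies that element's contribution to the cost by w**2.
    """
    if scales is None:
        scales = [1.0] * len(features)
    return sum(
        (w * (h - h_star)) ** 2
        for w, h, h_star in zip(scales, features, targets)
    )


# Hypothetical final-position task: the torso should end at (1, 0, 2).
torso_position = [0.5, 0.0, 2.0]
desired_position = [1.0, 0.0, 2.0]
position_cost = feature_cost(torso_position, desired_position)
```

Doubling the scale of the first feature in this example quadruples its contribution, which is how one feature can be made to dominate another.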

4.4 Heuristic sub-goals and hint cost

In the absence of good initialization (which in the present context would correspond to motion capture data or other detailed user inputs we aim to avoid), numerical optimization can be sped up by providing heuristic sub-goals early on, and then disabling them near convergence. Such heuristics (also known as shaping) are not meant to be part of the true cost, but rather guide the solution to a region from where the true cost can be optimized efficiently. We found that even though most of the behaviors we studied could be synthesized without such heuristics, in some cases (particularly those involving two characters) a certain type of heuristic helps. This heuristic is based on the ZMP stability criterion used in locomotion, where the objective is to keep the "zero moment point" $z(q, \ddot{q})$ in the convex hull of the support region [Vukobratovic and Borovac 2004]. Let $n(z)$ denote the point in the convex hull nearest (in a soft-min sense) to $z$. We compute $n$ by expressing it as a convex combination of the end-effector positions: $n = \sum_i \lambda_i p_i$ where $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$, and solving for the coefficients $\lambda$ using quadratic programming regularized by the same weights $W$ as in (7). Then the hint cost is

$$L_{\text{Hint}}(s) = \sum_t \max\left( \| z_t(s) - n(z_t(s)) \| - \epsilon,\; 0 \right)^2 \qquad (11)$$

This is a half-quadratic starting $\epsilon$ away from the convex hull. The parameter $\epsilon$ is used to adjust how strictly we want to enforce the ZMP stability criterion.
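The half-quadratic shape of the hint cost can be sketched as below, assuming the zero moment points and their nearest hull points have already been computed for each time step. The function name and the `margin` argument (standing in for the strictness parameter) are hypothetical, and the quadratic program that finds the nearest point is deliberately left out.

```python
import math


def hint_cost(zmp_points, nearest_points, margin):
    """Half-quadratic penalty on the distance from each ZMP to the support hull.

    Distances up to `margin` cost nothing; beyond that the cost grows
    quadratically in the excess distance.
    """
    total = 0.0
    for z, n in zip(zmp_points, nearest_points):
        excess = max(math.dist(z, n) - margin, 0.0)
        total += excess ** 2
    return total
```

A larger `margin` tolerates a ZMP further from the support region, while a margin of zero penalizes any distance from the nearest hull point.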

4.5 Numerical optimization and continuation

We optimize the composite cost $L(s)$ defined in (1) using an off-the-shelf implementation of the L-BFGS algorithm. The dimensionality of the vector $s$ is $(12(N + 1) + N)K$, where again $N$ is the number of end-effectors and $K$ is the number of movement phases. The specific representation $s$ used here is defined in (12) and (13) of Appendix A. We use $K$ between 10 and 20 depending on the complexity of the task. Each phase lasts 0.5 sec. The inverse dynamics and cost are evaluated at 0.1 sec intervals (note that the analytical spline representation allows us to evaluate the dynamics and cost at any point in time). The gradient $\nabla L(s)$, which is needed for numerical optimization, is approximated using finite differences (with a step size of $10^{-3}$). Our implementation of finite differences takes advantage of the fact that many of the cost terms depend only on the pose at a single point in time, and do not need to be recomputed when the rest of the trajectory is perturbed.
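As a rough illustration of the quantities in this paragraph, the sketch below computes the parameter count and a plain finite-difference gradient. It assumes forward differences, which the source does not specify, and it recomputes the full cost for every perturbation rather than reusing unaffected terms as the authors describe. Both function names are hypothetical.

```python
def num_parameters(num_end_effectors, num_phases):
    """Dimensionality (12(N + 1) + N) K of the parameter vector s."""
    return (12 * (num_end_effectors + 1) + num_end_effectors) * num_phases


def finite_difference_gradient(cost, s, step=1e-3):
    """Forward-difference approximation of the gradient of cost at s."""
    base = cost(s)
    gradient = []
    for i in range(len(s)):
        perturbed = list(s)
        perturbed[i] += step  # perturb one parameter at a time
        gradient.append((cost(perturbed) - base) / step)
    return gradient
```

For example, a character with four end-effectors and ten movement phases gives `num_parameters(4, 10)`, i.e. 640 parameters, so a naive gradient needs 641 cost evaluations per iteration. That cost is why reusing unperturbed terms matters.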

Continuation is implemented by weighting the four terms in (1) differently in different phases of the optimization process (not to be confused with movement phases). The optimization process has three phases, as follows. In Phase 1 only $L_{\text{Task}}$ is enabled. This causes the optimizer to rapidly discover a movement that achieves the task goals without being physically realistic. In Phase 2 we enable all four terms, except that $L_{\text{Physics}}$ is down-weighted by 0.1 so that physical consistency is enforced gradually. In Phase 3 we fully enable all terms except for $L_{\text{Hint}}$, which is no longer needed and is undesirable at this point because we do not want it to affect the final solution. Qualitatively, Phase 1 corresponds to rapid discovery combined with wishful thinking; Phase 2 corresponds to cautious enforcement of physical realism while being guided by optional hints; Phase 3 corresponds to refinement of the final solution. The solution obtained at the end of each phase is perturbed with small zero-mean Gaussian noise (to break any symmetries) and used to initialize the next phase. The initialization for Phase 1 is completely uninformative: a static initial pose. We found that using such continuation is often important. Exactly the same continuation scheme was successful in all of the diverse behaviors we studied, and so our method does not need behavior-specific adjustments.
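The three-phase schedule can be sketched as a table of weights plus a loop. Several things here are assumptions rather than details from the source: the term labels (in particular `ci` for the fourth term of (1), which this section does not name), the noise standard deviation, and the `minimize` callable, which stands in for any off-the-shelf optimizer taking a cost function and a starting point.

```python
import random

# One weight table per optimization phase (labels are illustrative).
PHASE_WEIGHTS = [
    {"ci": 0.0, "physics": 0.0, "task": 1.0, "hint": 0.0},  # Phase 1: task only
    {"ci": 1.0, "physics": 0.1, "task": 1.0, "hint": 1.0},  # Phase 2: soft physics
    {"ci": 1.0, "physics": 1.0, "task": 1.0, "hint": 0.0},  # Phase 3: no hints
]


def run_continuation(terms, minimize, s0, noise_std=1e-2, rng=None):
    """Optimize through the three phases, perturbing the solution in between.

    terms     : dict mapping each label above to a cost callable
    minimize  : callable (cost, start) -> solution; a stand-in optimizer
    noise_std : assumed value; the source only says the noise is small
    """
    rng = rng or random.Random(0)
    s = list(s0)
    for phase, weights in enumerate(PHASE_WEIGHTS):
        if phase > 0:
            # Small zero-mean Gaussian noise breaks symmetries between phases.
            s = [x + rng.gauss(0.0, noise_std) for x in s]

        def composite(x, weights=weights):
            # Skip disabled terms so they are never evaluated.
            return sum(w * terms[name](x)
                       for name, w in weights.items() if w != 0.0)

        s = list(minimize(composite, s))
    return s
```

Because the schedule is plain data, the same table can be reused across behaviors, in line with the observation that no behavior-specific adjustment was needed.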
