GAMES104-Gameplay

Wang Xi - GAMES104 - Modern Game Engine: From Beginner to Practice.

Resources

Course

Lecture 15: Basics of Gameplay Systems in Game Engines

Gameplay Complexity and Building Blocks

游戏复杂性和构建模块

Outline of Gameplay System

游戏系统概述

Gameplay complexity and Building Blocks

游戏复杂性和构建模块

  • Overview

    概述

  • Event Mechanism

    事件机制

  • Script System

    脚本系统

  • Visual Script

    可视化脚本

  • Character, Control and Camera

    角色、控制和相机

AI

Challenges in GamePlay (1/3)

游戏玩法的挑战

Cooperation among multiple systems

多系统协作

webp

Challenges in GamePlay (2/3)

Diversity of game play in the same game

单种游戏多种玩法

webp

Challenges in GamePlay (3/3)

Rapid iteration

快速迭代

webp
This game changed its gameplay during development.

Epic acknowledged that within the Fortnite fundamentals, they could also do a battle royale mode, and rapidly developed their own version atop Fortnite in about two months.

Epic 承认,在 Fortnite 的基本原理内,他们也可以做一款大逃杀模式,并在大约两个月内迅速在 Fortnite 的基础上开发了自己的版本。

Event Mechanism

事件机制

Let Objects Talk

让不同对象之间联系

webp

Event/Message Mechanism

事件/消息机制

  • Abstract the world communication to messages

    将世界通信抽象为消息

  • Decoupling event sending and handling

    解耦事件发送和处理

webp

Using a message mechanism makes it easy for different objects to communicate.

Publish-subscribe Pattern

发布-订阅模式

  • Publisher categorizes published messages (events) into classes

    发布者将发布的消息(事件)分类为不同的类别

  • Subscriber receive messages (events) that are of interest without knowledge of which publishers

    订阅者接收感兴趣的消息(事件),但不知道是哪个发布者

webp

3 Key Components of Publish-subscribe Pattern

发布-订阅模式的 3 个关键组件

  • Event Definition

    事件定义

  • Callback Registration

    回调注册

  • Event Dispatching

    事件分派
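The three components can be sketched in a few lines (illustrative Python; the class and event names are assumptions, not from the course):

```python
from collections import defaultdict
from enum import Enum, auto

# 1. Event definition: a type tag (enum) plus arbitrary arguments
class EventType(Enum):
    EXPLOSION = auto()
    PLAYER_DEATH = auto()

class Event:
    def __init__(self, event_type, **args):
        self.type = event_type
        self.args = args

class EventSystem:
    def __init__(self):
        # 2. Callback registration: subscribers keyed by event type;
        # the publisher never needs to know who is listening
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    # 3. Event dispatching: here, immediate-mode delivery to all subscribers
    def dispatch(self, event):
        for callback in self._subscribers[event.type]:
            callback(event)

bus = EventSystem()
log = []
bus.subscribe(EventType.EXPLOSION, lambda e: log.append(e.args["damage"]))
bus.dispatch(Event(EventType.EXPLOSION, damage=100))
# log == [100]
```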

Event Definition

事件定义

webp

Define the event's type (an enum) and its variables.

Type and Arguments

类型和参数

webp

Hardcoding is not feasible

硬编码不可行

  • Editable

    可编辑性

webp

Callback Registration

回调注册

Callback (function)

回调(函数)

  • Any reference to executable code that is passed as an argument to another piece of code

    对作为参数传递给另一段代码的可执行代码的任何引用

Put literally: a callback function is itself an argument. You pass the function into another function, and when that function finishes executing, it invokes the function that was passed in. This process is called a callback.

webp

Object Lifespan and Callback Safety

对象生命周期和回调安全

Time points of registration and execution differs

注册执行的时间点不同

webp webp

By the time the callback is executed, the object that owns it may already have been destroyed, causing an error.

Object Strong Reference

对象强引用

webp

Make sure to unregister callback function before delete objects, otherwise it will cause memory leak!

删除对象前请务必注销回调函数,否则会造成内存泄漏!

Prevent object from de-allocation as long as callback function still registered

只要回调函数仍然注册,就防止对象被取消分配

Object Weak Reference

对象弱引用

webp

Object could be de-allocated; the callback function will be checked for validity before invocation

对象可以被释放,调用前会检查回调函数是否有效
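A minimal sketch of the weak-reference approach, using Python's `weakref` as a stand-in for an engine's weak handle (class names are illustrative):

```python
import weakref

class EventSystem:
    def __init__(self):
        self._callbacks = []

    def subscribe(self, obj, method_name):
        # Hold only a weak reference to the subscriber, so registration
        # does not keep the object alive
        self._callbacks.append((weakref.ref(obj), method_name))

    def dispatch(self, event):
        alive = []
        for ref, method_name in self._callbacks:
            obj = ref()
            if obj is not None:          # check validity before calling
                getattr(obj, method_name)(event)
                alive.append((ref, method_name))
        self._callbacks = alive          # drop callbacks of destroyed objects

class Monster:
    def __init__(self):
        self.hits = 0
    def on_hit(self, event):
        self.hits += 1

bus = EventSystem()
m = Monster()
bus.subscribe(m, "on_hit")
bus.dispatch("hit")   # delivered normally while the object is alive
del m                 # the object can be freed despite the registration
bus.dispatch("hit")   # the stale callback is detected and skipped
```

Note that the weak reference is to the object, not the bound method (weak references to bound methods die immediately in Python), which is why the method name is stored separately.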

Event Dispatch

事件分派

  • Send event to appropriate destination

    将事件发送到适当的目的地

webp

Event Dispatch: Immediate

事件分派:立即

webp

Parent function returns only after the callback function completes

回调函数执行完后父函数才返回,这么做可能出现如下问题:

  • Deep well of callbacks

    回调的深井

webp

This can make the call stack very deep and consume a lot of memory.

  • Blocked by function

    被函数阻止

If some function in the chain takes a long time, the frame rate drops suddenly.

webp

The bleeding effect should be loaded but cost plenty of time in this function call

应该加载出血效果,但在此函数调用中会花费大量时间

  • Difficult for parallelization

难以并行化

webp

Event Queue

事件队列

Basic implementation

基本实现

  • Store events in queue for handling at an arbitrary future time

    将事件存储在队列中,以便在未来的任意时间进行处理

webp
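A queued dispatcher might look like this (illustrative Python; `flush` stands in for the once-per-frame handling step):

```python
from collections import deque

class EventQueue:
    def __init__(self):
        self._queue = deque()
        self._handlers = {}

    def register(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, payload=None):
        # The publisher returns immediately; handling is deferred
        self._queue.append((event_type, payload))

    def flush(self):
        # Called once per frame: handle everything queued so far.
        # Events posted *during* handling wait until the next flush,
        # which is the source of the one-frame-delay problem.
        pending, self._queue = self._queue, deque()
        for event_type, payload in pending:
            for handler in self._handlers.get(event_type, []):
                handler(payload)

q = EventQueue()
handled = []
q.register("explosion", handled.append)
q.post("explosion", 42)
assert handled == []   # nothing happens until the frame's flush
q.flush()
assert handled == [42]
```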

Event Serializing and Deserializing

事件序列化和反序列化

  • To store various types of events

    存储各种类型的事件

webp

Event Queue

事件队列

Ring buffer

环形缓冲区

webp
Batching

批处理

webp

Problems of Event Queue (1/2)

事件队列的问题

  • Timeline not determined by publisher

    事件发布者尚未确定时间表

webp

Problems of Event Queue (2/2)

  • One-frame delays

    一帧延迟

webp

Game Logic

游戏逻辑

Early Stage Game Logic Programming

早期游戏逻辑编程

Compiled language (mostly C/C++)

编译语言(主要是 C/C++)

  • Compiled to machine code with high performance

    编译为高性能机器代码

  • Much easier to use than assembly language

    比汇编语言更易于使用

webp

Modifying any piece of game logic requires recompiling the entire game.

Problem of Compiled Languages

编译语言的问题

Game requirements get complex as hardware evolves

随着硬件的发展,游戏要求变得复杂

  • Need quick iterations of gameplay logic

    需要快速迭代游戏逻辑

Issues with compiled language

编译语言的问题

  • Need recompilation with even a little modification

    即使进行少量修改也需要重新编译

  • Program can easily get crashed with incorrect codes

    程序很容易因代码错误而崩溃

Glue Designers and Programmers

将设计师和程序员连接起来

  • Get rid of inefficient communication between designers and programmers

    摆脱设计师和程序员之间低效的沟通

  • Designers need direct control of gameplay logic

    设计师需要直接控制游戏逻辑

  • Artists need to quickly adjust assets at the runtime environment

    艺术家需要在运行时环境中快速调整资产

webp

Scripting Languages

脚本语言

  • Support for rapid iteration

    支持快速迭代

  • Easy to learn and write

    易学易写

  • Support for hot update

    支持热更新

  • Stable, less crash by running in a sandbox

    沙盒运行稳定,崩溃少

lua
function tick(delta)
    if input_system.isKeyDown(Keycode.W) then
        self:moveForward(delta)
    elseif input_system.isKeyDown(Keycode.S) then
        self:moveBackward(delta)
    end

    if input_system.isKeyDown(Keycode.MouseLeft) then
        self:fire(delta)
    end
    ...
end
Lua Script Example

How Script Languages Work

脚本语言的工作原理

Script is converted to bytecode by a compiler first, then run on a virtual machine

脚本首先由编译器转换为字节码,然后在虚拟机上运行

webp
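Python works the same way as the pipeline shown above, so its own `dis` module can illustrate the idea: the compiler emits bytecode instructions that the interpreter's virtual machine then executes one by one.

```python
import dis

def tick(delta):
    return delta * 2

# The function's __code__ object holds compiled bytecode; the
# interpreter's VM steps through these instructions at runtime
instructions = [ins.opname for ins in dis.get_instructions(tick)]
# includes 'LOAD_FAST' (read the local variable 'delta') among its opcodes
```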

Object Management between Scripts and Engine (1/2)

脚本和引擎之间的对象管理 (1/2)

Object lifetime management in native engine code

原生引擎代码中的对象生命周期管理

  • Need to provide an object lifetime management mechanism

    需要提供对象生命周期管理机制

  • Not safe when script uses native objects (may have been destructed)

    当脚本使用原生对象时不安全(可能已被破坏)

webp

Object Management between Scripts and Engine (2/2)

脚本和引擎之间的对象管理 (2/2)

Object lifetime management in script

脚本中的对象生命周期管理

  • The lifetime of objects are auto managed by script GC

    对象的生命周期由脚本 GC 自动管理

  • The time when object is deallocated is uncontrolled (controlled by GC)

    对象被释放的时间不受控制(由 GC 控制)

  • Easy to get memory leak if reference relations get complex in script

    如果脚本中的引用关系变得复杂,则容易发生内存泄漏

webp

Architectures for Scripting System (1/2)

脚本系统架构 (1/2)

Native language dominates the game world

原生语言主导游戏世界

  • Most gameplay logic is in native code

    大多数游戏逻辑都采用原生代码

  • Script extends the functionality of native engine code

    脚本扩展了原生引擎代码的功能

  • High performance with compiled language

    编译语言带来高性能

webp

Architectures for Scripting System (2/2)

Script language dominates the game world

脚本语言主导游戏世界

  • Most gameplay logic is in script

    大多数游戏逻辑都在脚本中

  • Native engine code provides necessary functionality to script

    原生引擎代码为脚本提供必要的功能

  • Quick development iteration with script language

    使用脚本语言快速进行开发迭代

webp

Advanced Script Features - Hot Update

高级脚本功能 - 热更新

Allow modifications of script while game is running

允许在游戏运行时修改脚本

  • Quick iteration for some specific logic

    针对某些特定逻辑进行快速迭代

  • Enable to fix bugs in script while game is online

    允许在游戏在线时修复脚本中的错误

A troublesome problem with hot update

热更新的一个麻烦问题

  • All variables referencing old functions should be updated too

    所有引用旧函数的变量也应更新

webp
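The stale-reference problem can be sketched in Python (used here only for illustration; the same issue applies to Lua and other script runtimes; the module name is made up):

```python
import sys, types

# Build an in-memory module standing in for a gameplay script file
mod = types.ModuleType("gameplay_script")
exec("def tick():\n    return 'v1'", mod.__dict__)
sys.modules["gameplay_script"] = mod

import gameplay_script
stale = gameplay_script.tick          # a variable still holding the old function

# "Hot update": load the new version of the script into the same module
exec("def tick():\n    return 'v2'", gameplay_script.__dict__)

new_result = gameplay_script.tick()   # 'v2': lookups via the module see new code
old_result = stale()                  # 'v1': stale references keep the old function
```

Any variable that captured the old function directly keeps executing the old code, which is exactly why all such references should be updated after a hot update.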

Issues with Script Language

脚本语言的问题

The performance is usually lower than compiled language

性能通常低于编译型语言

  • Weakly typed language is usually harder to optimize when compile

    弱类型语言在编译时通常更难优化

  • Need a virtual machine to run the bytecode

    需要虚拟机来运行字节码

  • JIT is a solution for optimization

    JIT 是优化的解决方案

Weakly typed language is usually harder to refactor

弱类型语言通常更难重构

webp

Make a Right Choice of Scripting Language

正确选择脚本语言

Things need to be considered

需要考虑的事项

  • Language performance

    语言性能

  • Built-in features, e.g. object-oriented programming support

    内置功能,例如面向对象编程支持

Select the proper architecture of scripting

选择合适的脚本架构

  • Object lifetime management in native engine code or script

    本机引擎代码或脚本中的对象生命周期管理

  • Which one is dominant, native language or script

    本机语言或脚本哪个占主导地位

Popular Scripting Languages (1/2)

热门脚本语言 (1/2)

Lua (used in World of Warcraft, Civilization V)

Lua(用于《魔兽世界》、《文明 5》)

  • Robust and mature

    强大且成熟

  • Excellent runtime performance

    出色的运行时性能

  • Light-weighted and highly extensible

    轻量且高度可扩展

Python (used in The Sims 4, EVE Online)

Python(用于《模拟人生 4》、《星战前夜》)

  • Reflection support

    反射支持

  • Built-in object-oriented support

    内置面向对象支持

  • Extensive standard libraries and third-party modules

    广泛的标准库和第三方模块

C# (to bytecode offline, used in Unity)

C#(离线字节码,用于 Unity)

  • Low learning curve, easy to read and understand

    学习难度低,易于阅读和理解

  • Built-in object-oriented support

    内置面向对象支持

  • Great community with lots of active developers

    拥有大量活跃开发人员的优秀社区

webp

Visual Scripting

可视化脚本

Why We Need Visual Scripting

为什么我们需要可视化脚本

  • Friendly to non-programmers, especially designers and artists

    对非程序员,尤其是设计师和艺术家来说很友好

  • Less error-prone with drag-drop operations instead of code writing

    使用拖放操作代替代码编写,更不容易出错

webp

Visual Script is a Programming Language

Visual Script 是一种程序语言

Visual script is also a programming language, which usually needs

Visual Script 也是一种编程语言,通常需要

  • Variable

    变量

  • Statement and Expression

    语句和表达式

  • Control Flow

    控制流

  • Function

    函数

  • Class (for object-oriented programming language)

    类(用于面向对象编程语言)

webp

Variable

变量

Preserve the data to be processed or output

保存要处理或输出的数据

  • Type

    类型

    • Basic type, e.g. integer, floating

      基本类型,例如整数、浮点数

    • Complex type, e.g. structure

      复杂类型,例如结构体

  • Scope

    作用域

    • Local variable

      局部变量

    • Member variable

      成员变量

    • ...

webp

Variable Visualization - Data Pin and Wire

变量可视化 - 数据引脚和数据线

Use data wires through data pins to pass variables (parameters)

通过数据引脚使用数据线传递变量(参数)

  • Each data type uses a unique pin color

    每种数据类型都使用独特的引脚颜色

webp

Statement and Expression

语句和表达式

Control how to process data

控制如何处理数据

  • Statement: expresses some action to be carried out

    语句:表达要执行的某些操作

    • Assignment Statement

      赋值语句

    • Function Statement

      函数语句

    • ...

  • Expression: to be evaluated to determine its value

    表达式:要进行求值以确定其值

    • Function Expression

      函数表达式

    • Math Expression

      数学表达式

webp

Statement and Expression Visualization - Node

语句和表达式可视化 - 节点

Use nodes to represent statements and expressions

使用节点表示语句表达式

  • Statement Node

    语句节点

  • Expression Node

    表达式节点

webp

Control Flow

控制流

Control the statement execution order

控制语句的执行顺序

  • Sequence

    顺序

    • By default statements are executed one by one

      默认情况下,语句会逐个执行

  • Conditional

    条件

    • Next statement is decided by a condition

      下一个语句由条件决定

  • Loop

    循环

    • Statements are executed iteratively until the condition is not true

      语句会迭代执行,直到条件不成立

webp

Control Flow Visualization - Execution Pin and Wire

控制流可视化 - 执行引脚和连线

Use execution wires through execution pins to make statements sequence

使用执行连线通过执行引脚来制作语句序列

  • Use control statement nodes to make different control flow

    使用控制语句节点来制作不同的控制流

webp

Function

函数

A logic module which take in data, process it and return result(s)

接收数据、处理数据并返回结果的逻辑模块

  • Input Parameter

    输入参数

    • The data required input to be processed

      需要输入以进行处理的数据

  • Function Body

    函数主体

    • Control how to process data

      控制如何处理数据

  • Return value(s)

    返回值

    • The data to be returned

      要返回的数据

webp

Function Visualization - Function Graph

函数可视化 - 函数图

Use a graph with connected nodes to make a function

使用带有连接节点的图来制作函数

webp

Class

类

A prototype for a kind of objects

一种对象的原型

  • Member Variable

    成员变量

    • The lifetime is managed by the object instance

      生命周期由对象实例管理

  • Member Function

    成员函数

    • Can access member variables directly

      可以直接访问成员变量

    • May be overridden by derived classes

      可能被派生类覆盖

webp

Class Visualization - Blueprint

类可视化 - 蓝图

Use blueprint to define a class that inherits from a native class

使用蓝图定义从本机类继承的类

  • Event Callback Functions

    事件回调函数

  • Member Functions

    成员函数

  • Member Variables

    成员变量

webp

Make Graph User Friendly

使图表更方便用户使用

  • Fuzzy finding

    模糊查找

  • Accurate suggestions by type

    按类型提供准确建议

Visual Script Debugger

可视化脚本调试器

Debug is an important step among development

调试是开发过程中的重要步骤

Provide user-friendly debug tools for visual scripting

为可视化脚本提供用户友好的调试工具

webp

Issues with Visual Scripting (1/2)

可视化脚本问题 (1/2)

Visual script is hard to merge for a team work

可视化脚本很难在团队合作中合并

  • Usually a visual script is stored as a binary file

    通常,可视化脚本以二进制文件形式存储

  • Manually reorder script graph is inefficient and error-prone even with a merge tool

    即使使用合并工具,手动重新排序脚本图也效率低下且容易出错

webp

Issues with Visual Scripting (2/2)

The graph can get pretty messy with complex logic

图表可能因逻辑复杂而变得相当混乱

  • Need uniform graph layout rules for a team work

    团队合作需要统一的图表布局规则

webp

Script and Graph are Twins

脚本和图是双胞胎

webp

“3C” in Gameplay

What is 3C?

3C: Character, Control & Camera

3C:角色、控制和摄像头

3C is the primary element that determines the gameplay experience

3C 是决定游戏体验的主要元素

webp

Character

角色

In-game character, both player and npc.

游戏中的角色,包括玩家和 NPC。

Includes character movement, combat, health/mana, what skills and talents they have, etc.

包括角色移动、战斗、生命值、他们拥有的技能和天赋等。

One most basic element of a character is movement.

角色最基本的元素之一是移动

webp

Character: Well-designed Movement

角色:精心设计的动作

Movement looks simple, but it's hard to do well.

动作看似简单,但做好却很难。

In AAA games, every basic action state needs to be broken down into detailed states.

在 AAA 游戏中,每个基本动作状态都需要分解为详细状态。

webp

Extended Character: More complex and varied states

扩展角色:更加复杂多样的状态

webp
  • Hanging

    悬挂

  • Skating

    滑冰

  • Diving

    跳水

Extended Character: Cooperate with other systems

扩展角色:与其他系统配合

Game effects, sound, environment interaction.

游戏特效、声音、环境互动。

Extended Character: More realistic motion with Physics

扩展角色:更逼真的物理运动

  • Airflow

    气流

  • Inertia tensor

    惯性张量

  • Torque

    扭矩

  • ...

webp

Movement State Machine

运动状态机

webp

Control

控制

Different input device

不同的输入设备

Different game play

不同的游戏玩法

webp

A Good Example of Control

From Input to Game Logic

webp

Control: Zoom in and out

控制:放大和缩小

Control: Aim Assist

控制:瞄准辅助

webp

This makes the player experience better. Without aim assist, the latency between receiving the input and spawning the projectile in game logic could make it impossible for the player to aim.

Control: Feedback

控制:反馈

webp

Make the gamepad vibrate at the right moments.

Control: Context Awareness

控制:情境感知

Context-sensitive controls

情境敏感控制

  • The same input button produces different effects in different game scenarios

    同一输入按钮在不同的游戏场景中产生不同的效果

webp

Control: Chord & Key Sequences

控制:和弦和按键序列

webp
Chords

和弦

  • when pressed at the same time, produce a unique behavior in the game

    同时按下时,在游戏中产生独特的行为

Key Sequences

按键序列

  • Gesture detection is generally implemented by keeping a brief history of the HID actions performed by the player

    手势检测通常通过保存玩家执行的 HID 操作的简要历史记录来实现
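A key-sequence detector can be sketched by keeping a short input history (illustrative Python; the key names and timing threshold are assumptions):

```python
from collections import deque

class GestureDetector:
    """Detects a key sequence by keeping a brief history of HID actions."""
    def __init__(self, sequence, max_interval=0.5):
        self.sequence = list(sequence)
        self.max_interval = max_interval            # max seconds between keys
        self.history = deque(maxlen=len(sequence))  # brief rolling history

    def on_key(self, key, timestamp):
        self.history.append((key, timestamp))
        if len(self.history) < len(self.sequence):
            return False
        keys = [k for k, _ in self.history]
        times = [t for _, t in self.history]
        # Every gap between consecutive presses must be short enough
        gaps_ok = all(b - a <= self.max_interval
                      for a, b in zip(times, times[1:]))
        return keys == self.sequence and gaps_ok

detector = GestureDetector(["down", "down-forward", "forward", "punch"])
result = [detector.on_key(k, i * 0.1)
          for i, k in enumerate(["down", "down-forward", "forward", "punch"])]
# result == [False, False, False, True]: the gesture fires on the last key
```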

Camera: Subjective Feelings

相机:主观感受

webp

Camera Basic: POV & FOV

摄像机基础:POV 和 FOV

POV (point of view)

POV(视点)

  • determines the position of the player to observe

    确定玩家观察的位置

FOV (field of view)

FOV(视野)

  • determines the size of the player's viewing Angle

    确定玩家视角的大小

webp

Camera Binding

摄像机绑定

Using POV and rotation to bind.

使用 POV 和旋转进行绑定。

webp

Camera Control

相机控制

webp

The camera's position relative to the character should not be completely fixed.

Camera Track

相机轨迹

webp

Camera Effects

相机特效

Provide the camera with more post-visual effects, such as filters and shake.

为相机提供更多后期视觉效果,如滤镜和抖动。

webp

Many Cameras: Camera Manager

多个摄像机:相机管理

Camera: Subjective Feelings

相机:主观感受

Complex effects are often achieved by multiple base adjustments. To create a sense of speed as an example, we can do:

复杂的效果往往需要通过多次基础调整来实现。以营造速度感为例,我们可以这样做:

  • Add lines in the speed direction

    在速度方向上添加线条

  • The character falls backwards

    角色向后倒下

  • Motion blur

    动态模糊

  • Zoom in FOV (to speed up changes in screen content)

    放大 FOV(以加快屏幕内容的变化)

loose feeling

放松的感觉

  • Relax camera movement

    放松镜头运动

webp
Cinematic

电影

  • filter, motion, sound, narrator, model, animation, camera movement, ...

    滤镜、动作、声音、旁白、模型、动画、镜头运动……

webp

Camera

相机

For artists and designers to optimize the effect:

供艺术家和设计师优化效果:

  • Inheritable classes

    可继承的类

  • Function that can be accessed by Blueprint

    蓝图可访问的函数

  • Adjustable parameters

    可调整的参数

webp

Everything is Gameplay.


Lecture 16: Gameplay Systems in Game Engines: Basic AI

Basic Artificial Intelligence

基础人工智能

Outline of Artificial intelligence Systems

人工智能系统概述

AI Basics

人工智能基础

  • Navigation

    导航

  • Steering

    转向

  • Crowd Simulation

    人群模拟

  • Sensing

    感知

  • Classic Decision Making Algorithms

    经典决策算法

Advanced AI

高级人工智能

  • Planning and Goals

    规划和目标

  • Machine Learning

    机器学习

Navigation

导航

Navigation in Games

游戏中的导航

Find paths from a location to another in an automatic manner

自动查找从一个位置到另一个位置的路径

webp

Three Steps of Navigation

导航三步骤

webp
  • Map representation

    地图表示

  • Path finding

    路径查找

  • Path smoothing

    路径平滑

Map Representations - Walkable Area

地图表示 - 可行走区域

  • We need to tell AI agents where they can walk - the walkable area

    我们需要告诉人工智能代理他们可以走到哪里 - 可行走区域

  • Walkable area of players is determined by character motion capabilities

    玩家的可行走区域由角色运动能力决定

    • Physical Collision

      物理碰撞

    • Climbing slope/height

      爬坡/高度

    • Jumping distance

      跳跃距离

    • ...

  • Simulating AI agents' movement the way players are simulated costs too much

    模拟人工智能代理作为玩家的移动成本太高

  • AI agents are still expected to have the same walkable area as players

    人工智能代理仍然需要与玩家拥有相同的可行走区域

Map Representations - Formats

地图表示 - 格式

  • Waypoint Network

    航点网络

  • Grid

    网格

  • Navigation Mesh

    导航网格

  • Sparse Voxel Octree

    稀疏体素八叉树

Waypoint Network

航点网络

  • Network connecting critical points (waypoints) from the map

    连接地图上关键点(航点)的网络

  • Waypoint sources:

    航点来源:

    • Designed important locations (red points in the figure below)

      设计重要位置

    • Corner points to cover walkable area (green points in the figure below)

      角点覆盖可步行区域

    • Internal points to connect nearby waypoints, adding flexibility to navigation (blue points in the figure below)

      内部点连接附近的航点,为导航增添灵活性

webp

Warcraft used waypoint networks.

Usage of waypoint network is similar to subway system

航点网络的使用方式与地铁系统类似

  • Find the nearest points to get on and off the network

    查找最近的上下车点

  • Plan the path on the waypoint network

    规划航点网络上的路径

webp

Pros:

优点:

  • Easy to implement

    易于实施

  • Fast path finding, even for large maps

    路径查找速度快,即使对于大型地图也是如此

Cons:

缺点:

  • Limited flexibility: must go to the nearest point in the network before navigation

    灵活性有限:导航前必须前往网络中的最近点

  • Waypoint selection requires manual intervention

    航点选择需要人工干预

webp

Grid

网格

  • Intuitive discretization of map

    直观的地图离散化

  • Uniform subdivision into small regular grid shapes

    均匀细分为小的规则网格形状

  • Common grid shapes

    常见的网格形状

    • Square

      正方形

    • Triangle

      三角形

    • Hexagon

      六边形

webp

Hexagonal grids like those in Civilization V are harder to lay out in memory.

Grid property could be modified in runtime to reflect dynamic environmental changes

可以在运行时修改网格属性以反映动态环境变化

webp

Pros:

优点:

  • Easy to implement

    易于实现

  • Uniform data structure

    统一的数据结构

  • Dynamic

    动态

Cons:

缺点:

  • Accuracy depends on grid resolution

    精度取决于网格分辨率

  • Dense grid lowers pathfinding performance

    密集的网格会降低寻路性能

  • High memory consumption

    内存消耗高

  • Hard to handle 3D map

    难以处理 3D 地图

webp

Grids have difficulty handling 3D maps.

Navigation Mesh (NavMesh)

导航网格 (NavMesh)

  • Solves the problem of representing overlapped walkable areas

    解决表示重叠可行走区域的问题

  • Approximates the walkable area of character controller based on physical collision and motion capabilities

    根据物理碰撞和运动能力估计角色控制器的可行走区域

  • Lowers network density to boost pathfinding performance

    降低网络密度以提高寻路性能

webp

NavMesh Example

NavMesh 示例

Neighboring 3D convex polygons to represent walkable areas

相邻的 3D 凸多边形表示可行走区域

webp

Convex Polygon of NavMesh

NavMesh 的凸多边形

Why convex polygon?

为什么是凸多边形?

  • Pathfinding generates a series of polygons (a polygon corridor) to walk through

    寻路会生成一系列需要穿过的多边形(多边形走廊)

  • Convexity guarantees the final path stays inside the polygons and that two adjacent polygons share exactly one common edge (the portal)

    凸性保证最终路径仅限于多边形内,并且两个相邻多边形只有一条共同边(传送门)

webp

Pros & Cons of NavMesh

NavMesh 的优缺点

Pros:

优点:

  • Support 3D walkable surface

    支持 3D 可行走表面

  • Accurate

    准确

  • Fast in pathfinding

    寻路速度快

  • Flexible for selection of start/destination

    可灵活选择起点/终点

  • Dynamic

    动态

Cons:

缺点:

  • Complex generation algorithm

    生成算法复杂

  • Does not support 3D space

    不支持 3D 空间

webp

Sparse Voxel Octree

稀疏体素八叉树

  • Represents "flyable" 3D space

    表示“可飞行”的 3D 空间

  • Similar to spatial partitioning

    类似于空间分区

  • Finest-level voxels represent the complicated boundary

    最精细级别的体素表示复杂边界

  • Coarser-level voxels represent uniform regions

    较粗糙级别的体素表示均匀区域

webp

Path Finding

路径查找

Distances in map representations can be abstracted as edge costs in graph

地图表示中的距离可以抽象为图中的边成本

webp webp

Depth-First Search

深度优先搜索

Expand most recently added

展开最近添加的

Breadth-First Search

广度优先搜索

Expand least recently added

展开最近最少添加的

Dijkstra Algorithm

lua
for each vertex v:
    dist[v] = infinity
    prev[v] = none
dist[source] = 0
set all vertices to unexplored
while destination not explored:
    v = least-valued unexplored vertex
    set v to explored
    for each edge (v, w):
        if dist[v] + len(v, w) < dist[w]:
            dist[w] = dist[v] + len(v, w)
            prev[w] = v

It always finds the shortest distance between two points in the graph.

A Star (A*)

  • Expand lowest cost in list

    扩展列表中成本最低的元素

  • Distance is known distance from source + heuristic

    距离是距源的已知距离 + 启发式

  • Greedy: stops when reaches the goal

    贪婪:达到目标时停止

A* - Cost calculation

Cost calculation: $f(n) = g(n) + h(n)$

成本计算:$f(n) = g(n) + h(n)$

  • $g(n)$: the exact cost of the path from the start to node $n$

    $g(n)$:从起点到节点 $n$ 的路径的准确成本

  • $h(n)$: the estimated cost from node $n$ to the goal

    $h(n)$:从节点 $n$ 到目标的估计成本

webp

A* - Heuristic On Grids

A* - 网格启发式算法

  • For 4 directions of movement, we can use Manhattan distance

    对于 4 个移动方向,我们可以使用曼哈顿距离(来计算 $g(n)$ 和 $h(n)$)

  • $D_1$: cost for moving to the adjacent node

    $D_1$:移动到相邻节点的成本

  • $h(n) = D_1 \cdot (d_x + d_y)$

    • $d_x = |x_n - x_{goal}|,\ d_y = |y_n - y_{goal}|$
webp
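A compact A* over a square grid with the Manhattan heuristic can be sketched as follows (illustrative Python; unit step cost $D_1 = 1$, with 0 = walkable and 1 = blocked):

```python
import heapq

def manhattan(a, b, d1=1):
    # h(n) = D1 * (dx + dy): admissible for 4-direction movement
    return d1 * (abs(a[0] - b[0]) + abs(a[1] - b[1]))

def a_star(grid, start, goal):
    """grid: 2D list where 0 = walkable, 1 = blocked. Returns a node path."""
    rows, cols = len(grid), len(grid[0])
    open_list = [(manhattan(start, goal), start)]   # (f = g + h, node)
    came_from = {start: None}
    g_cost = {start: 0}
    closed = set()
    while open_list:
        _, node = heapq.heappop(open_list)          # expand lowest f in list
        if node in closed:
            continue
        closed.add(node)
        if node == goal:                            # greedy: stop at the goal
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 0:
                ng = g_cost[node] + 1               # D1 = 1 per step
                if ng < g_cost.get((nx, ny), float("inf")):
                    g_cost[(nx, ny)] = ng
                    came_from[(nx, ny)] = node
                    heapq.heappush(open_list,
                                   (ng + manhattan((nx, ny), goal), (nx, ny)))
    return None                                     # no path exists

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = a_star(grid, (0, 0), (2, 0))
# path == [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```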

A*- Heuristic On NavMesh

A*- NavMesh 上的启发式方法

Multiple choices when evaluating cost on NavMesh

评估 NavMesh 上的成本时有多种选择

  • Using polygon centers or vertices usually over-estimates the cost

    使用多边形中心或顶点通常会高估成本

  • Using hybrid method introduces too many points to check

    使用混合方法会引入太多要检查的点

  • Midpoints of edges - a good balance

    (选用区域的)边缘的中点 - 良好的平衡

webp
  • On a navigation mesh that allows any angle of movement, use straight-line distance

    在允许任意角度移动的导航网格上,使用直线距离

  • Use the midpoint of the edge entering the current node as the node's cost calculation point

    使用进入当前节点的边缘中点作为节点成本计算点

  • $D$: the cost for moving unit distance in any direction

    $D$:向任意方向移动单位距离的成本

    • $h(n) = D \cdot \sqrt{d_x \cdot d_x + d_y \cdot d_y}$
    • $d_x = |x_n - x_{goal}|,\ d_y = |y_n - y_{goal}|$
webp

A* - NavMesh Walkthrough

A*-NavMesh 演练

webp

A* - Heuristic

A* - 启发式

  • $h(n)$ controls A*'s behavior.

    $h(n)$ 控制 A* 的行为。

  • With 100% accurate estimates, get shortest paths quickly

    以 100% 准确的估计,快速获得最短路径

  • Too low: still get shortest paths, but more slowly

    太低:仍能获得最短路径,但速度会减慢

  • Too high: exits early without the shortest path

    太高:提前退出,得不到最短路径

Balance between pathfinding speed and accuracy

在寻路速度和准确性之间取得平衡

Path Smoothing

路径平滑

  • Why we need path smoothing

    为什么我们需要路径平滑

    • Zigzag, many unnecessary turns

      之字形,许多不必要的转弯

  • “String Pulling”- Funnel Algorithm

    “拉线”- 漏斗算法

webp

Path Smoothing - Funnel Algorithm

路径平滑 - 漏斗算法

  • The scope of the funnel is the possible scope of the path

    漏斗的范围是路径的可能范围

  • Narrow the funnel if necessary to fit the portal

    必要时缩小漏斗以适应门户

webp

Terminate when the goal is in the funnel

当目标处于漏斗中时终止

webp

NavMesh Generation - Voxelization

NavMesh 生成-体素化

Sample collision scene by voxelization

通过体素化示例碰撞场景

webp

NavMesh Generation - Region Segmentation

NavMesh 生成-区域分割

  • Calculate the distance of each voxel to border

    计算每个体素到边界的距离

  • Mark border voxels by AgentRadius to avoid clipping

    通过 AgentRadius 标记边界体素以避免剪切

webp

Watershed Algorithm

分水岭算法

  • Gradually “flood” the "terrain"

    逐渐“淹没”“地形”

  • Form "watershed" (dividing ridge) when "pools" meet

    当“水池”相遇时形成“分水岭”(分水岭)

webp

Segment the “neighboring" voxels into regions to provide a good basis for polygon mesh

将“相邻”体素分割成区域,为多边形网格提供良好的基础

webp

Regions don't have overlapping voxels in 2D

区域在 2D 中没有重叠体素

webp

NavMesh Generation - Mesh Generation

NavMesh 生成-网格生成

Generate NavMesh from segmented regions

从分段区域生成 NavMesh

webp

Nowadays there are plugins that implement this.

Advanced NavMesh Features - Polygon Flags

NavMesh 高级功能-多边形标记

Useful for marking terrain types: plains, mountain, water, etc.

用于标记地形类型:平原、山脉、水域等。

  • “Paint colors" to add user-defined regions

    “绘制颜色”以添加用户定义的区域

  • Polygons generated from user-defined regions have special flag

    从用户定义的区域生成的多边形具有特殊标记

webp

Advanced NavMesh Features - Tiles

NavMesh 高级功能-Tile

  • Fast for responding to dynamic objects

    快速响应动态对象

  • Avoid rebuilding the entire NavMesh

    避免重建整个 NavMesh

  • TileSize - trade-off between pathfinding and dynamic rebuilding performance

    TileSize- 寻路和动态重建性能之间的权衡

webp

In a game, the navigation mesh may change.

Advanced NavMesh Features - Off-Mesh Links

NavMesh 高级功能-网格外链接

Allow agents to jump or teleport

允许代理跳跃或传送

webp

Steering

转向

From Path to Motion

从路径到运动

  • Cars cannot follow planned path exactly

    汽车无法完全遵循计划的路径(车辆具有转向半径)

  • Motion of cars are limited by theirs motion abilities:

    汽车的运动受到其运动能力的限制:

    • Linear acceleration (throttle/brake)

      线性加速度(油门/刹车)

    • Angular acceleration (steering force)

      角加速度(转向力)

  • Motion needs to be adjusted according to the limits

    运动需要根据限制进行调整

webp

Steering Behaviors

转向行为

webp
  • Seek / Flee

    寻找 / 逃跑

  • Velocity Match

    速度匹配(从起点出发加速,减速到终点停止)

  • Align

    对齐(让车头朝着某个方向)

Seek/Flee

寻找/逃跑

Steer the agent towards / away from the target

引导代理朝向/远离目标

  • Position matching in the nature

    自然中的位置匹配

  • Accelerate with max acceleration towards / away from the target

    以最大加速度朝向/远离目标加速

  • Will oscillate around the target

    会围绕目标振荡

  • Input:

    输入:

    • Self position

      自身位置

    • Target position

      目标位置

  • Output:

    输出:

    • Acceleration

      加速度

webp
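The seek/flee inputs and outputs above reduce to a few lines (illustrative 2D Python sketch; function names are assumptions):

```python
import math

def seek(self_pos, target_pos, max_acceleration):
    """Seek: accelerate at full strength toward the target position."""
    dx = target_pos[0] - self_pos[0]
    dy = target_pos[1] - self_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    # Direction to target, scaled to the maximum acceleration
    return (max_acceleration * dx / dist, max_acceleration * dy / dist)

def flee(self_pos, target_pos, max_acceleration):
    """Flee: same as seek with the direction reversed."""
    ax, ay = seek(self_pos, target_pos, max_acceleration)
    return (-ax, -ay)

a = seek((0.0, 0.0), (3.0, 4.0), 10.0)
# a == (6.0, 8.0): unit direction (0.6, 0.8) times max acceleration
```

Because the agent always accelerates at full strength, it overshoots and oscillates around the target, exactly as described above.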

Seek / Flee Variations

寻找 / 逃离变体

Modifying the target in runtime can generate new steering behaviors

在运行时修改目标可以生成新的转向行为

webp
  • Pursue

    追寻

  • Path Following

    路径跟随

  • Wander

    漫游

  • Flow Field Following

    流场跟随

Velocity Match

速度匹配

Matches the target velocity

匹配目标速度

  • Calculate acceleration from matching time and velocity differences

    根据匹配时间和速度差异计算加速度

  • Clamp the acceleration by maximum acceleration of agents

    通过代理的最大加速度限制加速度

  • Input:

    输入:

    • Target velocity

      目标速度

    • Self velocity

      自身速度

    • Matching time

      匹配时间

  • Output:

    输出:

    • Acceleration

      加速度
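The same inputs and outputs in code (illustrative 1D Python sketch; the clamping to the agent's maximum acceleration is the key step):

```python
def velocity_match(target_velocity, self_velocity, matching_time, max_acceleration):
    """Acceleration that closes the velocity gap within matching_time,
    clamped to the agent's maximum acceleration (1D sketch)."""
    acceleration = (target_velocity - self_velocity) / matching_time
    return max(-max_acceleration, min(max_acceleration, acceleration))

# Closing a +20 m/s gap in 0.5 s asks for 40 m/s^2,
# which gets clamped to the 25 m/s^2 limit
a = velocity_match(target_velocity=30.0, self_velocity=10.0,
                   matching_time=0.5, max_acceleration=25.0)
# a == 25.0
```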

Align

对齐

Matches target orientation

匹配目标方向

  • Input:

    输入:

    • Target orientation

      目标方向

    • Self orientation

      自我定位

  • Output:

    输出:

    • Angular acceleration

      角加速度

webp

Crowd Simulation

人群模拟

Crowd

人群

A large group of individuals share information in the same environment alone or in a group

一大群人单独或成群地在同一环境中分享信息

  • Collision avoidance

    避免碰撞

  • Swarming

    蜂拥

  • Motion in formation

    形成队列运动

  • ...

webp

Crowd Simulation Models

人群模拟模型

  • Started from "Boids" system of Reynolds

    从雷诺的 "Boids" 系统开始

  • Three families of models:

    三个模型系列:

    • Microscopic models

      微观模型

      • "Bottom-Up"

        “自下而上”

      • Focus on individuals

        关注个体

    • Macroscopic models

      宏观模型

      • Crowd as a unified and continuous entity

        人群作为一个统一且连续的实体

    • Mesoscopic models

      中观模型

      • Divide the crowd into groups

        将人群分成几组

Microscopic Models-Rule-based Models

微观模型-基于规则的模型

Flock dynamics of animal crowds as an emergent behavior by modeling motion of each individuals with simple predefined rules:

通过使用简单的预定义规则对每个个体的运动进行建模,将动物群体的群体动态视为一种突发行为:

  • Separation: to steer away from all of its neighbors

    分离:远离所有邻居

  • Cohesion: to steer towards the "center of mass"

    凝聚:转向“重心”

  • Alignment: to line up with agents close by

    对齐:与附近的代理对齐

webp
  • Separation

    分离

  • Cohesion

    凝聚

  • Alignment

    对齐

webp

Easy to implement, but not suitable to simulate complex behavior rules.

易于实现,但不适合模拟复杂的行为规则。
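The three rules can be sketched as one steering step (illustrative Python; the weights and data layout are assumptions):

```python
def steer(agent, neighbors, sep_w=1.5, coh_w=1.0, ali_w=1.0):
    """One boids steering step for 'agent' given its neighbors.
    Each item is a dict with 'pos' and 'vel' as (x, y) tuples."""
    if not neighbors:
        return (0.0, 0.0)
    n = len(neighbors)
    # Separation: steer away from every neighbor
    sep = [sum(agent["pos"][i] - b["pos"][i] for b in neighbors) for i in (0, 1)]
    # Cohesion: steer toward the neighbors' center of mass
    center = [sum(b["pos"][i] for b in neighbors) / n for i in (0, 1)]
    coh = [center[i] - agent["pos"][i] for i in (0, 1)]
    # Alignment: match the neighbors' average velocity
    avg_vel = [sum(b["vel"][i] for b in neighbors) / n for i in (0, 1)]
    ali = [avg_vel[i] - agent["vel"][i] for i in (0, 1)]
    return tuple(sep_w * sep[i] + coh_w * coh[i] + ali_w * ali[i] for i in (0, 1))

agent = {"pos": (0.0, 0.0), "vel": (0.0, 0.0)}
flock = [{"pos": (1.0, 0.0), "vel": (0.0, 1.0)},
         {"pos": (-1.0, 0.0), "vel": (0.0, 1.0)}]
force = steer(agent, flock)
# force == (0.0, 1.0): separation and cohesion cancel by symmetry,
# and alignment pulls the agent's velocity toward the flock's (0, 1)
```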

Macroscopic Models

宏观模型

Simulate crowd motion from a macro perspective

从宏观角度模拟人群运动

  • Treat the crowd as a unified and continuous entity

    将人群视为统一且连续的实体(避免逐个计算影响性能)

  • Control motions with potential field or fluid dynamics

    用势场或流体动力学控制运动

  • Does not consider interactions between individualsand the environment in individual level

    不考虑个体与环境在个体层面上的相互作用

webp

Mesoscopic Models

中观模型

Simulate crowd motion taking care of both details and the whole

模拟人群运动,兼顾细节和整体

  • Divide the crowd into groups

    将人群分成几组

  • Deals with interactions between groups and individuals in each group

    处理群体之间以及每个群体中个人之间的互动

  • combinations of microscopic models and formation rules or psychological models

    微观模型与形成规则或心理模型的组合

webp

Collision Avoidance - Force-based Models

碰撞避免 - 基于力的模型

  • A mixture of socio-psychological and physical forces influencing the behavior in a crowd

    影响人群行为的社会心理和物理力量的混合

  • The actual movement of an individual depends on the desired velocity and its interaction with the environment

    个人的实际运动取决于所需速度及其与环境的相互作用

  • Can simulate dynamical features of escape crowd panic

    可以模拟逃离人群恐慌的动态特征

webp

Pros:

优点:

  • can be extended to simulate more emergent behaviors of human crowds

    可以扩展以模拟更多人群突发行为

Cons:

缺点:

  • Similar to physics simulation, simulation step should be small enough

    与物理模拟类似,模拟步骤应该足够小

webp

Collision Avoidance - Velocity-based Models

碰撞避免-基于速度的模型

Consider the neighbor information to make decisions in velocity space

考虑邻居信息以在速度空间中做出决策

  • able to simulate in local space

    能够在局部空间中进行模拟

  • applied to collision avoidance

    应用于碰撞避免

Reciprocal velocity obstacle methods - the current standard collision avoidance algorithms

相互速度障碍方法-当前标准碰撞避免算法

  • Velocity Obstacle (VO)

    速度障碍 (VO)

  • Reciprocal Velocity Obstacle (RVO)

    相互速度障碍 (RVO)

  • Optimal Reciprocal Collision Avoidance (ORCA)

    最佳相互碰撞避免 (ORCA)

webp

Velocity obstacle (VO)

速度障碍 (VO)

  • Calculate its own dodge velocity, assuming other agent is unresponsive

    假设其他代理没有响应,计算自己的躲避速度

  • Appropriate for static and unresponsive obstacles

    适用于静态和无响应的障碍物

  • Overshoot

    超调

  • Causes oscillation between two agents attempting to avoid each other

    导致两个试图相互避开的代理之间发生振荡

webp

Reciprocal Velocity Obstacle (RVO)

相互速度障碍 (RVO)

  • Assuming the other agent is using the same decision process (mutually cooperating)

    假设其他代理使用相同的决策过程(相互合作)

  • Both sides move halfway out of the way of a collision

    双方各让开一半以避免碰撞

  • Only guarantees no oscillation and avoidance for two agents

    仅保证两个代理不会发生振荡和避免碰撞

Optimal Reciprocal Collision Avoidance (ORCA)

最佳相互碰撞避免 (ORCA)

webp

Sensing

传感

Sensing or Perception

传感或感知

webp

Internal Information

内部信息

  • Information of the agent itself

    agent 本身的信息

    • Position

      位置

    • HP

      生命值

    • Armor status

      护甲状态

    • Buff status

      增益状态

    • ...

  • Can be accessed freely

    可自由访问

Static Spatial information

静态空间信息

webp
  • Navigation Data

    导航数据

  • Tactical Map

    战术地图

  • Smart Object

    智能对象

  • Cover Point

    掩护点

Dynamic Spatial information (1/2) - influence Map

动态空间信息(1/2)- 影响力地图

webp

Dynamic Spatial Information (2/2) - Game Objects

动态空间信息 (2/2) - 游戏对象

  • Information being sensed from a character

    从角色感知到的信息

  • Multiple character information can exist for a single character as it can be sensed by multiple agents

    单个角色可以存在多个角色信息,因为它可以被多个代理感知

  • Usually contains:

    通常包含:

    • Game Object ID

      游戏对象 ID

    • Visibility

      可见性

    • Last Sensed Method

      最后感知的方法

    • Last Sensed Position

      最后感知的位置

Sensing Simulation

传感模拟

  • Light, sound, and odor travels in space

    光、声音和气味在空间中传播

  • Have max traveling range

    具有最大传播范围

  • Attenuates in space and time with different patterns

    以不同的模式在空间和时间中衰减

    • Sight is blocked by obstacles

      视线被障碍物阻挡

    • Smelling ranges shrinks over time

      嗅觉范围随时间缩小

  • Radiating field can simulate sensing signals

    辐射场可以模拟传感信号

    • Can be simplified as Influence Map

      可以简化为影响图

    • Agents covered by the field can sense the information

      场覆盖的代理可以感知信息

webp

Classic Decision Making Algorithms

经典决策算法

Decision Making Algorithms

决策算法

  • Finite State Machine

    有限状态机

  • Behavior Tree

    行为树

  • Hierarchical Tasks Network

    分层任务网络

  • Goal Oriented Action Planning

    目标导向行动规划

  • Monte Carlo Tree Search

    蒙特卡洛树搜索

  • Deep Learning

    深度学习

Finite State Machine

有限状态机

  • Change from one State to another according to some Conditions

    根据某些条件从一个状态变为另一个状态

  • The change from one state to another is called a Transition

    从一个状态到另一个状态的改变称为转换

webp webp

游戏 AI 经典案例——吃豆人。
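The Pac-Man example above can be sketched as a minimal FSM; the state and event names below are illustrative, not from the lecture:

```python
# Minimal finite state machine: states plus condition-driven transitions.
class StateMachine:
    def __init__(self, initial, transitions):
        # transitions: {(state, event): next_state}
        self.state = initial
        self.transitions = transitions

    def handle(self, event):
        # Change from one state to another if a transition is defined.
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        return self.state

# A Pac-Man-style ghost: chase the player, flee when a power pellet is eaten.
ghost = StateMachine("chase", {
    ("chase", "power_pellet_eaten"): "flee",
    ("flee", "power_pellet_expired"): "chase",
    ("flee", "eaten_by_player"): "return_to_base",
    ("return_to_base", "respawned"): "chase",
})

print(ghost.handle("power_pellet_eaten"))  # chase -> flee
```

Note how adding a new state means touching every transition that should reach it — exactly the maintainability problem discussed below.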

Finite State Machine - Pros & Cons

有限状态机 - 优点和缺点

Pros:

优点:

  • Easy to implement

    易于实现

  • Easy to understand

    易于理解

  • Very fast to deal with simple case

    处理简单情况非常快

Cons:

缺点:

  • Maintainability is bad, especially when adding or removing states

    可维护性差,尤其是添加或删除状态时

  • Reusability is bad, can't be used in other projects or characters

    可重用性差,不能用于其他项目或角色

  • Scalability is bad, hard to modify for complicated cases

    可扩展性差,复杂情况下难以修改

webp

Hierarchical Finite State Machine (HFSM)

分层有限状态机 (HFSM)

Tradeoff between reactivity and modularity

反应性和模块化之间的权衡

  • Reactivity: the ability to quickly and efficiently react to changes

    反应性:快速高效地对变化做出反应的能力

  • Modularity: the degree to which a system's components may be separated into building blocks, and recombined

    模块化:系统组件可分离成构建块并重新组合的程度

webp

Behavior Tree (BT)

Behavior Tree

Focus on state abstraction and transition conditions

关注状态抽象和转换条件

webp
Similar to human thinking:

类似于人类的思维:

  • If ghost close, run away

    如果鬼靠近,就逃跑

  • But if I'm powerful, chase it

    但如果我很强大,就去追它

  • Otherwise, eating

    否则,就吃

webp

Behavior Tree - Execution Nodes

行为树 - 执行节点

Execution node (leaf node)

执行节点(叶节点)

  • Condition node

    条件节点

  • Action node

    动作节点

webp

Behavior Tree - Control Nodes

行为树-控制节点

Control flow node (internal node)

控制流节点(内部节点)

  • Control flow determined by the return value of child nodes

    控制流由子节点的返回值决定

  • Each node has a return value which is Success, Failure or Running

    每个节点都有一个返回值,即成功、失败或正在运行

webp

Control Node-Sequence (1/2)

控制节点顺序 (1/2)

  • Order

    顺序

    • Execute children from left to right

      从左到右执行子节点

  • Stop Condition and Return Value

    停止条件和返回值

    • until one child returns Failure or Running then return value accordingly

      直到一个子节点返回失败或正在运行,然后相应地返回值

    • or all children return Success, then return Success

      或所有子节点都返回成功,然后返回成功

  • If Stop and Return Running

    如果停止并返回正在运行

    • the next execution will start from the running action

      下一次执行将从正在运行的操作开始

webp

Control Node-Sequence (2/2)

Sequence

序列

  • Allows designers to make a "plan"

    允许设计师制定“计划”

webp

Control Node-Selector (1/2)

控制节点选择器 (1/2)

  • Order

    顺序

    • Execute children from left to right

      从左到右执行子节点

  • Stop Condition and Return Value

    停止条件和返回值

    • until one child returns Success or Running, then return value accordingly

      直到一个子节点返回 Success 或 Running,然后相应地返回值

    • or all children return Failure, then return Failure

      或所有子节点都返回 Failure,然后返回 Failure

  • If Stop and Return Running, the next execution will start from the running action

    如果 Stop 并 Return Running,则下一次执行将从 running 操作开始

webp

Control Node-Selector (2/2)

控制节点选择器 (2/2)

Selector

选择器

  • Could select one action in response to different environments

    可以选择一个动作来响应不同的环境

  • Could do the right thing according to priority

    可以根据优先级做正确的事情

webp

Control Node- Parallel (1/2)

控制节点 - 并行 (1/2)

  • Order

    顺序

    • Logically execute all children simultaneously

      逻辑上同时执行所有子节点

  • Stop Condition and Return Value

    停止条件和返回值

    • Return Success when at least M child nodes (between 1 and N) have succeeded

      当至少 M 个子节点(介于 1 和 N 之间)成功时返回成功

    • Return Failure when at least N - M + 1 child nodes (between 1 and N) have failed

      当至少 N - M + 1 个子节点(介于 1 和 N 之间)失败时返回失败

    • Otherwise return Running

      否则返回正在运行

  • If Stop and Return Running

    如果停止并返回正在运行

  • the next execution will start from the running actions

    下一次执行将从正在运行的操作开始

webp

Control Node - Parallel (2/2)

控制节点 - 并行 (2/2)

Parallel

并行

  • Could do multiple things "at the same time"

    可以同时做多件事

webp

Behavior Tree

行为树

Execution nodes

执行节点

  • Action

    操作

  • Condition

    条件

Control flow nodes

控制流节点

  • Sequence

    序列

  • Selector

    选择器

  • Parallel

    并行

| Node Type | Symbol | Succeeds | Fails | Running |
| --- | --- | --- | --- | --- |
| Sequence | webp | If all children succeed | If one child fails | If one child returns Running |
| Selector | webp | If one child succeeds | If all children fail | If one child returns Running |
| Parallel | webp | If ≥ M children succeed | If > N - M children fail | else |
| Condition | webp | Upon completion | If impossible to complete | During completion |
| Action | webp | If true | If false | Never |
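The node types above can be sketched in a few lines; this is a minimal illustration (node set and statuses simplified), not a production behavior tree:

```python
# Minimal behavior-tree sketch: every node's tick() returns SUCCESS,
# FAILURE or RUNNING; control nodes combine the results of their children.
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Action:
    def __init__(self, fn):
        self.fn = fn               # callable returning a status
    def tick(self):
        return self.fn()

class Sequence:
    def __init__(self, *children):
        self.children = children
    def tick(self):
        # Run children left to right; stop at the first non-Success.
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Selector:
    def __init__(self, *children):
        self.children = children
    def tick(self):
        # Run children left to right; stop at the first non-Failure.
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

# Pac-Man-style logic: flee if a ghost is close, otherwise keep eating.
ghost_close = False
tree = Selector(
    Sequence(Action(lambda: SUCCESS if ghost_close else FAILURE),
             Action(lambda: SUCCESS)),       # run away
    Action(lambda: SUCCESS),                 # eat pellets
)
print(tree.tick())  # -> success (via the "eat pellets" branch)
```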

Tick a Behavior Tree

Tick 行为树

  • The tick of BT is like thinking

    BT 的 tick 就像一次思考

  • Every tick starts from the root node

    每次 tick 都从根节点开始

  • Go through different nodes from up to down, left to right

    从上到下、从左到右遍历不同的节点

  • Each node must return failure, success or running

    每个节点必须返回失败、成功或正在运行

webp

Behavior Tree-Decorator (1/2)

行为树装饰器 (1/2)

Decorator

装饰器

  • A special kind of control node with a single child node

    一种特殊的控制节点,只有一个子节点

  • Usually some behavior pattern which is commonly used

    通常是一些常用的行为模式

webp
  • For example, some common policies:
    • Loop execution
    • Execute once
    • Timer
    • Time Limiter
    • Value Modifier
    • Etc.

Behavior Tree - Decorator (2/2)

行为树 - 装饰器 (2/2)

Decorator

装饰器

  • Example: Use timer to implement "patrol"

    示例:使用计时器实现“巡逻”

webp

Behavior Tree-Precondition

行为树-前提条件

Simplify behavior tree structure with preconditions

使用前提条件简化行为树结构

webp

Behavior Tree-Blackboard

行为树-黑板

Blackboard: the memory of behavior tree

黑板:行为树的记忆

webp

Behavior Tree - Pros

行为树 - 优点

  • Modular, Hierarchical organization

    模块化、分层组织

    • each subtree of a BT can be seen as a module, with a standard interface given by the return statuses

      BT 的每个子树都可以看作一个模块,具有由返回状态给出的标准接口

  • Human readable

    人类可读

  • Easy to maintain

    易于维护

    • Modification only affect parts of tree

      修改仅影响树的部分

webp
  • Reactivity

    反应性

    • Thinks every tick to quickly change behavior according to the environment

      每次 tick 都会思考,从而根据环境快速改变行为

  • Easy to Debug

    易于调试

    • Every tick is a whole decision-making process, so that it is easy to debug

      每次 tick 都是一个完整的决策过程,因此易于调试

webp

Behavior Tree - Cons

行为树 - 缺点

Cons

缺点

  • Each tick starts from the root node, which costs much more

    每次 tick 都从根节点开始,成本更高

  • The more reactive, the more conditions to be checked and the more cost per tick

    反应性越高,需要检查的条件就越多,每次 tick 的成本就越高

Upcoming: AI Planning and Goals

即将推出:人工智能规划和目标

To make the AI more deliberative, game designers introduced the AI Planning technique to improve the planning ability of AI

为了让人工智能更具深思熟虑,游戏设计师引入了人工智能规划技术来提高人工智能的规划能力

AI Planning:

人工智能规划:

  • Manage a set of actions

    管理一组动作

  • A planner makes a plan according to the initial world state

    规划器根据初始世界状态制定计划

webp

Reference

Steering & Sensing

Crowd Simulation

Classical Decision Making Algorithms

第十七节:游戏引擎 Gameplay 玩法系统:高级 AI

Advanced Artificial Intelligence

高级 AI

  • Hierarchical Tasks Network

    分层任务网络

  • Goal-Oriented Action Planning

    目标导向行动规划

  • Monte Carlo Tree Search

    蒙特卡洛树搜索

  • Machine Learning Basic

    机器学习基础

  • Build Advanced Game AI

    构建高级游戏人工智能

Hierarchical Tasks Network

层次任务网络

Overview

HTN assumes there are many Hierarchical tasks

HTN 假设存在许多分层任务

webp

Make a Plan like Human

像人类一样制定计划

Hierarchical:

分层:

  • people in real world usually make their plan hierarchically

    现实世界中的人们通常分层制定计划

webp

为了完成 Take a class,我可能需要做出的 Method。

HTN Framework (1/2)

HTN 框架

World state

世界状态

  • Contains a bunch of properties

    包含一系列属性

  • Input to planner, reflect the status of world

    输入到规划器,反映世界的状态

  • It's a Subjective World View in the AI's Brain

    它是 AI 大脑中的主观世界观

Sensors

传感器

  • Perceive changes of environment and modify world state

    感知环境变化并修改世界状态

  • It's more like Perception

    它更像是感知

webp

HTN Framework (2/2)

HTN Domain

HTN 域

  • Load from asset

    从资产加载

  • Describe the relationship of hierarchical tasks

    描述分层任务的关系

Planner

规划器

  • Make a plan from World State and HTN Domain

    根据世界状态和 HTN 域制定计划

Plan Runner

计划运行器

  • Running the plan

    运行计划

  • Update the world state after the task

    任务完成后更新世界状态

webp

HTN Task Types

HTN 任务类型

Two types of Tasks

两种类型的任务

  • Primitive Task

    原始任务

  • Compound Task

    复合任务

webp

Primitive Task (1/2)

原始任务 (1/2)

  • Preconditions

    前提条件

    • Determine whether an action could be executed

      确定是否可以执行某个操作

    • Check whether properties of game world being satisfied

      检查游戏世界的属性是否得到满足

  • Action

    操作

    • Determine what action the primitive task executes

      确定原始任务执行什么操作

  • Effects

    效果

    • Describe how the primitive task modify the game world state properties

      描述原始任务如何修改游戏世界状态属性

webp webp
  • 前提条件:有解药
  • 操作:使用解药
  • 效果:解除负面状态,消耗解药

Compound Task (1/2)

复合任务 (1/2)

Compound Tasks

复合任务

  • Contain several methods

    包含多种方法

  • Methods have different priority

    方法具有不同的优先级

  • Each method has preconditions

    每种方法都有先决条件

Method

方法

  • contains a chain of sub-Tasks

    包含一系列子任务

  • Sub-task could be a primitive task or a compound task

    子任务可以是原始任务或复合任务

webp webp

解毒任务:

  • 有足够材料——制作解药
  • 有足够钱——买解药
  • 最后使用解药解毒

HTN Domain

webp webp

Planning

Step 1

  • Start from the root task

    从根任务开始

  • Choose the method satisfying the precondition in order

    按顺序选择满足前提条件的方法

webp

Step 2

  • Decompose the method to tasks

    将方法分解为任务

  • Check precondition in order

    按顺序检查前提条件

  • Decompose the task if it is a compound task

    如果任务是复合任务,则将其分解

webp

Step 2 (For primitive tasks)

第 2 步(针对原始任务)

  • Assume all actions will succeed, and update the “world state” in temporary memory

    假设所有操作都会成功,并在临时内存中更新“世界状态”

  • The world state has a duplicate copy in the planning phase, used as scratch paper

    世界状态在规划阶段有一份副本,用作草稿纸

webp

Step 2 (For primitive tasks)

第 2 步(针对原始任务)

  • go back and select a new method if precondition is not satisfied

    如果先决条件不满足,则返回并选择新方法

webp

Step 2 (For compound task)

第 2 步(针对复合任务)

  • select the next method if precondition is not satisfied

    如果先决条件不满足,则选择下一个方法

webp

Step 3

  • Repeat step 2 until no more task needs to be done

    重复步骤 2,直到不再需要完成任务

  • The final plan contains only primitive tasks

    最终计划仅包含原始任务

webp webp
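The planning steps above can be sketched as a small recursive planner. The antidote tasks are illustrative, and the task encoding (tuples plus lambdas) is an assumption made for brevity:

```python
# Minimal HTN planner sketch. A compound task is ("compound", [method, ...])
# where each method is (precondition, subtasks); a primitive task is
# ("primitive", precondition, effect, name). Returns a list of primitive
# task names, or None if planning fails.
import copy

def plan(state, tasks):
    if not tasks:
        return []
    head, *rest = tasks
    if head[0] == "primitive":
        _, pre, eff, name = head
        if not pre(state):
            return None
        new_state = copy.deepcopy(state)   # scratch copy of the world state
        eff(new_state)                     # assume the action succeeds
        tail = plan(new_state, rest)
        return None if tail is None else [name] + tail
    # Compound task: try methods in priority order, backtrack on failure.
    for pre, subtasks in head[1]:
        if pre(state):
            result = plan(state, subtasks + rest)
            if result is not None:
                return result
    return None

# "Cure poison": brew an antidote if we have herbs, else buy one, then drink it.
brew = ("primitive", lambda s: s["herbs"] >= 2,
        lambda s: s.update(herbs=s["herbs"] - 2, antidote=True), "brew_antidote")
buy = ("primitive", lambda s: s["gold"] >= 10,
       lambda s: s.update(gold=s["gold"] - 10, antidote=True), "buy_antidote")
drink = ("primitive", lambda s: s["antidote"],
         lambda s: s.update(antidote=False, poisoned=False), "drink_antidote")
cure = ("compound", [
    (lambda s: True, [brew, drink]),   # preferred method
    (lambda s: True, [buy, drink]),    # fallback method
])

state = {"herbs": 0, "gold": 20, "antidote": False, "poisoned": True}
print(plan(state, [cure]))  # -> ['buy_antidote', 'drink_antidote']
```

Because brewing fails its precondition (no herbs), the planner backtracks to the lower-priority "buy" method, mirroring Step 2 above.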

Run plan

运行计划

Run plan

运行计划

  • Execute tasks in order

    按顺序执行任务

  • Stop until all tasks succeed, or one task failed

    停止直到所有任务成功,或一个任务失败

Execute task

执行任务

  • Check precondition and return failure if not satisfied

    检查先决条件,如果不满足则返回失败

  • Execute action

    执行操作

    • if succeed -> update world state and return success

      如果成功 -> 更新世界状态并返回成功

    • if failed -> return failure

      如果失败 -> 返回失败

webp

Replan

重新规划

There are three situations in which the agent starts planning

代理在三种情况下会开始规划

  • Not have a plan

    没有计划

  • The current plan is finished or failed

    当前计划已完成或失败

  • The World State changes via its sensor

    世界状态通过其传感器发生变化

webp

Conclusion

结论

Pros:

优点:

  • HTN is similar to BT, but more high-level

    HTN 与 BT 类似,但层次更高

  • It outputs a plan which has a long-term effect

    它输出具有长期效果的计划

  • It would be faster than a BT in the same case

    在相同情况下,它会比 BT 更快

Cons:

缺点:

  • Players' behavior is unpredictable, so tasks may easily fail

    玩家行为不可预测,因此任务可能很容易失败

  • Designing the world state and the effects of tasks is challenging for designers

    设计世界状态和任务效果对设计师来说具有挑战性

Goal-Oriented Action Planning

以目标为导向的行动计划

Goal-Oriented Action Planning (GOAP)

目标导向行动计划 (GOAP)

  • GOAP is more automated

    GOAP 更加自动化

  • It takes backward planning rather than forward

    它需要向后规划,而不是向前规划

webp

Structure

结构

Sensors and World State

传感器和世界状态

  • Similar to HTN

    类似于 HTN

Goal set

目标集

  • All available goals

    所有可用目标

Action set

行动集

  • All available actions

    所有可用行动

Planning

规划

  • Output sequence of actions

    输出行动序列

webp

Goal Set

目标集

  • Precondition decides which goal will be selected

    前提条件决定选择哪个目标

  • Priority decides which goal should be selected among all the possible goals

    优先级决定在所有可能的目标中选择哪个目标

  • Each goal can be presented as a Collection of States

    每个目标都可以表示为状态集合

webp

Goal Selection

目标选择

webp

Action Set

动作集

Action in GOAP is with precondition, effect and cost

GOAP 中的动作具有前提条件、效果和成本

  • Precondition: in which state, character can do this action

    前提条件:角色在哪种状态下可以执行此动作

  • Effect: after the action is done, how the world state changes

    效果:执行动作后,世界状态如何变化

  • Cost: defined by developer, used as a weight to find the plan with the lowest cost

    成本:由开发人员定义,用作权重,以制定成本最低的计划

webp

Backward Planning Like a Human

像人类一样进行反向规划

  • When making a plan, start from goal state

    制定计划时,从目标状态开始

webp

Goal:解毒

反向规划:使用解药-花钱买解药-拜访商店并付款

Planning

规划

Step 1

  • Check goals according to priority

    根据优先级检查目标

  • Find the first goal of which precondition is satisfied

    找到第一个满足先决条件的目标

webp

Step 2

  • Compare the target state with world state to find unsatisfied goal

    将目标状态与世界状态进行比较,找出未满足的目标

  • Set all unsatisfied states of the goal into a stack

    将目标的所有未满足状态放入堆栈中

webp

Step 3

  • Check the top unsatisfied state from the stack

    从堆栈中检查顶部未满足的状态

  • Select an action from action set which could satisfy the chosen state

    从操作集中选择一个可以满足所选状态的操作

  • Pop the state if it is satisfied by the selected action

    如果所选操作满足该状态,则弹出该状态

webp

Step 4

  • Push action to plan stack

    将操作推送到计划堆栈

  • Check precondition of corresponded action

    检查相应操作的前提条件

  • If precondition is not satisfied, push state to stack of unsatisfied states

    如果前提条件不满足,则将状态推送到不满足状态堆栈

webp
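Steps 1-4 above can be sketched as a small backward planner over boolean world-state facts; the action names and fact keys below are illustrative:

```python
# Minimal GOAP backward-planning sketch. Each action has preconditions and
# effects over boolean world-state facts; we regress from the goal,
# resolving each unsatisfied fact with an action whose effect supplies it.
actions = {
    "visit_shop_and_pay": {"pre": {"has_gold": True}, "eff": {"has_antidote": True}},
    "drink_antidote":     {"pre": {"has_antidote": True}, "eff": {"poisoned": False}},
}

def backward_plan(world, goal):
    unsatisfied = [(k, v) for k, v in goal.items() if world.get(k) != v]
    plan = []
    while unsatisfied:
        fact = unsatisfied.pop()           # top of the unsatisfied-state stack
        for name, a in actions.items():
            if a["eff"].get(fact[0]) == fact[1]:
                plan.append(name)          # push action onto the plan
                # Push the action's own unmet preconditions onto the stack.
                for k, v in a["pre"].items():
                    if world.get(k) != v:
                        unsatisfied.append((k, v))
                break
        else:
            return None                    # no action can satisfy this fact
    plan.reverse()                         # actions were discovered goal-first
    return plan

world = {"poisoned": True, "has_gold": True, "has_antidote": False}
print(backward_plan(world, {"poisoned": False}))
# -> ['visit_shop_and_pay', 'drink_antidote']
```

This greedy sketch takes the first action that satisfies each fact; the states-action-cost graph discussed next replaces that choice with a lowest-cost search.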

Build States-Action-Cost Graph

构建状态-动作-成本图

Can be turned into a path planning problem

可以转化为路径规划问题(动态规划)

  • Node: combination of states

    节点:状态组合

  • Edge: Action

    边:动作

  • Distance: Cost

    距离:成本

Search direction

搜索方向

  • Start node: states of the goal

    起始节点:目标状态

  • End node: current states

    结束节点:当前状态

webp

根据目标解毒,从当前状态找到成本最低的路线。

The Lowest Cost Path

最低成本路径

Can use A* or other shortest path algorithms

可以使用 A* 或其他最短路径算法

  • Heuristics can be represented by the number of unsatisfied states

    启发式算法可以用不满足状态的数量来表示

webp

Conclusion

结论

Pros:

优点:

  • Compared with HTN, GOAP planning is more dynamic

    与 HTN 相比,GOAP 规划更具动态性

  • Decoupling goals and behaviors

    将目标和行为分离

  • HTN can easily make precondition/effect mismatching mistakes

    HTN 很容易犯前提条件 / 效果不匹配的错误

Cons:

缺点:

  • In a single AI system, the runtime planning would be slower than BT/FSM/HTN

    在单个 AI 系统中,运行时规划会比 BT/FSM/HTN 慢

  • Also needs a well-represented world state and action effect

    还需要一个具有良好表现的世界状态和行动效果

蒙特卡洛树搜索

Monte Carlo Tree Search

MCTS is another automated planning method, and it behaves more diversely

MCTS 是另一种自动化规划方法,其行为更加多样化

webp

AlphaGo 就用到了蒙特卡洛树搜索。

webp

Like playing chess, simulate millions of possible moves in mind and choose the “best” move

就像下棋一样,在脑海中模拟数百万种可能的走法,并选择“最佳”一步

Monte Carlo Method

蒙特卡洛方法

  • A broad class of computational algorithms that rely on repeated random sampling to obtain numerical results

    一大类依赖重复随机抽样来获得数值结果的计算算法

webp

Monte Carlo Tree Search

webp

对于当前棋局状态,给出可能的合理行为,求 best move。

States and Actions

状态和动作

State

状态

  • The state of game

    游戏状态

  • Represented by a node

    用节点表示

webp
Action

动作

  • One step operation of AI

    人工智能的一步操作

  • Represented by an edge

    用边表示

webp

States Transfer

状态转移

Transfer state from A to B by action

通过动作将状态从 A 转移到 B

webp

State Space

状态空间

A Tree Structured State space:

树结构状态空间:

The set of states that can be reached from the current state after a possible sequence of actions

从当前状态经过一系列可能的操作后可以到达的状态集

webp

NOTICE: Rebuild the State Space for Each Move

注意:每次移动都要重建状态空间

webp

Simulation: Playing a Game in Mind Quickly

模拟:快速在脑海中玩游戏

Simulation

模拟

  • Run from the state node according to the Default Policy to produce an outcome

    根据默认策略从状态节点运行以产生结果

In the case of Go

围棋的情况

  • Apply random moves from the state until the game is over

    从状态中应用随机动作直到游戏结束

  • Return 1 (win) or 0 (loss) depending on the result

    根据结果返回 1(赢)或 0(输)

Default Policy

默认策略

  • A meaningful but quick rule or neural network to play the game

    一个有意义但快速的规则或神经网络来玩游戏

webp

How to evaluate the states?

如何评估状态?

Evaluation Factors

评估因素

  • Q: Accumulation of Simulation Results

    Q:模拟结果的累积

  • N: Number of simulations

    N:模拟次数

Simulation results and the number of simulations may come not from direct simulation but from child nodes

模拟结果和模拟次数可能并非来自直接模拟,而是来自子节点

webp

Backpropagate

反向传播

Propagate influence of child state back parent state

将子状态的影响传播回父状态

  • Q_{FatherNode}=Q_{FatherNode}+Q_{BackchildNode}

  • N_{node}=N_{node}+1

  • Repeat it until reaching the root

    重复此操作直至到达根节点

webp

Iteration Steps

迭代步骤

  • Selection: select the most urgent “expandable” node

    选择:选择最紧急的“可扩展”节点

  • Expansion: expand the tree by selecting an action

    扩展:通过选择操作扩展树

  • Simulation: simulate from the new node and produce an outcome

    模拟:从新节点进行模拟并产生结果

  • Backpropagate: backpropagate the outcome of simulation from the new node

    反向传播:从新节点反向传播模拟结果

webp
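The four iteration steps can be sketched end-to-end on a toy take-away game (an illustrative simplification, not the lecture's example); selection uses the UCB formula discussed in the following slides:

```python
# Minimal MCTS sketch on a toy take-away game: a pile of stones, players
# alternately remove 1-3 stones, and whoever takes the last stone wins.
# One iteration = Selection (UCB) -> Expansion -> Simulation -> Backpropagation.
import math, random

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.Q, self.N = [], 0.0, 0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.pile and m not in tried]

def ucb(child, parent_visits, c=1.4):
    return child.Q / child.N + c * math.sqrt(2 * math.log(parent_visits) / child.N)

def simulate(pile):
    # Default policy: random playout; returns 1 if the player to move wins.
    turn = 0
    while pile > 0:
        pile -= random.randint(1, min(3, pile))
        if pile == 0:
            return 1 if turn == 0 else 0
        turn = 1 - turn
    return 0  # empty pile: the player to move has already lost

def mcts(root_pile, iters=5000):
    root = Node(root_pile)
    for _ in range(iters):
        node = root
        # Selection: descend via UCB while fully expanded and non-terminal.
        while node.pile > 0 and not node.untried_moves():
            node = max(node.children, key=lambda ch: ucb(ch, node.N))
        # Expansion: add one new child node for an untried move.
        if node.pile > 0:
            m = random.choice(node.untried_moves())
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # Simulation: reward from the view of the player who just moved here.
        reward = 1 - simulate(node.pile)
        # Backpropagation: flip the perspective at every level up to the root.
        while node is not None:
            node.N += 1
            node.Q += reward
            reward = 1 - reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.N).move  # robust child

random.seed(0)
print(mcts(5))  # converges to taking 1 stone, leaving a losing pile of 4
```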

Search in "Infinite" State Space

在“无限”状态空间中搜索

Generally impossible to traverse the state space

通常不可能遍历状态空间

  • We prioritize exploring the most promising regions in state space

    我们优先探索状态空间中最有希望的区域

  • Pre-set a computational budget and stop exploring the state space when the budget is reached

    预设计算预算,并在达到预算时停止探索状态空间

Selection-Expandable Node

选择可扩展节点

Select the most urgent "expandable" node

选择最紧急的“可扩展”节点

“expandable" node

“可扩展”节点

  • Nonterminal state and has unvisited children

    非终止状态且有未访问的子节点

  • Example:

    示例:

webp

Selection-Exploitation and Exploration

选择-开发和探索

Exploitation

开发

  • Look in areas which appear to be promising

    寻找看似有希望的领域

  • Select the child which has high Q/N value

    选择具有高 Q/N 值的子项

webp
Exploration

探索

  • Look in areas that have not been well sampled yet

    查看尚未充分采样的区域

  • Select the child which has low number of visits

    选择访问次数较少的子项

webp webp

UCB (Upper Confidence Bounds)

UCB(置信上限)

How to balance exploration and exploitation?

如何平衡探索和开发?

  • Use UCB (Upper Confidence Bounds) formula

    使用 UCB(置信上限)公式

  • UCB_j: the UCB value of the node j

    UCB_j:节点 j 的 UCB 值

  • Q_j: the total reward of all playouts that passed through node j

    Q_j:经过节点 j 的所有模拟的总奖励

  • N_j: the number of times node j has been visited

    N_j:节点 j 被访问的次数

  • N: the number of times the parent node of node j has been visited

    N:节点 j 的父节点被访问的次数

  • C: a constant, adjusted to lower or increase the amount of exploration performed

    C:一个常数,调整以降低或增加探索的程度

webp
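Written out, the standard UCB1 formula combining these quantities is (compare the LCB formula for the secure child later in these notes, where the exploration term is subtracted instead of added):

UCB_{j}=\frac{Q_j}{N_j}+C\cdot\sqrt{\frac{2\ln(N)}{N_j}}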

Selection

选择

How to select the most urgent expandable node

如何选择最紧急的可扩展节点

  • Always Search from the root node

    始终从根节点搜索

  • Find the highest UCB value child node (promising child) of current node

    查找当前节点的 UCB 值最高的子节点(有希望的子节点)

  • Set promising child as current node

    将有希望的子节点设置为当前节点

  • Iterate above steps until current node is expandable. Set current node as selected node

    迭代上述步骤,直到当前节点可扩展。将当前节点设置为选定节点

webp

Expansion

扩展

Expansion

扩展

  • One or more new child nodes are added to selected node, according to the available actions

    根据可用操作,向选定节点添加一个或多个新子节点

  • The value of child node is unknown

    子节点的值未知

webp

Simulation and Backpropagation

模拟和反向传播

webp

The End Condition

结束条件

Computational budget

计算预算

  • Memory size (the number of nodes)

    内存大小(节点数)

  • Computation time

    计算时间

webp

How to Choose the Best Move?

如何选择最佳移动?

The "best" child node of current state node

当前状态节点的“最佳”子节点

  • Max child: Select the root child with the highest Q-value

    最大子节点:选择具有最高 Q 值的根子节点

  • Robust child: Select the most visited root child

    稳健子节点:选择访问次数最多的根子节点

  • Max-Robust child: Select the root child with both the highest visit count and the highest reward. If none exist, then continue searching until an acceptable visit count is achieved

    最大稳健子节点:选择访问次数和奖励都最高的根子节点。如果不存在,则继续搜索,直到达到可接受的访问次数。

  • Secure child: Select the child which maximises a lower confidence bound (LCB)

    安全子节点:选择最大化下置信界 (LCB) 的子节点

LCB_{j}=\frac{Q_j}{N_j}-C\cdot\sqrt{\frac{2\ln(N)}{N_j}}

webp

Conclusion

结论

Pros:

优点:

  • MCTS agents behave diversely

    MCTS 代理行为多样

  • Agent makes the decision totally by itself

    代理完全自行做出决策

  • Can solve the problem of large search space

    可以解决搜索空间大的问题

Cons:

缺点:

  • The action and state are hard to design for most real-time games

    对于大多数实时游戏来说,动作和状态很难设计

  • It is hard to model for most real-time games

    对于大多数实时游戏来说,很难建模

Machine Learning Basic

机器学习基础

Machine Learning

机器学习

Four Types of Machine Learning

机器学习的四种类型

  • Supervised learning

    监督学习

  • Unsupervised learning

    无监督学习

  • Semi-supervised learning

    半监督学习

  • Reinforcement learning

    强化学习

webp

ML Types: Supervised Learning

ML 类型:监督学习

  • Learn from labeled data

    训练时提供标记数据

webp

ML Types: Unsupervised Learning

ML 类型:无监督学习

  • Learn from unlabeled data

    从未标记的数据中学习

webp

无监督学习便于处理聚类问题。

ML Types: Semi-supervised Learning

ML 类型:半监督学习

  • Learn from a lot of unlabeled data and very scarce labeled data

    从大量未标记数据和非常稀少的标记数据中学习

webp

ML Types: Reinforcement learning

ML 类型:强化学习

  • Learn from an interaction process with environment

    从与环境的交互过程中学习

webp

Reinforcement Learning

强化学习

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

强化学习 (RL) 是机器学习的一个领域,它关注智能代理如何在环境中采取行动,以最大化累积奖励的概念。

  • Trial-and-error search

    反复试验

  • The learner must discover which actions yield the most reward by trying them

    学习者必须通过尝试发现哪些行动能产生最大的奖励

  • Delayed reward

    延迟奖励

  • Actions may affect the immediate reward, thenext situation and all subsequent rewards

    行动可能会影响即时奖励、下一个情况和所有后续奖励

Markov Decision Process-Basic Elements (1/4)

马尔可夫决策过程-基本概念

  • Agent

    代理

    The learner and decision maker

    学习者和决策者

  • Environment

    环境

    The thing the agent interacts with, comprising everything outside the agent

    代理与之交互的事物,包括代理之外的一切

webp

Markov Decision Process-State (2/4)

马尔可夫决策过程 - 状态 (2/4)

State is the observation of the agent, and the data structure is designed by human

状态是代理的观察,数据结构由人设计

webp

Markov Decision Process-Action (3/4)

马尔可夫决策过程-行动 (3/4)

Action is the minimal element of behavior the agent can perform in the game. It is also designed by humans

行动是代理在游戏中可以做出的最小行为元素,它也是由人类设计的

webp

Markov Decision Process-Reward (4/4)

马尔可夫决策过程-奖励 (4/4)

A special signal the agent receives at each time step passing from environment to the agent

代理在从环境传递到代理的每个时间步骤中收到的特殊信号

webp

MDP Mathematical Model

MDP 数学模型

  • Probability of transition

    转换概率

    The probability of transition from s to s' after taking action a

    采取行动 a 后从 s 转换到 s' 的概率

p(s'|s, a)=P(S_t=s'|S_{t-1}=s, A_{t-1}=a)

  • Policy

    策略

    A mapping from states to probabilities of selecting each possible action

    从状态到选择每个可能行动的概率的映射

\pi(a|s)=P(A_t=a|S_t=s)

  • Total reward

    总奖励

    The cumulative reward it receives in the long run

    从长远来看,它获得的累积奖励

G_t=R_{t+1}+R_{t+2}+R_{t+3}+...+R_T

G_t=R_{t+1}+\gamma R_{t+2}+\gamma^2R_{t+3}+...

webp
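As a quick sketch of the discounted total reward above (gamma and the reward sequence here are illustrative, not from the lecture):

```python
# Discounted return: G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
def discounted_return(rewards, gamma):
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

# Three rewards of 1 with gamma = 0.9: 1 + 0.9*1 + 0.81*1 = 2.71
print(discounted_return([1, 1, 1], 0.9))
```

A smaller gamma makes the agent care less about rewards far in the future.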

Policy

策略

A mapping from states to probabilities of selecting each possible action

从状态到选择每个可能动作的概率的映射

\pi(a|s)=P(A=a|S=s)

webp

Build Advanced Game AI

构建高级游戏 AI

Why Game AI needs Machine Learning

为什么游戏 AI 需要机器学习

It is notable that all previous methods actually need human knowledge to design (including the costs in GOAP)

值得注意的是,之前的方法实际上都需要人类知识来设计(包括 GOAP 中的成本)

But players always expect AI to be able to both deal with the complicated game world and behave naturally and diversely

但玩家总是希望 AI 能够既能应对复杂的游戏世界,又能表现得自然多样

  • Traditional methods is in limited space

    传统方法空间有限

  • Machine Learning create infinite possibilities

    机器学习创造无限可能

webp

Machine Learning Framework in Game

游戏中的机器学习框架

The framework of deploying a neural network to play an agent

部署神经网络扮演代理的框架

Observation:

观察:

  • The Game State the AI could observe

    人工智能可以观察到的游戏状态

    • Vector feature

      矢量特征

    • Unit information

      单位信息

    • Environment information

      环境信息

    • Etc.

  • Image

    图像

  • ...

webp

DRL Example-Model the Game

DRL 示例-游戏建模

A DRL design process should contain:

DRL 设计流程应包含:

  • State

    状态

  • Action

    动作

  • Reward

    奖励

  • NN design

    神经网络设计

  • Training Strategy

    训练策略

webp

DRL Example-State

webp

如上图这个游戏,状态 = 小地图 + 游戏统计 + 单位 + 玩家数据

States (1/2)-Maps

状态-地图

Heights

高度

Visibility: fog of war

可见性:战争迷雾

Creep

菌毯

Entity owners

实体所有者

Alerts

警报

Pathable

可行进

Buildable

可建造

webp

States(2/2)-Units Information

状态(2/2)-单位信息

For each unit in a frame

对于一帧中的每个单位

Unit type 单位类型

Owner status 所有者状态

Display type 显示类型

Position 位置

Number of workers 工人数量

Cool down 冷却

Attributes 属性

Unit attributes 单位属性

Cargo status 货物状态

Building status 建筑状态

Resource status 资源状态

Order status 订单状态

Buff status 增益状态

webp

Actions

动作

For a unit it should have actions like

对于一个单位来说,它应该有以下动作

  • What

    什么

    • move

      移动

    • attack

      攻击

    • build

      建造

  • Who

    谁

  • Where

    哪里

  • When next action

    何时进行下一步行动

webp

Rewards (1/2)

奖励(1/2)

Direct reward from game

游戏直接奖励

  • Win: +1

    赢:+1

  • Lose: -1

    输:-1

Pseudo-reward output along with critic network:

与评论网络一起输出的伪奖励:

  • the distance between the agent's operations and the human data statistic z

    代理操作与人类数据统计量 z 之间的距离

webp

Rewards (2/2)

奖励 (2/2)

Reward is much denser in OpenAI Five at Dota 2

Dota 2 中 OpenAI Five 的奖励更加密集

Different reward settings could help us to train different styles of agent

不同的奖励设置可以帮助我们训练不同风格的代理

  • Aggressive

    激进

  • Conservative

    保守

  • ...

webp

NN architectures

webp

OpenAI 提供的玩 Dota2 的神经网络架构。

DRL example-Multi-Layer Perceptron (MLP)

DRL 示例-多层感知器 (MLP)

Classical and easy to implement

经典且易于实现

Flexible definition of the dimensions of inputs and outputs

灵活定义输入和输出的维度

webp

Scalar feature example

标量特征示例

  • Race

    种族

  • Owned Resource

    拥有的资源

  • Upgrade

    升级

  • Etc.
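A minimal MLP forward pass over such fixed-length scalar features can be sketched in pure Python (layer sizes and weights below are illustrative, not from any real agent):

```python
# Minimal MLP forward pass: fixed-length scalar features in, scores out.
import random

def mlp_forward(x, layers):
    # layers: list of (weight_matrix, bias_vector); ReLU between layers.
    for i, (W, b) in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]   # ReLU on hidden layers only
    return x

random.seed(0)
features = [0.5, 1.0, -0.3]   # e.g. race, owned resources, upgrade level
hidden = ([[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
          [0.0] * 4)          # 3 inputs -> 4 hidden units
output = ([[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
          [0.0] * 2)          # 4 hidden units -> 2 output scores
scores = mlp_forward(features, [hidden, output])
print(len(scores))  # 2
```

The input and output dimensions are simply the row/column counts of the weight matrices, which is what makes MLPs flexible for vector features.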

DRL example-Convolutional Neural Network (CNN)

DRL 示例-卷积神经网络(CNN)

webp

还介绍了 ResNet。

DRL example-Transformer

DRL 示例-Transformer

  • Introduce attention mechanisms

    引入注意力机制

  • Uncertain length vector

    不确定长度向量

  • Represents complex features, such as multiple agents, well

    能很好地表示多代理等复杂特征

webp

DRL example-Long-Short Term Memory (LSTM)

DRL 示例 - 长短期记忆 (LSTM)

Enable AI to remember or forget earlier data

使 AI 能够记住或忘记早期数据

webp

DRL example-NN Architecture Selection

DRL 示例-NN 架构选择

NN Architecture selection for different type of feature

不同类型特征的 NN 架构选择

  • Fixed length vector feature

    固定长度向量特征

    • Multi-Layer Perceptron

      多层感知器

  • Uncertain length vector feature

    不确定长度向量特征

    • Long-Short Term Memory

      长短期记忆

    • Transformer

  • Image feature

    图像特征

    • ResNet
  • Raycast

  • Mesh

webp

Training Strategy-Supervised learning

训练策略-监督学习

AlphaStar is trained via both supervised learning and reinforcement learning. It first learned a policy by supervised learning from human expert data

AlphaStar 通过监督学习和强化学习进行训练。它首先通过监督学习从人类专家数据中学习策略

z is a statistic summary of a strategy sampled from human data (for example, a build order)

z 是从人类数据中采样的策略的统计摘要(例如,构建顺序)

Minimize the distance (KL divergence) of agent policy and human decision distribution sampled from z

最小化从 z 中采样的代理策略和人类决策分布的距离(KL 散度)

webp

Training Strategy-Reinforcement learning

训练策略-强化学习

Secondly, it used RL techniques to improve the SL policy

其次,采用强化学习技术改进 SL 策略

TD(λ),V-trace, UPGO are specific Reinforcement learning methods to improve actor network and critic network.

TD(λ)、V-trace、UPGO 是改进参与者网络和评论家网络的具体强化学习方法。

The KL divergence towards the old SL policy would also be considered

还会考虑与旧 SL 策略的 KL 散度

These tricks improved the policy and made it more human-like

这些技巧改进了策略,使其更像人类

webp

Train the Agent-Self Play & Adversarial

训练 Agent——自我对弈与对抗

In AlphaStar, three pools of agents are trained, initialized from the SL policy

在 AlphaStar 中,三个 Agent 池参与训练,均从 SL 策略初始化

  • Main agents [MA]

    主要 Agent [MA]

    • Goal: the most robust agent, used as the final output

      目标:最稳健的 Agent,作为最终输出

    • Self-play (35%)

      自我游戏 (35%)

    • Against past LE and ME agents (50%)

      对抗过去的 LE 和 ME Agent (50%)

    • Against past MA agents (15%)

      对抗过去的 MA Agent (15%)

  • League exploiters [LE]

    联盟利用者 [LE]

    • Goal: find weakness of past all agents (MA, LE, ME)

      目标:找到过去所有 Agent (MA、LE、ME) 的弱点

    • Against all past agents (MA, LE, ME)

      对抗所有过去的 Agent (MA、LE、ME)

  • Main exploiters [ME]

    主要利用者 [ME]

    • Goal: find weakness of current MA agent

      目标:找到当前 MA Agent 的弱点

    • Against current MA agent

      对抗当前的 MA Agent

webp

RL or SL?——SL analysis

RL 还是 SL?——SL 分析

Supervised Learning needs high quality data, and sometimes behaves well too

监督学习需要高质量的数据,有时表现也很好

  • It behaves like human

    它表现得像人类

  • But may not outperform human expert data

    但可能不会胜过人类专家数据

  • Human data is unbalanced

    人类数据不平衡

  • Sometimes there is not enough data

    有时数据不足

webp

RL or SL?-RL analysis

RL 还是 SL?-RL 分析

Reinforcement Learning is usually considered as the optimal solution, however

强化学习通常被认为是最佳解决方案,但是

  • Training a RL model is tough

    训练 RL 模型很困难

  • The model is hard to converge

    模型很难收敛

  • The game environment for training is also a huge development project

    训练的游戏环境也是一个巨大的开发项目

  • The data collection process could be slow

    数据收集过程可能很慢

  • And the behavior maybe unnatural

    行为可能不自然

webp

RL or SL?——Dense reward

RL 还是 SL?——密集奖励

What makes a good problem for RL

什么才是 RL 的好问题

webp

RL or SL?——Summary

RL 还是 SL?——总结

| Situation for SL(SL 适用的情况) | Situation for RL(RL 适用的情况) |
| --- | --- |
| Easy to get data(轻松获取数据) | Needs to outperform the master level(需要超越大师水平) |
| Needs to perform like human(需要像人类一样表现) | Enough budget(足够的预算) |
| | Data is unavailable(数据不可用) |
| | Dense reward(密集奖励) |

Hybrid

混合

Machine Learning is powerful.

机器学习很强大。

But it also costs a lot. For example, DeepMind spent 250 million dollars to finish AlphaStar, and a replication would need 13 million dollars

但成本也很高。例如,DeepMind 花费 2.5 亿美元完成 AlphaStar,复现一次需要 1300 万美元

We often need to make a tradeoff, placing the DNN only on the human-like parts (a part of the whole combat).

我们经常需要做出权衡,只将 DNN 用在类人的部分(整个战斗的一部分)。

webp

References

HTN

GOAP

MCTS

Machine Learning

Machine Learning Game Applications