飞机游戏六空战强化学习环境6.1DBML 6.1.3 DBML+强化学习算法使用

飞机游戏六空战强化学习环境6.1DBML 6.1.3 DBML+强化学习算法使用

项目文件结构

image-20221211111606633

doc 是指导文档,tutorial.ipynb也是指导项目的配置与安装的文件

src目录下是主要的环境和模型所在位置,

其中environment是作者根据以jsbsim和dogfight2作为仿真软件,编写的符合gym框架下的环境,在使用的时候需要将环境复制注册到代码所运行环境的gym包中,如下图所示

QQ截图20221211112054

DogfightEnv

我们来看一下 是怎样将dogfight的仿真与gym框架结合起来的

整体的函数

QQ截图20221211154237

交互的实现

#调用dogfight中的通信端口,使用其中的函数进行获取数据与发送数据
from gym.envs.dogfightEnv.dogfight_sandbox_hg2.network_client_example import dogfight_client as df

QQ截图20221211155135

def get_missile_state(missile_id):
	socket_lib.send_message(str.encode(json.dumps({"command": "GET_MISSILE_STATE", "args": {"missile_id": missile_id}})))
	state = json.loads((socket_lib.get_answer()).decode())
	return state


def set_missile_target(missile_id, target_id):
	socket_lib.send_message(str.encode(json.dumps({"command": "SET_MISSILE_TARGET", "args": {"missile_id": missile_id, "target_id": target_id}})))

具体实现是使用socket进行通信,

首先,对要进行发送的数据使用json进行打包,然后通过socket端口进行发送,然后再对接受到的信息进行解析,返回给调用函数

dogfight2提供的API

dogfight-sandbox-hg2/documentation_network.md at main · harfang3d/dogfight-sandbox-hg2 · GitHub

发送到的命令进行处理的函数都在dogfight2的source文件夹下的network_server.py文件,可以找到发送命令后都调用了哪些函数

示例 :例如,

QQ截图20221211203017

首先在network_client_example文件夹下打开dogfight_client.py ctrl+F进行搜索

QQ截图20221211203522

然后再去source文件夹下的network_server.py文件夹下进行搜索,就得到了函数,在函数体内就有其具体内容

QQ截图20221211204237

可视化的实现

状态空间的定义与实现

observation_space = Box(
    low=np.array([  # simple normalized
        # Plane
        -300,  # x / 100
        -300,  # y / 100
        -1,    # z / 50
        0,     # heading
        -360,  # pitch_attitude * 4
        -360,  # roll_attitude * 4
        # Missile
        -300,  # x / 100
        -300,  # y / 100
        -1,    # z / 50
        -315,  # heading * 100
        -315,  # pitch_attitude * 100
        -315,  # roll_attitude * 100
    ]),
    high=np.array([
        300,
        300,
        200,
        360,
        360,
        360,
        300,
        300,
        200,
        315,
        315,
        315,
    ])
)

状态空间分为飞机和导弹

飞机

  • 位置 x :[-300,300] y:[-300,300] z[-1,200]
  • 姿态: 前进方向heading [0,360] , 俯仰角,滚转角 [-360,360] 这里的正负可能代表顺时针和逆时针

导弹:

  • 位置 x :[-300,300] y:[-300,300] z[-1,200]
  • 姿态:前进方向heading [-315,315] , 俯仰角,滚转角 [-315,315] 这里的正负可能代表顺时针和逆时针

第3个变量 Pitch_attitude的图示 也就是pitch的姿态的角度

Angle-of-Attack-Flight-Path-Angle-and-Pitch-attitude-Upset-Recovery-Industry-Team-2008

动作空间的定义与实现

action_space = Box(
    low=np.array([
        0,  # Flaps
        -1,  # Pitch
        -1,  # Roll
        -1,  # Yaw
    ]),
    high=np.array([
        1,
        1,
        1,
        1,
    ]),
)

动作空间有四个,分别是襟翼的状态,控制区间为[0,1] ; 俯仰舵,滚转,偏航,控制区间为[-1,1]

代码阅读

import os
import random
import re
import sys
import time
import warnings

import gym
import numpy as np
from gym import Env
from gym.spaces import Box, Discrete

#调用dogfight中的通信端口,使用其中的函数进行获取数据与发送数据
from gym.envs.dogfightEnv.dogfight_sandbox_hg2.network_client_example import dogfight_client as df

# sys.path.append('./src/')
# sys.path.append('./src/environments/dogfightEnv/')
# sys.path.append('./src/environments/dogfightEnv/dogfight_sandbox_hg2/network_client_example/')
sys.path.append('gym.envs.dogfightEnv.dogfight_sandbox_hg2.network_client_example/')


# try:
print("gym dogfightEnv")
print(os.getcwd())




print("DBRL")
time.sleep(1)

# except:
#     from dogfightEnv.dogfight_sandbox_hg2.network_client_example import  dogfight_client as df
#     print("DBRL")
#     time.sleep(1)


class DogfightEnv(Env):

    def __init__(
        self,
        host='10.168.62.25',
        port='50888',
        plane_slot=1,
        enemy_slot=1,
        missile_slot=0,
        rendering=True,
    ) -> None:
        """
        构造函数,进行初始化

        :param host: 通信地址
        :param port: 端口
        :param plane_slot: 我方飞机数量
        :param enemy_slot: 敌方飞机数量
        :param missile_slot: 导弹数量
        :param rendering: 是否进行渲染,即可视化
        """
        
        self.host = host
        self.port = port
        self.nof = 0
        self.rendering = rendering
        self.plane_slot = plane_slot
        self.enemy_slot = enemy_slot
        self.missile_slot = missile_slot

        #飞机的动作空间
        self.action_space = Box(
            low=np.array([
                0,  # Flaps 襟翼
                -1,  # Pitch 俯仰角
                -1,  # Roll 翻滚角
                -1,  # Yaw 偏航角
            ]),
            high=np.array([
                1,
                1,
                1,
                1,
            ]),
        )


        # 飞机的观测空间
        self.observation_space = Box(
            low=np.array([  # simple normalized
                -300,  # x / 100
                -300,  # y / 100
                -1,  # z / 50
                0,  # heading
                -360,  # pitch_attitude * 4
                -360,  # roll_attitude * 4
                -300,  # x / 100
                -300,  # y / 100
                -1,  # z / 50
                -315,  # heading * 100
                -315,  # pitch_attitude * 100
                -315,  # roll_attitude * 100
            ]),
            high=np.array([
                300,
                300,
                200,
                360,
                360,
                360,
                300,
                300,
                200,
                315,
                315,
                315,
            ]),
            dtype=np.float64
        )

        df.set_renderless_mode(True)  # 是否进行渲染,默认不进行渲染
        #可视化设置,是否进行渲染,当参数为True,即进行渲染,则进行渲染,否则默认不进行渲染
        if self.rendering:
            df.set_renderless_mode(False)

        #Set client update mode ON: the scene update must be done by client network, calling "update_scene()"
        df.set_client_update_mode(True)

#飞机进行初始化,按着敌方和我方飞机队列获取飞机列表
        print('环境初始化')
        try:
            df.get_planes_list()
        except:
            print('Run for the first time')
            df.connect(host, int(port))
            time.sleep(2)
        
        planes = df.get_planes_list()  #从仿真环境获取飞机列表
        print(f'初始化 飞机planes:{planes}')

        df.disable_log()
        
        self.planeID = planes[self.plane_slot]
        self.enemyID = planes[self.enemy_slot]

        print(f'初始化 飞机planeID:{self.planeID}')
        print(f'初始化 飞机enemyID:{self.enemyID}')

        #初始化飞机
        for i in planes:
            df.reset_machine(i)





# 完成起飞动作
        # 设置最大推力
        df.set_plane_thrust(self.enemyID, 1)
        df.set_plane_thrust(self.planeID, 1)


        #Wait until plane thrust = 1
        t = 0
        while t < 1:
            plane_state = df.get_plane_state(self.enemyID)
            df.update_scene()
            t = plane_state["thrust_level"]

        # 激活后续燃料
        df.activate_post_combustion(self.enemyID)
        df.activate_post_combustion(self.planeID)

        df.set_plane_pitch(self.enemyID, -0.5)
        df.set_plane_pitch(self.planeID, -0.5)

        p = 0
        while p < 15:
            plane_state = df.get_plane_state(self.enemyID)
            df.update_scene()
            p = plane_state["pitch_attitude"]
        # 飞起来之后的stabilize_plane()函数与就是三个舵偏都设为0
        #        if p: self.set_pitch_level(0)if y: self.set_yaw_level(0)if r: self.set_roll_level(0)
        df.stabilize_plane(self.enemyID)
        df.stabilize_plane(self.planeID)
        #收起起落架
        df.retract_gear(self.enemyID)
        df.retract_gear(self.planeID)

        s = 0
        while s < 1000:
            plane_state = df.get_plane_state(self.enemyID)
            df.update_scene()
            s = plane_state["altitude"]

        
        df.set_plane_yaw(self.planeID, 1)


#设置导弹
        #获取导弹列表
        missiles = df.get_machine_missiles_list(self.enemyID)
        self.missileID = missiles[self.missile_slot]

        df.fire_missile(self.enemyID, self.missile_slot)

        df.set_missile_target(self.missileID, self.planeID) #设置导弹目标, 第一个参数是导弹id,第二个参数是目标id ,这里仅仅是一个属性设置

        df.set_missile_life_delay(self.missileID, 30)   #设置导弹存活时长 ,当过去30s后,自动摧毁导弹




    def getProperty(
        self,
        prop
    ):
        if prop == 'position':
            return [
                df.get_plane_state(self.planeID)['position'][0],
                df.get_plane_state(self.planeID)['position'][2],
                df.get_plane_state(self.planeID)['position'][1],
            ]
        elif prop == 'positionEci':
            warnings.warn('Dogfight simulation environments have no global data!')
            return [
                df.get_plane_state(self.planeID)['position'][0],
                df.get_plane_state(self.planeID)['position'][2],
                df.get_plane_state(self.planeID)['position'][1],
            ]
        elif prop == 'positionEcef':
            warnings.warn('Dogfight simulation environments have no global data!')
            return [
                df.get_plane_state(self.planeID)['position'][0],
                df.get_plane_state(self.planeID)['position'][2],
                df.get_plane_state(self.planeID)['position'][1],
            ]
        elif prop == 'attitudeRad':
            return [
                df.get_plane_state(self.planeID)['heading'] / 180 * np.pi,
                df.get_plane_state(self.planeID)['pitch_attitude'] / 180 * np.pi,
                df.get_plane_state(self.planeID)['roll_attitude'] / 180 * np.pi,
            ]
        elif prop == 'attitudeDeg':
            return [
                df.get_plane_state(self.planeID)['heading'],
                df.get_plane_state(self.planeID)['pitch_attitude'],
                df.get_plane_state(self.planeID)['roll_attitude'],
            ]
        elif prop == 'pose':
            return [
                df.get_plane_state(self.planeID)['position'][0],
                df.get_plane_state(self.planeID)['position'][2],
                df.get_plane_state(self.planeID)['position'][1],
                df.get_plane_state(self.planeID)['heading'],
                df.get_plane_state(self.planeID)['pitch_attitude'],
                df.get_plane_state(self.planeID)['roll_attitude'],
            ]
        elif prop == 'velocity':
            warnings.warn('三个值为速度在欧拉角上的分量, 与JSBSim中的速度不同')
            return [
                df.get_plane_state(self.planeID)['horizontal_speed'],
                df.get_plane_state(self.planeID)['linear_speed'],
                -df.get_plane_state(self.planeID)['vertical_speed'],
            ]
        elif prop == 'poseMissile':
            return [
                df.get_missile_state(self.missileID)['position'][0],
                df.get_missile_state(self.missileID)['position'][1],
                df.get_missile_state(self.missileID)['position'][2],
                df.get_missile_state(self.missileID)['Euler_angles'][0],
                df.get_missile_state(self.missileID)['Euler_angles'][1],
                df.get_missile_state(self.missileID)['Euler_angles'][2],
            ]
        else:
            raise Exception("Property {} doesn't exist!".format(prop))

    def getDistance(self):
        return ((df.get_plane_state(self.planeID)['position'][0] - df.get_missile_state(self.missileID)['position'][0]) ** 2 +\
        (df.get_plane_state(self.planeID)['position'][1] - df.get_missile_state(self.missileID)['position'][1]) ** 2 +\
        (df.get_plane_state(self.planeID)['position'][2] - df.get_missile_state(self.missileID)['position'][2]) ** 2) ** .5

    def getHP(self):
        return df.get_health(self.planeID)['health_level']

    def terminate(self):
        if not df.get_missile_state(self.missileID)['active']:
            if self.getHP() >= .9:
                return 1
            else:
                return -1
        else:
            return 0

    def setProperty(
        self,
        prop,
        value,
    ):
        if prop == 'plane':
            self.plane_slot = value
        elif prop == 'enemy':
            self.enemy_slot = value
        elif prop == 'missile':
            self.missile_slot = value
        else:
            raise Exception("Property {} doesn't exist!".format(prop))

    def sendAction(
        self,
        action,
        actionType=None,
    ):  #设定动作
        if actionType == None:
            df.set_plane_flaps(self.planeID, float(action[0]))
            df.set_plane_pitch(self.planeID, float(action[1]))
            df.set_plane_roll(self.planeID, float(action[2]))
            df.set_plane_yaw(self.planeID, float(action[3]))
        elif actionType == 'Flaps' or actionType == 'flaps':
            df.set_plane_flaps(self.planeID, action)
        elif actionType == 'Pitch' or actionType == 'pitch':
            df.set_plane_pitch(self.planeID, action)
        elif actionType == 'Roll' or actionType == 'roll':
            df.set_plane_roll(self.planeID, action)
        elif actionType == 'Yaw' or actionType == 'yaw':
            df.set_plane_yaw(self.planeID, action)

    def step(self, action):

        self.sendAction(action) #发送动作
        
        df.update_scene()  #场景更新
        self.nof += 1

        terminate = self.terminate()

        if terminate == 1:   #获取奖赏
            reward = 50
        elif terminate == -1:
            reward = -50
        else:
            reward = .1
            if self.getHP() <= .1:
                reward = -1

        terminate = True if terminate else False #判断是否终止
        
        ob = np.array([  # normalized  对状态向量进行标准化
            df.get_plane_state(self.planeID)['position'][0] / 100,
            df.get_plane_state(self.planeID)['position'][2] / 100,
            df.get_plane_state(self.planeID)['position'][1] / 50,
            df.get_plane_state(self.planeID)['heading'],
            df.get_plane_state(self.planeID)['pitch_attitude'] * 4,
            df.get_plane_state(self.planeID)['roll_attitude'] * 4,

            df.get_missile_state(self.missileID)['position'][0] / 100,
            df.get_missile_state(self.missileID)['position'][2] / 100,
            df.get_missile_state(self.missileID)['position'][1] / 50,
            df.get_missile_state(self.missileID)['Euler_angles'][0] * 100,
            df.get_missile_state(self.missileID)['Euler_angles'][1] * 100,
            df.get_missile_state(self.missileID)['Euler_angles'][2] * 100,
        ])

        return ob, reward, terminate, {}

    def render(self, id=0):
        # 关闭无渲染模式,即进行渲染
        
        df.set_renderless_mode(False)

    # 环境重新进行初始化
    def reset(
        self,
    ):


        self.__init__(
            host=self.host,
            port=self.port,
            plane_slot=self.plane_slot,
            enemy_slot=self.enemy_slot,
            missile_slot=self.missile_slot,
            rendering=self.rendering,
        )


        
        ob = np.array([  # normalized
            df.get_plane_state(self.planeID)['position'][0] / 100,
            df.get_plane_state(self.planeID)['position'][2] / 100,
            df.get_plane_state(self.planeID)['position'][1] / 50,
            df.get_plane_state(self.planeID)['heading'],
            df.get_plane_state(self.planeID)['pitch_attitude'] * 4,
            df.get_plane_state(self.planeID)['roll_attitude'] * 4,

            df.get_missile_state(self.missileID)['position'][0] / 100,
            df.get_missile_state(self.missileID)['position'][2] / 100,
            df.get_missile_state(self.missileID)['position'][1] / 50,
            df.get_missile_state(self.missileID)['Euler_angles'][0] * 100,
            df.get_missile_state(self.missileID)['Euler_angles'][1] * 100,
            df.get_missile_state(self.missileID)['Euler_angles'][2] * 100,
        ])

        return ob


思考与实践

问题1 初始状态的设置 ?飞机起飞

能不能直接设置飞机的初始状态,也就是根据初始场景

实践发现没法直接设置飞机的位置状态,只能让它先进行起飞

起飞过程也很简单,就是把推力设置为最大,然后设置pitch角,它就飞起来了

# -*- coding: utf-8 -*-

"""
@author     : zuti
@software   : PyCharm
@file       : client_control_test.py
@time       : 2022/12/11 16:47
@desc       :

"""
import dogfight_client as df
import time

# Print fps function, to check the network client frequency.
t0 = 0
t1 = 0
def print_fps():
	global t, t0, t1
	t1 = time.time()
	dt = t1 - t0
	t0 = t1
	if dt > 0:
		print(str(1 / dt))

# Enter the IP and port displayed in top-left corner of DogFight screen
df.connect("10.168.62.25", 50888)

time.sleep(2)

# Get the whole planes list in DogFight scene
# returns a list that contains planes id
planes = df.get_planes_list()
print(str(planes))

df.disable_log()

# Get the id of the plane you want to control
plane_id = planes[0]
enemyID = planes[1]


# Reset the plane at its start state
df.reset_machine(plane_id)
df.reset_machine(enemyID)

# Set plane thrust level (0 to 1)
df.set_plane_thrust(plane_id, 1)
df.set_plane_thrust(enemyID, 1)

# Set client update mode ON: the scene update must be done by client network, calling "update_scene()"
df.set_client_update_mode(True)

# Wait until plane thrust = 1
t= 0
while t < 1:
	plane_state = df.get_plane_state(plane_id)
	# Display text & vector - !!! Must be called before update_scene() !!!
	# !!! Display text & vector only works in Client Update Mode !!!
	df.display_2DText([0.25, 0.75], "Plane speed: " + str(plane_state["linear_speed"]), 0.04, [1, 0.5, 0, 1])
	df.display_vector(plane_state["position"], plane_state["move_vector"], "Linear speed: " + str(plane_state["linear_speed"]), [0, 0.02], [0, 1, 0, 1], 0.02)
	# Update frame:
	df.update_scene()
	print_fps()
	t = plane_state["thrust_level"]

#到这是 设置最大推力,没有其他操作会平飞然后落入水中


# Activate the post-combustion (increases thrust power)
df.activate_post_combustion(plane_id)
df.activate_post_combustion(enemyID)
# Set broomstick pitch level (<0 : aircraft pitch up, >0 : aircraft pitch down)
df.set_plane_pitch(plane_id, -0.5)
df.set_plane_pitch(enemyID, -0.5)

# Wait until plane pitch attitude >= 15
p = 0
while p < 15:
	# Timer to 1/20 s.
	# As Client update mode is ON, the renderer runs to 20 FPS until pitch >= 15°
	time.sleep(1/20)
	plane_state = df.get_plane_state(plane_id)
	df.display_2DText([0.25, 0.75], "Plane speed: " + str(plane_state["linear_speed"]), 0.04, [1, 0.5, 0, 1])
	df.display_vector(plane_state["position"], plane_state["move_vector"], "Linear speed: " + str(plane_state["linear_speed"]), [0, 0.02], [0, 1, 0, 1], 0.02)
	df.update_scene()
	p = plane_state["pitch_attitude"]

#设置操纵量为0
df.stabilize_plane(plane_id)
df.stabilize_plane(enemyID)

#设置起落架
df.retract_gear(plane_id)
df.retract_gear(enemyID)

s = 0
while s < 500:   #让另一架飞机飞到 1000米空中
	plane_state = df.get_plane_state(enemyID)
	df.update_scene()
	s = plane_state["altitude"]

#执行后,飞机会锁定命令,一直执行下去,所以应该根据飞机状态进行调整,而不是设置一次就完事了
df.set_plane_yaw(plane_id,1)
df.set_plane_roll(enemyID,-1)

问题2 动作控制指令的效果

飞机的控制对飞机飞机的影响 ?

发现还可以设置飞机的推力,这其实也可以作为飞机的一个控制量

  • 如果只设置 df.set_plane_pitch(plane_id, -0.5)

    有一个问题,这是直接设置舵偏吗,他能直接改变飞机姿态不

    解答:这是个动作指令,即操纵指令,并不是直接设置状态量

经过探索发现 pitch roll yaw 三个控制变量直接调整飞机的三个角速度

问题3 飞机姿态的直观理解与含义

控制命令会导致飞机姿态的改变,但是这个环境比较简单,一般不会坠机

发现在network_client_example文件夹下给出了几个示例,设置飞机飞行,可以从这边来进行展示和观看

posted @ 2022-12-11 21:48  英飞  阅读(593)  评论(0编辑  收藏  举报