[4] Getting Started with Humanoid Gym: Diagnosing and Fixing the NaN Tensor Error in Isaac Gym Simulation

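The error: a NaN shows up in a tensor

With a small num_envs everything runs fine, but as soon as it goes above 50 a NaN appears and training is forcibly aborted: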

Traceback (most recent call last):
  File "train.py", line 43, in <module>
    train(args)
  File "train.py", line 39, in train
    ppo_runner.learn(num_learning_iterations=train_cfg.runner.max_iterations, init_at_random_ep_len=True)
  File "/home/yyds/桌面/Gym5_human/humanoid-gym-main/humanoid/algo/ppo/on_policy_runner.py", line 129, in learn
    actions = self.alg.act(obs, critic_obs)
  File "/home/yyds/桌面/Gym5_human/humanoid-gym-main/humanoid/algo/ppo/ppo.py", line 93, in act
    self.transition.actions = self.actor_critic.act(obs).detach()
  File "/home/yyds/桌面/Gym5_human/humanoid-gym-main/humanoid/algo/ppo/actor_critic.py", line 133, in act
    self.update_distribution(observations)
  File "/home/yyds/桌面/Gym5_human/humanoid-gym-main/humanoid/algo/ppo/actor_critic.py", line 114, in update_distribution
    self.distribution = Normal(mean, mean*0. + self.std)
  File "/home/yyds/.local/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/yyds/.local/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 10)) of distribution Normal(loc: torch.Size([128, 10]), scale: torch.Size([128, 10])) to satisfy the constraint Real(), but found invalid values:
tensor([[-0.0894, -0.0036, -0.0296,  ..., -0.0120,  0.0645,  0.0829],
        [-0.0959,  0.0003, -0.0248,  ..., -0.0023,  0.0624,  0.0635],
        [-0.1045,  0.0304, -0.0236,  ..., -0.0096,  0.0812,  0.0747],
        ...,
        [-0.0886,  0.0052,  0.0767,  ...,  0.0252, -0.0141,  0.1679],
        [-0.1043,  0.0212, -0.0588,  ...,  0.0114,  0.0969,  0.0459],
        [-0.0986,  0.0268,  0.0214,  ..., -0.0173,  0.0592,  0.0900]],
       device='cuda:0')

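Analysis 1

The traceback ends in an initialization step, the construction of the Normal distribution. Could the error be related to that initialization?

A first check suggests that the values passed in at this point contain NaN.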
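The tensor printed in the error message is truncated with "...", so the NaN entries themselves may not be visible in the output. Below is a minimal sketch of how the offending rows could be located. It assumes the actor output is a 2D tensor of shape (num_envs, num_actions); the helper name find_nan_rows and the stand-in tensor are hypothetical and not part of humanoid-gym.

```python
import torch


def find_nan_rows(t: torch.Tensor) -> torch.Tensor:
    """Return the indices of rows in a 2D tensor that contain NaN or Inf."""
    bad = ~torch.isfinite(t)  # True where an entry is NaN or +/-Inf
    return bad.any(dim=1).nonzero(as_tuple=False).flatten()


# Stand-in for the actor output `mean`, shape (num_envs, num_actions).
mean = torch.zeros(128, 10)
mean[57, 3] = float("nan")  # inject one bad value for illustration

bad_rows = find_nan_rows(mean)
print(bad_rows.tolist())  # [57]
```

Each returned index corresponds to one environment, so the same check can be applied to obs before it reaches the policy to see whether the NaN is already present in the observations.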

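This function is called by the act function,

which is in turn called by a method of another class (the act method in ppo.py, per the traceback).

The arguments passed in are obs and critic_obs, so the next step is to trace where obs comes from: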

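The trail leads to a function whose return type is a tensor.

Because that function is marked with a decorator, it must be implemented in a subclass, so we look for where the class is inherited. The env class inherits it, but the trail goes cold from there, so we need another route.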
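To illustrate why a decorated method forces the search into the subclasses, here is a small sketch. It assumes the decorator in question is Python's abstractmethod; the class names VecEnvSketch and MyEnv are hypothetical and only mirror the pattern, not the actual humanoid-gym classes.

```python
from abc import ABC, abstractmethod


class VecEnvSketch(ABC):
    """Hypothetical base class that only declares the interface."""

    @abstractmethod
    def get_observations(self):
        """Subclasses must provide the real implementation."""


class MyEnv(VecEnvSketch):
    """Hypothetical concrete environment."""

    def __init__(self):
        self.obs_buf = [[0.0, 0.0]]  # placeholder for the real observation buffer

    def get_observations(self):
        return self.obs_buf


# The abstract base class cannot be instantiated directly.
try:
    VecEnvSketch()
    instantiated = True
except TypeError:
    instantiated = False

env = MyEnv()
print(instantiated, env.get_observations())  # False [[0.0, 0.0]]
```

The base class therefore says nothing about how the observations are filled in; that logic has to be found in whichever subclass implements the method.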

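The definition of this function turns up in the base_task file; it returns the variable obs_buf.

So we keep tracing that variable.

It turns out to be closely tied to these two variables:

self.num_envs  number of environments
self.num_obs   number of observation values

In other words, it has one row per environment and one column per observation value.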
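As a quick sketch of that layout, assuming the buffer is allocated as a zero tensor (the sizes below are illustrative values, not taken from any particular config):

```python
import torch

num_envs = 128  # illustrative number of parallel environments
num_obs = 47    # illustrative number of observation values per environment

# One row per environment, one column per observation value.
obs_buf = torch.zeros(num_envs, num_obs)
print(obs_buf.shape)  # torch.Size([128, 47])

# Row i holds everything environment i observes at the current step.
row = obs_buf[5]
```

A NaN anywhere in row i means that environment i produced an invalid observation.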

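All of this lives in base_task,

which is inherited by the environment class,

which our own task inherits in turn.
But it is still unclear why one row of obs_buf would be invalid, that is, why one environment would end up without valid observation values. (This is the variable shown earlier; scroll back if you have forgotten it.)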

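So we looked into what obs_buf actually contains,
and asked Copilot what each entry means,

for both the regular observations and the privileged observations.

The two correspond to the following code: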

    def compute_observations(self):

        phase = self._get_phase()
        self.compute_ref_state()

        sin_pos = torch.sin(2 * torch.pi * phase).unsqueeze(1)
        cos_pos = torch.cos(2 * torch.pi * phase).unsqueeze(1)

        stance_mask = self._get_gait_phase()
        contact_mask = self.contact_forces[:, self.feet_indices, 2] > 5.

        self.command_input = torch.cat(
            (sin_pos, cos_pos, self.commands[:, :3] * self.commands_scale), dim=1)
        
        q = (self.dof_pos - self.default_dof_pos) * self.obs_scales.dof_pos
        dq = self.dof_vel * self.obs_scales.dof_vel
        
        diff = self.dof_pos - self.ref_dof_pos

        self.privileged_obs_buf = torch.cat((
            self.command_input,  # 2 + 3
            (self.dof_pos - self.default_joint_pd_target) * \
            self.obs_scales.dof_pos,  # 12
            self.dof_vel * self.obs_scales.dof_vel,  # 12
            self.actions,  # 12
            diff,  # 12
            self.base_lin_vel * self.obs_scales.lin_vel,  # 3
            self.base_ang_vel * self.obs_scales.ang_vel,  # 3
            self.base_euler_xyz * self.obs_scales.quat,  # 3
            self.rand_push_force[:, :2],  # 2
            self.rand_push_torque,  # 3
            self.env_frictions,  # 1
            self.body_mass / 30.,  # 1
            stance_mask,  # 2
            contact_mask,  # 2
        ), dim=-1)

        obs_buf = torch.cat((
            self.command_input,  # 5 = 2D(sin cos) + 3D(vel_x, vel_y, aug_vel_yaw)
            q,    # 12D
            dq,  # 12D
            self.actions,   # 12D
            self.base_ang_vel * self.obs_scales.ang_vel,  # 3
            self.base_euler_xyz * self.obs_scales.quat,  # 3
        ), dim=-1)

        if self.cfg.terrain.measure_heights:
            heights = torch.clip(self.root_states[:, 2].unsqueeze(1) - 0.5 - self.measured_heights, -1, 1.) * self.obs_scales.height_measurements
            self.privileged_obs_buf = torch.cat((self.obs_buf, heights), dim=-1)
        
        if self.add_noise:  
            obs_now = obs_buf.clone() + torch.randn_like(obs_buf) * self.noise_scale_vec * self.cfg.noise.noise_level
        else:
            obs_now = obs_buf.clone()
        self.obs_history.append(obs_now)
        self.critic_history.append(self.privileged_obs_buf)


        obs_buf_all = torch.stack([self.obs_history[i]
                                   for i in range(self.obs_history.maxlen)], dim=1)  # N,T,K

        self.obs_buf = obs_buf_all.reshape(self.num_envs, -1)  # N, T*K
        self.privileged_obs_buf = torch.cat([self.critic_history[i] for i in range(self.cfg.env.c_frame_stack)], dim=1)

These are just the usual observation terms, which means the problem is very likely tied to my URDF file.
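To narrow down which of these terms first goes bad, each one can be checked separately before it is concatenated. The following is a minimal sketch under that assumption; the helper report_non_finite and the stand-in tensors are hypothetical, and in a real run the dictionary would be filled with the actual attributes such as self.dof_pos or self.dof_vel.

```python
import torch


def report_non_finite(terms: dict) -> dict:
    """Map each term name to the env indices whose row contains NaN or Inf."""
    report = {}
    for name, value in terms.items():
        bad_rows = (~torch.isfinite(value)).any(dim=1)
        rows = bad_rows.nonzero(as_tuple=False).flatten()
        if rows.numel() > 0:
            report[name] = rows.tolist()
    return report


# Stand-in tensors shaped like a few of the observation terms.
num_envs = 4
terms = {
    "dof_pos": torch.zeros(num_envs, 12),
    "dof_vel": torch.zeros(num_envs, 12),
    "base_ang_vel": torch.zeros(num_envs, 3),
}
terms["dof_vel"][2, 0] = float("nan")  # inject one bad value for illustration

report = report_non_finite(terms)
print(report)  # {'dof_vel': [2]}
```

If the report points at joint positions or velocities, that would be consistent with the suspicion that the robot model itself is the source of the problem.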