Paper reading notes: A Latent Transformer for Disentangled Face Editing in Images and Videos

Paper title: A Latent Transformer for Disentangled Face Editing in Images and Videos

I. Introduction and related work (some key sentences noted down)

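(1) Studies have shown that moving a latent code along certain directions in the latent space of a generative model leads to variation of visual attributes in the corresponding generated image.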

(2)Firstly, successful manipulations can only be achieved in well disentangled and linearized latent spaces

(3) Manipulating facial attributes with a linear transformation is very limiting.

(4) Real images are projected into the latent space of a state-of-the-art image generator: StyleGAN.

(5) The transformation network generates disentangled, identity-preserving and controllable attribute editing results on real images.

(6) Related work on disentangled representations

  • One optimization-based method, Image2StyleGAN++, carried out local editing along with global semantic edits on images by applying masked interpolation on the activation features of StyleGAN (? what is this)
  • Collins et al. performed a k-means clustering on the activations of StyleGAN and detected a disentanglement of semantic objects, which enables further local semantic editing on the generated image.
  • For high level semantic edits, GANalyze [13] learned a manifold in the latent space of BigGAN [5] to generate images of different memorability.
  • InterFaceGAN [35] proposed to learn a hyper-plane for a binary classification in the latent space, which one can use to manipulate the target facial attribute by simple interpolation. Following their work, StyleSpace [42] carried out a quantitative study on the latent spaces of StyleGAN [21] and realized a highly localized and disentangled control of the visual attributes.
  • StyleFlow [3] achieved conditional exploration of the latent space by training conditional normalizing flows.
  • There are many more; see the related work section of the paper for details.
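The linear editing idea from points (1) and (3), and the hyper-plane interpolation attributed to InterFaceGAN above, can be sketched roughly as follows. This is only an illustration under assumptions: the 4-D latent code, the `boundary_normal` vector and the function names are hypothetical and do not come from any of the cited implementations.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def linear_edit(w, direction, alpha):
    """Move latent code w by alpha along a fixed (normalized) direction."""
    n = normalize(direction)
    return [wi + alpha * ni for wi, ni in zip(w, n)]

def signed_distance(w, direction):
    """Signed distance of w to the hyper-plane through the origin with normal `direction`."""
    n = normalize(direction)
    return sum(wi * ni for wi, ni in zip(w, n))

# Toy 4-D latent code; real StyleGAN codes are far larger.
w = [0.5, -1.0, 2.0, 0.0]
# Hypothetical normal of an attribute boundary (assumption, not a learned value).
boundary_normal = [3.0, 0.0, 4.0, 0.0]

w_edit = linear_edit(w, boundary_normal, alpha=2.0)
print(signed_distance(w, boundary_normal), signed_distance(w_edit, boundary_normal))
```

The same direction is applied to every latent code regardless of where the code lies, which is one way to read the limitation noted in point (3).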

 

II. Contributions

We propose a latent transformation network for facial attribute editing, achieving disentangled and controllable manipulations on real images with good identity preservation. 

Our method can carry out efficient sequential attribute editing on real images. 

We introduce a pipeline to generalize the face editing to videos and generate realistic and stable manipulations on high resolution videos.

 

III. Method

1. We propose a framework to edit faces in real images and videos via the latent space of StyleGAN.

2. Assume there are n attributes a in total; a separate latent transformer is trained for each attribute.

3. To predict attributes from a latent code, a latent classifier C is used; C is pre-trained.

Latent Classifier: To predict attributes on the manipulated latent codes, we train an attribute classifier C on the "latent code - label" pairs.

The classifier consists of three fully connected layers with ReLU activations in between. C is fixed during the training of the latent transformer.

The facial attribute classifier is cited from: (Harnessing synthesized abstraction images to improve facial attribute recognition)
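The classifier described above (three fully connected layers with ReLU in between) might look like the forward pass below. This is a dependency-free sketch, not the authors' code: the layer widths, the random initialization, the sigmoid on the outputs and the flat input vector are all assumptions made for illustration.

```python
import math
import random

def linear(x, weight, bias):
    """Fully connected layer: weight is a list of rows, one row per output unit."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (assumed initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)]
              for _ in range(n_out)]
    return weight, [0.0] * n_out

class LatentClassifier:
    """Three fully connected layers with ReLU between them."""

    def __init__(self, latent_dim, hidden_dim, n_attributes, seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(rng, latent_dim, hidden_dim),
            init_layer(rng, hidden_dim, hidden_dim),
            init_layer(rng, hidden_dim, n_attributes),
        ]

    def __call__(self, w):
        h = w
        for i, (weight, bias) in enumerate(self.layers):
            h = linear(h, weight, bias)
            if i < len(self.layers) - 1:  # no ReLU after the last layer
                h = relu(h)
        return sigmoid(h)  # one probability per attribute (assumed output form)

# Toy sizes only; the real latent code is much larger.
C = LatentClassifier(latent_dim=8, hidden_dim=16, n_attributes=3)
probs = C([0.1 * i for i in range(8)])
print(probs)
```

Since C is fixed while the latent transformer trains, its weights would be loaded once and never updated in that stage.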

4. Given a latent code w ∈ W+, the latent transformer T generates the direction for a single attribute modification, where the amount of change is controlled by a scaling factor α. The network is expressed with a single layer of linear transformation.
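One way to read item 4 is T(w, α) = w + α·f(w), where f is the single linear layer that produces the direction. The sketch below follows that reading as an assumption; the class name, the toy dimension, the initialization and the attribute names ("smile", "age") are hypothetical.

```python
import random

class LatentTransformer:
    """Single linear layer f; the edit is w + alpha * f(w) (assumed form)."""

    def __init__(self, latent_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 0.01) for _ in range(latent_dim)]
                       for _ in range(latent_dim)]
        self.bias = [0.0] * latent_dim

    def direction(self, w):
        """f(w): the edit direction predicted for this particular latent code."""
        return [sum(wij * xj for wij, xj in zip(row, w)) + b
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, w, alpha):
        d = self.direction(w)
        return [wi + alpha * di for wi, di in zip(w, d)]

# One transformer per attribute, as in item 2 (attribute names are made up).
transformers = {name: LatentTransformer(latent_dim=8, seed=i)
                for i, name in enumerate(["smile", "age"])}

w = [0.1 * i for i in range(8)]
w_smile = transformers["smile"](w, alpha=1.0)
# Sequential editing: feed the edited code into another attribute's transformer.
w_seq = transformers["age"](w_smile, alpha=-0.5)
print(w_seq)
```

Unlike a fixed global direction, the direction here depends on w, while α still scales the amount of change linearly.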

5. Loss function

IV. Evaluation metrics

1. Quantitative

We compare our method quantitatively with GANSpace and InterFaceGAN using three metrics:

(1) target attribute change rate

(2) attribute preservation rate

(3) identity preservation score
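The notes do not record how the three metrics are computed, so the sketch below uses assumed definitions: the change rate is the share of samples whose target attribute prediction crosses a 0.5 threshold, the preservation rate is the share of non-target predictions that stay on the same side, and the identity score is the cosine similarity of two face embeddings. The function names, threshold and toy numbers are all hypothetical.

```python
import math

def target_change_rate(before, after, target, threshold=0.5):
    """Share of samples whose target attribute flips across the threshold."""
    flips = sum((b[target] > threshold) != (a[target] > threshold)
                for b, a in zip(before, after))
    return flips / len(before)

def attribute_preservation_rate(before, after, target, threshold=0.5):
    """Share of non-target attribute predictions that stay on the same side."""
    kept, total = 0, 0
    for b, a in zip(before, after):
        for j in range(len(b)):
            if j == target:
                continue
            total += 1
            kept += (b[j] > threshold) == (a[j] > threshold)
    return kept / total

def identity_preservation_score(emb_a, emb_b):
    """Cosine similarity between two face embedding vectors."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    return dot / (norm_a * norm_b)

# Toy classifier outputs: rows are samples, columns are attributes.
before = [[0.1, 0.8], [0.2, 0.3], [0.4, 0.9]]
after = [[0.9, 0.7], [0.3, 0.2], [0.8, 0.4]]

change = target_change_rate(before, after, target=0)
preserve = attribute_preservation_rate(before, after, target=0)
identity = identity_preservation_score([1.0, 0.0], [1.0, 1.0])
print(change, preserve, identity)
```

In practice the attribute predictions would come from an attribute classifier and the embeddings from a face recognition model; neither is specified in these notes.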

2. Qualitative

 

posted @ Tomorrow1126