[NN] Synthetic Training Data

This post is mainly a collection of reference materials.

 

 

2018


The Ultimate Guide to Synthetic Data: Uses, Benefits & Tools

 

Benefits

Synthetic data has several benefits over real data:

  • Overcoming real data usage restrictions: Real data may have usage constraints due to privacy rules or other regulations. Synthetic data can replicate all important statistical properties of real data without exposing real data, thereby eliminating the issue.
  • Creating data to simulate not yet encountered conditions: Where real data does not exist, synthetic data is the only solution.
  • Immunity to some common statistical problems: These can include item nonresponse, skip patterns, and other logical constraints.
  • Focuses on relationships: Synthetic data aims to preserve the multivariate relationships between variables instead of specific statistics alone.
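The last point, preserving relationships between variables, can be illustrated with a minimal sketch. It assumes the simplest possible generator: fit a mean vector and covariance matrix to a "real" table, then sample new rows from a multivariate Gaussian. The function names `fit_gaussian` and `sample_synthetic` and the toy two-column data are hypothetical and do not come from the quoted article; the vendors listed below presumably use richer models, and this sketch by itself makes no privacy guarantee.

```python
import numpy as np

def fit_gaussian(real):
    """Estimate the mean vector and covariance matrix of the real table."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)  # columns are variables
    return mean, cov

def sample_synthetic(mean, cov, n_rows, seed=0):
    """Draw new rows that share the fitted mean and covariance."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)

# Toy stand-in for a real table: two strongly correlated columns (assumption).
rng = np.random.default_rng(42)
x = rng.normal(50.0, 10.0, size=5000)
y = 2.0 * x + rng.normal(0.0, 5.0, size=5000)
real = np.column_stack([x, y])

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n_rows=5000)

# The synthetic rows are new draws, but the correlation between the
# two columns should stay close to that of the real table.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"real corr={real_corr:.3f}, synthetic corr={synth_corr:.3f}")
```

A Gaussian only captures linear, pairwise relationships; it is used here because it makes the "relationships, not individual records" idea easy to check numerically.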

 

Synthetic data tools

The tools related to synthetic data are often developed to meet one of the following needs:

  • Test data for software development and similar purposes
  • Training data for machine learning models

We prepared a regularly updated, comprehensive sortable/filterable list of leading vendors in synthetic data generation software. Some common vendors that are working in this space include:

NAME                                   FOUNDED  STATUS   NUMBER OF EMPLOYEES  NOTES
BizDataX                               2005     Private  51-200
CA Technologies Datamaker              1976     Public   10,001+
CVEDIA                                 2016     Private  11-50
Deep Vision Data by Kinetic Vision     1985     Private  51-200
Delphix Test Data Management           2008     Private  501-1000
Genrocket                              2012     Private  11-50
Hazy                                   2017     Private  11-50
Informatica Test Data Management Tool  1993     Private  5,001-10,000
Mostly AI                              2017     Private  11-50
Neuromation                            2016     Private  11-50
Solix EDMS                             2002     Private  201-500
Supervisely                            2017     Private  2-10                 Fast annotation only
TwentyBN                               2015     Private  11-50                3D simulation

 

These tools are just a small representation of a growing market of tools and platforms related to the creation and usage of synthetic data. For the full list, please refer to our comprehensive list.

Synthetic data is a way to enable the processing of sensitive data or to create data for machine learning projects. To learn more about related topics on data, be sure to see our research on data.

 

 

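Looking at the overall market, the data annotation industry got a late start in China. Representative companies in the industry include Appen (market capitalization above USD 2.8 billion), Amazon's AMT (Amazon Mechanical Turk), Scale AI (valued at USD 1 billion), and Labelbox, which recently closed a USD 25 million Series B round.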

 

Ref: https://labelbox.com/blog/labelbox-ceo-discusses-breakthroughs-in-ai-training-data

 

 

 

  

2019


Ref: Synthetic Data for Deep Learning

The reference list at the end is a great resource!

 

Ref: https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w9/Li_Photo-Realistic_Simulation_of_ICCV_2017_paper.pdf

Similar, but incomplete: it does not appreciate the benefit of random backgrounds.
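The random-background idea in the note above can be sketched as follows. This is a minimal illustration, not the method of either referenced work: it assumes the object comes with a boolean mask and uses uniform noise as the background, whereas a real pipeline might paste objects onto random photographs instead. The function name `composite_on_random_background` and the red-square object are hypothetical. The commonly cited motivation is that varying the background across samples discourages a model from relying on background cues.

```python
import numpy as np

def composite_on_random_background(foreground, mask, rng):
    """Paste the masked object onto a freshly sampled noise background.

    foreground: (H, W, 3) uint8 image containing the object.
    mask:       (H, W) bool array, True where the object is.
    """
    background = rng.integers(0, 256, size=foreground.shape, dtype=np.uint8)
    # Keep object pixels, replace everything else with the random background.
    return np.where(mask[..., None], foreground, background)

rng = np.random.default_rng(0)
h, w = 64, 64

# Hypothetical object: a red square in the middle of the frame.
foreground = np.zeros((h, w, 3), dtype=np.uint8)
foreground[16:48, 16:48] = (255, 0, 0)
mask = np.zeros((h, w), dtype=bool)
mask[16:48, 16:48] = True

# Each call yields the same object on a different background.
samples = [composite_on_random_background(foreground, mask, rng) for _ in range(4)]
print(len(samples), samples[0].shape, samples[0].dtype)
```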

 

End.

Posted by 郝壹贰叁