[NN] Synthetic Training Data
Mainly a collection of reference material.
2018
The Ultimate Guide to Synthetic Data: Uses, Benefits & Tools
Benefits
Synthetic data has several benefits over real data:
- Overcoming real data usage restrictions: Real data may have usage constraints due to privacy rules or other regulations. Synthetic data can replicate all important statistical properties of real data without exposing real data, thereby eliminating the issue.
- Creating data to simulate not yet encountered conditions: Where real data does not exist, synthetic data is the only solution.
- Immunity to some common statistical problems: These can include item nonresponse, skip patterns, and other logical constraints.
- Focuses on relationships: Synthetic data aims to preserve the multivariate relationships between variables instead of specific statistics alone.
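To make the "statistical properties" and "multivariate relationships" points concrete, here is a minimal sketch. It assumes the real data can be approximated by a multivariate Gaussian, which is an illustrative simplification and not a method named in the article. The dataset, the column meanings, and the helper names `fit_gaussian` and `sample_synthetic` are all hypothetical.

```python
import numpy as np


def fit_gaussian(real):
    """Estimate the mean vector and covariance matrix of the real records (rows)."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic records from the fitted Gaussian; no real row is copied."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)


# Hypothetical "real" data: two correlated numeric columns.
rng = np.random.default_rng(42)
x = rng.normal(50.0, 10.0, size=5000)
y = 2.0 * x + rng.normal(0.0, 5.0, size=5000)
real = np.column_stack([x, y])

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n=5000)

# The relationship between the columns should survive in the synthetic sample.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"real correlation:      {real_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

A Gaussian fit only preserves means and linear covariance, and on its own it gives no formal privacy guarantee. Commercial tools presumably use richer generative models, but the goal is the same: sample new records that keep the relationships between variables.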
Synthetic data tools
The tools related to synthetic data are often developed to meet one of the following needs:
- Test data for software development and similar purposes
- Training data for machine learning models
Some common vendors working in this space include:
NAME | FOUNDED | STATUS | NUMBER OF EMPLOYEES | NOTES |
---|---|---|---|---|
BizDataX | 2005 | Private | 51-200 | |
CA Technologies Datamaker | 1976 | Public | 10,001+ | |
CVEDIA | 2016 | Private | 11-50 | |
Deep Vision Data by Kinetic Vision | 1985 | Private | 51-200 | |
Delphix Test Data Management | 2008 | Private | 501-1000 | |
GenRocket | 2012 | Private | 11-50 | |
Hazy | 2017 | Private | 11-50 | |
Informatica Test Data Management Tool | 1993 | Private | 5,001-10,000 | |
Mostly AI | 2017 | Private | 11-50 | |
Neuromation | 2016 | Private | 11-50 | |
Solix EDMS | 2002 | Private | 201-500 | |
Supervisely | 2017 | Private | 2-10 | Rapid annotation only |
TwentyBN | 2015 | Private | 11-50 | 3D simulation |
These tools are only a small sample of a growing market of tools and platforms for creating and using synthetic data.
Synthetic data is a way to enable the processing of sensitive data or to create data for machine learning projects.
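Looking at the overall market, the data-labeling industry got a late start in China. Representative companies in the industry include Appen (market capitalization over US$2.8 billion), Amazon's AMT (Amazon Mechanical Turk), Scale AI (valued at US$1 billion), and Labelbox, which recently closed a US$25 million Series B round.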
Ref: https://labelbox.com/blog/labelbox-ceo-discusses-breakthroughs-in-ai-training-data
2019
Ref: Synthetic Data for Deep Learning
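The reference list at the end is a great resource!
Similar, but incomplete; it does not appreciate the value of random backgrounds (random bg).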
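As a rough illustration of the random-background idea, the sketch below assumes "random bg" means compositing a labeled foreground object onto randomized backgrounds, so that a model cannot rely on background cues. The helper `composite_on_random_background`, the noise background, and the red-square "object" are hypothetical stand-ins, not taken from the referenced paper.

```python
import numpy as np


def composite_on_random_background(foreground, mask, rng):
    """Keep `foreground` pixels where `mask` is True; fill the rest with uniform noise."""
    background = rng.integers(0, 256, size=foreground.shape, dtype=np.uint8)
    # mask is (H, W); add a channel axis so it broadcasts over RGB.
    return np.where(mask[..., None], foreground, background)


rng = np.random.default_rng(0)
height, width = 64, 64

# Hypothetical object: a red square in the center of the frame.
foreground = np.zeros((height, width, 3), dtype=np.uint8)
foreground[16:48, 16:48] = (255, 0, 0)
mask = np.zeros((height, width), dtype=bool)
mask[16:48, 16:48] = True

# Each sample shares the same object (and label) but gets a fresh background.
samples = [composite_on_random_background(foreground, mask, rng) for _ in range(4)]
print(samples[0].shape, samples[0].dtype)
```

In practice the background would more likely be drawn from a pool of unrelated photographs or textures than from pure noise; noise is used here only to keep the sketch self-contained.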
End.