[NN] Synthetic Training Data

Mostly a collection of source material.

2018

The Ultimate Guide to Synthetic Data: Uses, Benefits & Tools

Benefits

Synthetic data has several benefits over real data:

Overcoming real data usage restrictions: Real data may be subject to usage constraints from privacy rules or other regulations. Synthetic data can replicate the important statistical properties of the real data without exposing the underlying records, which sidesteps those constraints.
Creating data to simulate not yet encountered conditions: Where real data does not exist, synthetic data is the only solution.
Immunity to some common statistical problems: These can include item nonresponse, skip patterns, and other logical constraints.
Focusing on relationships: Synthetic data aims to preserve the multivariate relationships between variables, not just individual summary statistics.
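
The last point, preserving relationships between variables, can be illustrated with a small sketch. It is not taken from the source article or from any of the tools listed in these notes. It assumes two numeric columns that are roughly jointly Gaussian, and the names `fit_bivariate_gaussian`, `sample_synthetic`, and the toy age/income data are hypothetical. The idea is to estimate means, variances, and the covariance from the real rows, then draw new rows from the fitted distribution so that the correlation carries over while no real record is copied.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_bivariate_gaussian(xs, ys):
    """Estimate means, variances, and covariance (assumes a Gaussian model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, var_x, var_y, cov_xy


def sample_synthetic(params, n_rows, seed=0):
    """Draw new (x, y) rows from the fitted bivariate Gaussian.

    x is drawn from its marginal, then y from its conditional given x,
    which reproduces the fitted covariance between the two columns.
    """
    mx, my, var_x, var_y, cov_xy = params
    rng = random.Random(seed)
    slope = cov_xy / var_x
    resid_sd = math.sqrt(var_y - cov_xy ** 2 / var_x)
    rows = []
    for _ in range(n_rows):
        x = rng.gauss(mx, math.sqrt(var_x))
        y = my + slope * (x - mx) + rng.gauss(0.0, resid_sd)
        rows.append((x, y))
    return rows


# Toy stand-in for "real" data: income depends on age plus noise.
toy_rng = random.Random(42)
ages = [toy_rng.gauss(40, 10) for _ in range(5000)]
incomes = [1000 * a + toy_rng.gauss(0, 8000) for a in ages]

params = fit_bivariate_gaussian(ages, incomes)
synthetic = sample_synthetic(params, n_rows=5000)

real_corr = pearson(ages, incomes)
synth_corr = pearson([r[0] for r in synthetic], [r[1] for r in synthetic])
print(f"real correlation:      {real_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

The two printed correlations should be close, which is the "relationships" property in miniature. A Gaussian fit is only one simple modeling choice: it ignores non-linear structure and categorical columns, and sampling from a fitted model does not by itself give any formal privacy guarantee.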

Synthetic data tools

The tools related to synthetic data are often developed to meet one of the following needs:

Test data for software development and similar purposes
Training data for machine learning models

We prepared a regularly updated, sortable and filterable list of leading vendors of synthetic data generation software. Some vendors working in this space include:

NAME	FOUNDED	STATUS	NUMBER OF EMPLOYEES	NOTES
BizDataX	2005	Private	51-200
CA Technologies Datamaker	1976	Public	10,001+
CVEDIA	2016	Private	11-50
Deep Vision Data by Kinetic Vision	1985	Private	51-200
Delphix Test Data Management	2008	Private	501-1000
GenRocket	2012	Private	11-50
Hazy	2017	Private	11-50
Informatica Test Data Management Tool	1993	Private	5,001-10,000
Mostly AI	2017	Private	11-50
Neuromation	2016	Private	11-50
Solix EDMS	2002	Private	201-500
Supervisely	2017	Private	2-10	Rapid annotation only
TwentyBN	2015	Private	11-50	3D simulation

These tools are only a small sample of a growing market of tools and platforms for creating and using synthetic data.

Synthetic data is a way to enable the processing of sensitive data or to create data for machine learning projects.