Advex AI targets data shortage with generative technology

San Francisco-based Advex AI has launched publicly at TechCrunch Disrupt 2024, aiming to use synthetic imagery to address the shortage of data for training AI systems. Co-founded by CEO Pedro Pachuca and CTO Qasim Wani, Advex has already secured funding totalling $3.6 million and boasts seven major enterprise clients. Its synthetic data platform uses a proprietary diffusion model to generate thousands of ‘fake’ images from a small sample, helping clients train machine vision systems when original data is limited.

Advex’s solution is particularly valuable in sectors like manufacturing, where recognising subtle defects is crucial but hard to learn from limited real data. For example, a car manufacturer training a system to detect flaws in seat material could upload just a few images of tears, and Advex would generate thousands of variations to expand the training data. Such applications span industries from automotive to oil and gas, reducing the cost and time of collecting real data.

While synthetic data isn’t a new concept, Advex distinguishes itself through its custom diffusion model, which Pachuca says is faster and more realistic than traditional simulation methods. Unlike game-engine techniques, Advex’s model can rapidly create images tailored to the data gaps in a client’s specific AI system, helping it operate more effectively in real-world scenarios.

Can synthetic data help AI?

Artificial intelligence (AI) projects are hitting the limits of available data, which is driving a push for synthetic data.

Synthetic data are generated by machines. They are cheaper and less constrained by legal requirements, including privacy protection.

But synthetic data raise a new set of issues: how to make sure that they reflect ‘normality’, and how to avoid the biases that synthetic data, like real data, generate.

Source: Spectrum