Original Article
Abstract
Generation of synthetic batch data through artificial intelligence
This paper presents a flexible approach to generating synthetic data batches composed of multivariate time-series datasets. One of the most overlooked factors influencing modern Artificial Intelligence (AI) approaches is an ample, high-quality body of data. Quite often, such data is available only to a limited extent. Synthetically generated data can close this gap. Unlike dummy data, synthetic data created in silico offers unprecedented levels of realism. By definition, dummy data is mock data generated at random as a substitute for real data in testing environments. In contrast to the simple generation of random substitute data, this work presents the in-silico generation of additional synthetic batches that takes into account the characteristics of the underlying process and the measured real data. Both aspects of synthetic data generation, the quality and the quantity of data, are outlined and verified. Inherent to the approach is its ability to generate not only realistic synthetic batch data but also, if required, batches with a controlled spread, which broadens the range of potential use cases. As intended, the synthetically generated data curves differ from each other randomly within the space spanned by the real data set. Obtaining high-quality synthetic datasets from only limited real-world datasets is a strong door opener for AI-based algorithms. Synthetically generated data contributes significantly to establishing and accelerating the use of AI in pharma by serving as a privacy-secure drop-in replacement for real data. Synthetic data is exempt from privacy regulations and overcomes data re-identification risks.
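The idea of synthetic curves that vary randomly within the space spanned by the real data, with an adjustable spread, can be illustrated with a minimal sketch. The abstract does not specify the generation method, so the technique below (random convex combinations of real batch curves, blended toward the mean curve) is a stand-in chosen for illustration only and is not the authors' algorithm. The function name `synthesize_batches`, its parameters, and the nested-list data layout are all assumptions.

```python
import random


def synthesize_batches(real_batches, n_synthetic, spread=1.0, seed=None):
    """Illustrative sketch only, not the method described in the paper.

    real_batches: list of batches of identical shape; each batch is a list
    of time steps, and each time step is a list of floats (one per variable).
    spread: hypothetical knob in [0, 1]; 0 returns the mean curve,
    1 samples uniformly from all convex combinations of the real batches.
    """
    if not 0.0 <= spread <= 1.0:
        raise ValueError("spread must lie in [0, 1]")
    rng = random.Random(seed)
    n_real = len(real_batches)
    n_steps = len(real_batches[0])
    n_vars = len(real_batches[0][0])
    uniform = 1.0 / n_real

    synthetic = []
    for _ in range(n_synthetic):
        # Normalised exponential draws give random non-negative weights
        # that sum to 1 (a uniform sample from the simplex).
        raw = [rng.expovariate(1.0) for _ in range(n_real)]
        total = sum(raw)
        # Blend toward uniform weights (the mean curve) to limit the spread.
        # The blended weights are still non-negative and still sum to 1.
        weights = [uniform + spread * (r / total - uniform) for r in raw]
        batch = [
            [
                sum(w * real[t][v] for w, real in zip(weights, real_batches))
                for v in range(n_vars)
            ]
            for t in range(n_steps)
        ]
        synthetic.append(batch)
    return synthetic


# Hypothetical toy data: 3 real batches, 3 time steps, 2 measured variables.
real = [
    [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]],
    [[1.5, 9.0], [2.5, 10.5], [3.5, 13.0]],
    [[0.5, 10.5], [1.5, 11.5], [2.0, 12.5]],
]
narrow = synthesize_batches(real, n_synthetic=5, spread=0.2, seed=42)
```

Because every synthetic curve is a convex combination of real curves, each value stays between the pointwise minimum and maximum of the real batches. A real process model would additionally have to respect process dynamics and correlations between variables, which this sketch does not attempt.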
Correspondence:
DI David Hubmayr, Wankdorfstr. 10, 3014 Bern (Switzerland), email: david.hubmayr@cslbehring.com
Summary
This article presents a flexible approach to generating synthetic batch data composed of multivariate time series. One of the most overlooked influencing factors in Artificial Intelligence (AI) is a comprehensive, high-quality body of data.