A pioneering collaboration has released two massive scientific datasets that could help AI systems reason across disciplines, from exploding stars to blood flow patterns. The release is an important step toward machines that can draw unexpected connections between seemingly unrelated fields.
What if artificial intelligence could think like a Renaissance scientist, deriving insights across astronomy, biology, physics, and more? The Polymathic AI project aims to build AI systems with this kind of interdisciplinary scientific understanding. It has made significant progress toward this goal by releasing 115 terabytes of diverse scientific data, more than twice the size of the training data behind GPT-3, specially curated for the purpose.
“These groundbreaking datasets are the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields,” explains Michael McCabe, an engineer at the Flatiron Institute in New York City. “Curating these datasets is an important step in creating interdisciplinary AI models that will enable new discoveries about our universe.”
The initiative’s name derives from the concept of a polymath, a rare person with expertise across multiple fields. But rather than relying on a single brilliant mind, this project aims to encode interdisciplinary thinking into AI systems themselves. The datasets include everything from images of galaxies taken by the James Webb Space Telescope to simulations of biological systems and fluid dynamics.
“Machine learning has been around in astrophysics for about a decade, but it’s still very difficult to use across instruments, between missions, and across scientific disciplines,” says Francois Lanus, a Polymathic AI research scientist. “Datasets like Multimodal Universe allow us to natively understand all this data and build models that can be used as the Swiss Army Knife of astrophysics.”
The data is divided into two main collections. Multimodal Universe provides 100 terabytes of astronomical observations and measurements. The Well collection offers 15 terabytes of numerical simulations that model complex processes such as supernova explosions and embryonic development through partial differential equations, a mathematical description that recurs across scientific disciplines.
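To make the idea of a partial differential equation concrete, here is a minimal Python sketch of one of the simplest examples, the one-dimensional diffusion equation u_t = D * u_xx, which describes heat spreading through a rod as well as a chemical spreading through tissue. It is an illustration only and is not taken from The Well; the function name, grid size, and parameter values are all assumptions chosen for the example.

```python
def diffuse_1d(u, diffusivity, dx, dt, steps):
    """Advance u_t = D * u_xx with explicit finite differences.

    The grid is treated as periodic (the last cell neighbours the first),
    which keeps the total amount of the diffusing quantity constant.
    """
    r = diffusivity * dt / dx**2
    if r > 0.5:
        # The explicit scheme is only stable for D * dt / dx^2 <= 0.5.
        raise ValueError("unstable step: need D * dt / dx^2 <= 0.5")
    u = list(u)
    n = len(u)
    for _ in range(steps):
        # Each cell moves toward the average of its two neighbours.
        u = [
            u[i] + r * (u[(i - 1) % n] - 2 * u[i] + u[(i + 1) % n])
            for i in range(n)
        ]
    return u


# Hypothetical setup: a single spike in the middle of a 21-cell grid.
initial = [0.0] * 21
initial[10] = 1.0
final = diffuse_1d(initial, diffusivity=1.0, dx=1.0, dt=0.25, steps=50)
```

The same few lines apply whether `u` stands for temperature, concentration, or any other diffusing quantity, which is the sense in which one equation recurs across disciplines. The simulations in The Well model far more complex systems than this toy case.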
“Freely available datasets are an unprecedented resource for developing advanced machine learning models that can tackle a wide range of scientific problems,” says Ruben Ohana, a research fellow at the Flatiron Institute’s Center for Computational Mathematics. “The machine learning community has always been open source, which is why it’s so fast-paced compared to other fields.”
Glossary
- Polymathic AI
- Artificial intelligence systems designed to work across multiple scientific disciplines, similar to human polymaths with expertise in many fields
- Machine learning
- A type of artificial intelligence that automatically improves through experience and data analysis
- Partial differential equation
- An equation that relates a quantity to its rates of change in several variables, such as space and time; such equations describe many physical phenomena and recur across scientific fields
TEST YOUR KNOWLEDGE
How large is the new dataset compared to GPT-3’s training data?
The new dataset totals 115 terabytes, more than double the 45 terabytes of training data for GPT-3.
What are the two main collections of datasets released?
Multimodal Universe (100 TB of astronomical data) and The Well (15 TB of numerical simulations).
How do partial differential equations connect seemingly disparate scientific phenomena?
These equations appear in a variety of processes, from quantum mechanics to embryonic development, and provide mathematical descriptions that bridge different scientific disciplines.
What fundamental change in AI development does this project represent compared to traditional scientific AI tools?
While traditional AI tools are purpose-built for specific applications, this project aims to develop truly multidisciplinary models that work across fields and discover unexpected connections between them.