Realistic and Informative Simulations with Machine Learning

Better Simulations for Better Science

Astrophysics heavily relies on simulations, particularly gravitational N-body simulations.

Three pain points in this context are:

  1. The need for realistic simulations and the associated challenge of judging a simulation's realism objectively, rather than relying on a researcher’s subjective judgment.

  2. The absence of a streamlined, computationally inexpensive method to generate realistic initial conditions for these simulations.

  3. The current state of the art in planning large runs comprising many simulations with varying parameters depends on essentially subjective criteria for exploring parameter space, rather than employing an automated method to select informative simulation setups.

These issues lead to suboptimal results in computational astrophysics research. Specifically, they result in simulations that are less realistic and less informative than they could be. Among other consequences, this wastes energy: more simulations are required to achieve a given scientific output, and simulations typically run on high-performance hardware that consumes vast amounts of electricity.

My work on the RISING project as an MSCA-IF fellow directly addresses these three challenges by applying innovative machine learning tools.

The first work package, RISING::Realism, aims to develop quantitative tools that will allow simulations and observations to be compared rigorously with the objective of assessing simulation realism. The second package, RISING::Genesis, is focused on generating improved initial conditions for gravitational N-body simulations of star clusters, without relying on the intricate details of hydrodynamical simulations of star formation. The final package, RISING::Active, employs active learning to optimize experimental design for numerical simulations, ensuring that the relevant parameter space is explored efficiently and effectively.

Publications

1. Reconstructing Robust Background IFU spectra using Machine Learning, Rhea, …, Pasquato et al. 2023

Submitted to RASTI. We combine a shallow neural network with principal component analysis (for interpretability) to interpolate the background spectra in IFU observations.

2. Parameter Estimation for Open Clusters using an Artificial Neural Network with a QuadTree-based Feature Extractor, Cavallo, …, Pasquato, et al. 2023

Accepted by AJ; we apply the featurization method we introduced in Schiappacasse-Ulloa, Pasquato et al. 2023 to a set of open star clusters, using it as the basis for predicting star cluster properties through machine learning.

3. Star formation on a rubber sheet, Pasquato et al. 2023

Submitted to A&A; we introduce topological data analysis features to quantify, for the first time, the hierarchical substructure of star clusters. Among other things, this allows us to measure the realism of initial conditions for star cluster simulations.

4. Active learning meets fractal decision boundaries: a cautionary tale from the Sitnikov three body problem, Payot, Pasquato et al. 2023

Accepted by the 2023 NeurIPS workshop on machine learning for physical sciences; we apply active learning to predict the stability of the three-body problem on a large set of numerical simulations, showing that the fractality of the decision boundary may lead active learning to underperform random sampling. I supervised Nicolas Payot during his summer internship at UdeM in 2023.

5. Causa prima: cosmology meets causal discovery for the first time, Pasquato et al. 2023

Accepted by the 2023 NeurIPS workshop on machine learning for physical sciences; capturing the causal relations among all the relevant physical ingredients is one way to make simulations more realistic. We apply causal discovery for the first time to a dataset of supermassive black holes and their host galaxies, obtaining a causal diagram that can be compared with the causal relations implied by galaxy formation simulations.

6. The search for the lost attractor, Pasquato et al. 2023

Accepted by the 2023 NeurIPS workshop on machine learning for physical sciences; simplified models, such as coupled systems of a few ordinary differential equations, are a cheap and effective way to simulate astrophysical systems. We gauge how well they fit more realistic numerical simulations using a range of methods, including the recently introduced Tests of Accuracy with Random Points (TARP).

7. Variable Stars in Koopman Space, Mekhaël, Pasquato, et al. 2023

Submitted to ApJ. We use dynamic mode decomposition to extract interpretable, physically meaningful features corresponding to oscillation modes from the light curves of variable stars, with an application to the Blazhko effect.

8. Quantitatively rate galaxy simulations against real observations with anomaly detection, Jin, …, Pasquato et al. 2023

Submitted to MNRAS; we use anomaly detection based on generative adversarial networks to measure the realism of galaxy evolution simulations directly from images of real SDSS galaxies and mock images derived from NIHAO simulations. Zehao Jin, whom I co-supervised, is a Ph.D. student at NYU.

9. Interpretable machine learning for finding intermediate-mass black holes, Pasquato et al. (2023)

Accepted by ApJ; we show how to effectively tackle a scientific problem, namely the detection of intermediate-mass black holes in star clusters, using both inherently interpretable machine learning and opaque models combined with post-hoc explanations. Since we train on simulations and predict on real observational data, we measure simulation realism via the overlap of the density distributions of simulated and observed data in feature space. The best extrapolation is no extrapolation, but interpretability may mitigate the risks related to covariate shift.

10. Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers, Huppenkothen, ... Pasquato et al. (2023)

Submitted to the Bulletin of the American Astronomical Society; this paper is the outcome of a discussion started at Ringberg Castle during the ML@RINGBERG 2019 conference, organized by the Max Planck Institute for Astronomy. We outline a set of best practices for applying machine learning techniques to astronomy.

11. Quadtree features for machine learning on CMDs, Schiappacasse-Ulloa, Pasquato et al. (2023)

Accepted by the ICML 2023 workshop LatinX in AI. The first author, José Schiappacasse-Ulloa from Chile, was my Ph.D. student in Padua. In this contribution we introduce a new method to featurize color-magnitude diagrams of star clusters. This method will form the basis for downstream comparisons between synthetic color-magnitude diagrams and real observations, and has already been applied in Cavallo, …, Pasquato et al. 2023.

12. Sparse Logistic Regression for RR Lyrae versus Binaries Classification, Trevisan, Pasquato et al. (2023)

Published in ApJ. Piero Trevisan, whom I supervised, is a Ph.D. student at Sapienza University of Rome. This paper represents a first application of sparsification aimed at promoting interpretability in the context of variable star classification. Variable star physics, together with star clusters and gravitational N-body simulations, is also a context where assessing the realism of simulations vis-à-vis observational data is crucial.

13. Dynamics of intermediate mass black holes in globular clusters. Wander radius and anisotropy profiles, Di Cintio, Pasquato et al. (2023)

Published in A&A. We test-drive a new simulation method for gravitational N-body simulations of star clusters, relying on a multi-particle collision (MPC) algorithm. Due to computational constraints, direct N-body methods cannot fully access the relevant parameter space. Our new method can, and the results underscore the need for a quantitative assessment of simulation realism, suggesting that direct N-body simulations are not automatically more realistic than approximate methods.

14. VizieR Online Data Catalog: Hierarchical clustering in Vela OB2 complex (Pang+, 2021), Pang, ... Pasquato et al. (2023)

Stellar catalog of the Vela OB2 complex region, from the companion paper. This dataset may be used to set up realistic initial conditions for gravitational N-body simulations.

15. Stellar Clusters in 4MOST, Lucatello, ... Pasquato et al. (2023)

Published in The Messenger (ESO’s journal for science and technology); describes upcoming observations of star clusters by the 4MOST Stellar Cluster Survey.

16. Dynamics of binary black holes in young star clusters: the impact of cluster mass and long-term evolution, Torniamenti, ..., Pasquato (2022)

Published in MNRAS. We use direct N-body simulations to study the formation of binary black holes in star clusters. Merging binary black holes are sources of gravitational waves. These are the kinds of simulations that may benefit from numerical experiment design via active learning.

17. The ULTraS project: Understanding the X-ray variable and transient sky, De Luca, ... Pasquato et al. (2022)

Published in MEMSAIT. Presents the ULTraS project, which relies on machine learning to exploit the scientific results and products of the EU/FP7-funded project EXTraS (a systematic characterisation of time variability in all XMM-Newton sources).

18. VizieR Online Data Catalog: Finding black holes with black boxes (Askar+, 2019), Askar, Pasquato et al. (2022)

Catalog of star clusters that may host intermediate-mass black holes, as predicted by a variety of machine learning methods based on observable features. Associated with the companion paper, Askar et al. 2019.

19. Dynamical Origin for the Collinder 132-Gulliver 21 Stream: A Mixture of Three Comoving Populations with an Age Difference of 250 Myr, Pang, ... Pasquato et al. (2022)

Published in ApJ. We apply a Bayesian parallax inversion approach I devised to Gaia data to obtain the three-dimensional distribution of stars in the field of Collinder 132 – Gulliver 21.

20. NIHAO - XXVIII. Collateral effects of AGN on dark matter concentration and stellar kinematics, Waterval, ... Pasquato et al. (2022)

Published in MNRAS. Active galactic nucleus (AGN) feedback is a crucial ingredient of galaxy formation simulations, without which they cannot be considered realistic. This work is related to Jin, …, Pasquato et al. 2023 on measuring simulation realism with generative adversarial networks, and is part of the series of papers on NIHAO simulations.

21. 3D Morphology of Open Clusters in the Solar Neighborhood with Gaia EDR3. II. Hierarchical Star Formation Revealed by Spatial and Kinematic Substructures, Pang, ... Pasquato et al. (2022)

Published in ApJ. We apply a Bayesian parallax inversion approach I devised to Gaia data for a sample of star clusters in the Solar neighborhood, reconstructing the three-dimensional positions of their member stars. This reveals the fractal substructure of young star clusters, which may have important dynamical consequences and should not be ignored when setting up initial conditions for gravitational N-body simulations.

22. Sparse Identification of Variable Star Dynamics, Pasquato et al. (2022)

Published in ApJ. We use Sparse Identification of Nonlinear Dynamics (SINDy) to learn a simple ordinary differential equation from variable star light curve data.

23. Exploring X-ray variability with unsupervised machine learning. I. Self-organizing maps applied to XMM-Newton data, Kovačević, Pasquato et al. (2022)

Published in A&A. Summarizing a dataset to facilitate understanding is one of the avenues to interpretability in machine learning. We apply self-organizing maps to simultaneously perform clustering and dimensionality reduction on a dataset comprising ~200k X-ray sources.

24. Introducing a new multi-particle collision method for the evolution of dense stellar systems. II. Core collapse, Di Cintio, Pasquato et al. (2022)

Published in A&A. We present a new simulation method for gravitational N-body simulations of star clusters, relying on a multi-particle collision (MPC) algorithm. Comparison with more established direct N-body methods will provide a use case for the quantitative assessment of simulation realism.

25. Hierarchical generative models for star clusters from hydrodynamical simulations, Torniamenti, Pasquato et al. (2022)

Published in MNRAS. We introduce a new method based on hierarchical clustering to generate new realizations of the positions, velocities, and masses of stars from a given set of star cluster initial conditions derived from hydrodynamical simulations.


Conferences, talks, and seminars

  1. NeurIPS workshop on machine learning for physical sciences 2023, New Orleans, Dec. 15 - three accepted abstracts, two as first author

    • Active learning meets fractal decision boundaries: a cautionary tale from the Sitnikov three body problem, Payot, Pasquato et al. 2023

    • Causa prima: cosmology meets causal discovery for the first time, Pasquato et al. 2023

    • The search for the lost attractor, Pasquato et al. 2023

  2. ICML workshop LatinX in AI 2023, Hawaii, Jul. 24 - one accepted abstract

  3. Cosmic Connections: A ML X Astrophysics Symposium at the Simons Foundation

    • Poster presentation: Quantifying simulation realism: comparing NIHAO simulations to SDSS observations via GANomaly

  4. 90th ACFAS meeting 2023, Montreal, May 8-12

    • Oral contribution: Interprétabilité des modèles d'apprentissage automatique en astrophysique : les trous noirs dans les amas globulaires (Interpretability of machine learning models in astrophysics: black holes in globular clusters).

  5. Seminar at Brera Astronomical Observatory, Milan, Italy, Apr. 13, 2022

    • Realistic initial conditions for star clusters with generative models

  6. Seminar at Montreal University, Feb. 24, 2022

    • Sparse identification of variable star dynamics

Mentorship

During the period covered by my fellowship, I co-supervised four Ph.D. students.

Work carried out with Zehao Jin contributed substantially to the RISING::Realism work package, with one paper currently under review by MNRAS focusing on the comparison of NIHAO galaxy simulations with SDSS images by means of a GANomaly deep learning model.

Stefano built on an approach I devised to obtain new realizations of hierarchically substructured star cluster initial conditions from hydrodynamical simulations, an important part of the RISING::Genesis work package.

I also co-supervised four Master’s students:

  • George Pantelimon Prodan (Padua University);

  • Claudia Bielecki (Montreal University);

  • Samuele Colombo (Milan University); and

  • Gaia Carenini (IUSS Pavia).

George’s master’s thesis focused on resampling initial conditions for gravitational N-body simulations of star clusters from a Gaussian process learned on the output of hydrodynamical simulations, again in the context of RISING::Genesis.

Finally, I supervised two summer interns at Montreal University: Nicolas Mekhaël in 2022 and Nicolas Payot in 2023. The work of Nicolas Payot on the application of active learning to the three-body problem was accepted by the NeurIPS workshop on machine learning for physical sciences. This is relevant to the RISING::Active work package, whose ultimate goal is to optimize sets of N-body simulations through active learning.

Outreach & Community

In both 2022 and 2023, I helped organize the undergraduate hackathon Astromatic at Montreal University’s CIELA institute. This is an intensive one-week event where teams of undergraduates engage in friendly competition, solving actual astrophysics problems through machine learning. In the first edition of the event I was a judge for the hackathon’s projects; for the second edition I gave a lecture on graph neural networks.

As a byproduct of my work on using generative adversarial networks (GANs) to measure simulation realism from the comparison of real images and mock images, I launched a website featuring early generative AI art: 10nebulae.art. This is a reflection on the genre of “artist’s impression” in the age of deep learning.

As a novel way to engage in online outreach, I joined the prediction market website Manifold, which hosts a rapidly growing community largely concerned with the impact of AI on science and society at large. My contributions reached about 500 users, who engaged with questions such as “Will interpretability be commonplace in physics papers relying on machine learning by the end of 2025?” or “Conditional on a major breakthrough happening in physics thanks to AI, will it be due to deep learning?”

I also became a gardener for the innovative online journal Seeds of Science. Seeds of Science (ISSN: 2768-1254) is an open access journal dedicated to nurturing promising ideas and helping them blossom into scientific innovation. Peer review is conducted through voting and commenting by a diverse community of reviewers, or “gardeners”.

Finally, I engaged in direct outreach to the public through my personal blog on Substack, where I cover topics ranging from astronomy, to the application of machine learning to science, to fairness, accountability and transparency in AI.

Responsible AI for science

Machine learning methods are undoubtedly powerful. With great power comes great responsibility. Regarding the impact of AI on society at large, concerns can be broadly divided into short-term and medium-to-long-term ones. The burgeoning literature on short-term risks and drawbacks points to the possibility that machine learning models perpetuate or even exacerbate societal bias and discrimination, while the long-term risks include looming technological unemployment and the potential catastrophic misalignment of autonomous agents. I engaged with these topics on my personal blog, e.g. by carrying out one of the first systematic assessments of large language model proficiency in African languages.

In addition to these concerns, I believe it is crucial to address the fact that the widespread application of machine learning to science, driven by the torrential influx of new data we are witnessing, may in the medium term transform the very way we do science, for better or for worse.

Will we transition away from our current practices towards an increased reliance on black-box algorithms? Is science about understanding in general or, specifically, about human understanding? To what degree is artificial intelligence compatible with the scientific method as we know it?

I maintain that an important part of the answer to these questions lies in interpretability. My recent paper, Interpretable Machine Learning for Finding Intermediate-Mass Black Holes, exemplifies what I consider the correct approach to applying machine learning to astronomy: using inherently interpretable machine learning methods where possible, and otherwise at least applying post-hoc explanations. We laid out guidelines to this effect in Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers.

Equity, diversity and inclusion

In my professional journey, I am deeply committed to fostering a research environment where collaboration and mentorship are underpinned by a profound respect for the individual. My approach is characterized by an unwavering dedication to empathy and respect—values I consider non-negotiable in both personal interactions and professional endeavors.

My experiences as a parent to a child with disabilities have endowed me with a profound appreciation for the richly varied experiences of neurodivergent individuals. This intimate perspective has not only attuned me to the challenges they may encounter but has also illuminated the remarkable triumphs that can be achieved. Through this personal lens, I approach my professional responsibilities with an enhanced commitment to inclusivity, adaptability, and the celebration of diverse talents and perspectives.

By integrating these principles into my daily practice, I strive to lead by example and inspire those around me to embrace a culture where every challenge is met with kindness and every achievement is celebrated as a testament to our shared humanity.