Hybrid Physical-Deep Learning Models

From Galaxy Morphology to Large Scale Structure

François Lanusse

University of California, Berkeley

The Large Synoptic Survey Telescope

1000 images each night, 15 TB/night for 10 years

18,000 square degrees, observed once every few days

Tens of billions of objects, each one observed $\sim1000$ times

Previous generation survey: SDSS

Image credit: Peter Melchior

Current generation survey: DES

Image credit: Peter Melchior

LSST precursor survey: HSC

Image credit: Peter Melchior

Generative Models for Galaxy Image Simulation

Work in collaboration with
Rachel Mandelbaum, Siamak Ravanbakhsh, Barnabas Poczos, Peter Freeman

Lanusse et al., in prep
Ravanbakhsh, Lanusse, et al. (2017)

The weak lensing shape measurement problem

Shape measurement biases

The measured ellipticity $e$ is typically a biased tracer of the underlying shear $\gamma$, with a multiplicative bias $m$ and an additive bias $c$: $$ \langle e \rangle = (1 + m) \, \gamma + c $$

Simulation and calibration strategy

The GREAT3 approach

Input galaxies from deep HST/ACS COSMOS images ($I < 25.2$ mag)

Apply a range of PSFs and noise levels sampled from the survey

Measure response of shape measurement to a known shear and estimate $m$ and $c$
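
As a rough illustration of this calibration step, the sketch below fits $m$ and $c$ by linear regression of measured ellipticities against known input shears. The bias values, the shear grid, and the noise level are illustrative assumptions, not GREAT3 numbers.

```python
import numpy as np

def fit_shear_bias(true_shear, measured_ellipticity):
    """Least-squares fit of <e> = (1 + m) * gamma + c; returns (m, c)."""
    slope, intercept = np.polyfit(true_shear, measured_ellipticity, deg=1)
    return slope - 1.0, intercept

rng = np.random.default_rng(0)

# Hypothetical calibration run: a grid of known input shears,
# each applied to many simulated galaxies.
gamma_true = np.repeat(np.linspace(-0.05, 0.05, 11), 20000)

# Illustrative biases injected into the "measurement" (assumed values).
m_in, c_in = -0.02, 1e-3
e_meas = (1.0 + m_in) * gamma_true + c_in + rng.normal(0.0, 0.02, gamma_true.size)

m_hat, c_hat = fit_shear_bias(gamma_true, e_meas)
print(f"m = {m_hat:+.4f}, c = {c_hat:+.5f}")
```

The recovered $m$ and $c$ are only as good as the simulated galaxies, which is why the realism of the input morphologies matters.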

The Bayesian hierarchical modeling strategy

$\mathbf{x}$

$\mathbf{\gamma}$

The root of the problem is that the likelihood $p(x | \gamma)$ is a complicated beast.

Provides an explicit description of pixel level data, in terms of simple, tractable distributions

(Schneider et al. 2015)

Impact of galaxy morphology

In both cases, we are building a forward model of the data. How accurate does this model need to be?

Mandelbaum, et al. (2013), Mandelbaum, et al. (2014)

$\Longrightarrow$ We cannot measure shear without an accurate model of galaxy morphology

Can we learn a model for galaxy morphologies from the data itself?

The evolution of generative models

Deep Belief Network
(Hinton et al. 2006)

Variational AutoEncoder
(Kingma & Welling 2014)

Generative Adversarial Network
(Goodfellow et al. 2014)

Wasserstein GAN
(Arjovsky et al. 2017)

Complications specific to astronomical images: spot the differences!

CelebA

HSC PDR-2 wide

There is noise
We have a Point Spread Function

Combining deep generative and physical models

Probabilistic model

Dataset of $N$ i.i.d. samples $\{x_i \}$ generated from $$ x \sim p_{\theta}(x | z, \Sigma, \Pi ) \ p(z) $$

$z$ is a set of latent variables
$\Pi$ is the PSF, $\Sigma$ is the noise covariance
Under a Gaussian noise model:
$\qquad p_{\theta}(x | z, \Sigma, \Pi)=\mathcal{N}( \Pi \ast g_\theta(z), \Sigma)$

$\Longrightarrow$ Decouples the morphology model from the observing conditions.
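
A minimal sketch of this likelihood, assuming a diagonal noise covariance $\Sigma = \sigma^2 I$ and a toy round Gaussian profile standing in for the learned generator $g_\theta(z)$. The function names and the two-parameter latent vector are hypothetical.

```python
import numpy as np

def toy_generator(z, size=32):
    """Hypothetical stand-in for g_theta(z): round Gaussian, z = (flux, radius)."""
    flux, radius = z
    yy, xx = np.mgrid[:size, :size] - size // 2
    profile = np.exp(-0.5 * (xx**2 + yy**2) / radius**2)
    return flux * profile / profile.sum()

def convolve_psf(image, psf):
    """Circular convolution Pi * g via FFT; the PSF is centred on the stamp."""
    psf_origin = np.fft.ifftshift(psf)  # move the PSF centre to pixel (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf_origin)))

def log_likelihood(x, z, psf, sigma):
    """log N(x; Pi * g(z), sigma^2 I), assuming uncorrelated pixel noise."""
    model = convolve_psf(toy_generator(z, x.shape[0]), psf)
    resid = (x - model) / sigma
    return -0.5 * np.sum(resid**2) - x.size * np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
psf = toy_generator((1.0, 1.5))   # unit-flux Gaussian PSF (assumed)
sigma = 1e-3                      # assumed pixel noise level
z_true = (1.0, 3.0)

# Simulate an observation: intrinsic profile -> PSF convolution -> noise.
x_obs = convolve_psf(toy_generator(z_true), psf) + sigma * rng.standard_normal((32, 32))

ll_true = log_likelihood(x_obs, z_true, psf, sigma)
ll_wrong = log_likelihood(x_obs, (1.0, 5.0), psf, sigma)
print(ll_true > ll_wrong)
```

Because the PSF and noise enter only through `convolve_psf` and `sigma`, the same generator can be evaluated under different observing conditions without retraining.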

How to train your dragon model

Training the generative model amounts to finding $\theta_\star$ that maximizes the marginal likelihood of the model: $$p_\theta(x | \Sigma, \Pi) = \int \mathcal{N}( \Pi \ast g_\theta(z), \Sigma) \ p(z) \ dz$$
$\Longrightarrow$ This is generally intractable

Efficient training of the parameters $\theta$ is made possible by Amortized Variational Inference.

Auto-Encoding Variational Bayes (Kingma & Welling, 2014)

We introduce a parametric distribution $q_\phi(z | x, \Pi, \Sigma)$ which aims to model the posterior $p_{\theta}(z | x, \Pi, \Sigma)$.

Working out the KL divergence between these two distributions leads to: $$\log p_\theta(x | \Sigma, \Pi) \quad \geq \quad - \mathbb{D}_{KL}\left( q_\phi(z | x, \Sigma, \Pi) \parallel p(z) \right) \quad + \quad \mathbb{E}_{z \sim q_{\phi}(. | x, \Sigma, \Pi)} \left[ \log p_\theta(x | z, \Sigma, \Pi) \right]$$ $\Longrightarrow$ This is the Evidence Lower-Bound, which is differentiable with respect to $\theta$ and $\phi$.

The famous Variational Auto-Encoder

$$\log p_\theta(x| \Sigma, \Pi ) \geq - \underbrace{\mathbb{D}_{KL}\left( q_\phi(z | x, \Sigma, \Pi) \parallel p(z) \right)}_{\mbox{code regularization}} + \underbrace{\mathbb{E}_{z \sim q_{\phi}(. | x, \Sigma, \Pi)} \left[ \log p_\theta(x | z, \Sigma, \Pi) \right]}_{\mbox{reconstruction error}} $$
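
To make the bound concrete, here is a sketch on a toy linear decoder $x = Wz + \text{noise}$, for which the exact evidence is known in closed form. The decoder, the matrix `W`, and the noise level are assumptions chosen for illustration; the reconstruction term is estimated by sampling with the reparameterization $z = \mu + \sigma \epsilon$, and the KL term is analytic for a diagonal Gaussian $q_\phi$.

```python
import numpy as np

def log_normal_iso(x, mean, sigma):
    """log N(x; mean, sigma^2 I), summed over the last axis."""
    d = x.shape[-1]
    return -0.5 * np.sum(((x - mean) / sigma) ** 2, axis=-1) - d * np.log(sigma * np.sqrt(2.0 * np.pi))

def elbo(x, W, sigma, q_mean, q_logvar, n_samples, rng):
    """Monte Carlo ELBO for a linear decoder and diagonal Gaussian q."""
    eps = rng.standard_normal((n_samples, q_mean.size))
    z = q_mean + np.exp(0.5 * q_logvar) * eps              # reparameterization trick
    reconstruction = np.mean(log_normal_iso(x, z @ W.T, sigma))
    kl = 0.5 * np.sum(np.exp(q_logvar) + q_mean**2 - 1.0 - q_logvar)
    return reconstruction - kl

def exact_log_evidence(x, W, sigma):
    """log p(x) for x = W z + noise with z ~ N(0, I): a zero-mean Gaussian."""
    cov = W @ W.T + sigma**2 * np.eye(x.size)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + x.size * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
sigma = 0.5
x = W @ np.array([1.0, -0.5]) + sigma * rng.standard_normal(5)

log_px = exact_log_evidence(x, W, sigma)

# A poor q (equal to the prior) versus a q matched to the true posterior.
elbo_prior = elbo(x, W, sigma, np.zeros(2), np.zeros(2), 20000, rng)
precision = np.eye(2) + W.T @ W / sigma**2
post_mean = np.linalg.solve(precision, W.T @ x) / sigma**2
elbo_good = elbo(x, W, sigma, post_mean, -np.log(np.diag(precision)), 20000, rng)

print(elbo_prior, elbo_good, log_px)
```

The gap between the ELBO and $\log p_\theta(x)$ is the KL divergence between $q_\phi$ and the true posterior, so it shrinks as $q_\phi$ improves.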

Illustration on HST/ACS COSMOS images

Fitting observations with VAE and Bulge+Disk parametric model.

Training set: GalSim COSMOS HST/ACS postage stamps
- 80,000 deblended galaxies from I < 25.2 sample
- Drawn on 128x128 stamps at 0.03 arcsec resolution
- Each stamp comes with:
  - PSF
  - Noise power spectrum
  - Bulge+Disk parametric fit

Auto-Encoder model:
- Deep residual autoencoder:
  7 stages of 2 ResNet blocks each
- Dense bottleneck of size 32.
- Outputs positive, noiseless, deconvolved, galaxy surface brightness.

Sampling from the model

Whoops... what's going on?

Tradeoff between code regularization and image quality

Latent space modeling with Normalizing Flows

$\Longrightarrow$ All we need to do is sample from the aggregate posterior of the data instead of sampling from the prior.

Dinh et al. 2016

Normalizing Flows

Assumes a bijective mapping between data space $x$ and latent space $z$ with prior $p(z)$: $$ z = f_{\theta} ( x ) \qquad \mbox{and} \qquad x = f^{-1}_{\theta}(z)$$
Admits an explicit marginal likelihood: $$ \log p_\theta(x) = \log p\left( f_\theta(x) \right) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right| $$
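
The change-of-variables formula can be checked on the simplest possible bijection, an element-wise affine map. This is a sketch only: the `AffineFlow` class is hypothetical and far simpler than the flows used in practice, but the bookkeeping of the log-determinant is the same.

```python
import numpy as np

def std_normal_logpdf(z):
    """log p(z) for a standard normal prior, summed over the last axis."""
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * z.shape[-1] * np.log(2.0 * np.pi)

class AffineFlow:
    """z = f(x) = (x - shift) * exp(-log_scale), applied element-wise."""

    def __init__(self, shift, log_scale):
        self.shift = np.asarray(shift, dtype=float)
        self.log_scale = np.asarray(log_scale, dtype=float)

    def forward(self, x):
        z = (x - self.shift) * np.exp(-self.log_scale)
        log_det = -np.sum(self.log_scale)   # log |det df/dx| of a diagonal Jacobian
        return z, log_det

    def inverse(self, z):
        return z * np.exp(self.log_scale) + self.shift

    def log_prob(self, x):
        z, log_det = self.forward(x)
        return std_normal_logpdf(z) + log_det

    def sample(self, n, rng):
        return self.inverse(rng.standard_normal((n, self.shift.size)))

flow = AffineFlow(shift=[1.0, -2.0], log_scale=[0.3, -0.5])
x = np.array([[0.5, -1.0], [2.0, -2.5]])

# The flow density must agree with the Gaussian N(shift, exp(2 * log_scale)).
scale = np.exp(flow.log_scale)
reference = (-0.5 * np.sum(((x - flow.shift) / scale) ** 2, axis=-1)
             - np.sum(np.log(scale)) - np.log(2.0 * np.pi))
print(np.allclose(flow.log_prob(x), reference))
```

Deeper flows stack many such bijections and sum their log-determinants.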

Conditional sampling in VAE latent space

We build a latent space model $p_\varphi(z)$ using a Masked Autoregressive Flow (MAF) (Papamakarios, et al. 2017)

While we are learning to sample from the latent space, we can also learn to sample conditionally: $$ p_\varphi(z | y) $$

Here we learn to sample images conditioned on:
- Size: half-light radius $r$
- Brightness: I band magnitude $mag\_auto$
- Redshift: COSMOS photometric redshift $zphot$
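
The idea of a conditional latent model can be sketched with a single conditional affine transform, where the shift of a Gaussian over $z$ depends linearly on $y$. This is a deliberately crude stand-in for a conditional MAF, and the training pairs below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training pairs (assumed): one conditioning variable y
# (e.g. a normalized size) and a 2-d latent code z that depends on it.
n_train = 5000
y = rng.uniform(0.0, 1.0, size=(n_train, 1))
z = np.hstack([2.0 * y - 1.0, -y]) + 0.1 * rng.standard_normal((n_train, 2))

# Conditioner: shift is linear in y, scale is constant.
# For this Gaussian model, least squares is the maximum-likelihood fit.
design = np.hstack([y, np.ones_like(y)])
coef, *_ = np.linalg.lstsq(design, z, rcond=None)
log_scale = np.log((z - design @ coef).std(axis=0))

def sample_conditional(y_value, n_samples, rng):
    """Draw z ~ p(z | y): push standard normal noise through the affine map."""
    shift = np.array([y_value, 1.0]) @ coef
    eps = rng.standard_normal((n_samples, shift.size))
    return shift + np.exp(log_scale) * eps

samples = sample_conditional(0.8, 2000, rng)
print(samples.mean(axis=0))
```

In the full model, each sampled latent code would then be decoded by the VAE generator into a galaxy image with the requested properties.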

Flow-VAE samples

Testing conditional sampling

$\Longrightarrow$ We can successfully condition galaxy generation.

Testing galaxy morphologies

Takeaway message

We have combined physical and deep learning components to model observed noisy and PSF-convolved galaxy images.
$\Longrightarrow$ This framework can handle multi-band, multi-resolution, multi-instrument data.

We are overcoming the limitations of standard VAEs with an additional latent space model.
$\Longrightarrow$ Can produce sharp and meaningful images.

We demonstrate conditional sampling of galaxy light profiles
$\Longrightarrow$ Image simulation can be combined with larger survey simulation efforts.

GalSim Hub

Framework for sampling from deep generative models directly from within GalSim.

Go check out the alpha version: https://github.com/McWilliamsCenter/galsim_hub

Differentiable models of the Large-Scale Structure

Work in collaboration with
Chirag Modi, Uroš Seljak

Modi, Lanusse, et al., in prep
Modi, et al. (2018)

Traditional cosmological inference

HSC cosmic shear power spectrum

HSC Y1 constraints on $(S_8, \Omega_m)$

(Hikage,..., Lanusse, et al. 2018)

Measure the ellipticity $\epsilon = \epsilon_i + \gamma$ of all galaxies
$\Longrightarrow$ Noisy tracer of the weak lensing shear $\gamma$

Compute summary statistics based on 2pt functions,
e.g. the power spectrum
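
As an illustration of such a two-point summary, the sketch below estimates an azimuthally averaged power spectrum of a 2-d field with FFTs. The normalization (unit pixel size, power divided by the number of pixels) and the binning are assumptions, chosen so that white noise of unit variance gives a flat spectrum of amplitude 1.

```python
import numpy as np

def power_spectrum_2d(field, n_bins=16):
    """Binned power spectrum of a square, periodic 2-d field."""
    n = field.shape[0]
    power = np.abs(np.fft.fft2(field)) ** 2 / field.size

    # Wavenumber magnitude of every Fourier mode, in radians per pixel.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n)
    kgrid = np.sqrt(k1d[:, None] ** 2 + k1d[None, :] ** 2).ravel()

    # Average the power in annuli up to the Nyquist frequency.
    edges = np.linspace(0.0, np.pi, n_bins + 1)
    which = np.digitize(kgrid, edges) - 1
    valid = (which < n_bins) & (kgrid > 0)      # drop the mean mode and the corners
    counts = np.bincount(which[valid], minlength=n_bins)
    sums = np.bincount(which[valid], weights=power.ravel()[valid], minlength=n_bins)
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, sums / np.maximum(counts, 1), counts

rng = np.random.default_rng(2)
white_noise = rng.standard_normal((128, 128))
k, pk, counts = power_spectrum_2d(white_noise)
print(np.round(pk, 2))
```

The compression is drastic: a $128^2$ map is reduced to 16 numbers, which is only lossless if the field is Gaussian.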

Run an MCMC to recover a posterior on model parameters, using an analytic likelihood $$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$
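
A minimal random-walk Metropolis sampler shows what this step involves when the likelihood is analytic. The one-parameter Gaussian model, the prior width, and the step size are toy assumptions, not the HSC analysis setup.

```python
import numpy as np

def metropolis(log_posterior, theta0, step, n_steps, rng):
    """Random-walk Metropolis for a scalar parameter."""
    chain = np.empty(n_steps)
    theta, lp = theta0, log_posterior(theta0)
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal()
        lp_proposal = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_proposal - lp:
            theta, lp = proposal, lp_proposal
        chain[i] = theta
    return chain

rng = np.random.default_rng(0)
data = rng.normal(0.7, 1.0, size=50)    # toy data vector with unit noise
prior_sigma = 10.0                      # assumed wide Gaussian prior on theta

def log_posterior(theta):
    log_like = -0.5 * np.sum((data - theta) ** 2)
    log_prior = -0.5 * theta**2 / prior_sigma**2
    return log_like + log_prior

chain = metropolis(log_posterior, theta0=0.0, step=0.3, n_steps=20000, rng=rng)
samples = chain[2000:]                  # discard burn-in

# Closed-form posterior for this conjugate toy model, for comparison.
post_precision = data.size + 1.0 / prior_sigma**2
post_mean = data.sum() / post_precision
print(samples.mean(), post_mean)
```

Every step requires a likelihood evaluation, which is why this approach is restricted to summaries for which $p(x | \theta)$ can be written down.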

Main limitation: the need for an explicit likelihood

We can only compute the likelihood for simple summary statistics and on large scales

$\Longrightarrow$ We are dismissing most of the information!

A different road: forward modeling

Instead of trying to analytically evaluate the likelihood, let us build a forward model of the observables.

Each component of the model is now tractable, but at the cost of a large number of latent variables.

$\Longrightarrow$ How to perform efficient inference in this large number of dimensions?

Hamiltonian Monte-Carlo
Variational Inference
MAP+Laplace
Gold Mining
Dimensionality reduction by Fisher-Information Maximization

What do they all have in common?
-> They require fast, accurate, differentiable forward simulations
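
The MAP+Laplace route illustrates the role of gradients. In the sketch below the forward model is a toy linear smoothing operator, so the gradient of the log posterior is written by hand; for a nonlinear simulation it would have to come from automatic differentiation. The operator `A`, the noise level, and the step size are all assumptions.

```python
import numpy as np

n = 32
eye = np.eye(n)
# Toy forward model (assumed): periodic smoothing of the latent field.
A = 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1) + 0.25 * np.roll(eye, -1, axis=1)
sigma = 0.5

rng = np.random.default_rng(3)
s_true = rng.standard_normal(n)                  # latent field, prior N(0, I)
d = A @ s_true + sigma * rng.standard_normal(n)  # noisy observation

def forward(s):
    return A @ s

def grad_log_posterior(s):
    """Gradient of log p(s | d) = log prior + log likelihood."""
    return -s + A.T @ (d - forward(s)) / sigma**2

# MAP estimate by plain gradient ascent on all 32 latent variables at once.
s_map = np.zeros(n)
for _ in range(300):
    s_map = s_map + 0.2 * grad_log_posterior(s_map)

# Laplace approximation: covariance = inverse Hessian of -log posterior.
hessian = eye + A.T @ A / sigma**2
laplace_cov = np.linalg.inv(hessian)

# For this linear-Gaussian toy the MAP is known exactly, for comparison.
s_exact = np.linalg.solve(hessian, A.T @ d / sigma**2)
print(np.max(np.abs(s_map - s_exact)))
```

The cost of each iteration is one forward and one backward pass through the simulation, independent of the number of latent variables.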

(Schneider et al. 2015)

How do we simulate the Universe in a fast and differentiable way?

Forward Models in Cosmology

Linear Field

Final Dark Matter

$\longrightarrow$
N-body simulations

Introducing FlowPM: Particle-Mesh Simulations in TensorFlow

https://github.com/modichirag/flowpm


                  import numpy as np
                  import tensorflow as tf
                  import flowpm

                  # Defines integration steps (scale factors from a=0.1 to a=1)
                  stages = np.linspace(0.1, 1.0, 10, endpoint=True)

                  # ipklin is the initial (linear) power spectrum,
                  # assumed to be defined beforehand
                  initial_conds = flowpm.linear_field(32,        # size of the cube
                                                      100,       # Physical size
                                                      ipklin,    # Initial powerspectrum
                                                      batch_size=16)

                  # Sample particles and displace them by LPT
                  state = flowpm.lpt_init(initial_conds, a0=0.1)

                  # Evolve particles down to z=0
                  final_state = flowpm.nbody(state, stages, 32)

                  # Retrieve final density field
                  final_field = flowpm.cic_paint(tf.zeros_like(initial_conds),
                                                 final_state[0])

                  # TensorFlow 1.x graph mode: run the graph to get the field
                  with tf.Session() as sess:
                      sim = sess.run(final_field)

Seamless interfacing with deep learning components
Gradients readily available
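
To see why gradients are available through the painting step, here is a NumPy sketch of cloud-in-cell (CIC) mass assignment on a periodic 1-d mesh. It is not FlowPM's implementation, and `cic_paint_1d` is a hypothetical name; the point is that the deposited weights are piecewise-linear functions of the particle positions.

```python
import numpy as np

def cic_paint_1d(positions, n_cells):
    """Deposit unit-mass particles on a periodic mesh with CIC weights.

    Positions are in units of the cell size. Each particle shares its
    mass between the two nearest grid points, linearly in distance.
    """
    mesh = np.zeros(n_cells)
    left = np.floor(positions).astype(int)
    frac = positions - left
    # np.add.at accumulates correctly when several particles hit one cell.
    np.add.at(mesh, left % n_cells, 1.0 - frac)
    np.add.at(mesh, (left + 1) % n_cells, frac)
    return mesh

particles = np.array([2.0, 3.5, 7.75])
density = cic_paint_1d(particles, n_cells=8)
print(density)
```

The particle at 7.75 wraps around the periodic boundary, leaving a quarter of its mass in the last cell and three quarters in the first.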

Forward Models in Cosmology

Linear Field

Final Dark Matter

Dark Matter Halos

Galaxies

$\longrightarrow$
N-body simulations
FlowPM

$\longrightarrow$
Group Finding
algorithms

$\longrightarrow$
Semi-analytic &
distribution models

Example of Extending Dark Matter Simulations with Deep Learning

$\longrightarrow$

Modi et al. 2018

The practical challenge for inference at scale

Simulations of scientifically interesting sizes do not fit in the memory of a single GPU
e.g. $128^3$ operational, need $1024^3$ for survey volumes
$\Longrightarrow$ We need a distributed Machine Learning Framework

The most common form of distribution is data-parallelism $\Longrightarrow$ Reached Exascale on scientific deep learning applications

What we need is model-parallelism on HPC environments

$\Longrightarrow$ We have started investigating Mesh TensorFlow at NERSC and on Google TPUs.

Mesh TensorFlow in a few words

Redefines the TensorFlow API, in terms of abstract logical tensors with actual memory instantiation on multiple devices defined by:
- The specification of the mesh of computing devices
- The specification of rules for which dimensions can be split

(Gholami et al. 2018)
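
The two specifications can be mimicked in plain NumPy to show what they imply for memory layout. This is a conceptual sketch only and does not use the Mesh TensorFlow API; `shard_tensor`, the mesh axis names, and the layout dictionary are hypothetical.

```python
import itertools
import numpy as np

def shard_tensor(tensor, dim_names, mesh_shape, layout):
    """Split a logical tensor across a mesh of devices.

    mesh_shape maps mesh axis name -> number of devices along that axis.
    layout maps tensor dimension name -> mesh axis it is split along;
    dimensions absent from layout are replicated on every device.
    Assumes each split dimension is divisible by the mesh axis size.
    """
    mesh_axes = list(mesh_shape)
    shards = {}
    for coords in itertools.product(*(range(mesh_shape[a]) for a in mesh_axes)):
        index = []
        for dim, size in zip(dim_names, tensor.shape):
            axis = layout.get(dim)
            if axis is None:
                index.append(slice(None))            # replicated dimension
            else:
                chunk = size // mesh_shape[axis]
                c = coords[mesh_axes.index(axis)]
                index.append(slice(c * chunk, (c + 1) * chunk))
        shards[coords] = tensor[tuple(index)]
    return shards

# Logical 8^3 density field on a 2 x 2 device mesh, split along x and y.
field = np.arange(8 ** 3, dtype=float).reshape(8, 8, 8)
shards = shard_tensor(field,
                      dim_names=("nx", "ny", "nz"),
                      mesh_shape={"row": 2, "col": 2},
                      layout={"nx": "row", "ny": "col"})

for device, block in shards.items():
    print(device, block.shape)
```

Each of the four devices holds a quarter of the field, so the per-device memory footprint shrinks with the mesh size while the program is still written in terms of the full logical tensor.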

Proof of concept with Mesh FlowPM and why should you care :-)

Evolution from initial conditions to z=0 distributed on 2 Nodes 16 GPUs

Our assessment so far

Provides an easy framework to write down distributed differentiable simulations and large scale Machine Learning tasks
The Mesh TensorFlow project is still young and limited in scope:
$\Longrightarrow$ we need help from the Physics community to develop it for our needs!

Takeaway message

We are combining physical and deep learning components to model the Large-Scale Structure in a fast and differentiable way.
$\Longrightarrow$ This is a necessary backbone for large scale simulation-based inference.

We are demonstrating that large-scale simulations can be implemented in distributed autodiff frameworks.
$\Longrightarrow$ We hope that this will one day become the norm.

Our community has unique needs and limited resources; we will all gain by working collaboratively!

FlowPM

FastPM N-body simulation implemented in (Mesh) TensorFlow

Go check it out: https://github.com/modichirag/flowpm

Final words

Galaxy2Galaxy (github.com/ml4astro/galaxy2galaxy): Repository of models and datasets for accelerating research in ML for astro.

Galaxy Emulation Task Force (gitter.im/ml4astro/GalaxyEmulationTaskForce): Group of people working on applications of generative models to the study of galaxies, join the conversation :-) .

Differentiable Universe Initiative (gitter.im/DifferentiableUniverseInitiative/community): Group of people working on building a full end-to-end differentiable model of the Universe, join the conversation ;-) .

Thank you!
