University of California, Berkeley

- 1000 images each night, 15 TB/night for 10 years
- 18,000 square degrees, observed once every few days
- Tens of billions of objects, each one observed $\sim1000$ times

Previous generation survey: SDSS

Image credit: Peter Melchior

Current generation survey: DES

Image credit: Peter Melchior

LSST precursor survey: HSC

Image credit: Peter Melchior

Rachel Mandelbaum, Siamak Ravanbakhsh, Barnabas Poczos, Peter Freeman


Shape measurement biases

The **measured ellipticity $e$** is typically a biased tracer of the **underlying shear $\gamma$**
$$ \langle e \rangle = (1 + m)\,\gamma + c $$

The GREAT3 approach

- Input galaxies from deep HST/ACS COSMOS images (I < 25.2)
- Apply a range of PSFs and noise levels sampled from the survey
- Measure response of shape measurement to a known shear and estimate $m$ and $c$
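As a toy illustration of this calibration step, one can recover $m$ and $c$ by regressing measured mean ellipticities against the known input shears. Everything below (bias values, noise level, shear grid) is invented for the sketch; this is not the GREAT3 pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical calibration data obeying <e> = (1 + m) * gamma + c;
# the bias values and noise level are made up for this sketch
true_m, true_c = 0.02, 1e-3
gamma = np.linspace(-0.05, 0.05, 20)          # known input shears
e_mean = (1 + true_m) * gamma + true_c + rng.normal(0, 1e-5, gamma.size)

# A linear fit of <e> against gamma recovers the two biases
slope, intercept = np.polyfit(gamma, e_mean, 1)
m_hat, c_hat = slope - 1.0, intercept
```

In a real calibration the mean ellipticity per shear value comes from averaging over many noisy galaxy images, which is what makes the simulation volume of GREAT3-style campaigns so large.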


- The root of the problem is that the likelihood $p(x | \gamma)$ is a complicated beast.
- A forward model, in contrast, provides an explicit description of the pixel-level data in terms of simple, tractable distributions.

(Schneider et al. 2015)

In both cases, we are building a generative model of galaxy images.

Mandelbaum, et al. (2013), Mandelbaum, et al. (2014)

$\Longrightarrow$ We cannot measure shear without an **accurate model of galaxy morphology**

- Deep Belief Network (Hinton et al. 2006)
- Variational AutoEncoder (Kingma & Welling 2014)
- Generative Adversarial Network (Goodfellow et al. 2014)
- Wasserstein GAN (Arjovsky et al. 2017)

CelebA

HSC PDR-2 wide

- There is **noise**
- We have a **Point Spread Function**

Probabilistic model

Dataset of $N$ i.i.d. samples $\{x_i \}$ generated from
$$ x \sim p_{\theta}(x | z, \Sigma, \Pi ) \ p(z) $$

- $z$ is a set of latent variables
- $\Pi$ is the PSF, $\Sigma$ is the noise covariance
- Under a Gaussian noise model:

$\qquad p_{\theta}(x | z, \Sigma, \Pi)=\mathcal{N}( \Pi \ast g_\theta(z), \Sigma)$

$\Longrightarrow$ **Decouples the morphology model from the observing conditions**.
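A minimal numerical sketch of this likelihood, with a toy Gaussian galaxy and PSF on a periodic grid and white noise; all shapes and values are invented for illustration, and `gaussian_loglike` is not part of any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_loglike(x, model, psf, sigma):
    # log N(x | psf * model, sigma^2 I): periodic FFT convolution,
    # white (diagonal) noise covariance for simplicity
    conv = np.real(np.fft.ifft2(np.fft.fft2(model) * np.fft.fft2(psf)))
    resid = x - conv
    return (-0.5 * np.sum(resid**2) / sigma**2
            - 0.5 * x.size * np.log(2 * np.pi * sigma**2))

# Toy galaxy and PSF on a 32x32 grid (purely illustrative profiles)
yy, xx = np.mgrid[-16:16, -16:16]
model = np.exp(-(xx**2 + yy**2) / (2 * 3.0**2))
psf = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))
psf = np.fft.ifftshift(psf / psf.sum())   # center the kernel at the origin

sigma = 0.01
obs = np.real(np.fft.ifft2(np.fft.fft2(model) * np.fft.fft2(psf)))
obs += rng.normal(0, sigma, obs.shape)
```

The key point is visible in the code: the galaxy `model` never sees the PSF or the noise; those enter only through the likelihood evaluation.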

- Training the generative model amounts to finding $\theta_\star$ that **maximizes the marginal likelihood** of the model:
$$p_\theta(x | \Sigma, \Pi) = \int \mathcal{N}( \Pi \ast g_\theta(z), \Sigma) \ p(z) \ dz$$
$\Longrightarrow$ This integral is **generally intractable**.
- Efficient training of the parameters $\theta$ is made possible by **Amortized Variational Inference**.

Auto-Encoding Variational Bayes (Kingma & Welling, 2014)

- We introduce a **parametric distribution** $q_\phi(z | x, \Pi, \Sigma)$ which aims to model the posterior $p_{\theta}(z | x, \Pi, \Sigma)$.
- Working out the KL divergence between these two distributions leads to:
$$\log p_\theta(x | \Sigma, \Pi) \quad \geq \quad - \mathbb{D}_{KL}\left( q_\phi(z | x, \Sigma, \Pi) \parallel p(z) \right) \quad + \quad \mathbb{E}_{z \sim q_{\phi}(. | x, \Sigma, \Pi)} \left[ \log p_\theta(x | z, \Sigma, \Pi) \right]$$
$\Longrightarrow$ This is the **Evidence Lower Bound (ELBO)**, which is differentiable with respect to $\theta$ and $\phi$.

$$\log p_\theta(x| \Sigma, \Pi ) \geq - \underbrace{\mathbb{D}_{KL}\left( q_\phi(z | x, \Sigma, \Pi) \parallel p(z) \right)}_{\mbox{code regularization}} + \underbrace{\mathbb{E}_{z \sim q_{\phi}(. | x, \Sigma, \Pi)} \left[ \log p_\theta(x | z, \Sigma, \Pi) \right]}_{\mbox{reconstruction error}} $$
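A toy numerical version of these two terms, assuming a Gaussian posterior $q_\phi$ (which has a closed-form KL against a standard normal prior) and a Gaussian likelihood; the linear decoder and all numbers are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_to_std_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def elbo(x, mu, logvar, decoder, sigma):
    # Single-sample Monte Carlo estimate of the ELBO,
    # using the reparameterization trick z = mu + std * eps
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    log_px_z = (-0.5 * np.sum((x - decoder(z)) ** 2) / sigma**2
                - 0.5 * x.size * np.log(2 * np.pi * sigma**2))
    return log_px_z - kl_to_std_normal(mu, logvar)

# Toy check: a 4-pixel "image" decoded linearly from a 2-d latent code
W = rng.standard_normal((4, 2))
decoder = lambda z: W @ z
x = decoder(np.array([0.3, -0.7]))
value = elbo(x, np.array([0.3, -0.7]), np.full(2, np.log(1e-4)), decoder, sigma=0.1)
```

Both terms are differentiable in the variational parameters, which is what lets $\theta$ and $\phi$ be optimized jointly by stochastic gradient descent.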

- Training set:
**GalSim COSMOS HST/ACS postage stamps**

- 80,000 deblended galaxies from I < 25.2 sample
- Drawn on 128x128 stamps at 0.03 arcsec resolution
- Each stamp comes with:
- PSF
- Noise power spectrum
- Bulge+Disk parametric fit

- Auto-Encoder model: **deep residual autoencoder**
  - 7 stages of 2 ResNet blocks each
  - Dense bottleneck of size 32
  - Outputs a positive, noiseless, deconvolved galaxy surface brightness.


Whoops... what's going on?

$$\log p_\theta(x| \Sigma, \Pi ) \geq - \underbrace{\mathbb{D}_{KL}\left( q_\phi(z | x, \Sigma, \Pi) \parallel p(z) \right)}_{\mbox{code regularization}} + \underbrace{\mathbb{E}_{z \sim q_{\phi}(. | x, \Sigma, \Pi)} \left[ \log p_\theta(x | z, \Sigma, \Pi) \right]}_{\mbox{reconstruction error}} $$

$\Longrightarrow$ All we need to do is **learn the actual distribution of codes in the latent space**.

Dinh et al. 2016

Normalizing Flows

- Assumes a **bijective** mapping between data space $x$ and latent space $z$ with prior $p(z)$: $$ z = f_{\theta} ( x ) \qquad \mbox{and} \qquad x = f^{-1}_{\theta}(z)$$
- Admits an explicit marginal likelihood: $$ \log p_\theta(x) = \log p(z) + \log \left| \frac{\partial f_\theta}{\partial x} \right|(x) $$
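For intuition, a one-dimensional affine flow makes the change-of-variables formula concrete. With $f_\theta(x) = (x - b)/a$ and a standard normal prior, `log_prob` below is exactly the log-density of $\mathcal{N}(b, a^2)$; the values of $a$ and $b$ are arbitrary:

```python
import numpy as np

# Toy 1-D bijection f_theta(x) = (x - b) / a, i.e. x = a * z + b
# (a and b are arbitrary illustration values, not fitted parameters)
a, b = 2.0, 0.5

def f(x):
    return (x - b) / a

def log_prob(x):
    # Change of variables: log p(x) = log p(z) + log |df/dx|
    z = f(x)
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)  # standard normal prior
    log_det = -np.log(a)                            # |df/dx| = 1/a
    return log_pz + log_det
```

Because the mapping is exact, `np.exp(log_prob(x))` integrates to 1, which is easy to verify numerically; real flows stack many such invertible layers with learned parameters.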

- We build a latent space model $p_\varphi(z)$ using a Masked Autoregressive Flow (MAF) (Papamakarios, et al. 2017)
- While we are learning to sample from the latent space, we can also **learn to sample conditionally**: $$ p_\varphi(z | y) $$
- Here we learn to sample images conditioned on:
  - Size: half-light radius $r$
  - Brightness: I-band magnitude $mag\_auto$
  - Redshift: COSMOS photometric redshift $zphot$

$\Longrightarrow$ We can successfully condition galaxy generation.

- We have **combined physical and deep learning components** to model observed noisy and PSF-convolved galaxy images.

$\Longrightarrow$ This framework can handle multi-band, multi-resolution, multi-instrument data.

- We are overcoming the limitations of standard VAEs with an additional latent space model.

$\Longrightarrow$ Can produce sharp and meaningful images.

- We demonstrate conditional sampling of galaxy light profiles.

$\Longrightarrow$ Image simulation can be combined with larger survey simulation efforts.

- Framework for sampling from deep generative models directly from within GalSim.
- Go check out the alpha version: https://github.com/McWilliamsCenter/galsim_hub

Chirag Modi, Uroš Seljak

Modi, **Lanusse**, et al., in prep

Modi, et al. (2018)

HSC cosmic shear power spectrum

HSC Y1 constraints on $(S_8, \Omega_m)$

(Hikage,..., Lanusse, et al. 2018)

- Measure the ellipticity $\epsilon = \epsilon_i + \gamma$ of all galaxies

$\Longrightarrow$ Noisy tracer of the weak lensing shear $\gamma$

- Compute **summary statistics** based on 2pt functions, e.g. the **power spectrum**
- Run an MCMC to recover a posterior on model parameters, using an **analytic likelihood**:
$$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$
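The MCMC step can be sketched with a random-walk Metropolis sampler on a deliberately trivial model (Gaussian likelihood with unknown mean, flat prior); this is a pedagogical stand-in, not the HSC analysis chain:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy inference problem: data drawn from N(theta_true, 1), flat prior on theta
theta_true = 2.0
x = rng.normal(theta_true, 1.0, size=200)

def log_post(theta):
    # Gaussian likelihood with unit variance; the flat prior drops out
    return -0.5 * np.sum((x - theta) ** 2)

# Random-walk Metropolis sampler
theta, lp, chain = 0.0, log_post(0.0), []
for _ in range(5000):
    prop = theta + 0.1 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance rule
        theta, lp = prop, lp_prop
    chain.append(theta)

posterior_mean = np.mean(chain[1000:])   # discard burn-in
```

The whole construction rests on being able to evaluate `log_post` analytically, which is exactly the limitation discussed next.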

Main limitation: the need for an explicit likelihood

We can only compute the likelihood for **simple summary statistics** and on **large scales**

$\Longrightarrow$ We are dismissing most of the information!

- Instead of trying to analytically evaluate the likelihood, let us build a forward model of the observables.
- Each component of the model is now tractable, but at the
cost of a
**large number of latent variables**.

$\Longrightarrow$ How to perform efficient inference in this large number of dimensions?

- A non-exhaustive list of methods:
- Hamiltonian Monte-Carlo
- Variational Inference
- MAP+Laplace
- Gold Mining
- Dimensionality reduction by Fisher-Information Maximization

What do they all have in common?

-> They require fast, accurate, differentiable forward simulations

(Schneider et al. 2015)

$\longrightarrow$ N-body simulations

```
import numpy as np
import tensorflow as tf
import flowpm

# Define integration steps (scale factors from a=0.1 down to z=0)
stages = np.linspace(0.1, 1.0, 10, endpoint=True)

# Generate initial conditions from a (precomputed) linear power spectrum
initial_conds = flowpm.linear_field(32,            # size of the cube
                                    100,           # physical size
                                    ipklin,        # initial power spectrum
                                    batch_size=16)

# Sample particles and displace them by LPT
state = flowpm.lpt_init(initial_conds, a0=0.1)

# Evolve particles down to z=0
final_state = flowpm.nbody(state, stages, 32)

# Retrieve final density field
final_field = flowpm.cic_paint(tf.zeros_like(initial_conds),
                               final_state[0])

with tf.Session() as sess:
    sim = sess.run(final_field)
```

- Seamless interfacing with deep learning components
- Gradients readily available

$\longrightarrow$ N-body simulations (**FlowPM**) $\longrightarrow$ Group finding algorithms $\longrightarrow$ Semi-analytic & distribution models $\longrightarrow$

Modi et al. 2018

- Simulations of scientifically interesting sizes **do not fit in a single GPU's memory**

e.g. $128^3$ is operational, but survey volumes require $1024^3$

$\Longrightarrow$ We need a **distributed Machine Learning framework**
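A back-of-the-envelope estimate of why $1024^3$ does not fit on one device: storing only float32 positions and velocities for the particles already requires tens of GiB, before any density fields, gradients, or autodiff intermediates:

```python
# Rough memory footprint of the particle state alone (float32),
# ignoring density grids and all autodiff intermediates
n_particles = 1024**3
bytes_per_particle = (3 + 3) * 4   # 3 position + 3 velocity components
total_gib = n_particles * bytes_per_particle / 2**30
print(total_gib)   # 24.0 GiB for the bare particle state
```

Backpropagation through the time steps multiplies this further, since intermediate states must be kept or recomputed.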

- The most common form of distribution is **data-parallelism**

$\Longrightarrow$ Has reached exascale on scientific deep learning applications

- What we need is **model-parallelism** on HPC environments

$\Longrightarrow$ We have started investigating **Mesh TensorFlow at NERSC and on Google TPUs**.

- Redefines the TensorFlow API in terms of **abstract logical tensors**, with actual **memory instantiation on multiple devices** defined by:
  - The specification of the mesh of computing devices
  - The specification of rules for which dimensions can be split
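The idea of splitting a named tensor dimension across a device mesh can be illustrated without Mesh TensorFlow itself. The sketch below (not the Mesh TensorFlow API) fakes a 2-device mesh with NumPy arrays and splits the hidden dimension of a matmul, which is the layout-rule mechanism above in miniature:

```python
import numpy as np

rng = np.random.default_rng(4)

# A logical matmul y = x @ w over dimensions [batch, hidden] x [hidden, out]
x = rng.standard_normal((8, 64))
w = rng.standard_normal((64, 32))

# Layout rule: the "hidden" dimension is split across a mesh of 2 "devices"
x_shards = np.split(x, 2, axis=1)   # each device holds half of x's columns
w_shards = np.split(w, 2, axis=0)   # and the matching half of w's rows

# Each device computes a partial product; an all-reduce sums the partials,
# reproducing the unsplit result
partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
y = sum(partials)
```

In Mesh TensorFlow the same pattern is expressed declaratively: the user names tensor dimensions and supplies mesh and layout specifications, and the framework inserts the communication.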

(Gholami et al. 2018)

Evolution from initial conditions to z=0 distributed on 2 Nodes 16 GPUs

Our assessment so far

- Provides an easy framework to write down distributed differentiable simulations and large scale Machine Learning tasks
- The Mesh TensorFlow project is still young and limited in scope:

**$\Longrightarrow$ we need help from the Physics community to develop it for our needs!**

- We are **combining physical and deep learning components** to model the Large-Scale Structure in a **fast and differentiable way**.

$\Longrightarrow$ This is **a necessary backbone** for large-scale simulation-based inference.

- We are demonstrating that large-scale simulations can be implemented in distributed autodiff frameworks.

$\Longrightarrow$ We hope that this will one day become the norm.

- Our community has unique needs and limited resources; we will all gain by working collaboratively!

- FastPM N-body simulation implemented in (Mesh) TensorFlow
- Go check it out: https://github.com/modichirag/flowpm

- Galaxy2Galaxy (github.com/ml4astro/galaxy2galaxy): Repository of models and datasets for accelerating research in ML for astro.
- Galaxy Emulation Task Force (gitter.im/ml4astro/GalaxyEmulationTaskForce): Group of people working on applications of generative models to the study of galaxies, join the conversation :-)
- Differentiable Universe Initiative (gitter.im/DifferentiableUniverseInitiative/community): Group of people working on building a full end-to-end differentiable model of the Universe, join the conversation ;-)

Thank you !