# Proceedings

# Table of Contents

- Session 1: Overview
- Session 2: Simulations and devices for AO
- Session 3: FPGA-based RTC designs
- Session 4: CPU-based RTC designs
- Session 5: GPU-based RTC designs
- Session 6: Algorithms
- Session 7: Real World Systems

The complete archive with all the presentations is available here.

# Opening

The Director of Programs, Dr. Adrian Russell, welcomes the participants to the Real Time Control for Adaptive Optics Workshop and opens the workshop.

# Session 1: Overview

## Adaptive Optics Real-time Control in the ELT Era (N. Dipper)

The next generation of large telescopes will depend critically on Adaptive Optics. The instrumentation now proposed makes substantial demands on computing power for real-time control. These demands will be met by a combination of novel algorithms and new developments in the world of high-performance computing. In this talk I will address the hardware and software systems that will be required to meet these challenges. In particular I will address the issue of interfacing to WFS cameras, and the choice of hardware (FPGA, GPU or CPU systems) and software development tools, both for this aspect of an RTC and for pixel handling and wave-front reconstruction. I will also comment on the selection of standards in both hardware and software for the E-ELT. These standards will need to be applied both to the RTC data pipeline and to the data telemetry, calibration and control, which make up a large fraction of the overall AO control system.

## Impact of latency and jitter on the performance of the RTC (E. Fedrigo)

Time delays introduced by the real time computer can generate a dramatic reduction in the performance of Adaptive Optics (AO) systems. This effect has been studied analytically and demonstrated experimentally in the past and is by now well understood within the AO community. However, the ambitious science objectives set for future ELTs by modern astronomy imply more demanding performance requirements for the AO system. In this scenario, a proper analysis of the performance of the AO loop might require taking into account the non-ideal behavior of the hardware components of the digital control system in a more accurate way. To this end a thorough analysis has been launched at ESO aiming at assessing the impact of deterministic and randomly-varying time delays (dubbed latency and jitter respectively) on the performance of the AO systems to be deployed at the E-ELT. The present talk outlines the methodology used for the analysis and presents a discussion of the major results obtained so far.
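As a minimal sketch of the distinction drawn above, the following Python fragment splits a set of per-frame RTC delays into a deterministic part (latency) and a randomly varying part (jitter). The function name, the sample values, and the choice of the mean and the RMS deviation as estimators are illustrative assumptions, not the methodology of the ESO analysis.

```python
import math

def latency_and_jitter(delays_us):
    """Return (latency, jitter) for a list of per-frame delays in microseconds.

    Assumption: latency is the mean delay, jitter the RMS deviation from it.
    """
    n = len(delays_us)
    latency = sum(delays_us) / n
    jitter = math.sqrt(sum((d - latency) ** 2 for d in delays_us) / n)
    return latency, jitter

# Hypothetical delays measured over five loop cycles (microseconds).
samples = [500.0, 502.0, 498.0, 501.0, 499.0]
latency, jitter = latency_and_jitter(samples)
print(f"latency = {latency:.1f} us, jitter = {jitter:.2f} us RMS")
```

Other estimators (for example the median and a peak-to-peak spread) could be substituted without changing the structure of the sketch.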

# Session 2: Simulations and devices for AO

## Computer simulation of adaptive optical systems (V. Lukin)

We have a long history of developing computer programs for modeling adaptive optics designs. Since 1976 our scientific interests have covered the optics of randomly inhomogeneous media, remote sensing, and the foundations of atmospheric adaptive optics. In the past we created a four-dimensional (three spatial coordinates and time) computer system that simulates the propagation of optical radiation in the atmosphere under adaptive phase control of the radiation parameters. In particular, we created numerical models of both the separate components and the whole channel of an adaptive optical system. To solve numerically the parabolic wave equation that describes optical wave propagation in a randomly inhomogeneous medium, we used the splitting method, together with the Fourier method for the homogeneous problem. In our work we used our own modification of this method, based on separation of variables in the homogeneous equation. The separation of variables reduces the two-dimensional homogeneous propagation problem to two one-dimensional problems, which makes it possible to exploit the one-dimensional Fast Fourier Transform algorithm.

Methods and features of parallel algorithms for the numerical simulation of optical wave propagation are currently under study. The scalar parabolic equation for the complex amplitude of a monochromatic wave was solved numerically, using the Fourier transform method for homogeneous media and the split-step Fourier method for inhomogeneous media. Two parallel algorithms have been created: one using OpenMP with the MKL library for Intel multicore processors, and one using CUDA for NVIDIA graphics accelerators. The two approaches were compared with each other and with a common sequential algorithm using the FFTW library by measuring the average number of test task solutions per second. It is shown that the parallel algorithms are considerably faster (by a factor of tens) than the common sequential algorithm, depending on the grid size of the computational task.
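For orientation, the split-step scheme mentioned above can be written in its standard textbook form. The notation is an assumption, since the abstract gives no formulas: $U$ is the complex amplitude, $k$ the wavenumber, $n_1$ the refractive-index fluctuation, $(\kappa_x, \kappa_y)$ the transverse spatial frequencies, $\Delta z$ the step length, and $\mathcal{F}$ the transverse Fourier transform. Signs depend on the chosen carrier and transform conventions.

$$
2ik\,\frac{\partial U}{\partial z} + \nabla_\perp^2 U + 2k^2\, n_1(x,y,z)\, U = 0
$$

$$
U(x,y,z+\Delta z) \approx \mathcal{F}^{-1}\!\left[\, e^{-i(\kappa_x^2+\kappa_y^2)\,\Delta z/(2k)}\; \mathcal{F}\!\left[\, e^{\,ik\, n_1(x,y,z)\,\Delta z}\; U(x,y,z) \right] \right]
$$

The diffraction factor is a product, $e^{-i\kappa_x^2 \Delta z/(2k)}\, e^{-i\kappa_y^2 \Delta z/(2k)}$, so the homogeneous step can be carried out as successive one-dimensional FFTs along $x$ and $y$. This is one way to read the separation of variables described in the abstract; the authors' own modification may differ in detail.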

## Unifying simulation and real-time frameworks to reduce development costs (D. Gratadour)

The cornerstone of the design studies of a complex system is numerical simulation. It allows one to validate the conceptual design and to test the behavior of the various system components under realistic conditions. Additionally, the foreseen E-ELT AO systems will require an unprecedented amount of computing power to be driven in real time. Accelerator-based architectures using either FPGAs or GPUs are very attractive solutions to provide the required power. Moreover, the emergence of GPGPU programming is, for the first time, providing a means of addressing both simulation and real-time control developments on a unified architecture. Our team is developing a full-scale end-to-end AO development platform, able to address the E-ELT scale and including a real-time core that can be directly integrated into a real system. The development of this platform could provide a dramatic improvement in terms of robustness, development cost and upgradability of AO real-time computers. I'll present our approach and preliminary results for simulations and real-time control.

## CILAS Generic Deformable Mirror Drive Electronics (P. Morin)

We will describe the Generic Deformable Mirror Drive Electronics, able to drive a deformable mirror with a large number of piezo actuators at a high sampling frequency. These electronics convert the digital commands sent by a host computer into analog high voltages applied to the actuators.

## Focusing light through complex media by wavefront shaping (D. Martina)

In the field of astronomy, atmospheric turbulence perturbs light propagation and decreases the sensitivity and the resolution of telescopes. In this case, the aberration is known to be well modeled by low spatial frequency variations of the phase, which can be well corrected by a deformable mirror.

Analyzing and correcting the light is the domain of adaptive optics.

A complex medium is a highly inhomogeneous medium that strongly scatters light. A sheet of paper, a thin layer of paint or thick biological tissue are examples of such complex media. Light propagating through such samples undergoes multiple scattering: the input waves are mixed in a seemingly random way, giving rise to a highly perturbed wavefront, a speckle (e.g. the number of scattering events in a complex medium like a thin layer of paint can be of the order of 10^8). This is why such media are usually considered opaque and why conventional adaptive optics is of no use.

Although the propagation through a multiple scattering medium is too complex to be described by classical means because of the number of parameters, recent works have shown that scattered light can nonetheless be harnessed thanks to wavefront shaping, relying on devices like liquid crystal spatial light modulators [1]. By controlling the phase of the light going through the medium and measuring the transmitted light in a well-controlled manner, one can learn how the light propagates through the turbid medium by measuring the so-called transmission matrix, and focusing or reconstructing an image through the medium becomes possible [2]. However, this matrix is measured for one configuration of the complex medium: if the medium moves or changes over time, the matrix has to be measured again. Living tissues are not stable in time: if one wants to focus light through them by measuring the transmission matrix or by using optimization algorithms, this operation has to be done faster than the decorrelation time of the medium, which can be of the order of a few milliseconds or less.

Whatever the method adopted for focusing, the need for a fast closed-loop measurement is a real challenge and a prerequisite if we want to bring the technique from the proof-of-principle level to the biological imaging domain. We will show our latest results using a segmented kilo-DM from Boston Micromachines.

# Session 3: FPGA-based RTC designs

## AO RTC using ATCA-compliant FPGA board (Ljusic)

In my talk I will explain:

- ATCA based AO RTC system architecture and interfaces

- FPGA board used in the system

- Implementation of several algorithms for use in TMT RTC

## Present Microgate RTCs and perspectives (Biasi)

Microgate has been involved in the development of RTCs and other real-time systems for adaptive optics for several years. These systems are based on custom electronics originally developed for the specific control problem of contactless large adaptive mirrors, then extended to slope computers and RTCs, and are deployed on several telescopes such as MMT, LBT, Keck and Magellan. We intend to exploit this expertise for the development of a new system based on state-of-the-art technology.

# Session 4: CPU-based RTC designs

## AO: Isn't there (almost) an app for that? (P.-H. Kamp)

A presentation about the ESO/ELT AO prototype cluster, built on COTS PCs, what we learned building that prototype and some perspectives on COTS computing in AO in the future.

Poul-Henning has been hacking UNIX for almost 30 years, and has 20 years of experience using the FreeBSD kernel for all sorts of weird stuff, from accounting and air traffic control to real-time hardware control.

## VxWorks on Intel for real time high performance computing (H. Tischer)

Using a Real-Time Operating System (RTOS) is essential to meeting tight timing requirements in a rich application environment without having to cripple the OS and its usage in an error-prone way on each update.

We will see how the widespread RTOS VxWorks, including source code access and tools, has kept pace over the last 25 years and, while maintaining broad compatibility, now supports POSIX processes, up to 32 cores, 64-bit operation and vector engines on up-to-date high-end hardware.

In SPARTA we run VxWorks on COTS Intel multicore servers and are porting existing software to it. We plan to use the easy hardware access to integrate FPGAs and the IEEE 1588 Precision Time Protocol, and to look into available cluster technologies.

On x86-64, vector engines are an inherent part of the mature ABI and compilers. With communication complexity and transport overhead eliminated, and with Intel scaling up vector performance quickly, we will evaluate how price, effort, performance and portability compete against GPUs.

## DDS on SPARTA: experience and remarks on a high performance data distribution mechanism (S. Zampieri)

The Data Distribution Service for Real-Time Systems (DDS) is an Object Management Group (OMG) Publish/Subscribe standard that aims to enable scalable, real-time, dependable, high performance and interoperable data exchanges between publishers and subscribers. Both commercial and open-source implementations of DDS are available.

In the context of SPARTA - the ESO Standard Platform for Adaptive optics Real Time Applications - DDS is used to distribute real-time data (pixels, slopes, mirror positions, etc) to various data tasks (Loop Monitors, Calibrators, Real Time Displays etc) running on the SPARTA cluster. DDS is also used to deliver database updates and log messages to the VLT environment (via dedicated gateways).

This talk presents the experience of using DDS in SPARTA and some perspectives for the future.

## Techniques for portable high performance (M. Frigo)

I will briefly survey three techniques that I have found useful for writing high-performance code without worrying too much about the precise details of computer architecture.

Cache-oblivious algorithms use the cache asymptotically optimally without knowing the size of the cache. Many common problems can be solved cache-obliviously: matrix multiplication, LU decomposition, stencils, FFT, sorting, etc.

The Cilk language and runtime system allows an easy expression of parallelism in a way that is independent of the number of cores. The Cilk scheduler is theoretically optimal and it works well in practice. The Cilk language is well suited to the expression of many cache-oblivious algorithms.

Finally, I will discuss how one can usefully employ automatic code generators and automatic tuning to improve the performance of programs. My own FFTW library is one example of this approach.
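The cache-oblivious idea mentioned above can be sketched with matrix multiplication, one of the problems the abstract lists. The version below follows the usual divide-and-conquer pattern: it halves the largest of the three dimensions until the blocks are small, so sub-problems eventually fit in any level of cache without the code knowing the cache size. The function names and the `base` cut-off are illustrative assumptions; `base` only limits recursion overhead and is not tuned to a cache. Python lists do not expose memory layout, so this shows the recursion pattern rather than a measurable speed-up.

```python
def matmul_recursive(A, B, C, i0, i1, j0, j1, k0, k1, base=8):
    """Accumulate A[i0:i1, k0:k1] @ B[k0:k1, j0:j1] into C[i0:i1, j0:j1]."""
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if max(di, dj, dk) <= base:
        # Small block: plain triple loop.
        for i in range(i0, i1):
            for k in range(k0, k1):
                a_ik = A[i][k]
                for j in range(j0, j1):
                    C[i][j] += a_ik * B[k][j]
    elif di >= dj and di >= dk:
        # Rows of A (and C) are the largest dimension: split them.
        mid = (i0 + i1) // 2
        matmul_recursive(A, B, C, i0, mid, j0, j1, k0, k1, base)
        matmul_recursive(A, B, C, mid, i1, j0, j1, k0, k1, base)
    elif dj >= dk:
        # Columns of B (and C) are the largest dimension: split them.
        mid = (j0 + j1) // 2
        matmul_recursive(A, B, C, i0, i1, j0, mid, k0, k1, base)
        matmul_recursive(A, B, C, i0, i1, mid, j1, k0, k1, base)
    else:
        # The inner dimension is the largest: split it and accumulate.
        mid = (k0 + k1) // 2
        matmul_recursive(A, B, C, i0, i1, j0, j1, k0, mid, base)
        matmul_recursive(A, B, C, i0, i1, j0, j1, mid, k1, base)

def matmul(A, B, base=8):
    """Return A @ B for matrices stored as lists of rows."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    C = [[0] * n_cols for _ in range(n_rows)]
    matmul_recursive(A, B, C, 0, n_rows, 0, n_cols, 0, n_inner, base)
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], base=1))  # [[19, 22], [43, 50]]
```

The two halves produced by a row or column split are independent, which is what makes this style of recursion a natural fit for a fork-join runtime such as Cilk.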

# Session 5: GPU-based RTC designs

## Studying GPU based RTC for TMT NFIRAOS (L. Wang)

We will show benchmark results for a GPU-based RTC concept for TMT NFIRAOS using matrix-vector multiply (MVM) and a Fourier-domain preconditioned CG algorithm. We show that we can update the MVM control matrix in <10 seconds and apply it in real time in <1 ms. The hardware configuration is also discussed.

## Real-time control architecture for large scale AO systems - the EELT EPICS case (N. Doelman)

The EELT EPICS instrument puts a severe challenge on the real-time computing capabilities of the AO system, as it comprises wavefront sensing with 200x200 sub-apertures, 40,000 actuation channels in the deformable mirror and an update frequency of 2 to 3 kHz. In a study of candidates for the EPICS real-time architecture, which considered essential features such as processing power, memory bandwidth, clustering and software implementation time, a multi-GPU cluster has emerged as a very suitable real-time platform.

For the standard AO control approach in EELT-EPICS, consisting of a matrix-vector multiplication (MVM) wavefront reconstructor and an integrator controller, the internal memory bandwidth of the computation node turns out to be the performance determining factor rather than for instance the processing speed.

To validate the real-time performance of a multi-GPU cluster for large-scale AO, a scaled set-up has been realised, which consists of a CPU master node and 4 GeForce GTX570 slave nodes, interconnected by an Infiniband network and running real-time Linux. The system performs according to specification; in particular, the achieved memory bandwidth is very close to the theoretical maximum value. Furthermore, implementing the AO control code on a GPU cluster requires only marginally more effort than it would on a plain CPU node.

Based on the experimental results with the scaled multi-GPU cluster an estimate is given of the required multi-GPU cluster for the full EPICS case and other large-scale AO systems.
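The bandwidth argument in this abstract can be illustrated with a rough sketch. In an MVM reconstructor every control-matrix element is read once per frame and used for a single multiply-add, so memory traffic rather than arithmetic tends to set the limit. The code below shows one MVM plus integrator step on a toy system and a back-of-envelope bandwidth estimate. All names are hypothetical, and the estimate assumes two slopes per sub-aperture over the full 200x200 grid, single-precision values, a control matrix streamed from memory once per frame, and a negative-feedback sign convention. None of these assumptions come from the talk.

```python
def mvm(matrix, vector):
    """Matrix-vector multiply for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def integrator_step(commands, slopes, control_matrix, gain):
    """One loop cycle: c[k+1] = c[k] - gain * (R @ s[k]) (assumed sign)."""
    correction = mvm(control_matrix, slopes)
    return [c - gain * d for c, d in zip(commands, correction)]

def mvm_bandwidth_bytes_per_s(n_actuators, n_slopes, rate_hz, bytes_per_value=4):
    """Memory traffic needed to stream the control matrix once per frame."""
    return n_actuators * n_slopes * bytes_per_value * rate_hz

# Toy 2x2 system to show the control law.
new_commands = integrator_step([0.0, 0.0], [1.0, 2.0],
                               [[1.0, 0.0], [0.0, 1.0]], gain=0.5)
print(new_commands)  # [-0.5, -1.0]

# Upper-bound estimate for the figures quoted in the abstract (assumptions above).
n_slopes = 2 * 200 * 200
bandwidth = mvm_bandwidth_bytes_per_s(40_000, n_slopes, 2_000)
print(f"{bandwidth / 1e12:.1f} TB/s")  # 25.6 TB/s
```

The figure is an upper bound under the stated assumptions; unilluminated sub-apertures and any sparsity in the matrix would reduce it.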

## How many GPUs do we really need? (A. Sevin)

GPUs now provide teraflops of floating point performance at an affordable price. However, it is usually not trivial to fully exploit the theoretical peak performance, especially when dealing with real-time applications involving data exchange in and out of the device. Jointly optimizing computational throughput and transfer bandwidth is thus mandatory for this kind of application. Efficient hiding of memory latency and maximized memory bandwidth are the two key aspects to harness for this work. We will present new strategies for driving complex AO systems with a minimum number of GPUs. Achievable performance on optimally dimensioned systems will be demonstrated on realistic examples.

# Session 6: Algorithms

## Reducing the Fractal Iterative Method to half an iteration (C. Bechet)

The fractal iterative method (FRiM) has been introduced to solve the wavefront reconstruction problem at the dimensions of an ELT with a low computational cost. Previous studies showed that only 3 iterations of the algorithm are required to provide the best AO performance. Applying such iterations is hard to combine with the low-latency requirement of the AO real-time computer.

We present here a new approach that avoids iterations in the computation of the commands with FRiM, thus allowing a low-latency AO response even at the scale of the E-ELT. The method highlights the importance of a "warm-start" strategy for AO. Thanks to simulations with FRiM on the ESO Octopus simulator, we highlight the robustness of this new implementation with respect to increasing measurement noise, wind speed or even modeling errors.
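The value of a warm start can be seen on a toy problem. The sketch below is not FRiM: a plain Gauss-Seidel solver stands in for the iterative reconstructor, and the matrix, right-hand sides and tolerance are hypothetical. It solves a small linear system twice, once starting from zero and once starting from the solution of the previous, slightly different, right-hand side, as would happen between consecutive AO frames.

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=10_000):
    """Solve A x = b iteratively; return (solution, iterations used)."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i, row in enumerate(A):
            sigma = sum(a * x_j for j, (a, x_j) in enumerate(zip(row, x)) if j != i)
            new_value = (b[i] - sigma) / row[i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    return x, max_iter

# Hypothetical diagonally dominant system and two consecutive "frames".
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b_previous = [1.0, 2.0, 3.0]
b_current = [1.0001, 2.0001, 2.9999]

x_previous, _ = gauss_seidel(A, b_previous, [0.0, 0.0, 0.0])
_, cold_iterations = gauss_seidel(A, b_current, [0.0, 0.0, 0.0])
x_current, warm_iterations = gauss_seidel(A, b_current, x_previous)
print("cold start:", cold_iterations, "iterations; warm start:", warm_iterations)
```

Because the previous solution is already close to the new one, the warm-started run needs fewer iterations to reach the same tolerance.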

## Fast new control algorithms for AO on ELT with pyramid WFS (I. Shatokhina)

For large AO systems equipped with a pyramid wavefront sensor (WFS), the existing control algorithms (MVM, FTR) are computationally very heavy to run in real time (at the frequency of 3.33 kHz required for the XAO system on the ELT). To overcome this problem, we developed new algorithms with highly reduced computational costs (compared to standard methods like MVM). We present two algorithms for wavefront reconstruction from pyramid WFS measurements, namely CuReD with data preprocessing and CLIF (Convolution with Linearized Inverse Filter), which provide the same quality of AO correction as MVM and at the same time are much more efficient computationally. CuReD with data preprocessing has a linear computational complexity O(n) (compared to O(n^2) for MVM) and requires only 0.04% of the number of flops needed for MVM (200x200 WFS). The other method (CLIF) has a complexity of O(n^(3/2)) and requires 0.6% of the MVM flops. We show that both methods are parallelizable and pipelinable, and highlight some implementation aspects.

## D-SABRE: A distributed spline-based wavefront reconstruction method for large scale wavefront reconstruction (M. Verhaegen)

A new method for wavefront reconstruction is proposed with application to the extreme adaptive optics (XAO) systems for the next generation of ground based optical telescopes. The new method is based on bivariate simplex B-splines which are defined on triangular partitions. The new method is independent of sensor and actuator geometry, with the result that it can be used on imperfectly aligned sensor arrays, and on apertures with multiple holes and obstructions. Additionally, the method can be parallelized for application in large scale adaptive optics systems like the XAO on the E-ELT.

## CuReD: a fast wavefront reconstruction towards the real world (M. Rosensteiner)

The CuReD algorithm is a new, extremely fast algorithm developed for the direct reconstruction of the wavefront from Shack-Hartmann sensor data, especially for extremely large telescopes. The quality of the algorithm is comparable to that of the standard methods, but with superior speed. The algorithm is fully developed and ready to be tested on real systems. We present the algorithm and focus in particular on its properties with respect to parallelization. Additionally, we present strategies for the integration of the CuReD algorithm into a real system.

## Fast & Furious: a potential wavefront reconstructor for extreme adaptive optics at ELTs (V. Korkiakoski)

The most demanding challenges for adaptive optics control at the ELTs are presented by exoplanet imagers requiring extremely high contrast. For instance, the XAO DM of EPICS would have ~10 000 actuators, and controlling it at 3 kHz using conventional techniques is impossible. We present an alternative, extremely fast way to reconstruct the wavefront. The algorithm, based on a small-phase approximation, focal plane images and phase-diversity, scales as N log N compared to the N^2 of the conventional matrix-vector-multiplication.

## The Kaczmarz algorithm for MCAO (R. Ramlau)

The goal of MCAO is the correction of a larger field of view using several mirrors, conjugated to different heights. The problem of obtaining the mirror shapes from wavefront sensor measurements consists of three subproblems: the reconstruction of the incoming wavefronts from the WFS data, the atmospheric tomography and the fitting of the mirrors. In the standard matrix-vector multiplication algorithm those steps are combined, but we present an algorithm that solves the subproblems sequentially. We can therefore use the specific properties of the subproblems and gain a higher flexibility. The wavefront reconstruction is performed by the CuReD algorithm; for the tomography and the fitting step we propose the iterative Kaczmarz method. We show that each of the subproblems can be solved quickly and accurately enough.
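For readers unfamiliar with it, the classical Kaczmarz iteration can be sketched in a few lines: it cycles through the equations of a linear system and projects the current estimate onto the hyperplane defined by each one. The example below applies it to a hypothetical 2x2 system. It is a generic textbook version, not the tomography or fitting operators used in the talk, and the function name and sweep count are assumptions.

```python
def kaczmarz(A, b, sweeps=100, x0=None):
    """Solve A x = b by cyclic row projections (classical Kaczmarz method)."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(sweeps):
        for row, b_i in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = b_i - sum(a * x_j for a, x_j in zip(row, x))
            step = residual / norm_sq
            # Project x onto the hyperplane row . x = b_i.
            x = [x_j + step * a for x_j, a in zip(x, row)]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 10.0]  # consistent system with exact solution x = [1, 3]
x = kaczmarz(A, b)
print([round(value, 6) for value in x])
```

Each update touches only one row of the system, which keeps memory use low and is one reason the method is attractive for large problems.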

## Fast wavefront reconstruction with wavelet regularization for MCAO (M. Yudytskiy)

Multi-conjugate adaptive optics (MCAO) systems aim at widening the compensated field of view of a telescope by utilizing multiple deformable mirrors and sensing several guide stars from different directions. Determining the mirror shapes from the incoming sensor measurements, while staying within the time constraints, is a complex task for the telescopes of today. For future telescopes, such as the E-ELT, the amount of data to be processed becomes too large and the implementation of the solution methods currently available becomes too costly. Instead of improving the computational capabilities and the architecture of the RTC system, we tackle the problem at its core: we attempt to significantly reduce the computational cost of the mathematical method behind the reconstruction.

In this talk we propose a novel iterative approach that combines the conjugate gradient algorithm with wavelet-based techniques. The method aims at replacing the commonly used MVM, keeping the quality but reducing the computational complexity. In the E-ELT setting, the current implementation of the method is estimated to take only 3.5% of MVM's cost (in terms of FLOPs). Ideas aimed at reducing this number even further will be presented. The method has a high potential for efficient parallelization and is pipelinable to some degree. Additionally, the approach is very flexible: modifying the wavefront sensors or changing the asterism of the stars can be done without any precomputing.

# Session 7: Real World Systems

## CANARY and experience gained for real-time control (A. Basden)

We discuss the real-time system used by the tomographic open-loop MOAO CANARY on-sky demonstrator, the lessons learned, and algorithms used. Implementation details will be given, and we will discuss how the experience gained here can be applied to ELT scale real-time control systems.

## The SPARTA platform: status and plans (M. Suarez)

SPARTA, the ESO Standard Platform for Adaptive optics Real Time Applications, is the high-performance, real-time computing platform serving three major 2nd generation instruments at the VLT (SPHERE, GALACSI and GRAAL) and possibly a fourth one (ERIS).

SPARTA offers a very modular and fine-grained architecture, which is generic enough to serve a variety of AO systems. It includes the definitions of all the interfaces between those modules and provides libraries and tools for their implementation and testing, as well as a mapping to technologies capable of delivering the required performance. These comprise, amongst others, VXS communication, FPGA-aided wavefront processing, command time filtering and I/O, DSP-based wavefront reconstruction, DDS data distribution and multi-CPU number crunching, most of them innovative with respect to ESO standards in use. A scaled-down version of the platform, namely SPARTA-Light, employs a subset of the SPARTA technologies to implement the AO modules for the VLT auxiliary telescopes (NAOMI) and is the baseline for a new VLTI instrument (GRAVITY). For the above instrument portfolio, SPARTA also provides a complete implementation of the AO application, with features customised to each instrument's needs and specific algorithms.

The present talk focuses on the architecture of SPARTA, its major design decisions, technology choices and functional units. Special emphasis is placed on the encapsulation and reusability of software, hardware and design concepts. The impact of the different technology choices on the RTC development process is analysed, and potential roadmaps are explored to bring SPARTA into the next generation of AO instruments, which pose significant challenges in terms of problem size and data throughput. End-to-end as well as individual module performance data are provided for the XAO system delivered to SPHERE, and initial performance results are presented for the GALACSI and GRAAL systems under development.

## A Modular Adaptive Vibrations Cancellation Scheme for SPARTA (L. Pettazzi)

An adaptive cancellation technique for the rejection of almost periodic disturbances, to be implemented in ESO's Standard Platform for Adaptive optics Real Time Applications (SPARTA), has recently been presented in the literature. The proposed methodology aims at estimating the phase and amplitude of the harmonic disturbance together with the response of the unknown plant at the frequency of vibration. On the basis of such estimates, the algorithm generates a control signal to cancel out the periodic perturbation.

The present talk focuses on the implementation- and operation-related features of the proposed algorithm rather than detailing performances and stability issues. In particular it will be shown how the modular approach chosen for the proposed vibration cancellation scheme leads to an easy and efficient implementation in the parallel SPARTA control infrastructure and leads to a very high operational flexibility. This latter feature represents a crucial advantage especially in scenarios characterized by a fairly large spectrum of operational conditions dictated by multiple science cases.

## KAOS - the Kiepenheuer-Institute Adaptive Optics System (T. Berkefeld)

We introduce the Kiepenheuer-Institute Adaptive Optics System which is a versatile AO system that is presently used at the two German solar telescopes VTT (since 2003) and GREGOR (since 2011) on Tenerife. KAOS was also used on the balloon borne solar telescope SUNRISE in 2010.

The KAOS control software, which includes both single conjugate and multi-conjugate AO control, is based on CPUs and off-the-shelf computer hardware, including cameras, frame grabbers, and DM interfacing.

Linux is used as operating system and multiple CPU cores are dedicated to the control loop.

The real-time jitter of the system is about 2-3 µs RMS at loop frequencies of around 2 kHz.

Correlating Hartmann-Shack sensors for extended objects are the main focus of KAOS; however, center of gravity tracking is also integrated.

## Tests of novel wavefront reconstructors on sky with CANARY (U. Bitenc)

The E-ELT will require novel optimal wavefront reconstruction algorithms that can handle the larger scale of ELT instrumentation. Two such algorithms, developed within the CANARY collaboration and elsewhere, have been successfully tested with CANARY both on the bench and on sky at the 4m WHT telescope during 2012. The results of these tests will be presented and the performance compared with more traditional reconstructors.

# Closure (G. Herriot)

The workshop is concluded. See you at the next edition.