One Radio Telescope’s Phased-Array Feed Tripled a Galaxy Redshift Survey’s Count
In the mid-2010s, a team of Dutch radio astronomers bolted panels of 121 Vivaldi antennas onto the front ends of the dishes of the Westerbork Synthesis Radio Telescope. Each panel, called a phased-array feed, did not look revolutionary: a flat panel of copper traces and dielectric substrates. But together the feeds turned a telescope that had spent decades staring at one patch of sky at a time into an instrument that could watch 64 patches simultaneously. The result, now visible in the published data from the Apertif survey, is a tripling of the galaxy count in a neutral hydrogen redshift survey: from roughly 10,000 detections in the previous Westerbork HI survey to about 30,000 in the Apertif campaign. That jump is not just a larger catalog. It represents a shift in how radio telescopes can map the universe, one that other observatories are now adopting.
The Survey Bottleneck That Held Back Galaxy Counts
For decades, radio surveys of neutral hydrogen (HI) in galaxies suffered from a fundamental geometric constraint. A traditional radio telescope feed, a single horn or dipole, illuminates a small patch of the sky, typically a fraction of a square degree. To map a large area, the telescope must repoint step by step, building a mosaic one pointing at a time. The Westerbork Synthesis Radio Telescope, an array of 14 dishes in the Netherlands, operated in exactly this mode for its earlier HI surveys. Each pointing covered roughly 0.5 square degrees and required a separate observation lasting several hours. To cover the several hundred square degrees needed for a statistically meaningful galaxy sample, the telescope needed thousands of hours of observing time.
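The arithmetic behind that bottleneck is simple enough to sketch. The snippet below tiles a hypothetical 500-square-degree survey with the 0.5-square-degree single-pixel field and, for comparison, with the roughly 8-square-degree field of the Apertif phased-array feed. The 4 hours per pointing is an assumed illustrative figure, not a survey parameter, and the tiling ignores the overlap that real mosaics need.

```python
import math


def pointings_needed(area_deg2: float, fov_deg2: float) -> int:
    """Pointings needed to tile an area with a given field of view (no mosaic overlap)."""
    return math.ceil(area_deg2 / fov_deg2)


def survey_hours(area_deg2: float, fov_deg2: float, hours_per_pointing: float) -> float:
    """Total observing time if every pointing gets the same integration."""
    return pointings_needed(area_deg2, fov_deg2) * hours_per_pointing


AREA = 500.0   # hypothetical survey area, deg^2
HOURS = 4.0    # assumed integration per pointing, h

single_pixel = survey_hours(AREA, 0.5, HOURS)   # 1000 pointings
phased_array = survey_hours(AREA, 8.0, HOURS)   # 63 pointings
print(single_pixel, phased_array)               # 4000.0 252.0
```

At equal depth per pointing, the time saved is set almost entirely by the ratio of the two fields of view.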
The previous Westerbork HI survey, completed in the early 2010s, used a single-pixel feed and detected approximately 10,000 galaxies. That was a substantial catalog, but it had limits. The sample was sparse enough that the HI mass function (the number density of galaxies as a function of their HI mass) had large error bars at the low-mass end. Dwarf galaxies, with HI masses below 10^8 solar masses, were poorly sampled. Clustering analyses, which use galaxy positions to trace the underlying dark matter distribution, had limited statistical power because the sample was too small to divide into redshift bins narrower than about 0.05. Astronomers knew they needed a larger sample, but the observing time required to get one with a single-pixel feed was prohibitive.
The bottleneck was not the telescope's sensitivity or the receiver noise. It was the number of independent beams the telescope could form at once. Each beam acts as a separate radio eye, collecting photons from a distinct direction. With a single-pixel feed, the telescope had one eye. To see more of the sky in the same amount of time, the instrument needed more eyes. That is exactly what the phased-array feed provided: 64 eyes where there had been one.
The challenge was engineering. A phased-array feed requires more than a hundred receiver elements per dish, each with its own amplifier and digitizer, plus a digital beamformer that combines the signals in real time. The Westerbork team, led by astronomers at ASTRON, spent nearly a decade developing the technology. Prototype tests in 2015 and 2016 demonstrated that the concept worked: the array could form multiple beams with low sidelobe levels and stable gain. By 2019, the Apertif system was ready for a full survey.
How a Phased-Array Feed Rewires Radio Reception
A phased-array feed is essentially a small antenna array placed at the focus of a radio dish. In the case of Westerbork, each of the 12 dishes used for the Apertif survey received a feed consisting of 121 Vivaldi antennas—tapered slot antennas that radiate over a wide bandwidth. The signals from each antenna element are amplified, digitized, and sent to a digital beamformer. The beamformer applies a set of time delays and phase shifts to each element, then sums them. By changing the delays, the beamformer can steer the effective direction of the beam without any moving parts.
The critical innovation is that the beamformer can form multiple beams simultaneously. Each beam is a linear combination of the same 121 element signals, but with a different set of phase shifts. In the Apertif system, the beamformer produces 64 beams, each pointing at a slightly different position on the sky. The beams are arranged in a hexagonal grid, covering a total field of view of roughly 8 square degrees. That is about 16 times the area of a single-pixel feed on the same telescope.
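To make the idea of many beams drawn from the same signals concrete, here is a minimal narrowband sketch using NumPy. It treats the feed as a free-standing 11 x 11 grid of 121 elements with half-wavelength spacing receiving plane waves. That is an assumption: a real phased-array feed sits in the focal plane of a dish, and its weights come from calibration rather than pure geometry. At a single frequency a time delay reduces to a phase shift, so each beam is just a column of complex weights, and all 64 beams come from one matrix product applied to the same element samples. The beam centres here lie on a square 8 x 8 grid, whereas Apertif uses a hexagonal layout. All names and numbers are illustrative.

```python
import numpy as np

C = 299_792_458.0          # speed of light, m/s
FREQ = 1.42e9              # observing frequency near the HI line, Hz
WAVELENGTH = C / FREQ
SPACING = WAVELENGTH / 2   # hypothetical half-wavelength element spacing

# Hypothetical 11 x 11 grid of 121 elements (x, y positions in metres)
offsets = (np.arange(11) - 5) * SPACING
xx, yy = np.meshgrid(offsets, offsets)
POSITIONS = np.column_stack([xx.ravel(), yy.ravel()])   # shape (121, 2)


def steering_vector(l, m):
    """Phase of a unit plane wave from direction cosines (l, m) at each element."""
    path = POSITIONS @ np.array([l, m])                 # extra path length, m
    return np.exp(2j * np.pi * path / WAVELENGTH)


def beam_weights(directions):
    """One column of phase-matching weights per beam, scaled so an on-beam source gives 1."""
    W = np.column_stack([steering_vector(l, m) for l, m in directions])
    return W / len(POSITIONS)


def form_beams(W, x):
    """Every beam reuses the same element samples x; only the weights differ: y = W^H x."""
    return W.conj().T @ x


# 64 beam centres on an 8 x 8 grid of direction cosines
grid = np.linspace(-0.35, 0.35, 8)
directions = [(l, m) for l in grid for m in grid]
W = beam_weights(directions)

# A source sitting exactly in the direction of beam 27
x = steering_vector(*directions[27])
power = np.abs(form_beams(W, x)) ** 2
print(int(np.argmax(power)), round(float(power[27]), 6))   # 27 1.0
```

Because the weights are only multiplied and summed, re-steering or reshaping a beam means loading new numbers into the beamformer. The dish itself does not move.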
Digital beamforming is computationally intensive. Each of the 12 dishes generates roughly 6 GB/s of raw data from its 121 elements. The beamformer must process this data in real time, applying complex weights and summing across elements. The Apertif team used field-programmable gate arrays (FPGAs) to perform this processing, with each dish equipped with a custom FPGA board that handles the beamforming for that dish. The resulting 64 beams per dish are then correlated across dishes in the array to form the final images.
One of the subtle advantages of a phased-array feed is that the beam shapes are flexible. The weights can be optimized to minimize sidelobes or to shape the beam for a particular science goal. In the Apertif survey, each beam was designed to have a full-width at half-maximum of about 35 arcminutes, close to the diffraction limit of a single 25-meter dish at 1.4 GHz; the much finer angular resolution of the final images comes from correlating the dishes. The system also allowed for simultaneous observations in two frequency bands: one centered on the HI line at 1.42 GHz and another for continuum emission. This dual-band capability meant that the survey could produce both a spectral-line cube and a continuum image from the same data.
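A quick diffraction estimate separates the two angular scales involved: the width of a beam formed at a single dish, and the much finer resolution reached by correlating dishes. The sketch assumes 25-meter dishes, a longest baseline of about 2.7 kilometers, and a prefactor of about 1.2 for a tapered dish illumination. The synthesized resolution of a real array also depends on declination and weighting, which the sketch ignores.

```python
import math

C = 299_792_458.0                    # speed of light, m/s
RAD_TO_ARCMIN = 180.0 / math.pi * 60.0


def beam_fwhm_rad(freq_hz: float, size_m: float, k: float = 1.2) -> float:
    """Approximate beam FWHM ~ k * lambda / size; k ~ 1.2 assumes a tapered illumination."""
    return k * (C / freq_hz) / size_m


# Beam formed at one 25 m dish (what each phased-array-feed beam is limited to)
dish_arcmin = beam_fwhm_rad(1.42e9, 25.0) * RAD_TO_ARCMIN
# Rough synthesized resolution from an assumed ~2.7 km longest baseline
array_arcsec = beam_fwhm_rad(1.42e9, 2700.0, k=1.0) * RAD_TO_ARCMIN * 60.0

print(f"{dish_arcmin:.1f} arcmin, {array_arcsec:.1f} arcsec")   # 34.8 arcmin, 16.1 arcsec
```

A beam formed from one dish's feed cannot be narrower than that dish's diffraction limit, so beam widths of tens of arcminutes are expected, while arcsecond-scale detail comes only from the correlated array.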
The prototype tests in 2015 and 2016 validated the beamforming algorithm and the calibration strategy. One early result showed that the system could achieve a sensitivity within 10% of the theoretical limit, a testament to the careful design of the feed and the digital backend. By 2017, the team had demonstrated that the 64 beams could be calibrated to a common flux scale, a non-trivial task given that each beam sees a slightly different part of the primary beam pattern of the dish.
The Westerbork HI Survey: From 10,000 to 30,000 Galaxies
The Apertif survey began in July 2019 and ran for approximately three years, covering about 1,300 square degrees of sky in the northern hemisphere. The survey was designed to detect neutral hydrogen emission from galaxies out to a redshift of about 0.26, corresponding to a look-back time of roughly 3 billion years. The target area included several well-studied fields, such as the Lockman Hole and the Bootes field, which had deep multi-wavelength data from other observatories.
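Two conversions sit behind these redshift numbers: the frequency at which the 21-centimeter line (rest frequency about 1420.4 MHz) is observed, and the look-back time. The sketch below assumes a flat Lambda-CDM cosmology with H0 = 67.7 km/s/Mpc and a matter density of 0.31. These are Planck-like values and are not taken from the survey papers. It integrates the look-back time with a simple trapezoid rule.

```python
import numpy as np

HI_REST_MHZ = 1420.405751768   # rest frequency of the 21 cm line, MHz
H0 = 67.7                      # km/s/Mpc (assumed)
OMEGA_M = 0.31                 # matter density, assumed flat Lambda-CDM
KM_PER_MPC = 3.0857e19
SEC_PER_GYR = 3.156e16


def observed_freq_mhz(z: float) -> float:
    """Frequency at which the HI line from redshift z arrives."""
    return HI_REST_MHZ / (1.0 + z)


def lookback_gyr(z: float, n: int = 2001) -> float:
    """t_L = (1/H0) * integral_0^z dz' / ((1 + z') E(z')), via the trapezoid rule."""
    zs = np.linspace(0.0, z, n)
    E = np.sqrt(OMEGA_M * (1.0 + zs) ** 3 + (1.0 - OMEGA_M))
    f = 1.0 / ((1.0 + zs) * E)
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zs))
    hubble_time_gyr = KM_PER_MPC / H0 / SEC_PER_GYR   # 1/H0 expressed in Gyr
    return hubble_time_gyr * integral


print(round(observed_freq_mhz(0.26), 1))   # 1127.3 (MHz)
print(lookback_gyr(0.26))                  # roughly 3.1 Gyr
```

The line from z ~ 0.26 therefore lands near 1130 MHz, about 300 MHz below the rest frequency, which is why a wide instantaneous bandwidth matters for reaching that depth.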
The previous Westerbork HI survey, known as the Westerbork HI Survey of Spiral and Irregular Galaxies (WHISP), had detected about 10,000 galaxies over a similar area but with much shallower integration. The Apertif survey went deeper, with an integration time of about 12 hours per pointing, compared to a few hours for WHISP. Combined with the larger field of view, this deeper integration allowed the survey to detect fainter galaxies and to resolve the HI emission in more detail.
The final catalog, released in 2024, contains roughly 30,000 HI-detected galaxies. That is a tripling of the sample size. The redshift distribution extends to z ~ 0.26, with a median redshift of about 0.05. The survey also detected HI in a number of dwarf galaxies with HI masses as low as 10^7 solar masses, a regime that was largely inaccessible to WHISP. These low-mass galaxies are particularly interesting because they are sensitive probes of the small-scale structure of dark matter halos.
The increase in sample size is not just a matter of counting more galaxies. It allows statistically robust measurements of the HI mass function, the number density of galaxies as a function of their HI mass. With 30,000 galaxies, the mass function can be measured in bins of 0.2 dex in mass, compared to 0.5 dex bins in WHISP. The error bars on the low-mass slope are reduced by a factor of about 2, allowing astronomers to distinguish between different models of galaxy formation. The survey also provides a much larger sample for clustering analysis: because the number of galaxy pairs scales as the square of the number of galaxies, tripling the sample multiplies the pair count by roughly 9.
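The two scalings at work can be checked directly. Pair counts grow as N(N-1)/2, and a counting-limited error bar shrinks as 1/sqrt(N). The sketch below only covers these counting statistics. It is not a model of the actual slope fit, which also depends on how far down in mass the detections reach.

```python
import math


def pair_count(n: int) -> int:
    """Number of distinct galaxy pairs among n galaxies."""
    return n * (n - 1) // 2


def poisson_gain(n_old: int, n_new: int) -> float:
    """Improvement in a counting-limited error bar, since sigma ~ 1/sqrt(N)."""
    return math.sqrt(n_new / n_old)


old, new = 10_000, 30_000
pair_ratio = pair_count(new) / pair_count(old)   # ~9.0
error_gain = poisson_gain(old, new)              # ~1.73
print(f"pairs x{pair_ratio:.2f}, error bars smaller by x{error_gain:.2f}")
```

Counting statistics alone give a factor of about 1.7, of the same order as the quoted improvement of about 2.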
What the Extra 20,000 Galaxies Revealed
One of the first results from the Apertif survey was a refined measurement of the HI mass function. The new data show that the slope at the low-mass end is steeper than previously measured, with a faint-end slope of roughly -1.5, compared to -1.3 in WHISP. This steeper slope implies that there are more low-mass galaxies than earlier surveys suggested, consistent with predictions from cold dark matter simulations that produce a large number of small halos. The result also has implications for the baryon cycle: if more low-mass galaxies exist, then the efficiency of star formation in those halos must be lower than assumed, because the observed stellar mass function does not show a corresponding increase.
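HI mass functions are commonly fit with a Schechter function, in which the faint-end slope alpha controls how steeply the number of galaxies rises toward low mass. The sketch below compares alpha = -1.5 with alpha = -1.3. It assumes a characteristic mass M* of 10^9.9 solar masses and holds phi* and M* fixed for both curves. That is an idealization, since real fits trade these parameters off against each other.

```python
import math

LOG_MSTAR = 9.9   # assumed characteristic HI mass, log10(M_sun)


def schechter_per_dex(log_m: float, log_mstar: float, alpha: float,
                      phi_star: float = 1.0) -> float:
    """Schechter function per dex: ln(10) * phi* * (M/M*)^(alpha + 1) * exp(-M/M*)."""
    x = 10.0 ** (log_m - log_mstar)
    return math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


# How many more galaxies a steeper slope predicts at 10^7 solar masses
ratio = (schechter_per_dex(7.0, LOG_MSTAR, -1.5)
         / schechter_per_dex(7.0, LOG_MSTAR, -1.3))
print(round(ratio, 2))   # 3.8
```

The ratio equals (M/M*)^-0.2, so the gap between the two slopes widens steadily toward lower masses. This is where the dwarf detections carry the most weight.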
Another key finding concerns the clustering of HI galaxies. With 30,000 galaxies, the Apertif team could measure the two-point correlation function in redshift bins of width 0.02, revealing how the clustering strength evolves with redshift. The data show that HI galaxies are more strongly clustered at higher redshifts. Because the dark matter itself becomes more clustered over time, this trend cannot simply mirror the matter distribution. In a flux-limited survey, the higher-redshift detections are preferentially the most HI-massive galaxies, which tend to occupy more massive, more strongly clustered halos. The clustering measurements also constrain the bias parameter, which relates the galaxy distribution to the dark matter distribution. The bias for HI galaxies is found to be around 0.8 at z ~ 0.1, indicating that HI galaxies are slightly less clustered than the dark matter, a signature of their preference for lower-density environments.
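The text does not say which estimator the team used. A common choice for the two-point correlation function is the Landy-Szalay estimator, which compares data-data, data-random, and random-random pair counts. The brute-force sketch below implements it for small toy samples and adds the linear-bias relation xi_gal = b^2 xi_matter. The unclustered toy "galaxies" in a unit box, the bin edges, and the function names are all illustrative assumptions.

```python
import numpy as np


def pair_counts(a, b, bins, same):
    """Histogram of separations between points in a and b (brute force, small samples)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if same:
        d = d[np.triu_indices(len(a), k=1)]   # each distinct pair once, no self-pairs
    counts, _ = np.histogram(d.ravel(), bins=bins)
    return counts


def landy_szalay(data, randoms, bins):
    """xi(r) = (DD - 2 DR + RR) / RR, with each count normalized by its number of pairs."""
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, bins, same=True) / (nd * (nd - 1) / 2)
    rr = pair_counts(randoms, randoms, bins, same=True) / (nr * (nr - 1) / 2)
    dr = pair_counts(data, randoms, bins, same=False) / (nd * nr)
    return (dd - 2 * dr + rr) / rr


def galaxy_xi_from_matter(xi_matter, bias):
    """Linear bias model: galaxy clustering = b^2 times matter clustering."""
    return bias ** 2 * xi_matter


rng = np.random.default_rng(42)
data = rng.random((400, 3))       # unclustered toy "galaxies" in a unit box
randoms = rng.random((800, 3))    # random catalog with the same geometry
bins = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
xi = landy_szalay(data, randoms, bins)       # close to zero: no clustering in the toy data
print(xi, galaxy_xi_from_matter(1.0, 0.8))   # a bias of 0.8 gives 0.64 x the matter signal
```

With b = 0.8, the galaxy correlation function is 0.64 times that of the matter. This is the sense in which HI galaxies are less clustered than the dark matter.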
The large sample also allowed a search for environmental effects. One study compared the HI content of galaxies in voids versus those in filaments and clusters. The Apertif data revealed a deficit of HI-rich galaxies in voids, even after controlling for stellar mass and morphology. This suggests that the void environment suppresses the gas supply, possibly through weaker accretion from the intergalactic medium. The effect is subtle—a reduction of about 20% in the median HI mass—but statistically significant at the 3-sigma level. This result would not have been possible with the smaller WHISP sample.
Finally, the Apertif survey provided a test of dark matter halo models. By combining the HI mass function with clustering measurements, astronomers can infer the relationship between HI mass and halo mass. The Apertif data favor a model in which the HI mass scales as the halo mass to the power of about 0.6, with a scatter of roughly 0.3 dex. This is consistent with simulations that include feedback from supernovae and active galactic nuclei, which regulate the gas content of halos. The data also rule out models with no feedback, which predict a steeper scaling and a larger scatter.
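The relation described here can be written as a forward model: log M_HI = log A + 0.6 log M_halo, plus Gaussian scatter of 0.3 dex. In practice halo masses are not observed directly; they are inferred by combining the mass function with clustering. The sketch below only shows the parameterization and that a straight-line fit in log space recovers it. The normalization log A and the halo-mass range are hypothetical.

```python
import numpy as np

SLOPE = 0.6          # M_HI ~ M_halo^0.6, from the text
SCATTER_DEX = 0.3    # ~0.3 dex scatter, from the text
LOG_NORM = 1.5       # hypothetical normalization, for illustration only


def sample_log_mhi(log_mhalo, rng):
    """Draw log10 M_HI from a power law in halo mass with lognormal scatter."""
    mean = LOG_NORM + SLOPE * log_mhalo
    return mean + rng.normal(0.0, SCATTER_DEX, size=np.shape(log_mhalo))


rng = np.random.default_rng(0)
log_mhalo = rng.uniform(10.0, 13.0, size=10_000)   # hypothetical halo-mass range
log_mhi = sample_log_mhi(log_mhalo, rng)

# Recover the input relation with a straight-line fit in log space
fit_slope, fit_norm = np.polyfit(log_mhalo, log_mhi, 1)
residual_scatter = np.std(log_mhi - (fit_norm + fit_slope * log_mhalo))
print(fit_slope, residual_scatter)   # close to 0.6 and 0.3
```

A model without feedback would appear here as a larger SLOPE and SCATTER_DEX, which is the kind of difference the combined Apertif measurements can test.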
Computational Demands and Data Deluge
The Apertif system generates an enormous amount of data. Each of the 12 dishes produces 6 GB/s of raw data from its 121 elements, for a total of 72 GB/s across the array. The beamformer reduces this to 64 beams per dish, each with a bandwidth of 300 MHz, resulting in a data rate of about 1.5 GB/s after correlation. Over the three-year survey, the total volume of stored data, after beamforming and correlation, was roughly 1.5 petabytes. Processing this data required a dedicated pipeline running on the Blue Waters supercomputer at the University of Illinois and later on the national Dutch supercomputer.
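A little arithmetic with the quoted rates shows why the beamforming had to happen in real time at each dish. The raw element-level stream from a single 12-hour pointing would already be about twice the roughly 1.5 petabytes quoted for the survey. That figure can therefore only refer to data after beamforming and correlation. The sketch uses decimal units (1 PB = 10^6 GB).

```python
DISHES = 12
ELEMENTS_PER_DISH = 121
RAW_GB_PER_S_PER_DISH = 6.0   # from the text
POINTING_HOURS = 12           # integration per pointing, from the text
SURVEY_STORED_PB = 1.5        # survey data volume quoted in the text

per_element_mb_s = RAW_GB_PER_S_PER_DISH * 1e3 / ELEMENTS_PER_DISH   # ~49.6 MB/s
array_raw_gb_s = RAW_GB_PER_S_PER_DISH * DISHES                      # 72 GB/s
raw_pointing_pb = array_raw_gb_s * POINTING_HOURS * 3600 / 1e6       # GB -> PB

print(f"{per_element_mb_s:.1f} MB/s per element, {array_raw_gb_s:.0f} GB/s per array")
print(f"raw element data for one pointing: {raw_pointing_pb:.2f} PB "
      f"({raw_pointing_pb / SURVEY_STORED_PB:.1f}x the quoted survey volume)")
```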
The calibration pipeline was a major challenge. Each beam has its own gain and phase response, which must be calibrated using observations of known sources. The team developed a self-calibration algorithm that iteratively solves for the gains using the astronomical data itself. The algorithm converges after a few iterations and achieves a dynamic range of about 10^4:1 in the final images. The pipeline also includes a flagging step to remove radio frequency interference, which is particularly problematic at 1.4 GHz due to satellite and terrestrial transmissions.
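The text does not spell out the Apertif calibration algorithm, so the sketch below substitutes a widely used generic method: a StefCal-style alternating least-squares solve for one complex gain per antenna, given a sky model. It covers only the gain-solving half of a self-calibration loop. A full loop would alternate this step with imaging to refine the model. The sketch also assumes a single unit point source at the phase center and uses noiseless simulated data. The function name and all parameter values are hypothetical.

```python
import numpy as np


def solve_gains(vis, model, n_iter=200, tol=1e-12):
    """StefCal-style alternating least squares for per-antenna complex gains.

    Assumes vis[i, j] ~ g[i] * model[i, j] * conj(g[j]) for i != j;
    autocorrelations (the diagonal) are ignored.
    """
    n = vis.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    g = np.ones(n, dtype=complex)                  # start from unit gains
    for it in range(n_iter):
        g_old = g
        # z[i, j]: model for baseline (i, j) corrupted by antenna j's current gain only
        z = np.where(off_diag, model * np.conj(g_old)[np.newaxis, :], 0)
        # Least-squares solution for each g[i] with all other gains held fixed
        g = (np.conj(z) * vis).sum(axis=1) / (np.abs(z) ** 2).sum(axis=1)
        if it % 2 == 1:
            g = 0.5 * (g + g_old)                  # damp the odd/even oscillation
        if np.linalg.norm(g - g_old) <= tol * np.linalg.norm(g):
            break
    return g


rng = np.random.default_rng(7)
n_ant = 12
true_g = rng.uniform(0.8, 1.2, n_ant) * np.exp(1j * rng.uniform(-0.5, 0.5, n_ant))
model = np.ones((n_ant, n_ant), dtype=complex)   # unit point source at phase centre
vis = np.outer(true_g, np.conj(true_g)) * model  # noiseless corrupted visibilities

g = solve_gains(vis, model)
# Gains are recoverable only up to one overall phase, so compare g g^H with the data
err = np.max(np.abs(np.outer(g, np.conj(g)) - vis))
print(err)
```

Because the visibilities depend only on products g[i] conj(g[j]), any solution can be rotated by a common phase. In practice that freedom is fixed by choosing a reference antenna.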
The data volume forced the team to adopt a distributed processing model. The raw data were stored on tape and processed in batches on the supercomputer. Each batch covered a few hours of observing time and required about 10,000 core-hours of computation. The total computational cost for the survey was roughly 50 million core-hours, making it one of the largest radio astronomy data processing efforts to date. The team has released the calibrated data cubes and catalogs publicly, allowing other researchers to mine the data for additional science.
The open data release has already spawned several follow-up studies. One group used the Apertif data to search for OH megamasers, finding a dozen new candidates. Another group combined the Apertif HI data with optical spectra from the Sloan Digital Sky Survey to study the relation between gas content and star formation rate. The data are also being used to train machine learning algorithms for automated source classification, a step toward handling the even larger data volumes expected from the Square Kilometre Array.
The Broader Instrumentation Lesson
The Apertif project demonstrated that phased-array feeds can dramatically increase the survey speed of radio telescopes. The same technology is now being deployed on other observatories. The Australian Square Kilometre Array Pathfinder (ASKAP) uses phased-array feeds on its 36 dishes, each feed producing 36 beams. ASKAP's survey speed is about 10 times that of a conventional single-pixel system, and it has already produced several large HI surveys, including the WALLABY survey, which aims to detect half a million galaxies. The Five-hundred-meter Aperture Spherical Telescope (FAST) in China takes a related multibeam approach with a 19-beam receiver built from conventional horn feeds rather than a phased array, which increases its instantaneous sky coverage roughly 19-fold.
The cost per galaxy detection has dropped significantly. For the Apertif survey, the total cost of the feed development, installation, and operations was roughly €10 million, yielding 30,000 galaxies. That works out to about €330 per galaxy, compared to roughly €1,000 per galaxy for the WHISP survey. The cost savings come from the reduced observing time: the Apertif survey covered the same area in about one-third of the time, even with deeper integration. The efficiency gain is even larger when considering the improved data quality and the simultaneous continuum observations.
However, phased-array feeds are not a universal solution. They add complexity to the telescope system, requiring sophisticated digital backends and calibration procedures. The data volume can overwhelm existing processing infrastructure. For telescopes that already map the sky efficiently, such as Arecibo (before its collapse), which carried the seven-beam ALFA receiver, the gain from a phased-array feed is less dramatic. And for observations that require very high spectral resolution, a phased-array feed can be a disadvantage: the number of spectral channels, and therefore the data rate, grows with the ratio of bandwidth to channel width, multiplied by the number of beams.
The Apertif survey itself had a mixed legacy. The survey was interrupted by the COVID-19 pandemic in 2020, and some observing time was lost. The team also faced challenges with radio frequency interference from satellite constellations, which increased during the survey period. The final catalog, while large, is not complete to a uniform flux limit across the entire survey area, due to variations in system temperature and calibration errors. These issues are being addressed in the data release, but they highlight the trade-offs inherent in any large survey.
For the planned Square Kilometre Array (SKA), phased-array feeds are a candidate technology for faster surveys. The SKA-Mid telescope's Band 2 receivers, covering 0.95–1.76 GHz and therefore the HI line, use single-pixel feeds in the current design, with phased-array feeds considered as a future upgrade that would build on the lessons learned from Apertif and ASKAP. The SKA is expected to detect hundreds of millions of HI galaxies, enabling precision cosmology with baryon acoustic oscillations and tests of modified gravity. But that future depends on the continued development of digital beamforming and data processing techniques, areas where the Apertif team has made significant contributions.
The story of the Apertif survey is not one of unqualified triumph. It is a story of incremental engineering, of making a telescope do something it was not designed to do, and of the trade-offs that come with any new technology. The survey tripled the galaxy count, but it also generated petabytes of data that required years to process. It revealed new insights into galaxy formation, but it also highlighted the challenges of radio frequency interference and calibration. What it did, unequivocally, was show that a phased-array feed can transform a radio telescope, and that the next generation of surveys will be built on that transformation.