SSSC

Spigot2Sink

The spigot2sink (s2s) diagrams shown here are your indispensible guide to how the SSSC Open Challenges are put together, and to what we ask you to do with them.

First of all, do not be intimidated by the jargon, like "uv-data" or "Jones matrices". It is really quite simple, and people who are productive in writing software (and thus have organized brains) will quickly pick it up. In the meantime, you should be able to get along with the brief explainations at the bottom.

Page contents (click an entry to scroll down to that section):

1. Inspecting uv-data
2. Generating simulated uv-data
3. Solving for M.E. parameters
4. Subtracting the LSM sources from the uv-data
5. Correcting the uv-data
6. The full Monty
Appendix

1. Inspecting uv-data

Figure 1 (click to enlarge)

Fig 1 shows the simplest operation of all: inspecting the uv-data in a given Measurement Set (MS). The latter is the uv-data file of the CASA/AIPS++ package. Converters exist to and from the data-formats of the other major packages (AIPS, MIRIAD, NEWSTAR). In this section, we show the basics of MeqTree visualization. This is already impressive but, with your help, we aim to offer many more ways to look at uv-data and its associated meta-data. In the meantime, you should also look at what CASA/AIPS++ has to offer in MS visualization.

The calico-view-ms.py TDL script, bundled with the Calico Framework within the MeqTrees Cattery, is for examining uv-data. It might be useful to examine the nodes that this script generates in parallel with reading this text. Brief instructions for its use can be found in the PURR logs for open challenge 1.

Note that the schematic tree that is shown in fig 1 (and in all subsequent block diagrams) should be interpreted as N(N-1)/2 parallel subtrees, one for each interferometer(i,j) in an N-station array. Each interferometer has its own MeqSpigot node (which reads its uv-data, usually from the DATA column of he MS) and its own MeqSink node (which writes the processed uv-data back, usually into another column of the MS).

By default, MeqTrees steps through the uv-data in time order. A single "VisDataMux" node generates a succession of processing Requests, which are issued to all MeqSink nodes simultaneously. Each Request contains a "domain", i.e. a rectangle in freq-time space. Successive Requests usually have domains that span all the freq channels in a "spectral window", but only a limited number of "time-slots" (e.g. 10 minutes). As you probably have grasped by now, all the nodes in a tree have the same simple task: to return a Result that contains values for the cells of the requested domain.

If fig 1, the flow of Requests is indicated as dashed arrows, while the flow of Results is in solid arrows. In later block diagrams, the Request flows will be omitted to avoid clutter.

The uv-data may be visualized in two ways. First of all, EVERY node in a tree, however complex, has the capability to "publish" its Results, which are then picked up by the browser for display. Thus, you have access to every intermediate result during every stage of the processing. The PURR log in the first Open Challenge explains how to deal with this embarassment of riches.

Secondly, the Resuts of an arbitrary collection of nodes may be visualized together in the same plot. This is a feature of the MeqComposer node (Its main function, which will not be discussed here, is to put the Results of its child nodes into a single "tensor" Result). The Request for this "inspector" node is generated with the help of the MeqReqSeq (request sequencer) node shown in fig 1. Rather than issuing its Request to all its children simultaneously, it does so in a user-defined sequence (indicated with 0 and 1 here). For each child, it waits for it to return a Result before issuing its Request to the next child. When finished, it passes on the Result of a user-defined child (in this case the MeqSpigot).

2. Generating simulated uv-data

Figure 2 (click to enlarge)

Fig 2 shows how simulated uv-data are generated. The uv-data for the various Open Challenges have all been made in this way. For each challenge, the TDL script that generated the data is available for scrutiny. It is highly recommended to read the Python code, and to browse the resulting tree in the meqbrowser. It goes without saying that you should also read the PURR log, of course.

The generating TDL script may also be cannibalized by the more ambitious among you, e.g. to generate new challenges.

As before, the schematic tree in fig 2 represents N(N-1)/2 parallel subtrees, for all interferometers(i,j). The flow of Requests is indicated once more as dashed arrows. The spigots are indicated for cintinuity with the other figures, but they are not needed for simulation, of course.

As described in the PURR log, an empty MS should be generated by an AIPS++ utility called "makeMS". This is done for a particular telescope (e.g. WSRT, or VLA), with a given station configuration, and a given number of time-slots and freq channels. It also specifies a pointing centre in the sky.

A sequence of processing Requests is generated, for contiguous time domains (usually 20 minutes or so). Larger domains are more efficient, but the maximum size is limited by the available memory.

For each Request, the Local Source Model (LSM) subtree calculates values(f,t) for the four Stokes parameters (I,Q,U,V) of each source. For each interferometer(i,j), and assuming linear polarization, these are converted into a 2x2 cohaerency matrix. For a point source in the field centre, the 4 (complex) elements are: XX=I+Q, XY=U+iV, YX=U-iV, YY=I-Q. The nominal (uncorrupted) uv-data are the sum of the contributions from all the sources in the LSM.

For each Request, the Jones matrix subtrees calculate the instrumental errors(f,t) for the various stations. The elements of each 2x2 Jones matrix are 4 complex numbers. The simplest case is the GJones matrix, which is a diagonal matrix that represents complex "station gains". The simulated instrumental errors (e.g. station phases or gains) are simply generated by means of small subtrees that generate sinusoidal errors with different periods in time and frequency.

NB: The GJones with its "antenna gains" is the staple of the 2GC data reduction packages, which do not use an explicit Measurement Equation (M.E.). MeqTrees does, and can handle Jones matrices that are themselves the product of a sequence of Jones matrices for different instrumental effects. The TDL script that is used here illustrates this by offering a small selection of Jones matrices, which are used for the various Open Challenges. As always, you are invited to play with this script, or even cannibilize it) to generate your own Open Challenges.

The simulated uv-data are simply the nominal visibilities calculated by the LSM subtrees, corrupted (by matrix multiplication) by the station-based instrumental errors calculated by the Jones matrix subtrees. Optionally, gaussian noise mey be added in this stage. The MeqSink nodes then store them in the DATA column of the MS.

3. Solving for M.E. parameters (MeqParms)

Figure 3 (click to enlarge)

Fig 3 shows how one may solve for (arbitrary subsets of) the M.E. parameters. Note that the LSM and Jones subtrees now contain MeqParm nodes, which are connected to a .mep or .fmep file to store their values for the different domains. (NB: An .fmep value is a more efficient version of a .mep file. Both are supported for the moment).

A viewer is being developed to scrutinize the stored MeqParm values, and even to manipulate them (e.g. smoothing). Unfortunately, this very useful tool is not compatible with our current release, but it will be with the one we plan for September (i.e. before we go to Nancay).

In this case, there is only a single MeqSolver node for the entire tree, i.e. for all N(N-1)/2 interferometer subtrees, which share the same LSM subtree(s), and N different Jones matrix subtrees. The solver node gets its Requests via a ReqSeq node in the same way as the inspector node in fig 1 (see also fig 4 and 5).

The solver solves for a user-defined subset of the M.E. parameters. These are made "solvable" by giving the solver a list of any subset of MeqParm nodes that are available in the Jones and/or LSM subtrees (note that instrumental and source parameters are treated equally here!).

The children of the solver are N(N-1)/2 MeqCondeq nodes, which supply it with "condition equations". A MeqCondeq has two children, which supply it with "predicted" and "measured" values (from the MS). The solver tries to minimize the difference between these two, for all MeqCondeqs simultaneously, in a Least Squares sense. Therefore, it is very instructive to visualize the MeqCondeq Results, for instance with an "inspector" node. Independent of the problem, the final MeqCondeq results should be noise-like. Obviously, this will only be the case if the solution has converged, and if one is solving for the correct M.E. and the correct subset of M.E. parameters(!). Dealing with this question is the core of what we are trying to achieve with MeqTrees.

The condition equations are accumulated in a solution matrix, which is then inverted to produce incremental improvements of the values of the solvable MeqParms. These are passed back up the tree to the relevant MeqParm nodes. The updated (and hopefully better) MeqParm values are then used in the next iteration. Since the M.E. is usually non-linear, multiple iterations will usually be needed before the solution is converged. This is controlled by the solver, and may be visualized in its Result in the usual way. (In the future, with your help, we plan to offer different solvers, and much more solver visualization).

4. Subtracting the LSM sources from the uv-data

Figure 4 (click to enlarge)

Fig 4 shows how source contributions may be subtracted from the uv-data. Since, the data in the MS are corrupted with instrumental errors, we must corrupt the nominal source visibilities with the same errors before they are subtracted. In this case, we assume that the correct MeqParm values have been obtained somehow (e.g. by solving for them, or by transferring them from calibrator observations), and reside in the .mep/.fmep file.

Subtracting the contributions of known sources from the uv-data is one of the corner-stones of 3GC. The reason is that, in the presence of Direction Dependen Effects (DDE) like the ionosphere or individual station beamshapes, it is very difficult to subtract them from the image with sufficient accuracy. But it is always possible to subtract any number of sources from the uv-data, provided they are all corrupted with with the correct instrumental errors, i.e. errors in their particular direction.

You are invited to play with the kind of residual error patterns that are left in the image when subtracting sources with the wrong parameters. It will quickly become clear to you how important it is to get it right, and then you amy join in to the worldwide quest of achieving that very expensive ideal.

5. Correcting the uv-data

Figure 5 (click to enlarge)

Fig 5 shows how the uv-data may be corrected for instrumental errors. It is the inverse operation to the corruption of the nominal uv-data that we used before. The same Jones matrix subtrees are required, with the same MeqParm values in .mep or .fmep files. It is necessary to correct the uv-data from the MS to make a decent image.

However, it should be kept in mind that uv-data can only be corrected for a single point in the sky. Thus, in the presence of DDE's, it is imperative to subtract as many bright sources as possible from the uv-data, before transforming the residuals into an image. And even then, many separate "facet" images should be made, after correcting the uv-residials for the centre of the facet. The facet images should be small enough that the imaging errors to not increase "too much" towards the edge.

6. The Full Monty

Figure 6 (click to enlarge)

Fig 6 is a combination of Figs 3,4,5. This is the prototype of a wide range of practical data reduction trees. First one solves for some subset of M.E. parameters. Then one uses the newly estimated MeqParm values to subtract the contributions of one or more LSM sources from the uv-data. And finally, the uv-residuals are corrected with the same MeqParm values.

Variations of this tree will be discussed in Nancay and beyond, in the context of 3GC. For instance, consider a cascade of such "peeling units", each of which "peels" a bright source from the uv-data, in reverse order of brightness.

Appendix: Some background information

Visibilities, uv-data and cohaerencies

The terms "uv-data" or "visibilities" or "cohaerency data" are all different names for the same thing, i.e. the output of a radio interferometry correlator. They are samples of the Fourier Transform of the observed brightness distribution. Thus, an image of the sky is obtained by Fourier Transforming the uv-data.

A radio aperture synthesis array consists of "stations" (a.k.a. "antennas" or "telescopes"). Each combination of two stations is called an "interferometer". An array of N stations has N(N-1)/2 different 2-station combinations.

Each station has two output signals, one for each polarization (X and Y for linear, or R and L for circular). The 2 output signals of each station are correlated with those of each of the other stations. The resulting visibilities for each combination (i,j) consist of 4 complex numbers, which are the elements of a 2x2 cohaerency matrix (e.g. XX XY YX YY). These are the uv-data units that are processed by MeqTrees.

The relative position of the stations (i and j) of an interferometer(i,j) is important. The (projected) distance and orientation of this "baseline", as seen from the observing direction, defines a point (u,v) in the so-called "uv-plane" (hence the name uv-data). An N-station array yields N(N-1)/2 different uv-samples simultaneously. Moreover, because of the rotation of the Earth, a slightly different set of uv-samples is obtained for each successive time interval, or "time-slot". An image of the sky is produced by Fourier-transforming the visibility function.

The resulting image will be convolved with a Point Spread Function (PSF), which is the Fourier Transform of the "uv-coverage", i.e. the sampling function of the uv-plane. Obviously, the PSF will have smaller sidelobes if the sampling of the uv-plane is more complete.

Since the PSF of a bright source will extend over the entire image, it will obscure fainter sources (which are often more interesting). Therefore, it is imperative to remove the PSF around bright sources. One way to do this is by means of image devonvolution methods like "CLEAN". But by far the best results are achieved by subtracting the bright sources from the uv-data, before turning them into an image.

The CASA / AIPS++ Measurement Set

The Measurement Set (MS) is organized as a table. The uv-data are stored in rows, in time order. Each table row contains a 4xNchan array of visibility data for all freq channels of a "spectral window", for all 4 correlations, of one particular interferometer (a pair of stations), for one "time-slot" (e.g. 1 minute).

MeqTrees also steps through the uv-data in time order. This is done by generating a series of processing Requests, each for a specified time-interval, which will be a multiple of the time-interval of a single MS row. The Requests are issued in parallel to the MeqSink nodes of the N(N-1)/2 subtrees of the available interferometers. These pass on the Request to their children, which pass them up the subtrees, right up to the MeqSpigot nodes that read uv-data from the DATA column of the MS. The processing Results are passed from node to node in the opposite direction, until the end Results are stored back in another column of the MS by the MeqSpigots.

The radio-astronomical Measurement Equation (M.E.)

Each Jones matrix may the product of other 2x2 Jones matrices, each of which represents a separate instrumental effect (e.g. phase).

To be continued.