Collective Action for Nomadic Small-angle Scatterers
Last updated R.Ghosh, August 2001
The aims of the canSAS meetings are to promote and simplify the sharing of SAS data analysis methods. Non-specialists will benefit from easily applicable methods for exchanging and merging data from different synchrotron, laboratory and neutron facilities. The consensus at this meeting was not to impose a single solution, but to offer guidelines and recommendations on several, each with specific advantages, and to plan the development of utilities to perform some easy data interconversion. Thus future meetings will be able to concentrate more on the scientific aspects of data analysis.
The canSAS-3 meeting was composed of a number of presentations and posters, a mixture of scientific and technical topics, each leading to some discussion. The reports given below are sorted into principal sections, but the classification is somewhat arbitrary:
Summary of canSAS-3, Grenoble, 17-19th May
There were 30 participants at this workshop jointly organised by the ILL, EMBL, ESRF and the DUBBLE-CRG. It brought together practitioners at the leading edge of data analysis, scientists responsible for beam-lines, young SAS scientists eager to comment on inherited software, and computer scientists working on advanced data storage techniques.
The scientific presentations and posters showed the computer experts the challenge to provide for studies ranging from biological macro-molecules in solution to nucleation and growth in aluminium alloys.
The meeting offered the opportunity to establish direct links between two data structures, NeXus and sasCIF, which are being adopted as standard means for data exchange. XML could be seen as serving both to describe these structured files and to aid the exchange of simple text files in the near future. Demonstrations of applications using commercial packages for SAS data reduction led to a debate on their various merits, though no single package was perceived as a clear winner.
The meeting concluded with agreement to produce and publicise new format examples and improve dissemination of pointers to analysis programs ready for approbation by the Users at SAS-2002 in Venice.
Data formats for SAS
Summary: sasCIF definitions have progressed to being available for archiving one-dimensional SAS data as a practical method at several centres. XML was discussed as offering a wrapper to help share existing data with minimal changes. HDF-5 appeared attractive as a technology for storing complex 2D and multi-dimensional data, even with the current proviso that it is only slowly being integrated into major software packages, resolving the problem of platform dependences for this binary data format.
The disparity of data formats had triggered the first canSAS meeting in 1998. In opening canSAS-3, Wim Bras (DUBBLE-CRG) stressed the need for merging SAS data from different instruments and facilities. He considered the present meeting should lead to a convergence on writing data for simple measurements; he did accept that high intensity instruments allowed a significant increase in more complex measurements, with time-dependent 2D data sets involving data volumes exceeding several Gigabytes in a couple of days. This posed a major subject for debate with the experts present. John Barnes (NIST) reviewed progress since the meetings in Grenoble (1998) and Brookhaven (1999). He re-iterated the need to conserve a maximum number of parameters (meta-data) concerning the measurement with the data, and to include errors with the latter.
Marc Malfois (DUBBLE-CRG) reported on recent modifications to the sasCIF format, which he and Dmitri Svergun had developed following the first canSAS meeting. Collaboration with Steve King (ISIS) identified additional meta-data required for Time of Flight (TOF) neutron data. Marc's document matching sasCIF names and NeXus meta-data names was greatly welcomed at the meeting, and was reviewed by separate working groups.
Ron Ghosh (ILL) described how the practical problems of dealing with large volumes (greater than 50000 measurements/instrument/year) of similar data had been resolved at the ILL by using simple sequence and version numbers to identify raw and treated data. Maintaining a "last-step" history allowed users to browse the meta-data from all merged components.
In looking to the future he described a recent exercise to introduce XML, a rapidly developing standard for data interchange, as a wrapper for the multifarious file formats in use at present. With the addition of a few simple tag field markers, the tabular I(Q) data could be easily located and shared. The exercise with Steve King (ISIS) had shown how easily this could be achieved. There was general recognition that XML might well offer procedures for progressively introducing a WWW based definition of 1D data and meta-data. Luke Gilbertson (ILL) demonstrated how a short Python script offered easy access to XML files, driven from a simple GUI.
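The wrapper idea can be illustrated with a minimal Python sketch. It is not the script shown at the meeting; the tag names (`sasdata`, `metadata`, `item`, `iq_table`) and the function names are hypothetical, chosen only to show how a few markers let a reader locate the tabular I(Q) data and its meta-data.

```python
import xml.etree.ElementTree as ET

def wrap_iq(rows, metadata):
    """Wrap (Q, I, dI) rows and a metadata dict in a small XML document.

    All tag names here are hypothetical, not part of any agreed standard.
    """
    root = ET.Element("sasdata")
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "item", name=key).text = str(value)
    # The table itself stays as plain whitespace-separated text.
    table = ET.SubElement(root, "iq_table", columns="Q I dI")
    table.text = "\n".join(" ".join(f"{v:g}" for v in row) for row in rows)
    return ET.tostring(root, encoding="unicode")

def read_iq(xml_text):
    """Recover the rows and metadata written by wrap_iq."""
    root = ET.fromstring(xml_text)
    metadata = {item.get("name"): item.text for item in root.find("metadata")}
    rows = [tuple(float(v) for v in line.split())
            for line in root.find("iq_table").text.splitlines()
            if line.strip()]
    return rows, metadata
```

Because the table remains ordinary text inside one element, existing files would need only minimal changes, which is the attraction noted above. Note that the `:g` format keeps six significant figures, which is an assumption of this sketch rather than a recommendation.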
Elena Pourmel (NCSA) then presented the Hierarchical Data Format (HDF) developed at the NCSA. Version 4 had been adopted by the NeXus development group and was being used increasingly for storage of raw data. The new version HDF5 was much more efficient and also simpler to use, but it had been necessary to abandon direct compatibility with preceding versions and to offer conversion utilities instead. The problems of text storage evident to the NeXus developers, and inherent in the earlier versions of HDF, had been resolved in the change to HDF-5. The demonstrations of browsers with editing features, H5-View (Pourmel), and good graphics (Eric Boucher, APS) convinced those present that HDF-5 should be adopted for complex data storage. Uwe Filge (PSI) was working on a new version of NeXus based on HDF-5, again showing that there would be a significant pressure group to urge software developers to incorporate direct support for HDF-5.
Summary: Placed together here are the combination of SAXS and WAXS, and also the problems of presenting and reducing TOF-SANS data.
Peter Bosecke (ESRF) described the problems encountered in merging SAXS and WAXS data from the complementary detector systems, and of the distortions arising as the WAXS detector is rotated about the sample.
It is likely that new SANS instruments will be on pulsed sources, so the problems of dealing successfully with TOF-SANS are of major interest to the neutron community. Elena Litvinenko (JINR) demonstrated how PV-WAVE is used to display their TOF-SANS data, and to assess and select the regions of data for subsequent treatment, all controlled by a comprehensive GUI. This is essential for examining graphically various data quality indicators. Rex Hjelm (LANSCE) showed succinctly how the required data quality could be defined before performing final data reduction of TOF-SANS. He had chosen HDF/NeXus as a data format since it offered the best means for storing the complex raw data, typically a set of about 200 2D frames measured on a non-linear time-scale. Calibration included testing for self-consistency of data from the different incident wavelengths.
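For readers unfamiliar with TOF-SANS, the standard relations behind these reductions can be sketched in Python. This is only an illustration of the textbook formulas (the de Broglie relation and Q = 4π sin θ / λ, with 2θ the scattering angle); the function names are assumptions and it does not represent the reduction code of any facility mentioned here.

```python
import math

PLANCK_H = 6.62607015e-34         # Planck constant, J s
NEUTRON_MASS = 1.67492749804e-27  # neutron mass, kg

def tof_to_wavelength(tof_s, path_m):
    """Neutron wavelength in angstrom from time of flight (s) and flight path (m)."""
    # lambda = h / (m v), with v = path / time of flight
    return PLANCK_H * tof_s / (NEUTRON_MASS * path_m) * 1e10

def momentum_transfer(wavelength_A, two_theta_rad):
    """Q in inverse angstrom for a scattering angle 2*theta given in radians."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_A
```

Each time frame corresponds to a different wavelength, so the same Q is reached at different detector radii in different frames. Comparing the reduced intensities at equal Q from different wavelengths is one way to picture the self-consistency test described above.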
These demonstrations raised the more general problem of how to model resolution effects, and added questions to the value of attempting to include indicative Q-resolution information with the tabulated I(Q) data.
Summary: The packages which have been used for SAS analysis include PV-WAVE, IDL, MATLAB, IGOR, OCTAVE, as well as maintained programs such as FIT2D. In common the dependences on X-window (unix) or PC-Windows have been resolved, and the user sees the same interface on either type of platform. The primary attraction in using packages is that the aspect of computer platform independence (primarily graphics) is resolved by the package authors. However one has no control of the direction of future updates or long term compatibility. There remain the questions of sharing developments, and the cost of commercial packages. It is not uncommon to mix usage combining conventional programming with packages to achieve rapid calculations, in which case the platform dependence is re-introduced.
Alan Munter (NIST) described the system he had inherited which was based on IGOR. While he appreciated the powerful interactive debugging facilities, he felt reluctant to continue development, raising the inevitable problem of long-term maintenance of the local application software which uses a commercial package.
Charles Dewhurst (ILL) demonstrated GRASP, based on MATLAB. This includes many automatic features for matching data taken in different instrument configurations, as well as 2D fitting etc. The easy ability to compile and freely distribute MATLAB executables is an exception for commercial packages.
Els Homan (EMBL) demonstrated the integrated control and treatment software developed using IDL for the DUBBLE-CRG SAXS-WAXS beamline at the ESRF.
Andy Hammersley (ESRF) showed that his FIT2D went beyond simple data reduction by including detector distortions and other calibration corrections needed for high precision work. Although FIT2D is based on a GUI, he now wondered whether such products could benefit from a macro language (perhaps Python based) to allow more easily for repetitive operations. Matt Rodman (SRS, Daresbury) described GraphApp, a GUI C library which enables applications to be compiled for either PC-Windows or X-window. This allows stand-alone executables to be created, avoiding problems of multi-component installations using Tcl/Tk and scripting in conjunction with conventional programs.
SAS analysis and Science
Summary: This section served to show a fraction of the range of SAS analysis activities which have been partly adapted by their authors for different facilities. The applications presented orally and in posters covered fields of fibre diffraction, biological molecules in solution, super-alloys, and rope, amongst others. They illustrated for instrument scientists and software designers the advantages of having proven tools from different sources available to analyse data objectively, but also allowing some independent cross-checks on results.
Trevor Forsyth (ILL) presented his studies of polymorphism in DNA. This was analysed using the CCP13 software, though the CCP14 macromolecule package had also been used for higher angle data on the crystalline components. At present CCP13 uses the BSL format, primarily for want of something better capable of dealing with data from different instruments.
Dmitri Svergun (EMBL) ranged through a wealth of measurements, analyses and simulations of bio-molecules in solutions, polydisperse polymers, and partially ordered systems. In endeavoring to be applicable to a wide range of tabular data, his programs could ignore meta-data and hunt for tables of intensity-Q values. His website provides an introduction to this valuable set of programs.
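The idea of ignoring meta-data and hunting for the numeric table can be sketched as follows. This is a hypothetical illustration in Python, not the approach actually implemented in the programs described; the function name and the rule "keep any line that parses entirely as numbers" are assumptions.

```python
def find_iq_rows(text, min_columns=2):
    """Return rows of floats from lines that parse entirely as numbers.

    Lines containing any non-numeric field (headers, meta-data) are skipped.
    """
    rows = []
    for line in text.splitlines():
        # Accept both comma-separated and whitespace-separated columns.
        fields = line.replace(",", " ").split()
        if len(fields) < min_columns:
            continue
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # header or meta-data line
    return rows
```

Such a heuristic is tolerant of many file layouts, but it can also pick up purely numeric meta-data lines, which is one reason a format with explicit markers remains attractive.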
Steve King (ISIS) presented his exercise of being confronted with analysing his own data on rope, and his subsequent self-introduction to the tools in CCP13 for analysing fibre diffraction data. This raised a discussion on the general problem of training even accomplished scientists to use existing software packages.
Problems of measuring weak signals (which is now possible with high fluxes) were raised by Adrian Rennie (King's College, London). When backgrounds are high the propagation of errors is needed in assessing final results. The same requirement holds when one investigates small effects due to applied fields. With more complex samples, such as fibres or large structures, e.g. ordered by shear, the SAS diffraction geometry (the measurement cuts through the Ewald sphere obliquely) requires correction before interpretation of the 2D SANS diffraction patterns. In systems with restricted order a number of effects lead to similar consequences on data, e.g. resolution/polydispersity, concentration/contrast, interactions/screening which must be resolved in the design of the experiment.
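The point about error propagation with high backgrounds can be made concrete with a minimal Python sketch. It assumes independent errors on sample and background and a hypothetical scale factor applied to the background; real reductions include further terms (transmission, normalisation) omitted here.

```python
import math

def subtract_background(sample, sample_err, background, background_err, scale=1.0):
    """Return (net, net_err) for net = sample - scale * background.

    Assumes the two measurements have independent errors, which then
    add in quadrature.
    """
    net = sample - scale * background
    net_err = math.sqrt(sample_err ** 2 + (scale * background_err) ** 2)
    return net, net_err
```

With 1050 counts on the sample and 1000 on the background (Poisson errors of about 32 each), the net signal of 50 carries an error of about 45, so the weak signal is barely significant even though each measurement alone is precise to about 3%.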
He gave the audience some cause for satisfaction on describing conventional polymer/solution experiments performed carefully at different facilities which gave essentially identical results using their local reduction and analysis procedures. This now can lead to further experiments, since small deviations from the model system can be reliably quantified. His last examples showed where on-line analysis could be used to optimise measurement conditions. The growth of silica particles occurs in multi-scale regimes when examined over a large dynamic range; the density fluctuations in heavy water near the critical point have rapidly changing correlation distances as the temperature is slowly varied. The dynamic ranges of these measurements require adjusting the instrument configurations during the sequence of measurements to ensure the resolution matches this distance scale.
The mixture of subjects discussed and personal interests of participants created openings for debate after each presentation. During working group sessions the sasCIF dictionary/NeXus dictionary was reviewed and new items proposed. Finding common descriptors for the complex TOF-SANS case appeared to require further consideration.
The activists in Grenoble, Didcot and Washington proposed continuing exchanging examples of XML data files; all agreed to look at the possibilities of using HDF-5 utilities to try and construct some simple example data files of complex measurements. These would serve as examples to display in Venice at SAS-2002.
The longer term activities of canSAS meetings should be to provide
- maintenance of (established) data standards
- presentations of new SAS analysis procedures
- instruction for introducing non-specialists to SAS
There was general agreement that canSAS was now ready to advance beyond the problem of formats and should serve as a forum for debating analysis techniques, assembling and disseminating information on programs, and especially attempting to demystify arcane reduction procedures for the many occasional SAS users. One suggestion was to create a website dedicated to these aims. There remains the problem of the location of such a site, and the manpower required to set up and maintain it.