Tofwerk (fibTOF FIB-SIMS)

Warning

This plugin and the file format documentation below were developed by reverse-engineering real acquisition files without access to official Tofwerk format specifications. Descriptions of HDF5 groups, attributes, and data layouts may be incomplete or incorrect. If you notice anything wrong, please open an issue on the RosettaSciIO issue tracker!

Reads HDF5 files (.h5) written by TofDAQ acquisition software for Tofwerk time-of-flight secondary ion mass spectrometry (ToF-SIMS) instruments, including the fibTOF FIB-SIMS system.

Acquired signals

A fibTOF acquisition produces two raw signal types and one derived signal:

Raw signals

  • EventList — the primary raw detector output. For each pixel in the 3-D raster (depth, y, x), it stores a variable-length array of Time-to-Digital Converter (TDC) timestamps, one per ion detection event. Each write (depth slice) mills a thin layer from the sample surface and then rasters the focused ion beam across the exposed face to collect SIMS spectra. The spatial axes are:

    • depth — one entry per FIB milling + SIMS acquisition cycle (NbrWrites).

    • y — rows of the 2-D SIMS raster scan (NbrSegments).

    • x — columns of the 2-D SIMS raster scan.

  • FIB SE images — secondary-electron images acquired simultaneously at the full FIB scan resolution (e.g. 256×256), one image per depth slice. The SE image resolution is typically higher than the SIMS raster (e.g. 128×128 for a “256×256 2×2” acquisition), so SE images and SIMS data have independent pixel sizes — both derived from FIBParams.ViewField (in mm) divided by the respective pixel count.

Derived signal

  • Peak data — the signal most users work with. The EventList is integrated over the mass windows defined in PeakData/PeakTable (nominal and user-defined peaks) to produce a 4-D array (depth, y, x, m/z) of per-peak ion counts. In pre-processed files this integration has already been performed by the Tofwerk software and the result is stored as PeakData/PeakData. For raw files it can be reconstructed on load (see signal="peak_data" below).

File states

Two file states are supported:

  • Pre-processed files (PeakData/PeakData present) – the Tofwerk software has already integrated the raw events into per-peak counts.

  • Raw files (no PeakData/PeakData) – unprocessed TofDAQ output as originally saved by the acquisition software. The "peak_data" signal can be reconstructed from the EventList data using the integration windows in PeakData/PeakTable, but this can be computationally intensive (see below).
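As a minimal sketch, the file state can be read off the presence of PeakData/PeakData. The helper name file_state is hypothetical (it is not part of rsciio); it accepts any object that supports membership tests on slash-separated paths, such as an h5py.File opened read-only.

```python
def file_state(h5file) -> str:
    """Classify an open TofDAQ HDF5 file as "pre-processed" or "raw".

    Hypothetical helper: `h5file` is anything supporting `"a/b" in h5file`,
    for example an h5py.File opened in read mode.
    """
    # Pre-processed files carry the integrated peak array; raw files do not.
    return "pre-processed" if "PeakData/PeakData" in h5file else "raw"
```

For example, inside `with h5py.File("acquisition.h5", "r") as f:` a call to `file_state(f)` returns one of the two labels.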

HDF5 file structure

All TofDAQ .h5 files share the following top-level layout. Entries marked [pre-processed only] are absent from raw files. FullSpectra/EventList is present in all raw files, and the Tofwerk software may also preserve it when pre-processing.

/                               root — acquisition-wide attributes and metadata
├── AcquisitionLog/
│   └── Log                     compound dataset; one row per log entry.
│                                 Log[0]['timestring'] is the authoritative
│                                 ISO-8601 acquisition timestamp. This node
│                                 also contains the acquisition "finish" time.
├── FIBImages/
│   ├── Image0000/
│   │   └── Data                float64 (H, W) — SE image at full FIB resolution
│   ├── Image0001/ ...
│   └── Image000N/              one subgroup per image; sorted lexicographically
├── FIBParams/                  group attributes — FIB column settings
├── FibParams/
│   └── FibPressure/
│       └── TwData              float64 (NbrWrites, 1) — chamber pressure in Pa
│                                 for every depth slice in the dataset
├── FullSpectra/
│   ├── MassAxis                float32 (NbrSamples,) — calibrated m/z in Da
│   ├── SumSpectrum             float64 (NbrSamples,) — cumulative ion counts
│   ├── SaturationWarning       uint8   (NbrWrites, NbrSegments)
│   │                             Per-buffer saturation flag: 0 = no saturation,
│   │                             1 = saturation detected. A value of 1 means the
│   │                             detector was overloaded during that
│   │                             (write, segment) combination, so the
│   │                             corresponding spectral data should be treated
│   │                             with caution.
│   └── EventList               vlen uint16 (NbrWrites, NbrSegments, NbrX)
│                                 each pixel contains one variable-length TDC
│                                 timestamp array
├── PeakData/
│   ├── PeakTable               compound (NbrPeaks,) — peak definitions:
│   │                             label, mass (Da), lower/upper integration limits.
│   │                             Will contain "nominal" peaks created by the TofDAQ
│   │                             software, together with any user-defined peak integration
│   │                             windows created before the file was saved.
│   └── PeakData                float32 (NbrWrites, NbrSegments, NbrX, NbrPeaks)
│                                 [pre-processed only] — peak-integrated ion counts
├── TimingData/
│   ├── BufTimes                float64 (NbrWrites, NbrSegments) — wall-clock
│   │                             timestamps per buffer, in seconds
│   └── (group attributes)      TofPeriod — Analog-Digital Converter (ADC) samples per ToF pulse
└── TPS2/                       instrument telemetry, sampled once per scan line.
    ├── TwData                  float64 (NbrWrites, NbrSegments, 75) — target
    │                             and monitored voltages/flags for 75 named
    │                             channels.  Mostly low-level instrument debug
    │                             data not relevant to signal processing.
    │                             Channels of potential diagnostic interest:
    │                             MCP bias monitor, filament emission monitor,
    │                             and ToF pulser voltages.
    └── TwInfo                  |S256 (75,) — channel names and units
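Because SaturationWarning holds one flag per (write, segment) buffer while the peak array is indexed (depth, y, x, m/z), each flag can be broadcast over the x and m/z axes to exclude affected scan lines. A minimal NumPy sketch, assuming both arrays are already in memory with matching depth and y extents; the function name mask_saturated and the choice to replace flagged values with NaN are illustrative, not what the reader does.

```python
import numpy as np


def mask_saturated(peak_data, saturation_warning):
    """Replace peak counts from saturated (write, segment) buffers with NaN.

    Hypothetical helper.
    peak_data:          (NbrWrites, NbrSegments, NbrX, NbrPeaks) array
    saturation_warning: (NbrWrites, NbrSegments) uint8 flags, 1 = saturated
    """
    saturated = np.asarray(saturation_warning).astype(bool)
    # Add two trailing axes so each scan-line flag covers every x and m/z bin.
    return np.where(saturated[:, :, None, None], np.nan, peak_data.astype(np.float32))
```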

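Key root attributes:

==============  =======  ============================================================
Attribute       Type     Description
==============  =======  ============================================================
TofDAQ Version  float32  Primary format-detection marker (e.g. 1.99).
NbrWrites       int32    Number of depth slices (milling steps).
NbrSegments     int32    Number of Y scan lines per write.
NbrSamples      int32    Number of bins (samples) per spectrum (length of MassAxis).
NbrPeaks        int32    Number of peaks in the peak table.
NbrWaveforms    int32    Hardware waveform averaging count (normally 1).
IonMode         bytes    b"positive" or b"negative".
==============  =======  ============================================================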

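Key FIBParams group attributes:

===========  =======  ==========================================================
Attribute    Type     Description
===========  =======  ==========================================================
FibHardware  bytes    FIB platform identifier (e.g. b"Tescan").
Voltage      float64  Primary ion beam voltage in V (divide by 1000 for kV).
Current      float64  Ion beam current in A (zero if not measured).
ViewField    float64  Field of view in mm. Multiply by 1000 and divide by the
                      pixel count to obtain the pixel size in µm, separately
                      for the SIMS raster and the FIB SE images.
===========  =======  ==========================================================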
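The unit conversions in this table can be sketched as a small helper that turns raw FIBParams attributes into derived values. The helper name fib_summary and the example values (30 kV, a 0.05 mm field of view, a 128-pixel SIMS raster and 256-pixel SE images) are hypothetical; the key names loosely mirror the metadata fields listed under Metadata, but this is not the reader's own code.

```python
def fib_summary(attrs, sims_pixels, se_pixels):
    """Hypothetical helper: derive readable values from FIBParams attributes.

    attrs: mapping such as h5py's group.attrs (bytes values may arrive as
    numpy.bytes_, which is a subclass of bytes).
    """
    hardware = attrs["FibHardware"]
    if isinstance(hardware, bytes):
        hardware = hardware.decode()
    view_field_mm = float(attrs["ViewField"])
    return {
        "hardware": hardware,
        "voltage_kV": float(attrs["Voltage"]) / 1000.0,  # V -> kV
        "current_A": float(attrs["Current"]),            # 0.0 if not measured
        "view_field_mm": view_field_mm,
        # mm -> µm, then divide by the number of pixels along one side
        "sims_pixel_size_um": view_field_mm * 1000.0 / sims_pixels,
        "se_pixel_size_um": view_field_mm * 1000.0 / se_pixels,
    }


example = fib_summary(
    {"FibHardware": b"Tescan", "Voltage": 30000.0, "Current": 0.0, "ViewField": 0.05},
    sims_pixels=128,
    se_pixels=256,
)
```

With these illustrative inputs the SIMS pixels come out twice as large as the SE pixels, matching the "256×256 2×2" case described above.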

Inspecting available signals

Use rsciio.tofwerk.available_signals() to check which signals a file contains before loading:

from rsciio.tofwerk import available_signals

available_signals("acquisition.h5")
# ['sum_spectrum', 'peak_data', 'fib_images']          # pre-processed file
# ['sum_spectrum', 'peak_data', 'event_list', 'fib_images']  # raw file

Selecting signals

The signal parameter controls which signals are returned. The default is "sum_spectrum", which is always fast.

import hyperspy.api as hs

# Default: 1-D cumulative spectrum (fast, always available)
s = hs.load("acquisition.h5", file_format="Tofwerk")

# 4-D peak array (depth × y × x × m/z)
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="peak_data")

# FIB SE image stack (depth × y × x), full FIB scan resolution
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="fib_images")

# Raw TDC timestamps as a ragged signal
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="event_list")

# Multiple signals at once
signals = hs.load(
    "acquisition.h5",
    file_format="Tofwerk",
    signal=["sum_spectrum", "peak_data"],
)

# All signals available for this file
signals = hs.load("acquisition.h5", file_format="Tofwerk", signal="all")

Valid values for signal:

"sum_spectrum" (default)

1-D cumulative spectrum from FullSpectra/SumSpectrum. Always available. Supports lazy loading.

"peak_data"

4-D array (depth, y, x, m/z).

  • Pre-processed files: reads PeakData/PeakData directly. Supports lazy loading (recommended for large files).

  • Raw files: reconstructs the 4-D array by walking the variable-length FullSpectra/EventList and integrating events within each peak window. Always eager. On large files this can take several minutes; install tqdm for a progress bar and numba for a ~19× speed-up.

"event_list"

Ragged object array (depth, y, x) of raw uint16 TDC timestamps, one variable-length array per pixel. Present in all raw files; also available in pre-processed files if the Tofwerk software did not remove it. Supports lazy loading. HyperSpy represents it as a ragged signal and does not support plotting it directly.

"fib_images"

3-D stack (depth, y, x) of secondary-electron images at full FIB scan resolution. Available in FIB-SIMS files that contain a FIBImages group. Supports lazy loading. If any images have a non-dominant shape (e.g. a truncated final frame), they are skipped with a warning.

"all"

All signals available for the file. Use available_signals() to see the list in advance.
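Because "event_list" arrives as an object array holding one uint16 array per pixel, ordinary NumPy reductions do not apply to it directly. A minimal sketch of a per-pixel event-count map, using a small synthetic array in place of a real file (events_per_pixel is a hypothetical helper):

```python
import numpy as np


def events_per_pixel(event_list):
    """Count raw TDC events in every pixel of a ragged (depth, y, x) object array."""
    # np.vectorize calls len() on each per-pixel array and returns an int array.
    return np.vectorize(len, otypes=[np.int64])(event_list)


# Synthetic 2 x 2 x 2 event list: every pixel starts empty...
ev = np.empty((2, 2, 2), dtype=object)
for idx in np.ndindex(ev.shape):
    ev[idx] = np.empty(0, dtype=np.uint16)
# ...and one pixel receives three timestamps.
ev[0, 0, 0] = np.array([1200, 5400, 5410], dtype=np.uint16)

counts = events_per_pixel(ev)
```

These are raw event counts: as described for compute_peak_data_from_eventlist, each ion is recorded once per waveform per active channel, so the values are not yet normalized.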

Subsetting

Use mz_range and depth_range to read only a portion of the data:

# Load only peaks between 20 and 100 Da
s = hs.load("acquisition.h5", file_format="Tofwerk",
            signal="peak_data", mz_range=(20.0, 100.0))

# Load only depth slices 10–19 (exclusive upper bound)
s = hs.load("acquisition.h5", file_format="Tofwerk",
            signal="peak_data", depth_range=(10, 20))

For pre-processed files, depth_range slices the HDF5 dataset before reading, and mz_range keeps only the selected peak columns. For raw files, mz_range also reduces reconstruction cost because only the selected peaks need to be counted. depth_range applies to "peak_data", "event_list", and "fib_images".
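The documented semantics (inclusive m/z bounds matched against each peak's mass, 0-indexed depth slices with an exclusive upper bound) can be sketched in NumPy on an in-memory array. subset_peak_data is a hypothetical helper, not the reader's implementation, and it ignores the on-disk slicing the reader performs.

```python
import numpy as np


def subset_peak_data(peak_data, peak_masses, mz_range=None, depth_range=None):
    """Hypothetical re-implementation of the documented subsetting rules.

    peak_data:   (depth, y, x, npeaks) array
    peak_masses: (npeaks,) mass of each peak in Da
    """
    masses = np.asarray(peak_masses)
    if mz_range is None:
        cols = np.arange(masses.size)
    else:
        lo, hi = mz_range
        cols = np.flatnonzero((masses >= lo) & (masses <= hi))  # inclusive bounds
    # depth_range behaves like a Python slice: exclusive upper bound
    start, stop = (0, peak_data.shape[0]) if depth_range is None else depth_range
    return peak_data[start:stop][..., cols], masses[cols]
```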

Use dtype to cast "peak_data" after loading, e.g. dtype=np.uint16 for low-count integer data or dtype=np.float16 to halve memory use at reduced precision.
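Because raw event counts are divided by NbrWaveforms × NActiveChannels, peak counts are not always whole numbers, and casting them to an integer dtype drops the fractional part. The sketch below shows plain NumPy astype behaviour on illustrative values; it assumes, without confirming, that the reader's dtype cast behaves the same way.

```python
import numpy as np

# Illustrative normalised counts; fractional values arise from the division
# by NbrWaveforms x NActiveChannels.
counts = np.array([0.5, 1.0, 2.75, 3.0], dtype=np.float32)

as_uint16 = counts.astype(np.uint16)    # float -> unsigned int truncates toward zero
as_float16 = counts.astype(np.float16)  # half precision: half the bytes of float32
```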

Lazy loading

Pass lazy=True to defer reading large arrays until .compute() is called:

s = hs.load(
    "large_acquisition.h5",
    file_format="Tofwerk",
    lazy=True,
    signal="peak_data",
    chunks="auto",   # or a tuple / dict for manual chunking
)
# s.data is a dask array; inspect size before computing:
print(s.data.nbytes / 1e9, "GB")
s.compute()

Lazy loading is supported for "sum_spectrum", "peak_data" (pre-processed files), "event_list", and "fib_images". Reconstructing "peak_data" from a raw file’s EventList is always eager.
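To judge whether a computed "peak_data" array will fit in memory, its size follows directly from the 4-D shape and the dtype. A sketch with an illustrative acquisition size (200 depth slices, a 128×128 SIMS raster and 300 peaks, not taken from a real file); peak_data_nbytes is a hypothetical helper.

```python
import numpy as np


def peak_data_nbytes(nbr_writes, nbr_segments, nbr_x, nbr_peaks, dtype=np.float32):
    """Bytes needed to hold a (depth, y, x, m/z) peak array in memory."""
    return nbr_writes * nbr_segments * nbr_x * nbr_peaks * np.dtype(dtype).itemsize


size = peak_data_nbytes(200, 128, 128, 300)  # illustrative acquisition size
print(f"{size / 1e9:.2f} GB")  # prints "3.93 GB"
```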

Optional dependencies

  • numba: JIT-accelerates EventList reconstruction (~19× faster than the NumPy fallback on large datasets).

  • tqdm: displays progress bars during EventList reconstruction.

Pass show_progressbar=False to suppress progress output entirely.

Performance tips

  • EventList reconstruction (raw files, signal="peak_data") can be slow on large datasets. Install numba for a large speed-up, and use mz_range to limit reconstruction to the peaks of interest. Adjust peak_data_batch_size to process multiple depth slices per batch (trades memory for fewer Python iterations).

  • Pre-processed files with unsorted peaks: if PeakData/PeakTable peaks are not in ascending m/z order, the reader sorts them in place. peak_data_batch_size controls how many depth slices are sorted per iteration; smaller values keep working arrays in CPU cache.
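The batched reordering can be sketched as one argsort over the peak masses, applied to a few depth slices at a time. sort_peaks_batched is a hypothetical helper that writes into a new array rather than sorting in place, so it illustrates the access pattern rather than reproducing the reader.

```python
import numpy as np


def sort_peaks_batched(peak_data, peak_masses, batch_size=1):
    """Reorder the peak axis into ascending mass, batch_size depth slices at a time."""
    order = np.argsort(peak_masses, kind="stable")
    out = np.empty_like(peak_data)
    for start in range(0, peak_data.shape[0], batch_size):
        stop = min(start + batch_size, peak_data.shape[0])
        # Only this batch of depth slices is permuted, keeping the working set small.
        out[start:stop] = peak_data[start:stop][..., order]
    return out, np.asarray(peak_masses)[order]
```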

Metadata

FIB-SIMS specific fields are stored under Acquisition_instrument.FIB_SIMS:

meta = s.metadata.Acquisition_instrument.FIB_SIMS

meta.file_type              # "pre-processed" or "raw"
meta.FIB.hardware           # FIB hardware identifier (e.g. "Tescan")
meta.FIB.voltage_kV         # Ion beam accelerating voltage in kV
meta.FIB.current_A          # Ion beam current in A
meta.FIB.view_field_mm      # Field of view in mm
meta.FIB.pixel_size_um      # SIMS pixel size in µm
meta.ToF.ion_mode           # "positive" or "negative"
meta.DAQ.tofdaq_version     # TofDAQ software version string

Note

The .h5 extension is shared with other formats (EMD, Arina, etc.). The plugin identifies Tofwerk files by checking for TofDAQ-specific HDF5 groups (FullSpectra, TimingData, AcquisitionLog) and the TofDAQ Version root attribute. Specify file_format="Tofwerk" explicitly if automatic detection fails.
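A heuristic check along these lines can be sketched as follows. It assumes that the root attribute and all three groups must be present; the exact rule the plugin applies is not documented here. looks_like_tofdaq is a hypothetical name, and h5file is any object with an attrs mapping and membership tests, such as an open h5py.File.

```python
TOFDAQ_GROUPS = ("FullSpectra", "TimingData", "AcquisitionLog")


def looks_like_tofdaq(h5file) -> bool:
    """Hypothetical heuristic: TofDAQ root attribute plus TofDAQ-specific groups."""
    has_version = "TofDAQ Version" in h5file.attrs
    return has_version and all(group in h5file for group in TOFDAQ_GROUPS)
```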

API functions

rsciio.tofwerk.available_signals(filename: str | Path) → list[str]

Return the list of signal names available in a Tofwerk TofDAQ HDF5 file.

This is a fast, read-only inspection call — no large data arrays are loaded.

Parameters:
filename : str or pathlib.Path

Path to the .h5 file.

Returns:
list of str

Subset of ["sum_spectrum", "nominal_peak_data", "additional_peak_data", "event_list", "fib_images"] depending on what is present in the file.

Raises:
IOError

If the file is not a Tofwerk TofDAQ HDF5 file.

Examples

>>> from rsciio.tofwerk import available_signals
>>> available_signals("my_acquisition.h5")
['sum_spectrum', 'nominal_peak_data', 'fib_images']

rsciio.tofwerk.compute_peak_data_from_eventlist(event_list: numpy.ndarray, mass_axis: numpy.ndarray, nbr_samples: int, clock_ratio: int, normalization: int, peak_table: list[dict[str, typing.Any]], show_progressbar: bool = True, dtype: numpy.dtype = numpy.uint16) → numpy.ndarray

Reconstruct the 4-D peak-integrated array from raw EventList data.

This is intended to replicate the processing that Tofwerk's proprietary software performs when opening a raw file:

  1. For each pixel’s variable-length EventList (raw TDC timestamps), convert timestamps to ADC sample indices by integer division with the TDC-to-ADC clock ratio (SampleInterval / ClockPeriod, typically 64).

  2. Look up the calibrated mass for each sample index via MassAxis.

  3. Count events that fall within each peak’s integration window.

  4. Divide by NbrWaveforms × NActiveChannels — the number of times each physical ion is recorded in the EventList (once per ToF cycle per active recording channel).
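For a single pixel, the four steps can be sketched in NumPy. integrate_pixel is a hypothetical helper, and treating both integration limits as inclusive is an assumption, since the exact boundary convention is not documented.

```python
import numpy as np


def integrate_pixel(timestamps, mass_axis, clock_ratio, normalization, windows):
    """Integrate one pixel's raw TDC timestamps into per-peak ion counts.

    Hypothetical helper. windows: sequence of (lower, upper) limits in Da.
    """
    nbr_samples = mass_axis.shape[0]
    # Step 1: TDC timestamp -> ADC sample index by integer division.
    idx = np.asarray(timestamps, dtype=np.int64) // clock_ratio
    idx = idx[(idx >= 0) & (idx < nbr_samples)]  # discard out-of-range events
    # Step 2: look up the calibrated mass of each remaining event.
    masses = mass_axis[idx]
    # Step 3: count events inside each window (both limits inclusive here).
    counts = np.array(
        [np.count_nonzero((masses >= lo) & (masses <= hi)) for lo, hi in windows],
        dtype=np.float32,
    )
    # Step 4: undo the duplication across waveforms and active channels.
    return counts / normalization
```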

Note

If numba is installed, each depth slice is read in a single h5py call (instead of one call per row), then each row is packed into a flat int64 buffer and processed by a JIT-compiled kernel. Otherwise, the function falls back to a vectorised NumPy path that allocates per-pixel arrays but is still significantly faster than a pure-Python triple loop.
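The flat-buffer layout mentioned in the note can be sketched as follows: concatenate a row's variable-length arrays into one int64 buffer, keep an offsets array marking where each pixel starts, and then loop over it with plain indexing. The helpers pack_row and count_row are hypothetical, and count_row runs here as ordinary Python; its explicit loops are the style of code numba.njit is designed to compile, but this is not the plugin's kernel.

```python
import numpy as np


def pack_row(row):
    """Pack one row of variable-length timestamp arrays into (flat, offsets)."""
    lengths = np.array([len(a) for a in row], dtype=np.int64)
    offsets = np.zeros(len(row) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])  # pixel px spans flat[offsets[px]:offsets[px + 1]]
    parts = [np.asarray(a, dtype=np.int64) for a in row]
    flat = np.concatenate(parts) if parts else np.empty(0, dtype=np.int64)
    return flat, offsets


def count_row(flat, offsets, mass_axis, clock_ratio, lower, upper, out):
    """Accumulate per-pixel, per-peak event counts into out (shape: npixels x npeaks)."""
    nbr_samples = mass_axis.shape[0]
    for px in range(offsets.shape[0] - 1):
        for j in range(offsets[px], offsets[px + 1]):
            s = flat[j] // clock_ratio  # TDC timestamp -> ADC sample index
            if s < 0 or s >= nbr_samples:
                continue
            m = mass_axis[s]
            for k in range(lower.shape[0]):
                if lower[k] <= m <= upper[k]:
                    out[px, k] += 1.0


row = [np.array([0, 64], np.uint16), np.array([], np.uint16), np.array([128], np.uint16)]
flat, offsets = pack_row(row)
out = np.zeros((len(row), 1), dtype=np.float32)
count_row(flat, offsets, np.arange(11.0), 64, np.array([0.5]), np.array([2.5]), out)
```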

Parameters:
event_list : array_like

Ragged object array of shape (nwrites, nsegs, nx) of uint16 TDC timestamps, one variable-length array per pixel. Accepts h5py variable-length datasets or numpy object arrays (as loaded by signal="event_list").

mass_axis : numpy.ndarray

1-D array of shape (nbr_samples,) with calibrated mass (Da) for each ADC sample index, from FullSpectra/MassAxis.

nbr_samples : int

Length of mass_axis; events whose ADC sample index falls outside [0, nbr_samples) are discarded.

clock_ratio : int

TDC-to-ADC sample index divisor (SampleInterval / ClockPeriod). Pass 1 if ClockPeriod is not available.

normalization : int

NbrWaveforms × NActiveChannels — the divisor used to convert raw event counts to per-waveform ion counts.

peak_table : list of dict

Integration windows. Each dict must have keys lower_integration_limit and upper_integration_limit (Da).

show_progressbar : bool, default=True

Whether to show tqdm progress bars during reconstruction.

dtype : numpy.dtype, default=np.uint16

Output array dtype. Counts are accumulated as float32 internally (for the normalization division) and cast to this dtype on return.

Returns:
numpy.ndarray

Shape (nwrites, nsegs, nx, npeaks), cast to dtype. Identical in structure to PeakData/PeakData in a pre-processed file.

rsciio.tofwerk.count_active_channels(ini: str) → int

Return the number of active TDC recording channels from the INI config string.

TofDAQ records events from multiple ADC/TDC channels simultaneously (e.g. Ch1 and Ch3 in a typical fibTOF setup). Each active channel contributes an independent copy of every ion event to the EventList, so the raw event count per pixel is multiplied by this factor.

Parameters:
ini : str

Decoded text from the HDF5 root attribute file.attrs["Configuration File Contents"]. This is the TofDAQ configuration stored as an INI-style string, for example containing entries such as Ch1Record=1 and Ch2Record=0.

Returns:
int

Number of active channels (at least 1).

Examples

Read the embedded TofDAQ configuration from a file and count the active recording channels:

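>>> import h5py
>>> from rsciio.tofwerk import count_active_channels
>>> with h5py.File("my_acquisition.h5", "r") as f:
...     raw = f.attrs["Configuration File Contents"]
>>> ini = raw.decode() if isinstance(raw, bytes) else str(raw)
>>> count_active_channels(ini)
2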

A minimal INI-style example:

>>> count_active_channels("[TOFParameter]\nCh1Record=1\nCh2Record=0\nCh3Record=1")
2
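A minimal sketch of the counting rule, assuming that the active channels are exactly those with a ChNRecord entry set to a non-zero value (inferred from the Ch1Record=1 / Ch2Record=0 example, not from a format specification). count_record_channels is a hypothetical stand-in, not rsciio's implementation.

```python
import re

# Matches lines such as "Ch3Record=1", capturing the channel number and the flag.
_RECORD_RE = re.compile(r"^\s*Ch(\d+)Record\s*=\s*(\d+)\s*$", re.MULTILINE)


def count_record_channels(ini: str) -> int:
    """Count channels whose ChNRecord flag is non-zero; never return less than 1."""
    active = {int(ch) for ch, flag in _RECORD_RE.findall(ini) if int(flag) != 0}
    return max(len(active), 1)
```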

rsciio.tofwerk.file_reader(filename: str | Path, lazy: bool = False, signal: str | list[str] = 'sum_spectrum', chunks: tuple | int | dict | str = 'auto', mz_range: tuple[float, float] | None = None, depth_range: tuple[int, int] | None = None, dtype: numpy.dtype | None = None, show_progressbar: bool = True, peak_data_batch_size: int = 1) → list[dict[str, Any]]

Read a Tofwerk TofDAQ HDF5 file.

Parameters:
filename : str, pathlib.Path

Filename of the file to read or corresponding pathlib.Path.

lazy : bool, default=False

Whether to open the file lazily or not. The file will stay open until closed in compute() or closed manually. get_file_handle() can be used to access the file handle and close it manually.

signal : str or list of str, optional

Which signal(s) to return. Valid values:

"sum_spectrum" (default)

1-D cumulative spectrum from FullSpectra/SumSpectrum. Always available.

"nominal_peak_data"

4-D array (depth, y, x, m/z) for Tofwerk-generated peaks (labels starting with "nominal"). For pre-processed files, reads the relevant columns of PeakData/PeakData directly. For raw files, reconstructs from FullSpectra/EventList.

"additional_peak_data"

4-D array (depth, y, x, m/z) for user-defined custom peaks (labels not starting with "nominal"). Only available when the PeakData/PeakTable contains at least one such entry. For pre-processed files reads from PeakData/PeakData directly; for raw files reconstructs from FullSpectra/EventList.

"event_list"

Ragged object array (depth, y, x) of raw uint16 TDC timestamps. Present in raw files; may also be present in pre-processed files.

"fib_images"

3-D array (depth, y, x) of secondary-electron images at full FIB scan resolution (e.g. 256×256). Available only in FIB-SIMS files that contain a FIBImages group.

"all"

All signals available for the file.

Pass a list to request multiple specific signals, e.g. signal=["sum_spectrum", "nominal_peak_data"]. The returned list has one entry per requested signal (or all available for "all").

chunks : tuple, int, dict or str, default="auto"

The chunks used when reading the data lazily. This argument is passed to the dask.array.core.normalize_chunks() function.

mz_range : tuple, optional

Restrict the m/z axis of "peak_data" to peaks whose nominal mass falls within [mz_range[0], mz_range[1]] (inclusive, Da). For pre-processed files only the selected columns are retained after reading; for raw files only the selected peaks are reconstructed from the EventList, so the reconstruction cost scales with the number of selected peaks. If None (default), all peaks are returned.

depth_range : tuple, optional

Restrict the depth axis to slices [depth_range[0], depth_range[1]) (0-indexed, exclusive upper bound). Applies to "peak_data", "event_list", and "fib_images". For pre-processed files the HDF5 dataset is sliced before loading, so only the requested depth slices are read from disk. If None (default), all depth slices are returned.

dtype : numpy.dtype, optional

Cast the "peak_data" array to this dtype after loading. Useful to reduce memory usage, e.g. dtype=np.float16 or dtype=np.uint16 for low-count data. If None (default), the on-disk dtype (float32) is preserved.

show_progressbar : bool, default=True

Whether to show the progress bar or not.

peak_data_batch_size : int, optional

Number of depth slices to read and permute per iteration when loading "peak_data" from a pre-processed file whose peaks are not already in ascending mass order. Smaller batches keep the sort permutation working on arrays that fit in CPU cache, which is faster for large datasets. Default is 1 (one slice at a time). Has no effect when peaks are already sorted or when lazy=True.

Returns:
list of dict

List of dictionaries containing the following fields:

  • ‘data’ – multidimensional numpy.ndarray or dask.array.Array

  • ‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector

  • ‘metadata’ – dictionary containing the parsed metadata

  • ‘original_metadata’ – dictionary containing the full metadata tree from the input file

When the file contains several datasets, each dataset will be loaded as a separate dictionary.

Raises:
IOError

If the file is not a Tofwerk TofDAQ HDF5 file.

ValueError

If signal contains an unrecognised value, or if mz_range or depth_range are out of bounds, or if an explicitly requested signal is not available in the file.

NotImplementedError

If lazy=True and signal="nominal_peak_data" or signal="additional_peak_data" is requested for a raw file, since these signals then require eager EventList reconstruction.