Tofwerk (fibTOF FIB-SIMS)#
Warning
This plugin and the file format documentation below were developed by reverse-engineering real acquisition files without access to official Tofwerk format specifications. Descriptions of HDF5 groups, attributes, and data layouts may be incomplete or incorrect. If you notice anything wrong, please open an issue on the RosettaSciIO issue tracker!
Reads HDF5 files (.h5) written by TofDAQ acquisition software for Tofwerk time-of-flight secondary
ion mass spectrometry (ToF-SIMS) instruments, including the fibTOF FIB-SIMS system.
Acquired signals#
A fibTOF acquisition produces two raw signal types and one derived signal:
Raw signals
EventList: the primary raw detector output. For each pixel in the 3-D raster (depth, y, x), it stores a variable-length array of Time-to-Digital Converter (TDC) timestamps, one per individual ion detection event. Each write (depth slice) mills a thin layer from the sample surface and then rasters the FIB beam across the exposed face to collect SIMS spectra. The spatial axes are:
- depth: one entry per FIB milling + SIMS acquisition cycle (NbrWrites).
- y: rows of the 2-D SIMS raster scan (NbrSegments).
- x: columns of the 2-D SIMS raster scan.
FIB SE images: secondary-electron images acquired simultaneously at the full FIB scan resolution (e.g. 256×256), one image per depth slice. The SE image resolution is typically higher than that of the SIMS raster (e.g. 128×128 for a “256×256 2×2” acquisition), so SE images and SIMS data have independent pixel sizes, both derived from FIBParams.ViewField (in mm) divided by the respective pixel count.
Derived signal
Peak data: the signal most users work with. The EventList is integrated over user-defined mass windows (PeakData/PeakTable) to produce a 4-D array (depth, y, x, m/z) of per-peak ion counts. In pre-processed files this integration has already been performed by the Tofwerk software and the result is stored as PeakData/PeakData. For raw files it can be reconstructed on load (see signal="peak_data" below).
File states#
Two file states are supported:
- Pre-processed files (PeakData/PeakData present): the Tofwerk software has already integrated the raw events into per-peak counts.
- Raw files (no PeakData/PeakData): unprocessed TofDAQ output as originally saved by the acquisition software. The "peak_data" signal can be reconstructed from the EventList data using the integration windows in PeakData/PeakTable, but this can be computationally intensive (see below).
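As a minimal sketch of how the two states can be told apart, the check below tests for the PeakData/PeakData dataset. The function name classify_file_state is hypothetical and not part of the plugin. It accepts any object that supports path membership tests, which an open h5py.File does; plain sets of paths stand in for real files here.

```python
def classify_file_state(h5):
    """Return "pre-processed" if PeakData/PeakData exists, else "raw".

    `h5` is anything supporting ``"path" in h5``, e.g. an open h5py.File.
    (Hypothetical helper, for illustration only.)
    """
    return "pre-processed" if "PeakData/PeakData" in h5 else "raw"


# Stand-ins for real files: sets of HDF5 paths (illustration only).
preprocessed = {"FullSpectra/SumSpectrum", "PeakData/PeakTable", "PeakData/PeakData"}
raw = {"FullSpectra/SumSpectrum", "FullSpectra/EventList", "PeakData/PeakTable"}

print(classify_file_state(preprocessed))  # pre-processed
print(classify_file_state(raw))           # raw
```

With a real file, the same call could be made inside a `with h5py.File(path, "r") as f:` block.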
HDF5 file structure#
All TofDAQ .h5 files share the following top-level layout. Groups marked
[raw only] are absent from pre-processed files; groups marked
[pre-processed only] are absent from raw files (though the Tofwerk
software may preserve EventList when pre-processing).
/ root — acquisition-wide attributes and metadata
├── AcquisitionLog/
│ └── Log compound dataset; one row per log entry.
│ Log[0]['timestring'] is the authoritative
│ ISO-8601 acquisition timestamp. This node
│ also contains the acquisition "finish" time.
├── FIBImages/
│ ├── Image0000/
│ │ └── Data float64 (H, W) — SE image at full FIB resolution
│ ├── Image0001/ ...
│ └── Image000N/ one subgroup per image; sorted lexicographically
├── FIBParams/ group attributes — FIB column settings
├── FibParams/
│ └── FibPressure/
│ └── TwData float64 (NbrWrites, 1) — chamber pressure in Pa
│ for every depth slice in the dataset
├── FullSpectra/
│ ├── MassAxis float32 (NbrSamples,) — calibrated m/z in Da
│ ├── SumSpectrum float64 (NbrSamples,) — cumulative ion counts
│ ├── SaturationWarning uint8 (NbrWrites, NbrSegments)
│ │ Per-buffer saturation flag: 0 = no saturation,
│ │ 1 = saturation detected. High values
│ │ indicate that the detector was overloaded during that
│ │ (write, segment) combination and the corresponding
│ │ spectral data should be treated with caution.
│ └── EventList vlen uint16 (NbrWrites, NbrSegments, NbrX)
│ each pixel contains one variable-length TDC
│ timestamp array
├── PeakData/
│ ├── PeakTable compound (NbrPeaks,) — peak definitions:
│ │ label, mass (Da), lower/upper integration limits.
│ │ Will contain "nominal" peaks created by the TofDAQ
│ │ software, together with any user-defined peak integration
│ │ windows created before the file was saved
│ └── PeakData float32 (NbrWrites, NbrSegments, NbrX, NbrPeaks)
│ [pre-processed only] — peak-integrated ion counts
├── TimingData/
│ ├── BufTimes float64 (NbrWrites, NbrSegments) — wall-clock
│ │ timestamps per buffer, in seconds
│ └── (group attributes) TofPeriod — Analog-Digital Converter (ADC) samples per ToF pulse
└── TPS2/ instrument telemetry — Sampled once per scan line.
├── TwData float64 (NbrWrites, NbrSegments, 75) — target
│ and monitored voltages/flags for 75 named
│ channels. Mostly low-level instrument debug
│ data not relevant to signal processing.
│ Channels of potential diagnostic interest:
│ MCP bias monitor, filament emission monitor,
│ and ToF pulser voltages.
└── TwInfo |S256 (75,) — channel names and units
Key root attributes:
| Attribute | Type | Description |
|---|---|---|
| | float32 | Primary format-detection marker (e.g. |
| | int32 | Number of depth slices (milling steps). |
| | int32 | Number of Y scan lines per write. |
| | int32 | Number of bins (samples) per spectrum (length of |
| | int32 | Number of peaks in the peak table. |
| | int32 | Hardware waveform averaging count (normally 1). |
| | bytes | |
Key FIBParams group attributes:
| Attribute | Type | Description |
|---|---|---|
| | bytes | FIB platform identifier (e.g. |
| | float64 | Primary ion beam voltage in V (divide by 1000 for kV). |
| | float64 | Ion beam current in A (zero if not measured). |
| | float64 | Field of view in mm. Divide by the pixel count (and convert mm to µm) to obtain the pixel size in µm, separately for the SIMS raster and the FIB SE images. |
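The pixel-size arithmetic can be sketched as follows. The helper pixel_size_um and the 0.05 mm field of view are made up for illustration; the 256 and 128 pixel counts follow the “256×256 2×2” example given earlier. The sketch assumes a square scan, so one pixel count per side is enough.

```python
def pixel_size_um(view_field_mm, n_pixels):
    """Pixel size in micrometres for a square scan with n_pixels per side.

    Hypothetical helper: converts the field of view from mm to µm,
    then divides by the number of pixels along one side.
    """
    if n_pixels <= 0:
        raise ValueError("n_pixels must be positive")
    return view_field_mm * 1000.0 / n_pixels


view_field_mm = 0.05  # assumed 50 µm field of view (illustrative value)

se_pixel = pixel_size_um(view_field_mm, 256)    # FIB SE image, 256 x 256
sims_pixel = pixel_size_um(view_field_mm, 128)  # SIMS raster, 128 x 128

print(f"SE image pixel:    {se_pixel:.4f} um")    # 0.1953 um
print(f"SIMS raster pixel: {sims_pixel:.4f} um")  # 0.3906 um
```

The same field of view gives two different pixel sizes, which is why SE images and SIMS data carry independent spatial calibrations.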
Inspecting available signals#
Use rsciio.tofwerk.available_signals() to check which signals a file
contains before loading:
from rsciio.tofwerk import available_signals
available_signals("acquisition.h5")
# ['sum_spectrum', 'peak_data', 'fib_images'] # pre-processed file
# ['sum_spectrum', 'peak_data', 'event_list', 'fib_images'] # raw file
Selecting signals#
The signal parameter controls which signals are returned. The default is
"sum_spectrum", which is always fast.
import hyperspy.api as hs
# Default: 1-D cumulative spectrum (fast, always available)
s = hs.load("acquisition.h5", file_format="Tofwerk")
# 4-D peak array (depth × y × x × m/z)
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="peak_data")
# FIB SE image stack (depth × y × x), full FIB scan resolution
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="fib_images")
# Raw TDC timestamps as a ragged signal
s = hs.load("acquisition.h5", file_format="Tofwerk", signal="event_list")
# Multiple signals at once
signals = hs.load(
    "acquisition.h5",
    file_format="Tofwerk",
    signal=["sum_spectrum", "peak_data"],
)
# All signals available for this file
signals = hs.load("acquisition.h5", file_format="Tofwerk", signal="all")
Valid values for signal:
- "sum_spectrum" (default): 1-D cumulative spectrum from FullSpectra/SumSpectrum. Always available. Supports lazy loading.
- "peak_data": 4-D array (depth, y, x, m/z).
  - Pre-processed files: reads PeakData/PeakData directly. Supports lazy loading (recommended for large files).
  - Raw files: reconstructs the 4-D array by walking the variable-length FullSpectra/EventList and integrating events within each peak window. Always eager. On large files this can take several minutes; install tqdm for a progress bar and numba for a ~19× speed-up.
- "event_list": ragged object array (depth, y, x) of raw uint16 TDC timestamps, one variable-length array per pixel. Present in all raw files; also available in pre-processed files if the Tofwerk software did not remove it. Supports lazy loading. HyperSpy represents it as a ragged signal and does not support plotting it directly.
- "fib_images": 3-D stack (depth, y, x) of secondary-electron images at full FIB scan resolution. Available in FIB-SIMS files that contain a FIBImages group. Supports lazy loading. If any images have a non-dominant shape (e.g. a truncated final frame), they are skipped with a warning.
- "all": all signals available for the file. Use available_signals() to see the list in advance.
Subsetting#
Use mz_range and depth_range to read only a portion of the data:
# Load only peaks between 20 and 100 Da
s = hs.load("acquisition.h5", file_format="Tofwerk",
            signal="peak_data", mz_range=(20.0, 100.0))
# Load only depth slices 10–19 (exclusive upper bound)
s = hs.load("acquisition.h5", file_format="Tofwerk",
            signal="peak_data", depth_range=(10, 20))
Both parameters work for pre-processed files by slicing the HDF5 dataset
before reading. For raw files, mz_range also reduces reconstruction cost
because only the selected peaks need to be counted. depth_range applies
to "peak_data", "event_list", and "fib_images".
Use dtype to cast "peak_data" after loading, e.g.
dtype=np.uint16 for low-count integer data or dtype=np.float16 to
halve memory use at reduced precision.
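The selection semantics of the two range parameters can be sketched in plain Python. The helpers select_peaks and select_depths are hypothetical and only mirror the documented behaviour: mz_range is treated as inclusive at both ends, depth_range as a 0-indexed half-open interval. The exact bounds check is an assumption.

```python
def select_peaks(peak_masses, mz_range=None):
    """Indices of peaks whose mass lies in [lo, hi], both ends inclusive."""
    if mz_range is None:
        return list(range(len(peak_masses)))
    lo, hi = mz_range
    return [i for i, mass in enumerate(peak_masses) if lo <= mass <= hi]


def select_depths(n_writes, depth_range=None):
    """Depth-slice indices in [start, stop): 0-indexed, stop excluded."""
    if depth_range is None:
        return list(range(n_writes))
    start, stop = depth_range
    # Assumed bounds rule for this sketch; the reader's own check may differ.
    if not 0 <= start < stop <= n_writes:
        raise ValueError("depth_range out of bounds")
    return list(range(start, stop))


masses = [12.0, 20.0, 56.0, 100.0, 197.0]    # toy peak table (Da)
print(select_peaks(masses, (20.0, 100.0)))   # [1, 2, 3]
print(select_depths(30, (10, 20)))           # slices 10 through 19
```

Both boundary peaks (20 Da and 100 Da) are kept, while depth slice 20 is not.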
Lazy loading#
Pass lazy=True to defer reading large arrays until .compute() is called:
s = hs.load(
    "large_acquisition.h5",
    file_format="Tofwerk",
    lazy=True,
    signal="peak_data",
    chunks="auto",  # or a tuple / dict for manual chunking
)
# s.data is a dask array; inspect size before computing:
print(s.data.nbytes / 1e9, "GB")
s.compute()
Lazy loading is supported for "sum_spectrum", "peak_data"
(pre-processed files), "event_list", and "fib_images". Reconstructing
"peak_data" from a raw file’s EventList is always eager.
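To judge whether lazy loading is worthwhile before opening a file, the in-memory size of the 4-D peak array can be estimated from its shape. This is a back-of-the-envelope sketch: the function name and the example dimensions are hypothetical, and the 4-byte item size assumes the float32 on-disk dtype listed for PeakData/PeakData.

```python
from math import prod


def peak_data_nbytes(nbr_writes, nbr_segments, nbr_x, nbr_peaks, itemsize=4):
    """Bytes needed for a dense (depth, y, x, m/z) array.

    itemsize=4 corresponds to float32; use 2 for float16 or uint16.
    """
    return prod((nbr_writes, nbr_segments, nbr_x, nbr_peaks)) * itemsize


shape = (100, 128, 128, 300)  # assumed example: 100 slices, 128 x 128, 300 peaks

print(f"{peak_data_nbytes(*shape) / 1e9:.2f} GB as float32")              # 1.97 GB
print(f"{peak_data_nbytes(*shape, itemsize=2) / 1e9:.2f} GB as float16")  # 0.98 GB
```

An array of this size is a reasonable candidate for lazy=True, or for a narrower mz_range or depth_range.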
Optional dependencies#
- numba: JIT-accelerates EventList reconstruction (~19× faster than the NumPy fallback on large datasets).
- tqdm: shows a progress bar during EventList reconstruction. Pass show_progressbar=False to suppress progress output entirely.
Performance tips#
- EventList reconstruction (raw files, signal="peak_data") can be slow on large datasets. Install numba for a large speed-up, and use mz_range to limit reconstruction to the peaks of interest. Adjust peak_data_batch_size to process multiple depth slices per batch (trades memory for fewer Python iterations).
- Pre-processed files with unsorted peaks: if PeakData/PeakTable peaks are not in ascending m/z order, the reader sorts them in place. peak_data_batch_size controls how many depth slices are sorted per iteration; smaller values keep working arrays in CPU cache.
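The idea behind sorting unsorted peaks in batches can be sketched with plain lists. This is not the reader's implementation: all three helper names are hypothetical, and the sketch only shows a mass-ordering permutation applied to the peak axis plus the depth batching that a batch-size parameter implies.

```python
def ascending_mass_order(peak_masses):
    """Permutation that puts the peaks into ascending m/z order."""
    return sorted(range(len(peak_masses)), key=lambda i: peak_masses[i])


def reorder_peak_axis(pixel_counts, order):
    """Apply the permutation to one pixel's per-peak counts."""
    return [pixel_counts[i] for i in order]


def depth_batches(n_slices, batch_size=1):
    """Yield (start, stop) depth ranges covering all slices in order."""
    for start in range(0, n_slices, batch_size):
        yield start, min(start + batch_size, n_slices)


masses = [56.0, 12.0, 28.0]  # toy peak table, not in ascending order
counts = [5, 1, 3]           # counts for one pixel, same order as masses

order = ascending_mass_order(masses)
print(order)                                 # [1, 2, 0]
print(reorder_peak_axis(counts, order))      # [1, 3, 5]
print(list(depth_batches(5, batch_size=2)))  # [(0, 2), (2, 4), (4, 5)]
```

The permutation is computed once from the peak table; only its application to the data is repeated per batch.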
Metadata#
FIB-SIMS specific fields are stored under Acquisition_instrument.FIB_SIMS:
meta = s.metadata.Acquisition_instrument.FIB_SIMS
meta.file_type # "pre-processed" or "raw"
meta.FIB.hardware # FIB hardware identifier (e.g. "Tescan")
meta.FIB.voltage_kV # Ion beam accelerating voltage in kV
meta.FIB.current_A # Ion beam current in A
meta.FIB.view_field_mm # Field of view in mm
meta.FIB.pixel_size_um # SIMS pixel size in µm
meta.ToF.ion_mode # "positive" or "negative"
meta.DAQ.tofdaq_version # TofDAQ software version string
Note
The .h5 extension is shared with other formats (EMD, Arina, etc.).
The plugin identifies Tofwerk files by checking for TofDAQ-specific HDF5
groups (FullSpectra, TimingData, AcquisitionLog) and the
TofDAQ Version root attribute. Specify file_format="Tofwerk"
explicitly if automatic detection fails.
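A detection check of this kind could look like the sketch below. It is an illustration, not the plugin's code: the function and the FakeFile stand-in are hypothetical, and the sketch assumes that all three groups and the version attribute must be present. Any object with path membership and an attrs mapping works, which matches the interface of an open h5py.File.

```python
REQUIRED_GROUPS = ("FullSpectra", "TimingData", "AcquisitionLog")
VERSION_ATTR = "TofDAQ Version"


def looks_like_tofdaq(h5):
    """True if the required groups and the version attribute are all present."""
    return all(g in h5 for g in REQUIRED_GROUPS) and VERSION_ATTR in h5.attrs


class FakeFile(dict):
    """Minimal stand-in for an open HDF5 file (illustration only)."""

    def __init__(self, groups, attrs):
        super().__init__((g, None) for g in groups)
        self.attrs = attrs


# The version value below is a placeholder, not taken from a real file.
tofwerk_like = FakeFile(REQUIRED_GROUPS, {VERSION_ATTR: 1.0})
other_h5 = FakeFile(("Data", "Experiments"), {})

print(looks_like_tofdaq(tofwerk_like))  # True
print(looks_like_tofdaq(other_h5))      # False
```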
API functions#
- rsciio.tofwerk.available_signals(filename: str | Path) -> list[str]#
Return the list of signal names available in a Tofwerk TofDAQ HDF5 file.
This is a fast, read-only inspection call — no large data arrays are loaded.
- Parameters:
  - filename (str or pathlib.Path): Path to the .h5 file.
- Returns:
  - list of str: Names of the signals available in the file.
- Raises:
  - IOError: If the file is not a Tofwerk TofDAQ HDF5 file.
Examples
>>> from rsciio.tofwerk import available_signals
>>> available_signals("my_acquisition.h5")
['sum_spectrum', 'nominal_peak_data', 'fib_images']
- rsciio.tofwerk.compute_peak_data_from_eventlist(event_list: numpy.ndarray, mass_axis: numpy.ndarray, nbr_samples: int, clock_ratio: int, normalization: int, peak_table: list[dict[str, Any]], show_progressbar: bool = True, dtype: numpy.dtype = numpy.uint16) -> numpy.ndarray#
Reconstruct the 4-D peak-integrated array from raw EventList data.
This replicates the processing performed by the Tofwerk proprietary software when opening a raw file:
1. For each pixel’s variable-length EventList (raw TDC timestamps), convert timestamps to ADC sample indices by integer division with the TDC-to-ADC clock ratio (SampleInterval / ClockPeriod, typically 64).
2. Look up the calibrated mass for each sample index via MassAxis.
3. Count events that fall within each peak’s integration window.
4. Divide by NbrWaveforms × NActiveChannels, the number of times each physical ion is recorded in the EventList (once per ToF cycle per active recording channel).
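For a single pixel, the procedure can be sketched in pure Python. This is a simplified illustration, not the library's code path (which uses NumPy or numba): integrate_pixel is a hypothetical name, the toy mass axis maps sample i to i Da, and the integration limits are assumed to be inclusive at both ends.

```python
def integrate_pixel(timestamps, mass_axis, clock_ratio, normalization, peak_table):
    """Per-peak ion counts for one pixel's list of raw TDC timestamps."""
    counts = [0] * len(peak_table)
    n_samples = len(mass_axis)
    for t in timestamps:
        sample = t // clock_ratio        # TDC tick -> ADC sample index
        if not 0 <= sample < n_samples:  # outside the mass axis: discard
            continue
        mass = mass_axis[sample]         # calibrated mass lookup
        for k, peak in enumerate(peak_table):
            lo = peak["lower_integration_limit"]
            hi = peak["upper_integration_limit"]
            if lo <= mass <= hi:         # inclusive bounds (assumption)
                counts[k] += 1
    # Each physical ion is recorded `normalization` times in the event list.
    return [c / normalization for c in counts]


mass_axis = [float(i) for i in range(8)]  # toy calibration: sample i is i Da
peak_table = [
    {"lower_integration_limit": 0.5, "upper_integration_limit": 1.5},
    {"lower_integration_limit": 2.5, "upper_integration_limit": 3.5},
]
timestamps = [64, 70, 130, 200, 1000]     # samples 1, 1, 2, 3 and 15 (dropped)

print(integrate_pixel(timestamps, mass_axis, 64, 2, peak_table))  # [1.0, 0.5]
```

Two events land in the first window and one in the second; dividing by a normalization of 2 gives the per-waveform counts.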
Note
If numba is installed, each depth slice is read in a single h5py call (instead of one call per row), then each row is packed into a flat int64 buffer and processed by a JIT-compiled kernel. Otherwise the function falls back to a vectorised NumPy path that allocates per-pixel arrays but is still significantly faster than a pure-Python triple loop.
- Parameters:
  - event_list (array_like): Ragged object array of shape (nwrites, nsegs, nx) of uint16 TDC timestamps, one variable-length array per pixel. Accepts h5py variable-length datasets or numpy object arrays (as loaded by signal="event_list").
  - mass_axis (numpy.ndarray): 1-D array of shape (nbr_samples,) with calibrated mass (Da) for each ADC sample index, from FullSpectra/MassAxis.
  - nbr_samples (int): Length of mass_axis; events outside [0, nbr_samples) are discarded.
  - clock_ratio (int): TDC-to-ADC sample index divisor (SampleInterval / ClockPeriod). Pass 1 if ClockPeriod is not available.
  - normalization (int): NbrWaveforms × NActiveChannels, the divisor used to convert raw event counts to per-waveform ion counts.
  - peak_table (list of dict): Integration windows. Each dict must have keys lower_integration_limit and upper_integration_limit (Da).
  - show_progressbar (bool, default=True): Whether to show tqdm progress bars during reconstruction.
  - dtype (numpy.dtype, default=np.uint16): Output array dtype. Counts are accumulated as float32 internally (for the normalization division) and cast to this dtype on return.
- Returns:
  - numpy.ndarray: Shape (nwrites, nsegs, nx, npeaks), cast to dtype. Identical in structure to PeakData/PeakData in a pre-processed file.
- rsciio.tofwerk.count_active_channels(ini: str) -> int#
Return the number of active TDC recording channels from the INI config string.
TofDAQ records events from multiple ADC/TDC channels simultaneously (e.g. Ch1 and Ch3 in a typical fibTOF setup). Each active channel contributes an independent copy of every ion event to the EventList, so the raw event count per pixel is multiplied by this factor.
- Parameters:
  - ini (str): Decoded text from the HDF5 root attribute file.attrs["Configuration File Contents"]. This is the TofDAQ configuration stored as an INI-style string, for example containing entries such as Ch1Record=1 and Ch2Record=0.
- Returns:
  - int: Number of active channels (at least 1).
Examples
Read the embedded TofDAQ configuration from a file and count the active recording channels:
>>> ini = _decode(file.attrs["Configuration File Contents"])
>>> count_active_channels(ini)
2
A minimal INI-style example:
>>> count_active_channels("[TOFParameter]\nCh1Record=1\nCh2Record=0\nCh3Record=1")
2
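One way such a count could be obtained is with a regular expression over the INI text. The sketch below is a hypothetical re-implementation for illustration, not the function's actual source. It assumes the flags have the form ChNRecord=<integer> on their own line and that any non-zero value means the channel is active.

```python
import re

# Matches lines such as "Ch1Record=1"; captures the integer flag value.
_RECORD_FLAG = re.compile(r"^[ \t]*Ch\d+Record[ \t]*=[ \t]*(\d+)[ \t]*\r?$", re.MULTILINE)


def count_record_flags(ini):
    """Count ChNRecord entries with a non-zero value, never returning less than 1."""
    flags = _RECORD_FLAG.findall(ini)
    active = sum(1 for value in flags if int(value) != 0)
    return max(active, 1)


ini = "[TOFParameter]\nCh1Record=1\nCh2Record=0\nCh3Record=1"
print(count_record_flags(ini))  # 2
print(count_record_flags(""))   # 1 (falls back to a single channel)
```

The floor of 1 mirrors the documented return value, which keeps the result safe to use as a divisor in the normalization.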
- rsciio.tofwerk.file_reader(filename: str | Path, lazy: bool = False, signal: str | list[str] = 'sum_spectrum', chunks: tuple | int | dict | str = 'auto', mz_range: tuple[float, float] | None = None, depth_range: tuple[int, int] | None = None, dtype: dtype | None = None, show_progressbar: bool = True, peak_data_batch_size: int = 1) -> list[dict[str, Any]]#
Read a Tofwerk TofDAQ HDF5 file.
- Parameters:
  - filename (str, pathlib.Path): Filename of the file to read or corresponding pathlib.Path.
  - lazy (bool, default=False): Whether to open the file lazily or not. The file will stay open until closed in compute() or closed manually. get_file_handle() can be used to access the file handler and close it manually.
  - signal (str or list of str, optional): Which signal(s) to return. Valid values:
    - "sum_spectrum" (default): 1-D cumulative spectrum from FullSpectra/SumSpectrum. Always available.
    - "nominal_peak_data": 4-D array (depth, y, x, m/z) for Tofwerk-generated peaks (labels starting with "nominal"). For pre-processed files, reads the relevant columns of PeakData/PeakData directly. For raw files, reconstructs from FullSpectra/EventList.
    - "additional_peak_data": 4-D array (depth, y, x, m/z) for user-defined custom peaks (labels not starting with "nominal"). Only available when the PeakData/PeakTable contains at least one such entry. For pre-processed files reads from PeakData/PeakData directly; for raw files reconstructs from FullSpectra/EventList.
    - "event_list": Ragged object array (depth, y, x) of raw uint16 TDC timestamps. Present in raw files; may also be present in pre-processed files.
    - "fib_images": 3-D array (depth, y, x) of secondary-electron images at full FIB scan resolution (e.g. 256×256). Available only in FIB-SIMS files that contain a FIBImages group.
    - "all": All signals available for the file.
    Pass a list to request multiple specific signals, e.g. signal=["sum_spectrum", "nominal_peak_data"]. The returned list has one entry per requested signal (or all available for "all").
  - chunks (tuple, int, dict or str, default="auto"): The chunks used when reading the data lazily. This argument is passed to the dask.array.core.normalize_chunks() function.
  - mz_range (tuple, optional): Restrict the m/z axis of "peak_data" to peaks whose nominal mass falls within [mz_range[0], mz_range[1]] (inclusive, Da). For pre-processed files only the selected columns are retained after reading; for raw files only the selected peaks are reconstructed from the EventList, so the reconstruction cost scales with the number of selected peaks. If None (default), all peaks are returned.
  - depth_range (tuple, optional): Restrict the depth axis to slices [depth_range[0], depth_range[1]) (0-indexed, exclusive upper bound). Applies to "peak_data", "event_list", and "fib_images". For pre-processed files the HDF5 dataset is sliced before loading, so only the requested depth slices are read from disk. If None (default), all depth slices are returned.
  - dtype (numpy.dtype, optional): Cast the "peak_data" array to this dtype after loading. Useful to reduce memory usage, e.g. dtype=np.float16 or dtype=np.uint16 for low-count data. If None (default), the on-disk dtype (float32) is preserved.
  - show_progressbar (bool, default=True): Whether to show the progress bar or not.
  - peak_data_batch_size (int, optional): Number of depth slices to read and permute per iteration when loading "peak_data" from a pre-processed file whose peaks are not already in ascending mass order. Smaller batches keep the sort permutation working on arrays that fit in CPU cache, which is faster for large datasets. Default is 1 (one slice at a time). Has no effect when peaks are already sorted or when lazy=True.
- Returns:
  - list of dict: List of dictionaries containing the following fields:
    - ‘data’: multidimensional numpy.ndarray or dask.array.Array
    - ‘axes’: list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
    - ‘metadata’: dictionary containing the parsed metadata
    - ‘original_metadata’: dictionary containing the full metadata tree from the input file
    When the file contains several datasets, each dataset will be loaded as a separate dictionary.
- Raises:
  - IOError: If the file is not a Tofwerk TofDAQ HDF5 file.
  - ValueError: If signal contains an unrecognised value, if mz_range or depth_range are out of bounds, or if an explicitly requested signal is not available in the file.
  - NotImplementedError: If lazy=True and signal="nominal_peak_data" or signal="additional_peak_data" on a raw file that requires EventList reconstruction.