Xarray is an open-source library that brings the power of labeled, multi-dimensional arrays to Python. It’s built on top of NumPy and integrates seamlessly with pandas, providing an intuitive interface for working with complex scientific datasets.
Why xarray?
- Labeled Dimensions: Index data by dimension names instead of axis numbers
- Coordinate-Based Selection: Select data using meaningful coordinates (time, frequency, channel)
- NetCDF Integration: Reads and writes NetCDF files, including the HDF5-based NetCDF4 format
- Broadcasting: Automatic alignment of arrays based on dimension names
- Metadata: Keep track of units, descriptions, and other metadata
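As a minimal sketch of the broadcasting point above, the following multiplies a per-time signal by a per-channel gain. The arrays `signal` and `gain` and their values are made up for illustration; the plain NumPy version is shown only for comparison.

```python
import numpy as np
import xarray as xr

# A per-time signal and a per-channel gain, each one-dimensional (illustrative values)
signal = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims=["time"])
gain = xr.DataArray(np.array([10.0, 100.0]), dims=["channel"])

# Dimensions are matched by name, so no manual reshaping is needed
scaled = signal * gain
print(scaled.dims)   # ('time', 'channel')
print(scaled.shape)  # (3, 2)

# Plain NumPy equivalent: the extra axes have to be inserted by hand
scaled_np = signal.values[:, np.newaxis] * gain.values[np.newaxis, :]
```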
Key Concepts
DataArray
A single multi-dimensional array with labeled dimensions:
import xarray as xr
import numpy as np
# Create a DataArray with labeled dimensions
data = xr.DataArray(
    np.random.randn(3, 4, 5),
    dims=["time", "channel", "trial"],
    coords={
        "time": np.arange(3),
        "channel": ["Ch1", "Ch2", "Ch3", "Ch4"],
        "trial": np.arange(5)
    }
)
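To illustrate what the labels buy you, here is a short sketch that rebuilds the same array, inspects it, and reduces over a dimension by name. The comparison against the raw NumPy array is only there to show the equivalent axis-based call.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.randn(3, 4, 5),
    dims=["time", "channel", "trial"],
    coords={
        "time": np.arange(3),
        "channel": ["Ch1", "Ch2", "Ch3", "Ch4"],
        "trial": np.arange(5),
    },
)

print(data.dims)              # ('time', 'channel', 'trial')
print(data.sizes["channel"])  # 4

# Reduce over a dimension by name rather than by axis number
trial_mean = data.mean(dim="trial")

# The same reduction on the raw NumPy array needs the axis index
trial_mean_np = data.values.mean(axis=2)
```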
Dataset
A collection of DataArrays with shared dimensions:
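# Placeholder arrays for illustration: 100 time points, 4 channels, 10 units
lfp_data = np.random.randn(100, 4)
spike_data = np.random.poisson(0.1, size=(100, 10))
ds = xr.Dataset({
    "lfp": (["time", "channel"], lfp_data),
    "spikes": (["time", "unit"], spike_data),
})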
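The sketch below shows one way a Dataset like this might be used once built. The array shapes and the `time` coordinate are hypothetical stand-ins, not values from a real recording.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
lfp_data = rng.standard_normal((100, 4))       # hypothetical: 100 samples x 4 channels
spike_data = rng.poisson(0.1, size=(100, 10))  # hypothetical: 100 samples x 10 units

ds = xr.Dataset(
    {
        "lfp": (["time", "channel"], lfp_data),
        "spikes": (["time", "unit"], spike_data),
    },
    coords={"time": np.arange(100)},
)

# Each variable in the Dataset is a DataArray
lfp = ds["lfp"]

# Indexing a shared dimension applies to every variable that has it
first_half = ds.isel(time=slice(0, 50))

# Per-unit spike counts over the whole recording
spike_counts = ds["spikes"].sum(dim="time")
```

Because `time` is shared, slicing the Dataset keeps `lfp` and `spikes` aligned without indexing each array separately.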
Common Use Cases in Research
Electrophysiology Data
# Wrap a multi-channel recording in a DataArray
# (recording_data, times, channel_names, and trial_ids are assumed to be loaded already)
lfp = xr.DataArray(
    recording_data,
    dims=["time", "channel", "trial"],
    coords={
        "time": times,
        "channel": channel_names,
        "trial": trial_ids
    }
)
# Select specific channels and a time window (label-based slices include both endpoints)
baseline = lfp.sel(channel=["Ch1", "Ch2"], time=slice(0, 1000))
# Average across trials
mean_response = lfp.mean(dim="trial")
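A common follow-up step is baseline correction, which combines label-based selection with broadcasting. The sketch below runs on synthetic data and assumes that `time` is in milliseconds relative to stimulus onset, so negative times form the pre-stimulus window; that convention is an assumption made for this example.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
times = np.arange(-500, 1500)          # assumed: ms relative to stimulus onset
channel_names = ["Ch1", "Ch2", "Ch3"]
trial_ids = np.arange(20)
recording_data = rng.standard_normal(
    (times.size, len(channel_names), trial_ids.size)
)  # synthetic stand-in for a real recording

lfp = xr.DataArray(
    recording_data,
    dims=["time", "channel", "trial"],
    coords={"time": times, "channel": channel_names, "trial": trial_ids},
)

# Mean of the pre-stimulus window for each channel and trial
baseline = lfp.sel(time=slice(-500, -1)).mean(dim="time")

# baseline has dims (channel, trial); the subtraction broadcasts over time
corrected = lfp - baseline

# Trial-averaged, baseline-corrected response
evoked = corrected.mean(dim="trial")
```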
Time-Frequency Analysis
# Store a spectrogram with time and frequency coordinates
# (tfr_data, time_bins, freq_bins, and channels are assumed to be computed already)
spectrogram = xr.DataArray(
    tfr_data,
    dims=["time", "frequency", "channel"],
    coords={
        "time": time_bins,
        "frequency": freq_bins,
        "channel": channels
    }
)
# Select the theta band (4-8 Hz; both endpoints are included)
theta = spectrogram.sel(frequency=slice(4, 8))
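Once a band is selected, averaging over `frequency` gives band power per time bin and channel. The following sketch uses random numbers in place of a real time-frequency decomposition and assumes frequency bins of 1 to 40 Hz in 1 Hz steps.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
time_bins = np.linspace(0.0, 2.0, 50)  # assumed: seconds
freq_bins = np.arange(1.0, 41.0)       # assumed: 1..40 Hz in 1 Hz steps
channels = ["Ch1", "Ch2"]
tfr_data = rng.random((time_bins.size, freq_bins.size, len(channels)))  # synthetic

spectrogram = xr.DataArray(
    tfr_data,
    dims=["time", "frequency", "channel"],
    coords={"time": time_bins, "frequency": freq_bins, "channel": channels},
)

# Label-based slices include both endpoints: 4, 5, 6, 7, 8 Hz
theta = spectrogram.sel(frequency=slice(4, 8))
print(theta.sizes["frequency"])  # 5

# Average over the band to get theta power per time bin and channel
theta_power = theta.mean(dim="frequency")

# method="nearest" picks the closest available bin when there is no exact match
near_10hz = spectrogram.sel(frequency=10.2, method="nearest")
```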
Integration with Other Tools
- Pandas: Convert between DataFrames and DataArrays
- NumPy: NumPy ufuncs and many other NumPy functions accept xarray objects; use .values to get the underlying array
- Matplotlib: Direct plotting with labeled axes
- Dask: Parallel, chunked computation on arrays too large to fit in memory
- NetCDF4: Read/write NetCDF files natively
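As a small sketch of the NumPy and pandas integration listed above, the example below applies a NumPy ufunc to a DataArray and round-trips it through a DataFrame. The array contents and the name `lfp` are illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["time", "channel"],
    coords={"time": [0, 1], "channel": ["Ch1", "Ch2", "Ch3"]},
    name="lfp",  # to_dataframe() needs a name for the resulting column
)

# NumPy ufuncs accept a DataArray and return a DataArray with labels intact
root = np.sqrt(da)

# DataArray -> long-format DataFrame indexed by (time, channel)
df = da.to_dataframe()

# DataFrame -> Dataset; the MultiIndex levels become dimensions again
restored = df.to_xarray()["lfp"]
```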
Installation
pixi add xarray
# or
conda install -c conda-forge xarray
# or
pip install xarray
Best Practices
- Use meaningful dimension and coordinate names
- Include units and descriptions in attributes
- Save to NetCDF format for efficient storage
- Use .sel() for label-based indexing, .isel() for position-based indexing
- Leverage automatic broadcasting for operations across dimensions
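Two of these practices can be sketched together: attaching units and descriptions as attributes, and the difference between `.sel()` and `.isel()`. The recording below is hypothetical (5 samples spaced 10 ms apart, 2 channels), and the attribute values are examples only.

```python
import numpy as np
import xarray as xr

# Hypothetical recording: 5 samples taken every 10 ms, 2 channels
lfp = xr.DataArray(
    np.arange(10.0).reshape(5, 2),
    dims=["time", "channel"],
    coords={
        # (dims, values, attrs) tuple attaches units to the coordinate itself
        "time": ("time", [0, 10, 20, 30, 40], {"units": "ms"}),
        "channel": ["Ch1", "Ch2"],
    },
    name="lfp",
    attrs={"units": "uV", "description": "Local field potential"},
)

# .sel() uses coordinate labels: the sample recorded at t = 20 ms
by_label = lfp.sel(time=20)

# .isel() uses integer positions: the sample at index 2 along time
by_position = lfp.isel(time=2)

# Both pick the same row here, because label 20 sits at position 2
print(by_label.equals(by_position))
```

Note that `lfp.isel(time=20)` would fail on this array, since there are only 5 positions along `time`, while `lfp.sel(time=20)` succeeds because 20 is a valid label.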