NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats for array-oriented scientific data. Originally developed at Unidata for the atmospheric and climate sciences, it is now widely used across scientific domains.
Key Features
- Self-describing: Variables include metadata describing dimensions, units, and conventions
- Portable: Platform-independent binary format
- Efficient access: Direct access to subsets of large arrays
- CF Conventions: The Climate and Forecast (CF) conventions standardize metadata for geophysical data
- Multiple storage formats: Classic binary formats, or HDF5 as the storage layer (NetCDF-4), which adds compression and chunking
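The self-describing and subset-access features above can be sketched with xarray. This is a minimal illustration, not a reference implementation: the variable name `temperature`, the grid sizes, and the file name `demo.nc` are hypothetical, and it assumes a NetCDF backend (such as netCDF4 or SciPy) is installed alongside xarray.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical example data: 4 time steps on a 3 x 2 latitude/longitude grid.
temperature = xr.DataArray(
    np.linspace(270.0, 293.0, 24).reshape(4, 3, 2),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(4),
        "lat": [10.0, 20.0, 30.0],
        "lon": [100.0, 110.0],
    },
    # CF-style attribute names; they are stored with the variable in the file.
    attrs={"units": "K", "standard_name": "air_temperature"},
)
ds_demo = xr.Dataset({"temperature": temperature})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.nc")
    ds_demo.to_netcdf(path)

    # Reopen the file: names, dimensions, and attributes come from the file itself.
    with xr.open_dataset(path) as reopened:
        units = reopened["temperature"].attrs["units"]
        dims = reopened["temperature"].dims
        # Request a single time step rather than the whole array.
        first_step = reopened["temperature"].isel(time=0).values

print(units, dims, first_step.shape)
```

Nothing about the layout is passed to `open_dataset`; the dimension names and units are recovered from the file, which is what "self-describing" means in practice.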
Scientific Applications
NetCDF is commonly used for:
- Time-series data with spatial dimensions
- Multi-dimensional experimental recordings
- Model outputs and simulations
- Data with complex coordinate systems
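As one illustration of a complex coordinate system, the sketch below attaches two-dimensional latitude and longitude coordinates to a grid whose axes are plain indices. The grid, the skew factor, and the names `field`, `lat2d`, and `lon2d` are invented for this example; the xarray calls are standard, but real curvilinear grids would come from a model or instrument.

```python
import numpy as np
import xarray as xr

# Hypothetical curvilinear grid: each (y, x) cell has its own latitude and longitude.
ny, nx = 3, 4
lon2d, lat2d = np.meshgrid(np.linspace(-10.0, 5.0, nx), np.linspace(40.0, 50.0, ny))
lat2d = lat2d + 0.1 * np.arange(nx)  # skew the grid so latitude also varies along x

field = xr.DataArray(
    np.arange(ny * nx, dtype=float).reshape(ny, nx),
    dims=("y", "x"),
    coords={
        # Two-dimensional, non-index coordinates that share the data's dimensions.
        "lat": (("y", "x"), lat2d),
        "lon": (("y", "x"), lon2d),
    },
    name="field",
)

# Mask by physical location rather than by grid index; rows and columns
# that contain no matching cell are dropped, other non-matches become NaN.
northern = field.where(field["lat"] > 45.0, drop=True)
print(northern.shape)
```

When such a dataset is written to NetCDF, the 2-D coordinates are stored as ordinary variables next to the data, so the grid geometry stays with the measurements.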
Python Integration
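import numpy as np
import xarray as xr

# Placeholder data so the example runs; replace with real recordings
rng = np.random.default_rng(0)
n_time, n_channel, n_trial = 10_000, 64, 20
activity_data = rng.standard_normal((n_time, n_channel, n_trial), dtype=np.float32)
stimulus_data = rng.integers(0, 2, size=(n_time, n_trial))

# Create dataset
ds = xr.Dataset(
    {
        'neural_activity': (['time', 'channel', 'trial'], activity_data),
        'stimulus': (['time', 'trial'], stimulus_data),
    },
    coords={
        # 10 s at 1 kHz; dividing integers avoids float-step rounding in arange
        'time': np.arange(n_time) / 1000.0,
        'channel': np.arange(n_channel),
        'trial': np.arange(n_trial),
    },
    attrs={
        'experiment': 'visual_response',
        'subject_id': 'M01',
        'recording_date': '2024-01-15',
    },
)

# Add per-variable metadata
ds['neural_activity'].attrs['units'] = 'microvolts'
ds['neural_activity'].attrs['description'] = 'LFP recordings'

# Save to NetCDF
ds.to_netcdf('experiment.nc')

# Load (lazily) and work with subsets
ds = xr.open_dataset('experiment.nc')
trial_5 = ds.sel(trial=5)
# Label-based slice: both endpoints are included (channels 10 through 20)
channels_10_20 = ds.sel(channel=slice(10, 20))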
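One detail of subsetting that is easy to miss is the difference between label-based and position-based slices. The sketch below uses a small, made-up array (`channels`) to contrast `sel` and `isel`.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 5 samples on 64 channels labeled 0..63.
channels = xr.DataArray(
    np.zeros((5, 64)),
    dims=("time", "channel"),
    coords={"channel": np.arange(64)},
)

# sel slices by coordinate label and includes both endpoints.
by_label = channels.sel(channel=slice(10, 20))

# isel slices by integer position and excludes the stop index, like a Python list.
by_position = channels.isel(channel=slice(10, 20))

print(by_label.sizes["channel"], by_position.sizes["channel"])
```

When the coordinate labels happen to equal the positions, as here, the two calls differ by exactly one element, so it is worth choosing deliberately between them.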
When to Use NetCDF
Best for:
- Multi-dimensional arrays with labeled dimensions
- Time-series data with spatial/channel structure
- Data sharing with standardized metadata
- Integration with xarray workflows
Consider alternatives for:
- Simple tabular data (use Parquet or CSV)
- Deeply hierarchical/nested structures (use HDF5 directly)
- Specialized neuroscience formats (use NWB)
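If a result is small and flat enough that a table is the better fit, an xarray dataset can be flattened through pandas. The sketch below is one possible route under that assumption; the dataset `ds_small` and its `response` variable are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical summary data: one response value per trial and channel.
ds_small = xr.Dataset(
    {"response": (("trial", "channel"), np.arange(6.0).reshape(3, 2))},
    coords={"trial": [0, 1, 2], "channel": [0, 1]},
)

# Flatten to one row per (trial, channel) pair, with coordinates as columns.
table = ds_small.to_dataframe().reset_index()

# Serialize as CSV text; passing a file path to to_csv would write to disk instead.
csv_text = table.to_csv(index=False)
print(table.shape)
```

The flattened table repeats coordinate values on every row, which is why this approach suits summaries better than full multi-dimensional recordings.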