Research Data Management with DataLad

This online, hands-on workshop introduces DataLad, a tool for organizing, tracking, and sharing scientific data. Built on Git and git-annex, DataLad combines version control for code with efficient handling of large data files, giving you control over your whole research workflow. You'll gain practical experience using DataLad to structure your datasets, track changes over time, and maintain reproducibility throughout your research process. Finally, you'll learn how to publish your DataLad-managed datasets on open science platforms such as GIN and the Open Science Framework.

Online

English

Prerequisites:

Essential Computer Tools for Researchers

Offerings:

DataLad - December 2025

December 5, 2025

09:30 - 17:00

Registration Closed

Topics

Version Control: Tracking Data and Analysis Pipelines
Data management: Best Practices for Structuring Datasets
Collaboration: Sharing Data using Open Science Platforms
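
To give a feel for the version-control topic, here is a minimal sketch of a DataLad session. It assumes DataLad, git-annex, and a configured Git identity are already available; the dataset name `my-dataset`, the file names, and the messages are placeholders invented for this example, not workshop material.

```shell
# Skip gracefully when DataLad is not installed on this machine.
if command -v datalad >/dev/null 2>&1; then
  cd "$(mktemp -d)"

  # Create a dataset; the text2git procedure keeps text files in Git
  # and hands everything else to git-annex.
  datalad create -c text2git my-dataset
  cd my-dataset

  # Add a file and record it in the dataset history.
  echo "subject,score" > results.csv
  datalad save -m "Add empty results table"

  # Run a command and let DataLad record the command and its output.
  datalad run -m "Count lines in results" --output "linecount.txt" \
    "wc -l results.csv > linecount.txt"

  # The history is an ordinary Git log.
  git log --oneline
else
  echo "datalad not found; install it first to try this sketch"
fi
```

Each `datalad save` or `datalad run` call adds one commit, so the dataset history records both the data and the command that produced it.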

Intended Participants

Researchers and students from all universities are welcome.
Participants of all skill levels and backgrounds are welcome.

Certification Requirements

Students who attend at least 75% of the course will receive a participation certificate by email at the end of the course.

Software Requirements

All students must bring a Windows, macOS, or Linux computer on which they can do the course exercises.

Zoom

Zoom is a video conferencing software that allows for virtual meetings and webinars. It is essential for attending our online workshop sessions and provides the interactive features needed for effective learning.

Why Zoom?

Breakout Rooms: Essential for our small-group exercises
Screen Sharing: Share your screen to get help or demonstrate solutions
Stable & Reliable: Handles large groups with consistent quality
Recording: Sessions can be recorded for later review (where permitted)

Installation

Download and install the Zoom Desktop Client from the official website. We require the desktop client rather than the web version for full feature support.

Before Your First Session

Test Your Setup: Join a test meeting to check audio/video
Update Zoom: Make sure you have the latest version
Check Your Internet: Ensure you have a stable connection
Find a Quiet Space: Minimize background noise during sessions

Workshop Etiquette

Keep your microphone muted when not speaking
Use video when possible to help build community
Use reactions (👍, ✋) to provide feedback
Ask questions in chat or unmute to speak
Be ready to join breakout rooms for exercises

Tips

Familiarize yourself with screen sharing features before the workshop
Keep your Zoom name consistent with your registration
Use virtual backgrounds if needed for privacy
Enable “dual monitor mode” if you have two screens

Visual Studio Code

Visual Studio Code is a powerful, lightweight code editor used for developing software. It supports various programming languages through extensions and provides an excellent environment for Python development and data science work.

Why VS Code?

Free & Open Source: Completely free with active community development
Extensible: Thousands of extensions for any language or tool
Integrated Tools: Built-in terminal, debugger, and Git integration
Jupyter Support: Work with notebooks directly in the editor
Remote Development: Edit files on remote servers or in containers

Installation

Download and install Visual Studio Code from the official website. Choose the appropriate version for your operating system (Windows, macOS, or Linux).

Essential Extensions for Research

Python Development

Python - IntelliSense, debugging, code navigation
Jupyter - Run and edit Jupyter notebooks
Pylance - Fast, feature-rich Python language support

Collaboration & Version Control

GitLens - Supercharge Git integration
Live Share - Real-time collaborative editing

Data & Visualization

Data Wrangler - Explore and clean data visually
Rainbow CSV - Colorize CSV files for easier reading

Tips

Learn keyboard shortcuts to improve efficiency (Ctrl+Shift+P / Cmd+Shift+P for command palette)
Customize your theme and settings
Use the integrated terminal for running commands
Enable autosave to never lose work
Use Zen Mode (Ctrl+K Z) for distraction-free coding

Getting Started with Python

Install the Python extension
Select your Python interpreter (Ctrl+Shift+P → “Python: Select Interpreter”)
Open a .py file or create a new one
Run code using the play button (the Ctrl+Alt+N shortcut works only if the Code Runner extension is installed)

Pixi

Pixi is a modern package manager that simplifies the installation of scientific software. It’s built on top of the conda-forge ecosystem but is significantly faster than conda and provides better dependency resolution.

Why Pixi?

Fast: 5-10x faster than conda for most operations
Reproducible: Uses lock files to ensure exact environment reproduction
Task Runner: Built-in task management (like npm scripts)
Modern Design: Clean CLI with better error messages
Conda Compatible: Uses the conda-forge repository

Installation

Follow the installation instructions on the official Pixi website. The installer will set up Pixi and configure your PATH automatically.

Getting Started

Initialize a new project:

pixi init
pixi add python
pixi shell

Add packages:

pixi add numpy pandas matplotlib

Define and run tasks in pixi.toml:

[tasks]
dev = "python main.py"
test = "pytest tests/"

Run tasks:

pixi run dev
pixi run test

Advantages Over Conda

Much faster package resolution and installation
Lock files ensure reproducibility by default
Better support for managing multiple projects
Built-in task runner eliminates need for separate tools

Tips

Use pixi shell to activate the environment
Define common tasks in pixi.toml for easy project workflows
Lock files (pixi.lock) should be committed to version control
Use pixi global install for system-wide tools

Git

Git is a version control system that tracks changes in source code. It allows multiple people to work on a project simultaneously and maintains a complete history of all changes.

Why Git?

Distributed: Every developer has a complete copy of the project history
Branching: Experiment with new features without affecting the main codebase
Collaboration: Work with others seamlessly through platforms like GitHub
Reproducibility: Track exactly which version of code produced which results

Installation

Download and install Git from the official website. Choose the appropriate installer for your operating system.

Windows

Use Git for Windows installer with recommended defaults.

macOS

Git is included with Apple's Xcode Command Line Tools (install them with xcode-select --install). To get a newer version, use Homebrew: brew install git

Linux

Install using your package manager: sudo apt-get install git (Ubuntu/Debian)

Configuration

After installation, configure your identity:

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Essential Commands

git clone - Copy a repository to your local machine
git add - Stage changes for commit
git commit - Save changes with a message
git push - Upload changes to remote repository
git pull - Download changes from remote repository
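
As a rough illustration of how these commands fit together, the sketch below runs one full cycle in a throwaway directory. The bare repository `remote.git` is a local stand-in for a hosted remote such as GitHub, and the file name, identity, and commit message are placeholders chosen for this example.

```shell
# Work in a temporary directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository plays the role of the remote (stand-in for GitHub).
git init --quiet --bare remote.git

# git clone: copy the (still empty) remote to a local working copy.
git clone --quiet remote.git project 2>/dev/null
cd project

# Identity for this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "your.email@example.com"

# git add + git commit: stage a new file and record it with a message.
echo "subject,score" > results.csv
git add results.csv
git commit --quiet -m "Add empty results table"

# git push: upload the commit to the remote.
branch="$(git rev-parse --abbrev-ref HEAD)"
git push --quiet origin "$branch"

# git pull: fetch and merge what collaborators have pushed (nothing new here).
git pull --quiet origin "$branch"

# Show the recorded history.
git log --oneline
```

With a real project you would replace `remote.git` with the URL of your hosted repository.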

Tips

Use meaningful commit messages that explain why you made changes
Commit frequently to create detailed checkpoints
Create branches for new features or experiments
Use .gitignore to exclude data files and generated content
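
The last tip can be tried out in a scratch repository. This is a minimal sketch: the `data/` folder, the `*.png` pattern, and the file names are hypothetical examples, and your own ignore rules will depend on your project layout.

```shell
# Create an empty scratch repository.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Ignore a raw-data folder and generated figures (example patterns).
cat > .gitignore <<'EOF'
data/
*.png
EOF

# One ignored data file and one regular script.
mkdir data
echo "1,2,3" > data/raw.csv
echo "print('hello')" > analysis.py

# git check-ignore reports which rule matches a path.
git check-ignore -v data/raw.csv

# Only .gitignore and analysis.py are listed as untracked.
git status --short
```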

GitHub

GitHub is a web-based platform built around Git that provides hosting for software development and version control. It’s the world’s largest code hosting platform and essential for modern collaborative research.

Why GitHub?

Collaboration: Work with researchers worldwide on shared projects
Visibility: Make your research code discoverable and citable
Integration: Connect with CI/CD, documentation, and project management tools
Community: Access to millions of open-source projects and libraries
Free for Research: Unlimited public and private repositories

Getting Started

Create a free account at github.com
Set up Git on your local machine
Configure Git with your GitHub credentials
Create your first repository or clone an existing one

Essential Features

Repositories

Host your code with full version history
README files for documentation
Issues for tracking bugs and features
Pull requests for code review

Collaboration

Fork projects to contribute
Star repositories to bookmark them
Follow researchers working in your field
Use GitHub Pages for project websites

Tips for Researchers

Include a LICENSE file to clarify how others can use your code
Write a clear README explaining what your code does
Create a CITATION.cff file for proper attribution
Use releases to mark versions associated with publications
Add topics to make your repository discoverable
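
Two of these tips, the CITATION.cff file and version markers for publications, can be sketched locally before anything is pushed to GitHub. The title, author, version, and date below are placeholders; the sketch only creates an annotated Git tag, from which a GitHub release could then be made in the web interface.

```shell
# Scratch repository with a placeholder identity.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Your Name"
git config user.email "your.email@example.com"

# Minimal CITATION.cff (Citation File Format 1.2.0); all values are placeholders.
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example Analysis Code"
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: 1.0.0
date-released: 2025-01-01
EOF

git add CITATION.cff
git commit --quiet -m "Add citation metadata"

# An annotated tag marks the exact commit used for a publication.
git tag -a v1.0.0 -m "Version used in the publication"
git tag --list
```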

Best Practices

Commit often with meaningful messages
Use branches for new features
Write clear documentation
Add a DOI through Zenodo integration for permanent archiving

Open Science Framework (OSF)

The Open Science Framework (OSF) is a free, open-source platform for managing research projects. It supports the entire research lifecycle from project planning through publication, emphasizing openness and reproducibility.

Why OSF?

Free & Open: Completely free for researchers worldwide
Integrated Workflow: Connects with GitHub, Dropbox, Google Drive, and more
Permanent Storage: Long-term preservation of research materials
Preregistration: Register study plans before data collection
DOIs: Create citable, permanent identifiers for your work

Key Features

Project Management

Organize research materials in hierarchical projects
Add collaborators with granular permissions
Track changes and maintain version history
Add wiki pages for documentation

Open Sharing

Make projects public or keep them private
Generate DOIs for permanent citation
Set embargo periods for timed release
License your work appropriately

Integrations

Connect GitHub repositories
Link cloud storage (Google Drive, Dropbox, Sciebo)
Use add-ons for specialized tools
Export to data repositories

Getting Started

Create a free account at osf.io
Create a new project for your research
Add components for different parts (data, code, materials)
Connect external services (GitHub, etc.)
Share with collaborators or make public

Use Cases

Preregistration: Document your hypotheses and analysis plan before collecting data
Data Sharing: Make datasets available with permanent DOIs
Supplementary Materials: Host materials that don’t fit in paper supplements
Collaboration: Central hub for multi-institution projects

Tips

Use clear, descriptive names for projects and components
Add detailed README files to explain your materials
Use tags to make projects discoverable
Consider making projects public after publication
Link related projects together

GIN

GIN (G-Node Infrastructure) is a free data management system designed for comprehensive and reproducible management of scientific data. It’s optimized for neuroscience research but suitable for any field with large datasets.

Why GIN?

Version Control for Data: Git-like workflow for datasets
Large File Support: Efficiently handles files of any size
Free Storage: Generous storage quotas for researchers
Neuroscience Focus: Designed with neuroscience workflows in mind
DOI Integration: Publish datasets with permanent identifiers

Key Features

Web interface for browsing and managing data
Git integration for command-line workflows
Support for large files through git-annex
Issue tracking and wikis for collaboration
Public and private repositories

Getting Started

Create an account at gin.g-node.org
Install the GIN client or use Git with git-annex
Create a repository for your dataset
Upload data with the GIN client (gin upload) or with Git and git-annex commands
Share with collaborators or make public

GIN vs. GitHub

GIN: Optimized for large data files, neuroscience community
GitHub: Optimized for code, broader software community
Use Both: Store code on GitHub, data on GIN, link them together

Best Practices

Use clear naming conventions for data files
Document data structure in README files
Use Git tags to mark dataset versions
Archive final versions with DOIs before publication
Link GIN datasets to GitHub code repositories

Tips

Use the GIN client for easier large file management
Add metadata files to describe your datasets
Make repositories public after paper acceptance
Use organizations for lab or project group data
