Intro to File- and Database-oriented Neuroscience Data Management With Python, SQL, and HDF5
Neuroscience is evolving rapidly, and experimental data are becoming increasingly complex. How can you integrate large, diverse datasets so that they are easy to analyze and share? And how would your process improve if, instead of having to write long scripts, you could analyze data with just a few lines of code? In this workshop, discover the power of database management systems, a game-changer in neuroscience research. We will dive into the world of SQL and learn about the DuckDB SQL engine, which makes it easy to apply industry-standard data organization methods to research data as a relational database, with no server management needed. You'll also gain hands-on experience with HDF5 and JSON for key-value data storage, and learn how to combine these management techniques into hybrid database systems that balance convenience and performance. By the end of the course, you'll be able to write Python scripts that create databases and extract data from them, query large databases in SQL, store complex data in HDF5, manage your work with Git, and publish your projects on GitHub.
Prerequisites:
Offerings:
Topics
- Navigating Local and Remote Filesystems with Pathlib and FsSpec
- Parsing and Extracting Metadata from Filenames and JSON Files
- Querying JSON, CSV, and Parquet files with SQL in DuckDB
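As a rough illustration of how these topics fit together, the sketch below parses metadata from filenames, merges it with the contents of JSON files, and queries the result with SQL. It is not course material: the naming scheme, the n_trials field, and the example files are made up, and the standard-library sqlite3 module stands in for DuckDB so the example runs without extra installs.

```python
import json
import re
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical naming scheme: <subject>_<YYYY-MM-DD>_<task>.json
FILENAME_PATTERN = re.compile(
    r"(?P<subject>[^_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<task>[^_]+)"
)


def parse_filename(path):
    """Extract subject, date, and task from a file's stem."""
    match = FILENAME_PATTERN.fullmatch(path.stem)
    if match is None:
        raise ValueError(f"Unexpected filename: {path.name}")
    return match.groupdict()


def collect_sessions(folder):
    """Merge filename metadata with the fields stored inside each JSON file."""
    sessions = []
    for path in sorted(folder.glob("*.json")):
        record = parse_filename(path)
        record.update(json.loads(path.read_text()))
        sessions.append(record)
    return sessions


def mean_trials_per_task(sessions):
    """Load the records into an in-memory SQL table and aggregate them."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sessions (subject TEXT, date TEXT, task TEXT, n_trials INTEGER)"
    )
    con.executemany(
        "INSERT INTO sessions VALUES (:subject, :date, :task, :n_trials)", sessions
    )
    rows = con.execute(
        "SELECT task, AVG(n_trials) FROM sessions GROUP BY task ORDER BY task"
    ).fetchall()
    con.close()
    return rows


# Build a tiny made-up dataset so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    example_files = {
        "mouse01_2024-03-01_reach.json": {"n_trials": 40},
        "mouse01_2024-03-02_grasp.json": {"n_trials": 30},
        "mouse02_2024-03-01_reach.json": {"n_trials": 60},
    }
    for name, content in example_files.items():
        (folder / name).write_text(json.dumps(content))

    sessions = collect_sessions(folder)
    summary = mean_trials_per_task(sessions)

print(summary)
```

Keeping the parsing step separate from the SQL step means the same records could later be loaded into a different engine without changing how the filenames are read.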
Intended Participants
- Researchers and students from all universities are welcome.
- Participants of all skill levels and backgrounds are welcome.
Certification Requirements
Students who attend at least 75% of the course will receive a participation certificate by email at the end of the course.
Software Requirements
All students must attend the course with a Windows, Mac, or Linux computer that they can use for the course exercises.
Register: https://www.zoom.com/
Zoom is a video conferencing software that allows for virtual meetings and webinars. It is essential for attending our online workshop sessions and provides the interactive features needed for effective learning.
Why Zoom?
- Breakout Rooms: Essential for our small-group exercises
- Screen Sharing: Share your screen to get help or demonstrate solutions
- Stable & Reliable: Handles large groups with consistent quality
- Recording: Sessions can be recorded for later review (where permitted)
Installation
Download and install the Zoom Desktop Client from the official website. We require the desktop client rather than the web version for full feature support.
Before Your First Session
- Test Your Setup: Join a test meeting to check audio/video
- Update Zoom: Make sure you have the latest version
- Check Your Internet: Ensure you have a stable connection
- Find a Quiet Space: Minimize background noise during sessions
Workshop Etiquette
- Keep your microphone muted when not speaking
- Use video when possible to help build community
- Use reactions (👍, ✋) to provide feedback
- Ask questions in chat or unmute to speak
- Be ready to join breakout rooms for exercises
Tips
- Familiarize yourself with screen sharing features before the workshop
- Keep your Zoom name consistent with your registration
- Use virtual backgrounds if needed for privacy
- Enable “dual monitor mode” if you have two screens
Register: https://code.visualstudio.com/download
Visual Studio Code is a powerful, lightweight code editor used for developing software. It supports various programming languages through extensions and provides an excellent environment for Python development and data science work.
Why VS Code?
- Free & Open Source: Completely free with active community development
- Extensible: Thousands of extensions for any language or tool
- Integrated Tools: Built-in terminal, debugger, and Git integration
- Jupyter Support: Work with notebooks directly in the editor
- Remote Development: Edit files on remote servers or in containers
Installation
Download and install Visual Studio Code from the official website. Choose the appropriate version for your operating system (Windows, macOS, or Linux).
Essential Extensions for Research
Python Development
- Python - IntelliSense, debugging, code navigation
- Jupyter - Run and edit Jupyter notebooks
- Pylance - Fast, feature-rich Python language support
Collaboration & Version Control
- GitLens - Supercharge Git integration
- Live Share - Real-time collaborative editing
Data & Visualization
- Data Wrangler - Explore and clean data visually
- Rainbow CSV - Colorize CSV files for easier reading
Tips
- Learn keyboard shortcuts to improve efficiency (Ctrl+Shift+P / Cmd+Shift+P for command palette)
- Customize your theme and settings
- Use the integrated terminal for running commands
- Enable autosave to never lose work
- Use Zen Mode (Ctrl+K Z) for distraction-free coding
Getting Started with Python
- Install the Python extension
- Select your Python interpreter (Ctrl+Shift+P → “Python: Select Interpreter”)
- Open a .py file or create a new one
- Run code using the play button or Ctrl+Alt+N (this shortcut requires the Code Runner extension)
Register: https://conda-forge.org/download
Conda is a package manager that simplifies the installation of scientific software. It helps in creating isolated environments for different projects, ensuring reproducibility and preventing dependency conflicts.
Why Conda?
- Solves Dependencies: Automatically resolves and installs all package dependencies
- Environment Isolation: Keep different projects separate with their own package versions
- Cross-Platform: Works consistently across Windows, macOS, and Linux
- Scientific Focus: Optimized for data science and research computing packages
Installation
We recommend installing Miniforge, which includes conda and uses conda-forge as the default channel.
- Download Miniforge from the official website
- Run the installer for your operating system
- Follow the installation prompts
- Restart your terminal/command prompt
Getting Started
Create a new environment:
conda create -n myenv python=3.11
conda activate myenv
Install packages:
conda install numpy pandas matplotlib
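After installing, it can help to confirm that the active environment actually sees the packages. The following is a minimal sketch using only the Python standard library; the check_packages helper and the deliberately missing package name are made up for illustration.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a mapping of package name -> True if the package can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages from the install command above, plus one made-up name to show a miss.
status = check_packages(["numpy", "pandas", "matplotlib", "not_a_real_package_xyz"])

# The interpreter path shows which environment is currently active.
print("Interpreter:", sys.executable)
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'missing'}")
```

If a package is reported as missing, the usual cause is that the environment was not activated before starting Python.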
Best Practices
- Use separate environments for different projects
- Keep your base environment minimal
- Export environment specifications for reproducibility: conda env export > environment.yml
- Use the conda-forge channel for the latest packages
Tips
- List environments: conda env list
- Remove environment: conda env remove -n myenv
- Update packages: conda update --all
Register: https://git-scm.com/downloads
Git is a version control system that tracks changes in source code. It allows multiple people to work on a project simultaneously and maintains a complete history of all changes.
Why Git?
- Distributed: Every developer has a complete copy of the project history
- Branching: Experiment with new features without affecting the main codebase
- Collaboration: Work with others seamlessly through platforms like GitHub
- Reproducibility: Track exactly which version of code produced which results
Installation
Download and install Git from the official website. Choose the appropriate installer for your operating system.
Windows
Use Git for Windows installer with recommended defaults.
macOS
Git comes pre-installed on most macOS systems. Update with Homebrew: brew install git
Linux
Install using your package manager: sudo apt-get install git (Ubuntu/Debian)
Configuration
After installation, configure your identity:
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
Essential Commands
- git clone - Copy a repository to your local machine
- git add - Stage changes for commit
- git commit - Save changes with a message
- git push - Upload changes to remote repository
- git pull - Download changes from remote repository
Tips
- Use meaningful commit messages that explain why you made changes
- Commit frequently to create detailed checkpoints
- Create branches for new features or experiments
- Use .gitignore to exclude data files and generated content
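One way to act on the reproducibility point above is to store the current commit hash alongside analysis results. The sketch below is only an illustration: the function names and the example result are made up, and it assumes the script runs inside a Git repository (it returns None otherwise).

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the commit hash of HEAD, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a repository.
        return None
    return result.stdout.strip()


def tag_results(results, repo_dir="."):
    """Return a copy of the results with the commit hash attached."""
    return {**results, "code_version": current_commit(repo_dir)}


# Made-up result value, used only to show the shape of the output.
tagged = tag_results({"mean_firing_rate_hz": 12.5})
print(tagged)
```

The hash only identifies committed code, so uncommitted changes in the working directory are not captured by this approach.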
Register: https://github.com/
GitHub is a web-based platform built around Git that provides hosting for software development and version control. It’s the world’s largest code hosting platform and essential for modern collaborative research.
Why GitHub?
- Collaboration: Work with researchers worldwide on shared projects
- Visibility: Make your research code discoverable and citable
- Integration: Connect with CI/CD, documentation, and project management tools
- Community: Access to millions of open-source projects and libraries
- Free for Research: Unlimited public and private repositories
Getting Started
- Create a free account at github.com
- Set up Git on your local machine
- Configure Git with your GitHub credentials
- Create your first repository or clone an existing one
Essential Features
Repositories
- Host your code with full version history
- README files for documentation
- Issues for tracking bugs and features
- Pull requests for code review
Collaboration
- Fork projects to contribute
- Star repositories to bookmark them
- Follow researchers working in your field
- Use GitHub Pages for project websites
Tips for Researchers
- Include a LICENSE file to clarify how others can use your code
- Write a clear README explaining what your code does
- Create a CITATION.cff file for proper attribution
- Use releases to mark versions associated with publications
- Add topics to make your repository discoverable
Best Practices
- Commit often with meaningful messages
- Use branches for new features
- Write clear documentation
- Add a DOI through Zenodo integration for permanent archiving
Register: https://www.sciebo.de/
Sciebo is a cloud storage service for universities in North Rhine-Westphalia, Germany. It provides secure, GDPR-compliant storage for research data with large storage quotas.
Why Sciebo?
- Secure: Hosted in Germany with GDPR compliance
- Generous Storage: Large quotas for academic users
- University Integration: Uses your university credentials
- Collaboration: Share files and folders with colleagues
- Sync Across Devices: Desktop and mobile apps available
Features
- File synchronization across devices
- Sharing via links with password protection
- Collaborative document editing
- Version history for files
- Integration with university authentication
Getting Started
- Access Sciebo through your university’s login
- Install the desktop sync client (optional)
- Create folders for organizing your research data
- Use sharing features to collaborate with colleagues
Best Practices
- Organize files in clear folder structures
- Use descriptive file names with dates
- Set appropriate sharing permissions (read vs. edit)
- Regularly back up important data to multiple locations
- Be mindful of data sensitivity and compliance requirements
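Descriptive, dated file names are easier to keep consistent when a script generates them. The following is a small sketch under assumed conventions: the subject_date_task layout, the .h5 suffix, and the example values are illustrative choices, not a Sciebo requirement.

```python
from datetime import date


def session_filename(subject, task, recorded_on, suffix=".h5"):
    """Build a file name of the form <subject>_<YYYY-MM-DD>_<task><suffix>."""
    return f"{subject}_{recorded_on.isoformat()}_{task}{suffix}"


# Made-up sessions, used only to show the resulting names.
names = [
    session_filename("mouse01", "reach", date(2024, 3, 12)),
    session_filename("mouse01", "reach", date(2024, 3, 2)),
]

# ISO dates sort chronologically when the names are sorted as plain text.
for name in sorted(names):
    print(name)
```

Writing the date in ISO format (year first) is what makes alphabetical order in a file browser match chronological order.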
Tips
- Use selective sync to save local disk space
- Share folders instead of individual files for projects
- Use public links for sharing with external collaborators
- Check your storage quota regularly