
Notebook-Driven Development for Research Data Analysis with Papermill and PyDoIt

Data science notebooks combine code, figures, and written explanations in one place. In this three-day, hands-on workshop, you will learn to get the most out of them: embedding explanations, creating beautiful plots with hvPlot, and generating presentations from your analyses. We then move on to advanced techniques that use PyDoIt and Papermill to build multi-notebook pipelines, run analyses across varying parameters, batch-process multiple datasets, and develop fully documented libraries of reusable code. This course is open to all researchers, with or without experience in a programming language.

Online
English

Offerings:

Sangeetha Nandakumar
Notebook-Driven Development
October 23, 2024
09:30 - 17:00
Registration Closed

Topics

  • Exploratory data analysis with Pandas and hvPlot in JupyterLab.
  • Accessing the contents of one notebook from another and creating parameterized analysis notebooks with Papermill.
  • Creating and managing a workflow of fully documented notebooks with PyDoIt.
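
To give a feel for how the last two topics fit together, here is a minimal sketch of a PyDoIt task file (dodo.py) that runs one parameterized notebook once per dataset through the Papermill command line. It is an illustration only, not the course material: the notebook name analysis.ipynb, the dataset files, and the data_path parameter are all hypothetical, and the notebook is assumed to contain a cell tagged "parameters" that defines data_path.

```python
# dodo.py -- hypothetical doit task file; every file name below is a placeholder.
from pathlib import Path

# Assumed inputs: one parameterized notebook and the datasets to batch over.
NOTEBOOK = "analysis.ipynb"
DATASETS = ["site_a.csv", "site_b.csv"]


def task_run_analysis():
    """Create one sub-task per dataset, each executing the notebook with papermill."""
    for dataset in DATASETS:
        stem = Path(dataset).stem
        output = f"{stem}_executed.ipynb"
        yield {
            # Sub-task name, so each run can be addressed as run_analysis:<name>.
            "name": stem,
            # A list of strings is executed as a command without a shell.
            # "-p data_path <value>" overrides the notebook's data_path parameter.
            "actions": [["papermill", NOTEBOOK, output, "-p", "data_path", dataset]],
            # doit re-runs the sub-task only when the notebook or dataset changes.
            "file_dep": [NOTEBOOK, dataset],
            "targets": [output],
        }
```

With doit and papermill installed and the placeholder files present, running doit in the folder containing this file would execute the notebook once per dataset and skip runs whose inputs have not changed.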

Intended Participants

  • Researchers and students from all universities are welcome.
  • Participants of all educational backgrounds are welcome.

Certification Requirements

Students who attend at least 75% of the course will receive a participation certificate by email at the end of the course.

Software Requirements

All students must attend the course with a Windows, Mac, or Linux computer on which they can do the course exercises.

Register: https://www.zoom.com/

Zoom is video conferencing software for virtual meetings and webinars. It is required for attending our online workshop sessions and provides the interactive features we rely on for teaching.

Why Zoom?

  • Breakout Rooms: Essential for our small-group exercises
  • Screen Sharing: Share your screen to get help or demonstrate solutions
  • Stable & Reliable: Handles large groups with consistent quality
  • Recording: Sessions can be recorded for later review (where permitted)

Installation

Download and install the Zoom Desktop Client from the official website. We require the desktop client rather than the web version for full feature support.

Before Your First Session

  1. Test Your Setup: Join a test meeting to check audio/video
  2. Update Zoom: Make sure you have the latest version
  3. Check Your Internet: Ensure you have a stable connection
  4. Find a Quiet Space: Minimize background noise during sessions

Workshop Etiquette

  • Keep your microphone muted when not speaking
  • Use video when possible to help build community
  • Use reactions (👍, ✋) to provide feedback
  • Ask questions in chat or unmute to speak
  • Be ready to join breakout rooms for exercises

Tips

  • Familiarize yourself with screen sharing features before the workshop
  • Keep your Zoom name consistent with your registration
  • Use virtual backgrounds if needed for privacy
  • Enable “dual monitor mode” if you have two screens

Register: https://code.visualstudio.com/download

Visual Studio Code is a powerful, lightweight code editor used for developing software. It supports various programming languages through extensions and provides an excellent environment for Python development and data science work.

Why VS Code?

  • Free & Open Source Core: Free to use and built on the open-source Code - OSS project, with active community development
  • Extensible: Thousands of extensions for any language or tool
  • Integrated Tools: Built-in terminal, debugger, and Git integration
  • Jupyter Support: Work with notebooks directly in the editor
  • Remote Development: Edit files on remote servers or in containers

Installation

Download and install Visual Studio Code from the official website. Choose the appropriate version for your operating system (Windows, macOS, or Linux).

Essential Extensions for Research

Python Development

  • Python - IntelliSense, debugging, code navigation
  • Jupyter - Run and edit Jupyter notebooks
  • Pylance - Fast, feature-rich Python language support

Collaboration & Version Control

  • GitLens - Supercharge Git integration
  • Live Share - Real-time collaborative editing

Data & Visualization

  • Data Wrangler - Explore and clean data visually
  • Rainbow CSV - Colorize CSV files for easier reading

Tips

  • Learn keyboard shortcuts to improve efficiency (Ctrl+Shift+P / Cmd+Shift+P for command palette)
  • Customize your theme and settings
  • Use the integrated terminal for running commands
  • Enable autosave to never lose work
  • Use Zen Mode (Ctrl+K Z) for distraction-free coding

Getting Started with Python

  1. Install the Python extension
  2. Select your Python interpreter (Ctrl+Shift+P → “Python: Select Interpreter”)
  3. Open a .py file or create a new one
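  4. Run code using the play button in the top-right corner of the editor (the Ctrl+Alt+N shortcut works only if the Code Runner extension is installed)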

Register: https://conda-forge.org/download

Conda is a package manager that simplifies the installation of scientific software. It helps in creating isolated environments for different projects, ensuring reproducibility and preventing dependency conflicts.

Why Conda?

  • Solves Dependencies: Automatically resolves and installs all package dependencies
  • Environment Isolation: Keep different projects separate with their own package versions
  • Cross-Platform: Works consistently across Windows, macOS, and Linux
  • Scientific Focus: Optimized for data science and research computing packages

Installation

We recommend installing Miniforge, which includes conda and uses conda-forge as the default channel.

  1. Download Miniforge from the official website
  2. Run the installer for your operating system
  3. Follow the installation prompts
  4. Restart your terminal/command prompt

Getting Started

Create a new environment:

conda create -n myenv python=3.11
conda activate myenv

Install packages:

conda install numpy pandas matplotlib
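
After installing, it can be useful to confirm that the active environment actually sees the packages. The snippet below is a small sketch using only the Python standard library; the helper name missing_packages is our own and not part of conda or any package.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The packages installed in the step above.
    missing = missing_packages(["numpy", "pandas", "matplotlib"])
    if missing:
        print(f"Missing packages: {', '.join(missing)}")
    else:
        print("All packages found.")
```

Run it with the environment activated (conda activate myenv); if a package is reported as missing, the most likely cause is that it was installed into a different environment.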

Best Practices

  • Use separate environments for different projects
  • Keep your base environment minimal
  • Export environment specifications for reproducibility: conda env export > environment.yml
  • Use the conda-forge channel for the latest packages (already the default with Miniforge)

Tips

  • List environments: conda env list
  • Remove environment: conda env remove -n myenv
  • Update packages: conda update --all

Course Materials
