DICOM De-identification Using Python Coding
DICOM de-identification involves identifying and handling metadata or image elements that may contain patient information. This tutorial introduces common concepts, workflows, and Python-based approaches used in educational and research settings.
This tutorial is intended for educational and demonstration purposes only. Users are responsible for ensuring compliance with applicable institutional, legal, and privacy requirements when working with real health data.
This tutorial shows you which DICOM tags contain identifying information, how de-identification workflows are structured, and how Python functions can be used. After learning the framework and best practices, you can adapt the functions to your own secure, approved work environment.
- Understand what to look for
- Learn common de-identification approaches
- Use functions in your own workflows
Choose your pathway
Select the option that best fits your experience and setup. Both pathways use educational example data and are intended to demonstrate de-identification concepts and workflows.
RECOMMENDED FOR BEGINNERS
Beginner Pathway
Google Colab
- No installation required
- Runs in your web browser
- Beginner-friendly interface
- Most libraries pre-installed
- Great for learning & experimentation
ADVANCED USERS
Advanced Pathway
Local/MYST Environment
- Full control of your environment
- Better performance for large datasets
- Easier integration with local workflows
- Requires local Python setup
- Recommended for experienced users
What to expect
A high-level overview of the tutorial using sample data.
1. Launch the notebook.
Open the notebook and follow the guided walkthrough.
2. Explore example DICOM data.
Learn using demonstration DICOM files provided for educational purposes.
3. Run the workflow.
Execute tutorial cells step-by-step to explore de-identification workflows.
4. Understand key concepts.
Review which tags may contain identifiers and how they can be modified or removed.
5. Apply in your own secure environment.
Adapt the functions and practices from the tutorial to your own secure, approved work environment.
This tutorial is intended for learning and demonstration only – implement in your own secure environment.
Before you begin
What is Python?
Python is a programming language commonly used in healthcare research, medical imaging, artificial intelligence (AI), and data science. Python allows users to write code that performs tasks such as reading medical imaging files, modifying metadata, automating workflows, and analyzing data.
This tutorial uses Python to demonstrate educational examples of DICOM de-identification workflows.
What is DICOM?
DICOM (Digital Imaging and Communications in Medicine) is the standard format used to store and share medical imaging data.
A DICOM file typically contains:
- Imaging data (e.g., MRI, CT, ultrasound images)
- Metadata (information about the patient, study, scanner, or institution)
Some DICOM fields may contain identifying information and require review before data sharing or secondary use.
What is a Python library?
A Python library is a collection of pre-written code designed to perform specific tasks.
Libraries help simplify coding by providing built-in functions and tools.
Examples used in this tutorial include:
- pydicom → reading and editing DICOM files
- numpy → working with image arrays and numerical data
- matplotlib → displaying images and visual outputs
What is De-identification?
De-identification refers to the process of reducing the likelihood that an individual can be identified from data.
For DICOM imaging data, this may involve:
- Removing or modifying identifying metadata
- Reviewing image pixels for burned-in identifiers
- Replacing direct identifiers with non-identifying values
- Applying institutional or project-specific privacy practices
The specific approach may vary depending on the intended use, governance requirements, and environment.
What is a Jupyter notebook?
A Jupyter notebook is an interactive document that combines:
- Code
- Text explanations
- Images
- Outputs and visualizations
Notebooks allow users to run code step-by-step while following educational explanations and examples.
Google Colab is a cloud-based platform that allows Jupyter notebooks to run directly in a web browser.
Using the notebook
Notebook tutorials are organized into cells.
Common actions include:
- Running a cell
- Editing code
- Viewing outputs
- Expanding text explanations
Helpful shortcuts:
- Shift + Enter → Run the current cell
- Runtime → Run all → Execute the full notebook sequentially
Run notebook cells in order, as later sections may depend on earlier steps.
Need help? Troubleshooting common issues
I get an error when running a cell.
Errors can happen if a cell is run out of order, a required file is missing, or a previous setup step was skipped.
TRY THE FOLLOWING:
- Run the notebook from the beginning
- Check that all setup/installation cells were completed
- Confirm that the sample file or file path exists
- Read the last line of the error message first, as it usually gives the most useful clue
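One way to confirm that a file path exists before running later cells is a small check like the one below. The helper name `check_input_file` and the path `sample_data/example.dcm` are hypothetical. Substitute the path used in your own notebook.

```python
from pathlib import Path


def check_input_file(path):
    """Return a short, readable status message for a file path."""
    p = Path(path)
    if not p.exists():
        # Showing the working directory helps spot relative-path mistakes.
        return f"Not found: {p} (working directory: {Path.cwd()})"
    if not p.is_file():
        return f"Not a file: {p}"
    return f"OK: {p} ({p.stat().st_size} bytes)"


# Hypothetical file name used only for illustration.
print(check_input_file("sample_data/example.dcm"))
```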
Required library not found.
This usually means that a Python library needed for the tutorial has not been installed or loaded.
TRY THE FOLLOWING:
- Run the installation/setup cell near the beginning of the notebook
- Re-run the import cell after installation
- Check that the library name is spelled correctly
- Restart the runtime/kernel if the installation completed but the error still appears
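To see which libraries are missing before running the import cell, a check along these lines may help. It uses only the Python standard library. The helper name `find_missing` is hypothetical, and the list of required libraries is taken from the examples named in this tutorial.

```python
import importlib.util

# Libraries named in this tutorial; adjust the list to match your notebook.
REQUIRED = ["pydicom", "numpy", "matplotlib"]


def find_missing(modules):
    """Return the names of modules that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required libraries are importable.")
```

In a notebook, the install command is usually run in its own cell (for example with a leading `!` or `%`), followed by a runtime restart if the import still fails.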
Notebook keeps disconnecting.
Google Colab sessions may disconnect after periods of inactivity or if the browser/computer goes to sleep.
To reduce disconnections:
- Keep the browser tab open while using the notebook
- Avoid leaving the notebook inactive for long periods
- Run the notebook in smaller sections
- Save or download outputs when needed