DICOM De-identification Using Python Coding

DICOM de-identification involves identifying and handling metadata or image elements that may contain patient information. This tutorial introduces common concepts, workflows, and Python-based approaches used in educational and research settings.

Educational use only.
This tutorial is intended for educational and demonstration purposes only. Users are responsible for ensuring compliance with applicable institutional, legal, and privacy requirements when working with real health data.

Learn the concepts. Apply them in your own secure environment.
This tutorial shows you which DICOM tags contain identifying information, how de-identification workflows are structured, and how Python functions can be used. After learning the framework and best practices, you can adapt the functions to your own secure, approved work environment.

Choose your pathway

Select the option that best fits your experience and setup. Both pathways use educational example data and are intended to demonstrate de-identification concepts and workflows.

RECOMMENDED FOR BEGINNERS

Beginner Pathway

Google Colab

ADVANCED USERS

Advanced Pathway

Local/MYST Environment

What to expect

A high-level overview of the tutorial using sample data.

1. Launch the notebook.

Open the notebook and follow the guided walkthrough.

2. Explore example DICOM data.

Learn using demonstration DICOM files provided for educational purposes.

3. Run the workflow.

Execute tutorial cells step-by-step to explore de-identification workflows.

4. Understand key concepts.

Review which tags may contain identifiers and how they can be modified or removed.

5. Apply in your own secure environment.

Adapt the functions to your own secure, approved work environment.

Important: Do not upload or process identified patient data in Google Colab or this GitHub repository.
This tutorial is intended for learning and demonstration only. Implement the workflow in your own secure environment.

Before you begin


What is Python?

Python is a programming language commonly used in healthcare research, medical imaging, artificial intelligence (AI), and data science. Python allows users to write code that performs tasks such as reading medical imaging files, modifying metadata, automating workflows, and analyzing data.

This tutorial uses Python to demonstrate educational examples of DICOM de-identification workflows.

What is DICOM?

DICOM (Digital Imaging and Communications in Medicine) is the standard format used to store and share medical imaging data.

A DICOM file typically contains:

Imaging data (e.g., MRI, CT, ultrasound images)

Metadata (information about the patient, study, scanner, or institution)

Some DICOM fields may contain identifying information and require review before data sharing or secondary use.
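
As a small illustration of the file structure described above: files saved in the standard DICOM file format normally begin with a 128-byte preamble followed by the four ASCII characters DICM, with the metadata and imaging data after that marker. The sketch below checks for the marker using only the Python standard library. The function name looks_like_dicom and the demo files are hypothetical and are not part of the tutorial notebook. Some older files omit the preamble, so a negative result is a prompt to investigate, not proof that a file is not DICOM.

```python
import os
import tempfile


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker after a 128-byte preamble."""
    with open(path, "rb") as handle:
        handle.seek(128)  # skip the fixed-length preamble
        return handle.read(4) == b"DICM"


# Demo with synthetic bytes only (no patient data): a fake preamble plus marker,
# and a plain text file for comparison.
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "example.dcm")
    bad_path = os.path.join(tmp_dir, "notes.txt")
    with open(good_path, "wb") as handle:
        handle.write(b"\x00" * 128 + b"DICM")
    with open(bad_path, "wb") as handle:
        handle.write(b"plain text, not DICOM")
    good_result = looks_like_dicom(good_path)
    bad_result = looks_like_dicom(bad_path)

print(good_result, bad_result)
```

A check like this only confirms the file type. It says nothing about whether the file contains identifying information.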

What is a Python library?

A Python library is a collection of pre-written code designed to perform specific tasks.

Libraries help simplify coding by providing built-in functions and tools.

Examples used in this tutorial include:

pydicom → reading and editing DICOM files

numpy → working with image arrays and numerical data

matplotlib → displaying images and visual outputs

What is De-identification?

De-identification refers to the process of reducing the likelihood that an individual can be identified from data.

For DICOM imaging data, this may involve:

Removing or modifying identifying metadata

Reviewing image pixels for burned-in identifiers

Replacing direct identifiers with non-identifying values

Applying institutional or project-specific privacy practices

The specific approach may vary depending on the intended use, governance requirements, and environment.

What is a Jupyter notebook?

A Jupyter notebook is an interactive document that combines:

Code

Text explanations

Images

Outputs and visualizations

Notebooks allow users to run code step-by-step while following educational explanations and examples.

Google Colab is a cloud-based platform that allows Jupyter notebooks to run directly in a web browser.

Using the notebook

Notebook tutorials are organized into cells.

Common actions include:

Running a cell

Editing code

Viewing outputs

Expanding text explanations

Helpful shortcuts:

Shift + Enter → Run the current cell

Runtime > Run all → Execute the full notebook sequentially

It is recommended to run notebook cells in order, as later sections may depend on earlier steps.

Need help? Troubleshooting common issues

I get an error when running a cell.

Errors can happen if a cell is run out of order, a required file is missing, or a previous setup step was skipped.

TRY THE FOLLOWING:

Run the notebook from the beginning
Check that all setup/installation cells were completed
Confirm that the sample file or file path exists
Read the last line of the error message first, as it usually gives the most useful clue

If the error continues: Restart the runtime/kernel and run the notebook cells again from the top.
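
For the file path check in the list above, a minimal sketch using the standard library is shown here. The function name describe_path and the path sample_data/example.dcm are placeholders chosen for illustration; substitute the path used in your notebook.

```python
from pathlib import Path


def describe_path(path_text):
    """Return a short message about whether a file path can be used."""
    path = Path(path_text)
    if path.is_file():
        return f"Found file: {path} ({path.stat().st_size} bytes)"
    if path.is_dir():
        return f"{path} is a folder, not a file"
    if path.parent.is_dir():
        # The folder exists, so list a few names to help spot a typo.
        nearby = sorted(p.name for p in path.parent.iterdir())[:10]
        return f"Missing file: {path}; folder contains: {nearby}"
    return f"Missing file and folder: {path.parent}"


# Placeholder path for illustration; use the path from your notebook.
message = describe_path("sample_data/example.dcm")
print(message)
```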

Required library not found.

This usually means that a Python library needed for the tutorial has not been installed or loaded.

TRY THE FOLLOWING:

Run the installation/setup cell near the beginning of the notebook
Re-run the import cell after installation
Check that the library name is spelled correctly
Restart the runtime/kernel if the installation completed but the error still appears

In Google Colab, some libraries are already available, while specialized libraries may need to be installed explicitly during the tutorial.
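
One way to see which libraries are missing before running the import cell is sketched below. It uses only the standard library. The REQUIRED list repeats the libraries named in this tutorial, and the helper name find_missing is hypothetical; adjust both to match your notebook.

```python
import importlib.util

# Libraries named in this tutorial; edit the list to match your notebook.
REQUIRED = ["pydicom", "numpy", "matplotlib"]


def find_missing(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
    print("Try running: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```

In a notebook cell the install command is usually prefixed, for example %pip install pydicom. After installing, re-run the import cell.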

Notebook keeps disconnecting.

Google Colab sessions may disconnect after periods of inactivity or if the browser/computer goes to sleep.

To reduce disconnections:

Keep the browser tab open while using the notebook
Avoid leaving the notebook inactive for long periods
Run the notebook in smaller sections
Save or download outputs when needed

Session Recovery: If the session disconnects, simply reconnect and re-run the notebook cells from the beginning to restore variables and assets.
