CT data processing (1) data format, extraction and visualization
It has been a few months since I finished my PhD on bone CT processing with deep learning. During the journey I tried many methods, and it is better to write things down than to rely solely on memory. In this series I will try to summarize some useful parts of CT data processing, and I hope they will be helpful. So, let's start!
Part 1: data format
The first topic is the data formats and how to extract data from them. We all know that CT imaging is a powerful medical imaging technique that allows doctors to visualize the internal structure of the body in fine detail, and that we need specialized algorithms and software to extract or generate useful information from the raw CT data. However, raw CT data usually cannot be used directly by machine learning algorithms. When developing machine learning algorithms, the first step is to understand the data formats and how to extract data from them. Below are the commonly used data formats.
Format |
---|
DICOM |
TIFF |
NIfTI |
DICOM was the most common format for raw CT data in my study, since it contains clinical information, CT machine information, and the image data. The clinical information can include patient details. The CT machine information can include the pixel spacing, the rescale intercept and slope, and so on. The image data is the main ingredient for model development.
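To make the role of the slope and intercept concrete: the stored pixel values are mapped to Hounsfield units (HU) with a linear rescale, HU = stored * slope + intercept. Below is a minimal sketch of that step, assuming numpy. The function name `to_hounsfield`, the default values (slope 1, intercept -1024), and the toy array are my own illustrative choices, so check the headers of your own scans.

```python
import numpy as np

def to_hounsfield(stored, slope=1.0, intercept=-1024.0):
    """Apply the linear DICOM rescale: HU = stored * slope + intercept."""
    # Cast to float first so unsigned stored values cannot wrap around.
    return stored.astype(np.float32) * slope + intercept

# Toy 2x2 "slice" of stored values (made-up numbers, not real CT data).
stored = np.array([[0, 1024], [2048, 3072]], dtype=np.uint16)
hu = to_hounsfield(stored)
print(hu)
```

With these example values, a stored value of 1024 maps to 0 HU, which is roughly water.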
The DICOM format is very powerful, but it is not easy to use directly in computer vision tasks. I usually use Python in my experiments, with the pydicom library for DICOM processing. pydicom reads a DICOM file as an object that contains both the DICOM header and the image data. Using the DICOM format directly for model development gets complicated.
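As a rough illustration of reading a DICOM file with pydicom, here is a small sketch. The helper names `header_summary` and `load_dicom_slice` are hypothetical, and I assume a CT slice that carries the `PixelSpacing`, `RescaleSlope`, and `RescaleIntercept` attributes; files that lack them fall back to the defaults shown.

```python
def header_summary(ds):
    """Collect a few CT-related header fields from a pydicom-style dataset."""
    return {
        # In-plane spacing in mm: [row spacing, column spacing].
        "pixel_spacing": [float(v) for v in getattr(ds, "PixelSpacing", [])],
        # Fall back to an identity rescale if the fields are missing.
        "slope": float(getattr(ds, "RescaleSlope", 1.0)),
        "intercept": float(getattr(ds, "RescaleIntercept", 0.0)),
    }

def load_dicom_slice(path):
    """Read one DICOM file and return (pixel array, header summary)."""
    import pydicom  # imported here so header_summary works without pydicom
    ds = pydicom.dcmread(path)
    return ds.pixel_array, header_summary(ds)
```

A call would look like `pixels, info = load_dicom_slice("slice_0001.dcm")`, where the file name is only a placeholder.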
In natural image processing, we can use jpg and png to store data. But medical images are usually 16-bit data, which is better suited to the TIFF format. If all images share the same pixel spacing, intercept, slope, and other CT machine-related parameters, TIFF can be a good option. You can also get a quick overview of the images without specialized software.
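A quick way to see why the bit depth matters is to check whether a range of CT values fits into a given integer type. This is only a sketch with made-up Hounsfield-like values; `fits_in` is a hypothetical helper, and the +1024 shift is just one possible way to make the values non-negative.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value can be stored in the integer dtype without clipping."""
    info = np.iinfo(dtype)
    return bool(values.min() >= info.min and values.max() <= info.max)

# Made-up Hounsfield-like values: air, water, dense bone.
hu = np.array([-1000, 0, 1800], dtype=np.int32)

print(fits_in(hu, np.uint8))           # 8-bit unsigned
print(fits_in(hu, np.int16))           # 16-bit signed
print(fits_in(hu + 1024, np.uint16))   # 16-bit unsigned after shifting
```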
In the real world, we can't assume that all our data was acquired with the same settings, so it is better to have a format that encodes both the image data and the CT machine data without being as complicated as DICOM. The NIfTI format is the answer. It includes both a header and the image data, and it is very convenient to use.
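To show how the image data and the spacing travel together in a NIfTI file, here is a minimal sketch using nibabel. The function names are hypothetical, and the affine below only encodes the voxel size (no rotation, origin at zero), which is a simplification I am assuming for illustration.

```python
import numpy as np

def spacing_to_affine(spacing_xyz):
    """Build a 4x4 affine that only encodes voxel size (no rotation, zero origin)."""
    return np.diag([*spacing_xyz, 1.0])

def save_nifti(volume, spacing_xyz, out_path):
    """Save a 3D array as NIfTI with the voxel spacing stored in the header."""
    import nibabel as nib  # imported lazily so spacing_to_affine works without it
    # volume is assumed to be int16 or float32; nibabel may reject int64 data.
    img = nib.Nifti1Image(volume, spacing_to_affine(spacing_xyz))
    nib.save(img, out_path)

def load_nifti(path):
    """Return (image array, voxel spacing) from a NIfTI file."""
    import nibabel as nib
    img = nib.load(path)
    return img.get_fdata(), img.header.get_zooms()
```

The point of the design is that the spacing no longer has to be tracked in a separate file: whoever loads the volume gets it back from the header.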
There are other raw data formats, such as isq, which, like DICOM, contain both the CT machine information and the image data.
Part 2: data extraction
Once we know the usable data formats, the next step is to extract or generate such data. For DICOM, we can use pydicom to access the data. For TIFF, both cv2 and PIL can be used. I prefer cv2, since it returns the data as a numpy array. For NIfTI, nibabel is very useful.
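The mapping from format to library can be sketched as a small loader that picks the reader from the file extension. The names `library_for` and `read_image` are hypothetical, and I assume that DICOM files end in `.dcm` (many real ones have no extension at all) and that each TIFF holds a single slice.

```python
import os

def library_for(path):
    """Pick the reading library from the file extension."""
    name = os.path.basename(path).lower()
    if name.endswith((".nii", ".nii.gz")):
        return "nibabel"
    if name.endswith((".tif", ".tiff")):
        return "cv2"
    if name.endswith(".dcm"):
        return "pydicom"
    raise ValueError(f"Unsupported file type: {path}")

def read_image(path):
    """Load a DICOM, TIFF, or NIfTI file as a numpy array."""
    library = library_for(path)
    if library == "nibabel":
        import nibabel as nib
        return nib.load(path).get_fdata()
    if library == "cv2":
        import cv2
        # IMREAD_UNCHANGED keeps the original bit depth instead of converting to 8-bit.
        # cv2.imread returns None if the file cannot be read.
        return cv2.imread(path, cv2.IMREAD_UNCHANGED)
    import pydicom
    return pydicom.dcmread(path).pixel_array
```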
Another question is how to convert data between these three formats, usually from DICOM to TIFF, from TIFF to NIfTI, and from DICOM to NIfTI. The bridge here is cv2, numpy, and os.
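The conversions can be sketched roughly as follows. This is not a definitive pipeline: all function names are hypothetical, the folder is assumed to contain exactly one CT series of DICOM files with `InstanceNumber` set, the +1024 offset used to store HU values as unsigned 16-bit TIFF is my own choice, and the NIfTI affine only encodes voxel size, not orientation.

```python
import os
import numpy as np

def slice_filename(index):
    """Zero-padded name so that alphabetical order equals slice order."""
    return f"slice_{index:04d}.tif"

def hu_to_uint16(volume_hu, offset=1024):
    """Shift HU values to be non-negative and cast to uint16 for TIFF storage."""
    return np.clip(np.rint(volume_hu + offset), 0, 65535).astype(np.uint16)

def dicom_folder_to_volume(folder):
    """Stack all DICOM slices in a folder into a (z, y, x) array of HU values."""
    import pydicom  # lazy import: only needed when this function is called
    slices = [pydicom.dcmread(os.path.join(folder, f)) for f in os.listdir(folder)]
    slices.sort(key=lambda ds: int(ds.InstanceNumber))
    volume = np.stack([
        ds.pixel_array * float(ds.RescaleSlope) + float(ds.RescaleIntercept)
        for ds in slices
    ])
    in_plane_spacing = [float(v) for v in slices[0].PixelSpacing]
    return volume, in_plane_spacing

def volume_to_tiffs(volume_hu, out_dir, offset=1024):
    """Write each slice as a 16-bit TIFF file."""
    import cv2
    os.makedirs(out_dir, exist_ok=True)
    for i, sl in enumerate(hu_to_uint16(volume_hu, offset)):
        cv2.imwrite(os.path.join(out_dir, slice_filename(i)), sl)

def tiffs_to_nifti(tiff_dir, spacing_xyz, out_path, offset=1024):
    """Stack 16-bit TIFF slices and save them as one NIfTI volume in HU."""
    import cv2
    import nibabel as nib
    names = sorted(n for n in os.listdir(tiff_dir) if n.endswith(".tif"))
    stack = np.stack([
        cv2.imread(os.path.join(tiff_dir, n), cv2.IMREAD_UNCHANGED) for n in names
    ])
    # Undo the storage offset in a wider type, then go back to int16.
    hu = (stack.astype(np.int32) - offset).astype(np.int16)
    # The stack is (z, y, x); reorder to the (x, y, z) layout assumed for the affine.
    img = nib.Nifti1Image(np.transpose(hu, (2, 1, 0)), np.diag([*spacing_xyz, 1.0]))
    nib.save(img, out_path)
```

Going from DICOM to NIfTI directly is the same idea without the TIFF step: build the volume with `dicom_folder_to_volume` and pass it to nibabel.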
Part 3: data visualization
It is strongly advised to look at your data before starting a new task. ImageJ is the basic tool for visualizing the data: simple but efficient. Other tools, such as MITK and 3D Slicer, are also very useful.
Part 4: summary
Let's summarize how to extract the raw CT data.
The first step is to get the data ready on your disk. Next, use Python code to convert the data from DICOM to TIFF or NIfTI. Finally, use ImageJ or another tool to go through your data. This will give you a direct sense of the CT data and your task.