CT data processing (1): data format, extraction and visualization

It has been a few months since I finished my PhD on bone CT processing with deep learning. Along the way I tried many methods, and it is better to write them down than to rely on memory alone. In this series I will summarize the parts of CT data processing that I found useful, and I hope they will be helpful. So, let's start!

Part 1: data format

The first topic is the data formats and how to extract data from them. We all know that CT is a powerful medical imaging technique that lets doctors see the internal structure of the body in fine detail, and that specialized algorithms and software are needed to extract or generate useful information from the raw CT data. However, raw CT data usually cannot be fed directly into machine learning algorithms. So the first step in developing such algorithms is to understand the data formats and how to read them. Below are the commonly used data formats.

Format
DICOM
TIFF
NIfTI

DICOM was the most common format for raw CT data in my study, since it contains clinical information, CT machine information and the image data together. The clinical information can include some patient information. The CT machine information can include the pixel spacing, the rescale intercept and slope, and so on. The image data is the main input for model development.

The DICOM format is very powerful, but it is not easy to use directly in computer vision tasks. I usually work in Python and use the pydicom library for DICOM processing. pydicom reads a DICOM file into an object that contains both the DICOM header and the image data. Using this format directly for model development quickly gets complicated, because every sample drags the full header along with it.
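A minimal sketch of reading one DICOM file with pydicom is shown here. It assumes the file carries the PixelSpacing, RescaleSlope and RescaleIntercept attributes (not every file does), and the function names are made up for illustration. For clinical CT the rescaled values are typically Hounsfield units.

```python
import numpy as np


def apply_rescale(raw_pixels, slope, intercept):
    """Apply the linear DICOM rescale: output = raw * slope + intercept."""
    return raw_pixels.astype(np.float32) * float(slope) + float(intercept)


def read_dicom_slice(path):
    """Read one DICOM file and return (rescaled image, pixel spacing in mm)."""
    import pydicom  # third-party; imported here so apply_rescale works without it

    ds = pydicom.dcmread(path)  # header and pixel data in a single object
    spacing = [float(v) for v in ds.PixelSpacing]  # [row spacing, column spacing]
    # Assumption: the rescale attributes are present in the header.
    image = apply_rescale(ds.pixel_array, ds.RescaleSlope, ds.RescaleIntercept)
    return image, spacing
```

Calling read_dicom_slice on a file of your own returns a float32 numpy array plus the in-plane spacing, which is usually all a model pipeline needs from the header.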

In natural image processing, we can store data as JPG or PNG. Medical images, however, are usually 16-bit, which suits the TIFF format better. If all images share the same pixel spacing, intercept, slope and other scanner-related parameters, TIFF can be a good option. You can also get a quick overview of TIFF files without any specialized software.
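As a rough sketch of storing CT slices as 16-bit TIFF with OpenCV: unsigned 16-bit is the safe choice for cv2.imwrite, so signed intensities are shifted by a constant first. The offset of 32768 and the function names are assumptions for illustration, not a standard.

```python
import numpy as np

OFFSET = 32768  # assumed constant shift so that negative values fit into uint16


def to_uint16(image, offset=OFFSET):
    """Shift signed intensities into the unsigned 16-bit range, clipping outliers."""
    shifted = np.round(np.asarray(image, dtype=np.float64)) + offset
    return np.clip(shifted, 0, 65535).astype(np.uint16)


def from_uint16(stored, offset=OFFSET):
    """Undo the shift applied by to_uint16."""
    return stored.astype(np.int32) - offset


def save_and_reload_tiff(image, path):
    """Write a 16-bit TIFF with OpenCV and read it back at full bit depth."""
    import cv2  # third-party (opencv-python)

    cv2.imwrite(path, to_uint16(image))
    # IMREAD_UNCHANGED keeps the 16-bit depth; the default flag converts to 8-bit BGR.
    stored = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    return from_uint16(stored)
```

The same offset has to be used for every file in the dataset, which is exactly why this format only works when all scans share the same intensity convention.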

In the real world, we cannot assume that all scans share the same acquisition parameters, so it is better to have a format that stores both the image data and the CT machine information without being as complicated as DICOM. NIfTI is the answer. It keeps a compact header (voxel spacing, orientation, data type) next to the image data and is very convenient to use.
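A small sketch of how a volume and its voxel spacing can travel together in one NIfTI file using nibabel. It assumes an axis-aligned volume with the origin at zero (no rotation), which is a simplification. The helper names are hypothetical, and the volume should have a dtype such as int16 or float32.

```python
import numpy as np


def spacing_to_affine(spacing):
    """Build a 4x4 affine with the voxel size on the diagonal (no rotation, origin at 0)."""
    affine = np.eye(4)
    affine[0, 0], affine[1, 1], affine[2, 2] = spacing
    return affine


def save_nifti(volume, spacing, path):
    """Store a 3D array together with its voxel spacing in one NIfTI file."""
    import nibabel as nib  # third-party

    nib.save(nib.Nifti1Image(volume, spacing_to_affine(spacing)), path)


def load_nifti(path):
    """Return (array, voxel spacing) from a NIfTI file."""
    import nibabel as nib  # third-party

    img = nib.load(path)
    data = np.asanyarray(img.dataobj)  # keeps the on-disk dtype where possible
    return data, img.header.get_zooms()[:3]
```

Because the spacing is read back from the header, scans with different resolutions can sit in the same dataset without any side files.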

There are other raw data formats as well, such as ISQ, which, like DICOM, contains both the CT machine information and the image data.

Part 2: data extraction

Once we know the usable data formats, the next step is how to read or generate such data. For DICOM, we can use pydicom. For TIFF, both cv2 (OpenCV) and PIL can be used. I prefer cv2 because it returns the data directly as a numpy array. For NIfTI, nibabel is very useful.

Another question is how to convert between these three formats. The usual directions are DICOM to TIFF, TIFF to NIfTI, and DICOM to NIfTI. The bridge is numpy: read the slices into arrays (with pydicom or cv2), use os to walk the folders, stack the arrays, and write them out in the target format.
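The conversions can be sketched roughly as follows. This assumes one scan per folder, TIFF file names that sort in slice order (for example zero-padded numbers), DICOM files that carry InstanceNumber, and an axis-aligned volume. All function names are hypothetical.

```python
import os
import numpy as np


def stack_slices(indexed_slices):
    """Sort (index, 2D array) pairs by index and stack them into (rows, cols, slices)."""
    ordered = sorted(indexed_slices, key=lambda pair: pair[0])
    return np.stack([image for _, image in ordered], axis=-1)


def tiff_folder_to_volume(folder):
    """Stack all TIFF files of a folder, ordered by file name."""
    import cv2  # third-party (opencv-python)

    names = [n for n in os.listdir(folder) if n.lower().endswith((".tif", ".tiff"))]
    return stack_slices(
        (name, cv2.imread(os.path.join(folder, name), cv2.IMREAD_UNCHANGED))
        for name in names
    )


def dicom_folder_to_volume(folder):
    """Stack all DICOM files of a folder, ordered by InstanceNumber."""
    import pydicom  # third-party

    datasets = [pydicom.dcmread(os.path.join(folder, n)) for n in os.listdir(folder)]
    return stack_slices((int(ds.InstanceNumber), ds.pixel_array) for ds in datasets)


def volume_to_nifti(volume, spacing, path):
    """Write the stacked volume as NIfTI, with the voxel spacing on the affine diagonal."""
    import nibabel as nib  # third-party

    nib.save(nib.Nifti1Image(volume, np.diag([*spacing, 1.0])), path)
```

Sorting before stacking matters because os.listdir returns files in arbitrary order, and a shuffled slice order is hard to spot later in training.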

Part 3: data visualization

It is strongly advised to look at your data before starting a new task. ImageJ is the basic tool for visualizing the data: simple but efficient. Other tools, such as MITK and 3D Slicer, are also very useful.

Part 4: summary

Let's summarize how to extract the raw CT data.

The first step is to get the data ready on your disk. Then use Python code to convert the data from DICOM to TIFF or NIfTI. Finally, use ImageJ or another viewer to go through your data. This will give you a direct sense of the CT data and of your task.

posted @ 2022-12-07 14:17  xiaoxuxli