This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
Panoramic radiography is a standard diagnostic imaging method for dentists. However, detecting mandibular trauma and fractures in panoramic radiographs is challenging because of the superimposed structures of the facial skeleton. The objective of this study was to develop a deep learning algorithm capable of automatically detecting mandibular fractures and trauma and to compare its performance with that of general dentists.
This is a retrospective diagnostic test accuracy study using a two-stage deep learning framework. To train the model, 190 panoramic images were collected from four different sources. The mandible was first segmented using a U-net model. Then, a Faster region-based convolutional neural network (Faster R-CNN) was applied to detect fractures. Finally, the accuracy, specificity, and sensitivity of the artificial intelligence model in trauma diagnosis were compared with those of general dentists.
The mAP50 and mAP75 for object detection were 98.66% and 57.90%, respectively. The classification accuracy of the model was 91.67%, and its sensitivity and specificity were 100% and 83.33%, respectively. In comparison, human-level diagnostic accuracy, sensitivity, and specificity were 87.22% ± 8.91%, 82.22% ± 16.39%, and 92.22% ± 6.33%, respectively.
Our framework can provide a level of performance better than that of general dentists in diagnosing mandibular trauma and fractures.
The field of dentistry has undergone a significant transformation over the past few decades, and new technologies based on artificial intelligence (AI) have played an essential role in this transformation. These intelligent technologies have served as powerful tools for predicting and diagnosing diseases, as well as for helping dentists devise appropriate treatment plans.
Dentists and maxillofacial surgeons use panoramic radiography as a standard diagnostic imaging method in their routine practices.
Panoramic radiography can be used to detect various conditions, including mandibular lesions and traumas. However, interpreting trauma and mandibular injuries can be challenging even for experienced professionals because of their complexity. This is primarily because, during the panoramic radiography procedure, the source-detector assembly rotates around the patient's head, so all bony structures of the facial skeleton are superimposed.
Hence, in this study, we investigated the use of deep learning to create an image analysis algorithm for automatically detecting mandibular trauma and fractures on panoramic radiographs. We also compared the performance of our model with the diagnostic performance of general dentists. This algorithm could be used in clinical practice as an aid to decision-making.
Study design
This is a retrospective diagnostic test accuracy study. A two-stage deep learning framework was used. First, a U-net model was used to segment the region of interest, which was the mandible. Then, a Faster region-based convolutional neural network (Faster R-CNN) was applied to determine the presence and position of fractures in the mandible on panoramic radiographs. The Aja University of Medical Sciences' ethics committee approved the study (IR.AJAUMS.REC.1400.204). The study was reported in accordance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).
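For illustration, the two-stage design can be expressed as a minimal inference-time sketch. This is a sketch only, assuming the U-net mask is used to suppress non-mandible regions before detection; the function names and the masking step are illustrative, not the exact implementation.

```python
import torch

def detect_fractures(image, unet, detector, threshold=0.5):
    """Two-stage sketch: segment the mandible, then detect fractures.

    `unet` and `detector` are assumed to be trained torch.nn.Modules;
    how the segmentation output conditions the detector is an assumption.
    `image` is a (1, 3, H, W) float tensor.
    """
    unet.eval()
    detector.eval()
    with torch.no_grad():
        # Stage 1: U-net predicts a mandible probability map (1 x 1 x H x W).
        mask = (torch.sigmoid(unet(image)) > threshold).float()
        # Suppress everything outside the mandible so the detector is not
        # distracted by superimposed facial skeleton structures.
        roi = image * mask
        # Stage 2: Faster R-CNN returns boxes, labels, and scores per image.
        predictions = detector([roi.squeeze(0)])
    return predictions[0]  # dict with "boxes", "labels", "scores"
```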
Data description
A total of 190 panoramic radiographs were collected from patients. Due to limitations in acquiring relevant data, we gathered them from the following sources:
Imam Hossein Hospital, Tehran, Iran
A private maxillofacial radiology center, Isfahan, Iran
The Radiopaedia website (https://radiopaedia.org/)
The open-access biomedical image search engine provided by the NIH (https://openi.nlm.nih.gov/).
All the images were exported in JPEG format. The inclusion criterion was the presence of any sign of at least one fracture in the hard tissue of the mandible. The exclusion criteria were as follows:
Low-quality or corrupted images (e.g., blurry or noisy images)
Duplicate data
Data that could not be identified as ground truth for any reason.
The pretreatment images of a patient were chosen if both pretreatment and posttreatment images were available.
Diagnostic criteria and data labeling
For the first model, the aim was to segment the region of interest. For this purpose, a dentist annotated all 190 images by drawing polygons around the mandible using LabelMe software (MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA). Another dentist double-checked the annotated data and edited the polygons where needed. To develop the trauma detection model, two oral and maxillofacial radiologists annotated all radiographic images, marking the location of each fracture with bounding boxes in LabelMe; any disagreements were resolved by consensus.
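For reference, LabelMe stores each annotation as a JSON file containing the image size and a list of labeled polygons. A minimal sketch of rasterizing such a polygon annotation into a binary mandible mask could look as follows; the label name "mandible" is our assumption.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_polygons_to_mask(json_path):
    """Rasterize LabelMe polygon annotations into a binary mask.

    Assumes the standard LabelMe JSON layout with "imageHeight",
    "imageWidth", and a "shapes" list of {"label", "points"} entries.
    """
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        if shape["label"] == "mandible":  # label name is an assumption
            points = [tuple(p) for p in shape["points"]]
            draw.polygon(points, outline=1, fill=1)
    return np.array(mask, dtype=np.uint8)
```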
Data partitions and preprocessing
Finally, the 190 images were divided randomly into training (n = 154), validation (n = 18), and test (n = 18) sets. The validation set was used for early stopping. Before being fed to either model, all images were resized to 224 × 224, and histogram equalization was used to adjust the contrast of each image based on its histogram. To improve the object detection model, the number of training samples was increased fivefold through augmentation. The applied augmentation techniques were as follows (a sketch is given after the list):
Random crop
Random color jitter (e.g., random changes in brightness, contrast, saturation, and hue)
Random affine transformation (e.g., random rotation, translation, and scaling)
Addition of random Gaussian noise
Random horizontal flip.
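As a sketch, the preprocessing and augmentation steps above could be expressed with torchvision as follows. The parameter values are illustrative assumptions, not our reported settings, and a real object detection pipeline would also have to transform the bounding boxes together with each image.

```python
import torch
import torchvision.transforms as T
from torchvision.transforms import functional as F

def preprocess(img):
    """Resize a PIL image to 224 x 224 and apply histogram equalization."""
    img = F.resize(img, [224, 224])
    return F.equalize(img)

# Augmentation sketch covering the listed techniques; torchvision has no
# built-in Gaussian-noise transform, so noise is added via a Lambda on the
# tensor after ToTensor().
augment = T.Compose([
    T.RandomCrop(200),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),
])
```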
Model architecture and training details
Our deep learning models were implemented in Python using the PyTorch library. For region of interest segmentation, we used a randomly initialized U-net model, whose output was used to train the object detector. For object detection, we used a Faster R-CNN model based on ResNet101, pretrained on the COCO detection dataset.
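A minimal construction sketch is given below. Note that torchvision distributes COCO detection weights only for the ResNet50-FPN variant, so this sketch initializes the ResNet101 backbone with ImageNet weights; treat it as an approximation of the setup rather than the exact configuration, and the U-net encoder choice is likewise an assumption.

```python
import segmentation_models_pytorch as smp  # third-party; one common U-net source
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Stage 1: randomly initialized U-net for mandible segmentation
# (encoder_weights=None means no pretraining; the encoder is our choice).
unet = smp.Unet(encoder_name="resnet34", encoder_weights=None,
                in_channels=3, classes=1)

# Stage 2: Faster R-CNN with a ResNet101-FPN backbone. ImageNet backbone
# weights are used here as an approximation; "pretrained=True" is the older
# torchvision API (newer versions take a weights= argument instead).
backbone = resnet_fpn_backbone("resnet101", pretrained=True)
detector = FasterRCNN(backbone, num_classes=2)  # background + fracture
```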
To avoid overfitting, we used the early stopping strategy: the model weights that performed best on the validation set were stored and used in the next run of the model. Finally, a randomized search strategy was used to tune the hyperparameters. A Tesla T4 Graphics Processing Unit was used to carry out the training procedure.
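A minimal sketch of this early stopping strategy follows. It is generic: the hyperparameter values are illustrative, and for detection models the training call differs (a torchvision Faster R-CNN returns its loss dict directly when given targets).

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader, optimizer,
                              loss_fn, max_epochs=100, patience=10):
    """Keep the weights that score best on the validation set and stop
    when the validation loss stops improving for `patience` epochs."""
    best_loss, best_state, stale_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # checkpoint
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` epochs
    model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```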
Comparing results to the human-level detection
In the final step, the test set of panoramic radiographs, together with another 18 randomly selected radiographs without any sign of fracture, was given to five general dentists (H.M.R., F.S., T.S., Z.P., and A.O.). They were asked to classify each image according to whether it contained any fracture. The diagnoses of the AI model and the dentists were then compared.
Performance measurements and statistical analysis
For the segmentation model, our main performance measurements were intersection over union (IoU) and the Dice coefficient. For the object detection model, our main performance measurements were the mean average precision calculated at IoU thresholds of 0.5 (mAP50) and 0.75 (mAP75). In addition, the accuracy, specificity, and sensitivity of the AI model and the dentists were compared. An image was counted as a positive prediction if the AI model found any fracture in it, and as a negative prediction otherwise.
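For reference, these measurements can be computed as follows on binary masks and image-level confusion-matrix counts; this is a generic sketch assuming non-empty masks, not our exact evaluation code. mAP50 and mAP75 are typically computed with a dedicated library such as pycocotools.

```python
import numpy as np

def iou_and_dice(pred_mask, true_mask):
    """Segmentation metrics on binary masks (numpy arrays of 0/1)."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    iou = intersection / union                       # assumes union > 0
    dice = 2 * intersection / (pred.sum() + true.sum())
    return iou, dice

def sensitivity_specificity(tp, fn, tn, fp):
    """Image-level metrics from confusion-matrix counts. An image counts as
    a positive prediction if the model finds any fracture box in it."""
    return tp / (tp + fn), tn / (tn + fp)
```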
The IoU and Dice coefficient of the segmentation model on the test set images were 94.53% and 91.77%, respectively. Three samples of the model outcomes are presented in the figures below.
[Figure: Samples of the segmentation model outputs.]
[Figure: Samples of the final model outputs.]
The accuracy of the classification model was 91.67%. Moreover, the model had a sensitivity of 100% and a specificity of 83.33%. The confusion matrix of the model's predictions against the ground truth is presented in the figure below.
[Figure: Confusion matrix of the model for the diagnosis of trauma.]
Misdiagnosis is one of the most common causes of malpractice in health care. Clinicians may misinterpret radiographic fractures for a variety of reasons, including fatigue, a lack of specialized expertise, and inconsistency in readings.
According to our results, our framework achieved a mAP50 of 98.66% and a mAP75 of 57.90%, which indicates desirable performance in detecting fractures. Nevertheless, the localization of the bounding boxes could still be improved; increasing the number of samples in the dataset may address this limitation in delineating fracture extent. Moreover, the sensitivity of the model was 100%, meaning the framework detects suspicious regions and hardly misses any fractured mandible.
The model also outperformed general practitioners in terms of sensitivity. In practice, most regions without access to oral and maxillofacial radiologists routinely rely on general practitioners to screen patients for mandibular fractures; thus, general practitioners were included in the comparison of clinician performance with the model in this study. These outcomes suggest that the model can be used by practitioners as an assistant for screening potentially traumatized patients.
Similar to our work, Son et al.
To improve the performance of our model, we extracted our region of interest, the mandibular hard tissue, using a segmentation algorithm. This was intended to help the object detection model focus only on the relevant parts of the image. This region of interest extraction strategy has already been used in AI studies in medical imaging and dentistry. As an example, similar to our study, Yüksel et al.
Besides the performance of the model, a critical advantage of our study over similar studies was that we obtained images from multiple sources, covering varying machine types, radiation exposure conditions, sensors, and image qualities. Using data from different sources may help the deep learning model generalize better to samples from sources outside our training set.
A significant limitation of this study was that we were unable to access the large volume of data required. As a first step toward tackling this issue, we collected data from public sources (e.g., PubMed) and pooled them with our own data. This approach has already been used in biomedical imaging to extend dataset size.
As a practical and adaptable tool, our framework has the potential to provide a level of accuracy that can compete with general dentists in trauma and fracture diagnosis. The main limitation of the study was the small dataset; future studies should use more extensive datasets. Prospective and clinical studies are also recommended to evaluate the framework's outcomes in real-world scenarios.
Financial support and sponsorship
Nil.
Conflicts of interest
The authors of this manuscript declare that they have no conflicts of interest, real or perceived, financial or nonfinancial, in this article.