Artificial Intelligence & Paleontology: Use Deep Learning to Search for Microfossils

Deep Learning, Artificial Intelligence, Physics, Economics · Y壹企問 · 2019-08-05

In this post we present a deep-learning-based method for fully automated identification and extraction of microfossils in drill-core samples scanned with MicroCT. The approach identifies microfossils with high accuracy (an IoU of 98%). We validate it against ground truth generated by specialists in micropaleontology. We also present the first fully annotated, publicly available dataset of MicroCT-acquired microfossils.

Artificial Intelligence & Paleontology

Computational analysis of paleontological images (that is, computer vision applied to them) has applications ranging from the study of animals, plants, and the evolution of microorganisms to simulating the habitat of living beings in a given epoch.

Paleontology today is no longer only a pure science. It can also be applied to problems in economic activity such as oil exploration, where several factors must be analyzed to assess the potential of an exploration site and to minimize the cost of extracting the oil. One such factor is the characterization of the environment to be explored. This analysis can be carried out in several ways: using probes, extracting samples to evaluate their petrophysical components, or correlating with the logs of other wells.

When samples are extracted, the fossils found in sedimentary rock are central to characterizing that rock. Computed Tomography (CT) matters here because it preserves the sample and keeps it available for further analyses. Many analyses and simulations can be run on the 3D images that CT generates, and processes that are currently performed manually and laboriously can be automated.

Imagine the following scenario: a paleontologist receives a rock sample containing microfossils for analysis. Isolating the microfossils by hand takes a long time, and the rock sample is destroyed in the process. The paleontologist then examines the physically isolated microfossils under a microscope and classifies each one manually.

A few microfossils manually extracted from our carbonate rocks and photographed with a Z-stack microscope

Now suppose the company where this paleontologist works acquires AI-based tomographic image analysis software built specifically for microfossil analysis. The software identifies and extracts the microfossils from the rock sample automatically, with minimal or no supervision. The paleontologist can now load a tomographic sample, select the appropriate pipeline, and leave it running overnight. All that remains is to evaluate the results and classify each extracted microfossil.

The before and after of microfossil analysis

Microfossil Identification and Classification with Deep Learning

You can implement the workflow above using semantic segmentation with deep learning. Let's see how.

We used a scanned carbonate rock sample taken by a drilling-rig probe from the Quaternary sediments of the Sergipe Basin, off the northeast coast of Brazil. To train our semantic segmentation network, a team of micropaleontologists generated the ground truth for this rock sample by manually segmenting and classifying the whole MicroCT volume. The sample was digitized with a Versa XRM-500 scanner (ZEISS/XRadia), yielding a volume of 956x1004x983 voxels.
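For a sense of scale, the volume dimensions quoted above can be turned into a voxel count and a rough memory footprint. This is a small arithmetic sketch only: the 16-bit voxel depth is an assumption made for illustration (the post does not state the bit depth), and `volume_stats` is a hypothetical helper name.

```python
def volume_stats(shape, bytes_per_voxel):
    """Return (voxel_count, size_in_GiB) for a 3D volume of the given shape."""
    voxels = 1
    for extent in shape:
        voxels *= extent
    gib = voxels * bytes_per_voxel / 2**30
    return voxels, gib


# Dimensions come from the text; 2 bytes per voxel is an assumed 16-bit depth.
voxels, gib = volume_stats((956, 1004, 983), bytes_per_voxel=2)
print(f"{voxels:,} voxels, about {gib:.2f} GiB uncompressed")
```

At close to a billion voxels, the raw volume alone is large enough that memory becomes a practical constraint during training.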

The full dataset, together with additional explanations and the specialist-annotated ground truth of manually segmented images, is available here: http://www.lapix.ufsc.br/microfossil-segmentation .

Digitized rock sample

We used the fast.ai/PyTorch framework and, with hyperparameter optimization (HYPO), achieved an IoU of 0.98 with UNET + ResNet34 (or ResNet50). The different kinds of HYPO matter here, and we explain them below.
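IoU (Intersection over Union) is the overlap between the predicted mask and the ground-truth mask, divided by the area covered by either one. A minimal sketch for binary masks, written in plain Python for clarity; the toy masks and the convention of returning 1.0 for two empty masks are assumptions made here, not details taken from the post.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # prints 0.5
```

An IoU of 0.98 therefore means the predicted and annotated regions coincide almost completely.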

HYPO #1: Variable Resolution: The MicroCT data posed a challenge: the memory required by both the size of our dataset and the UNET architecture would severely limit the batch size we could train with. To overcome this limitation, start with larger batch sizes, and train the network faster, we used a training strategy that increases the image resolution step by step.
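The trade-off between resolution and batch size can be sketched as a schedule. This is an illustration under stated assumptions, not the authors' configuration: it assumes memory use grows in proportion to the number of pixels per batch, and the resolutions, the pixel budget, and the helper names are all hypothetical.

```python
def batch_size_for(resolution, pixel_budget):
    """Largest batch of square images whose total pixel count fits the budget."""
    return max(1, pixel_budget // (resolution * resolution))


def progressive_schedule(resolutions, pixel_budget):
    """Pair each resolution (low to high) with the batch size that fits."""
    return [(r, batch_size_for(r, pixel_budget)) for r in sorted(resolutions)]


# Hypothetical budget: whatever fits 32 images of 128x128 pixels.
schedule = progressive_schedule([512, 128, 256], pixel_budget=32 * 128 * 128)
for resolution, batch_size in schedule:
    print(f"train at {resolution}x{resolution} with batch size {batch_size}")
```

Each doubling of the resolution cuts the feasible batch size by a factor of four, which is why the early low-resolution stages train quickly and the final full-resolution stage only needs to refine an already trained network.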

HYPO #2: Differential Learning Rate: Another strategy we used to fine-tune our model is Differential Learning Rates (DLR), which Jeremy Howard presented informally in a lecture of the fast.ai course series.
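The idea behind differential learning rates is that earlier, pretrained layer groups get smaller learning rates than later, freshly initialized ones. A minimal sketch of one way to assign them, assuming geometric spacing between a smallest and a largest rate; the spacing rule, the three layer groups, and the rate values are illustrative assumptions, not the settings used in the post.

```python
def geometric_lrs(lr_first, lr_last, n_groups):
    """Learning rates for n_groups layer groups, geometrically spaced."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [lr_last]
    ratio = (lr_last / lr_first) ** (1 / (n_groups - 1))
    return [lr_first * ratio**i for i in range(n_groups)]


# Hypothetical grouping: early encoder layers, late encoder layers, decoder.
groups = ["encoder_early", "encoder_late", "decoder"]
lrs = geometric_lrs(1e-5, 1e-3, len(groups))
for name, lr in zip(groups, lrs):
    print(f"{name}: {lr:.1e}")
```

With these values the pretrained early layers move roughly a hundred times more slowly than the decoder, so their general-purpose features are preserved while the task-specific layers adapt.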

HYPO #3: Fit1Cycle: We trained the network with the fit1cycle method originally developed by Leslie N. Smith.
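In a one-cycle policy the learning rate first rises to a maximum and then decays to a value far below where it started. The sketch below shows only the learning-rate part of such a schedule. The cosine shape of both phases and the values of `pct_start`, `div`, and `final_div` are assumptions chosen for illustration; they are not taken from the post.

```python
import math


def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div=25.0, final_div=1e4):
    """Learning rate at `step` (0-based) of a one-cycle schedule."""
    lr_start = lr_max / div        # where warm-up begins
    lr_end = lr_max / final_div    # where annealing finishes
    warmup_steps = max(1, int(total_steps * pct_start))
    if step < warmup_steps:
        # Phase 1: cosine ramp from lr_start up to lr_max.
        frac = step / warmup_steps
        return lr_start + (lr_max - lr_start) * (1 - math.cos(math.pi * frac)) / 2
    # Phase 2: cosine decay from lr_max down to lr_end.
    frac = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return lr_end + (lr_max - lr_end) * (1 + math.cos(math.pi * frac)) / 2


schedule = [one_cycle_lr(s, total_steps=100, lr_max=1e-3) for s in range(100)]
print(f"start {schedule[0]:.1e}, peak {max(schedule):.1e}, end {schedule[-1]:.1e}")
```

The short warm-up avoids large, destabilizing updates at the start, and the long decay lets the network settle once the peak rate has done most of the work.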

The step-wise training strategy with progressively increasing image resolution

Want to see some results?

(A) shows microfossils identified and segmented by the network in a MicroCT slice. (B) shows the fossil in detail, and (C) shows a microtomographic image of the fossil after it was manually extracted. (D) shows a microscope photograph of the same sample.
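Semantic segmentation produces a per-pixel mask, so isolating individual fossils from a segmented slice needs one more step. One common way to do that is connected-component labeling, sketched below. It assumes a 2D binary mask and 4-connectivity, and `label_components` is a hypothetical helper; the post does not describe how the authors perform the extraction.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions; returns (labels, region_count)."""
    if not mask:
        return [], 0
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # Found an unlabeled foreground pixel: flood-fill its region.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy mask with two separate regions.
mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
labels, count = label_components(mask)
print(count)  # prints 2
```

Each label can then be cropped out as a candidate fossil for the specialist to review. The same idea extends to a 3D volume by also visiting neighbors in the adjacent slices.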

What have we learned?

Convolutional neural networks and deep learning can be applied successfully to problems in marine micropaleontology, a field where you might not expect AI to find an application. A semantic segmentation network can be trained to reliably find the microfossils in microtomographic images of sedimentary rocks taken from bore cores obtained at oil exploration rigs. An off-the-shelf UNet framework with a ResNet34 as its network is enough to solve this problem.

We also tested other networks. See our paper on bioRxiv for the results we obtained with different models.

"
