The Quantitative Imaging Network The National Cancer Institute Then: 1939 And Now: 2016


The NCI Informatics Technology for Cancer Research (ITCR) Program





Juli Klemm, Ph.D.

NCI Center for Biomedical Informatics and Information Technology

The NCI Informatics Technology for Cancer Research (ITCR) Program supports investigator-initiated informatics technology development driven by critical needs that span the cancer research continuum including cancer biology, cancer treatment and diagnosis, cancer prevention, cancer control and epidemiology, and cancer health disparities. The program supports these activities through four funding opportunities aligned with the informatics development lifecycle:

1) Development of innovative computational algorithms and mathematical methods (R21);

2) Early stage software development (U01);

3) Advanced stage software development (U24); and

4) Sustainment of highly-accessed resources (U24).



As the number of investigators has grown, the program has developed into a “community of practice” for cancer informatics, and two working groups (WGs) have formed around areas of common interest to investigators. The Training and Outreach WG focuses on topics such as the use of social media in outreach, best practices for measuring tool use and impact, and advocating for a special journal issue focused on cancer informatics. Discussions in the Technical WG have included best practices for using container and workflow technology as well as identifying opportunities for tool integration to support integrative cancer research. Information about the program, including the supported tools, is available at http://itcr.nci.nih.gov.

NCI / CBIIT Considerations Regarding Common Data Models and Open Standards

Edward Helton, Ph.D.

NCI / CBIIT
The CBIIT Cancer Informatics Branch (CIB) has initiatives that directly or indirectly support QIN, including NBIA O&M, Medici Challenge Technology O&M and Help Desk, ITCR, the Cancer Data Ecosystem, and Imaging and Clinical Data Informatics for the Cancer Moonshot Data Sharing. CIB is also involved in harmonizing common data models and the open standards for evidence generation, such as the ISO standards DICOM and BRIDG (holding the FDA electronic submission standard SDTM). This session explores the use of a common clinical data model (TCGA/GDC) in QIN/TCIA and possible future harmonization with the open clinical data standards that FDA and the pharmaceutical industry use for e-submissions/EHR.

TCIA Update
John Freymann, Justin Kirby
CIP / Frederick National Lab
The Cancer Imaging Archive (TCIA) provides de-identification and hosting services in order to create an open-access database for cancer research. These critically important services enable researchers to share data without having to assume the legal and technical burdens associated with safely de-identifying data. TCIA processes address these issues in a way that complies with HIPAA legislation while also retaining the maximum amount of information necessary to enable downstream analysis. TCIA currently hosts more than 66 distinct data sets, which include more than 30,000 patients. More than 435 publications have been written about data contained in TCIA. The vast majority of the data is freely available under a Creative Commons license, which enables researchers to explore new hypotheses or replicate the findings of publications based on these collections.
Shortly after the inception of NCI's Quantitative Imaging Network, TCIA was selected as the official repository for enabling data sharing and collaboration among QIN investigators. Since then, TCIA has become home to at least a dozen data sets contributed by QIN investigators. In general, the QIN data sharing activities fall into three categories:

  1. Data shared among QIN sites for collaborative research projects

  2. Data shared among QIN sites for challenge competitions

  3. Data shared with the broader research community in connection with QIN published manuscripts

In this session we will provide an overview of TCIA functionality and services with a focus on some of our more recently added features. We will then discuss some of the existing QIN activities which have benefited from the use of TCIA. Finally, we will introduce ideas for how QIN can lead the research community in creating best practices for sharing non-image data. Specifically, we will introduce our suggestions for adopting standardized models for sharing clinical data and image analysis data. Subsequent speakers in this session will then take a deeper dive to flesh out these ideas and solicit QIN feedback.

The Genomic Data Commons
Mark Jensen, Ph.D.
NCI / Frederick National Lab

The Genomic Data Commons has developed a flexible, graph-based underlying model to manage and harmonize metadata associated with cancer genomics data. Clinical and bio-specimen data are first-class items in this system. After an overview of the GDC’s strategy for choosing its initial minimal clinical set of fields, I will outline the data model organization and the high-level flow and processing of clinical data. This will include an overview of the GDC data dictionary content and its relationship to NCI-cataloged CDEs. I will discuss how users and data submitters interact with these to get clinical data in and out of the GDC. I will finish by describing how the GDC currently works directly with new submitters to harmonize their fields or add new ones.



Assessing gaps in the GDC model to support QIN clinical data sharing

Dr. David Clunie, MBBS, FRANZCR (Ret.), FSIIM

The NCI Genomic Data Commons (GDC) model is emerging as the currently preferred set of data elements and values to describe various clinically-related data sharing activities beyond its primary role for genetic data. To the extent that the GDC model represents a current consensus after review and consolidation of various alternative sources, it seems obvious that it should be used for NCI image sharing activities, such as those within or linked to TCIA, unless significant weaknesses or gaps are identified.

To identify gaps, the currently promulgated GDC Common Cross-Study Clinical Data Elements [1] were compared with "clinical data" that had been captured in ad hoc and standard formats for several TCIA-related collections and projects, including:


  • Iowa head and neck QIICR project incorporating demographic, staging, radiotherapy and outcome clinical data [2]

  • I SPY-1 (ACRIN 6657) breast cancer neoadjuvant chemotherapy DCE MRI [3]

In addition, various standards for acquisition of clinical data were compared, including:

  • DICOM Relevant Patient Information Template [4]

  • ACR National Mammography Database [5]

  • BRIDG 5.0 [6]

References

1. GDC Common Cross-Study Clinical Data Elements. 2017. Available from: "http://gdc.cancer.gov/documentation/selecting-common-cross-study-clinical-data-elements".

2. Fedorov A. et al. DICOM for quantitative imaging biomarker development: a standards based approach to sharing clinical data and structured PET/CT analysis results in head and neck cancer research. 2016. Available from: "http://peerj.com/articles/2057/".

3. ISPY1. TCIA. Available from: "http://wiki.cancerimagingarchive.net/display/Public/ISPY1".

4. DICOM PS3.16. 2017a. Relevant Patient Information Templates. Available from: "http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_RelevantPatientInformationTemplates.html".

5. ACR. National Mammography Database . Available from: "http://www.acr.org/Quality-Safety/National-Radiology-Data-Registry/National-Mammography-DB".

6. Biomedical Research Integrated Domain Group. BRIDG 5.0. 2017/01/31. Available from: "http://bridgmodel.nci.nih.gov/download-model/bridg-releases/release-5-0".

Poster Abstracts

QIN Research Team’s Progress


Somatic mutations drive distinct imaging phenotypes in lung cancer

Emmanuel Rios Velazquez1*, Chintan Parmar1*, Ying Liu5,7*, Thibaud P. Coroller1, Gisele Cruz2, Olya Stringfield5, Zhaoxiang Ye7, Mike Makrigiorgos1, Fiona Fennessy2, Raymond H. Mak1, Robert Gillies5, John Quackenbush3,4,6, Hugo J.W.L. Aerts1,2,3,#

Departments of 1Radiation Oncology and 2Radiology, Dana-Farber Cancer Institute, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA, Departments of 3Biostatistics & Computational Biology, and 4Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA, 5Departments of Cancer Imaging and Metabolism, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA, 6Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA, 7Department of Radiology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center of Cancer, Key Laboratory of Cancer Prevention and Therapy

* Contributed equally

# Corresponding author
Introduction: Tumors are characterized by somatic mutations that drive biological processes, which are ultimately reflected in the tumor phenotype. Quantitative radiomics non-invasively characterizes tumor phenotypes by applying a large panel of automatic image characterization algorithms to extract quantitative features from medical images. However, precise genotype-phenotype interactions through which somatic mutations influence radiographic phenotypes remain largely unknown. Here, we present an integrated analysis of independent datasets of 763 lung adenocarcinoma patients with somatic mutation testing and quantitative computed tomography (CT) image analytics that demonstrates somatic mutations are strongly associated with imaging phenotypes.

Methods: Four independent cohorts from three institutions were curated after approval by the institutional review board of each institution: Profile (n=213) and Harvard-RT (n=162) (Dana-Farber/Harvard Cancer Center, Boston, MA), Tianjin (n=257) (Tianjin Medical University, Tianjin, China), and Moffitt (n=131) (Moffitt Cancer Center, Tampa, FL). The tumor imaging phenotype was described using a set of quantitative radiomic features extracted from the segmented tumor regions on the CT scans. All features, extraction methods and tools have been described previously (1,2).

In univariate analysis, we used an unsupervised two-step feature selection methodology. First, we assessed the test-retest stability (intra-class correlation coefficient, ICC) of the radiomic features and selected the stable ones (ICC >= 0.8) (3). Second, we performed a principal component analysis (PCA) (4) and selected 26 independent, variance-retaining features (variance = 90%, Pearson r > 0.90). We compared the radiomic feature distributions between mutated and non-mutated cases for each gene using a two-sided Wilcoxon test with false discovery rate (FDR) correction at 5% (5). For the multivariate analysis, we used a temporal split (at the median scan acquisition date) to divide each of the four cohorts into training and validation sets, which were pooled into discovery and validation cohorts, respectively. We developed radiomic signatures capable of distinguishing between tumor genotypes in the discovery cohort (n = 353). To develop each radiomic signature, we used the minimum redundancy maximum relevance (MRMR) feature selection method and trained a random forest classifier in the discovery cohort, as described previously (6). The performance of the model was then assessed in the validation cohort (n = 352) using the area under the receiver operating characteristic curve (AUC). All analyses were performed using Matlab (Version R2012b, The Mathworks, Natick, MA) and R (Version 3.0.2).
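The FDR correction cited above (5) is the Benjamini-Hochberg step-up procedure. The following is a minimal Python sketch of that procedure only, not the authors' Matlab/R code; the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen purely for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest rank k
    (1-based, p-values sorted ascending) with p_(k) <= (k / m) * fdr, then
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one Wilcoxon test per radiomic feature.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, fdr=0.05))
```

Because the rule is step-up, a feature whose p-value misses its own threshold is still rejected when a higher-ranked p-value passes; this is why the loop keeps the largest passing rank rather than stopping at the first failure.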

Results: In univariate analysis, we found sixteen radiomic features to be significantly associated with EGFR mutations and ten features associated with KRAS mutations. We then compared radiographic features between EGFR and KRAS mutant tumors. We found fourteen significant features all of which were among the sixteen that distinguished EGFR mutant from EGFR non-mutated tumors. In multivariate analysis, we found a radiomic signature related to radiographic heterogeneity that could strongly discriminate between EGFR+ and EGFR- cases (AUC=0.69). Combining this signature with a clinical model of EGFR status (AUC=0.70) significantly improved the prediction accuracy (AUC=0.75). The highest performing signature was capable of distinguishing between EGFR+ and KRAS+ tumors (AUC=0.80) and when combined with a clinical model (AUC=0.81), substantially improved its performance (AUC=0.86). A KRAS+/KRAS- radiomic signature also showed significant, albeit lower, performance (AUC=0.63) and did not improve the accuracy of a clinical predictor of KRAS status.
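For readers less familiar with the AUC values reported here, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive case (e.g. EGFR+) receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical; this is not the authors' implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive, e.g. mutated).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and signature scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two genotype groups.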

Conclusion: These results suggest that somatic mutations drive distinct radiographic phenotypes that can be predicted using radiomics. Such radiomic-based tests can be applied non-invasively, repeatedly, and at low cost, providing an unprecedented opportunity for precision medicine applications.

References:

(1) Aerts H, Rios Velazquez E, Leijenaar R, Parmar C, Grossmann P, Carvalho S, et al. Decoding the tumor phenotype by non-invasive imaging using a quantitative radiomics approach. Nat Commun. 2014;

(2) Coroller TP, Grossmann P, Hou Y, Rios Velazquez E, Leijenaar RT, Hermann G, et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol. 2015;114:345–50.


(3) Zhao B, James LP, Moskowitz CS, Guo P, Ginsberg MS, Lefkowitz RA, et al. Evaluating variability in tumor measurements from same-day repeat CT scans of patients with non-small cell lung cancer. Radiology. 2009;252:263–72.

(4) Lê, S., Josse, J., Husson, F. FactoMineR: An R Package for Multivariate Analysis. J Stat Softw. 2008;25(1):1–18.

(5) Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Series B Stat Methodol. Blackwell Publishing for the Royal Statistical Society; 1995;57:289–300.

(6) Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJWL. Machine Learning methods for Quantitative Radiomic Biomarkers. Sci Rep. Macmillan Publishers Limited; 2015;5:13087.




Highly accurate model for prediction of lung nodule malignancy with CT scans

Jason Causey1,7†, Junyu Zhang2†, Shiqian Ma3, Bo Jiang4, Jake Qualls1,7, David G. Politte5, Fred Prior6*, Shuzhong Zhang2* and Xiuzhen Huang1,7*
1Department of Computer Science, Arkansas State University, Jonesboro, Arkansas 72467, United States of America

2Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455, United States of America

3Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

4Research Center for Management Science and Data Analytics, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China

5Mallinckrodt Institute of Radiology, Washington University, St. Louis, Missouri 63110, United States of America

6Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, United States of America

7The UALR/UAMS Joint Graduate Program in Bioinformatics, Little Rock, Arkansas 72204, United States of America
† The first two authors are considered as joint first authors.

*Corresponding authors


INTRODUCTION: CT examinations can be used to predict lung nodule malignancy in patients, which could greatly improve noninvasive early diagnosis of lung cancer. It is still challenging for current computational approaches to achieve performance comparable to experienced radiologists. To address this challenge, here we present an approach, NoduleX, for nodule malignancy prediction, based on deep learning convolutional neural network features.

MATERIALS AND METHODS: We studied the nodules reviewed by four experienced thoracic radiologists from the CT scans of 1018 patients of the LIDC/IDRI cohort. We processed 1065 nodules with different malignancy scores from 1-5 (with score 1 meaning highly unlikely to be malignant, score 2 or 3 indeterminate, score 4 moderately likely to be malignant, and score 5 highly likely to be malignant). The corresponding sets are denoted as S1, S2, S3, S4 and S5, respectively. We tested two designs: S1 versus S45, and S12 versus S45. For each design the data were grouped into completely independent training and validation sets (80% for training and 20% for validation). For the convolutional neural networks (CNN), parameter setups with different voxel/cube sizes were trained and tested, and the CNN classifiers were reinforced with Random Forest or AdaBoosting.
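The two designs described above can be sketched as a labeling step followed by an 80/20 split. This is a minimal illustration only: the abstract does not say how nodules were assigned to the training and validation sets, so the seeded random shuffle, the function names `build_design` and `split_train_validation`, and the toy nodule list are all assumptions.

```python
import random


def build_design(nodules, negative_scores, positive_scores=(4, 5)):
    """Label nodules for a binary design such as S1 versus S45.

    nodules: iterable of (nodule_id, malignancy_score) pairs, scores 1-5.
    Nodules whose score is in neither group are left out of the design.
    """
    labeled = []
    for nodule_id, score in nodules:
        if score in negative_scores:
            labeled.append((nodule_id, 0))  # unlikely to be malignant
        elif score in positive_scores:
            labeled.append((nodule_id, 1))  # likely to be malignant
    return labeled


def split_train_validation(items, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed and cut into disjoint training/validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical nodules: 10 for each malignancy score from 1 to 5.
nodules = [(f"nodule-{i}", 1 + i % 5) for i in range(50)]

s1_vs_s45 = build_design(nodules, negative_scores=(1,))
s12_vs_s45 = build_design(nodules, negative_scores=(1, 2))

train, validation = split_train_validation(s1_vs_s45)
print(len(s1_vs_s45), len(s12_vs_s45), len(train), len(validation))
```

Keeping the split at the nodule level with a fixed seed makes the partition reproducible; a patient-level or stratified split would be a reasonable alternative that the abstract neither confirms nor rules out.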

RESULTS: For the design of S1 versus S45, the best model on the validation set has an area under the receiver operating characteristic curve (AUC) of 0.974 (acc = 91.3%, sen = 88.5%, spc = 94.2%). The model performance is further improved when combined with previously identified quantitative image features (QIF), with an AUC of 0.993 (acc = 95.2%, sen = 94.2%, spc = 96.2%). For the design of S12 versus S45, the best model on the validation set has an AUC of 0.938 (acc = 87.9%, sen = 87.9%, spc = 87.9%). When combined with the QIF features, the model performance is further improved with an AUC of 0.971 (acc = 93.2%, sen = 87.9%, spc = 98.5%).
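The accuracy (acc), sensitivity (sen) and specificity (spc) figures above follow the standard confusion-matrix definitions, sketched below. The function name `classification_metrics` and the counts are hypothetical; the abstract does not report the underlying confusion matrices.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: malignant nodules classified correctly/incorrectly.
    tn/fp: benign nodules classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return accuracy, sensitivity, specificity


# Hypothetical counts for a validation set with 100 nodules per class.
acc, sen, spc = classification_metrics(tp=90, fn=10, tn=95, fp=5)
print(f"acc = {acc:.1%}, sen = {sen:.1%}, spc = {spc:.1%}")
```

When the two classes are the same size, as in this toy example, accuracy is simply the mean of sensitivity and specificity.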

CONCLUSIONS: NoduleX has achieved high accuracy for nodule malignancy classification, which is commensurate with the analysis of the LIDC/IDRI cohort by experienced radiologists. Our approach provides an effective framework for highly accurate nodule malignancy prediction (classification) with the model trained on large datasets of CT scans. Our results and software are available from http://bioinformatics.astate.edu/NoduleX.



Radiomics of Non-Small Cell Lung Cancer (NSCLC)
Robert J. Gillies, PhD1, Dmitry Goldgof, PhD2, Lawrence O. Hall, PhD2, Matthew B. Schabath, PhD1
1H. Lee Moffitt Cancer Center & Research Institute Tampa, Florida

2University of South Florida, Tampa, Florida
Background: In this QIN continuation study, we intend to build on prior work to incorporate radiomics into a decision support system for post-surgery management of non-small cell lung cancer (NSCLC) patients. NSCLC is the leading cause of cancer deaths worldwide and hence, even incremental improvements in decision support can have a profound impact on patients’ lives. We will use and extend the radiomics framework that we have developed to address a compelling and focused question in lung cancer care: whether to treat post-surgery patients with adjuvant chemotherapy (ACT). Virtually all NSCLC surgical candidates receive high-quality diagnostic CTs. Early stage NSCLC patients are commonly resected with lobectomy and mediastinal lymph node removal. Of these, up to 35% will experience distant recurrence within 5 years. Recurrence can be reduced with ACT, yet the decision whether or not to treat is not trivial, as ACT is associated with significant morbidities and even mortality. This decision is currently ill-informed by cancer stage alone. There are no predictive models that can accurately identify which patients have the highest likelihood of recurrence, thus requiring the most aggressive adjuvant follow-up.
Goal: Our proposal will address this important clinical problem by development of a “Risk-of-Recurrence” score combining radiomic, clinical and genomic data curated from over 3,600 patients from two institutions (Moffitt Cancer Center [MCC], Tampa FL and Tianjin Medical Hospital [TMH], Tianjin China).
Methods: In Aim 1 we will first (1.1) assemble a two-institution cohort (Table 1) of over 3,600 pre-surgical lung cancer patients (Stage IA to IIIA) into a radiomic-genomic database. This database will include acquisition conditions (DICOM headers), extracted radiomic data (agnostic and semantic), clinical data (stage, histology, chemotherapy, surgical notes), genomic data (mutational status, expression profiles), and patient outcomes (recurrence, PS, OS). Consortium network partners (University of South Florida [USF] and Maastro Clinic) will create web-based tools for data input, management, and cloud storage. Then (1.2) we will analyze the acquired radiomic-genomic data in a Bayesian framework to develop a “risk-of-recurrence” score for individual patients to support a decision whether to treat with ACT. Then (1.3) we will investigate the influence of acquisition conditions (kVp, recon kernel, slice thickness, FOV, contrast) by determining the prognostic power of cohorts with similar acquisition conditions, and these will be informed by texture phantoms provided to both sites. In Aim 2, these data and analytical tools will be shared via unique tools developed for this purpose within the framework of the QIN. Sharing ideas, technology, and data through the QIN working groups and through multi-site collaborations has proven to be a critical success for the QIN. As such, we recognize that quantitative image biomarkers for NSCLC will only become qualified through a collaborative network, as individual centers have neither the volume nor the diversity to successfully develop and validate image biomarkers for wide clinical use.
Progress: Core data elements have been finalized at both institutions (MCC and TMH), and the curation of the pre-surgical CT images, clinical data, and genomic information is currently underway. Our USF consortium network partner has developed a web-based clinical-genomic data input tool, and our Maastro consortium network partner is developing cloud-based data storage and management tools.


Table 1. Available NSCLC Patient Cohorts

Institution | PIs | Sample size for retrospective cohort | Dates | Prospective recruitment (#/year) | % Treated with ACT | Genomic Data
MCC | Gillies/Schabath | 2,750 | 2008 to present | ~615 | ~70% | Microarray gene expression on ~60%; tumor molecular data on ~30%
TMH | Ye | 500 | 2013 to present | ~300 | ~30% | Tumor molecular data on all
