
Clinical Trial Design and Development Working Group Update 2017
Shu H1, Jones E2, Henderson L3, Buatti JM4, Wahl R5, Kurland B6, Mankoff D7, Schwartz L8, Kinahan P9, Ryan C10, Rubin D11, Shim H1, Mountz J6, Gerstner E12, Hadjiiski L13, Williams J14, Linden H9
1Emory University, 2University of California San Francisco, 3NCI at NIH, 4University of Iowa, 5Washington University, 6University of Pittsburgh, 7University of Pennsylvania, 8Columbia University, 9University of Washington, 10Oregon Health and Science University, 11Stanford University, 12Massachusetts General Hospital, 13University of Michigan, 14Vanderbilt University.
The mission of the Clinical Trial Design and Development Working Group (CTDDWG) is to develop, validate and harmonize methods and tools of quantitative imaging for use in cancer clinical trials to predict outcome and tumor response to therapy. To advance this mission, the working group has pursued projects in several areas.

Major efforts of the CTDDWG this past year include the following:



  1. Completion and preparation of multiple manuscripts (detailed below).

  2. Outreach activities to the National Clinical Trials Network (NCTN) and professional societies to enhance visibility of the QIN.

  3. Assistance with the development of cross-institutional validation of QIN tools.

  4. Advancement of the Pathways to Clinical Trials (PathCT) initiative.

Progress was made on three separate projects, each expected to culminate in a manuscript. The first project was the QIN accrual survey, an effort led by Brenda Kurland that included responses covering 25 prospective trials from 12 sites. This project was completed with publication in Tomography, 2:276-82, 2016. The authors concluded that multidisciplinary collaboration in trial design and execution is essential to accrual success, with attention paid to ensuring and communicating potential trial benefits to enrolled and future patients. Two projects involving different white paper concepts are progressing. Richard Wahl has circulated several iterations of a manuscript (STandards In Reporting Quantitative Imaging, STIRQI) that aims to establish reporting guidelines for studies that use quantitative imaging. John Buatti and Hui-Kuo Shu are currently preparing a draft of the Quantitative Imaging in Radiation Oncology manuscript, which will be ready to circulate in the next few months.

A panel session was presented at the American Society for Radiation Oncology (ASTRO) Annual Meeting, September 25-28, 2016, in Boston, MA. John Buatti, Hugo Aerts, Yue Cao, and Hui-Kuo Shu gave talks highlighting the QIN and their work in quantitative imaging with relevance to radiation oncology. A similar panel session will be targeted for the 2018 ASTRO Annual Meeting, with a proposal to be submitted to the organizing committee in late 2017. Continued efforts to introduce the QIN and mature QIN tools to the NCTN groups and to facilitate communication will be made by targeting presentations at specific meetings in the coming year.

The CTDDWG is assisting Richard Wahl in completing an Auto-PERCIST™ variance test to determine the inter-institutional reproducibility of FDG-PET assessments made with the Auto-PERCIST™ software. The test officially launched in February 2017, with the software installed at 15 institutions in the US, Asia, and Europe. Thirty paired sets of anonymized FDG-PET scans will be evaluated at each site, with completion expected within a few months.

The PathCT initiative, an effort to facilitate translation of QIN-developed tools to NCTN clinical trials, has been a major focus of the CTDDWG over the past year and will continue to dominate the working group agenda in the coming year. This effort started with general discussions at the QIN Face-to-Face Annual Meeting in April 2016 and continued with discussions during monthly teleconferences. A one-day QIN-NCTN Planning Meeting was held in December 2016 to discuss ideas and opportunities where quantitative imaging could play a key role in NCTN trials. An updated list of QIN tools, with information about their readiness for deployment, was compiled by Lori Henderson and circulated in January 2017. Finally, a PathCT Focus Group was initiated to review QIN tool readiness and to hold in-depth discussions with the corresponding PIs about how to bring their tools into clinical trials. The core PathCT Focus Group met in February 2017, with plans to discuss mature tools with specific PIs at upcoming monthly meetings. A significant part of the agenda for the Face-to-Face meeting in April 2017 will involve discussions related to the PathCT initiative.



Bioinformatics / IT and Data Sharing Working Group Report

Introduction: Over the past year, the BIDS WG has primarily focused on pipeline technology. A description of the steps performed during the conduct of an experiment is required as part of good science—one should be able to duplicate the research that is described. In the quantitative imaging world, if any image processing is required, it is critical to ensure that the steps are done in the same way. Some of the steps may be quite complex, and different groups may have expertise in one step of the total analysis workflow but not in every step. For this reason, the QIN Bioinformatics/IT and Data Sharing (BIDS) WG has identified pipelines as its major activity for the past year. Other activities have included defining the required data elements for TCIA submissions, including recommended common data elements (CDEs), and addressing the associated issues of accepting ECOG-ACRIN data.

Most scientists who do image analysis use a ‘pipeline’, which can be defined as a series of processing steps applied to one or more images to produce an output metric. QIN members, and scientists more generally, would benefit significantly from agreement on the best tools for the various steps of a pipeline. But before we can reach that point, it is critical to be able to easily compare different options for a particular step and assess their impact. Therefore, it is essential that we define a way to easily replace one module/step with another.



Deliverable 1: Pipeline module interoperability

The first deliverable of the BIDS WG pipeline project is to define a way to ‘wrap’ an analysis module in a standard way, such that different options can be swapped in and out. The tools themselves will be developed in many different labs, with different computing environments. Increasingly, Docker is being accepted as a way to let tools run in the environment they were developed in, regardless of the ‘host’ computer that actually runs them. While Docker addresses operating system and library issues, it does not address the format of data going in and out. Therefore, one must also define the format of the data and parameters passed into the Docker container, and the type of data that comes out. One straightforward approach to data passing is to use shared directories, which is simple and efficient. One can also use ‘streams’ to pass data, which may be less efficient but can be more flexible and might assist in secure management of the data.
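To make the wrapping idea concrete, the following is a minimal sketch of how a containerized module could be invoked with shared directories for input and output. The container image name, the /data/in and /data/out conventions, and the helper name run_module are illustrative assumptions, not an agreed QIN standard.

```python
import subprocess
from pathlib import Path

def run_module(image, input_dir, output_dir, params=None):
    """Run one wrapped analysis module as a Docker container.

    Inputs are exposed read-only at /data/in and outputs are collected
    from /data/out; both paths are assumed conventions for this sketch.
    """
    input_dir, output_dir = Path(input_dir).resolve(), Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/data/in:ro",   # shared directory in
        "-v", f"{output_dir}:/data/out",    # shared directory out
        image,
    ] + (params or [])
    subprocess.run(cmd, check=True)

# Example: run a hypothetical segmentation module on one case.
# run_module("qin/segmenter:1.0", "studies/case01/dicom", "studies/case01/seg")
```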

The data format is also critical, and each tool tends to prefer a given format. Therefore, conversion steps are likely; more importantly, the need for conversion, and the correct type of conversion, must be known when the pipeline is created. In many realms, graphical tools to create and update workflows are viewed as an important capability, and they may be useful when more complex workflows are required.
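One way to make the need for conversion explicit at pipeline-creation time is for each wrapped module to declare the formats it consumes and produces. The sketch below assumes such declarations; the module names, format labels, and DICOM-to-NIfTI converter are hypothetical, not part of any agreed QIN schema.

```python
# Hypothetical module descriptors: each declares accepted input and produced
# output formats so that missing conversions can be detected up front.
MODULES = {
    "dicom_to_nifti":    {"inputs": ["DICOM"], "outputs": ["NIfTI"]},
    "segmenter":         {"inputs": ["NIfTI"], "outputs": ["NIfTI"]},
    "feature_extractor": {"inputs": ["NIfTI"], "outputs": ["CSV"]},
}

def plan_pipeline(steps, source_format):
    """Insert conversion steps wherever declared formats do not match."""
    plan, current = [], source_format
    for step in steps:
        wanted = MODULES[step]["inputs"]
        if current not in wanted:
            if current == "DICOM" and "NIfTI" in wanted:
                plan.append("dicom_to_nifti")    # known conversion
                current = "NIfTI"
            else:
                raise ValueError(f"No known conversion from {current} to {wanted}")
        plan.append(step)
        current = MODULES[step]["outputs"][0]
    return plan

# plan_pipeline(["segmenter", "feature_extractor"], "DICOM")
# -> ['dicom_to_nifti', 'segmenter', 'feature_extractor']
```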

Deliverable 2: Pipeline Execution and Management System

Once we have a collection of interoperable tools, we can consider how to execute them to answer scientific questions. The simplest approach is to use a scripting language and simply execute one module after the other. For simple tasks, this may be the best option.
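As an illustration of the scripting approach, the following sketch chains a few wrapped modules through shared directories, reusing the hypothetical run_module helper from the earlier sketch; the module images and directory layout are assumptions.

```python
# Each step reads the previous step's output directory; failures stop the chain
# because run_module raises on a non-zero container exit code.
steps = [
    ("qin/dicom-to-nifti:1.0", "case01/dicom", "case01/nifti"),
    ("qin/segmenter:1.0",      "case01/nifti", "case01/seg"),
    ("qin/features:1.0",       "case01/seg",   "case01/features"),
]
for image, in_dir, out_dir in steps:
    run_module(image, in_dir, out_dir)
```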



However, as an analysis becomes more complex, the script option begins to suffer. Since the execution of a pipeline is effectively a ‘workflow’, we can leverage technologies developed to perform workflows—these are often referred to as workflow engines. Some of the properties used to evaluate workflow engines that might apply to a QIN pipeline execution tool are described next.

  1. Ease of creation & maintenance. If the pipeline is simple, a text editor to produce a script may be enough. However, if data compatibility is not handled within the module, then the script author would ideally be signaled about any need for data conversion. Workflow engines often have a graphical workflow creator whose output is then converted to a text workflow description. The benefit of a graphical workflow is that it may be easier to create and maintain, particularly as workflows become more complex.

  2. Computational efficiency. Some of the tasks within a pipeline can consume significant computational resources, and the pipeline system should not impose a significant performance penalty.

  3. Security. Security has many flavors and can reflect concerns about compromise of PHI (protected health information); this concern may be reduced if data are de-identified, but stronger privacy protection is generally a good thing. Security also includes data reliability and protection from data loss (e.g., software-, hardware-, or human-caused loss of data).

  4. Support for flexible computing. Because some tasks can require significant or unique types of computation, the pipeline should support computation models like remote (‘cloud’) computing and grid computing.

  5. Ability to monitor and manage execution, including looping and conditional execution. Looping is useful when one wishes to try a range of settings for a given module.

  6. Timeboxing. This is a workflow engine feature that lets one define an action to be taken if a group of steps does not complete in a defined period of time. It can be useful for monitoring the system and creating alerts if a process might be ‘hung’ (a minimal sketch follows this list).

  7. Shareable. Since one major goal of QIN is to develop reproducible quantitative imaging methods, it is essential that any QI method requiring a pipeline of steps can be executed anywhere, ideally by simply copying the pipeline and running it on some standard host.
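The timeboxing behavior in item 6 can be illustrated with a small sketch that launches a pipeline step as a subprocess and raises an alert if it overruns its time box; the alert hook and the example command are assumptions for illustration.

```python
import subprocess

def run_with_timebox(cmd, timeout_s, on_timeout):
    """Run a pipeline step; trigger an alert if it exceeds its time box."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        on_timeout(cmd)   # e.g., notify the operator that the step may be hung
        raise

# run_with_timebox(["docker", "run", "--rm", "qin/segmenter:1.0"],
#                  timeout_s=3600,
#                  on_timeout=lambda cmd: print("ALERT: step may be hung:", cmd))
```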

There was a discussion about creating a ‘Pipeline Challenge’, but the criteria by which an entry would be scored (e.g., the properties listed above) were not equally applicable to all problems that might be encountered. As a result, the group decided that a ‘challenge’ was not the most appropriate activity at this time. Rather, the WG decided that a demonstration of some pipeline options could be useful both for clarifying the general value of the above properties and for educating those not familiar with pipeline technology.

In the demonstration, we will create a VERY simple pipeline that represents some of the common steps. We will show how such a pipeline can be created graphically and monitored. We will also show how one popular tool for a step can be exchanged for another tool. We aim to use this to generate discussion of the various properties that should be considered when pipeline options are selected. It is NOT the intent of the demonstration to decide the criteria, let alone the preferred wrapper technology or pipeline technology. That will likely be the major activity of the BIDS WG over the coming year, based on feedback and ongoing discussions within the QIN.



TCIA Data Submissions

Another area of work is to define the minimum required metadata for a submission to TCIA, as well as the preferred format for that metadata. This work is still in progress, with current effort focused on understanding metadata formats used in other NCI areas.




QIN PET-CT Subgroup – Overview of Activities
L. Hadjiiski1, D. Goldgof2, M. McNitt-Gray3, R. Beichel4, S. Nehmeh5, Y. Balagurunathan2, J. Kalpathy-Cramer6, B. Zhao7, S. Napel8, J. C. Sieren9, I. Yeung10, M. Muzi4, H. Aerts11, R. Gillies2
1University of Michigan, 2Moffitt Cancer Center/University of South Florida, 3University of California Los Angeles, 4University of Iowa, 5Memorial Sloan Kettering Cancer Center, 6Massachusetts General Hospital, 7Columbia University Medical Center, 8Stanford University, 9University of Iowa, 10Princess Margaret Cancer Center, 11Dana-Farber Cancer Institute
Introduction: The purpose of this poster is to describe the activities of the PET-CT subgroup of the Image Analysis and Performance Metrics Group during the past year (January 2016 - present). The mission of the PET-CT group is to provide guidance, coordination, consensus building, and awareness regarding the development of algorithms and methods for quantitative analysis of tumors, related tissues and organs, and changes in response to disease progression and treatment, as well as to influence the development of sharable objective methods and metrics for assessment of image analysis accuracy, reproducibility, and robustness.
Methods: In the PET-CT subgroup, there are four active challenges. (1) CT Feature Comparison Challenge using Moist Run Data (hosted by USF/Moffitt CC). The goal of this challenge is to investigate the sensitivity of quantitative descriptors of pulmonary nodules to segmentations and to illustrate comparisons across different feature types and features computed by different implementations of feature extraction algorithms. Seven QIN teams obtained the data set, including all images and segmentations, and each computed its own set of features for each of 468 segmentations. (2) Lung Nodule Interval Segmentation Challenge using NLST data (hosted by USF/Moffitt CC). The goal of this challenge is to study the variability of segmentation methods in estimating the size of pulmonary nodules on scans of the same patient at two different time points. Five sites are participating, with two additional sites contributing their analytics expertise. All teams have segmented the challenge data sets and reported the segmentation masks. (3) PET Segmentation Challenge (hosted by Iowa). The goal of this project is to perform segmentations on PET scans of phantoms and patient scans of head and neck cancer (HNC) tumors to assess bias and variability and to determine the impact on derived quantitative imaging measures. Seven QIN teams participated in the challenge, providing 806 phantom insert and 641 lesion segmentations. (4) Dynamic PET FMISO Challenge (hosted by MSKCC). The goal of this challenge is to assess the inter-observer variability in the compartmental kinetic analysis (CKA) of 18F-fluoromisonidazole (FMISO) dynamic PET images. MSKCC has shared static FDG PET/CT and dynamic FMISO PET data with 5 CKA experts from 4 QIN sites.

Results: (1) Of the 830 features in this study, 68% had a concordance correlation coefficient (CCC) of ≥0.75. At correlation thresholds of 0.75 and 0.95, there were 75 and 246 uncorrelated feature subgroups, respectively, providing a measure of the features’ redundancy [1]. (2) The similarity in segmentation between all 5 sites had wide ranges, with a mean of 0.48. The concordance in volume and volume change estimates between any two sites ranged from 0.71 to 0.95 and from 0.15 to 0.89, respectively. The prediction accuracy (AUC) of the sites based on volume change ranged from 0.64 to 0.82. (3) On the phantom test set, the mean relative volume errors ranged from 29.9% to 87.8%, and the repeat differences for each institution ranged from -36.4% to 39.9%. On the HNC test set, the mean relative volume error ranged from -50.5% to 701.5%, and the repeat differences for each institution ranged from -37.7% to 31.5% [2]. (4) Strong intraclass correlation (ICC > 0.9) was measured for all kinetic rate constants (KRCs). Similarly, strong Pearson correlation (R > 0.75, P < 0.001) was observed among the operators for all KRCs.


Conclusions: (1) Having a common set of reference images, well-specified objects, and existing object masks allowed participants to focus on feature computation, its sensitivity to segmentation results, and the associations among specific features. (2) While overall prediction accuracy was comparable between manual and automatic segmentations, agreement on the cancer status classification of specific nodules varied from algorithm to algorithm. (3) The analysis results underline the importance of PET scanner reconstruction harmonization and imaging protocol standardization for quantification of lesion volumes.

(4) KRCs were largely reproducible when CKA was carried out by multiple operators; a major source of error is the target volume definition, which affects the corresponding time-activity curve. Challenges (1), (3), and (4) are completed. Challenge (2) is in the advanced inference phase and will be completed during the second quarter of 2017.
[1] Kalpathy-Cramer J, Mamomov A, Zhao B, Lu L, Cherezov D, Napel S, Echegaray S, Rubin D, McNitt-Gray M, Lo P, Sieren JC, Uthoff J, Dilger SK, Driscoll B, Yeung I, Hadjiiski L, Cha K, Balagurunathan Y, Gillies R, Goldgof D. Radiomics of Lung Nodules: A Multi-Institutional Study of Robustness and Agreement of Quantitative Imaging Features. Tomography. 2016 Dec;2(4):430-437. doi: 10.18383/j.tom.2016.00235.
[2] Beichel RR, Smith BJ, Bauer C, Ulrich EJ, Ahmadvand P, Budzevich MM, Gillies RJ, Goldgof D, Grkovski M, Hamarneh G, Huang Q, Kinahan PE, Laymon CM, Mountz JM, Muzi JP, Muzi M, Nehmeh S, Oborski MJ, Tan Y, Zhao B, Sunderland JJ, Buatti JM. Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data. Medical Physics. 2017;44(2):479-496.
Standardizing Radiomic Feature Descriptions for Quantitative Imaging:

A Preliminary Report of the Cooperative Efforts of the NCI’s QIN PET-CT Subgroup
Jayashree Kalpathy-Cramer1, Binsheng Zhao2, Dmitry Goldgof3, Sandy Napel4, Daniel Rubin4, Michael McNitt-Gray5, Jessica C. Sieren6, Ivan Yeung7, Lubomir Hadjiiski8, Yoganand Balagurunathan9;

1Massachusetts General Hospital, Boston, Massachusetts; 2Columbia University Medical Center, New York, New York; 3University of South Florida, Tampa, Florida; 4Stanford University, Stanford, California; 5University of California Los Angeles, Los Angeles, California; 6University of Iowa, Iowa City, Iowa; 7Princess Margaret Cancer Center, Toronto, Ontario, Canada; 8University of Michigan, Ann Arbor, Michigan; and 9Moffitt Cancer Center, Tampa, Florida
Purpose: Medical imaging is one of the largest sources of “Big Data” in the world, yet most of the data are in the form of large unstructured objects, making them largely inaccessible. Radiomics, “the high-throughput extraction of large amounts of image features from radiographic images,” has been used to provide quantitative descriptors for the regions of interest that are the basis for classification and prediction tasks in radiology and oncology (e.g., tumor characterization for diagnostic purposes). However, there are substantial challenges to comparing and reproducing results across sites and studies, for several reasons. The purpose of this work was to begin to address one of those issues by starting to standardize the terminology used to describe imaging features.

Methods: These initial efforts were driven by a cooperative project performed by the QIN’s PET-CT group in which members investigated the sensitivity of quantitative descriptors of pulmonary nodules to variations in segmentations and explored the relationships between computed features across feature types and across features computed by different algorithms [1]. To do this, a reference set of images and specified segmentations of nodules from a previous study [2] was provided to each site. For each nodule and each segmentation, each site performed its feature calculations and reported them back to the group, along with a feature dictionary describing the features being calculated. From these, we defined a set of feature classes that included size, intensity, global shape descriptors (GSDs), local shape descriptors (LSDs), margin, and texture features. We further divided some of the feature classes (e.g., texture) into sub-classes such as Gray Level Co-occurrence Matrix (Haralick), Laplacian of Gaussian, Laws’, Run Length, and Wavelet-based features. Participants provided this feature “class” and “subclass” information as part of the feature dictionary, along with other attributes as necessary, including whether features were calculated in 2D or 3D and whether features were multiscale (and, if so, how many scales).
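As an illustration of the kind of entry such a feature dictionary might contain, the following sketch shows one hypothetical record; the field names and values are assumptions for illustration, not the schema the participants actually used.

```python
# One hypothetical feature-dictionary entry reported alongside computed values.
feature_entry = {
    "name": "glcm_contrast",
    "class": "texture",
    "subclass": "Gray Level Co-occurrence Matrix (Haralick)",
    "dimensionality": "3D",     # calculated in 2D or 3D
    "multiscale": False,        # and, if True, how many scales
    "implementation": "site-specific in-house code",
}
```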

We used the repeated-measures concordance correlation coefficient (CCC) to assess the repeatability and reproducibility of features with respect to segmentations. To examine the correlations among features produced at different institutions, we calculated associations using correlation coefficients (CC) between all pairs of features. We performed analyses designed to determine whether there were strong correlations between similarly named features (e.g., volume) across different participants’ implementations. In addition, we examined correlations across features and feature classes, both within and across institutions, to identify unique, uncorrelated features.
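For readers unfamiliar with the concordance measure, the sketch below shows Lin’s CCC for a single pair of feature vectors (e.g., the same feature computed on two segmentations); the repeated-measures CCC used in the study generalizes this across multiple segmentations per nodule, and the example numbers are invented.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Example with invented volumes from two segmentations of the same nodules:
# lins_ccc([10.1, 12.0, 8.7], [10.0, 12.1, 8.8])  # ~0.997, near-perfect concordance
```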

Results: As expected, some features demonstrated high correlations between sites and implementations. For example, tumor volume had correlation coefficients between all pairs of sites that were between 0.9999 and 1. Other common features, such as the intensity-based mean, standard deviation, median, kurtosis, and skewness, were calculated by many participants and were highly correlated between many pairs of participants. Features from a similar class, such as texture features based on GLCMs, were also highly correlated amongst themselves. However, in some cases, features having the same name (e.g., GLCM contrast) calculated by one pair of sites showed very poor correlation while those calculated by a different pair of sites showed good correlation, underscoring the importance of this kind of analysis.

Conclusion: Some quantitative imaging features are relatively straightforward and their definitions are reasonably standardized. However, there are many more with substantial complexity and subtlety to their implementations that result in quantitative differences, even for the same feature name. This effort will continue to explore approaches to reduce ambiguity between feature descriptions in an effort toward standardization, which should allow more direct comparisons, and ultimately allow pooling of radiomics features across sites.

[1] Kalpathy-Cramer J, Mamomov A, Zhao B, Lu L, Cherezov D, Napel S, Echegaray S, Rubin D, McNitt-Gray M, Lo P, Sieren JC, Uthoff J, Dilger SK, Driscoll B, Yeung I, Hadjiiski L, Cha K, Balagurunathan Y, Gillies R, Goldgof D. Radiomics of Lung Nodules: A Multi-Institutional Study of Robustness and Agreement of Quantitative Imaging Features. Tomography. 2016 Dec;2(4):430-437. doi: 10.18383/j.tom.2016.00235.



[2] Kalpathy-Cramer J, Zhao B, Goldgof D, Gu Y, Wang X, Yang H, Tan Y, Gillies R, Napel S. A comparison of lung nodule segmentation algorithms: methods and results from a multi-institutional study. J Digit Imaging. 2016;29(4):476–487.

