Thesis Type: Master's
Institution Where the Thesis Was Conducted: Istanbul University-Cerrahpaşa, Faculty of Engineering, Department of Computer Engineering, Türkiye
Thesis Advisor: Özgür Can Turna
Thesis Approval Date: 2021
Thesis Language: English
Abstract:
Cancer is the name given to all diseases involving uncontrolled cell
proliferation in a tissue or organ. It stems from molecular alterations within
cells that cause the intracellular mechanisms to deviate from their normal
functioning.
Intracellular functions are carried out by proteins. The molecular changes may
cause over- or under-synthesis of some proteins. Proteins that are produced in
abnormal amounts may in turn disrupt many cellular functions and cause the
cells to proliferate aberrantly, leading to tumor formation. Cancerous cells
may undergo consecutive molecular alterations. Thus, several types of cancerous
cell groups emerge within the same tumor. Intra-Tumor Heterogeneity (ITH)
refers to the presence of distinct groups of cells within a single tumor. ITH
has been found to be associated with numerous prognostic factors including survival,
tumor advancement, metastasis, immunity, therapeutic resistance, and drug
response. Therefore, it is essential to quantify ITH to draw inferences about
disease prognosis. Previously, ITH was determined by visual examination of
tumor samples. However, Next Generation Sequencing, a recent sequencing
technology that yields various types of genomic, epigenomic and proteomic data
on patients, has enabled many researchers to study the determination of ITH
through data science. Existing studies evaluate ITH based solely on gene
expression and DNA mutation data. Moreover, these studies are limited to only
some types of cancer. This study
proposes a novel approach to utilizing genomic, epigenomic and proteomic data
sets for the purpose of establishing relationships with ITH-associated
features. Because survival is strongly associated with ITH, survival
analysis is conducted using data sets that are transformed to represent the
overall aberrancy level of the tumor samples. This study aims to encompass
various molecular datasets including gene expression, DNA methylation, protein
synthesis, copy number variation (CNV) and single nucleotide variant (SNV)
data. As it is based on multi-omics data and is a pan-cancer study, it is
expected to make significant contributions to the literature by covering data
types and cancer types that have so far received little attention.
Furthermore, machine learning models are developed to predict the
pre-calculated subclone numbers from the transformed values of the datasets.
Subclone numbers are determined based on tumor image data or mutation data.
Approaches that estimate subclone numbers from mutation data yield
significantly different results. For this reason, including more comprehensive
data sets is suggested in order to produce better estimates. Moreover, distinct
data types such as DNA methylation and protein synthesis data have not been
used to infer subclone numbers so far. Therefore, multi-omics approaches,
rather than single molecular datasets, are considered potentially significant
methods for estimating subclone numbers. As it predicts subclone numbers based
on gene expression, DNA methylation, protein synthesis, CNV and SNV data, this
study is expected to be a significant contribution to the literature.
The results demonstrate that the features calculated by the proposed method are
strongly associated with overall survival in several cancer types and at the
pan-cancer scale. In addition, ensemble methods successfully predict the
subclone numbers with an R-squared score above 0.8. Further studies are
suggested to focus on validating the transformation technique by applying it to
different cancer data sets.