Abstract:
Colorectal cancer (CRC), a malignancy that develops in the colon or rectum, is frequently associated with epidermal growth factor receptor (EGFR) overexpression, observed in up to 85% of cases, whereas human epidermal growth factor receptor 2 (HER2) amplification or overexpression is present in a smaller subset (approximately 2–6%, with increased prevalence in selected molecular subgroups). These molecular alterations highlight the therapeutic relevance of targeting EGFR and HER2 pathways in CRC management. Despite progress in targeted therapy development, most existing treatment approaches focus on inhibiting either EGFR or HER2 individually. These single-target therapies frequently demonstrate reduced efficacy because of alterations in downstream effectors such as the Kirsten rat sarcoma viral oncogene homolog (KRAS) and the activation of alternative signalling pathways that promote continued tumour growth. Consequently, designing therapeutic approaches that can concurrently block both EGFR and HER2 represents an essential and promising area of research.
In this research, an innovative machine learning (ML)-driven stacking ensemble framework was established to accurately identify dual EGFR and HER2 inhibitors based on Simplified Molecular-Input Line-Entry System (SMILES) representations. A comprehensive benchmark dataset comprising active and inactive compounds targeting EGFR and HER2 was compiled from the ChEMBL database. Utilising this dataset, forty baseline models were developed and fine-tuned using various molecular descriptors and ML algorithms. The predictions from these models were then integrated through logistic regression (LR) to produce a highly reliable stacking ensemble classifier.
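The stacking idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, three logistic base models stand in for the forty baseline models, and feature subsets (`VIEWS`) stand in for the different molecular descriptor sets. All function names here (`fit_logistic`, `stack_fit`, `stack_predict`, `make_toy_data`) are hypothetical. The sketch assumes the common practice of training the logistic regression meta-learner on out-of-fold base-model predictions, which the abstract does not confirm.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=400):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def stack_fit(X, y, views, k=5):
    """Train one base model per feature view, then a logistic meta-learner
    on their out-of-fold predicted probabilities."""
    n = len(X)
    meta_X = [[0.0] * len(views) for _ in range(n)]
    folds = [list(range(i, n, k)) for i in range(k)]
    for m, view in enumerate(views):
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(n) if i not in held]
            model = fit_logistic(
                [[X[i][j] for j in view] for i in train_idx],
                [y[i] for i in train_idx],
            )
            for i in held_out:
                meta_X[i][m] = predict_proba(model, [X[i][j] for j in view])
    # Refit each base model on all training data for use at prediction time.
    base_models = [fit_logistic([[row[j] for j in v] for row in X], y) for v in views]
    meta_model = fit_logistic(meta_X, y)
    return base_models, meta_model


def stack_predict(stack, views, x):
    base_models, meta_model = stack
    meta_x = [predict_proba(m, [x[j] for j in v]) for m, v in zip(base_models, views)]
    return predict_proba(meta_model, meta_x)


def make_toy_data(n, seed):
    # Synthetic stand-in for descriptor vectors with active/inactive labels.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [1 if row[0] + row[2] > 0 else 0 for row in X]
    return X, y


VIEWS = [[0, 1], [2, 3], [0, 2]]  # hypothetical "descriptor sets"
X_train, y_train = make_toy_data(150, seed=0)
X_test, y_test = make_toy_data(100, seed=1)
stack = stack_fit(X_train, y_train, VIEWS)
hits = sum(
    (stack_predict(stack, VIEWS, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)
)
accuracy = hits / len(X_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Using out-of-fold predictions keeps the meta-learner from being trained on probabilities that the base models produced for compounds they had already seen, which would otherwise overstate how reliable each base model is.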
This predictive model was further applied to natural bioactive compounds obtained from liquid chromatography–tandem mass spectrometry (LC–MS/MS) profiling of Ceratonia siliqua L. pod extract, which were annotated using established spectral libraries and subsequently subjected to machine learning–guided virtual screening. The cytotoxic activity of the Ceratonia siliqua L. pod extract was confirmed experimentally using the MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide) assay against the human colorectal carcinoma cell line HCT116 and non-cancerous Vero cells. The extract exhibited an IC₅₀ (half maximal inhibitory concentration) value of 13.32 ± 1.09 μg/mL in HCT116 cells, underscoring its notable anti-cancer potential.
To support the experimental outcomes, molecular docking and in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) evaluations were carried out on the compounds identified from the LC–MS/MS dataset using the stacking model, alongside four Food and Drug Administration (FDA)-approved anticancer drugs for comparative analysis. Among all screened molecules, NCGC00385704-01, identified from LC–MS/MS spectral data through library matching, exhibited strong predicted dual inhibitory potential against both EGFR and HER2. Overall, this study highlights Ceratonia siliqua L. as a valuable source of potential lead molecules for colorectal cancer therapy through dual EGFR/HER2 inhibition and underscores the value of integrating computational and experimental approaches in natural product-based drug discovery.