These tools will no longer be maintained as of December 31, 2024. Archived website can be found here. PubMed4Hh GitHub repository can be found here. Contact NLM Customer Service if you have questions.


PUBMED FOR HANDHELDS

Search MEDLINE/PubMed


  • Title: UbNiRF: A Hybrid Framework Based on Null Importances and Random Forest that Combines Multiple Features to Predict Ubiquitination Sites in Arabidopsis thaliana and Homo sapiens.
    Author: Li X, Yuan Z, Chen Y.
    Journal: Front Biosci (Landmark Ed); 2024 May 21; 29(5):197. PubMed ID: 38812315.
    Abstract:
    BACKGROUND: Ubiquitination is a crucial post-translational modification of proteins that regulates diverse cellular functions. Accurate identification of ubiquitination sites in proteins is vital for understanding fundamental biological mechanisms, such as cell cycle and DNA repair. Conventional experimental approaches are resource-intensive, whereas machine learning offers a cost-effective means of accurately identifying ubiquitination sites. The prediction of ubiquitination sites is species-specific, with many existing models being tailored for Arabidopsis thaliana (A. thaliana) and Homo sapiens (H. sapiens). However, these models have shortcomings in sequence window selection and feature extraction, leading to suboptimal performance. METHODS: This study initially employed the chi-square test to determine the optimal sequence window. Subsequently, a combination of six features was assessed: Binary Encoding (BE), Composition of K-Spaced Amino Acid Pair (CKSAAP), Enhanced Amino Acid Composition (EAAC), Position Weight Matrix (PWM), 531 Properties of Amino Acids (AA531), and Position-Specific Scoring Matrix (PSSM). Comparative evaluation involved three feature selection methods: Minimum Redundancy-Maximum Relevance (mRMR), Elastic net, and Null importances. Alongside these were four classifiers: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The Null importances combined with the RF model exhibited superior predictive performance, and was denoted as UbNiRF (A. thaliana: ArUbNiRF; H. sapiens: HoUbNiRF). RESULTS: A comprehensive assessment indicated that UbNiRF is superior to existing prediction tools across five performance metrics. It notably excelled in the Matthews Correlation Coefficient (MCC), with values of 0.827 for the A. thaliana dataset and 0.781 for the H. sapiens dataset. Feature analysis underscores the significance of integrating six features and demonstrates their critical role in enhancing model performance. CONCLUSIONS: UbNiRF is a valuable predictive tool for identifying ubiquitination sites in both A. thaliana and H. sapiens. Its robust performance and species-specific discovery capabilities make it extremely useful for elucidating biological processes and disease mechanisms associated with ubiquitination.
    [Abstract] [Full Text] [Related] [New Search]