Congresso Brasileiro de Microbiologia 2023
Abstract: 484-2

HYPOTHETICAL PROTEINS ANNOTATION OF THE CYANOBACTERIA GEMINOCYSTIS GBBB08 AND NOSTOC GBBB01

Authors:
Anna Letícia Silva da Costa (UFMA - UNIVERSIDADE FEDERAL DO MARANHÃO) ; Iolanda Karoline Barros dos Santos Rocha (UFMA - UNIVERSIDADE FEDERAL DO MARANHÃO) ; Jose Isaias Pimentel Barros (UFMA - UNIVERSIDADE FEDERAL DO MARANHÃO) ; Hivana Patricia Melo Barbosa Dall'agnol (UFMA - UNIVERSIDADE FEDERAL DO MARANHÃO) ; Ana Carolina de Araújo Butarelli (USP - UNIVERSIDADE DE SÃO PAULO) ; Lucas Salomão de Sousa Ferreira (UNDB - CENTRO UNIVERSITÁRIO DOM BOSCO) ; Leonardo Teixeira Dall'agnol (UFMA - UNIVERSIDADE FEDERAL DO MARANHÃO)

Abstract:
The Chapada das Mesas National Park (CMNP), located in the south of Maranhão state, represents the Cerrado biome and harbors a poorly known diversity. Cyanobacteria can adapt to different conditions owing to their phenotypic and ecological plasticity. Predominant in this ecosystem, these microorganisms play key roles such as fixing nitrogen and acting as important producers in the trophic network. Despite its great ecological and biotechnological importance, the cyanobacterial community of the CMNP is little studied. In this work, the coding sequences (CDSs) of the genomes of two species of cyanobacteria from this area, Geminocystis GBBB08 and Nostoc GBBB01, available under accession numbers ASM1674322v1 and ASM1674321v1, were analyzed. Protein domains and families were analyzed using the databases CDD, SMART, InterPro, CATH, Superfamily, PROSITE, HMMER and MOTIF; physicochemical properties were computed using Expasy's ProtParam server; subcellular localization was predicted using PSORTb v.3.0.2 and PSLPred; secretory proteins and signal peptide cleavage sites were predicted using the SecretomeP 2.0 and SignalP 5.0 servers; and transmembrane helices and topology were predicted using HMMTOP, TMHMM v.2.0 and SOSUI. To assess the performance of the prediction tools, a ROC curve analysis was performed using the Online ROC Curve Calculator. The CDSs were classified as HPs (hypothetical proteins) and HCs (high-confidence hypothetical proteins). A total of 1444 HPs, 187 of them HCs, were recovered from the Nostoc genome, and 673 HPs, 162 of them HCs, from Geminocystis. Domains common to the two species include AAA+, NTPases, GTPases, PKS, ChaB and YCII, which act in several cellular processes such as nucleotide binding, transcription, transport and enzymatic complexes. GBBB01 has unique domains such as LAP, WD40 and COP23, whereas GBBB08 has SWIM, STAS and NUDIX as unique domains.
The physicochemical parameters indicated that most of the HCs were hydropathic and more stable. More than half of the proteins were predicted to be cytoplasmic, and the smallest percentage periplasmic. Furthermore, more than 40% of the proteins are involved in secretory activity, and around 7% of the proteins in each genome contain a signal peptide. In the ROC analysis of the databases, the mean area under the curve (AUC) was 0.974 for GBBB08, with PROSITE the most accurate tool; for GBBB01, the AUC was 0.8436, with CDD the most accurate tool. This study identified regions and their respective functions in uncharacterized proteins through the automatic annotation of these genomes, contributing to a deeper understanding of their metabolic and ecological potential through an integrated approach.
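
For readers unfamiliar with the metric, the area under the ROC curve reported above can be illustrated with a minimal Python sketch. It is not the procedure of the Online ROC Curve Calculator used in the study. It assumes each prediction carries a binary label (1 = annotation judged correct, 0 = incorrect) and a confidence score, and computes the area as the probability that a randomly chosen positive outscores a randomly chosen negative. The function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals P(score of a positive > score of a negative),
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An area of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to random ranking, so values such as 0.974 and 0.8436 indicate good to very good separation.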

Keywords:
Bioactive Compounds, Function Prediction, Genomic Analysis, ROC Curve, Secondary Metabolites


Funding agencies:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).