Comparison of false-discovery rates of various decoy databases

Sangjeong Lee, Heejin Park, Hyunwoo Kim

Research output: Contribution to journal, Article, peer-review

Abstract

Background: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database identical in size to the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or underestimated depending on which decoy database is used, because target databases differ in their ratio of redundant peptides; as a result, the numbers of unique (non-redundant) peptides in the target and decoy databases can differ.

Results: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, when the ratio of redundant peptides in the target database is high, the FDR is overestimated if the (pseudo-)shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large, showed outcomes similar to those from the UniProt human protein database.

Conclusion: When (pseudo-)shuffle decoy databases are used, the FDR must be estimated using the correction factor proposed by Elias and Gygi or that proposed by Kim et al.
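The effect of peptide redundancy on decoy size can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy sequences, and the simplified trypsin rule (cleave after every K or R, no proline exception, no missed cleavages) are all assumptions made for illustration. It builds pseudo-reverse and pseudo-shuffle decoys from a target database that contains repeated peptides and counts the unique peptides in each.

```python
import random
import re


def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, assumed here)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def pseudo_reverse(protein):
    """Reverse each tryptic peptide while keeping its C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        body, tail = (pep[:-1], pep[-1]) if pep[-1] in "KR" else (pep, "")
        out.append(body[::-1] + tail)
    return "".join(out)


def pseudo_shuffle(protein, rng):
    """Shuffle each tryptic peptide while keeping its C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        body, tail = (pep[:-1], pep[-1]) if pep[-1] in "KR" else (pep, "")
        chars = list(body)
        rng.shuffle(chars)  # every occurrence is shuffled independently
        out.append("".join(chars) + tail)
    return "".join(out)


def unique_peptides(database):
    """Set of distinct tryptic peptides over all proteins in a database."""
    return {pep for protein in database for pep in tryptic_peptides(protein)}


# Toy target database (hypothetical sequences): four identical proteins
# plus one distinct protein, so most peptides are redundant.
target = ["ACDEFGHILMK" + "NPQSTVWYR"] * 4 + ["GGSSAATTK"]

rng = random.Random(0)  # fixed seed so the run is repeatable
pseudo_rev_db = [pseudo_reverse(p) for p in target]
pseudo_shuf_db = [pseudo_shuffle(p, rng) for p in target]

print("unique target peptides:        ", len(unique_peptides(target)))
print("unique pseudo-reverse peptides:", len(unique_peptides(pseudo_rev_db)))
print("unique pseudo-shuffle peptides:", len(unique_peptides(pseudo_shuf_db)))
```

In this sketch the pseudo-reverse decoy maps identical target peptides to identical decoy peptides, so the unique-peptide count is preserved. The pseudo-shuffle decoy scrambles each copy independently, so redundant target peptides tend to become distinct decoy peptides. A decoy database with more unique peptides than the target offers more chances for decoy hits, which is consistent with the overestimated FDR reported for highly redundant target databases.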

Original language: English
Article number: 11
Journal: Proteome Science
Volume: 19
Issue number: 1
DOIs
State: Published - Dec 2021

Keywords

  • False discovery rate
  • Reverse decoy database
  • Shuffle decoy database
  • Target-decoy search
