Filter Proteome Discoverer DDA output
This function filters the output .txt files (peptide groups or PSMs) from Proteome Discoverer for DDA, based on various criteria:
Remove features without a master protein
Remove features without a unique master protein (i.e. keep only features where Number.of.Protein.Groups == 1)
Remove features matching a contaminant protein
Remove features matching any protein associated with a contaminant protein (see below)
Remove features without quantification values
Usage
filter_features_pd_dda(
  obj,
  master_protein_col = "Master.Protein.Accessions",
  protein_col = "Protein.Accessions",
  unique_master = TRUE,
  filter_contaminant = TRUE,
  contaminant_proteins = NULL,
  crap_proteins = NULL,
  filter_associated_contaminant = TRUE,
  remove_no_quant = TRUE,
  cont_string = "Cont_"
)
Arguments
- obj
SummarizedExperiment containing output from Proteome Discoverer. Use readQFeatures to read in the .txt file.
- master_protein_col
string. Name of column containing master proteins.
- protein_col
string. Name of column containing all protein matches.
- unique_master
logical. Filter out features without a unique master protein.
- filter_contaminant
logical. Filter out features which match a contaminant protein.
- contaminant_proteins
character vector. The protein IDs from the contaminant proteins.
- crap_proteins
character vector. Same as contaminant_proteins. Available for backwards compatibility. Default is NULL. If both contaminant_proteins and crap_proteins are set, an error is thrown.
- filter_associated_contaminant
logical. Filter out features which match a contaminant-associated protein.
- remove_no_quant
logical. Remove features with no quantification values.
- cont_string
string. String used to search for contaminants.
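The interaction between contaminant_proteins and crap_proteins can be sketched as follows. This is only an illustration of the documented behaviour (the two arguments are aliases, and setting both is an error); the helper name resolve_contaminants is hypothetical and is not part of the package.

```r
# Hypothetical helper: illustrates the documented rule that
# contaminant_proteins and crap_proteins are mutually exclusive aliases.
resolve_contaminants <- function(contaminant_proteins = NULL,
                                 crap_proteins = NULL) {
  if (!is.null(contaminant_proteins) && !is.null(crap_proteins)) {
    stop("Set only one of 'contaminant_proteins' and 'crap_proteins'.")
  }
  # Fall back to the legacy argument when the new one is not supplied
  if (is.null(contaminant_proteins)) crap_proteins else contaminant_proteins
}

# Legacy argument alone is accepted and returned unchanged
legacy <- resolve_contaminants(crap_proteins = c("P1", "P2"))

# Supplying both arguments raises an error
both <- try(resolve_contaminants("P1", "P2"), silent = TRUE)
```

In practice this means existing scripts that pass crap_proteins should keep working, while new code should prefer contaminant_proteins.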
Details
Associated contaminant proteins are proteins which have at least one feature shared with a contaminant protein. Contaminant FASTA files often do not contain all possible contaminant proteins; for example, some features can be assigned to a keratin that is not in the provided contaminant database.
In the example below, using filter_associated_contaminant = TRUE will filter out f2 and f3 in addition to f1, regardless of the value in the Master.Protein.Accessions column.
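A minimal base R sketch of the associated-contaminant idea, assuming a small made-up feature table in which protein matches are separated by semicolons. The accessions (Cont_P1, P2, P3, P4) and the feature f4 are hypothetical, and the code illustrates the definition given above rather than the package's actual implementation.

```r
# Hypothetical feature table; all accessions are made up for illustration.
features <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  Master.Protein.Accessions = c("Cont_P1", "P2", "P2", "P3"),
  Protein.Accessions = c("Cont_P1; P2", "P2", "P2; P4", "P3"),
  stringsAsFactors = FALSE
)
contaminants <- "Cont_P1"

# Split each feature's protein matches into a character vector
matches <- strsplit(features$Protein.Accessions, ";\\s*")

# Features that directly match a contaminant protein
is_cont <- vapply(matches, function(p) any(p %in% contaminants), logical(1))

# Proteins that share at least one feature with a contaminant
associated <- setdiff(unique(unlist(matches[is_cont])), contaminants)

# Features that match any associated protein
is_assoc <- vapply(matches, function(p) any(p %in% associated), logical(1))

# Features retained after both filters
kept <- features$feature[!(is_cont | is_assoc)]
```

Here f1 matches the contaminant directly and also matches P2, so P2 becomes an associated contaminant protein. Features f2 and f3 both match P2 and are removed as well, leaving only f4.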
Examples
if (FALSE) { # \dontrun{
  #### PSMs.txt example ####
  library(QFeatures)  # provides readQFeatures()

  # load PD PSMs.txt output
  tmt_qf <- readQFeatures(assayData = psm_tmt_total,
                          quantCols = 36:45,
                          name = "psms_raw")

  # extract the UniProt accessions from the contaminant FASTA headers
  # (contaminant_fasta_inf is assumed to point to the contaminant FASTA file)
  contaminant_accessions <- get_contaminant_fasta_accessions(contaminant_fasta_inf)

  # filter the PSMs
  psm2 <- filter_features_pd_dda(
    obj = tmt_qf[['psms_raw']],
    master_protein_col = "Master.Protein.Accessions",
    protein_col = "Protein.Accessions",
    unique_master = TRUE,
    filter_contaminant = TRUE,
    contaminant_proteins = contaminant_accessions,
    filter_associated_contaminant = TRUE
  )
} # }