# Algorithms in Bioinformatics: Third International Workshop, WABI 2003

By Mohamed Ibrahim Abouelhoda, Enno Ohlebusch (auth.), Gary Benson, Roderic D. M. Page (eds.)

This book constitutes the refereed proceedings of the Third International Workshop on Algorithms in Bioinformatics, WABI 2003, held in Budapest, Hungary, in September 2003.

The 36 revised full papers presented were carefully reviewed and selected from 78 submissions. The papers are organized in topical sections on comparative genomics, database searching, gene finding and expression, genome mapping, pattern and motif discovery, phylogenetic analysis, polymorphism, protein structure, sequence alignment, and string algorithms.

Excerpt from *A Systematic Statistical Analysis of Ion Trap Tandem Mass Spectra* by J. Colinge, A. Masselot, and J. Magnin:

Such lines are marked by an asterisk (∗). Lines marked with B refer to the complementary data set B.

… a Markov chain (MC) or a hidden Markov model (HMM) [5]. From this perspective, as observed in [3], it can be advantageous to merge several ion types into one generalized ion type to better capture the pattern of consecutive fragment matches. For instance, one may want to treat ion types b, b-17, b-18, and b++ as a single generalized ion type B. We consider one MC and two HMMs (see Figure 1), and we denote by $L_2$ the log-likelihood ratio of the models for consecutive matches.
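As a minimal sketch of how a consecutive-match log-likelihood ratio like $L_2$ could be computed, the code below uses a two-state Markov chain over a 0/1 sequence (1 = the fragment at that position matched a peak of the generalized type B, 0 = it did not). The grouping of b, b-17, b-18 and b++ into B follows the text; the function names, the 0/1 encoding and all initial and transition probabilities are hypothetical placeholders rather than the paper's values, and the two HMM variants are not sketched.

```python
import math

# Ion types merged into one generalized ion type B (example from the text).
GENERALIZED_B = {"b", "b-17", "b-18", "b++"}


def match_pattern(matched_by_position):
    """Map per-position sets of matched ion types to a 0/1 sequence for type B:
    1 if any member of B matched at that fragment position, else 0."""
    return [1 if GENERALIZED_B & ions else 0 for ions in matched_by_position]


def markov_log_likelihood(seq, init, trans):
    """Log-probability of a non-empty 0/1 sequence under a first-order Markov
    chain: init[s] = P(first state is s), trans[s][t] = P(next is t | current is s)."""
    ll = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll


# Hypothetical parameters (not from the paper): under a correct match, matches
# tend to come in runs; under a random match, they are sparse and independent.
CORRECT = {"init": {0: 0.4, 1: 0.6},
           "trans": {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}}
RANDOM = {"init": {0: 0.8, 1: 0.2},
          "trans": {0: {0: 0.8, 1: 0.2}, 1: {0: 0.8, 1: 0.2}}}


def l2_score(seq):
    """Log-likelihood ratio of the match pattern: correct model vs random model."""
    return (markov_log_likelihood(seq, CORRECT["init"], CORRECT["trans"])
            - markov_log_likelihood(seq, RANDOM["init"], RANDOM["trans"]))


if __name__ == "__main__":
    pattern = match_pattern([{"b", "y"}, {"b-17"}, {"y"}, {"b++"}, {"b"}])
    print(pattern)            # [1, 1, 0, 1, 1]
    print(l2_score(pattern))  # positive: runs of matches favor the correct model
```

Because the random model here has identical rows in its transition matrix, it reduces to independent draws, so any positive score comes from the run structure that the correct-match chain rewards.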

This is due to the basic residues at tryptic cleavage sites (Lys, Arg), which facilitate protonation. [27] and [8] even report a dependence of fragment intensity on relative fragment length. Here we use a simple model that orders the experimental peaks by intensity and then splits them into 5 bins. We obtain

$$L_{\text{intens}} = L_1 + L_3, \qquad L_3 = \sum_{\theta \in S} L_{3,\theta},$$

where $S$ is a set of ion types and $L_{3,\theta}$ are the corresponding log-likelihood ratios. Selecting ion types by their significance (relative entropy), we set $S$ to b, b-17, b-18, y, y-17, y-18 for charge z = 1; b and y for z = 2; and b-17, b++, y, y-17, y++ for z = 3.
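The intensity model above can be sketched as follows, under several stated assumptions: peaks are ranked by decreasing intensity and split into 5 near-equal rank bins; each peak matched by a fragment of ion type θ in $S$ contributes the log ratio of a hypothetical per-bin probability under a correct match to a uniform per-bin probability under a random match; $L_{3,\theta}$ is the sum over those peaks; and the result is combined as $L_{\text{intens}} = L_1 + L_3$ with $L_1$ computed elsewhere. The ion-type sets per charge come from the text; the probability tables, the input format of `matches` and the function names are illustrative only.

```python
import math

N_BINS = 5  # the text splits peaks into 5 intensity bins

# Ion-type sets S by peptide charge state z, as listed in the text.
ION_TYPES_BY_CHARGE = {
    1: ["b", "b-17", "b-18", "y", "y-17", "y-18"],
    2: ["b", "y"],
    3: ["b-17", "b++", "y", "y-17", "y++"],
}

# Hypothetical P(bin | peak matched by a fragment of type theta, correct match);
# bin 0 holds the most intense peaks. Real values would be estimated from data.
P_CORRECT = {theta: [0.35, 0.25, 0.20, 0.12, 0.08]
             for types in ION_TYPES_BY_CHARGE.values() for theta in types}
# Assumption: under a random match, a matched peak is equally likely in any bin.
P_RANDOM = [1.0 / N_BINS] * N_BINS


def intensity_bins(intensities, n_bins=N_BINS):
    """Rank peaks by decreasing intensity and map each peak index to a
    rank bin 0..n_bins-1 of (near-)equal size; bin 0 = most intense."""
    n = len(intensities)
    order = sorted(range(n), key=lambda i: -intensities[i])
    return {peak: rank * n_bins // n for rank, peak in enumerate(order)}


def l3_score(charge, matches, intensities):
    """L3 = sum over theta in S of L3,theta, where L3,theta sums the per-bin
    log ratios of the peaks matched by fragments of ion type theta.
    `matches` maps ion type -> list of matched peak indices (hypothetical format);
    ion types outside S for this charge are ignored."""
    bins = intensity_bins(intensities)
    total = 0.0
    for theta in ION_TYPES_BY_CHARGE[charge]:
        for peak in matches.get(theta, []):
            b = bins[peak]
            total += math.log(P_CORRECT[theta][b] / P_RANDOM[b])
    return total


def l_intens(l1, charge, matches, intensities):
    """Assumes the combination L_intens = L1 + L3, with L1 computed elsewhere."""
    return l1 + l3_score(charge, matches, intensities)
```

For example, with intensities `[100, 50, 10, 80, 5]` and charge 2, a `b` fragment on the most intense peak adds a positive term, while a `y` fragment on the weakest peak adds a negative one; a `b-17` match is ignored because b-17 is not in $S$ for z = 2.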

$S$ also depends on the peptide charge state.

We call the comparison of an experimental spectrum with a theoretical spectrum a match. A match is either correct or random. From the match we compute several quantities that the score function then uses; these quantities are modeled as random variables and conveniently collected into a random vector E. The score function is intended to distinguish correct matches from random ones, so the problem can be viewed as a hypothesis-testing problem.
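One standard way to write this test down, offered as a sketch consistent with the log-likelihood ratios used in the excerpt rather than as the paper's exact formulation, is to take a random match as the null hypothesis and score the observed vector E by its log-likelihood ratio, accepting the match as correct above a threshold $t$ (the symbols $\Lambda$ and $t$ are introduced here for illustration):

$$
H_0:\ \text{the match is random}, \qquad H_1:\ \text{the match is correct},
$$

$$
\Lambda(E) = \log \frac{P(E \mid H_1)}{P(E \mid H_0)}, \qquad \text{accept } H_1 \text{ if } \Lambda(E) > t .
$$

When both distributions are fully specified, the Neyman–Pearson lemma states that thresholding this ratio gives the most powerful test at a given false-positive rate. If the components of E are further assumed conditionally independent under each hypothesis, $\Lambda(E)$ splits into a sum of per-component log-likelihood ratios, which is one reason separate log-likelihood ratios for different features can be added into a single score.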