BLAST programs

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410. PubMed

Gish, W. & States, D.J. (1993) “Identification of protein coding regions by database similarity search.” Nature Genet. 3:266-272. PubMed

Madden, T.L., Tatusov, R.L. & Zhang, J. (1996) “Applications of network BLAST server” Meth. Enzymol. 266:131-141. PubMed

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic Acids Res. 25:3389-3402. PubMed

Zhang Z., Schwartz S., Wagner L., & Miller W. (2000) “A greedy algorithm for aligning DNA sequences.” J. Comput. Biol. 7(1-2):203-214. PubMed

Zhang, J. & Madden, T.L. (1997) “PowerBLAST: A new network BLAST application for interactive or automated sequence analysis and annotation.” Genome Res. 7:649-656. PubMed

Morgulis A., Coulouris G., Raytselis Y., Madden T.L., Agarwala R., & Schäffer A.A. (2008) “Database indexing for production MegaBLAST searches.” Bioinformatics 24:1757-1764. PubMed

Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., & Madden T.L. (2009) “BLAST+: architecture and applications.” BMC Bioinformatics 10:421. PubMed

Boratyn GM, Schäffer AA, Agarwala R, Altschul SF, Lipman DJ, & Madden T.L. (2012) “Domain enhanced lookup time accelerated BLAST.” Biol Direct. 2012 Apr 17;7:12. PubMed

Boratyn GM, Thierry-Mieg J, Thierry-Mieg D, Busby B, Madden T.L. (2019) “Magic-BLAST, an accurate RNA-seq aligner for long and short reads.” BMC Bioinformatics. 2019 Jul 25;20(1):405. PubMed

Reviews, improvements and useful introductions

Altschul, S.F., Boguski, M.S., Gish, W. & Wootton, J.C. (1994) “Issues in searching molecular sequence databases.” Nature Genet. 6:119-129. PubMed

McGinnis S., & Madden T.L. (2004) “BLAST: at the core of a powerful and diverse set of sequence analysis tools.” Nucleic Acids Res. 32:W20-W25. PubMed

Ye J., McGinnis S, & Madden T.L. (2006) “BLAST: improvements for better sequence analysis.” Nucleic Acids Res. 34:W6-W9. PubMed

Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, & Madden T.L. (2008) “NCBI BLAST: a better web interface” Nucleic Acids Res. 36:W5-W9. PubMed

Boratyn GM, Camacho C, Cooper PS, Coulouris G, Fong A, Ma N, Madden TL, Matten WT, McGinnis SD, Merezhuk Y, Raytselis Y, Sayers EW, Tao T, Ye J, & Zaretskaya I. (2013) “BLAST: a more efficient report with usability improvements.” Nucleic Acids Res. 41:W29-W33. PubMed

Shiryev SA, Papadopoulos JS, Schäffer AA, Agarwala R. (2007) “Improved BLAST searches using longer words for protein seeding.” Bioinformatics 23(21):2949-51. PubMed

Madden, T.L., Busby B., Ye J. (2018) “Reply to the paper: Misunderstood parameters of NCBI BLAST impacts the correctness of bioinformatics workflows.” Bioinformatics. DOI: 10.1093/bioinformatics/bty1026. PubMed

Sequence filtering

Wootton, J.C. & Federhen, S. (1996) “Analysis of compositionally biased regions in sequence databases.” Meth. Enzymol. 266:554-571. PubMed

Wootton, J.C. & Federhen, S. (1993) “Statistics of local complexity in amino acid sequences and sequence databases.” Comput. Chem. 17:149-163.

Alignment scoring systems

Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) “A model of evolutionary change in proteins.” In “Atlas of Protein Sequence and Structure, vol. 5, suppl. 3.” M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found., Washington, DC.

Schwartz, R.M. & Dayhoff, M.O. (1978) “Matrices for detecting distant relationships.” In “Atlas of Protein Sequence and Structure, vol. 5, suppl. 3.” M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found., Washington, DC.

Altschul, S.F. (1991) “Amino acid substitution matrices from an information theoretic perspective.” J. Mol. Biol. 219:555-565. PubMed

States, D.J., Gish, W., Altschul, S.F. (1991) “Improved sensitivity of nucleic acid database searches using application-specific scoring matrices.” Methods 3:66-70.

Henikoff, S. & Henikoff, J.G. (1992) “Amino acid substitution matrices from protein blocks.” Proc. Natl. Acad. Sci. USA 89:10915-10919. PubMed

Altschul, S.F. (1993) “A protein alignment scoring system sensitive at all evolutionary distances.” J. Mol. Evol. 36:290-300. PubMed

Alignment statistics

Altschul, S.F. & Gish, W. (1996) “Local alignment statistics.” Meth. Enzymol. 266:460-480. PubMed

Karlin, S. & Altschul, S.F. (1990) “Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.” Proc. Natl. Acad. Sci. USA 87:2264-2268. PubMed

Karlin, S. & Altschul, S.F. (1993) “Applications and statistics for multiple high-scoring segments in molecular sequences.” Proc. Natl. Acad. Sci. USA 90:5873-5877. PubMed

Dembo, A., Karlin, S. & Zeitouni, O. (1994) “Limit distribution of maximal non-aligned two-sequence segmental score.” Ann. Prob. 22:2022-2039.

Altschul, S.F. (1997) “Evaluating the statistical significance of multiple distinct local alignments.” In “Theoretical and Computational Methods in Genome Research.” (S. Suhai, ed.), pp. 1-14, Plenum, New York.

Schäffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF. (2001) “Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.” Nucleic Acids Res. 2001 Jul 15;29(14):2994-3005. PubMed

Park Y, Sheetlin S, Ma N, Madden TL, & Spouge JL. (2012) “New finite-size correction for local alignment score distributions.” BMC Res Notes. 2012 Jun 12;5:286. PubMed

Programs that use BLAST

Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, & Madden TL. (2012) “Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction.” BMC Bioinformatics 13:134. PubMed

Ye J, Ma N, Madden TL, & Ostell JM. (2013) “IgBLAST: an immunoglobulin variable domain sequence analysis tool.” Nucleic Acids Res. 2013 Jul;41:W34-W40. PubMed

Papadopoulos JS & Agarwala R. (2007) “COBALT: constraint-based alignment tool for multiple protein sequences.” Bioinformatics 23(9):1073-9. PubMed