Features
- The space needed for the dynamic programming over the main similarity-score matrix and the 2 auxiliary gap matrices is reduced from O(m×n) to O(n), where m and n are the sizes of the vertical and horizontal sequences respectively, by using single-dimensional arrays of size n in place of the original two-dimensional arrays of size m×n.
- The two-dimensional array of size m×n that holds the traceback directions (diagonal, left, up and stop) is flattened into a single-dimensional array of size m×n. This speeds up memory allocation because the Java Virtual Machine (JVM) allocates one array of m×n bytes (a primitive data type) instead of an array of m objects, each of which is itself an array of n bytes.
- In addition to the 70 scoring matrices already included, which were taken from the NCBI site, JAligner works with user-defined scoring matrices.
- JAligner is easy to use through a friendly Graphical User Interface (GUI), a simple command line syntax or a reusable Application Programming Interface (API).
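The first two items above can be illustrated with a minimal sketch. It is not JAligner's actual source: the class name LinearSpaceSketch, the integer scores, and the simple match/mismatch scoring (standing in for a scoring matrix) are assumptions made for illustration. The sketch keeps one row of the similarity scores and one row of vertical-gap scores (the horizontal-gap score only needs a single variable per row), and stores the traceback directions in one flat byte array where cell (i, j) lives at index i × n + j. It only computes the best local score; walking the traceback correctly with affine gaps needs extra bookkeeping that is omitted here.

```java
import java.util.Arrays;

/** Sketch only: affine-gap local alignment score with O(n) working arrays. */
final class LinearSpaceSketch {
    static final byte STOP = 0, LEFT = 1, DIAGONAL = 2, UP = 3;
    private static final int NEG = Integer.MIN_VALUE / 2; // "minus infinity" that cannot overflow

    /** Best local score, the cell where it ends, and the flat traceback array. */
    static final class Result {
        final int score, endRow, endCol;
        final byte[] traceback;
        Result(int score, int endRow, int endCol, byte[] traceback) {
            this.score = score;
            this.endRow = endRow;
            this.endCol = endCol;
            this.traceback = traceback;
        }
    }

    static Result align(String vertical, String horizontal,
                        int match, int mismatch, int open, int extend) {
        int m = vertical.length() + 1;      // rows, including the empty-prefix row
        int n = horizontal.length() + 1;    // columns, including the empty-prefix column
        byte[] traceback = new byte[m * n]; // one allocation; zero-filled, i.e. all STOP
        int[] score = new int[n];           // a single row of similarity scores, updated in place
        int[] gapUp = new int[n];           // per column: best score ending in a vertical gap
        Arrays.fill(gapUp, NEG);
        int best = 0, bestRow = 0, bestCol = 0;

        for (int i = 1; i < m; i++) {
            int diagonal = score[0];        // score of cell (i-1, 0), always 0
            int gapLeft = NEG;              // best score ending in a horizontal gap in this row
            for (int j = 1; j < n; j++) {
                // score[j-1] already belongs to row i; score[j] still belongs to row i-1.
                gapLeft = Math.max(gapLeft - extend, score[j - 1] - open);
                gapUp[j] = Math.max(gapUp[j] - extend, score[j] - open);
                boolean same = vertical.charAt(i - 1) == horizontal.charAt(j - 1);
                int substitution = diagonal + (same ? match : mismatch);
                diagonal = score[j];        // remember cell (i-1, j) for the next column

                int cell = 0;
                byte direction = STOP;
                if (substitution > cell) { cell = substitution; direction = DIAGONAL; }
                if (gapLeft > cell) { cell = gapLeft; direction = LEFT; }
                if (gapUp[j] > cell) { cell = gapUp[j]; direction = UP; }

                score[j] = cell;
                traceback[i * n + j] = direction; // flat index replaces traceback[i][j]
                if (cell > best) { best = cell; bestRow = i; bestCol = j; }
            }
        }
        return new Result(best, bestRow, bestCol, traceback);
    }

    public static void main(String[] args) {
        Result r = align("ACGTACGT", "ACGTTACGT", 2, -1, 3, 1);
        System.out.println("score=" + r.score + " ends at (" + r.endRow + ", " + r.endCol + ")");
    }
}
```

Only the traceback array grows with m×n; the score and gap arrays stay at size n regardless of the length of the vertical sequence.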
Usage
There are several ways to align a pair of sequences using JAligner:
Command line
Where:
- s1: path to a file containing input sequence #1.
- s2: path to a file containing input sequence #2.
- matrix: name of a scoring matrix, or path to a file containing a user-defined scoring matrix.
- open: open gap penalty.
- extend: extend gap penalty.
Example:
To load a user-defined scoring matrix from the file system, the path to the matrix file has to include at least one file separator. The presence of a file separator tells JAligner to load the scoring matrix from the file system instead of looking it up in jaligner.jar.
Example:
The layout of a user-defined scoring matrix file is expected to be the same as the layout of the standard scoring matrices:
- optional comment lines (a comment line starts with a number sign "#"),
- header line with the letters in the alphabet of the two sequences, and
- a line for each letter in the alphabet where each line starts with that letter followed by the substitution scores for the corresponding letters in the header line.
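As a rough illustration of this layout, the following sketch parses a matrix given as text. It is not JAligner's own loader: the class name ScoringMatrixSketch, the nested-map result, and the assumption that columns are separated by whitespace are all hypothetical choices made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: reads the comment / header / row layout described above. */
final class ScoringMatrixSketch {

    /** Returns scores such that scores.get(rowLetter).get(columnLetter) is the substitution score. */
    static Map<Character, Map<Character, Float>> parse(String text) {
        Map<Character, Map<Character, Float>> scores = new HashMap<>();
        char[] header = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // optional comment lines (and blank lines) are skipped
            }
            String[] tokens = line.split("\\s+"); // assumed: whitespace-separated columns
            if (header == null) {
                // The first non-comment line lists the letters of the alphabet.
                header = new char[tokens.length];
                for (int i = 0; i < tokens.length; i++) {
                    header[i] = tokens[i].charAt(0);
                }
                continue;
            }
            if (tokens.length != header.length + 1) {
                throw new IllegalArgumentException("Expected a letter and "
                        + header.length + " scores: " + line);
            }
            // Each following line starts with a letter, then one score per header letter.
            Map<Character, Float> row = new HashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], Float.parseFloat(tokens[i + 1]));
            }
            scores.put(tokens[0].charAt(0), row);
        }
        return scores;
    }

    public static void main(String[] args) {
        String toy = "# toy matrix for illustration\n"
                   + "   A  C\n"
                   + "A  2 -1\n"
                   + "C -1  2\n";
        System.out.println(parse(toy).get('A').get('C'));
    }
}
```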
Java Network Launch Protocol (JNLP)
In general, JNLP-based applications require Java Web Start (JWS) to be installed on the client machine. Fortunately, JWS has been bundled with the Java 2 Platform, Standard Edition (J2SE) since J2SE 1.4.
Assuming JWS is already installed, JAligner can be launched by opening the XML deployment descriptor jaligner.jnlp (http://jaligner.sourceforge.net/jaligner.jnlp) in a web browser, or from the command line with the javaws executable, which is located in the javaws directory under the installation (root) directory of the Java Runtime Environment (JRE).
Example:
In jaligner.jnlp, full permission is requested because the application needs access to:
- the system clipboard for editing (cut and paste) the input sequences,
- the file system for loading and storing the input sequences and output alignments, and
- the JVM properties: user.home, file.separator and line.separator.
Because jaligner.jar is signed with a self-signed certificate, JWS displays a warning once the download of the JAR file is complete, stating that the application is requesting full permission and that the signing certificate could not be verified. To dismiss the warning and start the application, click the "Start" button in the warning window.
Desktop
The command line to start JAligner as a desktop GUI application is
In addition, there are downloadable installers (built using ej-technologies' install4j) for the following operating systems: Linux, UNIX, Mac OS X and Windows.
Application Programming Interface (API)
The class SmithWatermanGotoh has the public static method align, which can be called programmatically to align two sequences.
Notes
- By default, the JVM uses a memory allocation pool with an initial size of 2 MB and a maximum size of 64 MB. Large sequences raise an out-of-memory error when the memory requirement exceeds the available space. In such cases, the JVM has to be started with a suitable heap size using the -Xms (initial size) and -Xmx (maximum size) options.
  Example:
  java -Xms128m -Xmx512m -jar jaligner.jar
- Compiling the source code requires an implementation of the Java Network Launch Protocol (JNLP) specification on the compilation classpath. Including Java Web Start's javaws.jar provides the required implementation.
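To get a feel for when a larger heap is needed, the following sketch compares a rough memory estimate against the heap limit of the running JVM. The estimate is an assumption based on the Features section (one byte per traceback cell, plus three arrays of size n holding 4-byte values); the class and method names are hypothetical, and the real footprint of JAligner will be higher because of the sequences themselves, the scoring matrix and other objects.

```java
/** Sketch only: rough heap estimate for aligning sequences of sizes m and n. */
final class HeapEstimateSketch {

    /** Approximate bytes for the traceback array plus three rows of 4-byte scores. */
    static long estimateBytes(long m, long n) {
        long traceback = m * n;           // one byte per cell of the flattened m x n array
        long rows = 3L * n * Float.BYTES; // assumed: three single-dimensional arrays of size n
        return traceback + rows;
    }

    public static void main(String[] args) {
        long m = 20_000;
        long n = 20_000;
        long needed = estimateBytes(m, n);
        long available = Runtime.getRuntime().maxMemory(); // upper limit set by -Xmx
        System.out.println("Estimated need: " + needed / (1024 * 1024) + " MB");
        System.out.println("JVM maximum:    " + available / (1024 * 1024) + " MB");
        if (needed > available) {
            System.out.println("Consider restarting with a larger -Xmx value.");
        }
    }
}
```

Because the traceback term grows with the product m×n, doubling both sequence sizes roughly quadruples the estimate.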
Licenses
- The source code is licensed under The GNU General Public License (GPL).
- This document is licensed under The GNU Free Documentation License (GFDL).
If you are using JAligner in a published work or product, please cite:
Ahmed Moustafa, JAligner: Open source Java implementation of Smith-Waterman, (http://jaligner.sourceforge.net) (the date accessed).
References
- Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.
- Gotoh O. An improved algorithm for matching biological sequences. J Mol Biol. 1982 Dec 15;162(3):705-8.
Acknowledgments
I deeply appreciate everyone who has contributed questions, comments or suggestions regarding JAligner. Every piece of feedback has been helpful, and I have learned from it. I would like to express my special thanks to:
- Martin Aus: Translation of JAligner into Estonian 🇪🇪 (June 2019).
- Mary Davidson: Translation of JAligner into Polish 🇵🇱 (May 2019).
- Sandi Wolfe: Translation of JAligner into Ukrainian 🇺🇦 (March 2019).
- Pinar Cytheree: Translation of JAligner into French 🇫🇷 (June 2018).
- Artur Weber: Translation of JAligner into Portuguese 🇧🇷 (Feb 2018).
- ej-technologies: providing free license for install4j (May 2005).
- Bram Minnaert: detecting a bug in the initialization of the auxiliary matrices (October 2004), and fixing the traceback logic and providing modules for testing the produced alignments against the alignment scores (March 2005),
- Hector Gonzalez: detecting a bug in the initialization of the traceback matrix (March 2004),
- Andreas Doms: detecting a bug in the traceback stopping condition and suggesting a fix that improved the performance as well (February 2004),
- Ryan Golhar: recommending changing the traceback from recursion to iteration to avoid a stack overflow problem (August 2003), and
- Tim Carver: feedback on the GUI layout and alignment format (July 2003).