SSS - Sequence Similarity Search service

HGC Sequence Similarity Search service manual

Execute several sequence similarity search programs against the various biological sequence databases supported at the Human Genome Center.

Manual pages

NAME

	transw - compare two DNA sequences by translating them into protein
	         sequences
	transq - compare DNA sequences to a protein sequence directly
	transt - compare a protein sequence to a DNA sequence directly
	         (inverse of transq)

SYNOPSIS

	transw [-m SMATRIX -b # -d # -f -I] database1 [database2] ... 
		[database5] query-file
	transq [-m SMATRIX -b # -d # -f -I] database1 [database2] ... 
		[database5] query-file
	transt [-m SMATRIX -b # -d # -f -I] database1 [database2] ...
		[database5] query-file

DESCRIPTION

	TRANS is the sequence comparison method employed by the programs
	transw, transq and transt.  Because these programs are based on the
	Smith-Waterman algorithm, they allow gaps (deletions and
	insertions) in alignments.  With these programs, the overall
	alignment between two sequences does not have to be split into
	several segments even if frameshift errors occur in either
	sequence.
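
	As a rough illustration of the Smith-Waterman idea mentioned above,
	the following Python sketch computes a local alignment score with
	gaps.  The function name and the scoring values (match, mismatch,
	gap) are assumptions chosen for illustration only; the sketch does
	not use BLOSUM62, does not handle frameshifts, and is not the TRANS
	implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only (assumed values, linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # character of a aligned to a gap
            left = curr[j - 1] + gap  # character of b aligned to a gap
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("MKFLILLF", "MKFLLLF"))
```

	Because a cell is never allowed to drop below zero, the best score
	can come from any sub-region of the two sequences, and a gap only
	lowers the score instead of ending the alignment.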
	
	The TRANS family has some distinctive features.  In the translation
	step, this method does not translate a nucleotide sequence into six
	reading-frame sequences (both strands).  Instead, it produces only
	two translated amino acid sequences, one for the + strand and one
	for the - strand.  These two translated amino acid sequences are
	used to calculate the homology score with a Smith-Waterman-like
	algorithm.  Because these programs were written for a massively
	parallel computer (HITACHI SR2201), they should be run using the
	commands for the parallel computer.

	For example:

		%prun -p partition -n process-number trans[w,q,t] [options]...
  
	The programs of the TRANS family are described as follows:
	
		transw  compares a nucleotide query sequence against a
			nucleotide sequence database by translating each
			sequence into amino acids.  Both strands of each
			sequence are considered, so four combinations of
			strands are calculated.
			
		transq  compares a nucleotide query sequence directly against
			a protein sequence database; the query sequence is
			translated into amino acid sequences on both strands.
		
		transt  compares an amino acid query sequence against a
			nucleotide sequence database by translating each
			nucleotide sequence into amino acids on both strands.
			 
	For the query sequence and the databases, all programs require
	FASTA format files, in which each entry starts with a comment line
	beginning with '>', e.g.
	 
	 	>sp:104K_THEPA 104 KD MICRONEME-RHOPTRY ANTIGEN.
		MKFLILLF...
		>sp:10KD_VIGUN 10 ...
		...
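
	To check that a query or database file follows this layout before
	submitting it, a small reader such as the following Python sketch
	may help.  The function name parse_fasta is hypothetical and is not
	part of the TRANS programs; the sketch only assumes the format
	described above (a '>' comment line followed by sequence lines).

```python
def parse_fasta(text):
    """Yield (comment, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous entry
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("FASTA input must start with a '>' comment line")
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    sample = ">seq1 first example\nMKFL\nILLF\n>seq2 second example\nACGT\n"
    for comment, sequence in parse_fasta(sample):
        print(comment, len(sequence))
```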

	No other files are needed to use these methods.
	
	When using the TRANS programs, one can specify up to 5 databases.
	For example, to search two databases, type the following:

		trans[w,q,t] [options] database1 database2 query-file
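
	When the command line is assembled by a script, the argument order
	from the SYNOPSIS can be enforced with a helper like the Python
	sketch below.  The helper name build_trans_command and the file
	names in the usage example are hypothetical; the limit of five
	databases and the prun prefix follow the SYNOPSIS and the prun
	example in this manual.

```python
def build_trans_command(program, databases, query_file, options=(),
                        partition=None, processes=None):
    """Return an argument list: [prun ...] program [options] dbs... query."""
    if program not in ("transw", "transq", "transt"):
        raise ValueError("program must be transw, transq or transt")
    if not 1 <= len(databases) <= 5:
        raise ValueError("between 1 and 5 databases must be given")
    argv = []
    # Optional parallel launcher prefix, as in the prun example.
    if partition is not None and processes is not None:
        argv += ["prun", "-p", str(partition), "-n", str(processes)]
    # Options come first, then the databases, and the query file last.
    argv += [program, *options, *databases, query_file]
    return argv


if __name__ == "__main__":
    # "db1", "db2" and "query.fasta" are placeholder names.
    print(" ".join(build_trans_command(
        "transq", ["db1", "db2"], "query.fasta", options=["-b", "50"])))
```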
	
	The TRANS family includes statistical calculations.  After
	calculating the homology scores, these programs calculate the P
	value, Z value and E value.  The output first shows a histogram of
	the distribution of the homology scores.  After the histogram,
	TRANS lists the top-scoring database sequences and then shows the
	alignments of the top-scoring sequences.
	 	

	The following options are common to the TRANS family.
	
	-m str

		(SMATRIX) The filename of an alternative scoring matrix
		file.  The default scoring matrix is BLOSUM62.

	-b #

		The number of top-scoring sequences to be shown.  By default,
		the TRANS family shows the 30 top-scoring sequences.  By
		specifying this option, e.g.,

			-b 50

		one would see the 50 top-scoring sequences.

	-d #

		The number of alignments to be shown.  By default, the TRANS
		family shows alignments for the 5 best scores.  By
		specifying this option, e.g.,

			-d 50

		one would see alignments for the 50 best scores.

	-f

		With this option, transq and transw calculate only the
		forward strand of the query sequence, and transt calculates
		only the forward strand of the database sequences.  By
		default, transq and transw calculate both strands of the
		query sequence, and transt calculates both strands of the
		database sequences.
		
	-I

		With this option, transq and transw calculate only the
		backward strand of the query sequence, and transt calculates
		only the backward strand of the database sequences.  By
		default, transq and transw calculate both strands of the
		query sequence, and transt calculates both strands of the
		database sequences.
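
	The effect of -f and -I on which strand pairings are compared can
	be summarized with the Python sketch below.  It only restates the
	descriptions in this manual; the function name strand_pairs and
	the "protein" label for the untranslated side are hypothetical,
	and the sketch assumes that at most one of -f and -I is given.

```python
from itertools import product

BOTH = ("+", "-")  # forward and backward strands


def strand_pairs(program, restrict=None):
    """Return (query_strand, database_strand) pairs that are compared.

    restrict is None (default, both strands), "-f" or "-I".
    """
    chosen = {None: BOTH, "-f": ("+",), "-I": ("-",)}[restrict]
    if program == "transw":
        # -f/-I limit the query strand; the database keeps both strands.
        query, database = chosen, BOTH
    elif program == "transq":
        # The protein database is not translated.
        query, database = chosen, ("protein",)
    elif program == "transt":
        # The protein query is not translated; -f/-I limit the database.
        query, database = ("protein",), chosen
    else:
        raise ValueError("program must be transw, transq or transt")
    return list(product(query, database))


if __name__ == "__main__":
    for name in ("transw", "transq", "transt"):
        print(name, strand_pairs(name), strand_pairs(name, "-f"))
```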

AUTHOR

	  Tetsuo Nisikawa
	  nisikawa@crl.hitachi.co.jp 


Allowed combinations

Some programs restrict the allowed combinations of query and target sequence types (aa: amino acid sequence, nt: nucleic acid sequence). Each combination below is written as query-target.

  • BLAST: aa-aa, aa-nt, nt-aa, nt-nt
  • FASTA: aa-aa, aa-nt, nt-aa, nt-nt
  • SSEARCH: aa-aa, nt-nt
  • EXONERATE: aa-aa, aa-nt, nt-aa, nt-nt
  • TRANS: aa-nt, nt-aa, nt-nt
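
The list above can be encoded as a lookup table, for example to validate a request before submitting a search. This is a minimal Python sketch; the names ALLOWED and is_allowed are hypothetical and are not part of the service.

```python
ALL_FOUR = {("aa", "aa"), ("aa", "nt"), ("nt", "aa"), ("nt", "nt")}

# (query type, target type) pairs accepted by each program, as listed above.
ALLOWED = {
    "BLAST": ALL_FOUR,
    "FASTA": ALL_FOUR,
    "SSEARCH": {("aa", "aa"), ("nt", "nt")},
    "EXONERATE": ALL_FOUR,
    "TRANS": {("aa", "nt"), ("nt", "aa"), ("nt", "nt")},
}


def is_allowed(program, query_type, target_type):
    """Return True if the program accepts this query-target combination."""
    return (query_type, target_type) in ALLOWED[program.upper()]


if __name__ == "__main__":
    print(is_allowed("TRANS", "aa", "aa"))
    print(is_allowed("SSEARCH", "nt", "nt"))
```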

The Institute of Medical Science, The University of Tokyo

Copyright©2005-2012 Human Genome Center