PIPS - Pathogenicity Islands Prediction Software
--------------------------------------------------------------

WHAT IS PIPS?
PIPS is software developed to identify putative pathogenicity islands (PAIs)
in pathogenic bacteria. Later versions of PIPS are planned that will make the
application faster.


INSTALLING PIPS

1) Requirements:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
	Software               Download link                                                 On Ubuntu
	tRNAscan-SE*           http://lowelab.ucsc.edu/tRNAscan-SE/
	Java Virtual Machine   http://www.sun.com/java/                                      sudo apt-get install sun-java6-jdk
	Perl                   http://www.perl.org/                                          sudo apt-get install perl
	COLOMBO/SIGI-HMM*      http://www.tcs.informatik.uni-goettingen.de/colombo-sigihmm
	BLAST*                 ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.9/   sudo apt-get install blast2
	HMMER3*                http://hmmer.janelia.org/
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
	Database               Download link
	mVirDB*                http://mvirdb.llnl.gov/
	transposaseDB*         On our website
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
	Optional software      Download link                                                 (for manual analysis only)
	Artemis                http://www.sanger.ac.uk/resources/software/artemis/
	ACT                    http://www.sanger.ac.uk/resources/software/act/
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
*These programs and databases are already included in the PIPS package, with the permission of their developers, and do not need to be downloaded. However, we strongly advise PIPS users to visit their websites to learn more about them.

2) To install PIPS:
	a) Copy the file pips.tar.gz to any location you like and uncompress it:
		tar -zxvf pips.tar.gz
	b) Enter the folder named other_softwares (inside the pips folder) and run install.sh:
		cd other_softwares
		sudo ./install.sh
	c) Add the folders ~/lib and ~/man to your PATH, for example:
		c.1) Open the file /etc/profile:
			sudo gedit /etc/profile
		c.2) Add this line to the end of the file:
			PATH=$PATH:~/lib:~/man:
		c.3) Save the file and either restart your computer or type:
			source /etc/profile
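After install.sh has finished and the profile has been reloaded, one quick way to confirm that the external tools are reachable is to look them up on PATH. The Python sketch below does this with shutil.which. The executable names in REQUIRED_TOOLS are assumptions (the usual command names for Perl, Java, tRNAscan-SE, legacy BLAST and HMMER3), not a list taken from install.sh, so adjust them to what your installation actually provides.

```python
import shutil

# Assumed executable names; install.sh may place the tools under other names.
REQUIRED_TOOLS = ("perl", "java", "tRNAscan-SE", "blastall", "hmmsearch")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

If a tool is reported missing right after installation, the PATH change from step c) has probably not been loaded in the current terminal yet.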


3) The directory structure of PIPS is:
/pips
	pips.sh
	/bin
		all2many.pl
		blast2many.pl	
		blast2table.pl
		codus2sysid.pl
		embl2faafna.pl
		embl2fasta.pl
		embl2product.pl
		errorhandling.pl
		gbk2embl.pl
		gccontent.pl
		mergefiles.pl
		paifinder2.pl
		paifinder.pl
		plasticity2.pl
		plasticity.pl
		splits.pl
		tab2xls.pl
		transpo2table.pl
		trna2embl.pl
	/other_softwares
		/Colombo_3.8
		/tRNAscan-SE-1.23
		/hmmer-3.0-linux-intel-x86_64
		/blast
		install.sh
	/result
	/blastdb
		transposase.hmm
		virulenceDB.protein.fasta
		virulenceDB.protein.fasta.phr
		virulenceDB.protein.fasta.pin
		virulenceDB.protein.fasta.psq
		

RUNNING PIPS

To run PIPS, open a terminal, change to the directory "pips" and type:
	./pips.sh [options] -p <pathogenic bacterium to be analysed> -n <non-pathogenic bacterium to compare against>
	Ex.: ./pips.sh -p corynebacterium_diphtheriae.embl -n corynebacterium_glutamicum.embl
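When many genome pairs have to be analysed, the command line can be assembled by a script. The Python sketch below builds the argument list for ./pips.sh from two input files plus optional overrides, using the defaults from the options table in this README. Guessing the -f/-d format from the file extension is an assumption of this sketch, not something PIPS does itself, and build_pips_command and input_format are hypothetical helper names.

```python
import os

# Default values copied from the options table in this README.
DEFAULTS = {"-a": "1e-5", "-c": "0.95", "-e": "1e-5",
            "-g": "1.5", "-m": "1e-5", "-t": "1e-5"}


def input_format(path):
    """Guess 'embl' or 'gbk' from the file extension (a heuristic, not a PIPS rule)."""
    extension = os.path.splitext(path)[1].lower()
    return "gbk" if extension in (".gbk", ".gb") else "embl"


def build_pips_command(pathogenic, non_pathogenic, **overrides):
    """Return the argument list for ./pips.sh.

    Keyword names are option letters with string values, e.g. g="2.0" for -g 2.0.
    Options equal to the documented default are left out.
    """
    args = ["./pips.sh"]
    # -f is the pathogenic input format, -d the non-pathogenic one (default embl).
    for flag, path in (("-f", pathogenic), ("-d", non_pathogenic)):
        fmt = input_format(path)
        if fmt != "embl":
            args += [flag, fmt]
    for letter, value in sorted(overrides.items()):
        flag = "-" + letter
        if flag not in DEFAULTS:
            raise ValueError(f"unknown option {flag}")
        if str(value) != DEFAULTS[flag]:
            args += [flag, str(value)]
    # Options come first, then -p and -n, as in the usage line above.
    args += ["-p", pathogenic, "-n", non_pathogenic]
    return args
```

For example, passing the result of build_pips_command("corynebacterium_diphtheriae.embl", "corynebacterium_glutamicum.embl") to subprocess.run with cwd set to the pips folder would reproduce the example command above. Values are compared as strings, so give them in the same notation as the table (e.g. "1e-5").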
The options to run PIPS are:

	OPTION	DEFAULT	FUNCTION
	-a	1e-5	: E-value cutoff used when creating the comparison file for ACT, i.e., tblastx against the non-pathogenic bacterium.
	-c	0.95	: sensitivity used when identifying regions with codon usage deviation.
	-d	embl	: format of the non-pathogenic bacterium input file (embl/gbk).
	-e	1e-5	: E-value cutoff used when identifying similar regions for the plasticity analysis, i.e., blastp against the non-pathogenic bacterium.
	-f	embl	: format of the pathogenic bacterium input file (embl/gbk).
	-g	1.5	: number of standard deviations used when identifying regions with G+C content deviation.
	-m	1e-5	: E-value cutoff used when identifying virulence factors, i.e., blastp against mVirDB.
	-t	1e-5	: E-value cutoff used when identifying transposases, i.e., hmmsearch against the Pfam database of transposases.
	-h	----	: print help.
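The -g option is a multiple of the standard deviation. To illustrate how such a threshold behaves, the Python sketch below splits a genome into fixed, non-overlapping windows, computes the G+C fraction of each window, and reports the windows whose G+C content differs from the mean of all windows by more than g standard deviations. The window size, the use of the population standard deviation and the dropping of the last partial window are assumptions made for illustration; gccontent.pl may compute this differently.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """G+C fraction of a sequence, ignoring characters other than A, C, G, T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_deviant_windows(sequence, window=5000, g=1.5):
    """Return (start, end, gc) for windows whose G+C deviates by more than g SD.

    Windows are non-overlapping; a trailing partial window is ignored.
    Coordinates are 1-based and inclusive, like EMBL feature locations.
    """
    windows = []
    for start in range(0, len(sequence) - window + 1, window):
        gc = gc_fraction(sequence[start:start + window])
        windows.append((start + 1, start + window, gc))
    if len(windows) < 2:
        return []
    values = [gc for _, _, gc in windows]
    mu, sd = mean(values), pstdev(values)
    return [w for w in windows if abs(w[2] - mu) > g * sd]
```

Raising g makes the test stricter (fewer, more extreme windows are reported); lowering it reports more candidate regions.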




FILES

1. Both files must be in EMBL or GenBank (GBK) format.
2.1 The EMBL file must look like this:
FH   Key             Location/Qualifiers
FH
FT   source          1..3282708
FT                   /organism="Corynebacterium glutamicum ATCC 13032"
FT                   /strain="DSM 20300 = ATCC 13032"
FT                   /mol_type="genomic DNA"
FT                   /note="IS fingerprint type: 4-5"
FT                   /db_xref="taxon:196627"
FT   CDS             1..1575
FT                   /transl_table=11
FT                   /gene="dnaA"
FT                   /locus_tag="cg0001"
FT                   /product="CHROMOSOMAL REPLICATION INITIATOR PROTEIN"
FT                   /db_xref="GOA:Q8NUD8"
FT                   /db_xref="HSSP:1J1V"
FT                   /db_xref="InterPro:IPR018312"
FT                   /db_xref="UniProtKB/Swiss-Prot:Q8NUD8"
FT                   /protein_id="CAF18566.1"
FT                   /translation="MSQNSSSLLETWRQVVADLTTLSQQADSGFDPLTPTQRAYLNLTKP
FT                   IAIVDGYAVLSTPNAMAKNVIENDLGDALTRVLSLRMGRSFSLAVSVEPEQEIPETPAQQ
FT                   EFKYQPDAPVISSNKAPKQYEVGGRGEASTSDGWERTHSAPAPEPHPAPIADPEPELATP
FT                   QRIPRETPAHNPNREVSLNPKYTFESFVIGPFNRFANAAAVAVAESPAKAFNPLFISGGS
FT                   GLGKTHLLHAVGNYAQELQPGLRIKYVSSEEFTNDYINSVRDDRQETFKRRYRNLDILMV
FT                   DDIQFLAGKEGTQEEFFHTFNALHQADKQIILSSDRPPKQLTTLEDRLRTRFEGGLITDI
FT                   QPPDLETRIAILMKKAQTDGTHVDREVLELIASRFESSIRELEGALIRVSAYSSLINQPI
FT                   DKEMAIVALRDILPEPEDMEITAPVIMEVTAEYFEISVDTLRGAGKTRAVAHARQLAMYL
FT                   CRELTDMSLPKIGDVFGGKDHTTVMYADRKIRQEMTEKRDTYDEIQQLTQLIKSRGRN"
.
.
.
XX
SQ   Sequence 3282708 BP; 757463 A; 886715 C; 880753 G; 757777 T; 0 other;
     gtgagccaga actcatcttc tttgctcgaa acctggcgcc aagttgttgc cgatctcaca        60
     actttgagcc agcaagcgga cagtggattc gacccattga cgccaactca acgtgcatat       120
     ttgaacctga cgaagccgat tgccatcgtc gatggctacg ccgtgctgtc cacacccaac       180
     gcgatggcaa aaaatgtcat tgaaaacgat ttgggcgatg ctttgacccg tgtgttgtcg       240
     ctgcgcatgg gccgatcatt cagcttggct gtcagtgtgg agcctgagca ggaaattcca       300
     gaaaccccag ctcagcagga gtttaaatat cagcctgacg cacctgtgat ctcttccaac       360
     aaggcgccaa agcagtatga agttggtggt cggggagagg cgtcgacaag cgacggctgg       420
.
.
.
//

2.2 The GenBank file must look like this:
FEATURES             Location/Qualifiers
     source          1..2337913
                     /organism="Corynebacterium pseudotuberculosis FRC41"
                     /mol_type="genomic DNA"
                     /strain="FRC41"
                     /db_xref="taxon:765874"
     gene            231..1811
                     /gene="dnaA"
                     /locus_tag="cpfrc_00001"
     CDS             231..1811
                     /gene="dnaA"
                     /locus_tag="cpfrc_00001"
                     /note="ATPase involved in DNA replication initiation"
                     /codon_start=1
                     /transl_table=11
                     /product="chromosomal replication initiator protein"
                     /protein_id="ADK27794.1"
                     /db_xref="GI:300684872"
                     /translation="MSRRLGRQYSLAVSVHAPEENPEVSSATPDAVSDYQEQSAVSGQ
                     YGATSANADFQNQQSTIYRKPQESQYPVTFGASSYGNEKYQENSQDQGISHHPYGFNE
                     AQRIASSASHAVPQSGSELLHDPVHTRRTDAALDQNYPGNTGGWRTEHIQEPMPTEQI
                     PSGTPRTREQPSFNPDRALALNPHYTFDSYVVSDSNKLPCSAAIAVAEKPARAYNPLF
                     IWGDSGLGKTHLMHAVGNYAQYLNPRLRIKYVSSEEFTNEYINSVRDDRQEAFKRKYR
                     ELDILMVDDIQFLQGKEGTQEEFFHTFNALYQANKQIVLSSDRPPKQLTTLEDRLRTR
                     FQAGLIADIYPPDLETRIAILMKKAASESIVADREAIELIASRFNTSIRELEGAFIRV
                     SAYASLMSPDKGKHRIDLRIAEKALEDMMPEQANEEITATTILAATAEYFEMDVNALK
                     GSGKTRAVAHARQLAMYLCRELTDLSLPKIGEQFGGKDHTTVMYADRKIRKEITEKKE
                     TYDEIQLLTQQIKSSSRG"
.
.
.
ORIGIN      
        1 gtgtcggagg ctccatcgac atggaacgag cggtggcaag aagttactaa tgagctgctg
       61 tcacagtctc aggacccgga aagtggtatt tccattacgc gacagcaaag cgcctacctg
      121 cgtctggtta aaccagttgc cagtgtagag ggtattgccg ttttaagcgt tcctcacgcc
      181 cgagcgaaaa cgagattgaa actacgctgg gacctgttat cacagaggta ttgtctcgta
      241 gactaggtcg acaatacagt cttgcagtga gcgttcatgc tccagaggaa aatccagaag
      301 tatcctcggc cactccagat gctgtgtctg attaccagga acaatctgca gtttctggac
.
.
.
//

3. The two organisms must be closely related. The organism given with the parameter "-p" is the pathogenic one to be analysed, and the one given with "-n" is a non-pathogenic organism of the same species or of a closely related species.
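Because -f and -d must match the real format of each file, it can help to check the files before running PIPS. Complete EMBL entries start with an ID line and GenBank entries with a LOCUS line (the excerpts above begin at the feature table, so those lines are not shown). The Python sketch below detects the format from the first non-blank line and, as a sanity check, computes the G+C fraction from the base counts of an EMBL SQ line. detect_format and gc_from_sq_line are hypothetical helper names, not part of PIPS.

```python
import re

SQ_LINE = re.compile(
    r"SQ\s+Sequence\s+(\d+) BP;\s+(\d+) A;\s+(\d+) C;\s+(\d+) G;\s+(\d+) T;\s+(\d+) other;"
)


def detect_format(lines):
    """Return 'embl' or 'gbk' from the first non-blank line of a flat file.

    `lines` can be any iterable of strings, such as an open file.
    """
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("ID "):
            return "embl"
        if line.startswith("LOCUS"):
            return "gbk"
        break
    raise ValueError("input does not start like an EMBL or GenBank entry")


def gc_from_sq_line(line):
    """Compute the G+C fraction from the base counts in an EMBL SQ line."""
    match = SQ_LINE.match(line)
    if match is None:
        raise ValueError("not an EMBL SQ line")
    total, a, c, g, t, other = map(int, match.groups())
    if a + c + g + t + other != total:
        raise ValueError("base counts do not add up to the sequence length")
    return (g + c) / total
```

Applied to the SQ line of the EMBL example above, the counts add up to 3282708 bp and the G+C fraction is (886715 + 880753) / 3282708, about 0.538.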

RESULTS

The results are stored in the result folder inside the pips folder. The files are:

<pathogenic_bacterium>.Putative_Islands	--> Putative islands identified in the analysed pathogenic organism, along with information about the PAI features found in each island.

<pathogenic_bacterium>.PAI.tab --> File with the positions of the pathogenicity islands, to be loaded onto the original .embl file using Artemis or ACT. This file allows the predicted PAIs to be visualized on the studied genome.

<pathogenic_bacterium>.faa.vs.virulenceDB.xls --> Hyperlinked .xls file to be opened in Microsoft Office. All the CDSs have links to their sequences and to their alignments with the virulence factors in mVirDB.

<pathogenic_bacterium>.faa-vs-virulenceDB --> Folder with the CDS sequences and their alignments to virulence factors. These files are needed to open this information directly by clicking the links in the .xls file, so this folder must be kept in the same folder as the .xls file.

<pathogenic_bacterium>.fasta--vs--<reference_non-pathogenic_bacterium>.fasta.ff.out.tab --> Comparison file for ACT (Artemis Comparison Tool), used during manual analysis to find regions that are deleted in the non-pathogenic species.
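After a run, a short script can check that all of the outputs listed above were produced. The Python sketch below builds the expected paths and reports the missing ones. It assumes that <pathogenic_bacterium> and <reference_non-pathogenic_bacterium> are the input file names without their extensions and that it is run from inside the pips folder; both are assumptions, so compare with the actual names in your result folder. expected_results and missing_results are hypothetical helper names.

```python
import os


def expected_results(pathogenic, non_pathogenic, result_dir="result"):
    """List the result paths described in this README for one PIPS run."""
    # Assumption: the organism name is the input file name without its extension.
    p = os.path.splitext(os.path.basename(pathogenic))[0]
    n = os.path.splitext(os.path.basename(non_pathogenic))[0]
    names = [
        f"{p}.Putative_Islands",
        f"{p}.PAI.tab",
        f"{p}.faa.vs.virulenceDB.xls",
        f"{p}.faa-vs-virulenceDB",
        f"{p}.fasta--vs--{n}.fasta.ff.out.tab",
    ]
    return [os.path.join(result_dir, name) for name in names]


def missing_results(pathogenic, non_pathogenic, result_dir="result"):
    """Return the expected result paths that do not exist on disk."""
    return [path for path in expected_results(pathogenic, non_pathogenic, result_dir)
            if not os.path.exists(path)]
```

Remember that the .xls links only work if the .faa-vs-virulenceDB folder stays next to the .xls file, so copy both together when moving results elsewhere.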

If you have any problems with the standalone version, please contact:
siomars@gmail.com
If you have any problems with the web version, please contact:
vini.abreu@gmail.com
