From ykalidas at gmail.com Fri Sep 1 12:39:58 2006
From: ykalidas at gmail.com (Kalidas Yeturu)
Date: Fri, 1 Sep 2006 22:09:58 +0530
Subject: [Bio BB] Rasmol not working in Power PC architecture
Message-ID: <5632703b0609010939w55296e8cpa4ba7505102797d4@mail.gmail.com>

Hi,

I have been using Rasmol scripts in my project. Until now I have had no
problem running Rasmol on i686 Linux machines. uname -a gives:

"Linux threonine 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux"

Now I have to run my programs in an MPI environment on the PowerPC
architecture. uname on the MPI cluster gives:

"Linux cnode39 2.6.5-7.139-pseries64 #1 SMP Fri Jan 14 15:41:33 UTC 2005 ppc64 ppc64 ppc64 GNU/Linux"

When I type ./rasmol, the error is:

"bash: ./rasmol: cannot execute binary file"

I searched the internet for Rasmol FAQs and tried out various
installations of Rasmol for the PowerPC architecture, but could not
solve the problem.

I hope someone can solve this problem.

Thanking you,
Regards
--
Kalidas Y
http://ssl.serc.iisc.ernet.in/~kalidas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From janderson_net at yahoo.com Fri Sep 1 15:03:20 2006
From: janderson_net at yahoo.com (James Anderson)
Date: Fri, 1 Sep 2006 12:03:20 -0700 (PDT)
Subject: [BiO BB] question about Agilent microarray data format
Message-ID: <20060901190320.74897.qmail@web31210.mail.mud.yahoo.com>

Hi,

I have some Agilent microarray data. I am not familiar with the format
(I am more familiar with Affy data). There are some columns named
"gProcessedSignal", "rProcessedSignal", "LogRatio", etc. I guess it's
more like cDNA data with two channels. So should I use the LogRatio
value for the next steps of the analysis (gene selection, PCA,
clustering, etc.)?

Thanks,

James

From christoph.gille at charite.de Fri Sep 1 16:50:50 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri, 1 Sep 2006 22:50:50 +0200 (CEST)
Subject: [Bio BB] Rasmol not working in Power PC architecture
In-Reply-To: <5632703b0609010939w55296e8cpa4ba7505102797d4@mail.gmail.com>
References: <5632703b0609010939w55296e8cpa4ba7505102797d4@mail.gmail.com>
Message-ID: <61253.84.190.71.56.1157143850.squirrel@webmail.charite.de>

I am not really familiar with ppc
but I know that there is a package "Fink" http://fink.sourceforge.net/
which turns a ppc into a UNIX workstation with compilers, X-Windows etc.

To compile Rasmol you will need C, X-Windows and Tcl/Tk.

Another idea: Jmol is an excellent Rasmol-like program in Java which
accepts nearly all Rasmol commands. Perhaps you could use Jmol instead.

From ykalidas at gmail.com Sat Sep 2 04:47:39 2006
From: ykalidas at gmail.com (Kalidas Yeturu)
Date: Sat, 2 Sep 2006 14:17:39 +0530
Subject: [Bio BB] Rasmol not working in Power PC architecture
In-Reply-To: <61253.84.190.71.56.1157143850.squirrel@webmail.charite.de>
References: <5632703b0609010939w55296e8cpa4ba7505102797d4@mail.gmail.com> <61253.84.190.71.56.1157143850.squirrel@webmail.charite.de>
Message-ID: <5632703b0609020147he1821bagbc77726d67d14f73@mail.gmail.com>

Thank you. I will try it out.

On 9/2/06, Dr. Christoph Gille wrote:
>
> I am not really familiar with ppc
> but I know that there is a package "Fink" http://fink.sourceforge.net/
> which turns a ppc into a UNIX workstation with compilers, X-Windows etc.
>
> To compile Rasmol you will need C, X-Windows and Tcl/Tk.
>
> Another idea: Jmol is an excellent Rasmol-like program in Java which
> accepts nearly all Rasmol commands. Perhaps you could use Jmol instead.
>
> _______________________________________________
> General Forum at Bioinformatics.Org -
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>

--
Kalidas Y
http://ssl.serc.iisc.ernet.in/~kalidas

From christoph.gille at charite.de Tue Sep 5 09:40:39 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 5 Sep 2006 15:40:39 +0200 (CEST)
Subject: [BiO BB] Yet another Web 3D alignment viewer
Message-ID: <50647.141.20.6.60.1157463639.squirrel@webmail.charite.de>

Hi,

I have made a free Java Webstart 3D alignment viewer and would like to
ask you for your opinion and suggestions.

It has already been tested on Linux, Windows XP (German), Windows 2000
(German), Windows 98 (German) and Macintosh OS-X 10.4.7. It still needs
to be tested on English MS-Windows, Intel Macintosh, Solaris and Irix.

When the application is started by clicking the jnlp file, the specified
protein files are loaded and stored on the hard disk. The browser should
invoke .../bin/javaws on the jnlp file. Computation is performed locally
on the user's computer.

Here are a few examples:

http://3d-alignment.eu/pdb/a1.jnlp
    This is a pure sequence alignment.
http://3d-alignment.eu/pdb/a2.jnlp
    This is a multiple 3D alignment.
http://3d-alignment.eu/pdb/a3.jnlp
    This is a mixed 3D and sequence alignment.
http://3d-alignment.eu/pdb/a4.jnlp
    This example demonstrates the alternative syntax for PDB chain
    identifiers.

The following links load a PDB file with a given PDB ID and search for
structurally similar proteins:

http://3d-alignment.eu/pdb/1prn.jnlp
    This is a simple case of an X-ray structure with only one chain.
http://3d-alignment.eu/pdb/1aab.jnlp
    This is an NMR structure. Only model 1 is loaded to save time.
http://3d-alignment.eu/pdb/1ryp.jnlp
    This structure has 28 PDB chains. There are 14 different sequences.
    Results are shown in a tabbed pane with 14 tabs.
There is also a README that explains the syntax used to form the Web
link. It is quite simple.

Is it working smoothly on your computers?

From pascual at cnb.uam.es Tue Sep 5 04:38:45 2006
From: pascual at cnb.uam.es (Alberto Pascual Montano)
Date: Tue, 5 Sep 2006 10:38:45 +0200
Subject: [BiO BB] question about Agilent microarray data format
References: <20060901190320.74897.qmail@web31210.mail.mud.yahoo.com>
Message-ID: <009001c6d0c6$b336e8a0$7257f496@ANDREA>

Hi James,

You can download the manual for the Agilent Feature Extraction software
at:
http://microarray.onc.jhmi.edu/forms/ImageAnalysisManual.pdf#search=%22Agilent%20Feature%20Extraction%20software%20%22

There you will find details of the data format.

In summary, "gProcessedSignal" and "rProcessedSignal" are the Cy3 and
Cy5 processed signals (the normalization algorithms used are described
in the data file), "LogRatio" is the base-10 log ratio
(rProcessedSignal/gProcessedSignal) and "PValueLogRatio" is the
significance level of the calculated log ratio.

Regards,

Alberto

----- Original Message -----
From: James Anderson
To: bio board
Sent: Friday, September 01, 2006 9:03 PM
Subject: [BiO BB] question about Agilent microarray data format

Hi,

I have some Agilent microarray data. I am not familiar with the format
(I am more familiar with Affy data). There are some columns named
"gProcessedSignal", "rProcessedSignal", "LogRatio", etc. I guess it's
more like cDNA data with two channels. So should I use the LogRatio
value for the next steps of the analysis (gene selection, PCA,
clustering, etc.)?

Thanks,

James

From xiaowei.jiang at msn.com Tue Sep 5 13:19:04 2006
From: xiaowei.jiang at msn.com (Xiaowei JIANG)
Date: Tue, 5 Sep 2006 19:19:04 +0200
Subject: [BiO BB] Re: question about Agilent microarray data format
In-Reply-To: <20060902160106.DF6533686D3@primary.bioinformatics.org>
Message-ID:

Dear James,

I asked the group the same question some weeks ago. I used the raw
Agilent microarray data in the JMP Microarray analysis platform. I did
the data preprocessing, normalization, gene selection, multidimensional
scaling, PCA, clustering analysis and annotation analysis on the same
platform.

The data input engine of JMP Microarray used the columns
"gProcessedSignal" and "rProcessedSignal" to produce the SAS data set.
I then applied a log2 transformation to make the data more nearly
normally distributed.

Depending on your experiment and design, you choose a specific
normalization method; there are some books that discuss this. So you
need specific normalization software and other analysis platforms to
perform the data preprocessing and data analysis.

So for the Agilent data you don't actually need to worry too much about
the data format itself. Instead, you should think more about the
preprocessing and data analysis methods you are going to use, and
consider the whole analysis procedure, including the software platforms
it will run on, in advance.
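For readers following the Agilent thread, the relationship between the columns discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `log_ratios` and the two example rows are made up for this sketch, and the base-10 definition is taken from Alberto's description of "LogRatio" rather than from the Agilent manual itself.

```python
import math

def log_ratios(g_processed, r_processed):
    """Return (log10, log2) of rProcessedSignal / gProcessedSignal.

    Returns None when either signal is not positive, because the
    logarithm of the ratio is undefined in that case.
    """
    if g_processed <= 0 or r_processed <= 0:
        return None
    ratio = r_processed / g_processed
    return math.log10(ratio), math.log2(ratio)

# Hypothetical example rows (not real Agilent data); only the two
# signal column names come from the thread.
rows = [
    {"feature": 1, "gProcessedSignal": 100.0, "rProcessedSignal": 1000.0},
    {"feature": 2, "gProcessedSignal": 400.0, "rProcessedSignal": 100.0},
]

for row in rows:
    result = log_ratios(row["gProcessedSignal"], row["rProcessedSignal"])
    print(row["feature"], result)
```

Since log2(x) = log10(x) / log10(2), the base-10 value in the file and a log2 transformation of the same ratio differ only by a constant factor, so the choice mainly affects how the numbers read (a log2 value of 1 is a two-fold change).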
Kind regards,

Xiaowei JIANG

From J.Hane at murdoch.edu.au Thu Sep 7 02:32:52 2006
From: J.Hane at murdoch.edu.au (James Hane)
Date: Thu, 7 Sep 2006 14:32:52 +0800
Subject: [BiO BB] Metabolomics software
Message-ID:

Hi,

I was wondering what metabolomics software you use and/or recommend?

Many thanks,
James Hane

From varvenne at genoway.com Thu Sep 7 03:39:42 2006
From: varvenne at genoway.com (Benoit VARVENNE)
Date: Thu, 07 Sep 2006 09:39:42 +0200
Subject: [BiO BB] Restriction sites frequencies in mouse genome
In-Reply-To: <200609061243.09299.harry.mangalam@uci.edu>
Message-ID:

Hello,

Harry,
Thanks for your answer. I'd be very interested in having this code.

At first I only had to calculate frequencies in the mouse genome, but
now things have changed... I'm interested in having the positions of
hits and in calculating distribution, fragment length ...

The next step will be to link the hits found to the corresponding
features available in the Ensembl databases (site in an existing gene,
centromere, repeat regions, ...).
I think I'm going to use the Ensembl Perl API to do so.

If anyone has other ideas, I'd be very interested in them.

If anyone's interested, I've got a general Perl script, optimized for
memory and performance, for finding the number of hits of a sequence
(or a pattern) in very big sequences (like chromosomes or a genome).
Let me know if you want it.
For the moment it does not handle a list of inputs and does not store
positions, ....

Regards,

Benoit Varvenne,
Bioinformatics person in charge,
Genoway Lyon - France.

From harry.mangalam at uci.edu Wed Sep 6 15:43:08 2006
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Wed, 6 Sep 2006 12:43:08 -0700
Subject: [BiO BB] Restriction sites frequencies in mouse genome
In-Reply-To:
References:
Message-ID: <200609061243.09299.harry.mangalam@uci.edu>

If by calculating frequencies, you want to find all the sites in a
genome, tacg will do this. It will find all the sites you give it
(I've tested it on all human chromosome assemblies) as well as the
predicted frequency based on the base pair distribution.

It can theoretically do the entire genome in one shot if you have
enough RAM, but I've never tried it and the output would be pretty
ferocious.
For example, for chromosome 21 (a paltry 33.6MB), the summary output
is:

## Sequence: #1; from file: UNAVAILABLE
   Format: FASTA; ID: gi:89161201; Description: Homo sapiens
   chromosome 21, alternate assembly (based on Celera assembly), whole
   genome shotgun sequence.

== Sequence info:

   NB: sequence length > A+C+G+T due to -> 224404 <- IUPAC
   degeneracies.
   # of: N:224404 Y:0 R:0 W:0 S:0 K:0 M:0 B:0 D:0 H:0 V:0

   #s below are for top strand; 'sites exp' values calculated on the
   basis of both strands.
   33216610 bases; 9772353 A(29.42 %) 6752472 C(20.33 %) 6753971
   G(20.33 %) 9713410 T(29.24 %)

== Enzymes that DO NOT MAP to this sequence:

   There were NO NON-matches - ALL patterns matched at least ONCE.

== Total Number of Hits per Enzyme:
AatII       1068  BsiEI       1803  EcoRV       4841  PsiI       20384
AccI       12230  BsiHKAI    23981  FauI       18509  PspGI     112279
AccII       9733  BsiWI        174  Fnu4HI     74994  PspOMI      6067
Acc65I      3021  BslI       91011  FokI       59656  PstI       15561
AciI       52859  BsmI       13955  FseI         235  PvuI         181
AclI        2047  BsmAI      73662  FspI        1211  PvuII      12841
AfeI        1406  BsmBI       7619  HaeII       7030  RsaI       56361
AflII       7226  BsmFI      45828  HaeIII     99508  RsrII        126
AflIII     18426  Bsp1286I   57995  HgaI        8115  SacI        6829
AgeI         676  BspEI       1246  HhaI       21013  SacII        893
AhdI        3149  BspHI      11844  HinP1I     21013  SalI         392
AluI      143869  BspMI      16591  HincII     13046  SanDI       3409
AlwI       37296  BsrI       63802  HindIII     9457  SapI        4316
AlwNI      16140  BsrBI       2994  HinfI      96900  Sau96I     77627
ApaI        6067  BsrDI      16179  HpaI        4478  Sau3AI     79640
ApaLI       6042  BsrFI       4609  HpaII      29934  SbfI        1068
ApoI       74171  BsrGI       9408  HphI       67904  ScaI        5880
AscI          47  BssHII       890  KasI        2793  ScrFI     137189
AseI       17631  BssKI     137189  KpnI        3021  SexAI       3472
AvaI       12916  BssSI       5101  MaeII      28783  SfaNI      42093
AvaII      31938  BstAPI      9253  MaeIII     83257  SfcI       39408
AvrII       6112  BstBI       1256  MboII     100007  SfiI         599
BaeI        2868  Bst4CI     87767  MfeI        6359  SfoI        2793
BaeI        2868  BstDSI     14918  MluI         334  SgfI          13
BamHI       4165  BstEII      4065  MlyI       44962  SgrAI        214
BanI       18704  BstF5I     59661  MnlI      308118  SmaI        4948
BanII      27893  BstNI     112279  MscI       14579  SmlI       29332
BbeI        2793  BstUI       9733  MseI      226716  SnaBI       1598
BbsI       16623  BstXI      19685  MslI       38862  SpeI        4362
BbvI       63057  BstYI      24349  MspA1I     17762  SphI        6477
BbvCI      14806  BstZ17I     4605  MwoI       73785  SrfI         302
BcgI        3733  Bsu36I     10646  NaeI        1898  SspI       28450
BcgI        3733  BtgI       14918  NarI        2793  StuI        8988
BciVI       7495  BtrI        3836  NciI       24927  StyI       34781
BclI        8350  Cac8I      66066  NcoI        8941  SwaI        2801
BfaI       83296  ClaI        1121  NdeI       10096  TaiI       28783
BglI        6550  Csp6I      56361  NgoMIV      1898  TaqI       17908
BglII       8895  CviJI     507227  NheI        2770  TatI       30303
BlpI        6131  CviRI     168208  NlaIII    161486  TfiI       51945
BmrI       19063  DdeI      155096  NlaIV      87348  TliI        1496
BplI       11478  DpnI       79640  NotI         127  TseI       63101
BpmI       32957  DraI       41466  NruI         209  Tsp45I     47283
Bpu10I     25858  DraIII      6989  NsiI       11383  Tsp509I   254887
BsaI       18254  DrdI        3165  NspI       36783  TspRI      98632
BsaAI       9382  EaeI       20232  PacI        1946  Tth111I     7783
BsaBI       4988  EagI        1139  PciI       12666  XbaI        9158
BsaHI       6162  EarI       25525  PflMI      11275  XcmI        9507
BsaJI     121468  EciI        6774  PleI       44962  XhoI        1496
BsaWI       3529  Ecl136II    6829  PmeI         539  XmaI        4948
BseMII    104754  Eco57I     24123  PmlI        4081  XmnI       11146
BseRI      23673  EcoNI       8774  Ppu10I     11383
BseSI      25059  EcoO109I   28937  PpuMI      12989
BsgI       24191  EcoRI       8938  PshAI       3251

To get the actual predicted number of sites, you have to generate the
Sites info, which would be enormous but easily sed-able to extract
what you needed.

This took 9.5s on a 2GHz Opteron running 64bit Linux.

If you want, I'll send you the source tarball in a separate email.

hjm

On Tuesday 29 August 2006 05:35, Benoit VARVENNE wrote:
> Hello everybody,
>
> Thanks to all for your ideas and suggestions. I think I'm going to
> consider Perl programming to calculate restriction site frequencies,
> as the software mentioned in your mails (plus software I found)
> doesn't seem to be usable at whole-genome scale. Programming was to
> be avoided for this study but it seems to be the only solution. I'm
> really surprised not to be able to find an existing study of this
> kind.
>
> Thanks again,
> Regards,
>
> Benoît Varvenne,
> Bioinformatics person in charge,
> Genoway Lyon - France.
>
> On 28/08/06 11:34, "Benoit VARVENNE" wrote:
> > Dear Members,
> >
> > I am a new member of this mailing list and I don't know if such a
> > post will draw the attention of anyone here. So excuse me in
> > advance if my subject is not appropriate.
> > I am searching for a way to calculate restriction site frequencies
> > in the mouse genome (so sequences from 6 to 13bp). I have already
> > tried to do so using blast (or blast-like) tools and configuring
> > them as needed, but it gave no results, because of too many hits
> > I think.
> >
> > I would be very grateful if someone could help me on this topic.
> >
> > Thanks a lot for your help,
> > Best regards,
> >
> > Benoît Varvenne,
> > Bioinformatics person in charge,
> > Genoway Lyon - France

--
Harry Mangalam - Research Computing at NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824 0084(o), 949 285 4487(c)
harry.mangalam at uci.edu

From harry.mangalam at uci.edu Thu Sep 7 12:23:39 2006
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Thu, 7 Sep 2006 09:23:39 -0700
Subject: [BiO BB] Restriction sites frequencies in mouse genome
In-Reply-To:
References:
Message-ID: <200609070923.39999.harry.mangalam@uci.edu>

On Thursday 07 September 2006 00:39, Benoit VARVENNE wrote:
> Hello,
>
> Harry,
> Thanks for your answer. I'd be very interested in having this code.
>
> At first I only had to calculate frequencies in the mouse genome, but
> now things have changed... I'm interested in having the positions of
> hits and in calculating distribution, fragment length ...

It can do the above things with no problems besides the size of the
output (if you ask for all the hits for a 4-cutter in 200MB, you'll get
lots of output). tacg can generate output for gnuplotting directly for
these kinds of distribution plots, or in a few different table formats
(see the -G option).

> The next step will be to link the hits found to the corresponding
> features available in the Ensembl databases (site in an existing
> gene, centromere, repeat regions, ...).
> I think I'm going to use the Ensembl Perl API to do so.

Unfortunately, it will not do this directly now. Your stated approach
is probably best.

The src is on its way.
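As an aside on the hit positions and fragment lengths discussed in this thread, the idea can be sketched in plain Python. This is a minimal sketch, not how tacg or Benoit's Perl script works: the function names are made up here, only exact (non-degenerate) recognition sequences on the given strand are handled, fragments are cut at the start of each site rather than at the enzyme's true cut offset, and the expected count assumes independent bases, which may differ from the formula tacg uses.

```python
from collections import Counter

def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of site in seq.

    Overlapping occurrences are reported, because the search resumes
    one base after each hit.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq_len, positions):
    """Lengths of the pieces of a linear sequence cut at each position."""
    cuts = [0] + sorted(positions) + [seq_len]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def expected_hits(seq, site):
    """Expected hits on one strand if bases were drawn independently
    with the base frequencies observed in seq."""
    seq, site = seq.upper(), site.upper()
    counts = Counter(seq)
    prob = 1.0
    for base in site:
        prob *= counts[base] / len(seq)
    return prob * (len(seq) - len(site) + 1)

# Toy sequence containing two EcoRI sites (GAATTC).
toy = "AAGAATTCGGGAATTCTT"
hits = find_sites(toy, "GAATTC")
print(hits, fragment_lengths(len(toy), hits), expected_hits(toy, "GAATTC"))
```

For a whole chromosome one would read the sequence in chunks (keeping an overlap of one site length minus one base between chunks) instead of holding it in a single string, and store only the positions.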
hjm

--
Harry Mangalam - Research Computing at NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824 0084(o), 949 285 4487(c)
harry.mangalam at uci.edu

From keshet1 at umbc.edu Thu Sep 7 14:37:25 2006
From: keshet1 at umbc.edu (Ben Keshet)
Date: Thu, 7 Sep 2006 14:37:25 -0400
Subject: [BiO BB] How to read Naccess .asa .rsa files?
Message-ID: <001e01c6d2ac$a9ce3830$29ad5582@umbc80a173302c>

Hello,

I installed Naccess (accessibility calculations, Simon Hubbard,
University College London) and am trying to use it. Does anyone know
how to read the .asa and .rsa files that are produced after running the
program on a .pdb file? I read the README file of the program, but
could not understand what the different columns represent. I suspect
that the key to understanding them is knowing how to interpret a PDB
file, so if someone knows that, please share it with me.

Thanks a lot.

Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From clement at cs.byu.edu Tue Sep 12 16:04:49 2006 From: clement at cs.byu.edu (Mark Clement) Date: Tue, 12 Sep 2006 14:04:49 -0600 Subject: [BiO BB] Biotechnology and Bioinformatics Symposium (BIOT-2006) Message-ID: <2EFA73C2-7C7F-4F58-A737-936EE7827611@cs.byu.edu> We invite you to participate in the Biotechnology and Bioinformatics Symposium (BIOT-2006) on October 20-21 in Provo Utah. The symposium will include a keynote discussion on the use of knockout mice in drug development as well as discussions of the the Cancer Biomedical Informatics Grid (caBIG). Accepted papers will be presented describing research into Pharmagenomics, cost effective genotyping, human genomic sequencing, protein-DNA interactions, hardware for exon prediction, protein folding, data mining and secondary structure prediction. October 20-21, 2006 Provo, Utah http://www.biotconf.org/index.shtml ---------------- Dr. Mark Clement Department of Computer Science Brigham Young University 3370 TMCB Provo, Utah 84602 (801) 422-7608 clement at cs.byu.edu From viveksr56 at hotmail.com Wed Sep 13 03:01:45 2006 From: viveksr56 at hotmail.com (vivekanandan ramanathan) Date: Wed, 13 Sep 2006 07:01:45 +0000 Subject: [BiO BB] New setup Bioinformatics In-Reply-To: <20060816045318.M95213@hcl.in> Message-ID: Dear sir I am interested in setting up a Bioinformatics Lab in Forestry research iNstitute . what are the potential areas of BIoinformatics Application in Forestry. With best regards R.Vivekanandan >From: "Balamurugan R" >Reply-To: "General Forum at Bioinformatics.Org" > >To: "General Forum at Bioinformatics.Org" > >Subject: Re: [BiO BB] New setup Bioinformatics >Date: Wed, 16 Aug 2006 10:32:05 +0530 > >On Sat, 12 Aug 2006 04:56:36 -0700 (PDT), Rajib Borpuzari wrote > > Dear Member, > > > > I want to setup new bioinformatics centre in a > > institute through that to create database of almost > > 2000 Germplasm.Therefore you may please give me answer > > for following queries. 
> > > > 1.Initial infrastructure requirement. >Depends on how you want to set up your lab. > >HARDWARE: >a. At least a workstation or a server machine to host your database. >b. You may require some desktop machines as clients (depends on the number >of >intended users in the lab). > > > > 2.Software to create database. >If you opt for an all-Linux solution, you can use PostgreSQL (BSD-style license) or >MySQL (GPL) as your database. > > > 3.Total cost of in Indian rupee. >With an all-Linux solution, you will be spending only on your hardware and >probably on your internet connectivity, of course. > > > > > Thanking you. > > With best regards, > > R.Borpuzari > >Best Regards, >Balamurugan.R >Bioinformatics Solutions Group >HCL Infosystems Ltd. >Pondicherry. >_______________________________________________ >General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From alex.li at pioneer.com Wed Sep 13 15:33:05 2006 From: alex.li at pioneer.com (Li, Alex) Date: Wed, 13 Sep 2006 14:33:05 -0500 Subject: [BiO BB] Seqio and fmtseq Message-ID: We have got the fixes for James Knight's legend seqio and fmtseq to compile and work on Linux. Let me know if anyone is still interested in compiling seqio on Linux or newer Unix machines. Alex Li Bioinformatics 515-334-4736 Alex.li at pioneer.com From witch_of_agnessi at yahoo.com Thu Sep 14 12:22:58 2006 From: witch_of_agnessi at yahoo.com (Skull Crossbones) Date: Thu, 14 Sep 2006 09:22:58 -0700 (PDT) Subject: [BiO BB] A question on Smith-Waterman algorithm Message-ID: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> Hello all, In the SW algo. mismatches are given negative scores. Does this mean I cannot use an identity scoring matrix (1 for match and 0 for mismatch) for aligning DNA sequences?
Does the term "Mismatch" applies for protein scoring matrices like PAM and BLOSUM Thanks in advance WoA __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From sankar.achuth at gmail.com Thu Sep 14 12:30:31 2006 From: sankar.achuth at gmail.com (Dr. Achuthsankar S. Nair) Date: Thu, 14 Sep 2006 22:00:31 +0530 Subject: [BiO BB] A question on Smith-Waterman algorithm In-Reply-To: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> References: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> Message-ID: <2b168b460609140930p5d38a648w8b805a851f5fd707@mail.gmail.com> When you use PAM and BLOSUM, the simple 1/0 scoring is no longer applicable, they are overtaken by the PAM/BLOSUM matrices themselves -- Dr Achuthsankar S Nair Hon. Director Centre for Bioinformatics University of Kerala, Trivandrum 695581, INDIA Tel (O) 471-2412759 (R) 471-2542220 www.cbi.keralauniversity.edu www.achu.keralauniversity.edu =================================================================== Admissions to MPhil Bioinformatics for Jan 2007 Open - Brochure and Application forms can be downloaded from www.cbi.keralauniversity.edu On 9/14/06, Skull Crossbones wrote: > > Hello all, > > In the SW algo. mismatches are given negative scores. > Does this mean I can not use an Identity Scoring > Matrix ( 1 for match and 0 for mismatch) for aligning > DNA sequences? Does the term "Mismatch" applies for > protein scoring matrices like PAM and BLOSUM > > Thanks in advance > WoA > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marty.gollery at gmail.com Thu Sep 14 12:36:52 2006 From: marty.gollery at gmail.com (Martin Gollery) Date: Thu, 14 Sep 2006 09:36:52 -0700 Subject: [BiO BB] A question on Smith-Waterman algorithm In-Reply-To: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> References: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> Message-ID: Yes, you can use an Identity matrix with nucleotide. Marty On 9/14/06, Skull Crossbones wrote: > > Hello all, > > In the SW algo. mismatches are given negative scores. > Does this mean I can not use an Identity Scoring > Matrix ( 1 for match and 0 for mismatch) for aligning > DNA sequences? Does the term "Mismatch" applies for > protein scoring matrices like PAM and BLOSUM > > Thanks in advance > WoA > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- -- Martin Gollery Associate Director Center For Bioinformatics University of Nevada at Reno Dept. of Biochemistry / MS330 775-784-7042 ----------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff at bioinformatics.org Thu Sep 14 14:54:20 2006 From: jeff at bioinformatics.org (J.W. 
Bizzaro) Date: Thu, 14 Sep 2006 14:54:20 -0400 Subject: [BiO BB] Open Source Bioinformatics for Researchers (reminder) Message-ID: <4509A55C.6050905@bioinformatics.org> This course is being held on Tuesday and Wednesday of next week. Some space is still available. --------------------- OPEN SOURCE BIOINFORMATICS FOR RESEARCHERS RADISSON HOTEL CAMBRIDGE CAMBRIDGE, MA SEPT. 19 & 20, 2006 http://edu.bioinformatics.org/06a (CS101 Introduction to Bioinformatics Programming: Perl I & R I for Biologists) September 19th & 20th, 2006 Radisson Hotel Cambridge , 777 Memorial Drive, Cambridge, Massachusetts Poster (172 KB PDF) DESCRIPTION This is a course on the fundamentals of open-source programming, to help biologists understand how and when to use the right computer tools for solving computational biology problems, whether sequence analysis, gene expression, mass spectrometry, or systems biology. This course is modularized so that researchers can understand two distinct tools: a programming and scripting language such as Perl and a data analysis and visualization language such as R. Armed with knowledge and some hands-on experience with these tools (including add-on modules like BioPerl and Bioconductor), scientists will be able to appreciate and use software better in their organization, and also be able to put research questions in the context of these tools. They will be able to do basic computational tasks themselves and better communicate with their IT group. PREREQUISITES There are no prerequisites for this course other than having a need to learn some of the fundamentals of programming, in case the scientist has any bioinformatic tasks in their day-to-day work. 
COURSE OUTLINE

Day 1:
Session 1:
08:00 - 08:30 am: Registration
08:30 - 10:00 am: Installation, Fundamentals of Perl
10:00 - 10:30 am: Exercises - simple Perl programs
10:30 - 10:45 am: BREAK
10:45 - 11:30 am: Perl loops, file i/o, list operations, conditional statements
11:30 - 12:00 pm: Exercises - more Perl programs
Session 2:
01:00 - 02:00 pm: Perl syntax - regular expressions, hash functions
02:00 - 03:00 pm: Exercises - manipulate DNA sequence, annotation data
03:00 - 03:15 pm: BREAK
03:15 - 03:30 pm: Additional Perl syntax - subroutines
03:30 - 05:00 pm: Exercises - automate BLAST queries

Day 2:
09:00 - 12:00 pm: Basics of R
12:00 - 01:00 pm: LUNCH
01:00 - 04:30 pm: Bioconductor for microarray analysis

Breakfast and afternoon tea are provided, but lunch is not.

LOGISTICS
When: September 19th & 20th, 2006
Where: Radisson Hotel Cambridge, 777 Memorial Drive, Cambridge, Massachusetts
All attendees are encouraged to bring their own notebook computers, since this will be a hands-on workshop. CDs will be provided to install the necessary software, and lecture notes and exercises will be provided.

CERTIFICATION
This course is certified by the Bioinformatics Organization, Inc., the largest international affiliation in the field, and it will count as *16* "Continuing Scientific Education" (CSE) credits (one credit per contact hour) within the Organization. Students completing the course will receive a certificate attesting to that.

REGISTRATION
Commercial tuition: $600/person
Academic tuition: $300/person
Registration deadline: September 18th, 2006 or when filled
Available payment methods:
1. Online Registration Form (account required) Use this form only if paying by credit card (via secured PayPal).
2. Mail-in Registration Form (158 KB PDF) Use this form for all other methods of payment.
You may also go to the course website and click on "login as a guest" to view the online course materials. Please send questions to . -- J.W.
Bizzaro Bioinformatics Organization, Inc. (Bioinformatics.Org) E-mail: jeff at bioinformatics.org Phone: +1 508 890 8600 -- From boris.steipe at utoronto.ca Thu Sep 14 13:56:55 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu, 14 Sep 2006 13:56:55 -0400 Subject: [BiO BB] A question on Smith-Waterman algorithm In-Reply-To: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> References: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> Message-ID: <324FAF37-4F07-4CAF-B87F-1812A28B2442@utoronto.ca> If you use a matrix that gives a positive expectation value for a random match, a >>local<< alignment algorithm like SW will simply extend the alignment into random noise, since the mismatches it encounters do not reduce the score. Remember that a scoring matrix is only a tool to represent a model of how similarity came about. The 1/0 matrix implicitly states that there is information if you observe matches and no information if you observe mismatches. This is not a model of evolution however, since evolution implies that mismatches are less likely and thus should be penalized if two sequences are related. HTH. Boris On 14-Sep-06, at 12:22 PM, Skull Crossbones wrote: > Hello all, > > In the SW algo. mismatches are given negative scores. > Does this mean I can not use an Identity Scoring > Matrix ( 1 for match and 0 for mismatch) for aligning > DNA sequences? Does the term "Mismatch" applies for > protein scoring matrices like PAM and BLOSUM > > Thanks in advance > WoA > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > General Forum at Bioinformatics.Org - > BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From aloraine at gmail.com Sun Sep 17 03:19:38 2006 From: aloraine at gmail.com (Ann Loraine) Date: Sun, 17 Sep 2006 02:19:38 -0500 Subject: [BiO BB] command-line (scriptable) ORF finders? Message-ID: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> Hello all, I'm hoping someone on the list who is involved with EST or full-length cDNA sequencing projects can help me with something (well..two things): (1) I am looking for a command-line, scriptable tool that can take as input an EST, cDNA, or assembled EST contig ("unigene") sequence and return the most likely or longest open reading frame. This is for a plant EST project. It should also pay attention to codon usage rules. (2) I am also looking for a tool that can take as input a set of exon annotations (or mRNA-to-genome alignments) and return the most likely CDS start and end for the given gene structure. Tools that can jigger the alignment/exon boundaries to optimize the ORF *and* which pay attention to codon usage rules would be extra great. This is for deducing novel gene structures from cross-species mRNA-to-genome alignments. Maybe there is a gene-finder that does this? I've found a variety of web sites that claim to do this, but, as you know, Web sites don't really cut it when you are working with thousands of sequences. And also, I would like to see the code in case I run into problems. Any thoughts or suggestions (other than pointers to Web tools, please) would be greatly appreciated! 
Sincerely, Ann Loraine -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From pmr at ebi.ac.uk Fri Sep 15 05:09:58 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Fri, 15 Sep 2006 10:09:58 +0100 (BST) Subject: [BiO BB] A question on Smith-Waterman algorithm In-Reply-To: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> References: <20060914162258.89586.qmail@web37901.mail.mud.yahoo.com> Message-ID: <1230.86.133.36.221.1158311398.squirrel@webmail.ebi.ac.uk> Dear WoA > In the SW algo. mismatches are given negative scores. > Does this mean I can not use an Identity Scoring > Matrix ( 1 for match and 0 for mismatch) for aligning > DNA sequences? Does the term "Mismatch" applies for > protein scoring matrices like PAM and BLOSUM No, you cannot use a matrix with only 1 and 0. Well, you can - but it will not work. This is because of the way the Smith-Waterman algorithm works. It calculates scores for all pairwise matches, allows for gap penalties, finds the highest score anywhere in the matrix and works back until the running score drops to zero (a score that would become negative is reset to zero). It is that drop that catches you. With no negative scores in the matrix the alignment is never cut short, so you will effectively get a global (Needleman-Wunsch) alignment instead, starting at one terminating edge of the matrix (because scores will never go down) and ending at one of the starting edges. For nucleotides, mismatches are usually all given the same score (you can adjust for G:U base pairing in RNA) - there is not the same concept of partial matches that you have with protein matrices. So, pick a reasonable identity score (it doesn't have to be 1, you can try 10 to avoid a +1 and -1 matrix) and something negative for everything else.
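To make the point above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how EMBOSS water or any other real tool is implemented: the function name smith_waterman, the two toy sequences, and the linear gap penalty of -2 are all made up for the example. It fills the Smith-Waterman matrix with a floor of zero and traces back from the best cell until the score reaches zero.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Toy Smith-Waterman with a linear gap penalty (illustrative only).

    Returns (best_score, stretch_of_a, stretch_of_b), where the stretches
    are the parts of each sequence covered by the best local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 here is what makes the alignment local:
            # a running score that would go negative is reset.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_pos[0]], b[j:best_pos[1]]


if __name__ == "__main__":
    a = "ACCGGGGCCA"   # shared core GGGG, flanks that mostly disagree
    b = "ATTGGGGTTA"
    # Negative mismatch score: only the conserved core is reported.
    print(smith_waterman(a, b, mismatch=-1))  # (4, 'GGGG', 'GGGG')
    # 1/0 identity matrix: the alignment runs through the mismatching
    # flanks and ends up covering both sequences end to end.
    print(smith_waterman(a, b, mismatch=0))   # (6, 'ACCGGGGCCA', 'ATTGGGGTTA')
```

With the 1/0 scoring the mismatches in the flanks cost nothing, so the two stray matching A's at the ends are enough to pull the whole of both sequences into the "local" alignment.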
Hope that helps, Peter Rice From landman at scalableinformatics.com Sun Sep 17 13:39:10 2006 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 17 Sep 2006 13:39:10 -0400 Subject: [BiO BB] command-line (scriptable) ORF finders? In-Reply-To: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> References: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> Message-ID: <450D883E.7010806@scalableinformatics.com> Hi Ann: Ann Loraine wrote: > Hello all, > > I'm hoping someone on the list who is involved with EST or full-length > cDNA sequencing projects can help me with something (well..two > things): > > (1) I am looking for a command-line, scriptable tool that can take as > input an EST, cDNA, or assembled EST contig ("unigene") sequence and > return the most likely or longest open reading frame. This is for a > plant EST project. It should also pay attention to codon usage rules. Would getorf from EMBOSS help? http://emboss.sourceforge.net/apps/cvs/getorf.html Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From sariego9 at yahoo.com Sun Sep 17 14:06:58 2006 From: sariego9 at yahoo.com (Diego Martinez) Date: Sun, 17 Sep 2006 11:06:58 -0700 (PDT) Subject: [BiO BB] command-line (scriptable) ORF finders? In-Reply-To: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> Message-ID: <20060917180658.52071.qmail@web32506.mail.mud.yahoo.com> Hello, There is also the SEALS package from Koonin's group at NCBI, we use that alot. it has a bunch of command line tools, I believe it is all in PERL, so you can gut it and reuse. 
http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/ if you are looking at ESTs, you may also want to look at estscan, http://www.ch.embnet.org/software/ESTScan2.html or there is a genewise like est Gene modeler tool the Wise2 package by Birney and Durbin that you may want to look at. Diego ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .=$=. .=$=. .=$=. .=$=. .=$=. .=$=. @ @ | | | @ | | | @ @ | | | @ | | | @ @ | | | @ | | | | @ @ | | | @ @ | | | @ @ | | | @ @ | | | @ @ | | | @ @ | | | | @ | | | @ @ | | | @ | | | @ @ | | | @ | | | @ @ | ~' `~$~' `~$~' `~$~' `~$~' `~$~' `~ ----- Original Message ---- From: Ann Loraine To: General Forum at Bioinformatics.Org Sent: Sunday, September 17, 2006 1:19:38 AM Subject: [BiO BB] command-line (scriptable) ORF finders? Hello all, I'm hoping someone on the list who is involved with EST or full-length cDNA sequencing projects can help me with something (well..two things): (1) I am looking for a command-line, scriptable tool that can take as input an EST, cDNA, or assembled EST contig ("unigene") sequence and return the most likely or longest open reading frame. This is for a plant EST project. It should also pay attention to codon usage rules. (2) I am also looking for a tool that can take as input a set of exon annotations (or mRNA-to-genome alignments) and return the most likely CDS start and end for the given gene structure. Tools that can jigger the alignment/exon boundaries to optimize the ORF *and* which pay attention to codon usage rules would be extra great. This is for deducing novel gene structures from cross-species mRNA-to-genome alignments. Maybe there is a gene-finder that does this? I've found a variety of web sites that claim to do this, but, as you know, Web sites don't really cut it when you are working with thousands of sequences. And also, I would like to see the code in case I run into problems. 
Any thoughts or suggestions (other than pointers to Web tools, please) would be greatly appreciated! Sincerely, Ann Loraine -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org _______________________________________________ General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From aloraine at gmail.com Sun Sep 17 17:39:45 2006 From: aloraine at gmail.com (Ann Loraine) Date: Sun, 17 Sep 2006 16:39:45 -0500 Subject: [BiO BB] command-line (scriptable) ORF finders? In-Reply-To: <20060917180658.52071.qmail@web32506.mail.mud.yahoo.com> References: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> <20060917180658.52071.qmail@web32506.mail.mud.yahoo.com> Message-ID: <83722dde0609171439j1575a722m280fbcb3b4428dbd@mail.gmail.com> Thanks! -Ann On 9/17/06, Diego Martinez wrote: > Hello, > > There is also the SEALS package from Koonin's group at NCBI, > we use that alot. it has a bunch of command line tools, I believe it > is all in PERL, so you can gut it and reuse. > > http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/ > > if you are looking at ESTs, you may also want to look at estscan, > > http://www.ch.embnet.org/software/ESTScan2.html > > or there is a genewise like est Gene modeler tool the Wise2 > package by Birney and Durbin that you may want to look at. > > Diego > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > .=$=. .=$=. .=$=. .=$=. .=$=. .=$=. 
> @ @ | | | @ | | | @ @ | | | @ | | | @ @ | | | @ | | | > | @ @ | | | @ @ | | | @ @ | | | @ @ | | | @ @ | | | @ @ | | > | | @ | | | @ @ | | | @ | | | @ @ | | | @ | | | @ @ | > ~' `~$~' `~$~' `~$~' `~$~' `~$~' `~ > > ----- Original Message ---- > From: Ann Loraine > To: General Forum at Bioinformatics.Org > Sent: Sunday, September 17, 2006 1:19:38 AM > Subject: [BiO BB] command-line (scriptable) ORF finders? > > Hello all, > > I'm hoping someone on the list who is involved with EST or full-length > cDNA sequencing projects can help me with something (well..two > things): > > (1) I am looking for a command-line, scriptable tool that can take as > input an EST, cDNA, or assembled EST contig ("unigene") sequence and > return the most likely or longest open reading frame. This is for a > plant EST project. It should also pay attention to codon usage rules. > > (2) I am also looking for a tool that can take as input a set of exon > annotations (or mRNA-to-genome alignments) and return the most likely > CDS start and end for the given gene structure. Tools that can jigger > the alignment/exon boundaries to optimize the ORF *and* which pay > attention to codon usage rules would be extra great. This is for > deducing novel gene structures from cross-species mRNA-to-genome > alignments. Maybe there is a gene-finder that does this? > > I've found a variety of web sites that claim to do this, but, as you > know, Web sites don't really cut it when you are working with > thousands of sequences. And also, I would like to see the code in case > I run into problems. > > Any thoughts or suggestions (other than pointers to Web tools, please) > would be greatly appreciated! 
> > Sincerely, > > Ann Loraine > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org > _______________________________________________ > General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > > > > > _______________________________________________ > General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From stefan.rensing at biologie.uni-freiburg.de Mon Sep 18 02:44:55 2006 From: stefan.rensing at biologie.uni-freiburg.de (Stefan Rensing) Date: Mon, 18 Sep 2006 08:44:55 +0200 Subject: [BiO BB] command-line (scriptable) ORF finders? In-Reply-To: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> References: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> Message-ID: <450E4067.9030305@biologie.uni-freiburg.de> Hi, > (1) I am looking for a command-line, scriptable tool that can take as > input an EST, cDNA, or assembled EST contig ("unigene") sequence and > return the most likely or longest open reading frame. This is for a > plant EST project. It should also pay attention to codon usage rules. We found FrameD to be superior to ESTScan and Estwise in predicting ORFs in moss (P. patens). We are using species-specific (i)HMMs, repectively. http://bioinfo.genopole-toulouse.prd.fr/apps/FrameD/FD http://bioinfo.genopole-toulouse.prd.fr/apps/FrameD/Help/FrameDWeb_5.html > (2) I am also looking for a tool that can take as input a set of exon > annotations (or mRNA-to-genome alignments) and return the most likely > CDS start and end for the given gene structure. 
Tools that can jigger > the alignment/exon boundaries to optimize the ORF *and* which pay > attention to codon usage rules would be extra great. This is for > deducing novel gene structures from cross-species mRNA-to-genome > alignments. Maybe there is a gene-finder that does this? You might want to have a look at GenomeThreader, http://www.genomethreader.org/, which allows spliced alignments using non-identical mRNA/protein sequences (i.e., homologs from other species). Cheers, Stefan -- Dr. Stefan Rensing, Group Leader Computational Biology Plant Biotechnology, Faculty of Biology, University of Freiburg Schaenzlestr. 1, D-79104 Freiburg, Fon: +49 761 203-6974, Fax: -6945 http://www.plant-biotech.net/ http://www.cosmoss.org/ stefan.rensing at biologie.uni-freiburg.de "There is science, logic, reason; there is thought verified by experience. And then there is California." Edward Abbey From aloraine at gmail.com Mon Sep 18 11:05:18 2006 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 18 Sep 2006 10:05:18 -0500 Subject: [BiO BB] command-line (scriptable) ORF finders? In-Reply-To: <450E4067.9030305@biologie.uni-freiburg.de> References: <83722dde0609170019n17c690f4xde230b88626d76d9@mail.gmail.com> <450E4067.9030305@biologie.uni-freiburg.de> Message-ID: <83722dde0609180805v7294cce9ifbc90f65950539c8@mail.gmail.com> Thank you very much for the pointers...it was very helpful. Sincerely, Ann On 9/18/06, Stefan Rensing wrote: > Hi, > > > (1) I am looking for a command-line, scriptable tool that can take as > > input an EST, cDNA, or assembled EST contig ("unigene") sequence and > > return the most likely or longest open reading frame. This is for a > > plant EST project. It should also pay attention to codon usage rules. > > We found FrameD to be superior to ESTScan and Estwise in predicting ORFs > in moss (P. patens). We are using species-specific (i)HMMs, repectively. 
> http://bioinfo.genopole-toulouse.prd.fr/apps/FrameD/FD > http://bioinfo.genopole-toulouse.prd.fr/apps/FrameD/Help/FrameDWeb_5.html > > > (2) I am also looking for a tool that can take as input a set of exon > > annotations (or mRNA-to-genome alignments) and return the most likely > > CDS start and end for the given gene structure. Tools that can jigger > > the alignment/exon boundaries to optimize the ORF *and* which pay > > attention to codon usage rules would be extra great. This is for > > deducing novel gene structures from cross-species mRNA-to-genome > > alignments. Maybe there is a gene-finder that does this? > > You might want to have a look at GenomeThreader, > http://www.genomethreader.org/, which allows spliced alignments using > non-identical mRNA/protein sequences (i.e., homologs from other species). > > Cheers, Stefan > > > -- > Dr. Stefan Rensing, Group Leader Computational Biology > Plant Biotechnology, Faculty of Biology, University of Freiburg > Schaenzlestr. 1, D-79104 Freiburg, Fon: +49 761 203-6974, Fax: -6945 > http://www.plant-biotech.net/ http://www.cosmoss.org/ > stefan.rensing at biologie.uni-freiburg.de > > "There is science, logic, reason; > there is thought verified by experience. > And then there is California." 
> Edward Abbey > _______________________________________________ > General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From divyaps at ncbs.res.in Mon Sep 18 12:49:48 2006 From: divyaps at ncbs.res.in (divyaps at ncbs.res.in) Date: Mon, 18 Sep 2006 22:19:48 +0530 (IST) Subject: [BiO BB] ncbi entry retrieval Message-ID: <32933.192.168.1.1.1158598188.squirrel@192.168.1.1> dear all, I was doing a psiblast search with the organism specific peptide sequence downloaded from ensembl.Now I have the blast output sequences with ensemble id. I need to retrieve the corresponding ncbi entries of these psiblast hits. Is there any way to do the same? A software, server or a perl script? A suggestion or solution will be highly appreciated. thanks in advance divya p syamala NCBS From basu at pharm.sunysb.edu Mon Sep 18 16:35:50 2006 From: basu at pharm.sunysb.edu (Siddhartha Basu) Date: Mon, 18 Sep 2006 16:35:50 -0400 Subject: [BiO BB] ncbi entry retrieval In-Reply-To: <32933.192.168.1.1.1158598188.squirrel@192.168.1.1> References: <32933.192.168.1.1.1158598188.squirrel@192.168.1.1> Message-ID: <450F0326.20403@pharm.sunysb.edu> divyaps at ncbs.res.in wrote: > dear all, > I was doing a psiblast search with the organism specific peptide > sequence downloaded from ensembl.Now I have the blast output > sequences with ensemble id. I need to retrieve the corresponding ncbi > entries of these psiblast hits. Is there any way to do the same? A > software, server or a perl script? A suggestion or solution will be highly > appreciated. > > thanks in advance > > divya p syamala > NCBS > Hi, Presuming that you are looking to convert your ensembl ids to entrez ids, biomart (http://www.ensembl.org/Multi/martview) should be a good option. 
In the first screen, choose your organism, in the second load your Ensembl IDs (in the ID list limit) and in the third, select "EntrezGene ID" in the "External References" section. Lastly, select the output format you prefer and click export. Hopefully, that will do the conversion for you. -siddhartha > > > > _______________________________________________ > General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From ethan.strauss at promega.com Tue Sep 19 12:04:43 2006 From: ethan.strauss at promega.com (Ethan Strauss) Date: Tue, 19 Sep 2006 11:04:43 -0500 Subject: [BiO BB] NCBI web service In-Reply-To: <450F0326.20403@pharm.sunysb.edu> Message-ID: Hi, I have been using the NCBI eutils web service from a C# application to automatically retrieve additional information (species, links to the Gene db etc) about BLAST hits from the gi numbers. This works when it works, but I have been having a lot of trouble with the web service. It is slow and frequently I get Web service errors (The underlying connection was closed: An unexpected error occurred on a send) which seem to be related to timeout and proxies and keepalive and other stuff I don't really understand. Anyway, is there another web service that I might use to get the same sorts of information from gi numbers or accession numbers? I need to get species and associations with the gene database. If you know how to get the NCBI service to work better for me, that would be good too. I would like to get the full description line, but could live without it. It is important for this application that I can get the info from a web service. Thanks!
Ethan From ethan.strauss at promega.com Tue Sep 19 16:02:42 2006 From: ethan.strauss at promega.com (Ethan Strauss) Date: Tue, 19 Sep 2006 15:02:42 -0500 Subject: [BiO BB] NCBI web service In-Reply-To: Message-ID: Hello everyone, I have figured out part of my problem, but not how to fix it... What is happening is that I am running blast and just grabbing gi numbers from the XML blast results. When I send these gi numbers to NCBI, it sends me back all the data associated with each gi number. This data includes the sequence. It turns out that a few of the gi numbers in my test set point to complete chromosomes! I think that NCBI is returning the information completely, but that the connection can't support the many millions of characters being passed. What I would like to do is somehow query just for the size of the sequences being returned and not ask for info on sequences which are too large. I have not dug through NCBI's documentation in great depth yet, but a quick look turns up nothing. If anyone knows how to do this already, I would appreciate it. Thanks! Ethan -----Original Message----- From: bio_bulletin_board-bounces+ethan.strauss=promega.com at bioinformatics.org [mailto:bio_bulletin_board-bounces+ethan.strauss=promega.com at bioinformatics.org] On Behalf Of Ethan Strauss Sent: Tuesday, September 19, 2006 11:05 AM To: General Forum at Bioinformatics.Org Subject: [BiO BB] NCBI web service Hi, I have been using the NCBI eutils web service from a C# application to automatically retrieve additional information (species, links to the Gene db etc) about BLAST hits from the gi numbers. This works when it works, but I have been having a lot of trouble with the web service. It is slow and frequently I get Web service errors (The underlying connection was closed: An unexpected error occurred on a send) which seem to related to timeout and proxies and keepalive and other stuff I don't really understand.
Anyway, Is there another web service that I might use to get the same sorts of information from gi numbers or accession numbers? I need to get species and associations with the gene database. If you know how to get the NCBI service to work better for me, that would be good too. I would like to get the full description line, but could live without it. It is important for this application that I can get the info from a web service. Thanks! Ethan _______________________________________________ General Forum at Bioinformatics.Org - BiO_Bulletin_Board at bioinformatics.org https://bioinformatics.org/mailman/listinfo/bio_bulletin_board From mkgovindis at yahoo.com Tue Sep 19 23:21:38 2006 From: mkgovindis at yahoo.com (govind mk) Date: Tue, 19 Sep 2006 20:21:38 -0700 (PDT) Subject: [BiO BB] Yet another Web 3D alignment viewer In-Reply-To: <50647.141.20.6.60.1157463639.squirrel@webmail.charite.de> Message-ID: <20060920032138.60911.qmail@web34410.mail.mud.yahoo.com> Hi Is any one aware of any database that keeps track of the NCBI (Accession's) sequence revision history. If such a database is available , is the data available in a downloadable format or can it be accessed by a program ? I have had a look at the NCBI.The NCBI Sequence Revision History db is not available for download. Thank you Regards, Govind --------------------------------- Stay in the know. Pulse on the new Yahoo.com. Check it out. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mkgovindis at yahoo.com Tue Sep 19 23:21:55 2006 From: mkgovindis at yahoo.com (govind mk) Date: Tue, 19 Sep 2006 20:21:55 -0700 (PDT) Subject: [BiO BB] Re: NCBI Sequence revision history data In-Reply-To: <50647.141.20.6.60.1157463639.squirrel@webmail.charite.de> Message-ID: <20060920032155.89467.qmail@web34404.mail.mud.yahoo.com> Hi Is any one aware of any database that keeps track of the NCBI (Accession's) sequence revision history. 
If such a database is available, is the data available in a downloadable format or can it be accessed by a program?

I have had a look at NCBI. The NCBI Sequence Revision History db is not available for download.

Thank you

Regards,
Govind

From floris at crs4.it Wed Sep 20 02:14:27 2006
From: floris at crs4.it (Matteo Floris)
Date: Wed, 20 Sep 2006 08:14:27 +0200
Subject: [BiO BB] Re: ncbi entry retrieval (divyaps@ncbs.res.in)
Message-ID: <2AF9E22C-7C55-4E45-B280-318461DE55B6@crs4.it>

Hi,

you can use BioMart for that. See http://www.ensembl.org/biomart/martview

You can submit a list of Ensembl IDs, then export their NCBI IDs. It is very easy.

Regards,
Matteo Floris

From basu at pharm.sunysb.edu Wed Sep 20 11:07:15 2006
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Wed, 20 Sep 2006 11:07:15 -0400
Subject: [BiO BB] Re: NCBI Sequence revision history data
In-Reply-To: <20060920032155.89467.qmail@web34404.mail.mud.yahoo.com>
References: <20060920032155.89467.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <45115923.3030500@pharm.sunysb.edu>

govind mk wrote:
> Hi
>
> Is any one aware of any database that keeps track of the NCBI
> (Accession's) sequence revision history.
>
> If such a database is available , is the data available in a
> downloadable format or can it be accessed by a program ?

Hi,

If you are familiar with BioPerl and have it installed, Bio::DB::SeqVersion is the module that can access the NCBI sequence revision history.

-siddhartha

>
> I have had a look at the NCBI.The NCBI Sequence Revision History db is
> not available for download.
>
>
> Thank you
>
> Regards,
> Govind
>

From janderson_net at yahoo.com Thu Sep 21 12:34:36 2006
From: janderson_net at yahoo.com (James Anderson)
Date: Thu, 21 Sep 2006 09:34:36 -0700 (PDT)
Subject: [BiO BB] question on low level processing of Liquid chromatography (LC)
Message-ID: <20060921163436.25862.qmail@web31202.mail.mud.yahoo.com>

Hi,

I am new to LC and I have a question about low-level processing of LC data. I am quite familiar with the low-level processing of SELDI/MALDI data, which has the following steps:

1. Smoothing
2. Baseline removal
3. Normalization
4. Peak detection
5. Peak alignment

Does low-level processing of LC have the same steps? I am asking especially about baseline removal (or background subtraction) and normalization. For SELDI/MALDI, baseline removal removes the artifact caused by the energy-absorbing matrix, but for LC, do I need to subtract the baseline as well? If so, what's the physical reason behind this? In addition, the normalization of SELDI/MALDI uses TIC. What should I do for the normalization of LC?

Another question: is every point in LC the sum of every point of the mass spectrum at the same retention time?

Thanks,
James

From hulk.norris at googlemail.com Thu Sep 21 10:57:24 2006
From: hulk.norris at googlemail.com (Hulk Norris)
Date: Thu, 21 Sep 2006 15:57:24 +0100
Subject: [BiO BB] Extreme 3` EST Assembly
Message-ID: <98f499160609210757v55875681n9b21a2c1eb2cfdd9@mail.gmail.com>

Hi,

I have started work on the clustering and assembly of 3' sequenced ESTs.
Because of the nature of the sequencing process we can be certain that each EST represents the extreme 3' end of an expressed transcript. In order to allow for incorrect base calling and determine which transcripts are detected with greater frequency, I wish to cluster and assemble these ESTs to form consensus sequences and generate contigs.

My problem is that clustering and assembly software does not take into account the fact that all the ESTs under investigation are confirmed extreme 3' and will assimilate genuine terminal 3' sequence into upstream positions of longer transcripts in cases of alternative polyadenylation of a single gene.

Does anyone have experience of similar problems or approaches? Any help or direction would be sincerely appreciated.

Regards,

Dr Hulk Norris
Principal Bioinformatician

From schuerer at genomining.com Fri Sep 22 04:19:40 2006
From: schuerer at genomining.com (Katja Schuerer)
Date: Fri, 22 Sep 2006 10:19:40 +0200
Subject: [BiO BB] Programming for bioinformatics
Message-ID: <45139C9C.30407@genomining.com>

Hi,

*************************************************************************
Course in informatics for biology 2007 at Institut Pasteur
http://www.pasteur.fr/formation/infobio-en.html

*** ATTENTION : Registration will be closed on October 15 2006. ***
*************************************************************************

In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to the end of April 2007.

The main goal of this course is to provide researchers in biology with an initial exposure to informatics. Admittance to the course is reserved for those with a degree in biology or a related discipline.
With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software.

This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods), and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed.

Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping or scripting and the building of important software systems. Learning of an additional language (C) will be available for interested students.

Learning during the course will be reinforced with computing exercises, and effective training will be provided by a 2-month research project.

The working language of the course is French.

For further information, please consult:
http://www.pasteur.fr/formation/infobio-en.html

*** Registration will be closed on October 15 2006. ***

Sincerely,
--
Catherine Letondal, Institut Pasteur & Katja Schuerer, Genomining
Course informatics for biology

From ykalidas at gmail.com Sat Sep 23 19:15:19 2006
From: ykalidas at gmail.com (Kalidas Yeturu)
Date: Sun, 24 Sep 2006 04:45:19 +0530
Subject: [BiO BB] Seperation of Protein PDB into multiple units having binding sites
Message-ID: <5632703b0609231615h4506e921qca2c107904f9904e@mail.gmail.com>

Hello Everyone

I am working on protein binding sites. I am not yet very familiar with the terminology (subunits, domains, etc.).

My work requires splitting a protein PDB file into structural units such that each has a binding site. For example, 1A4G neuraminidase has two structural units, chain A and chain B, both having binding sites.
But splitting a PDB file based on chain ID alone may not always be correct. A manually curated database would be better.

I would be grateful if anyone could point me to a database where protein PDB files corresponding to structural units having binding sites are provided for each protein.

Thanking You

--
Kalidas Y
http://ssl.serc.iisc.ernet.in/~kalidas

From skhadar at gmail.com Mon Sep 25 03:44:09 2006
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 25 Sep 2006 00:44:09 -0700
Subject: [BiO BB] Seperation of Protein PDB into multiple units having binding sites
In-Reply-To: <5632703b0609231615h4506e921qca2c107904f9904e@mail.gmail.com>
References: <5632703b0609231615h4506e921qca2c107904f9904e@mail.gmail.com>
Message-ID:

To get a hold of all the terminology related to subunits and subdomains, grab this book and read it once or twice :)

Bioinformatics: Genes, Proteins & Computers
Eds. Christine Orengo and JM Thornton

To split your proteins, try the PROTEIN PEELING approach. A web server is available here: http://www.ebgm.jussieu.fr/~gelly/

Happy splitting with your proteins,
Shameer Khadar
NCBS - TIFR

On 9/23/06, Kalidas Yeturu wrote:
>
> Hello Everyone
> I am working on protein binding sites. I am not yet very much familiar
> with terminology - subunits,domains etc.,
>
> My work requires obtaining/splitting a protein PDB into various
> structural units such that each has binding site.
> For example 1A4G neuraminidase has two structural units - chain A and
> chain B both having binding sites.
> But splitting a PDB based on chain-id alone, may not always be correct.
> Some manually curated database would be better.
>
> I would be grateful if anyone can cite a database where protein-PDB files
> corresponding to structural units having binding sites are provided for each
> protein.
>
> Thanking You
>
> --
> Kalidas Y
> http://ssl.serc.iisc.ernet.in/~kalidas

From dtheobald at brandeis.edu Tue Sep 26 12:23:39 2006
From: dtheobald at brandeis.edu (Douglas L. Theobald)
Date: Tue, 26 Sep 2006 12:23:39 -0400
Subject: [BiO BB] THESEUS, a program for maximum likelihood superpositioning of macromolecules
Message-ID:

Announcing a fundamentally new way to superimpose structures: maximum likelihood instead of least squares.

http://www.theseus3d.org/

The Program:

THESEUS is a Unix command-line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While all conventional superpositioning methods use ordinary least-squares as the optimization criterion, THESEUS uses maximum likelihood, which provides superpositions with substantially improved accuracy (see the figure at http://www.theseus3d.org/ for an example). When superpositioning macromolecules with different residue sequences, other programs and algorithms currently discard residues that are aligned with gaps. THESEUS, however, uses a novel ML algorithm that includes all of the available data.

The Rationale:

Over 30 years ago, Cox, Diamond, McLachlan, Kabsch, and others investigated and solved the least-squares superposition problem for macromolecular structures (Flower 1999), and the least-squares method has been used effectively ever since for comparing structures. However, least-squares is not ideal. As a fitting criterion, least-squares is based theoretically on two strong assumptions: (1) that all atoms in a structure have the same variability and (2) that all atoms are independent and uncorrelated. We know that both of these assumptions are false.
Some regions of a structure are more variable than others, and atoms are connected to each other via chemical bonds. The ML method used by THESEUS properly down-weights variable structural regions and corrects for correlations among atoms.

The Benefits:

ML superpositioning is robust and insensitive to the specific atoms included in the analysis. In current practice, regions of structures that are considered "unsuperimposable" or divergent are subjectively excluded from the superposition. However, when doing an ML superposition, you do not need to hand-prune selected variable atomic coordinates, since the variability is already accounted for in the ML method.

ML superpositioning will greatly improve our ability to accurately compare biological macromolecules in many applications, including analysis of NMR families, alternate crystal structures, evolutionarily homologous molecules, molecular dynamics simulations, and de novo structure predictions.

Output from THESEUS includes both likelihood-based and frequentist statistics for evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. Residue ranges for excluding/including in the superposition can be specified on the command line. For ease of comparison, THESEUS can also calculate least-squares superpositions. Additionally, THESEUS performs principal components analysis (PCA) for analyzing the complex correlations found among the atoms and residues within a structural ensemble.

Source code and binaries for several platforms are available from:
http://www.theseus3d.org/

Refs:

Theobald, D.L. and Wuttke, D.S. (2006) "THESEUS: Maximum likelihood superpositioning and analysis of macromolecular structures."
Bioinformatics 22(17):2171
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/17/2171

Overview of mathematical results and algorithm (supplementary materials from Theobald & Wuttke 2006):
http://www.theseus3d.org/pdfs/Theobald_Wuttke_2006_Bioinformatics_THESEUS_SuppMat.pdf

Theobald, D. L. and Wuttke, D. S. (2006) "Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem." PNAS, in press

Cox, J. M. (1967) "Mathematical methods used in the comparison of the quaternary structures." J Mol Biol, 28, 151-156.

Diamond, R. (1966) "A mathematical model-building procedure for proteins." Acta Crystallogr, 21, 253-266.

Diamond, R. (1976) "On the comparison of conformations using linear and quadratic transformations." Acta Crystallogr A, 32, 1-10.

Flower, D. R. (1999) "Rotational superposition: A review of methods." J Mol Graph Model, 17, 238-244.

Kabsch, W. (1978) "A discussion of the solution for the best rotation to relate two sets of vectors." Acta Crystallogr A, 34, 827-828.

McLachlan, A. (1972) "A mathematical procedure for superimposing atomic coordinates of proteins." Acta Crystallogr A, 28, 656-657.

From john_abraham_bio at yahoo.com Thu Sep 28 02:58:00 2006
From: john_abraham_bio at yahoo.com (John Abraham)
Date: Wed, 27 Sep 2006 23:58:00 -0700 (PDT)
Subject: [BiO BB] Re: NCBI Sequence revision history data
In-Reply-To: <20060920032155.89467.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <20060928065800.15987.qmail@web57007.mail.re3.yahoo.com>

The readme file keeps track of such changes.

govind mk wrote:

Hi

Is any one aware of any database that keeps track of the NCBI (Accession's) sequence revision history.

If such a database is available , is the data available in a downloadable format or can it be accessed by a program ?

I have had a look at the NCBI.The NCBI Sequence Revision History db is not available for download.
Thank you

Regards,
Govind

From invitation at iaria.org Thu Sep 28 03:51:11 2006
From: invitation at iaria.org (invitation at iaria.org)
Date: Thu, 28 Sep 2006 00:51:11 -0700 (PDT)
Subject: [BiO BB] Second Call for Submissions || ICCGI 2007 || ICWMC 2007, Guadeloupe, March 2007
Message-ID: <7089827.1159429871479.JavaMail.Onitza@Oana2>

CALL FOR PAPERS

The Third International Conference on Wireless and Mobile Communications
ICWMC 2007

Date: March 4-9, 2007
Place: Guadeloupe, French Caribbean
Site: http://www.iaria.org/conferences2007/ICWMC07.html
Submit: http://www.iaria.org/conferences2007/SubmitICWMC07.html

Important deadlines:
Full paper submission, October 15, 2006
Author notification, November 15, 2006
Registration/camera ready, December 1, 2006

CALL FOR PAPERS

The Second International Multi-Conference on Computing in the Global Information Technology
ICCGI 2007

Date: March 4-9, 2007
Place: Guadeloupe, French Caribbean
Site: http://www.iaria.org/conferences2007/ICCGI07.html
Submit: http://www.iaria.org/conferences2007/SubmitICCGI07.html

includes:
- IPv6TD 2007 : The Second International Workshop on IPv6 Today - Technology and Deployment
  http://www.iaria.org/conferences2007/IPV6TD.html
- MOC 2007 : The Second International Workshop on Modeling, Optimization, and Complexity
  http://www.iaria.org/conferences2007/MOC.html

Important deadlines:
Full paper submission, October 15, 2006
Author notification, November 15, 2006
Registration/camera ready, December 1, 2006

Note: For ICWMC 2006 and ICCGI 2006
programs, awards, photos, see:
http://www.iaria.org/conferences/ICW06.html
http://www.iaria.org/conferences/ICCGI06.html

Publicity Board
=======================================================================
To be removed from this announcement list, please reply to this email with UNSUBSCRIBE in the subject line.

From phoebe at deakin.edu.au Fri Sep 29 00:32:24 2006
From: phoebe at deakin.edu.au (Phoebe Chen)
Date: Fri, 29 Sep 2006 14:32:24 +1000
Subject: [BiO BB] APBC2007 Call for Posters/Tutorials by Tomorrow
Message-ID: <5.1.1.5.2.20060929143141.0387e470@mail.deakin.edu.au>

Dear Colleagues,

We apologize if you receive multiple copies of this call for posters and tutorials.

Regards,
Organizing Committee of APBC 2007

------------

CALL FOR POSTERS/TUTORIALS (APBC 2007)

The Asia-Pacific Bioinformatics Conference (APBC 2007) will be held in Hong Kong on 15-17 January 2007.
http://www.cs.hku.hk/apbc2007

Please consider submitting a poster or holding a tutorial at the conference. The deadline for submission is 30 Sept, 2006.

The details of the call for posters and call for tutorials can be found here:
http://www.cs.hku.hk/apbc2007/callforposters.htm
http://www.cs.hku.hk/apbc2007/callfortutorial.htm

We look forward to seeing you in Hong Kong, an exciting place to explore.

Regards,
Organizing Committee of APBC 2007

From asidhu at biomap.org Sat Sep 30 13:32:04 2006
From: asidhu at biomap.org (Amandeep S.
Sidhu)
Date: Sun, 1 Oct 2006 03:32:04 +1000 (EST)
Subject: [BiO BB] (no subject)
Message-ID: <49639.144.136.102.223.1159637524.squirrel@biomap.org>

2007 IEEE Workshop on Biomedical Applications for Digital Ecosystems (BADE 2007)
with Inaugural IEEE International Digital Ecosystems and Technology Conference 2007
20 February 2007, Cairns, Australia
http://bade07.biomap.org/

Scope of the workshop:

The primary focus of the BADE 2007 workshop is to share research applications using biomedical data and to identify new issues and directions for future research in biomedical applications. Authors are invited to submit original papers to the workshop exploring theories, techniques, and applications for biomedicine. Papers are invited on, but not limited to, the following themes:

* Bioinformatics and Computational Biology
* Data Representation and Visualization
* Biological Databases & Data Integration
* Microarray analysis
* Protein and RNA structure prediction
* Feature selection and pattern discovery in biological data
* Systems Biology and Pathways
* Biomedical Ontologies and taxonomies
* Text Mining
* Health Care Information Systems
* Electronic Health Records
* Clinical Assessment and Patient Diagnosis
* Disease Control and Prevention
* Privacy and Security in Healthcare

Important Dates:

* Submission of Full Papers: November 10, 2006
* Notification of Acceptance: December 10, 2006
* Camera-ready Copies of papers: December 31, 2006

Paper Submission Procedures:

All paper submissions will be handled electronically at:
http://bradleyuniversityvolunteer.ieee-ies.org/submit/dest07/

* Authors are invited to submit electronically a full paper (6 pages, about 4500 words, PDF file) of their original work.
* Select "W01: Biomedical Applications" in the Technical Track drop-down menu on the Author Page.

High quality papers in biomedical applications are solicited. Original papers exploring new directions will receive especially careful and supportive reviews.
Papers that have already been accepted or are currently under review for other conferences or journals will not be considered for publication at IEEE DEST 2007. Paper submissions should be in the IEEE 2-column format, and will be reviewed by the Program Committee on the basis of technical quality, relevance, originality, significance, and clarity. Accepted IEEE BADE 2007 papers will be published in the conference proceedings by IEEE Industrial Electronics Society and will be included in EI index and IEEE Xplore.

General Chairs:
Tharam S. Dillon (University of Technology Sydney, Australia)
Elizabeth Chang (Curtin University of Technology, Australia)

Program Chairs:
Amandeep S. Sidhu (University of Technology Sydney, Australia)
Xiaohua Hu (Drexel University, USA)
Farookh Hussain (Curtin University of Technology, Australia)
Maja Hadzic (Curtin University of Technology, Australia)

For further inquiries, please contact bade07 at biomap.org

From asidhu at biomap.org Sat Sep 30 13:37:44 2006
From: asidhu at biomap.org (Amandeep S. Sidhu)
Date: Sun, 1 Oct 2006 03:37:44 +1000 (EST)
Subject: [BiO BB] 1st CFP: IEEE Workshop on Biomedical Applications for Digital Ecosystems (BADE 2007)
Message-ID: <49694.144.136.102.223.1159637864.squirrel@biomap.org>

2007 IEEE Workshop on Biomedical Applications for Digital Ecosystems (BADE 2007)
with Inaugural IEEE International Digital Ecosystems and Technology Conference 2007
20 February 2007, Cairns, Australia
http://bade07.biomap.org/

Scope of the workshop:

The primary focus of the BADE 2007 workshop is to share research applications using biomedical data and to identify new issues and directions for future research in biomedical applications. Authors are invited to submit original papers to the workshop exploring theories, techniques, and applications for biomedicine.
Papers are invited on, but not limited to, the following themes:

* Bioinformatics and Computational Biology
* Data Representation and Visualization
* Biological Databases & Data Integration
* Microarray analysis
* Protein and RNA structure prediction
* Feature selection and pattern discovery in biological data
* Systems Biology and Pathways
* Biomedical Ontologies and taxonomies
* Text Mining
* Health Care Information Systems
* Electronic Health Records
* Clinical Assessment and Patient Diagnosis
* Disease Control and Prevention
* Privacy and Security in Healthcare

Important Dates:

* Submission of Full Papers: November 10, 2006
* Notification of Acceptance: December 10, 2006
* Camera-ready Copies of papers: December 31, 2006

Paper Submission Procedures:

All paper submissions will be handled electronically at:
http://bradleyuniversityvolunteer.ieee-ies.org/submit/dest07/

* Authors are invited to submit electronically a full paper (6 pages, about 4500 words, PDF file) of their original work.
* Select "W01: Biomedical Applications" in the Technical Track drop-down menu on the Author Page.

High quality papers in biomedical applications are solicited. Original papers exploring new directions will receive especially careful and supportive reviews. Papers that have already been accepted or are currently under review for other conferences or journals will not be considered for publication at IEEE DEST 2007. Paper submissions should be in the IEEE 2-column format, and will be reviewed by the Program Committee on the basis of technical quality, relevance, originality, significance, and clarity. Accepted IEEE BADE 2007 papers will be published in the conference proceedings by IEEE Industrial Electronics Society and will be included in EI index and IEEE Xplore.

General Chairs:
Tharam S. Dillon (University of Technology Sydney, Australia)
Elizabeth Chang (Curtin University of Technology, Australia)

Program Chairs:
Amandeep S.
Sidhu (University of Technology Sydney, Australia)
Xiaohua Hu (Drexel University, USA)
Farookh Hussain (Curtin University of Technology, Australia)
Maja Hadzic (Curtin University of Technology, Australia)

For further inquiries, please contact bade07 at biomap.org