From p.molenaar at amc.uva.nl  Fri Oct  5 05:37:54 2007
From: p.molenaar at amc.uva.nl (piet molenaar)
Date: Fri, 5 Oct 2007 11:37:54 +0200
Subject: [BiO BB] Symposium: Integrative Bioinformatics;
	schedule and titles of talks definitive
Message-ID: <554293e20710050237n316bc1aey65421de2c742834b@mail.gmail.com>

One day symposium
8 November 2007 - Academic Medical Center - Amsterdam - Netherlands

Integrative Bioinformatics:
At the cutting edge of network analysis and biological data integration

Each molecular biologist working with high-throughput data is confronted
with the need to reconstruct and validate the underlying complex regulatory
networks. At this one-day symposium, some of the most prominent leaders in
the field will present their approaches and views. The symposium offers a
platform for discussion of state-of-the-art biological network analyses.

Confirmed speakers and titles:

   - Leroy Hood: Biological networks and disease
   - Ruedi Aebersold: Protein-centered networks in systems biology:
   Analysis and visualization
   - Trey Ideker: Gaining power in gene association studies with
   Cytoscape
   - Andrew Hopkins: Network Pharmacology: chemical opportunities for
   systems biology
   - Peter Sorger: Modeling Mammalian Death and Survival Pathways
   - Ewan Birney: Reactome, Networks and Genomes
   - Rogier Versteeg: Oncogenic networks of cancer pathways in childhood
   cancer
   - Benno Schwikowski: Computational tools increasing sensitivity and
   reliability of mass spec-based proteomics
   - Chris Sander: Systems Biology of Cancer Pathways: from Molecular
   Perturbations to Cellular Phenotypes
   - Jean-Daniel Fekete: Visualizing Dense Networks with Enhanced and
   Hybrid Matrices


Please distribute the attached poster (pdf) in your institute.

The symposium is part of the 5th annual Cytoscape Public Symposium and
Developers Retreat.

For registration, Symposium programme and directions go to
www.cytoscape.org/retreat2007

We look forward to welcoming you to Amsterdam!

The Organizing Committee, 5th Cytoscape Retreat 2007
w: www.cytoscape.org/retreat2007
e: cytoretreat at cytoscape.org

Department of Human Genetics - M1-131
Academic Medical Center - University of Amsterdam
Meibergdreef 9 - 1100 DD - Amsterdam - the Netherlands

From jeedward at gmail.com  Fri Oct  5 20:24:33 2007
From: jeedward at gmail.com (John Edward)
Date: Fri, 5 Oct 2007 20:24:33 -0400
Subject: [BiO BB] BCBGC-2008 Call for papers
Message-ID: <000301c807af$475d0990$6401a8c0@cisnotebookbp>

Call for papers

 
The 2008 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-08) will be held July 7-10, 2008 in
Orlando, FL, USA (website: http://www.promoteresearch.org/). We invite draft
paper submissions and session proposals. The conference will be held at the
same time and place as several other major events. The website contains more
details.

 
Sincerely

John Edward

 
From christoph.gille at charite.de  Tue Oct  9 12:09:19 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 9 Oct 2007 18:09:19 +0200 (CEST)
Subject: [BiO BB] genbank2swissprot ?
Message-ID: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>

Is there a mapping from identifiers of GenBank nucleotide sequences
to identifiers of Swiss-Prot (protein) entries?
Using some tables in
ftp://ftp.ncbi.nih.gov/refseq/
ftp://ftp.ncbi.nih.gov/gene/DATA
there seems to be an indirect way via the proteinkb.
But perhaps there is a more direct way?
Many thanks

Christoph


From boris.steipe at utoronto.ca  Tue Oct  9 14:17:49 2007
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 9 Oct 2007 14:17:49 -0400
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
Message-ID: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>

Does the UniProt ID mapping service fit your requirements?
   http://www.pir.uniprot.org/search/idmapping.shtml


Boris


On 9-Oct-07, at 12:09 PM, Dr. Christoph Gille wrote:

> Is there a mapping of identifiers from genbank nt sequences
> to identifiers of swissprot (protein) ?
> Using some tables in
> ftp://ftp.ncbi.nih.gov/refseq/
> ftp://ftp.ncbi.nih.gov/gene/DATA
> there seems to be a way indirectly over the proteinkb.
> But perhaps there is a more direct way?
> Many thanks
>
> Christoph
>
> _______________________________________________
> General Forum at Bioinformatics.Org -  
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board


From christoph.gille at charite.de  Tue Oct  9 16:56:53 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 9 Oct 2007 22:56:53 +0200 (CEST)
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
	<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
Message-ID: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>

Many thanks Boris,

this is quite good, though the table is incomplete.
I think I saw somewhere
an assignment of mRNA to each Swiss-Prot entry.
Unfortunately, I cannot find it any more.

Christoph


From Stan.Gaj at BIGCAT.unimaas.nl  Wed Oct 10 07:58:55 2007
From: Stan.Gaj at BIGCAT.unimaas.nl (Gaj Stan (BIGCAT))
Date: Wed, 10 Oct 2007 13:58:55 +0200
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
	<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
Message-ID: <1C6586068DF1054DA3899159C38B32C90B04ABD1@um-mail0136.unimaas.nl>

Hi Christoph,

There are two other possibilities:

a) Use BioMART at www.ensembl.org to retrieve EnsEMBL gene IDs for your list of RefSeq IDs (you mentioned NM_ IDs, so I assume you mean RefSeq IDs) and export the list together with the UniProt cross-references. A problem you will surely encounter with this approach is that more than one UniProt ID can be associated with a single EnsEMBL gene. The generated list contains this information, but on separate lines, so you will need to filter the list for this.

b) The RefSeq group recently announced that they have updated their databases with cross-references to UniProt (they collaborated closely on this). I can't find the archive of their Gene-Announce list, but here is the announcement:

=======
Announcing the availability of RefSeq-UniProtKB cross-link data 
        In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI:
         
        1.   The  full report from Entrez Gene, in the Reference Sequences section. 
           For an example, go to the Full Report page for the sevenless gene of  Drosophila melanogaster  (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=DetailsSearch&Term=32039%5Buid%5D) and click on the Reference Sequences section in the table of contents on the right.  You will see

              mRNA and Protein(s) 
              NM_078559.2 -> NP_511114.2 sevenless CG18085-PA [Drosophila melanogaster]
             UniProtKB/Swiss-Prot  P13368    <--- new data 
         2. Links in NCBI's  Protein database 
            Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the 'Links' menu located at the upper right of the protein display page. The link names are:

              Protein (RefSeq):          provides a link from a Swiss-Prot record to the corresponding RefSeq record
              Protein (UniProtKB):       provides a link to the equivalent Swiss-Prot record
         3. Filter choices in NCBI's  Protein database 
                protein protein refseq2uniprot    find RefSeq protein records with a link to a UniProtKB protein in NCBI's protein database
                protein protein uniprot2refseq    find UniProtKB protein records with a link to a RefSeq protein in NCBI's protein database
        4. ftp sites 
                A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions.  The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions.
                        ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz 
         or 
                        ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz 
The README file on the gene and refseq ftp sites has been updated to document this addition. See: 
                        ftp://ftp.ncbi.nlm.nih.gov/gene/README 
                        ftp://ftp.ncbi.nlm.nih.gov/refseq/README 
        5. the ASN.1 in Entrez Gene 
         
          New implementation of a gene-commentary: 
     Each cross-reference will be reported in a gene-commentary of type other. Note: more than one cross-reference per RefSeq protein record is possible.
                          type other, 
                          source { 
                             { 
                              src { 
                                  db "UniProtKB/Swiss-Prot", 
                                  tag str "P23760" 
                             }, 
                              anchor "P23760" 
                             } 
                             { 
                              src { 
                                  db "UniProtKB/TrEMBL", 
                                  tag str "O23760" 
                             }, 
                             anchor "O23760" 
                            } 
                  

====
I haven't tested this one out myself, but I think it might do the trick for you (:
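
The ftp file from point 4 of the announcement could be loaded roughly as follows. This is only a minimal sketch: it assumes that gene_refseq_uniprotkb_collab.gz is tab-separated, that comment or header lines start with '#', and that the RefSeq protein accession is in the first column and the UniProtKB accession in the second (please check the README files mentioned above before relying on this layout). The function names and the sample rows are made up for illustration; only the NP_511114 / P13368 pair is taken from the announcement. Each RefSeq accession maps to a list, because more than one cross-reference per RefSeq protein is possible.

```python
import gzip
import io
from collections import defaultdict


def load_refseq_to_uniprot(lines):
    """Build {RefSeq protein accession: [UniProtKB accessions]}.

    Assumption: tab-separated rows, RefSeq accession in column 1,
    UniProtKB accession in column 2, '#' lines are headers/comments.
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing
        refseq_acc, uniprot_acc = fields[0], fields[1]
        if uniprot_acc not in mapping[refseq_acc]:
            mapping[refseq_acc].append(uniprot_acc)
    return dict(mapping)


def load_from_gzip(path):
    """Read the mapping from a local copy of the .gz file."""
    with gzip.open(path, "rt") as handle:
        return load_refseq_to_uniprot(handle)


# Illustrative rows only; the NP_000001 rows are invented.
sample = io.StringIO(
    "#NCBI_protein_accession\tUniProtKB_protein_accession\n"
    "NP_511114\tP13368\n"
    "NP_000001\tP00001\n"
    "NP_000001\tQ00001\n"
)
refseq_to_uniprot = load_refseq_to_uniprot(sample)
print(refseq_to_uniprot["NP_511114"])
```

Note that this table pairs protein accessions (NP_...) with UniProtKB entries. Starting from mRNA accessions (NM_...) would need one more lookup from mRNA to protein accession, which is not shown here.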

Best wishes,

  -- Stan




From dan.bolser at gmail.com  Wed Oct 10 05:26:07 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 10 Oct 2007 11:26:07 +0200
Subject: [BiO BB] genbank2swissprot ?
In-Reply-To: <61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>
References: <38572.141.42.56.114.1191946159.squirrel@webmail.charite.de>
	<3FDA43CE-8638-49B8-8C6A-652A0F02035C@utoronto.ca>
	<61125.84.190.65.179.1191963413.squirrel@webmail.charite.de>
Message-ID: <2c8757af0710100226r68e05c9elc1f03f636415036a@mail.gmail.com>

From a recent 'refseq-announce' email:

Announcing the availability of RefSeq-UniProtKB cross-link data

In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq
group is now reporting explicit cross-references to Swiss-Prot and TrEMBL
proteins that correspond to a RefSeq protein. These correspondences are
being calculated by the UniProtKB group, and will be updated every three
weeks to correspond to UniProt's release cycle. The data are being made
available from several sites within NCBI:

2. Links in NCBI's Protein database

Explicit links between corresponding RefSeq and Swiss-Prot proteins are now
provided within the NCBI Protein database. These links are available in
the 'Links' menu located at the upper right of the protein display page.
The link names are:

   Protein (RefSeq):     provides a link from a Swiss-Prot record to the
                         corresponding RefSeq record
   Protein (UniProtKB):  provides a link to the equivalent Swiss-Prot record

4. ftp sites

A new file was added to the gene and refseq ftp sites to report the
relationship between NCBI Reference Sequence protein accessions and
UniProtKB protein accessions. The new gene_refseq_uniprotkb_collab.gz file
specifies the corresponding pairs of NCBI and UniProtKB protein accessions.

   ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz
or
   ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz

The README file on the gene and refseq ftp sites has been updated to
document this addition. See:

   ftp://ftp.ncbi.nlm.nih.gov/gene/README
   ftp://ftp.ncbi.nlm.nih.gov/refseq/README

HTH,

Dan.


On 09/10/2007, Dr. Christoph Gille <christoph.gille at charite.de> wrote:
>
> Many thanks Boris,
>
> this is quite good, though the table is incomplete.
> I think  I somewhere saw
> an assignment of mRNA to each  swissprot entry.
> Unfortunately I do not find it any more.
>
> Christoph
>


-- 
hello


From yanz at email.unc.edu  Mon Oct 22 18:38:49 2007
From: yanz at email.unc.edu (Yan Zhang)
Date: Mon, 22 Oct 2007 18:38:49 -0400
Subject: [BiO BB] Call for Participation: NSF Biomedical Informatics
	Workshop, Dec.
Message-ID: <20071022183849.mxbzw6t7kk4ksow8@webmail3.isis.unc.edu>

Call for Participation: NSF Biomedical Informatics Workshop, Dec. 4-5, 
2007, Portland, OR

We are seeking US-based participants for the upcoming NSF-sponsored 
workshop on Biomedical Informatics. The core program of the workshop 
will feature speakers and panelists who have been invited to join the 
workshop. However, we have reserved a few speaker and panel slots to 
accommodate participants who have not been invited but may qualify 
based on current research and interests.

All costs associated with the workshop event will be covered for most 
participants selected based on this CFP. If you currently have funding 
from a US Government Agency to conduct research on an area related to 
the main themes of the workshop, it is highly likely you will be 
selected to participate and you will be reimbursed for your cost from 
_our_ NSF workshop grant. You will be expected to conduct a panel 
presentation or a poster presentation at the workshop. Please contact 
Javed Mostafa (jm at unc.edu) if you are interested in participating. In 
your email to Javed, do include a paragraph describing your current 
research and / or point to current research materials, and describe any 
funded projects you are currently engaged in or recently completed. We 
will be able to respond to you with an answer fairly soon after we 
receive your interest statement.

Below we are providing a brief overview of the workshop. More 
information on the workshop can be found at the workshop website: 
http://biomedweb.info.

We are planning to explore research challenges and emerging solutions 
for handling health data from the point of origin (i.e., data
collection) to the presentation and manipulation stages. We are also 
interested in exploring a wide spectrum of health data - starting at 
the genomics and cellular level, to the patients' level, and ultimately 
to the population level. Some specific topics that we will cover 
include:

* Data Acquisition: Capturing data, conversion to appropriate 
structures/formats, and natural language processing
* Data Standards: Limitations of current standards such as the HL7V3 
standard but also potential utility of emerging standards
* Semantic Interoperability: Vocabularies, ontologies, and techniques 
for semantic level sharing of data
* Data Management: Challenges associated with scale, heterogeneity, 
distributed, and fragmentary nature of data
* Data Presentation: Visual, adaptive, and optimal presentation of data 
for enhancing use and understanding
* Data Services: Emerging applications for supporting research, quality 
and safety management, public health studies, etc.

Beyond the areas above, we are also interested in the following
topics: Cyberinfrastructures for health information delivery, 
social/economical/political barriers in expanding access to health 
information, and privacy/security issues in health information 
delivery. We are planning to have panels focusing on these topics at 
the workshop.

To encourage further research on improving health care information 
systems, NSF has provided funds to Indiana University and University of 
North Carolina at Chapel Hill to conduct this workshop. Our aim is to 
bring together experts from research, professional, and government 
sectors. The workshop will dedicate two days to individual and panel 
presentations and birds-of-a-feather events to identify and prioritize 
key challenges. Current NSF and NIH grantees, academics with 
substantial track records in the area, industry experts, and program 
officers in key US agencies have been invited to participate. The 
workshop will be held on December 4th and 5th, 2007, in Portland, OR.

Yan Zhang [Submitted on behalf of Dr. Javed Mostafa: jm at unc.edu]


From ngadewal at yahoo.com  Tue Oct 23 06:24:49 2007
From: ngadewal at yahoo.com (nikhil gadewal)
Date: Tue, 23 Oct 2007 03:24:49 -0700 (PDT)
Subject: [BiO BB] sequence analysis
Message-ID: <838860.9612.qm@web51504.mail.re2.yahoo.com>

Hello all
   
  I am interested in finding tetramer peptides that match 100% to the carboxyl terminus of human protein sequences.
  Is there a specific tool available to do so?
  Can BLAST or FASTA handle it by changing the parameters?
   
  Thank you in advance.
   
  Nikhil


NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India  Great minds discuss ideas; Average minds discuss events; Small minds discuss people.

From marty.gollery at gmail.com  Tue Oct 23 19:27:26 2007
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 23 Oct 2007 16:27:26 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <bdd10c2a0710231627m2ff0dfbcnb4612cf6b7cfc5cf@mail.gmail.com>

For an exact match you can simply use grep, or open the target in an
editor such as textpad and use the search function.

Marty

On 10/23/07, nikhil gadewal <ngadewal at yahoo.com> wrote:
> Hello all
>
>   I am interested to find tetramer peptide matching 100% to the carboxyl terminal of the protein sequences from human.
>   Is there any specific tool available to do so.
>   Can BLAST or FASTA handle it by changing the parameters.
>
>   Thankyou in advance.
>
>   Nikhil
>
>
> NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India  Great minds discuss ideas; Average minds discuss events; Small minds discuss people.


-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451


From marty.gollery at gmail.com  Tue Oct 23 19:54:02 2007
From: marty.gollery at gmail.com (Martin Gollery)
Date: Tue, 23 Oct 2007 16:54:02 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <200710231843.55373.kanzure@gmail.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
	<bdd10c2a0710231627m2ff0dfbcnb4612cf6b7cfc5cf@mail.gmail.com>
	<200710231843.55373.kanzure@gmail.com>
Message-ID: <bdd10c2a0710231654u367c0b1ct866515eb081a4589@mail.gmail.com>

No, the proteins are not too big, only about 20 MB.

Marty

On 10/23/07, Bryan Bishop <kanzure at gmail.com> wrote:
>
> "Of the protein sequences from human." Martin, it sounds like what
> Nikhil wants to do would require much more memory than textpad would be
> allowed to allocate. Right?
>
> - Bryan
>
> On Tuesday 23 October 2007 18:27, Martin Gollery wrote:
> > For an exact match you can simply use grep, or open the target in an
> > editor such as textpad and use the search function.
> >
> > Marty
> >
> > On 10/23/07, nikhil gadewal <ngadewal at yahoo.com> wrote:
> > > Hello all
> > >
> > >   I am interested to find tetramer peptide matching 100% to the
> > > carboxyl terminal of the protein sequences from human. Is there any
> > > specific tool available to do so.
> > >   Can BLAST or FASTA handle it by changing the parameters.
> > >
> > >   Thankyou in advance.
> > >
> > >   Nikhil
>


-- 
-- 
Martin Gollery
Senior Bioinformatics Scientist
TimeLogic- a Division of Active Motif
775-833-9113
880 Northwood Blvd. Suite 7
Incline Village, NV 89451


From marchywka at hotmail.com  Tue Oct 23 21:13:28 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Tue, 23 Oct 2007 21:13:28 -0400
Subject: [BiO BB] sequence analysis
In-Reply-To: <bdd10c2a0710231654u367c0b1ct866515eb081a4589@mail.gmail.com>
Message-ID: <BAY108-F22F12931B44625B5F32F29BE940@phx.gbl>

> > > >   I am interested to find tetramer peptide matching 100% to the
> > > > carboxyl terminal of the protein sequences from human. Is there any
> > > > specific tool available to do so.
> > > >   Can BLAST or FASTA handle it by changing the parameters.

I took a quick look at the BLAST documentation and didn't find anything.
I also tried the standard Perl regex anchors $ and ^, but they didn't seem
to work.

What is your objective? I've had a heck of a time with downloaded PROSITE
rules that have anchors, as I have to turn them off for translation products
from non-edited transcripts. The reason I mention this is that you may in
fact not want to limit yourself until you see what the other hits look like.
With only a few acids you may have to sort through a lot of hits, but also
see if the conserved domain tools offer any help, as apparently some of
these can be position sensitive.

I've got my own code that does the opposite of what you want ( :) ).
That is, given a few hundred sequences and some patterns in the form of
Perl regex, I can find each one in each sequence and use this information
for alignment and clustering (at least that is the hope, and initial results
don't look foolish).

If there is a real need, I guess I could modify this to do the opposite.
That is, if you get a few thousand proteins that contain your query anywhere,
and you can't separate them with existing tools but you can phrase your
query as a Perl regex, I may have something that helps.


But sure, you can just download all the BLAST hits that have your peptide
anywhere and grep for all the occurrences anchored to the end (this can be
a hassle, as some straddle line breaks, etc.).


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.



From christoph.gille at charite.de  Wed Oct 24 03:17:08 2007
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Wed, 24 Oct 2007 09:17:08 +0200 (CEST)
Subject: [BiO BB] sequence analysis
In-Reply-To: <838860.9612.qm@web51504.mail.re2.yahoo.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>

You could use fgrep. Fgrep is faster than grep.

First, you should transform the database file so that
each sequence takes up one line, without blanks and without
line breaks (using tr and sed).

Database files are formatted for punched cards for
historical reasons. Lines are wrapped after roughly 72
characters, which prevents the direct use of fgrep.
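
The same idea (join the wrapped lines first, then do a literal match) can also be sketched in Python. This is an illustration, not a tested tool: the function names, the sample records and the peptide GSKL are invented, and the input is assumed to be plain FASTA with '>' header lines. Because each sequence is joined into a single string, a match that straddles a line break is not missed, and endswith() restricts the hit to the carboxyl terminus.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_hits(lines, peptide):
    """Return headers of sequences whose last residues equal `peptide`."""
    peptide = peptide.upper()
    return [
        header
        for header, seq in read_fasta(lines)
        # A trailing '*' (stop symbol in some files) is ignored.
        if seq.upper().rstrip("*").endswith(peptide)
    ]


# Made-up records; the second sequence is wrapped across two lines.
example = """>protA hypothetical
MKTAYIAKQR
>protB hypothetical
MSTNPKPQRKTKRNTNRRPQ
DVKFPGGSKL
>protC hypothetical
MGSKLAAA
""".splitlines()

hits = c_terminal_hits(example, "GSKL")
print(hits)
```

For a real database, the list passed in would be replaced by an open file handle, e.g. `c_terminal_hits(open("human_proteins.fasta"), "GSKL")`, where the file name is again only a placeholder.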


From Sterten at aol.com  Wed Oct 24 03:26:09 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Wed, 24 Oct 2007 03:26:09 EDT
Subject: [BiO BB] genbank orthography
Message-ID: <d43.156b8eeb.34504d91@aol.com>


Names are not spelled uniformly, e.g. Viet Nam and Vietnam, and there are
also many typos. This makes it very difficult to sort and analyse the entries
by computer.
I'm looking for a complete list of different spellings
(thousands of entries...) and the suggested standard so we can
correct and standardize them automatically.


From dan.bolser at gmail.com  Wed Oct 24 04:41:05 2007
From: dan.bolser at gmail.com (Dan Bolser)
Date: Wed, 24 Oct 2007 10:41:05 +0200
Subject: [BiO BB] genbank orthography
In-Reply-To: <d43.156b8eeb.34504d91@aol.com>
References: <d43.156b8eeb.34504d91@aol.com>
Message-ID: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>

On 24/10/2007, Sterten at aol.com <Sterten at aol.com> wrote:
>
> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
> also many typos, this makes it very difficult to sort and analyse the  entries
> by computer.
> I'm looking for a complete list of different spellings
> (thousands of entries...) and the suggested standard so we can
> correct/uniformify them automatically.

Great idea. The PDB needs something similar also!




-- 
hello


From marchywka at hotmail.com  Wed Oct 24 08:27:36 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 24 Oct 2007 08:27:36 -0400
Subject: [BiO BB] sequence analysis
In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
Message-ID: <BAY108-F40E1289C88D8EE927C62B2BE940@phx.gbl>


>Before, you should transform the database file such that

I've taken my local BLAST databases and used their FASTA form for
"grepping" (using my own code that calls either the GRETA or Boost regex
libraries) against various genome sequences. It turns out to be too slow
for repetitive usage, but I would comment as follows.

The patterns of biological interest tend to be subsets of regex, so you can
implement special code that is a lot faster when your query isn't
BLAST-friendly. For example, a "conserved" domain may look like
"neutral" - many irrelevant - cysteine - X - cysteine - many irrelevant - H -
etc. (I just made this up, but it is based on many things I've seen in the
literature). You may have a hard time BLASTing for this, but you can grep
for it with something like
[ANCQGILMFPSTWYV].{50,60}C.C.{10,100}H

If you want a real-life example, here are some from PROSITE using my
PROSITE-to-Perl translation scheme (I hate illustrating with real things
that may not be right):

[LIVM][VIC].[^H]G[DENQTA].[GAC][^L].[LIVMFY]{4}.{2}G >rule|16|PEPDTIDE Prosite CNMP_BINDING_1
[EQ][^LNYH].[ATV][FY][^LDAM][^T]W[^PG]N >rule|18|PEPDTIDE Prosite ACTININ_1
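
A translation of this kind might look roughly like the sketch below. It is not the scheme used for the rules above, only an illustration of the usual PROSITE conventions: elements are separated by '-', x means any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) gives a repeat count, and < and > anchor the pattern to the N- and C-terminus. The function name is made up, and the PROSITE-style input strings were reconstructed by reversing the two regexes above, not copied from PROSITE. Anchors written inside brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Perl/Python-style regex."""
    pattern = pattern.strip().rstrip(".")
    regex_parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (4) or (2,5).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                   # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # Bracketed sets [ABC] and single residues pass through unchanged.
        if repeat:
            core += "{" + repeat + "}"
        regex_parts.append(prefix + core + suffix)
    return "".join(regex_parts)


# Input reconstructed from the first regex in the message (an assumption).
cnmp = prosite_to_regex(
    "[LIVM]-[VIC]-x-{H}-G-[DENQTA]-x-[GAC]-{L}-x-[LIVMFY](4)-x(2)-G"
)
print(cnmp)
```

With the < and > anchors kept in the output, the C-terminal restriction asked about earlier in the thread becomes an ordinary `$` at the end of the regex, provided each sequence is matched as a single string without line breaks.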


From what I've seen, this is too slow for grep against many genes (or
pre-translated peptides), but you can compile the query and target for much
faster searching (similar to a transient database index). Even literal
string matching can be slow without doing this: I have 500k empirically
discovered (highly redundant, lots of junk) repeats that I can now label
against 100 sequences of 60 kb each in "reasonable" time, which I could
not do before. This works fine for the 600 or so miRNA sequences I finally
figured out how to download from Sanger too :)
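
One possible reading of "compiling the target" into a transient index is a simple word (k-mer) table built once per run, so that each literal query only has to be checked at positions where its first k letters occur. The sketch below is an assumption about the general technique, not the code described above; the function names, the toy sequences and the word size k=4 are all invented.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-mer to the (name, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((name, offset))
    return index


def find_literal(query, sequences, index, k):
    """Locate exact occurrences of `query` (length >= k) using the index."""
    if len(query) < k:
        raise ValueError("query shorter than index word size")
    hits = []
    # Only positions sharing the first k letters need a full comparison.
    for name, offset in index.get(query[:k], []):
        if sequences[name].startswith(query, offset):
            hits.append((name, offset))
    return hits


# Toy data; real targets would be the 60 kb sequences mentioned above.
sequences = {"seq1": "ACGTACGTTAGC", "seq2": "TTAGCACGT"}
index = build_kmer_index(sequences, k=4)
print(find_literal("ACGTT", sequences, index, k=4))
print(find_literal("TTAGC", sequences, index, k=4))
```

The index costs memory proportional to the total target length, which is the price paid for not rescanning every target for each of the many queries.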


Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.





From Sterten at aol.com  Wed Oct 24 09:38:00 2007
From: Sterten at aol.com (Sterten at aol.com)
Date: Wed, 24 Oct 2007 09:38:00 EDT
Subject: [BiO BB] genbank orthography
Message-ID: <d50.15cd4aa1.3450a4b8@aol.com>

 
I once made a list for influenza, but it is not nearly complete: just a few
hundred of the most common (mis-)spellings, i.e. with respect to the genes.
 
 
In an email dated 24.10.2007 10:41:32 Western European Standard Time,
dan.bolser at gmail.com writes:

On 24/10/2007, Sterten at aol.com <Sterten at aol.com> wrote:
>
> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
> also many typos, this makes it very difficult to sort and analyse the
> entries by computer.
> I'm looking for a complete list of different spellings
> (thousands of entries...) and the suggested standard so we can
> correct/uniformify them automatically.

Great idea. The PDB needs something similar also!


From marchywka at hotmail.com  Wed Oct 24 10:25:25 2007
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 24 Oct 2007 10:25:25 -0400
Subject: [BiO BB] genbank orthography
In-Reply-To: <d43.156b8eeb.34504d91@aol.com>
Message-ID: <BAY108-F35C237AD6323D2B2F1A755BE940@phx.gbl>


I deleted most of the posts on this thread, but as with the other thread:
if you can reduce the db to text, there are plenty of good tools for
one-time text processing. This is easy with sed and perl. There are
indexing scripts on the web that are only 10-20 lines long. It isn't hard
to find typos in such a list, with or without a spelling dictionary.
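
As a rough illustration of the kind of one-time text processing described
above (shown here in Python rather than sed or perl), the sketch below
groups names that differ only in case, spacing, or punctuation, so that
variants such as "Viet Nam" and "Vietnam" land in the same bucket. The
function names and the sample list are hypothetical, and genuine typos
would need fuzzier matching than this simple normalization.

```python
import re
from collections import defaultdict


def normalize(name):
    """Collapse a name to a comparison key: lowercase ASCII letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_variants(names):
    """Map each normalized key to the sorted list of distinct spellings sharing it."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name.strip())
    # Keep only keys that have more than one distinct spelling.
    return {key: sorted(spellings)
            for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    # Hypothetical sample of country fields pulled from database entries.
    sample = ["Viet Nam", "Vietnam", "Thailand", "Hong Kong", "HongKong"]
    for key, spellings in group_variants(sample).items():
        print(key, "->", spellings)
```

Each reported group still needs a human (or a gazetteer) to decide which
spelling is the standard one.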


>From: Sterten at aol.com
>Reply-To: "General Forum at Bioinformatics.Org" 
><bio_bulletin_board at bioinformatics.org>
>To: bio_bulletin_board at bioinformatics.org
>Subject: [BiO BB] genbank orthography
>Date: Wed, 24 Oct 2007 03:26:09 EDT
>
>
>names are not spelled uniformly, e.g. Viet Nam and Vietnam,
>also many typos, this makes it very difficult to sort and analyse the  
>entries
>by computer.
>I'm looking for a complete list of different spellings
>(thousands of entries...) and the suggested standard so we can
>correct/uniformify them automatically.


From kanzure at gmail.com  Tue Oct 23 19:43:55 2007
From: kanzure at gmail.com (Bryan Bishop)
Date: Tue, 23 Oct 2007 18:43:55 -0500
Subject: [BiO BB] sequence analysis
In-Reply-To: <bdd10c2a0710231627m2ff0dfbcnb4612cf6b7cfc5cf@mail.gmail.com>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
	<bdd10c2a0710231627m2ff0dfbcnb4612cf6b7cfc5cf@mail.gmail.com>
Message-ID: <200710231843.55373.kanzure@gmail.com>


"Of the protein sequences from human." Martin, it sounds like what 
Nikhil wants to do would require much more memory than textpad would be 
allowed to allocate. Right?

- Bryan

On Tuesday 23 October 2007 18:27, Martin Gollery wrote:
> For an exact match you can simply use grep, or open the target in an
> editor such as textpad and use the search function.
>
> Marty
>
> On 10/23/07, nikhil gadewal <ngadewal at yahoo.com> wrote:
> > Hello all
> >
> >   I am interested to find tetramer peptide matching 100% to the
> > carboxyl terminal of the protein sequences from human. Is there any
> > specific tool available to do so.
> >   Can BLAST or FASTA handle it by changing the parameters.
> >
> >   Thankyou in advance.
> >
> >   Nikhil


From Lambert at Chatham.edu  Tue Oct 23 20:46:51 2007
From: Lambert at Chatham.edu (Lambert, Lisa)
Date: Tue, 23 Oct 2007 20:46:51 -0400
Subject: [BiO BB] sequence analysis
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
Message-ID: <7980CF1A43C6564184A9A92D278F839E06AFBDDD@hickory.chatham.local>

I would suggest using PatScan: http://www-unix.mcs.anl.gov/compbio/PatScan/.
Unlike doing a plain text search, it will let you specify that the pattern must be at the end or the beginning of a sequence. 
 
Lisa
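
For the exact-match case in the original question, anchoring at the
C-terminus can also be sketched in a few lines of Python. This is not
PatScan; it is a minimal illustration that assumes the proteins are
available as FASTA text, and the function names and sample records are
hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_matches(lines, peptide):
    """Return headers of sequences whose last residues equal `peptide`."""
    peptide = peptide.upper()
    hits = []
    for header, seq in read_fasta(lines):
        # Some files mark the stop codon with a trailing '*'; ignore it.
        if seq.upper().rstrip("*").endswith(peptide):
            hits.append(header)
    return hits


if __name__ == "__main__":
    # Made-up records; in practice pass an open file handle instead.
    sample = [">p1 demo", "MKTAYIAKQR", "QISFVKDEL", ">p2 demo", "MKDELAAA"]
    print(c_terminal_matches(sample, "KDEL"))
```

Because the sequence is joined before the comparison, line wrapping in
the FASTA file does not affect the result.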

	-----Original Message----- 
	From: bio_bulletin_board-bounces+lambert=chatham.edu at bioinformatics.org on behalf of nikhil gadewal 
	Sent: Tue 10/23/2007 6:24 AM 
	To: bio_bulletin_board at bioinformatics.org 
	Cc: 
	Subject: [BiO BB] sequence analysis[Scanned]
	
	
	Hello all
	  
	  I am interested to find tetramer peptide matching 100% to the carboxyl terminal of the protein sequences from human.
	  Is there any specific tool available to do so.
	  Can BLAST or FASTA handle it by changing the parameters.
	  
	  Thankyou in advance.
	  
	  Nikhil
	
	
	NIKHIL S. GADEWAL ACTREC, Tata Memorial Centre, Kharghar, Navi Mumbai, India  Great minds discuss ideas; Average minds discuss events; Small minds discuss people.
	

From Richard.Squires at UTSouthwestern.edu  Wed Oct 24 11:10:48 2007
From: Richard.Squires at UTSouthwestern.edu (Richard Squires)
Date: Wed, 24 Oct 2007 10:10:48 -0500
Subject: [BiO BB] genbank orthography
Message-ID: <471F1A28020000E50003C47E@swnw124.swmed.edu>

For correct spellings you can use a gazetteer. As for a resource of misspellings, I am not aware of one.

Burke


---
Burke Squires
BioHealthBase BRC
Influenza Bioinformaticist
University of Texas Southwestern Medical Center at Dallas
richard.squires at utsouthwestern.edu
(214) 648-4952
>>> <Sterten at aol.com> 10/24/07 2:26 AM >>>

names are not spelled uniformly, e.g. Viet Nam and Vietnam,
also many typos, this makes it very difficult to sort and analyse the  entries
by computer.
I'm looking for a complete list of different spellings
(thousands of entries...) and the suggested standard so we can 
correct/uniformify them automatically.




From ma11 at gen.cam.ac.uk  Wed Oct 24 12:58:38 2007
From: ma11 at gen.cam.ac.uk (Michael Ashburner)
Date: Wed, 24 Oct 2007 18:58:38 +0200
Subject: [BiO BB] genbank orthography
In-Reply-To: <2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
References: <d43.156b8eeb.34504d91@aol.com>
	<2c8757af0710240141l288546e0qf90cd627a746aa11@mail.gmail.com>
Message-ID: <F88B97BB-19DC-4A34-A6B9-01340961F82E@gen.cam.ac.uk>

I agree it is a terrible mess. Note the new gaz.obo project. This is an
attempt to build an artefact in OBO format for geographical locations.
The current version has about 20,000 locations. We have a parse of about
45,000 from Genbank, but it will take some time to check them and get
them into this file.

The file is available from the OBO CVS site:

http://obo.cvs.sourceforge.net/obo/obo/ontology/environmental/gaz.obo?view=log


Michael Ashburner

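
To show how a file in OBO format could feed the automatic correction
asked for in this thread, here is a minimal Python sketch that reads the
name: and synonym: tags from [Term] stanzas and builds a lookup from each
spelling to the preferred name. The sample stanzas, their identifiers,
and the function name are made up for illustration and are not taken from
gaz.obo; the sketch also assumes that stanzas are separated by blank
lines.

```python
import re

# Captures the quoted text of a synonym tag, e.g.  synonym: "Viet Nam" EXACT []
SYNONYM_RE = re.compile(r'^synonym:\s*"([^"]*)"')

# Hypothetical sample; the identifiers are placeholders, not real gaz.obo IDs.
SAMPLE = """\
[Term]
id: EXAMPLE:0000001
name: Vietnam
synonym: "Viet Nam" EXACT []
synonym: "Viet-Nam" EXACT []

[Term]
id: EXAMPLE:0000002
name: Thailand
"""


def build_spelling_map(obo_text):
    """Return {lowercased spelling: preferred name} from OBO [Term] stanzas."""
    spelling_map = {}
    for stanza in re.split(r"\n\s*\n", obo_text.strip()):
        lines = [line.strip() for line in stanza.splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the file header and other stanza types
        name, synonyms = None, []
        for line in lines[1:]:
            if line.startswith("name:"):
                name = line[len("name:"):].strip()
            else:
                match = SYNONYM_RE.match(line)
                if match:
                    synonyms.append(match.group(1))
        if name:
            for spelling in [name] + synonyms:
                spelling_map[spelling.lower()] = name
    return spelling_map


if __name__ == "__main__":
    lookup = build_spelling_map(SAMPLE)
    print(lookup.get("viet nam"))
```
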
On 24 Oct 2007, at 10:41, Dan Bolser wrote:

> On 24/10/2007, Sterten at aol.com <Sterten at aol.com> wrote:
>>
>> names are not spelled uniformly, e.g. Viet Nam and Vietnam,
>> also many typos, this makes it very difficult to sort and analyse  
>> the  entries
>> by computer.
>> I'm looking for a complete list of different spellings
>> (thousands of entries...) and the suggested standard so we can
>> correct/uniformify them automatically.
>
> Great idea. The PDB needs something similar also!


From postelv at gmail.com  Thu Oct 25 06:33:38 2007
From: postelv at gmail.com (vladi postal)
Date: Thu, 25 Oct 2007 12:33:38 +0200
Subject: [BiO BB] readseq(converting genbank to gcg format)
Message-ID: <81e4d3400710250333k75b9823fv80480ba8280a23e0@mail.gmail.com>

Hi,
I use the readseq program on Unix to convert GenBank files to GCG format.
The problem is that the program doesn't work for some files.
For example, gbpat2.seq, gbpat3.seq and gbpat8.seq don't work, but
gbpat1.seq, gbpat4.seq, gbpat6.seq and gbpat7.seq work fine.
Does anyone have the same problem?
How can I solve it?
Any help will be appreciated.

vladi


From jeff at bioinformatics.org  Fri Oct 26 18:00:05 2007
From: jeff at bioinformatics.org (J.W. Bizzaro)
Date: Fri, 26 Oct 2007 18:00:05 -0400
Subject: [BiO BB] Course: Mitochondriomics
Message-ID: <47226365.8030309@bioinformatics.org>

                                Mitochondriomics

                   Online at the Bioinformatics Organization
               In collaboration with Roskilde University, Denmark

                               November 5-9, 2007

CONTENTS:

    1. BACKGROUND
    2. OBJECTIVE
    3. INSTRUCTORS
    4. COURSE OUTLINE
    5. ADDITIONAL INFORMATION

----------------------------------------
1. BACKGROUND
----------------------------------------

Mitochondria are semiautonomous organelles, presumed to be the evolutionary product of a symbiosis between a eukaryote and a prokaryote. The organelle is present in almost all eukaryotic cells, in numbers ranging from 10^3 to 10^4 copies. The main functions of mitochondria are the production of ATP by oxidative phosphorylation and involvement in apoptosis. The organelles contain almost exclusively maternally inherited mtDNA, and they have specific systems for transcription, translation and replication of mtDNA.

Mitochondrial dysfunction has been correlated with mitochondrial diseases, whose clinical pathologies are believed to include infertility, diabetes, blindness, deafness, stroke, migraine, and heart, kidney, and liver diseases. Recently, cancer was added to this list when investigations into human cancer cells from breast, bladder, neck, and lung revealed a high occurrence of mutations in mtDNA. With the emerging understanding of the role of mitochondria in a vast array of pathologies, research on mitochondria and mitochondrial dysfunction has in the last decade yielded a huge amount of data in the form of publications and databases. Nevertheless, the field of mitochondrial research is still far from exhausted, with many unknown factors yet to be discovered.

----------------------------------------
2. OBJECTIVE
----------------------------------------

The purpose of this course is to introduce the student to the various databases and wet-lab methods available. Furthermore, through selected articles, the course will give an understanding of the pitfalls and limitations of the various databases and methods.

----------------------------------------
3. INSTRUCTORS
----------------------------------------

    * Claus Desler (cdesler at ruc.dk)
    * Prashanth Suravajhala (prash at ruc.dk) 

----------------------------------------
4. COURSE OUTLINE
----------------------------------------

    * Day 1: Introduction to mitochondria and its pathways, genetics; proteomics of mitochondria 

    * Day 2: Various assays used and advances in mitochondrial research 

    * Day 3: Tools and databases used in mitochondrial research; exercises 

    * Day 4: Exercises, report, review of literature continues 

    * Day 5: Summary and questions and answers; evaluation 

----------------------------------------
5. ADDITIONAL INFORMATION
----------------------------------------

Please visit:
    * http://wiki.bioinformatics.org/BI221A_Mitochondriomics


From harry.mangalam at uci.edu  Fri Oct 26 20:07:11 2007
From: harry.mangalam at uci.edu (Harry Mangalam)
Date: Fri, 26 Oct 2007 17:07:11 -0700
Subject: [BiO BB] sequence analysis
In-Reply-To: <37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
References: <838860.9612.qm@web51504.mail.re2.yahoo.com>
	<37573.141.42.56.114.1193210228.squirrel@webmail.charite.de>
Message-ID: <200710261707.11239.harry.mangalam@uci.edu>

And not to start a 'grep' war, but nrgrep and agrep are also faster 
than either fgrep or grep and allow searching with errors.  nrgrep 
decays more smoothly with more complex patterns but agrep has more 
implementations.
Both allow searching across arbitrary boundaries.
Both are free.
hjm
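
What "searching with errors" means can be illustrated with a small Python
sketch. It is a naive scan that reports every position where a pattern
matches with at most a given number of substituted characters. It is not
how nrgrep or agrep work internally; as far as I know, both rely on much
faster bit-parallel algorithms and also handle insertions and deletions.
The function name is hypothetical.

```python
def find_with_mismatches(text, pattern, max_mismatches):
    """Return start positions where `pattern` matches `text` with at most
    `max_mismatches` substituted characters (no insertions or deletions)."""
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many errors at this position
        else:
            # The inner loop finished without exceeding the error budget.
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Made-up sequence fragment and patterns.
    print(find_with_mismatches("MKTAYIAKQR", "TAYI", 0))
    print(find_with_mismatches("MKTAYIAKQR", "TAYL", 1))
```

The scan costs time proportional to the text length times the pattern
length, which is fine for a one-off search but slow next to the dedicated
tools.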

On Wednesday 24 October 2007, Dr. Christoph Gille wrote:
> You could use fgrep. Fgrep is faster than grep.
>
> Before, you should transform the database file such that
> each sequence takes one line without blank and without
> line breaks (using tr and sed)
>
> Database files are formatted for punched cards for
> historical reasons. Lines are wrapped after at least 72
> characters, which keeps fgrep from finding matches that span a line break.
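
The transformation described above (done there with tr and sed) can be
sketched in Python as well. This minimal version assumes FASTA input and
writes one record per line as header, a tab, and the unwrapped sequence,
so that a fixed-string search can no longer miss a match that was split
by a line break. The function name and the output layout are assumptions
made for illustration.

```python
def linearize_fasta(lines):
    """Yield 'header<TAB>sequence' strings, one per FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header + "\t" + "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Drop blanks inside the sequence and join the wrapped pieces.
            chunks.append("".join(line.split()))
    if header is not None:
        yield header + "\t" + "".join(chunks)


if __name__ == "__main__":
    # Made-up record whose sequence is wrapped in the middle of "AKQR".
    wrapped = [">p1\n", "MKTAYIAK\n", "QRQISFVK\n"]
    for record in linearize_fasta(wrapped):
        print(record)
        print("AKQR" in record)
```

The linearized output can then be searched with fgrep, or with any of the
other tools mentioned in this thread.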


-- 
Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, 
UC Irvine 92697  949 824 0084(o), 949 285 4487(c) 
harry.mangalam at uci.edu


From me.lixue at gmail.com  Sat Oct 27 00:14:06 2007
From: me.lixue at gmail.com (Xue Li)
Date: Fri, 26 Oct 2007 23:14:06 -0500
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
Message-ID: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>

Hello all,

Does anyone know of a good HMM (Hidden Markov Model) package? It would be
perfect if it were written in Perl, or could be called from Perl.

Thanks a lot!

-- 
Xue, Li
Bioinformatics and Computational Biology program @ ISU
Ames, IA 50010
515-450-7183


From landman at scalableinformatics.com  Sat Oct 27 00:20:14 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 27 Oct 2007 00:20:14 -0400
Subject: [BiO BB] need help on HMM(Hidden Markov Model) package
In-Reply-To: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
References: <62ed16460710262114u2fe5fb98ie8efcf03daef7947@mail.gmail.com>
Message-ID: <4722BC7E.8010207@scalableinformatics.com>

Xue Li wrote:
> Hello all,
> 
> Does anyone know some good HMM(Hidden Markov Model) package? It would be
> perfect if it is written in Perl, or can be called in Perl.

HMMer (http://hmmer.janelia.org/) is one of the standard tools, and it
can be used from within BioPerl (http://www.bioperl.org/wiki/Main_Page)
fairly easily (see
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SearchIO/hmmer.html).

> Thank a lot!

Joe


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615