Bioinformatics.org
  • Latest announcements
    Software: Genozip: A universal compressor for genomic files
    Submitted by Divon Lan; posted on Tuesday, July 20, 2021

    Genozip is a universal compressor for genomic files – it is optimized to compress FASTQ, SAM/BAM/CRAM, VCF/BCF, FASTA, GVF, PHYLIP, Chain, Kraken and 23andMe files, but it can also compress any other file (including non-genomic files).

    Typically, Genozip achieves a 2X-5X improvement over the existing compression when compressing already-compressed files such as .fastq.gz, .bam, and .vcf.gz, and up to 200X for a high-sample-count VCF file.

    Yes, Genozip can compress already-compressed files (.gz .bz2 .xz .bam .cram).

    The compression is lossless – the decompressed file is 100% identical to the original file.
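
The lossless claim above can be checked for any compressor with a simple round trip: compress, decompress, and compare digests. The sketch below is only an illustration of that check. It uses Python's standard `lzma` module as a stand-in (it does not call Genozip), and the FASTQ-like record and the function names are made up for the example.

```python
import hashlib
import lzma


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex string."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_is_lossless(original: bytes) -> bool:
    """Compress and decompress, then compare digests of input and output."""
    # lzma is a stand-in compressor for this sketch; it is NOT Genozip.
    restored = lzma.decompress(lzma.compress(original))
    return sha256_hex(restored) == sha256_hex(original)


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the original divided by the size of the compressed form."""
    return len(original) / len(compressed)


# Hypothetical FASTQ-like record repeated many times (illustrative data only).
record = b"@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
data = record * 1000

print("lossless:", roundtrip_is_lossless(data))
print("ratio > 1:", compression_ratio(data, lzma.compress(data)) > 1.0)
```

A ratio such as "2X-5X" in the announcement is this kind of size quotient, measured against the already-compressed file rather than the raw one.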

    Details: genozip.com. Available on conda (conda-forge channel) and github.com/divonlan/genozip

    Reference:
    Lan, D., et al. (2021) Genozip: a universal extensible genomic data compressor. Bioinformatics, btab102, doi.org/10.1[...]ab102

    ABSTRACT

    The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.

    Full article: www.frontiersin.org/arti[...]/full
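
As a rough illustration of co-occurrence analysis, the sketch below counts how often pairs of entities appear in the same abstract and scores each pair with pointwise mutual information (PMI). PMI is a common choice for this kind of scoring, but it is an assumption here and not necessarily the statistic PubChem uses. The entity lists are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count documents per entity and per unordered entity pair."""
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in docs:
        unique = sorted(set(entities))  # count each entity once per document
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return entity_counts, pair_counts


def pmi(a, b, entity_counts, pair_counts, n_docs):
    """Pointwise mutual information: log2(p(a, b) / (p(a) * p(b)))."""
    joint = pair_counts.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return float("-inf")  # the pair never co-occurs
    p_ab = joint / n_docs
    p_a = entity_counts[a] / n_docs
    p_b = entity_counts[b] / n_docs
    return math.log2(p_ab / (p_a * p_b))


# Toy "abstracts", each reduced to the entities matched in it (made-up data).
docs = [
    ["aspirin", "inflammation"],
    ["aspirin", "inflammation", "PTGS2"],
    ["aspirin", "headache"],
    ["caffeine", "headache"],
]
entity_counts, pair_counts = cooccurrence_counts(docs)

for pair in [("aspirin", "inflammation"), ("aspirin", "headache")]:
    score = pmi(pair[0], pair[1], entity_counts, pair_counts, len(docs))
    print(pair, round(score, 3))
```

A positive score means the pair co-occurs more often than expected if the two entities were independent; a production system would also need the redundancy removal and sampling steps that the abstract mentions.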

    IPC 2.0 – Isoelectric Point Calculator 2.0 is a web service and a standalone program for the estimation of protein and peptide isoelectric point (pI) and dissociation constant (pKa) values using a mixture of deep learning and support vector regression models.

    Input: amino acid sequence(s)
    Output: pI values predicted by >15 methods, along with pKa dissociation constants for charged residues

    Isoelectric point, the pH at which a particular molecule carries no net electrical charge, is a critical parameter for many analytical biochemistry and proteomics techniques, especially for 2D gel electrophoresis (2D-PAGE), capillary isoelectric focusing (cIEF), X-ray crystallography, and liquid chromatography-mass spectrometry (LC-MS).
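
To make the definition concrete, here is a minimal sketch of the classical way to estimate pI: compute the net charge from the Henderson-Hasselbalch equation and bisect for the pH where it crosses zero. This is not how IPC 2.0 works (it uses deep learning and support vector regression). The pKa values below are approximate, illustrative numbers chosen for the example, not an optimized set from any tool.

```python
# Illustrative, approximate pKa values (an assumption for this sketch only).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # The termini contribute one positive and one negative group.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence: str, iterations: int = 60) -> float:
    """Bisect on pH in [0, 14] for the point where the net charge is zero."""
    low, high = 0.0, 14.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        # Net charge decreases monotonically as pH rises.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical peptides: a basic one should have a higher pI than an acidic one.
print(isoelectric_point("KKKK") > isoelectric_point("DDDD"))
```

The result depends heavily on the pKa set chosen, which is one reason different prediction methods disagree.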

    According to the benchmarks, IPC 2.0 predicts pI more accurately than previous algorithms, with a lower root-mean-square deviation (RMSD) for both proteins (0.848 versus 0.868) and peptides (0.222 versus 0.405). Moreover, IPC 2.0 predicted pKa from sequence information alone more accurately than structure-based methods (0.576 versus 0.826) and several times faster.
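
For readers unfamiliar with the metric, the sketch below shows how a root-mean-square deviation between predicted and experimental values is computed; lower is better. It assumes RMSD here means the usual root-mean-square deviation and does not reproduce the benchmark protocol of the paper. The sample values are invented.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between two equal-length value lists."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predicted and experimental pI values for three proteins.
predicted_pi = [5.0, 7.0, 9.0]
experimental_pi = [6.0, 6.0, 9.0]
print(round(rmsd(predicted_pi, experimental_pi), 3))
```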

    AVAILABILITY

    IPC 2.0 is available at www.ipc2-isoelectric-point.org

    REFERENCE

    Kozlowski LP (2021) IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Res. DOI: doi.org/10.1093/nar/gkab295
    Software: BIRCH Bioinformatics System v3.80
    Submitted by Brian Fristensky; posted on Thursday, July 08, 2021

    BIRCH 3.80 now available for download at home.cc.umanitoba.ca/~psg[...].html

    NEW

    • Improvements to BLAST search results
    • Improvements and updates for multiple alignment and phylogeny
    • Numerous updated software packages
    • Extensive improvements to pre-processing and QC of sequencing reads

    BIRCH unifies hundreds of popular bioinformatics tools through the BioLegato family of object-oriented applications. BioLegato makes it easy to try out new programs, and to experiment with your data at every step in the analytical process.

    Visit our YouTube Channel at www.youtube.com/chan[...]ublic

    EXCERPT

    A study involving virtual rather than real patients was as effective as traditional clinical trials in evaluating a medical device used to treat brain aneurysms, according to new research.

    The findings are proof of concept for what are called in-silico trials, where instead of recruiting people to a real-life clinical trial, researchers build digital simulations of patient groups, loosely akin to the way virtual populations are built in The Sims computer game.
    Source: www.leeds.ac.uk/news[...]rials
    Publication: www.nature.com/arti[...]998-w
    Education: FirstGlance in Jmol now on YouTube: Intro & Design Goals
    Submitted by Eric Martz; posted on Wednesday, March 24, 2021

    FirstGlance in Jmol is the easiest way to understand 3D structures of proteins. There is no command language to learn. It is much easier than PyMOL. It is easy for students, yet has plenty of power for researchers. Any molecular view in FirstGlance can be rendered as an animation ready to drop into a PowerPoint slide. It takes just a few mouse clicks. During the past year, FirstGlance was used more than 300 times per day on average.

    A new video on YouTube provides an introduction to FirstGlance in Jmol, and discusses the inspiration for its creation, and its design goals: www.youtube.com/watch?v=80og2ASrvnQ

    Slides showing examples of molecular animations created with FirstGlance: docs.google.com/pres[...]=id.p

    FirstGlance in Jmol: FirstGlance.Jmol.Org

    As posted recently by Jeff Bizzaro, in 2020 AlphaFold2 predicted protein structures with truly astonishing accuracy, as certified by the biennial double-blind competition CASP 14. Its predictions were based on the amino acid sequences of the target proteins, using large-scale machine learning on sequence and structure databases. Predictions were made "blind", without access to empirical structures of the targets, and were judged later in 2020 when empirical structures became public. The judges did not know who made which prediction.

    AlphaFold2 was one of over 100 groups that submitted predictions for over 100 target single-chain domains. In most cases, AlphaFold2 made the best prediction, while the second-best prediction was far less accurate. This was particularly impressive for "free modeling" targets, those for which no suitable homology modeling templates were available.

    I have briefly summarized the breakthrough here:

    proteopedia.org/w/Th[...]odels

    I have analyzed two free modeling cases in detail, with comparisons visualized in interactive 3D. The first (92 amino acids) is the ORF8 virulence factor from SARS-CoV-2. Among the free modeling targets, it had the largest discrepancy between the best and second-best predictions. The second is a phage RNA polymerase, the longest free modeling target domain (404 amino acids). See:

    proteopedia.org/w/Al[...]SP_14

    In January 2021, Degenics was launched as a blockchain-based, anonymous-first DNA testing platform, in collaboration with Blocksphere, a blockchain consulting company. Degenics intends to partner with projects in the Polkadot ecosystem, including KILT Protocol, to incorporate credentials for the project's genetic lab and genetic products ecosystem.

    The platform provides a meeting place between genetic testing laboratories and privacy-conscious genetic testing users and includes sovereignty mechanisms that ensure that genetic test results produced will remain in full possession of each individual while providing incentive-flow mechanisms for the laboratories.

    Degenics calls for laboratories to register for the closed beta system test via Degenics.com.

    Article: ritzherald.com/dege[...]form/

    EXCERPT

    Pleiotropy analysis, which provides insight on how individual genes result in multiple characteristics, has become increasingly valuable as medicine continues to lean into mining genetics to inform disease treatments. Privacy stipulations, though, make it difficult to perform comprehensive pleiotropy analysis because individual patient data often can't be easily and regularly shared between sites. However, a statistical method called Sum-Share, developed at Penn Medicine, can pull summary information from many different sites to generate significant insights. In a test of the method, published in Nature Communications, Sum-Share's developers were able to detect more than 1,700 DNA-level variations that could be associated with five different cardiovascular conditions.
    Source: medicalxpress.com/news[...].html
    Article: doi.org/10.1[...]211-2

    EXCERPT

    DeepMind's AlphaFold is an AI system built to tackle this long-standing challenge. In 2018, the initial version of AlphaFold debuted at CASP (Critical Assessment of protein Structure Prediction), a biennial worldwide event for experimenting with state-of-the-art protein structuring technologies. AlphaFold achieved the highest accuracy of the participating technologies at CASP13 in 2018, but has now been developed further into what is being labeled a "stunning advance."

    The system was trained on publicly available data on around 170,000 protein structures and a large database of unknown protein structures ahead of its appearance at CASP14 this week. Technologies are graded from 0 to 100 for accuracy on what is known as the Global Distance Test, which assesses what percentage of beads in the protein chain are within a threshold distance of the correct location. In results released today, AlphaFold scored 92.4 across all targets.
    Source: newatlas.com/biol[...]blem/

    More on AlphaFold: deepmind.com/blog[...]ology
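
The Global Distance Test described in the excerpt can be sketched as follows. The commonly reported GDT_TS score averages the percentage of residues within 1, 2, 4, and 8 Å of their correct positions. This sketch assumes the per-residue distances have already been computed after a single structural superposition, whereas the real procedure searches over superpositions for each cutoff. The distances below are invented.

```python
def fraction_within(distances, cutoff):
    """Fraction of residues whose distance from the correct position is <= cutoff."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each cutoff (in angstroms)."""
    if not distances:
        raise ValueError("need at least one residue distance")
    fractions = [fraction_within(distances, c) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up per-residue distances (angstroms) between a model and the true structure.
example_distances = [0.5, 1.5, 3.0, 9.0]
print(gdt_ts(example_distances))
```

A score of 100 would mean every residue lies within 1 Å of its correct position under this simplified calculation.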

    Acknowledgments

    We wish to thank the following for their support:

    [Bio-IT World]
    Copyright © 2021 Scilico, LLC · Privacy Policy