CONTENTS

  1. Upcoming Pathway Tools Tutorial at SRI, January 16-17, 2020

  2. New Pathway Tools Publication

  3. The PathoLogic Inference Component of Pathway Tools

Upcoming Pathway Tools Tutorial at SRI, January 16-17, 2020

We will offer a two-day Introduction to Pathway Tools tutorial at SRI on January 16-17, 2020. The early registration deadline is December 12. For more information click here.


New Pathway Tools Publication

The following just-published article describes the many enhancements made to Pathway Tools during the past four years, including multiple extensions to its metabolic network modeling, omics data analysis, and core database management capabilities:

"Pathway Tools version 23.0 update: software for pathway/genome informatics and systems biology" in Briefings in Bioinformatics.


The PathoLogic Inference Component of Pathway Tools

PathoLogic is the computational inference module of Pathway Tools. Here we summarize the basic operation of PathoLogic as well as some new pathway inference capabilities coming in version 23.5, which we expect to release in the next few weeks.

The current inference capabilities of Pathway Tools are as follows:

  • Infer the metabolic reactions catalyzed by the organism
  • Infer the metabolic pathways composed of those reactions
  • Infer which genes code for missing enzymes (pathway holes) in those metabolic pathways
  • Infer the operons of prokaryotic organisms

From its inception, the reactome and pathway inference components of PathoLogic have been designed to start from an existing genome annotation for an organism rather than re-annotating the genome. We had two rationales for this approach: (1) Building genome annotation pipelines is a large and difficult problem that many other groups were already addressing; we preferred to build on their work rather than undertake that problem ourselves. (2) Many groups perform expert manual oversight and inference during the annotation process; were we to re-annotate the genome, we would be discarding those manual inferences. In hindsight we feel this strategy has been validated: genome annotation pipelines have grown even more sophisticated, e.g., integrating BLAST searches with multiple HMM libraries and producing a variety of outputs including protein names, EC numbers, and Gene Ontology terms. Metabolic inference tools that re-annotate genomes using simple approaches such as BLAST searches alone will produce inferior annotations and therefore inferior reactome and pathway inferences.

Because PathoLogic takes as its input the output of genome annotation pipelines, it must map protein names, EC numbers, and Gene Ontology terms to reaction assignments. Since the latter two entities are controlled vocabularies, they are straightforward for PathoLogic to accept and map to reactions. Enzyme names are much less straightforward to handle. The core of our approach is to use the associations between enzyme names and reactions recorded by curators in the MetaCyc database. We also supplement those names with additional enzyme synonyms, which we find using a program that iterates across the 14,000 genomes in BioCyc and identifies the most frequently unrecognized enzyme names across all of these genomes. Those names become the highest priorities for curation. Our curators have recently entered 300 new names from this list, so that we now have a total of 48,000 enzyme names and synonyms available to the "enzyme name matching" component of PathoLogic. These new names have significantly boosted reaction inference.
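The prioritization step described above can be illustrated with a minimal Python sketch. It is not the program used for BioCyc: the function names, the normalization rule, and the sample enzyme names are all hypothetical, and the real matcher presumably handles name variants far more carefully. The sketch only shows the idea of counting, across genomes, the names that no known enzyme name or synonym accounts for.

```python
from collections import Counter


def normalize(name):
    """Lowercase and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(name.lower().split())


def rank_unrecognized_names(genomes, known_names):
    """Rank unrecognized enzyme names by the number of genomes they occur in.

    genomes: dict mapping a genome id to the protein names in its annotation.
    known_names: names and synonyms already associated with reactions.
    Returns (name, genome_count) pairs, most frequent first.
    """
    known = {normalize(n) for n in known_names}
    counts = Counter()
    for names in genomes.values():
        # A set ensures each name is counted at most once per genome.
        unrecognized = {normalize(n) for n in names} - known
        counts.update(unrecognized)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input data, for illustration only.
genomes = {
    "genome_a": ["Citrate synthase", "example enzyme X"],
    "genome_b": ["example enzyme X", "example enzyme Y"],
    "genome_c": ["Example  Enzyme X"],
}
known_names = ["citrate synthase"]
ranking = rank_unrecognized_names(genomes, known_names)
```

In this toy input, the name at the top of `ranking` is the one a curator would look at first, since it is unrecognized in the largest number of genomes.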

However, some of those enzyme names are ambiguous, either because an enzyme with one name catalyzes multiple reactions at multiple active sites, or because different enzymes with the same name catalyze different reactions. In version 23.5, the enzyme name matcher will use gene names (if available in the genome annotation) to disambiguate these ambiguous enzyme names. Gene names will also be used to infer reaction mappings when the enzyme name is not recognized. Although there are occasional errors, our review of these assignments indicates they are quite accurate. These two strategies further boost reaction inference.
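As a rough illustration of these two uses of gene names, the following Python sketch resolves an enzyme name to reactions and consults the gene name only when the enzyme name is ambiguous or unrecognized. The lookup tables, identifiers, and function name are hypothetical, and the fallback of keeping all candidates when the gene name does not help is an assumption of this sketch, not a description of PathoLogic's behavior.

```python
def infer_reactions(enzyme_name, gene_name, name_to_reactions, gene_to_reactions):
    """Return the set of reaction ids inferred for one annotated protein.

    name_to_reactions: dict mapping enzyme name -> set of reaction ids.
    gene_to_reactions: dict mapping gene name -> set of reaction ids.
    gene_name may be None when the annotation supplies no gene name.
    """
    name_hits = name_to_reactions.get(enzyme_name.lower(), set())
    gene_hits = gene_to_reactions.get(gene_name, set()) if gene_name else set()

    if len(name_hits) == 1:
        # Unambiguous enzyme name: no disambiguation needed.
        return set(name_hits)
    if len(name_hits) > 1:
        # Ambiguous enzyme name: keep only candidates the gene name supports.
        narrowed = name_hits & gene_hits
        # Assumption: if the gene name does not help, keep every candidate.
        return narrowed if narrowed else set(name_hits)
    # Unrecognized enzyme name: fall back to the gene name alone.
    return set(gene_hits)


# Hypothetical lookup tables, for illustration only.
name_to_reactions = {"example dehydrogenase": {"RXN-A", "RXN-B"}}
gene_to_reactions = {"exaA": {"RXN-A"}, "exaC": {"RXN-C"}}

resolved = infer_reactions("Example dehydrogenase", "exaA",
                           name_to_reactions, gene_to_reactions)
fallback = infer_reactions("unrecognized protein name", "exaC",
                           name_to_reactions, gene_to_reactions)
```

Here `resolved` narrows two candidate reactions to the one supported by the gene name, and `fallback` shows a mapping inferred from the gene name alone.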

Version 23.5 will also include several improvements to pathway inference. MetaCyc pathways now include more extensive use of "key reaction" definitions, where pathways specify reactions that must have an enzyme present in the organism to infer that pathway as present. We also introduce a new pathway field called "key non-reactions", which prevents inference of a given pathway if a specified reaction outside that pathway is catalyzed by the organism. In particular, these new rules enable more accurate inference of the appropriate variants of the TCA cycle and glycolysis pathways. We also added new rules that result in more accurate inference of super pathways (pathways built from several smaller pathways).
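The effect of key reactions and key non-reactions can be sketched as a simple filter. This Python example is only an illustration: the class, field names, and reaction identifiers are hypothetical, it assumes every listed key reaction is required, and it leaves out the rest of pathway scoring entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """A hypothetical, minimal pathway record for this sketch."""
    name: str
    key_reactions: set = field(default_factory=set)
    key_non_reactions: set = field(default_factory=set)


def passes_key_rules(pathway, catalyzed):
    """Check one pathway against the set of reactions the organism catalyzes.

    The pathway passes only if all of its key reactions are catalyzed and
    none of its key non-reactions are.
    """
    has_all_key_reactions = pathway.key_reactions <= catalyzed
    has_blocking_reaction = bool(pathway.key_non_reactions & catalyzed)
    return has_all_key_reactions and not has_blocking_reaction


# Two hypothetical variants of one pathway that share reaction RXN-1.
variant_1 = Pathway("variant I", key_reactions={"RXN-1"},
                    key_non_reactions={"RXN-9"})
variant_2 = Pathway("variant II", key_reactions={"RXN-1", "RXN-9"})

catalyzed = {"RXN-1", "RXN-2", "RXN-9"}
candidates = [p.name for p in (variant_1, variant_2)
              if passes_key_rules(p, catalyzed)]
```

Because the organism catalyzes RXN-9, the key non-reaction rules out variant I and only variant II remains a candidate, which is the kind of variant selection the new field is meant to support.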

Another new feature in 23.5 is a variant of the pathway evidence report available from the web command Analysis → Reports → Pathway Evidence. Previously, this report sorted pathways by pathway ontology. The new variant sorts pathways by pathway score to speed user review of low-scoring pathways.