Summary of:

Architecture and Evolution of Organic Chemistry

Authored by: Marcin Fialkowski, Kyle J.M. Bishop, Victor A. Chubukov, Christopher J. Campbell, and Bartosz A. Grzybowski
Published in: Angew. Chem. Int. Ed. 2005, 44, 7263-7269

This paper in computational chemistry is the first work related to the Chematica algorithm and software. Its results are summarized here, and we will compare several of its key findings against our own network.

Key Findings:

  • All chemical reactions can be represented as a network (a directed graph) in which the nodes are molecules and the directed edges represent reactions.
  • Statistical laws describe how molecules are created through chemical reactions.
  • The network is scale-free both in time and in size: a snapshot taken at any year, or at any network size, shows the same topological properties.
  • Surprisingly, the topology of this network is similar to that of the World Wide Web.

Chemical Reaction Data:

They used the Beilstein database (BD), which at the time of publication contained 9,293,250 chemical reactions and 9,550,398 chemical substances. The BD timestamps each reaction with its year of discovery; the oldest reaction dates from 1779. After cleaning and filtering the reactions, they included 6,539,158 reactions in the network.

The Network of Chemical Reactions:

In its most basic form, the network of chemical reactions (NCR) has chemical substances as nodes and reactions as edges.
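
As a minimal sketch of this construction (not the authors' code), the snippet below builds a small directed graph from a handful of made-up reactions. The reaction records, the name `build_ncr`, the `up_to_year` filter, and the choice to draw one edge from every reactant to every product are all assumptions made for illustration.

```python
# Hypothetical toy reactions: (year of discovery, reactants, products).
# The substance names are placeholders, not real Beilstein entries.
REACTIONS = [
    (1828, ["A", "B"], ["C"]),
    (1835, ["C"], ["D", "E"]),
    (1850, ["A", "D"], ["F"]),
]


def build_ncr(reactions, up_to_year=None):
    """Return (nodes, edges) of a directed graph.

    Assumption: each reaction contributes one edge from every reactant
    to every product. Reactions discovered after `up_to_year` are skipped,
    which gives a snapshot of the network at that year.
    """
    nodes = set()
    edges = set()
    for year, reactants, products in reactions:
        if up_to_year is not None and year > up_to_year:
            continue
        nodes.update(reactants)
        nodes.update(products)
        for reactant in reactants:
            for product in products:
                edges.add((reactant, product))
    return nodes, edges


# Snapshot of the toy network in 1835, and the full toy network.
nodes_1835, edges_1835 = build_ncr(REACTIONS, up_to_year=1835)
nodes_all, edges_all = build_ncr(REACTIONS)
```

Storing edges in a set means that a reactant-product pair appearing in several reactions is counted once; whether the paper counts such pairs once or several times is not stated in this summary.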


Here are snapshots of the NCR in 1835 and 1850.

Two useful quantities for describing the topological structure of a network are the average connectivity, $\langle k \rangle$, and the number of connections coming into ($k_{in}$) or going out of ($k_{out}$) a node.

The average connectivity is a global property of the network, meaning that it describes one aspect of the network in its totality, rather than just some local region of the network. It is defined simply as the ratio of edges to nodes.

\begin{align} \langle k \rangle = \frac{N_{edges}}{N_{nodes}} \end{align}
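
The ratio above is simple enough to check by hand. The sketch below computes it for a made-up edge list; the function name `average_connectivity` and the toy edges are assumptions for illustration only.

```python
def average_connectivity(n_edges, n_nodes):
    """Average connectivity: number of edges divided by number of nodes."""
    if n_nodes == 0:
        raise ValueError("the network has no nodes")
    return n_edges / n_nodes


# Hypothetical toy network: directed edges as (source, target) pairs.
toy_edges = {("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")}
# Every substance that appears at either end of an edge is a node.
toy_nodes = {node for edge in toy_edges for node in edge}

k_avg = average_connectivity(len(toy_edges), len(toy_nodes))  # 4 edges / 5 nodes
```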

The number of incoming or outgoing connections of a single node is a local property of the network. Collecting these values over all nodes gives the distributions of $k_{in}$ and $k_{out}$, and comparing those distributions between networks shows whether the networks scale in the same way. The authors made this comparison for snapshots of the NCR at different times and found that the $k_{in}$ and $k_{out}$ distributions remain scale-free throughout, that is, they follow a power law of the form $P(k) \propto k^{-\gamma}$.
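
To make the local quantities concrete, here is a minimal sketch that tallies $k_{in}$ and $k_{out}$ for each node of a made-up edge list and turns the counts into empirical distributions. The function name `degree_distributions` and the toy edges are assumptions; the paper's own analysis pipeline is not described in this summary.

```python
from collections import Counter


def degree_distributions(edges):
    """Return (p_in, p_out): dicts mapping a degree k to the fraction of nodes with that degree."""
    k_in = Counter()
    k_out = Counter()
    nodes = set()
    for source, target in edges:
        k_out[source] += 1
        k_in[target] += 1
        nodes.update((source, target))
    n_nodes = len(nodes)
    # A Counter returns 0 for missing keys, so nodes with no incoming
    # (or no outgoing) edges are counted at degree 0.
    in_hist = Counter(k_in[node] for node in nodes)
    out_hist = Counter(k_out[node] for node in nodes)
    p_in = {k: count / n_nodes for k, count in in_hist.items()}
    p_out = {k: count / n_nodes for k, count in out_hist.items()}
    return p_in, p_out


# Hypothetical toy network: directed edges as (source, target) pairs.
toy_edges = [
    ("A", "C"), ("B", "C"), ("C", "D"),
    ("C", "E"), ("A", "F"), ("D", "F"),
]
p_in, p_out = degree_distributions(toy_edges)
```

For a real network, plotting these fractions against $k$ on log-log axes is the usual first check for a power law, since $P(k) \propto k^{-\gamma}$ appears as a straight line with slope $-\gamma$. A toy network this small is far too small to show such behaviour.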