Discussion
The published biomedical literature contains a wealth of information about relationships between diseases and genes, and networks offer a natural way to summarize those relationships. In this study, we used a disease-pathway network constructed from PubMed abstracts and MeSH terms, together with a data-driven approach based on graph theory (trees and clustering), to infer new biological hypotheses.
We determined which clustering techniques worked best for such a network, assessing them both by objective statistical criteria and by the judgment of life scientists. Summarizing the network with trees revealed large-scale structures that arise either from genuine biological relationships or from coverage bias in the literature. Overall, the clusters grouped diseases that share common mechanisms, and these groupings suggest several novel hypotheses about human disease biology.
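To make the idea of an objective statistical criterion for comparing clusterings concrete, here is a minimal sketch. It assumes Newman's modularity as the criterion; the text above does not say which criteria were actually used. The toy network and the node labels D1 to D6 are hypothetical and stand in for diseases linked through shared pathways. A higher modularity score means the partition places more edge weight inside clusters than a random graph with the same node degrees would.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: list of (u, v, weight) tuples, each undirected edge listed once.
    partition: dict mapping each node to a cluster label.
    """
    m = sum(w for _, _, w in edges)  # total edge weight of the graph
    internal = defaultdict(float)    # edge weight falling inside each cluster
    strength = defaultdict(float)    # summed node degree (strength) per cluster
    for u, v, w in edges:
        strength[partition[u]] += w
        strength[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    # Q = sum over clusters of (observed internal fraction - expected fraction)
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Hypothetical toy network: two tightly linked disease groups joined by one edge.
edges = [
    ("D1", "D2", 1.0), ("D1", "D3", 1.0), ("D2", "D3", 1.0),
    ("D4", "D5", 1.0), ("D4", "D6", 1.0), ("D5", "D6", 1.0),
    ("D3", "D4", 1.0),
]

# Two candidate clusterings to compare.
split = {"D1": 0, "D2": 0, "D3": 0, "D4": 1, "D5": 1, "D6": 1}
single = {node: 0 for node in split}

print("two clusters:", round(modularity(edges, split), 3))
print("one cluster: ", round(modularity(edges, single), 3))
```

Placing everything in one cluster always scores zero, whereas the two-group split scores about 0.357. A criterion of this kind therefore rewards partitions that follow the dense groups in the network. In practice such scores would be weighed alongside expert review, as described above.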