Shared Vocabularies: Research Needs Catch Up With Our ELN

What happens when researchers identify the same information by different terms?

Consider the difficulties created in later finding or sharing existing research, whether internally after staff turnover or externally among an increasingly collaborative scientific community, when information is not recorded or tagged with uniform, or at least associated, terminology. Many of these records are hard to search from the beginning because they exist either on paper or as loose electronic documents, emails, spreadsheets, images, and other file types passed by email from one scientist to the next. Divergent vocabulary compounds an already prevalent obstruction in research and development.

Informatics experts in the pharmaceutical and biomedical fields recently published a group paper on the importance of shared vocabularies in scientific research. “Empowering industrial research with shared biomedical vocabularies” (1), published in the November 2011 edition of Drug Discovery Today, describes semantic challenges and frustrations common to many research disciplines.

“The negative impact of partial and missing vocabularies on industrial research is not a new issue. However, in the current, rapidly evolving environment, new scientific, business and technical indicators suggests this problem will become even more acute…. A good example of the direction in which many are headed is provided by a recent biomarker study from Genentech that focused on samples taken from over 3000 patients with rheumatoid arthritis. Multiple genetic, gene expression, cell population and protein marker studies were performed by several contract research organizations (CROs) and subsequently integrated for the analysis. The authors describe how their efforts were hindered by a lack of vocabulary standards, with significant laborious, manual intervention required to match up ethnicity, study regions, drugs and drug types across the results.” Harland, et al 2011.

The authors propose that the biomedical industry band together to produce a universal ontology for use in research records. Though the effort would be time-intensive at the start, the article asserts that its rewards would be widespread: lower costs, less redundant work, and broader coverage of the available body of research. But what then?

“From a technical perspective, the drive for integration has led many industry and academic informaticians to explore ‘Semantic Web’ technology. This approach holds promise in addressing major information challenges by combining data integration with powerful querying and inferencing capabilities…. Secondly, we advocate that vocabulary standards become an intrinsic element within industry software, whether they are document repositories, electronic laboratory notebooks or intelligence systems. Ensuring that the designers of these applications consider how they will identify concepts in a way that facilitates integration will provide significant future benefits.” Harland, et al 2011.
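To make "identifying concepts in a way that facilitates integration" a little more concrete, here is a minimal Python sketch of the general idea: different labels for the same thing are mapped to one shared concept identifier, and searches run against the identifier rather than the raw text. This is an illustration only, not a description of how CERF or any Semantic Web toolkit works. The identifiers (DRUG:0001, DISEASE:0001), the record names, and every function name are hypothetical.

```python
# Hypothetical controlled vocabulary: one identifier per concept, plus the
# labels different labs might use for it. The IDs are invented for this sketch.
VOCABULARY = {
    "DRUG:0001": {"methotrexate", "mtx", "amethopterin"},
    "DISEASE:0001": {"rheumatoid arthritis", "ra"},
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_index(vocabulary):
    """Map every normalized label to its concept identifier."""
    index = {}
    for concept_id, labels in vocabulary.items():
        for label in labels:
            index[normalize(label)] = concept_id
    return index


def tag_terms(terms, index):
    """Split a record's terms into matched concept IDs and unmatched terms."""
    matched = set()
    unmatched = []
    for term in terms:
        concept_id = index.get(normalize(term))
        if concept_id is None:
            unmatched.append(term)
        else:
            matched.add(concept_id)
    return matched, unmatched


def search(records, query, index):
    """Return names of records tagged with the same concept as the query."""
    concept_id = index.get(normalize(query))
    if concept_id is None:
        return []
    return [
        name
        for name, terms in records.items()
        if concept_id in tag_terms(terms, index)[0]
    ]


# Hypothetical records from three contract labs, each using its own wording.
records = {
    "cro_a_report": ["Methotrexate", "RA"],
    "cro_b_report": ["MTX", "rheumatoid  arthritis"],
    "cro_c_report": ["placebo"],
}

index = build_index(VOCABULARY)
print(search(records, "amethopterin", index))
```

Searching for "amethopterin" finds the first two reports even though neither uses that word, because all three labels resolve to the same identifier. Terms that match nothing, such as "placebo" here, come back in the unmatched list so a curator can decide whether the vocabulary needs a new entry.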


Semantic searching is something our software developers have considered from the start.

CERF is the only ELN platform built from the start on these semantic web technologies, semantic controlled vocabularies, and data standards. Technical understanding and user requirements are finally catching up with the need our scientists saw 10 years ago, when they developed a product specifically to meet the needs of other scientists.

ELN industry analyst Mike Elliott of Atrium Research (2) once wrote of our foundation software, “One of the few ELN solutions focused on biology, CERF might be ahead of its time. Is the market really ready for semantic approaches to data management? Most companies cannot get a handle on their data and have a difficult time implementing a basic ELN system. We give CERF a great deal of credit for pioneering what might be the future of life sciences data management architectures. They are putting all the pieces together.”


1. Lee Harland, Christopher Larminie, Susanna-Assunta Sansone, Sorana Popa, M. Scott Marshall, Michael Braxenthaler, Michael Cantor, Wendy Filsell, Mark J. Forster, Enoch Huang, Andreas Matern, Mark Musen, Jasmin Saric, Ted Slater, Jabe Wilson, Nick Lynch, John Wise and Ian Dix (2011). Empowering industrial research with shared biomedical vocabularies. Drug Discovery Today, 16(21-22), 940-947.

2. Atrium Research & Consulting (2005). Electronic Laboratory Notebooks—A Foundation for Scientific Knowledge Management, Edition II.