Exploring Open Science n°2: DNAdigest interviews SolveBio

DNAdigest continues its series of interviews. Here we would like to introduce you to Mr Mark Kaganovich, CEO of SolveBio, who agreed to an interview with us. He shared a lot about what SolveBio does and discussed with us the importance of genomic data sharing.

Mark Kaganovich, CEO of SolveBio

1) Could you describe what SolveBio does?

SolveBio delivers the critical reference data used by hospitals and companies to run genomic applications. These applications use SolveBio’s data to predict the effects of slight DNA variants on a person’s health. SolveBio has designed a secure platform for the robust delivery of complex reference datasets. We make the data easy to access so that our customers can focus on building clinical-grade molecular diagnostics applications faster.

2) How did you come up with the idea of building a system that integrates genomic reference data into diagnostic and research applications? And what was the crucial moment when you realised the importance of creating it?

As a graduate student I spent a lot of time parsing, re-formatting, and integrating data just to answer some basic questions in genomics. At the same time (this was about two years ago) it was becoming clear that genomics was going to be an important industry with an as yet unsolved IT component. David Caplan (SolveBio’s CTO) and I started hacking away at ways to simplify genome analysis, in anticipation that interpreting DNA would be a significant problem in both research and the clinic. One thing we noticed was that there were no companies or services out there to help out guys like us – people who were programming with genomic data. There were a few attempts at kludgy interfaces for bioinformatics, and a number of people were trying to solve the read mapping computing infrastructure problem, but there were no “developer tools” for integrating genomic data. In part, that was because a couple of years ago there wasn’t that much data out there, so parsing, formatting, cleaning, indexing, updating, and integrating data wasn’t as big a problem as it is now (or will be in a few years). We set out to build an API to the world’s genomic data so that other programmers could build amazing applications with the data without having to repeat painful, meaningless tasks.

As we started talking to people about our API we realized how valuable a genomic data service is for the clinic. Genomics is no longer solely an academic problem. When we started talking to hospitals and commercial diagnostic labs, that’s when we realized that this is a crucial problem. That’s also when we realized that an API to public data is just the tip of the iceberg. Access to clinical genomic information that can be used as reference data is the key to interpreting DNA as a clinical metric.

3) After the molecular technology revolution made it possible for us to collect large amounts of precise medical data at low cost, another problem took its place: the data are not in a language doctors can understand. How do you see this problem being solved?

The molecular technology revolution will make it possible to move from “Intuitive Medicine” to “Precision Medicine”, in the language of Clay Christensen and colleagues in “The Innovator’s Prescription”. Molecular markers are much closer to being unique fingerprints of the individual than whatever can be expressed by the English language in a doctor’s note. If these markers can be conclusively associated with diagnosis and treatment, medicine will be an order of magnitude better, faster, and cheaper than it is now. Doctors can’t possibly be expected to read the three billion base pairs or so that make up the genome of every patient and recall which diagnosis and treatment is the best fit in light of the genetic information. This is where the digital revolution – i.e. computing – comes in. Aggregating siloed data with bleeding-edge software, while maintaining the privacy of the patients, will allow doctors to use clinical genomic data to improve medicine.

4) What are your plans for the future of SolveBio? Are you working on developing more tools/apps?

Our goal is to be the data delivery system for genomic medicine. We’ve built the tools necessary to integrate data into a genomic medical application, such as a diagnostic tool or variant annotator. We are now building some of these applications to make life easier for people running genetic tests.

5) Do you recognise the problem of limited sharing of genomics data for research and diagnosis? Can you think of an example of how the work of SolveBio supports data access and knowledge sharing within the genomics community?

The information we can glean from DNA sequence is only as good as the reference data that is used for research and diagnostic applications. We are particularly interested in genomics data from the perspective of how linking data from different sources creates the best possible reference for clinical genomics. This is, in a way, a data sharing problem.

I would add, though, that a huge disincentive to distributing data is the set of privacy, security, liability, and branding concerns that clinical and commercial outfits are right to take into account. As a result, we are especially tailoring our platform to address those concerns.

However, even the data that is currently being “shared” openly, largely as a product of the taxpayer-funded academic community, is very difficult and costly to access. Open data isn’t free. It involves building and maintaining substantial infrastructure to make sure the data is up to date and to verify quality. SolveBio solves that problem. Developers building DNA interpretation tools no longer have to worry about setting up their data infrastructure. They can integrate data with a few lines of code through SolveBio.

6) What is the most important thing that should be done in the field of genetic data sharing, and what does ethical data sharing mean to you?

Ethical data sharing means keeping patient data private and secure. If data is used for research or diagnostic purposes and needs to be transferred among doctors, scientists, or engineers, then privacy and security are key concerns. Without privacy and security controls, genomic data will never benefit from the aggregate knowledge of programmers and clinicians, because patients will be rightly opposed to measuring, let alone distributing, their genomic information. Patient data belongs to the patient. Sometimes clinicians and researchers forget that. I definitely think the single most important thing to get right is the data privacy and security standard. The entire field depends upon it.

Open Spending: Tracking Financial Data worldwide

If you have followed the activities of the OKFN over the last few years, you probably already know Open Spending, the community-driven project that was initiated in 2007 and has grown considerably since then. The idea started with Where Does My Money Go?, a database for UK public financial data, financed by the 4IP (4 Innovation for the Public) fund of the British broadcaster Channel 4. A few years later, in 2011, the initiative was internationalized and Open Spending was born, a worldwide platform that has gone well beyond the British borders. Today, the site shows data from 73 countries, from Bosnia to Uganda, and the visualisation tool Spending Stories was developed at the same time, thanks to a grant from the Knight Foundation. On the subject of funding, the Open Society Foundations also support the community-building work, and the Omidyar Network funded the research behind the report “Technology for Transparent and Accountable Public Finance”. You guessed it: everything is open source.

Open Spending does more than aggregate public financial data from around the world, such as budgets, spending, balance sheets, procurement and employee salaries, to show how public money has been spent all over the world and in your own city. It also allows users to visualise the available data directly via Spending Stories and to add new datasets. The community members who use and develop the tools come from various backgrounds, and everyone is invited to join. Additionally, articles are regularly posted on the blog to encourage members to share knowledge with each other.

The results so far are very good: numerous administrations and media outlets, such as the city of Berlin and the Guardian, have already used the visualisations. Besides them, independent journalists, civil society activists, students and engaged citizens also take advantage of the datasets, which allow a better understanding of how public money is spent.

DNAdigest Symposium: A tour in Open Science in human genomics research

This past weekend, DNAdigest organized a Symposium on the topic “Open Science in human genomics research – challenges and inspirations”. The event brought together enthusiastic people with a keen interest in the topic, along with the DNAdigest team. We are very pleased to say that the day turned out to be a success, and both participants and organizers enjoyed the amazing talks of our speakers and the discussion sessions.

The day started with a short introduction on the topic by Fiona Nielsen.

Then our first speaker, Manuel Corpas, was a source of inspiration to all participants, talking us through the process he went through to sequence the whole genomes of his family and himself and to share this data with the whole world. Here is a link to the presentation he gave on the day.

The Symposium was organized in the format of an Open Space conference, where everybody could suggest topics related to Open Science or choose to join the discussion that sounded most interesting. Again, we used HackPad to record notes and interesting thoughts throughout the discussions. You can take a look at it here.

We had three more speakers invited to our Symposium. Tim Hubbard (slides) talked about how Genomics England engages the research community, in the form of genomic scientists and patient communities, to collaborate on both data generation and data analysis for the 100k Genomes Project for the public benefit. Julia Wilson (slides) came as a representative of the Global Alliance. She introduced us to the GA4GH and explained how their work helps to implement standards for data sharing across genomics and health. Last but not least was Nick Sireau (slides). He walked us through an eight-step process to show us exactly how the scientific community and the patient community can engage in collaborations, and how Open Science (sharing of hypotheses, methods and results throughout the scientific process) may be either beneficial or challenging in this context.

The event came to an end with a summary of learning points and a wrap-up by Fiona Nielsen.

We have also made a Storify summary where you can find a collection of all the tweets and most of the photos from the day. There is also a gallery including all the pictures taken by our team members.

Now, to all former and future participants: if you enjoy participating in these events, please donate to DNAdigest by texting DNAD14 £10 to 70070, so that we can continue organizing more of these interactive and exciting events in the future. You can also buy some of our cool DNAdigest T-shirts and mugs from our website shop.

It was great to see you all, and we look forward to welcoming you again for our next events!

DNAdigest team: Fiona, Adrian, Margi, Francis, Sebastian, Xocas and Tim

This event would not have been possible without the contributions of our generous sponsors.