DNAdigest continues its series of interviews. Here we would like to introduce you to Mr Mark Kaganovich, CEO of SolveBio, who agreed to an interview with us. He told us about what SolveBio does and discussed the importance of genomic data sharing.
Mark Kaganovich, CEO of SolveBio
1) Could you describe what SolveBio does?
SolveBio delivers the critical reference data used by hospitals and companies to run genomic applications. These applications use SolveBio’s data to predict the effects of small DNA variants on a person’s health. SolveBio has designed a secure platform for the robust delivery of complex reference datasets. We make the data easy to access so that our customers can focus on building clinical-grade molecular diagnostics applications, faster.
2) How did you come up with the idea of building a system that integrates genomic reference data into diagnostic and research applications? And what was the crucial moment when you realised the importance of creating it?
As a graduate student I spent a lot of time parsing, re-formatting, and integrating data just to answer some basic questions in genomics. At the same time (this was about two years ago) it was becoming clear that genomics was going to be an important industry with a yet unsolved IT component. David Caplan (SolveBio’s CTO) and I started hacking away at ways to simplify genome analysis, anticipating that interpreting DNA would be a significant problem in both research and the clinic. One thing we noticed was that there were no companies or services out there to help out guys like us – people who were programming with genomic data. There were a few attempts at kludgy interfaces for bioinformatics, and a number of people were trying to solve the read-mapping computing infrastructure problem, but there were no “developer tools” for integrating genomic data. In part, that was because a couple of years ago there wasn’t that much data out there, so parsing, formatting, cleaning, indexing, updating, and integrating data wasn’t as big a problem as it is now (or will be in a few years). We set out to build an API to the world’s genomic data so that other programmers could build amazing applications with the data without having to repeat painful, meaningless tasks.
As we started talking to people about our API we realized how valuable a genomic data service is for the clinic. Genomics is no longer solely an academic problem. When we started talking to hospitals and commercial diagnostic labs, that’s when we realized that this is a crucial problem. That’s also when we realized that an API to public data is just the tip of the iceberg. Access to clinical genomic information that can be used as reference data is the key to interpreting DNA as a clinical metric.
3) After the molecular technology revolution made it possible for us to collect large amounts of precise medical data at low cost, another problem came to the fore: the data are not in a language doctors can understand. How do you see this problem being solved?
The molecular technology revolution will make it possible to move from “Intuitive Medicine” to “Precision Medicine”, in the language of Clay Christensen and colleagues in “The Innovator’s Prescription”. Molecular markers are much closer to being unique fingerprints of the individual than whatever can be expressed by the English language in a doctor’s note. If these markers can be conclusively associated with diagnosis and treatment, medicine will be an order of magnitude better, faster, and cheaper than it is now. Doctors can’t possibly be expected to read the three billion or so base pairs that make up the genome of every patient and recall which diagnosis and treatment is the best fit in light of the genetic information. This is where the digital revolution – i.e. computing – comes in. Aggregating siloed data while maintaining the privacy of the patients using bleeding-edge software will allow doctors to use clinical genomic data to improve medicine.
4) What are your plans for the future of SolveBio? Are you working on developing more tools/apps?
Our goal is to be the data delivery system for genomic medicine. We’ve built the tools necessary to integrate data into a genomic medical application, such as a diagnostic tool or variant annotator. We are now building some of these applications to make life easier for people running genetic tests.
5) Do you recognise the problem of limited sharing of genomics data for research and diagnosis? Can you think of an example of how the work of SolveBio supports data access and knowledge sharing within the genomics community?
The information we can glean from DNA sequence is only as good as the reference data that is used for research and diagnostic applications. We are particularly interested in genomics data from the perspective of how linking data from different sources creates the best possible reference for clinical genomics. This is, in a way, a data sharing problem.
I would add, though, that a huge disincentive to distributing data is the privacy, security, liability, and branding concerns that clinical and commercial outfits are right to take into account. As a result, we are tailoring our platform specifically to address those concerns.
However, even the data that is currently being “shared” openly, largely as a product of the taxpayer-funded academic community, is very difficult and costly to access. Open data isn’t free. It involves building and maintaining substantial infrastructure to make sure the data is up to date and to verify its quality. SolveBio solves that problem. Developers building DNA interpretation tools no longer have to worry about setting up their data infrastructure. They can integrate data with a few lines of code through SolveBio.
6) What is the most important thing that should be done in the field of genetic data sharing, and what does ethical data sharing mean to you?
Ethical data sharing means keeping patient data private and secure. If data is used for research or diagnostic purposes and needs to be transferred among doctors, scientists, or engineers, then privacy and security are key concerns. Without privacy and security controls, genomic data will never benefit from the aggregate knowledge of programmers and clinicians, because patients will be rightly opposed to measuring, let alone distributing, their genomic information. Patient data belongs to the patient. Sometimes clinicians and researchers forget that. I definitely think the single most important thing to get right is the data privacy and security standard. The entire field depends upon it.