Exploring Open Science n°4: DNAdigest interviews Nowomics

This week I would like to introduce you to Richard Smith, founder and software developer of Nowomics. He kindly agreed to answer some questions for our blog post series, and here it is: first-hand information on Nowomics. Keep reading to find out more about this company.

Richard Smith, founder and software developer of Nowomics

1. Could you please give us a short introduction to Nowomics (goals, interests, mission)?

Nowomics is a free website to help life scientists keep up with the latest papers and data relevant to their research. It lets researchers ‘follow’ genes and keywords to build their own news feed of what’s new and popular in their field. The aim is to help scientists discover the most useful information and avoid missing important journal articles, but without spending a lot of their time searching websites.

2. What makes Nowomics unique?

Nowomics tracks new papers, but also other sources of curated biological annotation and experimental data. It can tell you if a gene you work on has new annotation added or has been linked to a disease in a recent study. The aim is to build knowledge of these biological relationships into the software to help scientists navigate and discover information, rather than recommending papers simply by text similarity.

3. When did you realise that a tool such as Nowomics would be of a great help to the genomic research community?

I’ve been building websites and databases for biologists for a long time and have heard from many scientists how hard it is to keep up with the flood of new information. There are around 20,000 biomedical journal articles published every week and hundreds of sources of data online; receiving lots of emails with lists of paper titles isn’t a great solution. In social media, interactive news feeds that adapt to an individual are now commonly used as an excellent way to consume large amounts of new information, and I wanted to apply these principles to tracking biology research.

4. Which part of developing the tool did you find most challenging?

As with a lot of software, making sure Nowomics is as useful as possible to users has been the hardest part. It’s quite straightforward to identify a problem and build some software, but making sure the two are correctly aligned to provide maximum value to users has been the difficult part. It has meant trying many things, demonstrating ideas and listening to a lot of feedback. Handling large amounts of data and writing text mining software to identify thousands of biological terms is simple by comparison!

5. What are your plans for the future of Nowomics? Are you working on adding new features/apps?

There are lots of new features planned. Currently Nowomics focuses on genes/proteins and selected organisms. We’ll soon make this much broader, so scientists will be able to follow diseases, pathways, species, processes and many other keywords. We’re working on how these terms can be combined for fine-grained control of what appears in news feeds. It’s also important to make sharing with colleagues and recommending research extremely simple.

6. Can you think of examples of how Nowomics supports data access and knowledge dissemination within the genomics community?

The first step to sharing data sets and accessing research is for the right people to know they exist. This is exactly what Nowomics was set up to achieve, to benefit both scientists who need to be alerted to useful information and those generating or funding research who want to reach the best possible audience. Hopefully Nowomics will also alert people to relevant shared genomics data in future.

7. What does ethical data sharing mean to you?

For data that can advance scientific and medical research, the most ethical thing to do is to share it with other researchers to help make progress. This is especially true for data resulting from publicly funded research. However, with medical and genomics data the issues of confidentiality and privacy must take priority, and individuals must be aware of what their information may be used for.

8. What are the most important things that you think should be done in the field of genetic data sharing?

The challenge is to find a way to unlock the huge potential of sharing genomics data for analysis while respecting the very real privacy concerns. A platform that enables sharing in a secure, controlled manner which preserves privacy and anonymity seems essential, and I’m very interested in what DNAdigest are doing in this regard.

Exploring Open Science n°3: DNAdigest interviews NGS-Logistics

NGS-Logistics is the next project featured in our blog interviews. We have interviewed Amin Ardeshirdavani, a PhD student involved in the creation of this web-based application. Take a look at the interview to find out why this tool has become very popular within KU Leuven.

1. What is NGS-Logistics?

NGS-Logistics is a web-based application that accelerates the federated analysis of Next Generation Sequencing data across different centres. NGS-Logistics acts like a real logistics company: you order something from the Internet; the owner processes your request and then ships it through a safe and trustworthy logistics company. In the case of NGS-Logistics, the goods are human sequence data, and researchers ask about possible variations and their frequency among the whole population. We try to deliver the answers in the fastest and safest possible way.

2. What is your part in NGS-Logistics?

Right now I am a PhD student at KU Leuven, and the whole idea of my PhD project is to design and develop new data structures for analysing the massive amounts of data produced by Next Generation Sequencing machines. NGS-Logistics is exactly that. I have done the whole design and development of the application and database. I would also like to acknowledge all the people from KU Leuven, the ESAT IT Dept., the UZ Leuven IT Dept., and the UZ Genomics Core Dept. who assisted me on this project, and to thank them for their kind support, especially Erika Souche.

3. When did you first start working on the idea of creating NGS-Logistics and what made you think it would be something useful?

It was almost three years ago when I had a meeting with my promotor, Professor Yves Moreau, and he had an idea to somehow connect sequencing centres and query their data without moving them into one repository. As a person with an IT background it wasn’t that difficult for me to develop an application, but there were lots of practical issues that needed to be taken care of. The majority of these issues are related to protecting the privacy of the individuals, because the data we deal with come from human genome sequencing experiments and people are rightfully worried about how this data will be used and protected. At the time of my first meeting there was no system in place to share this data, but many people understood the need for this kind of structure and for us to start working on it. As we know, information can be a true scientific goldmine, and by having access to more data we are able to produce more useful information. The novelty of the data, the possibility of sharing this wealth of information, and the complexity of this kind of application make me eager to work on this project.

4. How does your open source tool work and who is it designed for?

NGS-Logistics has three modules: the Web Interface, the Access Control List and the Query Manager. The source code of each of these modules, plus the database structure behind them, is available upon simple request. As the modules are being upgraded continuously, I have not yet made a public repository for the source code. However, if someone is interested in gaining access to the source code, it will be our pleasure to give it to them, although I do think that the whole idea of data sharing is more important than the source code itself. In any case, we are happy to share with others our experience of the different problems and issues that we had to tackle during the past three years. In general, NGS-Logistics is designed to help researchers save time when they need access to more data. It will help them get a better overview of their questions, and if they need access to the actual data, it will help them find the most useful data sets that match their cases.

5. Who has access to the system and how do you manage access permissions?

Researchers with a valid email address and affiliation are welcome to register and use the application. We need to know who is querying the data in order to prevent structural queries, which may lead to identifying an individual. I spent almost 20 months on the Access Control List (ACL) module. Most of the tasks are controlled and automatically updated by the system itself. Center Admins are responsible for updating the list of samples they want to share with others. PIs and their power users are responsible for grouping the samples into data sets and assigning them to users and groups. The ACL has a very rich and user-friendly interface that makes it very easy to learn and use.
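
To make the division of roles concrete, here is a minimal sketch of how such an access control list could be modelled. It is not the NGS-Logistics implementation: the class `AccessControlList`, its method names and the sample identifiers are all hypothetical, chosen only to mirror the three steps described above (centre admins share samples, PIs group them into data sets, data sets are assigned to users).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class AccessControlList:
    """Hypothetical model of the roles described in the interview."""

    # Samples each centre admin has chosen to share, keyed by centre name.
    shared_samples: Dict[str, Set[str]] = field(default_factory=dict)
    # Data sets defined by PIs: data set name -> sample IDs.
    data_sets: Dict[str, Set[str]] = field(default_factory=dict)
    # Data set name -> users who have been assigned that data set.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def share(self, centre: str, sample_ids: Iterable[str]) -> None:
        """Centre admin step: add samples to the centre's shared list."""
        self.shared_samples.setdefault(centre, set()).update(sample_ids)

    def define_data_set(self, name: str, sample_ids: Iterable[str]) -> None:
        """PI step: group samples into a data set; all must already be shared."""
        wanted = set(sample_ids)
        shared = set().union(*self.shared_samples.values())
        unshared = wanted - shared
        if unshared:
            raise ValueError(f"samples not shared by any centre: {sorted(unshared)}")
        self.data_sets[name] = wanted

    def grant(self, name: str, user: str) -> None:
        """PI step: assign an existing data set to a user."""
        if name not in self.data_sets:
            raise KeyError(name)
        self.grants.setdefault(name, set()).add(user)

    def samples_for(self, user: str) -> Set[str]:
        """Return every sample the user may query through assigned data sets."""
        visible: Set[str] = set()
        for name, users in self.grants.items():
            if user in users:
                visible |= self.data_sets[name]
        return visible
```

In this sketch a user who has not been assigned any data set sees an empty set, and a PI cannot build a data set from samples that no centre has shared.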

6. In what way do you think data sharing should be further improved?

Because of all the concerns around the term “Data Sharing”, I prefer to use the term “Result Sharing”. In our framework, we mostly try to answer very high-level questions like “The prevalence of a certain mutation in different populations”, preventing any private information from leaking out. By having more access to data we can gain more insight and produce more useful information; as Aristotle said: “The whole is greater than the sum of its parts.” On the other hand we always have to be careful about the consequences of sharing.
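
The idea of “Result Sharing” can be illustrated with a small sketch, written under stated assumptions rather than taken from NGS-Logistics itself. Each centre keeps its genotypes locally and returns only two numbers, the carrier count and the cohort size, which a coordinator combines into an overall prevalence. The names `Centre`, `count_carriers` and `federated_prevalence` are hypothetical, as is the `min_samples` threshold, which is included only as one example of a safeguard against very small cohorts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


@dataclass
class Centre:
    """Hypothetical sequencing centre; genotypes never leave this object."""

    name: str
    genotypes: Dict[str, Set[Variant]]  # sample ID -> variants carried

    def count_carriers(self, variant: Variant) -> Tuple[int, int]:
        """Return (carriers, cohort size); only these aggregates are shared."""
        carriers = sum(1 for v in self.genotypes.values() if variant in v)
        return carriers, len(self.genotypes)


def federated_prevalence(
    centres: List[Centre], variant: Variant, min_samples: int = 5
) -> Optional[float]:
    """Combine per-centre counts into one prevalence estimate.

    Centres with fewer than min_samples samples are skipped (an assumed
    safeguard, not a documented rule). Returns None if no centre qualifies.
    """
    carriers = 0
    total = 0
    for centre in centres:
        n_carriers, n_samples = centre.count_carriers(variant)
        if n_samples < min_samples:
            continue
        carriers += n_carriers
        total += n_samples
    return carriers / total if total else None
```

The coordinator in this sketch never sees sample identifiers or individual genotypes, which is the point of sharing results rather than data.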

7. What does ethical data sharing mean to you?

It means everything and nothing. Why? Because ethics really depends on the subject and the location we are talking about. If we talk about sharing weather forecast data, I would say it is not important and it does not have any meaning. But when we talk about data produced from human genomes, then we have to be careful. Legal frameworks differ a lot between countries. Some of them are very restrictive when it comes to dealing with sensitive and private data, whereas others are much less restrictive. Mostly this is because they have different definitions of private data. In most cases, any information that allows us to uniquely identify a person is defined as private information, and as we know there is a possibility of identifying a person by his or her genome sequence. Therefore, I feel that it is very important to keep track of what data is being used by whom, when, at which level and for what reason.

Amin Ardeshirdavani et al. have published their work in Genome Medicine 6:71: “NGS-Logistics: federated analysis of NGS sequence variants across multiple locations”. You can take a look at it here.

Exploring Open Science n°2: DNAdigest interviews SolveBio

DNAdigest continues with its series of interviews. Here we would like to introduce you to Mr Mark Kaganovich, CEO of SolveBio, who agreed to an interview with us. He shared a lot about what SolveBio does and discussed with us the importance of genomic data sharing.

Mark Kaganovich, CEO of SolveBio

1) Could you describe what SolveBio does?

SolveBio delivers the critical reference data used by hospitals and companies to run genomic applications. These applications use SolveBio’s data to predict the effects of slight DNA variants on a person’s health. SolveBio has designed a secure platform for the robust delivery of complex reference datasets. We make the data easy to access so that our customers can focus on building clinical grade molecular diagnostics applications, faster.

2) How did you come up with the idea of building a system that integrates genomic reference data into diagnostic and research applications? And what was the crucial moment when you realised the importance of creating it?

As a graduate student I spent a lot of time parsing, re-formatting, and integrating data just to answer some basic questions in genomics. At the same time (this was about two years ago) it was becoming clear that genomics was going to be an important industry with a yet unsolved IT component. David Caplan (SolveBio’s CTO) and I started hacking away at ways to simplify genome analysis in the anticipation that interpreting DNA would be a significant problem in both research and the clinic. One thing we noticed was that there were no companies or services out there to help out guys like us – people that were programming with genomic data. There were a few attempts at kludgy interfaces for bioinformatics and a number of people were trying to solve the read mapping computing infrastructure problem, but there were no “developer tools” for integrating genomic data. In part, that was because a couple years ago there wasn’t that much data out there, so parsing, formatting, cleaning, indexing, updating, and integrating data wasn’t as big of a problem as it is now (or will be in a few years). We set out to build an API to the world’s genomic data so that other programmers could build amazing applications with the data without having to repeat painful meaningless tasks.

As we started talking to people about our API we realized how valuable a genomic data service is for the clinic. Genomics is no longer solely an academic problem. When we started talking to hospitals and commercial diagnostic labs, that’s when we realized that this is a crucial problem. That’s also when we realized that an API to public data is just the tip of the iceberg. Access to clinical genomic information that can be used as reference data is the key to interpreting DNA as a clinical metric.

3) After the molecular technology revolution made it possible for us to collect large amounts of precise medical data at low cost, another problem appeared to take over. How do you see the solution of the problem that the data are not in a language doctors can understand?

The molecular technology revolution will make it possible to move from “Intuitive Medicine” to “Precision Medicine”, in the language of Clay Christensen and colleagues in “Innovator’s Prescription”. Molecular markers are much closer to being unique fingerprints of the individual than whatever can be expressed by the English language in a doctor’s note. If these markers can be conclusively associated with diagnosis and treatment, medicine will be an order of magnitude better, faster and cheaper than it is now. Doctors can’t possibly be expected to read the three billion base pairs or so that make up the genome of every patient and recall which diagnosis and treatment is the best fit in light of the genetic information. This is where the digital revolution – i.e. computing – comes in. Aggregating siloed data while maintaining the privacy of the patients using bleeding-edge software will allow doctors to use clinical genomic data to better medicine.

4) What are your plans for the future of SolveBio? Are you working on developing more tools/apps?

Our goal is to be the data delivery system for genomic medicine. We’ve built the tools necessary to integrate data into a genomic medical application, such as a diagnostic tool or variant annotator. We are now building some of these applications to make life easier for people running genetic tests.

5) Do you recognise the problem of limited sharing of genomics data for research and diagnosis? Can you think of an example of how the work of SolveBio supports data access and knowledge sharing within the genomics community?

The information we can glean from DNA sequence is only as good as the reference data that is used for research and diagnostic applications. We are particularly interested in genomics data from the perspective of how linking data from different sources creates the best possible reference for clinical genomics. This is, in a way, a data sharing problem.

I would add, though, that a huge disincentive to distributing data is the privacy, security, liability, and branding concerns that clinical and commercial outfits are right to take into account. As a result, we are especially tailoring our platform to address those concerns.

However, even the data that is currently being “shared” openly, largely as a product of the taxpayer funded academic community, is very difficult and costly to access. Open data isn’t free. It involves building and maintaining substantial infrastructure to make sure the data is up-to-date and to verify quality. SolveBio solves that problem. Developers building DNA interpretation tools no longer have to worry about setting up their data infrastructure. They can integrate data with a few lines of code through SolveBio.

6) Which is the most important thing that should be done in the field of genetic data sharing and what does ethical data sharing mean to you?

Ethical data sharing means keeping patient data private and secure. If data is used for research or diagnostic purposes and needs to be transferred among doctors, scientists, or engineers, then privacy and security are key concerns. Without privacy and security controls, genomic data will never benefit from the aggregate knowledge of programmers and clinicians, because patients will be rightly opposed to measuring, let alone distributing, their genomic information. Patient data belongs to the patient. Sometimes clinicians and researchers forget that. I definitely think the single most important thing to get right is the data privacy and security standard. The entire field depends upon it.

DNAdigest Symposium: A tour in Open Science in human genomics research

This past weekend, DNAdigest organized a Symposium on the topic “Open Science in human genomics research – challenges and inspirations”. The event brought together enthusiastic people with a keen interest in the topic, along with the DNAdigest team. We are very pleased to say that the day turned out to be a success, with both participants and organizers enjoying the amazing talks of our speakers and the discussion sessions.

The day started with a short introduction on the topic by Fiona Nielsen.

Then our first speaker, Manuel Corpas, was a source of inspiration to all participants, talking us through the process he went through to sequence the whole genomes of his family and himself and to share this data with the whole world. Here is a link to the presentation he gave on the day.

The Symposium was organized in the format of an Open Space conference, where everybody got to suggest different topics related to Open Science or choose to join the one that sounded most interesting. Again, we used HackPad to take notes and record interesting thoughts throughout the discussions. You can take a look at it here.

We had three more speakers invited to our Symposium. Tim Hubbard (slides) talked about how Genomics England engages the research community, in the form of genomic scientists and patient communities, to collaborate on both data generation and data analysis for the 100k Genomes Project for the public benefit. Julia Wilson (slides) came as a representative of the Global Alliance. She introduced us to the GA4GH and explained how their work helps to implement standards for data sharing across genomics and health. Last, but not least, was Nick Sireau (slides). He walked us through an eight-step process to show us how exactly the scientific community and the patient community can engage in collaborations, and how Open Science (sharing of hypotheses, methods and results throughout the science process) may be either beneficial or challenging in this context.

The event came to its end with a summary of learning points and a rounding up by Fiona Nielsen.

We have also made a Storify summary where you can find a collection of all the tweets and most of the photos covering the day. There is also a gallery including all pictures taken by our team members.

Now, to all former and future participants: if you enjoy participating in these events, please donate to DNAdigest by texting DNAD14 £10 to 70070, so that we can continue organizing more of these interactive and exciting events in the future. You can also buy some of our cool DNAdigest T-shirts and mugs from our website shop.

It was great to see you all, and we look forward to welcoming you again for our next events!

DNAdigest team: Fiona, Adrian, Margi, Francis, Sebastian, Xocas and Tim

This event would not have been possible without the contributions of our generous sponsors:

[Sponsor logos]

Exploring Open Science: DNAdigest interviews Aridhia

As promised last week in the DNAdigest newsletter, we are giving life to our first blog post interview. Meet Mr Rodrigo Barnes, part of the Aridhia team. He kindly agreed to answer our questions about Aridhia and their views on genomic data sharing.

Mr Rodrigo Barnes, CTO of Aridhia

1. You are a part of the Aridhia team. Please tell us what the goals and interests of the company are.

Aridhia started with the objective of using health informatics and analytics to improve efficiency and service delivery for healthcare providers, support the management of chronic disease and personalised medicine, and ultimately improve patient outcomes.

Good outcomes had already started to emerge in diabetes and other chronic diseases, through some of the work undertaken by the NHS in Scotland and led by one of our founders, Professor Andrew Morris. This included providing clinicians and patients with access to up-to-date, rich information from different parts of the health system.

Aridhia has since developed new products and services to solve informatics challenges in the clinical and operational aspects of health. As a commercial organisation, we have worked on these opportunities in collaboration with healthcare providers, universities, innovation centres and other industry partners, to ensure that the end products are fit for purpose, and the benefits can be shared between our diverse stakeholders. We have always set high standards for ourselves, not just technically, but particularly when it comes to respecting people’s privacy and doing business with integrity.

2. What is your role in the organisation and how does your work support the mission of the company?

Although my background is in mathematics, I’ve worked as a programmer in software start-ups for the majority of my career. Since joining Aridhia as one of its first employees, I have designed and developed software for clinical data, often working closely with NHS staff and university researchers. This has been a great opportunity to work on (ethically) good problems and participate in multidisciplinary projects with some very smart, committed and hard-working people.

In the last year, I took on the CTO (Chief Technology Officer) role, which means I have to take a more strategic perspective on the business of health informatics. But I still work directly with customers and enjoy helping them develop new products.

3. What makes Aridhia unique?

We put collaboration at the very heart of everything we do. We work really hard to understand the different perspectives and motivations people bring to a project, and acknowledge expertise in others, but we’re also happy to assert our own contribution. We have also been lucky to have investors who recognise the challenges in this market and support our vision for addressing them.

4. Aridhia have recently won a competition for helping businesses develop new technology to map and analyse genes, and more specifically to support the efforts of the NHS to map whole genomes of patients with rare diseases or cancer. Which phase are you in now, and have you developed an idea (or even a prototype) that you can tell us more about?

It’s a little early to say too much about our product plans, but we have identified a number of aspects within genomic medicine that we feel need to be addressed. Based on our extensive experience in the health field, we think a one size fits all approach won’t work when it comes to annotating genomes and delivering that information usefully into the NHS (and similar healthcare settings). There will be different user needs, of course, but there are also IT procurement and deployment challenges to tackle before any smart solution can become common practice in the NHS.

We strongly believe that there is a new generation of annotation products and services waiting to emerge from academic/health collaborations. We believe that clinical groups have the depth of knowledge and the databases of cases that are needed to provide real insight into complex diseases with genetic factors, and we are keen to help these SMEs and spin outs validate their technology and get them ‘to market’ in the NHS and healthcare settings around the world.

Overall, our initial objective is to help take world-class annotations out of research labs and into operational use in the NHS. Both of these goals are very much in line with Genomics England’s mandate to improve health and wealth in the UK.

5. Aridhia is a part of The Kuwait Scotland eHealth Innovation Network (KSeHIN). Can you tell us something more about this project and what your plans for further development are?

Kuwait has one of the highest rates of obesity and diabetes in the world, and the Kuwait Ministry of Health has responsibility for tackling this important issue. We’ve worked with the Dasman Diabetes Centre in Kuwait and the University of Dundee to bring informatics, education and resources to improve diabetes care. The challenge from the initial phase is to scale up to a national system. We think there are good opportunities to work with the Ministry of Health in Kuwait to achieve their goals as well as working with the Dasman’s own genomics and research programmes. This project is an excellent example of the combination of skills and resources needed to make an impact on the burden of chronic disease.

6. Do you recognise the problem of limited sharing of genomics data for research and diagnosis? How does the work of Aridhia support data access and knowledge sharing within the genomics community?

This is a sensitive subject of course, and we have to acknowledge that this is data that can’t readily be anonymised. Sharing, if it’s permissible, won’t follow the patterns we are used to with other types of data. That’s why we took an interest in the work DNAdigest is doing.

Earlier in the year, Aridhia launched its collaborative data science platform, AnalytiXagility, which takes a tiered approach to the managed sharing of sensitive data. We make sure that we offer data owners and controllers what they need to feel comfortable in sharing data. AnalytiXagility delivers a protocol for negotiation and sharing, backed by a ‘life-cycle’ or ‘lease’ approach to the sharing and audit systems to verify compliance. This has been primarily used for clinical, imaging and genomics data to date.

In a ‘Research Safe Haven’ model, the analysts come to the data, and have access to that for the intended purpose and duration of their project. This system is in place at the Stratified Medicine Scotland – Innovation Centre, which already supports projects using genomic and clinical data. The model we are developing for genomic data extends that paradigm of bringing computing to the data. We are taking this step by step and working with partners and customers to strengthen the system.

From a research perspective, the challenges are likely to be related to having enough linked clinical data, but also having enough samples and controls to get a meaningful result. So we think we will see standards emerging for federated models – research groups will try to apply their analysis against raw genomic data at multiple centres using something like the Global Alliance 4 Genomics and Health API, and then collate results for analysis under a research safe haven model. We recently joined the Global Alliance and will bring our experience of working with electronic patient records and clinical informatics to the table.

7. What are your thoughts on the most important thing that should be done in the field of genetic data sharing?

Trust and transparency are important factors. I am interested in seeing what could be done to establish protocols and accreditations that would give participants visibility of how data is being used and how the benefits are shared.