Exploring Open Science: DNAdigest interviews Aridhia

As promised last week in the DNAdigest newsletter, we are bringing our first interview to the blog. Meet Mr Rodrigo Barnes, part of the Aridhia team, who kindly agreed to answer our questions about Aridhia and its views on genomic data sharing.


Mr Rodrigo Barnes, CTO of Aridhia

1. You are part of the Aridhia team. Please tell us: what are the goals and interests of the company?

Aridhia started with the objective of using health informatics and analytics to improve efficiency and service delivery for healthcare providers, support the management of chronic disease and personalised medicine, and ultimately improve patient outcomes.

Good outcomes had already started to emerge in diabetes and other chronic diseases, through some of the work undertaken by the NHS in Scotland and led by one of our founders, Professor Andrew Morris. This included providing clinicians and patients with access to up-to-date, rich information from different parts of the health system.

Aridhia has since developed new products and services to solve informatics challenges in the clinical and operational aspects of health. As a commercial organisation, we have worked on these opportunities in collaboration with healthcare providers, universities, innovation centres and other industry partners, to ensure that the end products are fit for purpose, and the benefits can be shared between our diverse stakeholders. We have always set high standards for ourselves, not just technically, but particularly when it comes to respecting people’s privacy and doing business with integrity.

2. What is your role in the organisation and how does your work support the mission of the company?

Although my background is in mathematics, I’ve worked as a programmer in software start-ups for the majority of my career. Since joining Aridhia as one of its first employees, I have designed and developed software for clinical data, often working closely with NHS staff and university researchers. This has been a great opportunity to work on (ethically) good problems and participate in multidisciplinary projects with some very smart, committed and hard-working people.

In the last year, I took on the CTO (Chief Technology Officer) role, which means I have to take a more strategic perspective on the business of health informatics. But I still work directly with customers and enjoy helping them develop new products.

3. What makes Aridhia unique?

We put collaboration at the very heart of everything we do. We work really hard to understand the different perspectives and motivations people bring to a project, and acknowledge expertise in others, but we’re also happy to assert our own contribution. We have also been lucky to have investors who recognise the challenges in this market and support our vision for addressing them.

4. Aridhia recently won a competition for helping businesses develop new technology to map and analyse genes – more specifically, to support the NHS’s efforts to map the whole genomes of patients with rare diseases or cancer. What phase are you at now, and have you developed an idea (or even a prototype) that you can tell us more about?

It’s a little early to say too much about our product plans, but we have identified a number of aspects within genomic medicine that we feel need to be addressed. Based on our extensive experience in the health field, we think a one-size-fits-all approach won’t work when it comes to annotating genomes and delivering that information usefully into the NHS (and similar healthcare settings). There will be different user needs, of course, but there are also IT procurement and deployment challenges to tackle before any smart solution can become common practice in the NHS.

We strongly believe that there is a new generation of annotation products and services waiting to emerge from academic/health collaborations. We believe that clinical groups have the depth of knowledge and the databases of cases that are needed to provide real insight into complex diseases with genetic factors, and we are keen to help these SMEs and spin-outs validate their technology and get them ‘to market’ in the NHS and healthcare settings around the world.

Overall, our initial objective is to help take world-class annotations out of research labs and into operational use in the NHS. That is very much in line with Genomics England’s mandate to improve health and wealth in the UK.

5. Aridhia is a part of The Kuwait Scotland eHealth Innovation Network (KSeHIN). Can you tell us something more about this project and what your plans for further development are?

Kuwait has one of the highest rates of obesity and diabetes in the world, and the Kuwait Ministry of Health has responsibility for tackling this important issue. We’ve worked with the Dasman Diabetes Centre in Kuwait and the University of Dundee to bring informatics, education and resources to improve diabetes care. The challenge from the initial phase is to scale up to a national system. We think there are good opportunities to work with the Ministry of Health in Kuwait to achieve their goals as well as working with the Dasman’s own genomics and research programmes. This project is an excellent example of the combination of skills and resources needed to make an impact on the burden of chronic disease.

6. Do you recognise the problem of limited sharing of genomics data for research and diagnosis? How does the work of Aridhia support data access and knowledge sharing within the genomics community?

This is a sensitive subject of course, and we have to acknowledge that this is data that can’t readily be anonymised. Sharing, if it’s permissible, won’t follow the patterns we are used to with other types of data. That’s why we took an interest in the work DNAdigest is doing.

Earlier in the year, Aridhia launched its collaborative data science platform, AnalytiXagility, which takes a tiered approach to the managed sharing of sensitive data. We make sure we offer data owners and controllers what they need to feel comfortable sharing data. AnalytiXagility delivers a protocol for negotiation and sharing, backed by a ‘life-cycle’ or ‘lease’ approach to sharing, with audit systems to verify compliance. This has primarily been used for clinical, imaging and genomics data to date.
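
To make the ‘lease’ idea concrete, here is a minimal sketch in Python of what such a model could look like: a time-boxed grant of access tied to a stated purpose, with every access attempt audited. The names (DataLease, access and so on) are our own illustration, not AnalytiXagility’s actual API, which the interview does not describe.

```python
# Hypothetical sketch of a 'lease' model for managed data sharing.
# DataLease and its methods are illustrative names, not AnalytiXagility's API.
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataLease:
    dataset_id: str
    analyst: str
    purpose: str        # the agreed use of the data
    expires: datetime   # the 'life-cycle' boundary of the lease
    audit_log: list = field(default_factory=list)

    def is_active(self) -> bool:
        """A lease only grants access until its agreed expiry date."""
        return datetime.utcnow() < self.expires

    def access(self, action: str) -> bool:
        """Record every access attempt so compliance can be verified later."""
        allowed = self.is_active()
        self.audit_log.append((datetime.utcnow(), self.analyst, action, allowed))
        return allowed


# Example: a 90-day lease on a clinical dataset for a stated purpose
lease = DataLease("study-42", "dr_smith", "diabetes cohort analysis",
                  datetime.utcnow() + timedelta(days=90))
assert lease.access("read")   # permitted now, and recorded in the audit log
```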

In a ‘Research Safe Haven’ model, the analysts come to the data and have access to it for the intended purpose and duration of their project. This system is in place at the Stratified Medicine Scotland – Innovation Centre, which already supports projects using genomic and clinical data. The model we are developing for genomic data extends that paradigm of bringing computing to the data. We are taking this step by step, working with partners and customers to strengthen the system.

From a research perspective, the challenges are likely to be related to having enough linked clinical data, but also having enough samples and controls to get a meaningful result. So we think we will see standards emerging for federated models – research groups will apply their analysis against raw genomic data at multiple centres using something like the Global Alliance for Genomics and Health (GA4GH) API, and then collate the results for analysis under a research safe haven model. We recently joined the Global Alliance and will bring our experience of working with electronic patient records and clinical informatics to the table.
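
As a rough illustration of that federated pattern, the sketch below asks several centres for variants in a genomic region and collates the responses for downstream analysis. The endpoint path, payload shape and centre URLs are assumptions modelled loosely on early GA4GH variant-search drafts, not a verified client for any real deployment.

```python
# Hedged sketch of a federated variant query; the /variants/search endpoint,
# payload fields and centre URLs are assumptions, not a verified GA4GH client.
import requests

CENTRES = [
    "https://genomics.centre-a.example/ga4gh",   # hypothetical deployments
    "https://genomics.centre-b.example/ga4gh",
]


def search_variants(base_url, chrom, start, end):
    """Ask one centre for variants in a region; the raw data stays local."""
    resp = requests.post(
        f"{base_url}/variants/search",
        json={"referenceName": chrom, "start": start, "end": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("variants", [])


# Collate per-centre results, e.g. for joint analysis inside a safe haven
results = {url: search_variants(url, "7", 117_120_016, 117_308_718)
           for url in CENTRES}
```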

7. What are your thoughts on the most important thing that should be done in the field of genetic data sharing?

Trust and transparency are important factors. I am interested in seeing what could be done to establish protocols and accreditations that would give participants visibility of how data is being used and how the benefits are shared.


Giving research data the credit it’s due

In many ways, the currency of the scientific world is publications. Published articles are seen as proof – often by colleagues and future employers – of the quality, relevance and impact of a researcher’s work. Scientists read papers to familiarize themselves with new results and techniques, and then they cite those papers in their own publications, increasing the recognition and spread of the most useful articles. However, while there is undoubtedly a role for publishing a nicely-packaged, (hopefully) well-written interpretation of one’s work, are publications really the most valuable product that we as scientists have to offer one another?

As biology moves more and more towards large-scale, high-throughput techniques – think all of the ‘omics – an increasingly large proportion of researchers’ time and effort is spent generating, processing and analyzing datasets. In genomics, large sequencing consortia like the Human Genome Project or ENCODE were funded in part to generate public resources that could serve as roadmaps to guide future scientists. However, in smaller labs, all too often after a particular set of questions is answered, large datasets end up languishing on a dusty server somewhere. Even for projects whose express purpose is to create a resource for the community, the process of curating, annotating and making data available is a time-consuming and often thankless task.


Current genomics data repositories like GEO and ArrayExpress serve an important role in making datasets available to the public, but they typically contain data that has already been described in a published article; citing the dataset is usually secondary to citing the paper. If more, easier-to-use platforms existed for publishing datasets themselves, alongside methods to quantify the use and impact of those datasets, it might help drive a shift away from ascribing value purely to journal articles and towards a more holistic approach in which the actual products of research projects – datasets and the code or software tools used to analyse them, in addition to articles – are valued. Such a shift could bring benefits to all levels of biological research, from ensuring that students who toiled for years to produce a dataset get adequate credit for their work, to encouraging greater sharing and reuse of data that might not have made it into a paper but still has the potential to yield scientific insights.

Tools and platforms to do just this are gradually emerging and gaining recognition in the biological community. Figshare is a particularly promising platform that allows for the sharing and discovery of many types of research outputs, including datasets as well as papers, posters and various media formats. Importantly, items uploaded to Figshare are assigned a Digital Object Identifier (DOI), which provides a unique and persistent link to each item and allows it to be easily cited. This is analogous to the treatment of articles on preprint servers such as arXiv and bioRxiv, whose use is also growing in biological disciplines; however, Figshare is more flexible in terms of the types of research output it accepts. In addition to the space and ability to share and cite data, the research community could benefit from better quantification of data citation and impact. Building on the altmetrics movement, which attempts to provide alternative measures of the impact of scientific articles besides the traditional journal impact factor, a new Data-Level Metrics pilot project has recently been announced as a collaboration between PLOS, the California Digital Library and DataONE. The goal of this project is to create a new set of metrics that quantify usage and impact of shared datasets.
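
To see what a DOI buys in practice, here is a short Python sketch that resolves a DOI to a formatted bibliography entry using content negotiation on doi.org, a service supported for Crossref and DataCite registrations. The DOI in the example is a placeholder, not a real Figshare item.

```python
# Resolve a DOI to a formatted citation via content negotiation on doi.org.
# The example DOI below is a placeholder, not a real Figshare item.
import requests


def citation_for(doi: str, style: str = "apa") -> str:
    """Fetch a formatted bibliography entry for a DOI."""
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text.strip()


print(citation_for("10.6084/m9.figshare.0000000"))  # placeholder DOI
```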

Although slow at times, the biological research community is gradually adapting to the new needs and possibilities that come along with high-throughput datasets. Particularly in the field of genomics, I hope that researchers will continue to push for and embrace innovative ways of sharing their data. If data citation becomes the new standard, it could facilitate collaboration and reproducibility while helping to diversify the range of outputs that scientists consider valuable. Hopefully, the combination of easy-to-use platforms and metrics that capture the impact of non-traditional research outputs will provide incentives to researchers to make their data available and encourage the continued growth of sharing, recognizing and citing biological datasets.

Analysing journalistic data with detective.io

No doubt, the power of the internet has profoundly changed the way journalists gather their information. To keep up with the growing amount of data available digitally, more and more tools for data journalists are being developed. They help meet the challenge of handling vast amounts of data and extracting the relevant information from it (here you can find our little collection of useful tools).

One powerful tool is detective.io, a platform that allows you to store and mine all the data you have collected on a precise topic. Developed by Journalism++, a Berlin- and Paris-based agency for data-journalism, it was launched one year ago.

By now, several investigations that used the tool have made headlines in Europe – amongst others The Belarus Network, an investigation by the French news channel France24 into Belarus’ president Alexander Lukashenko and the affairs of the country’s elite, and, most notably, The Migrants Files, a database on the more than 25,000 migrants who have died on their way to Europe since 2000. According to the developers at Journalism++, the applied methodology – measuring the actual casualty rate per migration route – has now been picked up by UNHCR and IOM. Another example is a still-ongoing investigation into police violence, started by NU.nl, the main news website in the Netherlands.

What does detective.io do?

Basically, detective.io lets you upload and store your data and search for relationships within it using a graph search based on network analysis. The tool, which is open source and still in beta, structures and maps relationships between the subjects of an investigation. These can be a vast number of entities, such as organizations, countries, people and events.

In its basic version, the tool offers three generic data schemes that help structure the data you have – for instance on a corporate network: the respective ownerships, branches, individuals involved and so on. To deal with more complex datasets, a customized data scheme is needed (a sketch of what such a scheme could look like follows below). No special skills are required to use detective.io, but one needs to think hard about which elements of information are needed for the analysis before creating the data structure. However, such custom data schemes are not included in the basic version: the team at Detective.io offers several paid plans that include additional and/or customized data schemes and the respective customer support.

There are special offers for NGOs and investigative journalists, too.
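
The article does not document detective.io’s actual scheme format, so here is a hypothetical sketch of what a custom data scheme for a corporate-network investigation could boil down to: a set of entity types and the relationships allowed between them.

```python
# Hypothetical sketch of a graph-style data scheme for a corporate-network
# investigation; this structure is illustrative, not detective.io's format.
SCHEME = {
    "entities": {
        "Company": ["name", "country", "registered_id"],
        "Person":  ["name", "nationality"],
    },
    "relationships": [
        # (source entity, relation, target entity)
        ("Person",  "OWNS",          "Company"),
        ("Person",  "DIRECTOR_OF",   "Company"),
        ("Company", "SUBSIDIARY_OF", "Company"),
    ],
}

# Settling these entity types and relations up front is the 'think hard about
# which elements of information are needed' step described above.
```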

One powerful asset of detective.io is that investigations can be shared with collaborators and/or made public. Here you can have a look at what our Open Knowledge Directory looks like on detective.io and explore the relations between organizations and individuals by using the graph search.

Currently, the developers at Journalism++ are working on a new GUI/frontend for detective.io that will allow every user to edit the data schemes by themselves.

Here you can request an account for the beta version, and if you are interested in collaborating on the development of detective.io, you can find the tool’s GitHub repository here.