NGS-Logistics is the next project featured in our blog interviews. We interviewed Amin Ardeshirdavani, a PhD student involved in the creation of this web-based application. Take a look at the interview to find out why this tool has become very popular within KU Leuven.
1. What is NGS logistics?
NGS-Logistics is a web-based application that accelerates the federated analysis of Next Generation Sequencing data across different centres. NGS-Logistics acts like a real logistics company: you order something from the Internet; the owner processes your request and then ships it through a safe and trustworthy logistics company. In the case of NGS-Logistics, the goods are human sequence data, and researchers ask about possible variations and their frequency in the whole population. We try to deliver the answers in the fastest and safest possible way.
2. What is your part in NGS logistics?
Right now I am a PhD student at KU Leuven, and the whole idea of my PhD project is to design and develop new data structures for analysing the massive amounts of data produced by Next Generation Sequencing machines. NGS-Logistics is exactly that. I did the whole design and development of the application and database. I would also like to thank everyone at KU Leuven, the ESAT IT Dept., the UZ Leuven IT Dept., and the UZ Genomics core Dept. who assisted me on this project for their kind support, especially Erika Souche.
3. When did you first start working on the idea of creating NGS logistics and what made you think it would be something useful?
It was almost three years ago when I had a meeting with my promotor, Professor Yves Moreau, and he had an idea to somehow connect sequencing centres and query their data without moving it into one repository. As a person with an IT background, it wasn’t that difficult for me to develop an application, but there were lots of practical issues that needed to be taken care of. The majority of these issues relate to protecting the privacy of individuals, because the data we deal with come from human genome sequencing experiments and people are rightfully worried about how this data will be used and protected. At the time of my first meeting there was no system in place to share this data, but many people understood the need for this kind of structure and for us to start working on it. As we know, information can be a true scientific goldmine, and by having access to more data we are able to produce more useful information. The novelty of the data, the possibility of sharing this wealth of information, and the complexity of this kind of application make me eager to work on this project.
4. How does your open source tool work and who is it designed for?
NGS-Logistics has three modules: the Web Interface, the Access Control List and the Query Manager. The source code of each of these modules, plus the database structure behind them, is available upon simple request. As the modules are being upgraded continuously, I have not set up a public repository for the source code yet. However, if someone is interested in gaining access to the source code, it will be our pleasure to give it to them, although I do think that the whole idea of data sharing is more important than the source code itself. In any case, we are happy to share with others our experience of the problems and issues we had to tackle during the past three years. In general, NGS-Logistics is designed to help researchers save time when they need access to more data. It helps them get a better overview of their questions and, if they need access to the actual data, it helps them find the most useful data sets that match their cases.
5. Who has access to the system and how do you manage access permissions?
Researchers with a valid email address and affiliation are welcome to register and use the application. We need to know who is querying the data in order to prevent structural queries, which could lead to the identification of an individual. I spent almost 20 months on the Access Control List (ACL) module. Most of the tasks are controlled and automatically updated by the system itself. Centre admins are responsible for updating the list of samples they want to share with others. PIs and their power users are responsible for grouping the samples into data sets and assigning them to users and groups. The ACL has a very rich and user-friendly interface that makes it very easy to learn and use.
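As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not the actual NGS-Logistics code: the `Dataset` class, the `accessible_samples` function and all identifiers in it are hypothetical, and it only assumes that a user may see a sample when the centre has shared it and a PI has placed it in a data set assigned to that user or to one of the user's groups.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Dataset:
    """Hypothetical data set: samples grouped by a PI or power user."""
    name: str
    samples: Set[str]
    users: Set[str] = field(default_factory=set)   # users assigned directly
    groups: Set[str] = field(default_factory=set)  # groups assigned


def accessible_samples(user: str,
                       user_groups: Set[str],
                       shared_samples: Set[str],
                       datasets: Iterable[Dataset]) -> Set[str]:
    """Return the samples a user may query.

    A sample is accessible only if (a) the centre admin has shared it and
    (b) it belongs to a data set assigned to the user or to one of the
    user's groups.
    """
    allowed: Set[str] = set()
    for ds in datasets:
        if user in ds.users or ds.groups & user_groups:
            allowed |= ds.samples
    # Samples the centre no longer shares are filtered out here.
    return allowed & shared_samples


if __name__ == "__main__":
    shared = {"s1", "s3"}  # samples the centre admin currently shares
    datasets = [
        Dataset("cardio", {"s1", "s2"}, users={"alice"}),
        Dataset("neuro", {"s3"}, groups={"lab-b"}),
    ]
    print(accessible_samples("alice", set(), shared, datasets))
    print(accessible_samples("bob", {"lab-b"}, shared, datasets))
```

Keeping the centre's shared list as a final filter means that a centre admin can withdraw a sample without anyone having to edit the data sets built on top of it.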
6. In what way do you think data sharing should be further improved?
Because of all the concerns around the term “Data Sharing”, I prefer to use the term “Result Sharing”. In our framework, we mostly try to answer very high-level questions such as “What is the prevalence of a certain mutation in different populations?”, while preventing any private information from leaking out. By having access to more data we can gain more insight and produce more useful information; as Aristotle said: “The whole is greater than the sum of its parts.” On the other hand, we always have to be careful about the consequences of sharing.
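To make the idea of “Result Sharing” concrete, here is a small Python sketch of a federated prevalence query. It is an illustration under stated assumptions, not the NGS-Logistics implementation: the function names, the variant notation and the in-memory dictionaries standing in for each centre's database are all hypothetical. The point it shows is that each centre returns only two numbers, and the individual genotypes never leave the centre.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-in for one centre's data: sample id -> set of variants.
Genotypes = Dict[str, Set[str]]


def local_count(genotypes: Genotypes, variant: str) -> Tuple[int, int]:
    """Runs inside a single centre.

    Only the aggregate (carriers, total samples) is returned; no sample
    identifiers or genotypes are exposed to the caller.
    """
    carriers = sum(1 for variants in genotypes.values() if variant in variants)
    return carriers, len(genotypes)


def federated_prevalence(centres: Dict[str, Genotypes], variant: str) -> float:
    """Combine per-centre counts into one overall prevalence."""
    carriers = total = 0
    for genotypes in centres.values():
        c, n = local_count(genotypes, variant)
        carriers += c
        total += n
    return carriers / total if total else 0.0


if __name__ == "__main__":
    centres = {
        "centre_a": {"s1": {"chr1:12345:A>G"}, "s2": set()},
        "centre_b": {"s3": {"chr1:12345:A>G"}, "s4": {"chr2:500:C>T"}},
    }
    print(federated_prevalence(centres, "chr1:12345:A>G"))  # 2 carriers out of 4 samples
```

In a real deployment `local_count` would run at each centre behind its own access controls, and only its return value would travel over the network.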
7. What does ethical data sharing mean to you?
It means everything and nothing. Why? Because ethics really depends on the subject and the location we are talking about. If we talk about sharing weather forecast data, I would say the question is not important and hardly has any meaning. But when we talk about data derived from human genomes, we have to be careful. Legal frameworks differ a lot between countries. Some are very restrictive when it comes to dealing with sensitive and private data, whereas others are much less so. Mostly this is because they have different definitions of private data. In most cases, any information that allows us to uniquely identify a person is defined as private information and, as we know, it is possible to identify a person by his or her genome sequence. Therefore, I feel that it is very important to keep track of what data is being used, by whom, when, at which level and for what reason.
Amin Ardeshirdavani et al. have published this work in Genome Medicine 6:71: “NGS-Logistics: federated analysis of NGS sequence variants across multiple locations”. You can take a look at it here.