Interview with Journalism++ @ Paris/Berlin, France/Germany

Journalism++ is a network of data-journalists and developers with chapters in five cities across Europe. With the goal of promoting the use of data and its visualisation for journalistic purposes, they create Open Source tools, organise training sessions and advise other organisations in this area.

We contacted Nicolas Kayser-Bril, one of its co-founders, and asked him to give us an inside view of his company and the concept of data-journalism. Covering the theory, how data is currently being used to enhance story-telling, and the advantages for journalists of working with Open Source and Open Data, this interview explores a topic we were eager to learn more about.

1) Hi Nico, many thanks for sharing time with us. Could you first introduce yourself and briefly present Journalism++? How did you come to be represented in five different cities in Europe?

We started Journalism++ with Pierre Romera, a developer, in 2011. At the time, we were working together at OWNI as a team of journalist & developer. When we left, we asked several newsrooms if we could join, as a team, and do data-journalism. Most were eager to hire us but not one was ready to let us work together. In order to keep working together, we created Journalism++. The name is a nerdy joke, as the “++” sign is an increment in most programming languages. In effect, it means “journalism is now equal to journalism plus one”.

As the company grew, we offered other data-journalists in Europe the use of the Journalism++ brand. The Journalism++ network is organized around these chapters, in something that resembles a franchise. Companies such as Subway or NGOs like Transparency International operate in much the same way. Today, 3 companies that operate independently from us use the brand in Stockholm, Amsterdam and Cologne. All we ask from chapters is that they adhere to the Journalism++ Manifesto and be financially sustainable.

2) What does it mean to be a data-journalist? How does it differ from traditional journalism? Is the use of Open Data and its visualisation what makes that difference?

At its most basic, data-journalism means using numerical data to tell stories. Let’s say you have a database to work from. You’ll need to clean it, check its authenticity, interview the data using data-mining techniques, and finally communicate your results, sometimes using data visualisations or more complex interfaces. This process can be done by one-person operations using Google Spreadsheets. But sometimes, you’ll need many more expert skills, such as statistics, computer forensics, design or development. And project managers to hold everything together. The end product changes too. Where we had articles or video reports, we can now tell stories using evolving databases. Homicide Watch in Washington, DC, is a good example: it compiles all the data it can find on homicides in the town. It accomplishes a basic task of journalism in a totally new format.

From a simple thing (doing journalism with data) we end up with a totally new way of doing journalism, which is very close to traditional software development. That explains why small companies like ours are better equipped than big newsrooms to do data-journalism.

3) You have participated in many events and trainings around Europe, promoting the benefits of Open Data applied to journalism. How is Open Data seen among the journalistic community? Is there a general movement towards using Open Data in journalism or is it still a new and almost undiscovered topic?

Data-driven journalism is still very new to most newsrooms. There is an acknowledgement of what it can do and that it can help journalists overcome some of the challenges they face. But there’s no movement towards using open data. The number of requests for open data from journalists in most EU countries (look at the reports from CADA in France or at tools like Frag den Staat in Germany and Austria) still ranges in the low hundreds per year. It’s getting better, but very slowly.

4) We have seen in your portfolio that some of your clients come from the public sector. Is public administration showing a particular demand for Open Data-based tools nowadays?

We’re very proud to work for the Île-de-France region, Europe’s biggest region by GDP. They set up a data-driven communication strategy alongside their open data platform, which we help them implement. Many administrations, as well as NGOs and corporations, are realizing that they sit on very valuable data troves. Most are just starting to organize them and are thinking of making them more open. They understand that more open data will make it easier for them to communicate about their work.

5) You have already developed really interesting tools and civic apps (Cartolycées, e-diplomacy, Alertepolitique, Datawrapper, …). Where do all these ideas come from? Could you explain more about the conception process and its context?

Most of our projects start at the coffee table, within the company or with clients and partners. We then take these ideas from a drawing on a napkin to full-fledged products. We sometimes have to find funding in the process. Clients are very open to experimenting with new ideas. In the case of E-diplomacy, for instance, a visualisation of diplomats’ Twitter streams for Agence France Presse, the tool really emerged from a back-and-forth ideation process between us and AFP journalists.

6) We know it might be difficult to choose one, but can you pitch one of your projects in particular? Perhaps the one you consider the most useful?

I’ll take the latest project we released, called SpendingStories. We had this idea with the Open Knowledge Foundation (OKF), which financed the project through a grant from the Knight Foundation. With its OpenSpending project, OKF collects a lot of data on budgets and spending throughout the world. But not many people know how to read, much less make sense of, this data. So we built a very simple interface that lets people enter any amount, in any currency, and see how it compares to items in different budgets. We hope it’ll make it easier for journalists to put things into perspective when a politician announces a million or billion-euro plan, instead of resorting to meaningless comparisons such as “this is as much as the GDP of [insert country here]”. You can access the demo version of SpendingStories, which contains data about UK public spending, here: http://okf-spendingstories.herokuapp.com
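To make the comparison idea concrete, here is a minimal Python sketch. It is not the SpendingStories code: the exchange rates, the budget items and every name in it (`RATES_TO_REFERENCE`, `BudgetItem`, `closest_items`) are hypothetical placeholders chosen for illustration. It converts an amount into a single reference currency and ranks budget items by how close they are in order of magnitude.

```python
import math
from dataclasses import dataclass

# Hypothetical conversion rates into one reference currency.
# These are illustrative values, not real exchange rates.
RATES_TO_REFERENCE = {"USD": 1.0, "EUR": 1.25, "GBP": 1.5}


@dataclass(frozen=True)
class BudgetItem:
    label: str
    amount: float  # must be positive
    currency: str


def to_reference(amount: float, currency: str) -> float:
    """Convert an amount into the reference currency."""
    return amount * RATES_TO_REFERENCE[currency]


def closest_items(amount, currency, items, top=3):
    """Return (label, ratio) pairs, closest order of magnitude first.

    A ratio above 1 means the queried amount is larger than the budget item.
    """
    query = to_reference(amount, currency)
    if query <= 0:
        raise ValueError("amount must be positive")
    scored = []
    for item in items:
        ratio = query / to_reference(item.amount, item.currency)
        # Distance on a log scale: 0 means "exactly the same size".
        scored.append((abs(math.log10(ratio)), item.label, ratio))
    scored.sort()
    return [(label, ratio) for _, label, ratio in scored[:top]]


# Placeholder budget lines; labels and amounts are made up for the example.
ITEMS = [
    BudgetItem("Placeholder item A", 2_000_000, "GBP"),
    BudgetItem("Placeholder item B", 500_000_000, "GBP"),
    BudgetItem("Placeholder item C", 40_000_000_000, "EUR"),
]

if __name__ == "__main__":
    for label, ratio in closest_items(3_000_000, "USD", ITEMS):
        print(f"{label}: {ratio:.4g} times the item")
```

Ranking on a logarithmic scale is one possible design choice: it treats "ten times bigger" and "ten times smaller" as equally distant, which suits amounts that span from millions to billions.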

7) You release most of your projects as Open Source. What is the motivation behind this? What are the benefits for a private company like yours in a market economy?

There are several reasons. One is practical: Open source projects are granted privileges by many companies eager to encourage openness. We don’t pay to host our code on GitHub and many APIs and other services are free for open source projects. It’s also a great way to showcase our work to other developers and make sure that we code in a clean manner. It’s great to ensure a high quality in our work.

So far, we haven’t coded anything that is worth protecting for its technical value. What we sell to clients is our expertise rather than our code proper. They know that we’ll develop an app or a variation of an app much faster than they would, so it makes a lot of sense for them to pay us rather than simply take the code and do it themselves.

8) Where do you find the data you are working with? Does this data already exist or does it have to be collected before? Is the data already open and available? Which are the Open Data platforms you are using the most?

There’s no fixed rule. Sometimes we’ll tell stories using open data. Sometimes we’ll do a Freedom of Information request. Sometimes we’ll scrape it. Sometimes we’ll obtain it through leaked documents. Sometimes we structure already available data. And if we still don’t find what we need, we crowdsource data collection.

As for open data platforms, the World Bank’s is certainly the most useable. It’s great to see institutions such as the IMF and Eurostat making their data available. But I’m not a fan of the newer brand of data catalogs, à la data.gov. Most of them simply aggregate data that was already published somewhere else and add little value in the process.

9) Let’s talk about what is still to come. In your opinion, how will data-journalism evolve in the upcoming years and what are the future steps for Journalism++?

We want to become the number one network of data-journalism companies worldwide: a dozen financially independent companies operating in close cooperation, so as to be able to launch large-scale journalism projects at any time and keep hacking things!

Meeting & Workshop with KLP @ CIS, Bangalore, India

The last organisation we met in Bangalore is the Karnataka Learning Partnership (KLP), an initiative launched in 2007 by the Akshara Foundation, which collects, analyses and visualises data to improve primary education in Karnataka. By browsing its website, users can find a very detailed map and reports containing information on public primary schools. Location, availability of sanitation facilities, and demographic and nutrition statistics are the kinds of datasets presented. Public officials, among others, are using this material to improve decision-making. The data comes from various sources: public administration, collaborating organisations and volunteer surveys. Since this information is also relevant for parents, most of whom don’t have access to online resources, KLP is working on an SMS/phone-based method for them to access the data. The results have already proven really successful, and future plans include expanding the number of districts covered, currently 3. We invite you to watch the following video to learn more about it:

We met Gautam John, Head of KLP, a former lawyer who works actively in the education sector and also initiated Pratham Books, a non-profit publishing house that uses Creative Commons licenses to further the distribution, translation and reuse of children’s books.

Together with him, we organised our event at the Centre for Internet and Society, a non-profit research organisation in Bangalore that works on numerous relevant issues like freedom of expression, accessibility for persons with disabilities, access to knowledge, intellectual property rights reform and openness, and engages in academic research on digital natives and digital humanities.

An intense open debate characterized our workshop, and many of the roughly twenty participants had ongoing projects to show as examples of smart use of data. Most of them are active members of the datameet group, the India-wide online forum that we have already mentioned in our previous articles. We learned about projects like theballot.in, a weekly online data publication which presents political facts and figures about the world’s largest democracy using richly illustrated graphs and charts. We also learned more about the India Water Portal, an organisation with a deep understanding of how to use data to improve water management, and a member of TacticalTech even talked about their activities, which we have recently covered. However, there were also attendees who are still working on the initial phase of their projects, in areas such as the fight against sexual harassment or the improvement of waste management at neighbourhood level. They were especially interested in topics such as data collection or how to face challenges like the lack of data or of citizen engagement. It was for sure an interesting session!

With this productive event, we put an end to our busy week in Bangalore. We are happy to have met such passionate activists and learned so much from them!

Update: Here you can watch the video of the theoretical part of our presentation. Not complete, apologies for that…

[vimeo]https://vimeo.com/81172590[/vimeo]

Meeting @ KSHIP, Bangalore, India

Our meeting today took place in the central office of KSHIP (Karnataka State Highways Improvement Project), an initiative of the Public Works Department of the Government of Karnataka for the improvement of the road network of the southern Indian state. By creating a special committee called [email protected], the organisation aims to include Open Governance mechanisms in its workflow, thus encouraging citizens’ participation and giving transparency more weight.

We were invited to be part of the second meeting of the committee and were asked to give input on tools and strategies in the field of Open Data they could adopt to realise their goals. For us, it was really interesting to gain an insight into how such a project is developed in a public organisation from its initial state. Since the project is still in the concept phase, where the basic steps have to be defined, our presentation and the large number of examples we introduced served as inspiration and reference for what can be done in a later phase.

Our participation in this meeting wouldn’t have been possible without the help of Sridhar Pabbisetty, one of the contacts we established in the Indian IT metropolis. With a background in Computer Science and an MBA from IIM Bangalore, Sridhar is one of the most active individuals pushing Open Government initiatives and the constructive use of Open Data in India.

His activities in the field are numerous. First, he led the creation of opengovernanceindia.org, the first Open Data platform in India, which was launched just one week before the one from the national government. Besides participating in worldwide events such as OKCon 2012, where he gave a lightning presentation, he advises administrations and organisations about the benefits of acting towards openness, allowing citizens to be part of the decision-making process and raising awareness of the sustainable use of resources.

After leading the Center of Public Policy, he took the decision to contest the Hebbal Assembly constituency in the Karnataka Assembly Elections (MLA) in spring 2013, obtaining encouraging results. Parallel to all of this, he initiated the Center for Inclusive Governance, a team of people that “strives to enable citizens to lead the change they want to see, helping them to understand the legal, bureaucratic, political and civil society perspectives.” We are happy to have met such a remarkable activist today and wish him all the best for his future projects.

Bangalore has proven to be a very productive environment for our research. Next Monday, we still have our workshop at the Centre for Internet and Society and look forward to discovering even more!

Meeting @ Tactical Tech, Bangalore, India

Tactical Technology Collective is the first organisation we met as part of this intense week in Bangalore, the Indian IT metropolis. Tactical Tech is both a Dutch NGO and a registered studio company. In the beginning, the members of the organisation worked as a worldwide network of individuals; in recent years the structure has been strengthened, and soon they will set up their main office in Berlin.

TacticalTech provides expertise and know-how to NGOs, activists and rights advocates working on corruption, transparency, human rights and a long list of other relevant issues. After spending time together and analysing their needs, Tactical Tech helps them use digital tools safely and effectively and work with data visualisation for campaigning, communication and awareness raising.

Besides this, the Berlin/India-based NGO shares all this knowledge by producing a large amount of content, available in the form of films, toolkits, guides, trainings and events.

A very elaborate multimedia output

First, we have these wonderfully designed books and toolkits, which contain guides and essential information on topics like creating and running an NGO (ngo-in-a-box, a collection of essential Open Source tools for running a small-to-medium NGO that has become a cult classic), mobile advocacy (mobiles-in-a-box, 2008) and making media with impact (message-in-a-box, 2008). Although their work consists of critically analysing how we make use of technology (security-in-a-box), they have not forgotten the importance and effectiveness of print and visual media. In fact, they have produced and released several movies that complement their online and printed material.

We would especially like to highlight their latest publication (Visualising Information for Advocacy, September 2013), which has been developed out of their experience over the past ten years and reflects what they have learned about working with information, technology, design and networks in advocacy.

This book contains ideas, strategies and valuable information accompanied by numerous examples and success stories from around the world that show how information can be used effectively to raise awareness, tell stories and explore issues. We definitely recommend getting the book if you are looking for a source of inspiration and an A-to-Z guide on the topic.

And all of this with a worldwide approach! Most of the products have been translated into several languages and their workshops and trainings take place all over the world. Not to forget events like the Info-Activism Camp, which they have organised twice, the last one taking place this year in Italy. This is probably a consequence of the multicultural nature of the collective.

Read, use, and pass on!

One of the reasons we are covering TacticalTech’s activities is that they are real supporters of the principles behind Open Source and Creative Commons. All their works are released under Creative Commons licenses, which allow others to take their content as a basis for derivative works.

A remarkable proof of this collaborative potential dates back to March 2013, when Tactical Tech met with five organisations in Beirut to brainstorm ways in which their range of info-activism resources could be adapted for use by activists in the Arab region.

The results included the translation of some of their printed and online content, the contextualization of some of the strategies for critical environments like the Syrian revolution movement, and the development of printed versions of existing online resources.

See you in Berlin

We wish the team behind Tactical Tech, especially Maya Indira Ganesh, who kindly received us in their office in Bangalore, a successful future and look forward to meeting them in Berlin once they move into their brand new office.

Interview with Fiona Nielsen, DNAdigest.org, Cambridge, UK

Recently, we learned about a project which shows how the principles of Knowledge Sharing can be applied to the scientific domain, specifically to genomics data. DNAdigest is a Not-for-Profit Organisation founded and located in Cambridge, UK, by a group of individuals from diverse backgrounds who all want to see genomics used to its full potential to aid medical research. The objective of DNAdigest is to provide a simple, secure and effective mechanism for sharing genomics data for research without compromising the data privacy of the individual contributors.

From the beginning, this concept sounded very appealing to us. That’s why we contacted Fiona Nielsen, founder of this great initiative, to talk about the goals of the project, its approach to using such sensitive data and the current status of data sharing within the scientific community.

DNAdigest is still in the development process but already shows a promising future. Not only have they been selected for the Wayra UnLtd accelerator programme for social entrepreneurs, they are also working hard on building a community around the idea, organising events like hack days and workshops. Since no one can describe the project better than its creator, we invite you to discover more about it through the following sequence of questions and answers.


1) Fiona, could you first introduce yourself and DNAdigest?

I am a bioinformatics scientist turned entrepreneur. I used to work in a biotech company where I was developing tools for interpretation of next-generation sequencing data and I took part in a number of projects where I was doing the data analysis of cancer sequencing samples. During my work, I realised how difficult it is to find and get access to genomics data for research.

DNAdigest was founded as an entity to provide a novel mechanism for sharing of data, aligning the interests of patients and researchers through a data broker mechanism, enabling easy access to anonymised aggregated data.

2) Why is it important to share genomics data? Quoting your website, the current state of sharing this information is embarrassingly limited. How does DNAdigest address this problem?

The human genome is very complex. Because it is made up of 3 billion base pairs and varies from individual to individual, attempting as a researcher to nail down the genetic variation that is causing a genetic disease is equivalent to looking for a needle in a haystack. The only way to narrow your search is by filtering out genetic variation that has been seen before in healthy individuals and annotating the variation that is left with the disease(s) it occurs in. This type of comparative analysis requires looking at variants from as many samples as possible. Ideally you will need to compare to tens of thousands of samples for your comparison to approach statistical significance. Accessing thousands of samples today is difficult not only in terms of permissions but also in terms of mere storage and network capacity: it is not practical to download huge datasets for every team that wants to do a comparison. DNAdigest is developing a data broker which will allow the researcher to submit queries for specific variants, with only the aggregated information about the selected variants returned as a result. For example, examining a specific mutation in cancer, the query could be “what is the frequency of this mutation in cancer samples?” and the result would be returned as a frequency, e.g. 3%. The aim of DNAdigest is to reduce the time to discover, access and retrieve the data relevant to genomic comparison.
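As an illustration of the kind of query described in this answer, here is a minimal Python sketch of a broker that returns only an aggregated frequency and never the underlying samples. It is an assumption-laden toy, not DNAdigest's design: the `Sample` and `Broker` names, the in-memory storage and the variant identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Sample:
    sample_id: str
    condition: str            # e.g. "cancer" or "healthy" (hypothetical labels)
    variants: FrozenSet[str]  # identifiers of variants observed in the sample


class Broker:
    """Toy data broker: holds samples privately, answers with aggregates only."""

    def __init__(self, samples: Iterable[Sample]):
        self._samples = list(samples)

    def variant_frequency(self, variant: str, condition: str) -> Optional[float]:
        """Fraction of samples with the given condition that carry the variant.

        Returns None when no sample matches the condition.
        """
        cohort = [s for s in self._samples if s.condition == condition]
        if not cohort:
            return None
        carriers = sum(1 for s in cohort if variant in s.variants)
        return carriers / len(cohort)


if __name__ == "__main__":
    # Made-up samples and variant names, for illustration only.
    broker = Broker([
        Sample("s1", "cancer", frozenset({"var-1", "var-2"})),
        Sample("s2", "cancer", frozenset({"var-2"})),
        Sample("s3", "cancer", frozenset()),
        Sample("s4", "cancer", frozenset({"var-3"})),
        Sample("s5", "healthy", frozenset({"var-2"})),
    ])
    print(broker.variant_frequency("var-1", "cancer"))
```

The researcher only ever sees the number returned by `variant_frequency`; nothing has to be downloaded, which is the point made above about storage and network capacity.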

3) Your idea looks quite revolutionary and actually very much needed. What has the reaction of the scientific community to your initiative been so far? Are the principles behind sharing and opening data something new for scientists?

Similar approaches have been suggested and a handful of approaches have been prototyped within the academic community before. However, all of the projects for sharing data in an academic setting have ultimately faced the same problems: They do not have the resources to scale up their solution to work for the entire community, and even if they had the ambition to scale up the solution, they would find that it is extremely difficult to find funding for infrastructure projects from traditional research funding. In general, there is a positive attitude towards data sharing in research. However, the immediate concerns of researchers revolve around writing papers and not so much around building common infrastructure.
Based on this knowledge of the community, I realised that a separate entity is needed to take initiative for developing a solution, drawing on the knowledge generated in academia, and building an organisation that can do independent fundraising and collaborate across institutions. We have registered DNAdigest as a charity so that we can function as an independent and trusted third party to provide the community with a feasible solution.

4) What do researchers have to do in order to access genomics data on DNAdigest.org? Can individuals share their genomics information directly on the platform?

We are still designing and developing the platform, so I cannot yet give you the exact user guide. Our objective is not to store entire datasets, but to connect to existing data repositories and data management systems with a common API that allows queries into the metadata to select samples, and for the samples for which patient consent is available, to query into the genetic data to provide aggregated statistics collected across datasets.
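Since the platform is still being designed, the following Python sketch is purely hypothetical. It only illustrates the two-step idea from this answer: select samples through their metadata, keep those with patient consent, then return aggregated statistics across repositories. The `Record` fields and the `aggregate` function are invented names, not part of any DNAdigest API.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    repository: str    # which repository holds the sample (hypothetical field)
    disease: str       # metadata used to select samples
    consent: bool      # whether patient consent for querying is available
    has_variant: bool  # whether the queried variant is present in the sample


def aggregate(records: Iterable[Record], disease: str) -> Dict[str, int]:
    """Return counts only, computed over consented samples matching the metadata."""
    selected = [r for r in records if r.disease == disease and r.consent]
    return {
        "repositories": len({r.repository for r in selected}),
        "samples": len(selected),
        "carriers": sum(r.has_variant for r in selected),
    }


if __name__ == "__main__":
    # Made-up records spread over two hypothetical repositories.
    records = [
        Record("repo-a", "disease-x", True, True),
        Record("repo-a", "disease-x", False, True),   # no consent: excluded
        Record("repo-b", "disease-x", True, False),
        Record("repo-b", "disease-y", True, True),    # other disease: excluded
    ]
    print(aggregate(records, "disease-x"))
```

Samples without consent never enter the aggregate, and the caller receives counts rather than individual records.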

We have no plans at this point to provide storage capacity for individual genomic data. Currently, for this purpose, an individual would have to find an associated repository, for example through their patient community, which will allow storage of their genomic data.

5) Sharing such private information is a big concern for many people nowadays. How do you approach the privacy issue? What is your solution for this?

Our approach to privacy is to provide anonymization through aggregation. We will provide an API from which it is possible to query for summary statistics over selections of the available data. For example, for a researcher interpreting a specific mutation for a patient with a genetic disease, the associated query for DNAdigest would be “what is the frequency of this mutation for patients with this genetic disease?”. The query could also be used to look for mutation frequencies in healthy individuals or for patients with related diseases.

6) Which kind of projects could profit from DNAdigest.org?

DNAdigest is still at an early stage and we have a lot of work still to do in designing and implementing the secure query platform. The projects that are most likely to benefit from the resource of data that DNAdigest will make easily accessible are data analysis and interpretation of genetic variants in connection with rare diseases and other genetics research. In the bigger picture, a future of genomic medicine where diagnosis from genome sequencing is commonplace will only be possible if the means for interpretation, namely data access across patient groups and across repositories, becomes available.

7) We read on your blog that DNAdigest.org has been selected for the WAYRA UnLtd Accelerator. Congratulations on that! Do you benefit from other sources of support? And, in general, how far are investors supporting social enterprises and non-profit-oriented ideas?

We are very happy that we were selected for the Wayra UnLtd accelerator at this early stage of our project. The accelerator is not just an office space, but a community of startups and business-savvy people helping each other develop sustainable businesses. So far, DNAdigest has been bootstrapping our initiative with volunteer participation and charitable donations.

8) You have also organised a hack day in Cambridge and even workshops, thus building an expanding community. How does DNAdigest.org benefit from these encounters? How are the results so far?

Engaging the community in our project is essential if we want to develop a new mechanism to change the existing culture and structure of data sharing. The stakeholders from academia, industry and patient groups all have very different priorities with regard to sharing of data. Through our hack day, we arrived at a more complete understanding of the stakeholder interests and the potential sustainable development models and technical implementation that may be feasible in the short and the long term.

9) As you might know, we are particularly interested in Open Data. By accessing open information, developers are creating apps which solve certain problems. Is there already any app using open genomics data? If not, what could such an app look like?

Sensitive information like medical records and genetic sequences is unlikely to be released as Open Data. However, the knowledge generated from the data, such as statistics, can and should be made both public and easily available for the scientific community to build on. In addition, the metadata describing existing datasets currently residing at research repositories could be made openly available at no risk to privacy. However, a common problem in the research community is that it is difficult to provide incentives for researchers to spend time and effort to register their data in public repositories. Luckily, there is an increasing push from funding agencies to require that data produced with public funding should be made publicly available.

Regarding apps: in the bioinformatics community there are many many tools being developed to analyse proprietary data and many tools are developed to make use of data made openly available through public databases. For two such sources of public data (but not patient data), see the UCSC Genome Browser and the Ensembl Genome Browser.

10) In your opinion, how can the scientific community benefit from Open Data?

It would be ideal if there could be a real shift in research practices, so that researchers would register the existence of datasets even before publication (i.e. making the metadata Open Data) and other researchers would have every opportunity to find and identify potential collaborators and sources of data for their research. For sensitive data, such as the genetic information and medical health record details for individual patients, we believe that a common interface is needed to make use of the wealth of data that is being produced today. We propose that DNAdigest can provide such an alternative form of data access by working as the discovery and aggregation mechanism that will let you query across sensitive datasets.

Many thanks!

Read more about DNAdigest and sign up for the newsletter at DNAdigest.org