Analysing journalistic data with detective.io

No doubt, the power of the internet has profoundly changed the way in which journalists gather their information. To keep up with the growing amount of data digitally available, more and more tools for data-journalists are being developed. They help to face the challenge of handling vast amounts of data and subsequently extracting the relevant information (here you can find our little collection of useful tools).

One powerful tool is detective.io, a platform that allows you to store and mine all the data you have collected on a precise topic. Developed by Journalism++, a Berlin- and Paris-based agency for data-journalism, it was launched one year ago.

By now, several investigations that used the tool have made headlines in Europe, among them The Belarus Network, an investigation by French news channel France24 into Belarus’ president Alexander Lukashenko and the affairs of the country’s elite, and, most notably, The Migrants Files, a database on the more than 25,000 migrants who have died on their way to Europe since 2000. According to the developers at Journalism++, the applied methodology, which measures the actual casualty rate per migration route, has now been picked up by UNHCR and IOM. Another example is a still ongoing investigation into police violence, started by NU.nl, the main news website in the Netherlands.

What does detective.io do?

Basically, detective.io lets you upload and store your data and search for relationships in it with a graph search based on network analysis. The tool, which is open source and still in beta, structures and maps relationships between the subjects of an investigation. These can be a wide range of entities such as organizations, countries, people and events.
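To give a rough idea of what searching relationships in a graph means, here is a minimal sketch in Python. It is not detective.io's code or API: the entity names, the relationship triples and the functions `build_graph` and `find_connection` are all made up for illustration, and the search is a plain breadth-first search.

```python
from collections import deque

# Hypothetical investigation data: (subject, relationship, object) triples.
# The names are placeholders, not taken from any real investigation.
RELATIONSHIPS = [
    ("Person A", "owns", "Company X"),
    ("Company X", "is a branch of", "Holding Y"),
    ("Person B", "sits on the board of", "Holding Y"),
    ("Person C", "attended", "Event Z"),
]


def build_graph(triples):
    """Index every entity by its neighbours, in both directions."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
        graph.setdefault(obj, []).append((relation + " (inverse)", subject))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search: shortest chain of entities linking start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbour in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two entities are not connected


graph = build_graph(RELATIONSHIPS)
print(find_connection(graph, "Person A", "Person B"))
```

With this sample data, the search links Person A to Person B through Company X and Holding Y, while Person C remains unconnected to either of them.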

In its basic version, the tool offers three generic data schemes that help to structure the data you have, for instance on a corporate network, the respective ownerships, branches, individuals involved and so on. To deal with more complex datasets, a customized data scheme is needed. No special skills are required to use detective.io, but you need to think hard about which elements of information the analysis requires before creating the data structure. However, such custom data schemes are not included in the basic version. The team at detective.io offers several paid plans that include additional and/or customized data schemes and the corresponding customer support.

There are special offers for NGOs and investigative journalists, too.

One powerful asset of detective.io is that investigations can be shared with collaborators and/or made public. Here you can have a look at what our Open Knowledge Directory looks like on detective.io and explore the relations of organizations and individuals by using the graph search.

Currently, the developers at Journalism++ are working on a new GUI/frontend for detective.io that will allow every user to edit the data schemes by themselves.

Here you can request an account for the beta version, and if you are interested in collaborating on the development of detective.io, you can find the tool’s GitHub here.

Waiting for the new French Digital law

According to the latest UN E-Government Survey, published this year, France is among the leading countries in e-government development, ranking 1st in Europe and 4th worldwide. The study particularly praises the good integration of e-services through the online platform service-public, launched in 2005, which gives citizens, professionals and associations access to administrative information (on their duties and legal texts, among others), simplifies procedures and provides a large civil service directory. Not to forget Legifrance and vie-publique, which both document legal and current affairs online. Let’s just say that efforts towards a transparent public administration have been the leitmotiv behind these initiatives.


If we look at the Open Data side, we come to data.gouv.fr, the national Open Data platform launched in December 2011, which is now in its second version, this time developed with CKAN and without any fees so that the data actually gets re-used. Those fees were one of the black marks listed in the OKFN Index in 2013, which ranked France 16th among 70 countries from all continents. Other negative points were the lack of relevant data such as government spending or budget data and the too low resolution of maps from the National Institute of Geographic and Forest Information. Thus, although a national Open Data strategy has been in place since 2011, there is still a lot to be done. Above all, a law (currently being drafted) is needed to push local and regional administrations to release their data openly, because the situation varies widely.

Actually, the French OD movement took root at the local level. It started in Brittany, in western France, where the city of Brest decided in March 2010 to release its geographical data, and in Rennes, the main town, which launched at the same time an OD site dedicated to transport data and, a couple of months later, the first OD platform in France, covering multiple sectors and containing various web and mobile apps besides the datasets. A similar site in Nantes, then regional initiatives in Loire-Atlantique and Saône-et-Loire, followed during autumn 2011. Today, the map of the local and regional OD movement in France made by LiberTIC shows the commitment of administrations at different levels (regions, cities and even villages, such as Brocas with OpérationLibre) in different parts of the country, as well as the creation of civil society groups.

According to the current draft of the law on decentralization, which requires French towns to release their data as open data, only municipalities with more than 3,500 inhabitants will be affected, which means that 92% of them are excluded. In addition, the obligation is limited to data already available electronically, and no format or standard has been specified. In any case, the law has to comply with the European Directive 2013/37/EU on the re-use of public sector information, known as the PSI Directive, which strengthens the Open principles and has to be transposed into national law by each EU member country by 18 July 2015. In France, Etalab, a special committee created in 2011 and dedicated to the governmental OD strategy, is in charge of the implementation.

The French FOI law dates back to 1978. It was modified in 2005 by an order, in line with the European Directive 2003/98/EC, the first legislative measure to shape the European framework for Open Data, which was amended by the above-mentioned Directive of 2013. Preparing the implementation of the latter through the law on decentralization and another on digital technology, France has been very active in recent months, and hopefully that is a good omen for the future. Last April, Etalab organised a national conference on Open Data and Open Government, inviting representatives of the private sector and civil society. The future appointment of a Chief Data Officer was announced (the person is still to be designated), as well as the participation of the French government in the Open Government Partnership (OGP); France will even join the OGP steering committee from 1st October. Last but not least, the Senate published in June a report on access to administrative documents and public data. It supports the efforts made by the government since 2011 to release public data into the public domain, but underlines that the results so far are not up to the actual challenges and do not meet the expectations of civil society either. Too often, the data is incomplete or available only in an unfriendly format, its quality varies depending on the administration, and updates and metadata are missing, revealing a lack of resources and a reluctance to embrace Open Data. The report ends with 16 recommendations, such as the use of visualisations to make the data more comprehensible for users, which should be taken into consideration in the preparation of both upcoming laws.

HackYourPhd: reporting on Open Science from the US @ Boca Raton/Paris, USA/France

A summer trip through the US to discover and document Open Science projects? When we first heard about HackYourPhd, we were excited to notice how similar the concept of their research is to our own. The idea was initiated last year by two young French researchers, Célya Gruson-Daniel & Guillaume Dumas, and “aims to bring more collaboration, transparency, and openness in the current practices of research.” Célya travelled for three months from Boca Raton (Florida) to Washington DC, gathering information and meeting people and groups active in the Open Science scene.

While this round trip in the US is now over, HackYourPhd is still active and has become an online community where the research continues. Read the interview below with the two people behind this fantastic initiative and discover how the idea came to life, the insights from the trip and what is coming next.

1) Hi Célya & Guillaume, you both co-founded HackYourPhd, a community focused on Open Science which undertook a globetrotter initiative in the US last year. We are really curious how you came up with this idea and would like to know more about it. Don’t forget to introduce yourselves and the concept of Open Science too!

Hi Margo & Alex, thanks for this interview. We discovered your great project a few months ago. Now, we are very happy to help you since it is closely related to what we tried to do last summer with “HackYourPhD aux States”. But before speaking about this Open Science tour across the USA, let us first recall the genesis and the aim of HackYourPhd in general. HackYourPhd is a community which gathers young researchers, PhD and master students, designers, social entrepreneurs, etc. around the issues raised by the Open Science movement. We co-founded this initiative a year ago. The idea of this community emerged from our mutual interest in research and its current practices. Guillaume is a postdoc in cognitive science and complex systems. He is also involved in art-science collaborative projects and scientific outreach. Célya specializes in science communication. After two years as community manager for a scientific social network based on Open Access, she is now working in science communication for different projects related to MOOCs and higher education. We are both strong advocates of Open Science, and that is mainly why we came up with HackYourPhd. While Guillaume has tried to integrate Open Science into his practice, Célya wanted to explore its different facets through a PhD. But first, she wanted to meet the multiple actors behind this umbrella term. This is what motivated “HackYourPhd aux States,” the globetrotter initiative per se.

2) Why did it make sense to follow and report on Open Science projects in the US in particular? Could you imagine yourself doing it in other countries? What about France?

Because it was in this English-speaking country that the Open Science movement started. It is thus also where it is most developed to date, from Open Access (e.g. PLoS) to hackerspaces (e.g. noisebridge). There is also a big network of entrepreneurs in Open Science, which is an aspect we were specifically interested in. Célya thus decided to look first at the source of the movement and to take her time (three months) before doing a similar exploration in Europe with shorter missions (e.g. one week). Concerning France, we have already begun to monitor what is taking off, from citizen science to open data and open access. While we certainly have a better picture now, the movement is still embryonic. But it will also take other forms, and that is also what we are interested in. Célya is thinking of doing her PhD in an action-research mode, being both observer and actor in the dynamic construction of the French Open Science movement.

3) From our experience, we could schedule our encounters and events both before starting the journey and on the way. Was it the same for you? How did you select your stops, the projects documented and the persons interviewed? Is Open Science a widespread topic, or was it actually difficult to find cases for your research?

Célya already had a blueprint of the big cities and the main path to follow. With the help of the HackYourPhD community, she gathered many contacts and built a first database of locations to visit and people to meet. Before starting, the first leg (San Diego and the Bay Area) was almost fully scheduled. The rest of the trip was then set up on the way. A few important meetings were already scheduled, of course (e.g. the Center for Open Science, the Mozilla Science Lab, etc.), but throughout the trip, new contacts were offered spontaneously by the people interviewed. Serendipity is your friend there! As for the difficulty of finding cases, this depends very much on the city. While San Francisco was really easy, Boston, for example, which is full of nice projects, was nevertheless more challenging.

4) We know it is difficult to point out just one of them … but could you tell us which is your favourite, or one of the most relevant, Open Science initiatives you have discovered?

When Célya was in Cambridge, she visited the Institute for Quantitative Social Science. She met the Director of Data Science, Mercè Crosas, and her team, and discovered the Dataverse Network project. It is one of the most relevant Open Science initiatives she came across, because it combines multiple facets of Open Science. It consists of building a platform allowing any researcher to archive, share and cite their data. It has many functionalities cleverly linking it to other aspects of Open Science (open access journals with OJS, citation, alt-metrics, etc.). Here is the interview with Mercè Crosas.

5) As we discussed previously with Fiona Nielsen, sharing knowledge in the scientific domain has a positive impact. After your research, why does Open Science matter and how does it change the way scientists have been working till now?

Open Science provides many ways to increase efficiency in scientific practices. For example, Open Data allows researchers to collaborate better; while this solution seems obvious to many, it becomes a necessity when it comes to big science (e.g. CERN, ENCODE, Blue Brain, etc.). Open Data also means more transparency, which is critical to addressing the lack of reproducibility or even fraud.

Open Access presents several advantages, but the main one remains guaranteeing everyone access to scientific papers. As a journalist, Célya has faced the issue of paywalls many times, and this is always frustrating. Last but not least, Open Science opens up new possibilities for collaboration between academia and other spheres (entrepreneurs, civil society, NGOs, etc.). Science is a social and collective endeavour; it thus needs contact with society and must leave its ivory tower. The Open Science movement is firmly going in that direction, and that is why it matters.

6) As you know, Open Steps focuses on Open Data related projects. Quoting you, “In Seattle, I noticed a strong orientation of Open Science issues around Open Data.”, could you tell us more about this relation and the current situation in the US? Could you point us to any relevant Open Data initiative that we might want to document?

The role of Open Data depends on the scientific field. Indeed, Seattle was a rich environment on that topic, but this is certainly due to the software culture in the city (Amazon, Microsoft, etc.). The Open Data topic is related to Big Data. Thus, the key domains are genetics, neuroscience, and health in general. Lots of projects are interesting. We already mentioned the Dataverse Network, but you may also enjoy the Delsa Global Project (interview with Eugene Kolker) or Sage Bionetworks.

7) There are a lot of sponsors supporting you. Was it easy to convince them? Is that how you finance 100% of the project or do you have other sources of income?

All the sponsors came through the crowdfunding campaign on KissKissBankBank. It was not a question of convincing them; their support simply demonstrated the need to cover the topic of Open Science in France. Their financial help represents 36% of the total amount collected.

There was no other source of income. The trip was not expensive since Célya used collaborative economy solutions (couchsurfing, carpooling, etc.).

8) Now the trip is over … but HackYourPhd is still running. How does it go on now?

We are pursuing the daily collaborative curation, with almost a thousand people on our Facebook group. We are also organizing several events, mainly in Paris but with a growing network with other cities and even countries. The community is self-organized but needs some structure. We are currently thinking about this specific issue and hope 2014 will be a great year for the project!

Thank you both!

Interview with Journalism++ @ Paris/Berlin, France/Germany

Journalism++ is a network of data-journalists and developers which has chapters in five cities across Europe. With the goal of promoting the use of data and its visualisation for journalistic purposes, they create Open Source tools, organise training sessions and advise other organisations in this area.

We contacted Nicolas Kayser-Bril, one of its co-founders, and asked him to give us an inside view of his company and the concept of data-journalism. Covering the theory, how data is currently being used to enhance story-telling, and the advantages for journalists of working with Open Source and Open Data, this interview explores a topic we were eager to learn more about.

1) Hi Nico, many thanks for sharing your time with us. Could you first introduce yourself and briefly present Journalism++? How come you are represented in five different cities in Europe?

We started Journalism++ with Pierre Romera, a developer, in 2011. At the time, we were working together at OWNI as a team of journalist & developer. When we left, we asked several newsrooms if we could join, as a team, and do data-journalism. Most were eager to hire us but not one was ready to let us work together. In order to keep working together, we created Journalism++. The name is a nerdy joke, as the “++” sign is an increment in most programming languages. In effect, it means “journalism is now equal to journalism plus one”.

As the company grew, we offered other data-journalists in Europe the use of the Journalism++ brand. The Journalism++ network is organized around these chapters, in something that resembles a franchise. Companies such as Subway or NGOs like Transparency International operate in much the same way. Today, 3 companies that operate independently from us use the brand in Stockholm, Amsterdam and Cologne. All we ask from chapters is that they adhere to the Journalism++ Manifesto and be financially sustainable.

2) What does it mean to be a data-journalist? How does it differ from traditional journalism? Is the use of Open Data and its visualisation what makes that difference?

At its most basic, data-journalism means using numerical data to tell stories. Let’s say you have a database to work from. You’ll need to clean it, check its authenticity, interview the data using data-mining techniques, and finally communicate your results, sometimes using data visualisations or more complex interfaces. This process can be done by one-person operations using Google Spreadsheets. But sometimes, you’ll need many more expert skills, such as statistics, computer forensics, design or development. And project managers to hold everything together. The end product changes too. Where we had articles or video reports, we can now tell stories using evolving databases. Homicide Watch in Washington, DC, is a good example: it compiles all the data it can find on homicides in the town. It accomplishes a basic task of journalism in a totally new format.

From a simple thing (doing journalism with data) we end up with a totally new way of doing journalism, which is very close to traditional software development. That explains why small companies like ours are better equipped than big newsrooms to do data-journalism.
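The four steps Nicolas describes (clean, check, interview, communicate) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the sample rows are invented, the function names are ours, and a real project would need far more thorough checks than this one.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for a real database export.
RAW = """year,district,cause
2012,North,shooting
2012, north ,Shooting
2013,South,stabbing
2013,South,
2012,North,shooting
"""


def clean(rows):
    """Normalise whitespace and case, and drop rows with missing fields."""
    cleaned = []
    for row in rows:
        row = {key: value.strip().lower() for key, value in row.items()}
        if all(row.values()):
            cleaned.append(row)
    return cleaned


def check(rows):
    """A very basic plausibility check: every year must be a four-digit number."""
    return [row for row in rows if row["year"].isdigit() and len(row["year"]) == 4]


def interview(rows, field):
    """'Interview' the data by asking how often each value of a field occurs."""
    return Counter(row[field] for row in rows)


def communicate(counts):
    """Render the answer as a plain-text bar chart."""
    return "\n".join(f"{value:<10} {'#' * n} ({n})" for value, n in counts.most_common())


rows = check(clean(csv.DictReader(io.StringIO(RAW))))
by_district = interview(rows, "district")
print(communicate(by_district))
```

The point of the sketch is the shape of the process rather than the tools: each step is a small function, and the "story" only appears at the end, once the data has been cleaned and checked.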

3) You have participated in many events and training sessions around Europe, promoting the benefits of using Open Data in journalism. How is Open Data seen among the journalistic community? Is there a general movement towards using Open Data in journalism or is it still a new and almost undiscovered topic?

Data-driven journalism is still very new to most newsrooms. There is an acknowledgement of what it can do and that it can help journalists overcome some of the challenges they face. But there’s no movement towards using open data. The number of requests for open data from journalists in most EU countries (look at the reports from CADA in France or at tools like Frag den Staat in Germany and Austria) still ranges in the few hundreds per year. It’s getting better, but very slowly.

4) We have seen in your portfolio that some of your clients come from the public sector. Is the public administration particularly demanding Open Data-based tools nowadays?

We’re very proud to work for the Île-de-France region, Europe’s biggest region by GDP. They set up a data-driven communication strategy alongside their open data platform, which we help them implement. Many administrations, as well as NGOs and corporations, are realizing that they sit on very valuable data troves. Most are just starting to organize them and are thinking of making them more open. They understand that more open data will make it easier for them to communicate about their actions.

5) You already developed really interesting tools and civic apps (Cartolycées, e-diplomacy, Alertepolitique, Datawrapper, …). Where do all these ideas come from? Could you explain more about the conception process and its context?

Most of our projects start at the coffee table, within the company or with clients and partners. We then take these ideas from a drawing on a napkin to full-fledged products. We sometimes have to find funding in the process. Clients are very open to experimenting with new ideas. In the case of E-diplomacy, for instance, a visualisation of diplomats’ Twitter streams for Agence France Presse, the tool really emerged from a back-and-forth ideation process between us and AFP journalists.

6) We know it might be difficult to choose one, but can you pitch one of your projects in particular? Perhaps the one you consider the most useful?

I’ll take the latest project we released, called SpendingStories. We had this idea with the Open Knowledge Foundation (OKF), which financed the project through a grant from the Knight Foundation. With its OpenSpending project, OKF collects a lot of data on budgets and spending throughout the world. But not many people know how to read, much less make sense of, this data. So we built a very simple interface that lets people enter any amount, in any currency, and see how it compares to items in different budgets. We hope it’ll make it easier for journalists to put things into perspective when a politician announces a million or billion-euro plan, instead of resorting to meaningless comparisons such as “this is as much as the GDP of [insert country here]”. You can access the demo version of SpendingStories, which contains data about UK public spending, here: http://okf-spendingstories.herokuapp.com
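The core idea, comparing an arbitrary amount to known budget items, can be sketched as follows. This is not SpendingStories' actual code: the exchange rates, the budget items and the function names `to_eur` and `compare` are hypothetical placeholders chosen purely for illustration.

```python
# Hypothetical exchange rates to a reference currency (here: euros).
# Placeholder values for illustration only, not real market rates.
RATES_TO_EUR = {"EUR": 1.0, "GBP": 1.25, "USD": 0.75}

# Hypothetical budget items, already expressed in euros.
BUDGET_ITEMS = [
    ("School meals programme", 50_000_000),
    ("Regional rail upgrade", 900_000_000),
    ("National library budget", 125_000_000),
]


def to_eur(amount, currency):
    """Convert an amount into the reference currency."""
    return amount * RATES_TO_EUR[currency]


def compare(amount, currency, items=BUDGET_ITEMS, top=2):
    """Return the budget items closest in size to the amount, with the ratio to each."""
    value = to_eur(amount, currency)

    # Closeness is measured on the ratio, so 2x and 0.5x count as equally distant.
    def distance(item):
        ratio = value / item[1]
        return max(ratio, 1 / ratio)

    closest = sorted(items, key=distance)[:top]
    return [(name, round(value / cost, 2)) for name, cost in closest]


for name, ratio in compare(100_000_000, "GBP"):
    print(f"{ratio} times the {name}")
```

Ranking by ratio rather than by absolute difference is a design choice made for this sketch: it keeps the comparison meaningful whether the amount entered is a few thousand or several billion.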

7) You release most of your projects as Open Source. What is the motivation behind this? What are the benefits for a private company like yours in a market economy?

There are several reasons. One is practical: Open source projects are granted privileges by many companies eager to encourage openness. We don’t pay to host our code on GitHub, and many APIs and other services are free for open source projects. It’s also a great way to showcase our work to other developers and make sure that we code in a clean manner. It’s a great way to ensure high quality in our work.

So far, we haven’t coded anything that is worth protecting for its technical value. What we sell to clients is our expertise rather than our code proper. They know that we’ll develop an app or a variation of an app much faster than they would, so it makes a lot of sense for them to pay us rather than simply take the code and do it themselves.

8) Where do you find the data you are working with? Does this data already exist or does it have to be collected before? Is the data already open and available? Which are the Open Data platforms you are using the most?

There’s no fixed rule. Sometimes we’ll tell stories using open data. Sometimes we’ll do a Freedom of Information request. Sometimes we’ll scrape it. Sometimes we’ll obtain it through leaked documents. Sometimes we structure already available data. And if we still don’t find what we need, we crowdsource data collection.

As for open data platforms, the World Bank’s is certainly the most useable. It’s great to see institutions such as the IMF and Eurostat making their data available. But I’m not a fan of the newer brand of data catalogs, à la data.gov. Most of them simply aggregate data that was already published somewhere else and add little value in the process.

9) Let’s talk about what is still to come. In your opinion, how will data-journalism evolve in the upcoming years and what are the future steps for Journalism++?

We want to become the number one network of data-journalism companies worldwide: a dozen financially independent companies operating in close cooperation, so as to be able to launch large-scale journalism projects at any time and keep hacking things!