My experience building a 100% Open Source based Open Data platform

During the great OKFest 2014, we were lucky to re-encounter the folks from EWMI and Open Development Cambodia (ODC), a not-for-profit organization advocating for transparency that we got to know during the Open Steps Journey. Since 2011, the team at ODC has been doing amazing work sharing data with journalists, researchers and human rights activists, so that they can rely on openly licensed information to support their activities in the south-east Asian country. At OKFest, they told us about EWMI’s Open Development Initiative and their plans for what has now become the Open Development Mekong project, an open data and news portal providing content about five countries of the Mekong region in south-east Asia. Back then, they were looking for somebody who could lend a hand in conceiving and implementing the platform. That’s how I got engaged in this challenging project, which has kept me busy for the last nine months.

I’m writing this article to share my personal experience of participating in a 100% Open Source project, within an agile and extremely collaborative environment whose outcomes, in terms of information, knowledge and code, are meant to be reused by the community.

The project’s requirements and its architecture

ODC’s site already features lots of information, datasets and visualizations. The team has done a great job getting the most out of WordPress, the CMS the site is built upon. However, since the main expectations for this new iteration of the platform were to host much more machine-readable data and expose it through both a web interface and an API, a specific framework for storing, managing and exposing datasets was needed. After analysing the options out there, we decided to implement an instance of CKAN, an Open Source solution for building Open Data portals. Coordinated by Open Knowledge and actively maintained by a great community of developers worldwide, it was definitely a good choice. Being Open Source not only means that we could deploy it for free, but also that we could use plenty of extensions developed by the community and get our questions answered by the developers in the #CKAN channel on IRC or directly on the GitHub repositories where the project is maintained.
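To make concrete what exposing datasets “through both web interface and API” means in CKAN’s case: every CKAN instance ships with the Action API, so a dataset search is a single GET request that returns JSON. A minimal sketch (the base URL below is a placeholder, not OD Mekong’s actual endpoint):

```python
from urllib.parse import urlencode

# Placeholder base URL; substitute any CKAN instance you want to query.
CKAN_BASE = "https://data.example.org"

def package_search_url(query: str, rows: int = 10) -> str:
    """Build a URL for CKAN's package_search action, which returns
    matching datasets as a JSON document."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/api/3/action/package_search?{params}"

# Fetching this URL (e.g. with urllib.request) returns a JSON object
# whose "result" key holds the matching datasets.
print(package_search_url("forest cover", rows=5))
```

The same endpoint is what extensions and external tools build upon, which is part of why the extension ecosystem mentioned above is so rich.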

Analogous to ODC, the OD Mekong project needed to present a great amount of news, data and visualizations in a comprehensible manner, allowing users to search the large body of content and share it on social networks or among friends. Taking into consideration that the editorial team already had expertise with WordPress, and the fact that it is a widely used, community-supported Open Source CMS, we went ahead and deployed a multi-site network instance, featuring one site for the whole region (Mekong) and one site for each of the countries (Cambodia, Thailand, Laos, Vietnam, Myanmar). The theme chosen for the front-end, called JEO and developed specifically for geo-journalism sites, provides a great set of features for geo-localizing, visualizing and sharing news content. Since OD Mekong’s team works intensively with geo-referenced information (an instance of GeoServer is also part of the architecture), JEO proved to be a great starting point, and thanks to the work of its developers, lots of features could be used out of the box.

To facilitate the complex workflow of OD Mekong’s editorial team, many WordPress plug-ins were used for aggregating content automatically, presenting featured information visually or allowing users to provide feedback. We also developed WPCKAN, a WordPress plug-in which allows content to be pulled and pushed between CKAN and WordPress, the main elements of OD Mekong’s architecture. Although it is used extensively across the whole OD Mekong site, this plug-in has been developed generically, so other folks out there can re-use it in similar scenarios.
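WPCKAN itself is a PHP plug-in; purely as an illustration of the pull/push idea, here is a Python sketch of mapping a WordPress post onto a payload for CKAN’s package_create/package_update actions. The WordPress field names here are illustrative stand-ins, not WPCKAN’s actual schema:

```python
# A hedged sketch of the push direction of a WordPress<->CKAN bridge.
# The post fields below are made up for the demo; `name`, `title`,
# `notes` and `tags` are standard CKAN dataset fields.

def post_to_dataset(post: dict) -> dict:
    """Map a WordPress post (as a dict) to a CKAN dataset payload."""
    return {
        "name": post["slug"],      # CKAN dataset names are URL slugs
        "title": post["title"],
        "notes": post["excerpt"],  # CKAN stores the description in `notes`
        "tags": [{"name": t} for t in post.get("tags", [])],
    }

post = {
    "slug": "forest-cover-2014",
    "title": "Forest cover 2014",
    "excerpt": "Annual forest cover figures.",
    "tags": ["forests"],
}
print(post_to_dataset(post)["name"])
```

The pull direction is simply the inverse mapping, rendering CKAN dataset metadata inside WordPress pages.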

Working in a collaborative environment

From the beginning, OD Mekong’s intention has been to become a platform where multiple organizations from the region that share common goals can work together. This is not an easy task, and it has shaped many of the decisions taken during the project’s conception and development.

This collaborative process has been taking place (and will continue) at different levels:

  • Organizations participate in the content-creation process. Once credentials are granted, datasets can be uploaded to the CKAN instance, and news, articles or reports to the specific country sites. To ensure the quality of the contents, a vetting system was conceived which allows site administrators to review them before they get published.
  • Developers from the community can contribute to the development of the platform. All code repositories are available on Open Development Mekong’s GitHub site, and provisioning scripts based on Vagrant and Ansible, both open source technologies, are available for everyone to reproduce OD Mekong’s architecture with just one command.
  • Since this is an interregional endeavour, all components of the architecture need multilingual capabilities, so many contents and pieces of software needed to be translated. For this, OD Mekong’s localization process relied on Transifex, a web-based translation platform that lets teams translate and review software collaboratively. Although no longer open source itself, Transifex is free for Open Source projects. I would like to highlight here that the OD Mekong team contributed the translation of CKAN version 2.2 into Khmer, Thai and Vietnamese. Bravo!!

It is also very important to stress the benefits of documenting every process, every workflow and every small tutorial in order to share knowledge with the rest of the team, thus avoiding having to communicate the same information repeatedly. For this purpose, a wiki was set up at the beginning of the development process to store all the knowledge around the project. Currently, the contents of OD Mekong’s wiki are still private, but the information will be made publicly available soon, once it has been reviewed, so stay tuned!

An amazing professional ( but also personal ) experience

Leaving the technical aspects aside and turning to human values, I can only say that, for me, working on this project has marked a milestone in my professional career. I have had the pleasure of working with an amazing team from which I have learned tons of new things, not only related to software development but also to system administration, human rights advocacy, copyright law, project management, communication and a large etcetera. All within the best working atmosphere, even when deadlines were approaching and the GitHub issues started to pile up dramatically :) .

This is why I want to thank Terry Parnell, Eric Chuk, Mishari Muqbil, HENG Huy Eng, CHAN Penhleak, Nikita Umnov and Dan Bishton for the great time and everything I learned.

Learn more

As part of the ambassador programme, yesterday I hosted a skill-sharing session where I explain, this time on video, my experience with this project. Watch it to discover more…

Analysing journalistic data with Detective.io

No doubt, the power of the internet has profoundly changed the way journalists gather their information. To keep up with the growing amount of data available digitally, more and more tools for data-journalists are being developed. They help in facing the challenge of handling vast amounts of data and extracting the relevant information from it (here you can find our little collection of useful tools).

One powerful tool is Detective.io, a platform that allows you to store and mine all the data you have collected on a precise topic. Developed by Journalism++, a Berlin- and Paris-based agency for data-journalism, it was launched one year ago.

By now, several investigations that used the tool have made headlines in Europe, amongst others The Belarus Network, an investigation by the French news channel France24 into Belarus’ president Alexander Lukashenko and the affairs of the country’s elite, and, most notably, The Migrants Files, a database on the more than 25,000 migrants who have died on their way to Europe since 2000. According to the developers at Journalism++, the applied methodology – measuring the actual casualty rate per migration route – has now been picked up by UNHCR and IOM. Another example is a still ongoing investigation into police violence, started by the main news website in the Netherlands.

What does Detective.io do?

Basically, Detective.io lets you upload and store your data and search for relationships in it with a graph search based on network analysis. The tool, which is open source and still in beta, structures and maps relationships between the subjects of an investigation. These can be a vast number of entities such as organizations, countries, people and events.
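To make the idea of a graph search concrete – this is not Detective.io’s actual code, just a minimal sketch of the underlying concept with made-up investigation data – entities become nodes and their relationships edges, and finding how two subjects are connected becomes a shortest-path search:

```python
from collections import deque

# Hypothetical investigation data: each entity maps to the entities it is
# directly linked to (ownership, membership, payments, ...).
GRAPH = {
    "Acme Corp": ["J. Doe", "Offshore Ltd"],
    "J. Doe": ["Acme Corp", "Ministry X"],
    "Offshore Ltd": ["Acme Corp"],
    "Ministry X": ["J. Doe"],
}

def connection(graph: dict, start: str, goal: str) -> list:
    """Return the shortest chain of entities linking `start` to `goal`
    (breadth-first search), or an empty list if none exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return []

print(connection(GRAPH, "Offshore Ltd", "Ministry X"))
# → ['Offshore Ltd', 'Acme Corp', 'J. Doe', 'Ministry X']
```

A graph database does this at scale, but the chain of intermediaries it surfaces is exactly the kind of lead an investigator is after.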

In its basic version, the tool offers three generic data schemes that help structure the data you have – for instance on a corporate network, the respective ownerships, branches, individuals involved and so on. To deal with more complex datasets, a customized data scheme is needed. No special skills are required to use Detective.io, but one needs to think hard about which elements of information are needed for the analysis before creating the data structure. However, such custom data schemes are not included in the basic version. The team at Detective.io offers several paid plans that include additional and/or customized data schemes and the respective customer support.

There are special offers for NGOs and investigative journalists, too.

One powerful asset of Detective.io is that investigations can be shared with collaborators and/or made public. Here you can have a look at what our Open Knowledge Directory looks like on Detective.io and explore the relations between organizations and individuals by using the graph search.

Currently, the developers at Journalism++ are working on a new GUI/front-end for Detective.io that will allow every user to edit data schemes by themselves.

Here you can request an account for the beta version, and if you are interested in collaborating on the development of Detective.io, you can find the tool’s GitHub repository here.

Introducing the new Open Knowledge directory with PLP Profiles

During Open Steps’s journey around the world discovering Open Knowledge initiatives, the existence of a global community of like-minded individuals and groups became clear. We met people working on Open Knowledge related projects in every single one of the 24 countries we visited. Currently, and thanks to social networks, blogs, discussion groups and newsletters, this community manages to stay connected and get organized across borders. However, getting to meet the right people can be a difficult task for somebody without an overview of who is who and who is doing what, especially in a foreign country.

My travel companion, Margo Thierry, and I started building a contact list as we met new, amazing people during this great journey, and we eventually realized that sharing this information would have a positive impact. That’s how the Open Knowledge directory came to life, with the aim of increasing the visibility of Open Knowledge projects and helping to forge collaborations between individuals and organizations across borders.

After some iterations, we are now releasing a new version which not only features a new user interface with better usability but also lays the basis for continuous development towards the goals of connecting people, monitoring the status of Open Knowledge worldwide and raising awareness about relevant projects and initiatives worth discovering.

One of the main features of this version is the implementation of Portable Linked Profiles, or PLP for short. In case you did not read the article I wrote about the inspiring GET-D conference last month, where I spoke about it for the first time: PLP allows you to create a profile with your basic contact information that you can use and share. By basic contact information I mean the kind of information you are used to typing into dozens of online forms when registering on social networks, accessing web services or leaving feedback in forums – it is always the same information: name, email, address, website, Facebook, Twitter, etc. PLP tries to address this issue, but it also, and most importantly, allows you to own your data and decide where you want it to be stored.

By implementing PLP, this directory no longer makes use of the old Google Form and now allows users to edit their data and keep it up to date easily. For the sake of re-usability and interoperability, it makes listing your profile in another directory as easy as pasting the URI of your profile into it – done! If you want to know more about PLP, kindly head to the home page or to the GitHub repository with the documentation. PLP is Open Source software and is based on Open Web Standards and Common Vocabularies.
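The core idea – a profile is a small document, reachable at a URI, that any directory can fetch and render – can be sketched in a few lines. The keys below are simplified stand-ins for illustration; the real vocabulary is defined in the PLP documentation:

```python
import json

# A hypothetical PLP-style profile document, as a directory might fetch
# it from the URI you paste in. Field names here are illustrative only.
profile_document = """
{
  "name": "Open Steps",
  "email": "hello@example.org",
  "website": "http://example.org",
  "twitter": "@opensteps"
}
"""

def load_profile(raw: str) -> dict:
    """Parse a profile document, keeping only the fields the directory
    actually displays."""
    data = json.loads(raw)
    wanted = ("name", "email", "website", "twitter")
    return {k: data[k] for k in wanted if k in data}

print(load_profile(profile_document)["name"])
```

Because the profile lives at one URI that the owner controls, updating it there updates it everywhere it is listed, which is exactly the interoperability point made above.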

We now invite you to register in our Open Knowledge directory if you are not there yet, or to update your information if you are. This directory is meant to be continuously improved, so please drop us a line if you have any feedback – we’ll appreciate it.

Tabula: Liberating data tables trapped inside PDF files @ Buenos Aires, Argentina

In the context of the Open Data movement, we are currently witnessing how organisations (whether public administrations or private corporations) are increasingly releasing data into the public domain. The intention behind this can be to become more transparent or to encourage developers to build useful applications on top of the published data.

For the sake of its re-use, this information should ideally be stored in a well-structured and machine-readable format such as XML, CSV or Excel. However, this is not always the case, and although such organisations are willing to share the data, the format is sometimes so poorly chosen that it can render the information useless. This is the case with PDF files. PDF is a format originally designed to contain data meant to be printed. That is why these files support paging and paper-like sizing and can contain indexes, but they never achieve the goal of storing large amounts of structured data as we expect from Open Data.

Activists, journalists or researchers willing to analyse large amounts of information published in PDF files often have to give up on their intention due to the effort associated with extracting all the numbers out of the files. That is why we want to introduce you to Tabula, a tool that helps extract the information contained in tables inside PDF files.

Developed by Manuel Aristarán with the help of other fellows working on data journalism, Tabula can be installed on any computer (Windows, Mac or Linux) and, as if by magic, extracts the information from tables present in PDF files, exporting it directly as a neatly formatted CSV file. The interface makes the tool really easy to use, allowing the user to “draw” a box to select the relevant information. This saves lots of valuable time.
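Once Tabula has exported a table, the resulting CSV can be processed with any standard tooling. As a minimal sketch (the columns and values are made up for the demo; in practice the string would come from the exported file), Python’s csv module is enough to turn the rows into a usable structure:

```python
import csv
import io

# Pretend this string is the contents of a CSV file exported by Tabula,
# e.g. what you would get from open("budget_2014.csv").read().
exported = """ministry,budget_2014
Health,1200
Education,1500
"""

def read_table(text: str) -> list:
    """Parse a Tabula-style CSV export into a list of dicts, converting
    the numeric column from string to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["budget_2014"] = int(row["budget_2014"])
    return rows

rows = read_table(exported)
print(rows[0]["ministry"], rows[0]["budget_2014"])
```

From here the numbers are ready for a spreadsheet, a chart or further analysis – the step that the PDF format was blocking in the first place.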

However, it is important to note that only text-based PDFs are supported for now, and not scanned documents, which are significantly different in their internal structure. Supporting them is a feature that would make the tool super powerful, and it sits at the top of the improvements wish-list. Did we mention that Tabula is Open Source? That means you can help improve it if you are a developer (OCR gurus more than welcome!), contribute improvement ideas or give your feedback as a user.

Meeting @ Cargografías, Buenos Aires, Argentina

As a matter of fact, most of the experts and participants gathered at hackathons and events around Open Data / Open Government come from the IT or media scene. But Open Data and Open Government are not a private club for coders and journalists: you can give your two cents whatever you do. Designers are also among the hacktivists initiating and developing such projects, and we have very much enjoyed exploring this perspective through the work of Andrés Snitcofsky.

Both a graphic designer and a professor of heuristics at the University of Buenos Aires, Andrés had the idea in September 2011 to build what later became Cargografías, an interactive timeline visualisation of the highest positions in the Argentinian political sphere. Users can search by position or name to explore and easily understand how the political framework is structured, what the relations between the different positions of power are, and how the top politicians have been replaced over the years. The idea arose in the context of Argentina’s economic crisis in 2011 and, although time was too short to have it ready for the presidential elections in October 2011, the tool was finally completed and proved very useful during the next campaign, in 2013.

The project was developed within the group Hacks/Hackers Buenos Aires (HHBA), created at the same time in 2011, which nowadays counts nearly 2,500 members and is the biggest local group in Latin America of the international grassroots journalism organization. With other members of HHBA, Andrés collected the raw data, researching on Wikipedia or scraping the information from other relevant sources before sorting it manually into spreadsheets. The current version 2.0 is the result of this collaboration and, even if it already represents a great piece of work, some updates are needed and new features could be added to extend the current capabilities. User feedback, enabled as a participative function, helps point out what could be improved and also which contents have to be completed.

The initial team has sadly changed, and Andrés is now looking for a developer to implement the next version. This is a call for a coder! Cargografías will be released as Open Source as soon as some help is found (no matter whether from Argentina or not), since the tool is definitely worth replicating in other countries and political contexts.

Ushahidi: Open Source platform for collaborative data collection @ Nairobi, Kenya

It goes without saying that software plays a very important role in the current Open Data scene. Developers are creating brilliant pieces of code that make working with data a fast, efficient and sometimes even fun experience. This also applies to data collection: because it is sometimes impossible to find the data we are looking for, we need to gather it ourselves. Ushahidi is a platform you will want to look at if you are in this situation.

In a nutshell, it allows citizens to make reports collaboratively, creating crowdsourced interactive maps. With a very intelligent approach, Ushahidi gives citizens the possibility of using the web, their smartphones and even SMS to gather data, which makes this technology accessible almost everywhere and to everyone. Originally created in Kenya to serve as an instrument for social activism and public accountability in crisis situations, the software has proven to be a great companion worldwide in bringing advocacy campaigns to a successful end. The team behind Ushahidi has not only created a world-changing technology but also shares it with others, since it is released as Open Source. We contacted Chris Albon, director of data projects, and asked him some questions so you can learn more about this great tool.


1) Hello Chris, could you first introduce yourself? Briefly, what are the activities of Ushahidi as company and what is the purpose of your main product, the Ushahidi platform?

My name is Chris Albon and I am in charge of Ushahidi’s data work. Ushahidi is a Kenyan technology non-profit that builds platforms, tools, and communities extending from rural Kenya to the coast of Louisiana. As a disruptive organization we believe our place is at the bleeding edge; it is part of our organization’s DNA.

Beyond leading a movement in crisis mapping through mobile phones and the internet, and revolutionizing an industry of data-use to solve problems, we helped build the iHub in Nairobi, creating a new model for innovation and tech startups in the region and changed the perspective about where innovation comes from.

The core Ushahidi platform for data collection and changing the way information flows is now used in 159 countries around the world. It has been translated into 35 languages and has been deployed over 50,000 times. In addition, the iHub has grown to more than 10,000 members, spun out 28 companies and spawned a movement of tech hubs across the African continent.

2) Quoting your website, “the tool contributes to democratise information, increase transparency and lower the barriers for individuals to communicate their stories”. Could you give us some examples of Ushahidi-based initiatives that have succeeded in their goals?

There are many examples of Ushahidi tools being used to democratize information, from fighting sexual harassment in Egypt to civil society activism in Ukraine. For us, success is a user able to gain new knowledge and power from data from the crowd.

3) Who is actually using the Ushahidi platform? Are they individuals, NGOs, activists, public administrations? Is there a topic or an issue that is much more addressed among all the users? In which countries/continents has the platform been more actively used? Why?

Ushahidi is used by all manner of people and organizations, from small non-profits wishing to monitor an election to international organizations tracking disaster relief efforts. The platform is used globally and on a whole spectrum of issues.

4) We have recently written an article focused on data-journalism. Is the Ushahidi platform currently being used also for journalistic purposes? How can data-journalists work with it?

Ushahidi has been used to gather new data and reports from journalists. My particular favourite comes from 2012: Al Jazeera used Ushahidi to tell a story previously almost entirely untold in the international media about what Ugandans thought about Joseph Kony.

5) We see on your website that you have other products too. Could you tell us a bit more about Crowdmap, BRCK and Swiftriver?

Crowdmap is our hosted geo-storytelling platform, allowing people to add a layer of “place” to things that matter to them. Swiftriver is our product for tracking and understanding the social web. Ping is an app we built after the Westgate attack to help people report that they are okay after an emergency. Finally, BRCK is our rugged router for maintaining data connectivity no matter the environment.

6) You are working hard to build up a community (support, development wiki, help forum,…). What kind of contributors are getting involved in it? How big has been the impact of it for the development of Ushahidi’s products?

Ushahidi’s community has had a huge impact on the products’ development in a wide variety of areas, from volunteering during deployments of the software, to bug testing, to developing new features. We could not do what we do without them.

7) Ushahidi develops open source software. What are the reasons and benefits for a company like yours of making the code available to everyone? Reading your site: “we have also built a strong team of volunteer developers in Africa, but also in Europe, South America and the US”. Is this engagement a consequence of the Open Source collaborative philosophy?

Absolutely. The open source nature of the software makes community involvement possible. If our software was not open source, there would be very little way for our great community to help us make the software better.

8) Ushahidi provides services (consulting, customization, deployment) around the platform. What kind of organisations do you count under your clients? Besides this, do you rely on other financial resources?

Ushahidi is lucky enough to have a set of great organizations supporting our work, from the Rockefeller Foundation to many others. In addition, we also provide additional services for users who want some technical customization, training, or strategic guidance in the deployment of the platform or the management of crowdsourced data.

9) Recently, you have released the version 3.0 of your platform. Can you give us an insight of the new features? Also, what are the kind of development of your products we can expect in the upcoming versions? In general, which are the next steps for Ushahidi?

Ushahidi has had the same platform code base for 5 years. A year ago we spent the time to do very deep user-experience research within our user base, our developer base, and our own team in order to build a new, better Ushahidi core platform. We call it “v3”.

The purpose of v3 (The next generation of the platform) is to provide a better crowdsourcing platform, so that the leaders, crisis responders, funders, and decision making organizations can do their work more efficiently, gather better information, and understand what’s happening on the ground. It is a data collection platform that makes gathering and organizing data easy. It is a mobile first platform, as always, thinking of people with simple phones and moving up to those with web access for a beautiful visual feel.

Many thanks for all your time!

“Open Data Now”: an inspiring reading for Open Data entrepreneurs @ New York, USA

Written by Joel Gurin, “Open Data Now” was published early this year and “presents a strategy for success in the coming era of massive data”. The author, a former science journalist who also worked as a consumer advocate before moving into the federal administration, has the experience to give us an overview of how Open Data is affecting both the private and public sectors. Although useful for everyone interested in the topic, this book is especially dedicated to entrepreneurs, small business owners and corporate executives willing to build additional value on top of Open Data. Not to forget that its subtitle reads: “The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation”.

But also for citizens, advocates and researchers

The 14-chapter book begins by defining the concept of Open Data and shares details on how the movement developed from its origins in the US. Already in the introduction, the author remarks on the positive impact of Open Data on the private sector, and this focus remains present throughout the entire book. However, he also describes how Open Data acts as a regulatory mechanism that pushes organisations towards being more transparent. Different cases are presented where data is used to raise awareness of social or local issues, to improve public safety or to condemn irregularities that affect citizens.

But it is not only private companies and governments that are being influenced by this new movement. Scientific research is a field where sharing data also plays a very important role. Stressing the concept of Open Innovation, chapter ten gives examples of research institutions that directly profit from an increasing amount of data being released. Collaboration between scientists and crowdsourcing strategies are identified as new ingredients for the success of academic and scientific challenges. Reading these pages, we automatically thought about the initiative for sharing genomics data for research that we previously covered.

As the reader can notice, the book focuses on the status of Open Data in the US and UK. Most of the examples come from there. It is clear that both countries are leaders in this global movement, but what happens with others? Can the contents of this book be applied to other parts of the world? We were curious about this and asked the author:

  • Mr. Gurin, your research is mainly based on the context of the US and UK. As you mentioned, Open Data is now a global topic and we can find actors in every continent. What was the reason to leave out other countries? Should we expect a second book with a worldwide approach?

I can’t say yet whether I’ll write a second book – I’m still spreading the word about this one! I focused on the US and UK in Open Data Now because this book is largely focused on business applications of Open Data, and my sense is that those have developed first and most extensively in those two countries. However, the Open Data 500 project (see below) has attracted interest from countries around the world, and we’re now preparing to replicate it in a dozen countries or more. I hope that will help bring a broader international perspective to the field.

Personal data for consumer’s benefit

In the third chapter, the concept of smart disclosure is presented as a tool to help consumers make better decisions and spend money more wisely in areas such as healthcare, energy or education. Furthermore, an efficient use of open governmental data leads to the creation of new business opportunities, some of which are illustrated in these pages.

The same chapter is also dedicated to the value of personal data. Although this information should not be qualified as open, its use offers benefits both for consumers and for service providers (e.g. optimizing shopping, helping to choose the best health insurance or finding a suitable house). It is in this part that we learned for the first time about the “Blue Button” and “Green Button” initiatives, which allow patients and consumers in the US to download their medical and energy-consumption reports respectively. We asked Mr. Gurin a second question in order to get more information about this:

  • Mr. Gurin, do you think that users are ready to share their personal data with third parties? At what price? Will this kind of data gain the same momentum as Open Data has? Is the internet safe enough to allow a sustainable and secure development of this area?

This is a great question, and one that we can’t answer yet. My best guess is that consumers will be attracted to “personal data vaults” – the new technologies for storing your personal data in a secure way – because they promise a way to keep individual data safe and under the user’s control. Once personal data vaults become common, they’ll offer the opportunity for people to share their personal data selectively and securely with third parties who can help them by knowing more about them. Whether we can make the Internet safe enough to prevent serious data breaches, however, remains a question.

“Open Data Now” is not only a book

As the author states, “the world of Open Data is moving fast, and no book on this topic can be completely current”. That’s why Mr. Gurin has created a website containing a blog with news and links to follow the latest developments, debates and opportunities around the topic. We encourage you to visit it to stay updated and also to discover the Open Data 500 project: a study run by The Governance Lab, where the author serves as senior advisor. It consists of identifying 500 US companies that use open government data to generate new business and develop new products and services. The upcoming release is planned for early 2014 and will allow researchers to download the collected data. A very interesting idea that will definitely help to monitor the influence of Open Data on the business sector in the US.

Get the book

Feeling interested? You can get the book or read the first chapter here!

HackYourPhd: reporting on Open Science from the US @ Boca Raton/Paris, USA/France

A summer trip through the US to discover and document Open Science projects? When we first heard about HackYourPhd, we were excited to notice how similar the concept of their research is to our own. The idea was initiated last year by two young French researchers, Célya Gruson-Daniel & Guillaume Dumas, and “aims to bring more collaboration, transparency, and openness in the current practices of research.” Célya travelled for three months from Boca Raton (Florida) to Washington DC, gathering information and meeting people and groups active in the Open Science scene.

While this round trip through the US is now over, HackYourPhd is still active and has become an online community where the research continues. Read our interview below with the two people behind this fantastic initiative, and discover how the idea came to life, insights from the trip and what is coming next.

1) Hi Célya & Guillaume, you both co-founded HackYourPhd, a community focused on Open Science which launched a globetrotting initiative in the US last year. We are really curious how you got this idea and would like to know more about it. Don’t forget to introduce yourselves and the concept of Open Science too!

Hi Margo & Alex, thanks for this interview. We discovered your great project a few months ago. Now, we are very happy to help you, since it is closely related to what we tried to do last summer with “HackYourPhD aux States”. But before speaking about this Open Science tour across the USA, let us first recall the genesis and the aim of HackYourPhD in general. HackYourPhD is a community which gathers young researchers, PhD and master students, designers, social entrepreneurs, etc. around the issues raised by the Open Science movement. We co-founded this initiative a year ago. The idea for this community emerged from our mutual interest in research and its current practices. Guillaume is a postdoc in cognitive science and complex systems. He is also involved in art-science collaborative projects and scientific outreach. Célya is specialized in science communication. After two years as community manager for a scientific social network based on Open Access, she is now working in science communication for different projects related to MOOCs and higher education. We are both strong advocates for Open Science, and that is mainly why we came up with HackYourPhD. While Guillaume has tried to integrate Open Science into his practice, Célya wanted to explore its different facets through a PhD. But first, she wanted to meet the multiple actors behind this umbrella word. This is what motivated “HackYourPhD aux States,” the globetrotting initiative per se.

2) Why did it make sense especially in the US to follow and report Open Science projects? Could you imagine yourself doing it in other countries? What about France?

Because it was in the English-speaking world that the Open Science movement started. That is thus also where it is the most developed to date, from Open Access (e.g. PLoS) to hackerspaces (e.g. Noisebridge). There is also a big network of entrepreneurs in Open Science, which is specifically an aspect we were interested in. Célya thus decided to first look at the source of the movement and take her time (three months) before doing a similar exploration in Europe with shorter missions (e.g. one week). Concerning France, we have already begun to monitor what is taking off, from citizen science to open data and open access. While we certainly have a better vision now, the movement is still embryonic. But the movement will also take other forms, and that is also what we are interested in. Célya is thinking of doing her PhD in an action-research mode, being both observer and actor in this dynamic construction of the French Open Science movement.

3) From our experience, we could schedule our encounters and events both before starting the journey and on the way. Was it the same for you? How did you select your stops, the projects documented and the persons interviewed? Is Open Science a widespread topic, or was it actually difficult to find cases for your research?

Célya already had a blueprint of the big cities and the main path to follow. With the help of the HackYourPhD community, she gathered many contacts and put together a first database of locations to visit and people to meet. Before starting, the first step—San Diego and the Bay Area—was almost fully scheduled. Then, the rest of the trip was set up on the way. A few important meetings were already scheduled, of course (e.g. the Center for Open Science, the Mozilla Science Lab, etc.), but throughout the travel, new contacts were offered spontaneously by the people interviewed. Serendipity is your friend there! Regarding difficulties in finding cases, this depended very much on the city. While San Francisco was really easy, Boston, for example, which is full of nice projects, was nevertheless more challenging.

4) We know it is difficult to point out just one of them … but could you tell us what is your favourite or one of the most relevant Open Science initiatives you have discovered?

When Célya was in Cambridge, she visited the Institute for Quantitative Social Science. She met the Director of Data Science, Mercè Crosas, and her team, and discovered the Dataverse Network project. It is one of the most relevant Open Science initiatives she came across. Indeed, this project combines multiple facets of Open Science. It consists of building a platform allowing any researcher to archive, share and cite their data. It has many functionalities cleverly linking it to other aspects of Open Science (open access journals with OJS, citation, altmetrics...). Here is the interview with Mercè Crosas.

5) As we discussed previously with Fiona Nielsen, sharing knowledge in the scientific domain has a positive impact. After your research, why does Open Science matter and how does it change the way scientists have been working till now?

Open Science provides many ways to increase efficiency in scientific practices. For example, Open Data allows researchers to collaborate better; while this solution seems obvious to many, it becomes a necessity when it comes to big science (e.g. CERN, ENCODE, Blue Brain, etc.). Open Data also means more transparency, which is critical to addressing the lack of reproducibility, or even fraud.

Open Access presents several advantages, but the main one remains the guarantee that everyone can access scientific papers. As a journalist, Célya faced the issue of paywalls many times, and it is always frustrating. Last but not least, Open Science opens up new possibilities for collaboration between academia and other spheres (entrepreneurs, civil society, NGOs, etc.). Science is a social and collective endeavour; it thus needs contact with society and must leave its ivory tower. The Open Science movement is profoundly going in that direction, and that is why it matters.

6) As you know, Open Steps focuses on Open Data related projects. Quoting you, “In Seattle, I noticed a strong orientation of Open Science issues around Open Data.”, could you tell us more about this relation and the current situation in the US? Could you point us to any relevant Open Data initiative that we might want to document?

The state of Open Data depends on the scientific field. Indeed, Seattle was a rich environment on that topic, but this is certainly due to the software culture in the city (Amazon, Microsoft, etc.). The Open Data topic is related to Big Data. Thus, the key domains are genetics, neuroscience, and health in general. Lots of projects are interesting. We already mentioned the Dataverse Network, but you may also enjoy the DELSA Global project (interview with Eugene Kolker) or Sage Bionetworks.

7) There are a lot of sponsors supporting you. Was it easy to convince them? Is that how you finance 100% of the project, or do you have other sources of income?

All the sponsorships came through the crowdfunding campaign on KissKissBankBank. It was not a question of convincing them; they simply demonstrated the need to cover the topic of Open Science in France. Their financial help represents 36% of the total amount collected.

There were no other sources of income. The travel was not expensive, since Célya used collaborative-economy solutions (couchsurfing, carpooling, etc.).

8) Now the trip is over... but HackYourPhD is still running. How is it going now?

We are pursuing the daily collaborative curation, with almost a thousand people in our Facebook group. We are also organizing several events, mainly in Paris but with a growing network in other cities and even countries. The community is self-organized but needs some structure. We are currently thinking about this specific issue and hope 2014 will be a great year for the project!

Merci à vous deux!

Mapping Open Data with CartoDB @ Madrid/New York, Spain/USA

If you have been following Open Steps, you know that a great part of the project consists of running workshops on Open Data visualisation in the different cities visited. In these sessions, after going through some theory, we get hands-on and teach how geo-referenced datasets can be represented on a map. We wanted to teach an easy but powerful tool that could be used by everyone, so we chose CartoDB. And it was a good choice!
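To give a concrete idea of what “geo-referenced” means in practice, here is a minimal sketch of the kind of dataset we work with in the sessions: a plain CSV with latitude/longitude columns, which mapping tools such as CartoDB can import and place on a map directly. The place names and coordinates below are illustrative examples, not actual workshop data.

```python
import csv
import io

# Illustrative geo-referenced records: each row has a name plus
# latitude/longitude in decimal degrees (WGS 84).
rows = [
    {"name": "Phnom Penh", "lat": 11.5564, "lon": 104.9282},
    {"name": "Madrid", "lat": 40.4168, "lon": -3.7038},
    {"name": "New York", "lat": 40.7128, "lon": -74.0060},
]

def to_geocsv(records):
    """Serialize records to CSV text with name, lat and lon columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "lat", "lon"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

csv_text = to_geocsv(rows)
print(csv_text)
```

A file like this, dragged into the CartoDB dashboard, is enough to get points on a map; the tool recognizes the coordinate columns on import.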

Largely based on Open Source software, this online platform has been conceived to serve journalists, designers, scientists and many others in the task of creating beautiful and informative interactive maps. The developers behind the tool have had Open Data in mind since the first days, and the fact is that importing and visualizing datasets couldn’t be easier or faster. In addition, great features such as dynamic visualizations, support for your favourite Open Data formats and the endless possibilities of its JavaScript API allow beginners as well as big organisations (NASA, The Guardian and National Geographic, among others) to tell stories with numbers.

Andrew Hill, a member of the team, took some time to answer our questions about the creation and philosophy of the tool, its Open Source core and the importance of Open Data for educational, scientific and social development. We invite you to find out more about CartoDB here:

1) Hi Andrew, can you introduce yourself briefly and explain to us what CartoDB is?

Hi, I’m the senior scientist at Vizzuality and CartoDB. CartoDB is our online mapping platform that we built to let people make beautiful interactive maps easily.

2) Your company, Vizzuality, is based between Madrid and New York. What is the story behind its creation? Besides CartoDB, are you working on other products or have other activities?

Vizzuality was created by our co-founders, Sergio Alvarez and Javier de la Torre, both from Madrid. Our first office was in Madrid, where we started to grow the company. It wasn’t until a couple of years later that Javier and I moved to New York to start the office here. The idea was just to grow and explore new collaborations.

Right now, our biggest focus by far is CartoDB. There is a lot of innovation around maps on the web right now and we are really enjoying contributing to it. CartoDB has become more than we could ever have imagined and now we can see so many ways to keep making it more incredible, so I’m sure we’re going to be focused on it for some time to come.

3) Let’s focus on CartoDB, since it is the tool we are teaching on our workshop. Who is currently using it? Journalists, designers, developers? Can you point us to remarkable projects making use of all the possibilities the tool has to offer?

Yeah, all of those people, plus students, governments, city planners, nonprofits, you name it :)

Sure, I think one of the best places to find recent examples is our blog or on Twitter. Some highlights include:

Twitter has been using us for a lot of quick visualizations

and many more…

4) CartoDB, as the rest of your products, is based on open source software and its code is released to the public domain. What is your motivation behind this decision? For your company and the development of your products, what is the impact of choosing an Open Source license?

We have always been committed to open source. Largely it has to do with our background as a scientific company: working with and interacting with scientific research, it seemed obvious to us that science benefits greatly from open source. Not only does it benefit from it, it almost seems irresponsible to do anything else.

With the importance of maps in society, I feel it also seems irresponsible to rely on black boxes for mapping. CartoDB doesn’t hide anything from you, it is there for you to criticize, improve or change as you need.

5) As we know, Open Source does not necessarily exclude commercial products. What is the business model for your products?

We offer a lot of incentives on top of our hosted service, including caching, backups, uptime, maintenance, upgrades, etc. With paid hosting plans you also get dedicated support and access to the foremost experts on CartoDB to help you become a better mapper, data visualization expert, or GIS expert on our platform. So there are a lot of benefits that using our hosted platform can bring to businesses and individuals, and we are already seeing businesses being built around that; it feels great.

6) Let’s talk about the community around CartoDB. Do you receive feedback from users or from developers to improve the tool? How important is it for an Open Source-based product to have such contributions?

We have received a lot of feedback from our users including feature requests. We also do our best to contribute to the open source libraries that are used by CartoDB, so it is very much a community effort and that community is what makes it all possible for sure.

7) In our workshop, we teach how to import and visualise Open Data with CartoDB. Is the tool specifically designed to be used with Open Data? In your opinion, why do Open Data and its visualisation matter?

We think about open data when developing CartoDB all the time. I wouldn’t say that it is the sole target of our tool development; a lot of private companies are using CartoDB to analyse and map data that is part of a business offering, so not open. However, we think that visualizing open data can be a very powerful method of educating about and demonstrating its contents and importance. The title of a recent article about some maps I created shows that I’m not alone in thinking that.

8) We recently saw that you have released great new features (dynamic visualisation, live data feeds, ...). How do you set the priorities of the features you are developing? What are the next features you are working on? And in general, what does the future of CartoDB look like?

I’d say we balance three things as best we can when going for new features in CartoDB: what users express they want or need, what we see as improvements that can be made in performance, simplicity or design, and functionality that we see as innovations that we hope users will love :)

Thanks Andrew!


Meeting @ Open Development Cambodia, Phnom Penh, Cambodia

If you happen to search for Open Data initiatives in Cambodia, Open Development Cambodia is definitely going to appear at the top of the results list. Started in 2011 as a project under the activities of EWMI and on the way to being registered as an NGO, ODC represents the most active effort in the South-East Asian country to collect, use and share data for social improvement.

With a strong philosophy of objectivity and independence, the team does not focus on advocacy in particular sectors, nor does it pursue any agenda other than aggregating and offering information to the public in easily accessible forms. Self-defined as an intersection between NGO, media platform and think tank, ODC concentrates its resources on aggregating data (which necessarily must already be available somewhere in the public domain) and creating objective briefings, maps and graphics available for everyone to download, analyse and re-use. Sources are quoted, and even the methodology employed to create these contents is transparent and can be found on their site. That is what can be understood as an open way of working.

Among other contents, we learned about their forest cover page. At the heart of the page are animated forest-cover change maps, developed from analysis of satellite imagery released into the public domain by NASA. These maps and the accompanying graphics provide information about the extent and rate of Cambodia’s forest cover change over the past 40 years. This and other information found on the site has already been used by NGOs, bloggers, journalists, researchers, grassroots groups, rights advocates and even government technocrats and investors to inform their research, reporting, analysis and planning. As an example, the local rights-focused website uses maps from ODC as base layers on which they add other analysis. An interesting statistic: since its creation, their website has counted visits from users in almost every country in the world, although the majority of users are Cambodians.

All this, in a country whose administration is not particularly supportive when it comes to releasing data into the public domain or sharing information with its citizens. It is important to note that there is currently no Freedom of Information law in Cambodia; an attempt to pass a draft law was even rejected in January 2013. At the time of writing, there is no Open Data platform initiated or planned by the government.

However, the remarkable work of organisations such as ODC and the presence of a newly created local chapter of the OKFN are examples of the current will to fill the gap and bring about a positive development of openness and transparency for Cambodia. Talking about what is to come, the ODC team will add interesting new features to their platform, such as an API, to improve the user experience and give more effective access to their aggregated datasets. The site will also be available in the Khmer language within the next few months.