My experience building a 100% Open Source based Open Data platform

During the great OKFest 2014, we were lucky to meet again the folks from EWMI and Open Development Cambodia (ODC), a not-for-profit organization advocating for transparency that we got to know during the Open Steps Journey. Since 2011, the team at ODC has been doing amazing work sharing data with journalists, researchers and human rights activists so they can count on openly licensed information to support their activities in the South-east Asian country. At OKFest, they told us about EWMI’s Open Development Initiative and their plans for what has now become the Open Development Mekong project, an open data and news portal providing content about 5 countries of the Mekong region in South-east Asia. Back then, they were looking for somebody who could lend a hand in conceiving and implementing the platform. That’s how I got engaged in this challenging project that has been keeping me busy for the last 9 months.

I’m writing this article to share my personal experience participating in a 100% Open Source project, within an agile and extremely collaborative environment, whose outcomes in terms of information, knowledge and code are meant to be reused by the community.

The project’s requirements and its architecture

ODC’s site already features lots of information, datasets and visualizations. The team has done great work getting the most out of WordPress, the CMS software the site is built upon. However, since the main expectations for this new iteration of the platform were to host much more machine-readable data and expose it through both a web interface and an API, a specific framework for storing, managing and exposing datasets was needed. After analysing the options currently available, we decided to implement an instance of CKAN, which is an Open Source solution for building Open Data portals. Coordinated by Open Knowledge and strongly maintained by a great community of developers worldwide, it was definitely a good choice. Being Open Source means not only that we could deploy it for free, but also that we could use plenty of extensions developed by the community and get our questions answered by the developers on the #CKAN channel on IRC or directly on the GitHub repositories where the project is maintained.

Analogous to ODC, the OD Mekong project should present a great amount of news, data and visualizations in a comprehensive manner, allowing users to search within the large amount of content and share it on social networks or among friends. Taking into consideration that the editorial team already had expertise working with WordPress and the fact that it is a widely used, community supported Open Source CMS, we went ahead and deployed a multi-site network instance, featuring one site for the whole region (Mekong) and one site for each of the countries (Cambodia, Thailand, Laos, Vietnam, Myanmar). The theme chosen for the front-end, called JEO and developed specifically for Geo-Journalism sites, provides a set of great features to geo-localize, visualize, and share news content. Since OD Mekong’s team works intensively with geo-referenced information (an instance of Geoserver is also part of the architecture), JEO proved to be a great starting point, and thanks to the work of its developers, lots of features could be used out-of-the-box.

To facilitate the complex work-flow of OD Mekong’s editorial team, many WordPress plug-ins were used for aggregating content automatically, presenting featured information in a visual way or allowing users to provide feedback. Also, we developed WPCKAN, a WordPress plug-in which allows content to be pulled and pushed between CKAN and WordPress, the main elements of OD Mekong’s architecture. Although it is used extensively across the whole OD Mekong site, this plug-in has been developed generically, so other folks out there can re-use it in similar scenarios.
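To give an idea of what pulling content from CKAN can look like, here is a minimal Python sketch that queries CKAN's Action API (the `package_search` action) and reduces the answer to a short summary per dataset. It is not how WPCKAN is implemented (WPCKAN is a WordPress plug-in); the host `ckan.example.org` and the helper names `build_search_url`, `summarize_datasets` and `search_datasets` are assumptions made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical CKAN host, used for illustration only.
CKAN_BASE_URL = "https://ckan.example.org"


def build_search_url(base_url, query, rows=5):
    """Build a CKAN Action API URL for the package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_datasets(response):
    """Reduce a package_search response to name, title and resource formats."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summaries = []
    for dataset in response["result"]["results"]:
        # Collect the distinct, non-empty formats of the dataset's resources.
        formats = sorted(
            {r["format"] for r in dataset.get("resources", []) if r.get("format")}
        )
        summaries.append(
            {
                "name": dataset["name"],
                "title": dataset.get("title", dataset["name"]),
                "formats": formats,
            }
        )
    return summaries


def search_datasets(query, base_url=CKAN_BASE_URL, rows=5):
    """Query a live CKAN instance (needs network access and a real host)."""
    with urlopen(build_search_url(base_url, query, rows)) as reply:
        return summarize_datasets(json.load(reply))
```

Calling `search_datasets("forest")` against a real CKAN host would return a list of small dictionaries that a CMS could render as links to the datasets. Keeping the URL building and the response parsing in separate functions makes both easy to check without a network connection.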

Working in a collaborative environment

Since the beginning, OD Mekong’s intention has been to become a platform where multiple organizations from the region that share common goals can work together. This is not an easy task, and it has shaped many of the decisions taken during conception and development.

This collaborative process has been taking place (and will continue) at different levels:

  • Organizations participate in the content creation process. Once credentials are granted, datasets can be uploaded to the CKAN instance, and news, articles or reports to the specific country sites. In order to ensure the quality of the content, a vetting system has been conceived which allows site administrators to review it before it gets published.
  • Developers from the community can contribute to the development of the platform. All code repositories are available on Open Development Mekong’s GitHub site, and provisioning scripts based on Vagrant and Ansible, both open source technologies, are available for everyone to reproduce OD Mekong’s architecture with just one command.
  • Since this is an interregional endeavour, all components of the architecture need to have multilingual capabilities. For that, much of the content and many pieces of the software needed to be translated. Within OD Mekong, the localization process relied on Transifex, a web-based translation platform that gives teams the possibility to translate and review software collaboratively. Although not open source anymore, Transifex is free for Open Source projects. I would like to highlight here that the OD Mekong team contributed to the translation of CKAN version 2.2 into Khmer, Thai and Vietnamese. Bravo!

It is also very important to stress the benefits of documenting every process, every work-flow and every small tutorial in order to share the knowledge with the rest of the team, thus avoiding having to communicate the same information repeatedly. For that, a wiki has been in place since the beginning of the development process to store all the knowledge around the project. Currently, the contents of OD Mekong’s wiki are still private, but after being reviewed the information will be made publicly available soon, so stay tuned!

An amazing professional ( but also personal ) experience

Leaving the technical aspects aside and turning to the human side, I can only say that, for me, working on this project has marked a milestone in my professional career. I have had the pleasure of working with an amazing team from which I have learned tons of new things, not only related to software development but also to system administration, human rights advocacy, copyright law, project management, communication and much more. All within the best work atmosphere, even when deadlines were approaching and the GitHub issues started to pile up dramatically 🙂 .

This is why I want to thank Terry Parnell, Eric Chuk, Mishari Muqbil, HENG Huy Eng, CHAN Penhleak, Nikita Umnov and Dan Bishton for the great time and all the learnings.

Learn more

As part of the ambassador programme at, I hosted yesterday a skill-sharing session where I explain, this time on video, my experience within this project. Watch it to discover more…



Mozilla Weekend is coming to Berlin

In less than 2 weeks, Berlin will be lit up by one of the flagship Mozilla community events: Mozilla Weekend, organized on the 11th and 12th.
As the name suggests, the whole weekend is dedicated to Mozilla, its products and its initiatives, especially, but not limited to, Firefox and Firefox OS. After the German-speaking community meetup in February, Mozilla Weekend aims to cater to new contributors and help with the onboarding process.

The first day of the event (Saturday) will be filled with presentations and will take place at the Wikimedia Offices, while the second day will focus on workshops. Also, don’t miss out on the AMA (ask me anything) sessions as the Mozilla Leadership will be there!
The variety of presentations offers something for everyone, technical or not. After all, the passion for the open internet is our greatest common ground. You can register for your (free) ticket via Eventbrite.
Of course there will be free goodies and drinks, so even if you cannot attend the whole day, feel free to pass by!


Wikimedia Office (Tempelhofer Ufer 23-24)


Mozilla Office (Voltastr. 5)


Open Source Conference Albania, OSCAL 2015

This is a blog post written originally by Redon Skikuli on his blog and has been aggregated with the author’s permission.


OSCAL (Open Source Conference Albania) is the first international conference in Albania organized by Open Labs to promote software freedom, open source software, free culture and open knowledge, concepts that originally started more than 25 years ago.

The second edition of the conference will take place on 9 & 10 May 2015 in Tirana (Godina Liria) and will gather free libre open source technology users, developers, academics, governmental agencies and people who share the idea that software should be free and open for the local community and governments to develop and customize to their needs, and that knowledge is a communal property, free and open to everyone.

I’m excited, proud and lucky to be part of the organizing team of the second edition of the event, working with a great group of Albanian FLOSS enthusiasts who know how to create quality projects in a decentralized way. This edition is organized in the most decentralized way possible, both in the decision-making process and in the software used to document and plan activities and tasks. These tools include, but are not limited to, Etherpads, Telegram for chat and WordPress for the maintenance of the website. Unfortunately, in some cases we also used some proprietary cloud services, but we are planning to change this in the next edition.

Working and making decisions in a decentralized way is not only amazing, but also the key theme of my talk on the first day and the main message we want to share with the participants during OSCAL 2015.

Here is the list with some of the inspirational speakers for this year, the agenda, the blog section with all the latest news, a humble guide to Tirana for our friends from abroad, some banners in case you dig the whole thing and want to spread the #OSCAL2015 vibe and the mobile app, your companion during the event. There will also be competitions, side events related to Open Street Map, LibreOffice, Mozilla and Wikipedia and a massive after-party.

Participation is free of charge, but online registration is required.

Looking forward to the result of months of hard work from the whole team and the amazing volunteers on the second weekend of May 2015!

Mozilla German-speaking Community Meetup 2015 in Berlin

I had the pleasure of being invited to the annual Mozilla German-speaking community meetup in Berlin this year. Although I am based in Albania and not in Germany, Austria or Switzerland, I also contribute to the German community from time to time, having helped out with the Firefox 10th Anniversary campaign and various other things (Firefox has a market share of almost 50% in Germany!).

As I grew up in Germany, I am quite familiar with the culture and speak the language fluently. However, most of the time I am unable to put my German to good use in Albania, for obvious reasons, so it always feels good to practice it.

This was my first time in Berlin and my first time in Germany in almost 4 years. I had never visited a Mozilla office before either, so I was really excited about the meetup this year.

Disclaimer: This is a short summary of everything that happened during the community meetup. I am including here Michael Kohler’s notes from his blog, simply due to laziness. Kudos to Mexikohler for being so awesome! Check out his blog for the German version as well.

Day 1

The meetup was held from February 20 to February 22, 2015. To facilitate the coordination between all volunteers and staff living or working in the German-speaking countries (Germany, Austria, Switzerland), we meet once a year to discuss topics, plans and goals for the year. It is also important to meet regularly so that certain discussions can happen in person, since these are faster and more efficient. In total, 27 people attended this meetup.

On Saturday we started the first official day at 10am.

Start | End | Topic | Duration | Who?
10:00 | 10:30 | Getting to know each other, Mozilla in general | 30′ | Everyone
10:30 | 12:00 | Introductory Discussions + Mozilla Goals | 1h 30′ | Everyone
12:00 | 13:00 | Discussions / Group Planning | 1h | Groups
13:00 | 14:00 | Lunch in the Office | 1h | Everyone
14:00 | 15:30 | Feedback of the working groups + Discussions | 1h 30′ | Everyone
16:30 | 17:30 | Participation 2015 (English) | 1h | Everyone
17:30 | 19:00 | Community Tiles | 1h 30′ | Everyone
20:00 | 22:00 | Dinner | 2h 30′ | Everyone

We began the meetup with a short introduction round since not all of the attendees knew each other. It was nice to see that people from all around the Mozilla projects came to Berlin to discuss and plan the future.

After that Brian introduced us to Mozilla’s goals and plans for 2015. Firefox (more focus on Desktop this year), Firefox OS (user driven strategy), Content Services (differentiate income) and Webmaker were the focus. To reach our goals for the community we also need to know about Mozilla’s overall goals so we can align them.

To know where we currently stand with our community, we did a “SWOT” analysis (Strengths, Weaknesses, Opportunities, Threats).

Strengths

  • L10N: amount of work that was done and the quality of it
  • a lot of different projects are worked on by the community
  • we had more (and more impactful) events in 2013
  • Being spontaneous
  • …

Weaknesses

  • a lot of work
  • “bus factor”
  • communication
  • not a lot of social media activities
  • weekly meetings aren’t very efficient
  • …

Opportunities

  • Web Standards
  • Rust
  • Privacy
  • Firefox Student Ambassadors
  • …

Threats

  • Fragmentation
  • Chrome + Google Services
  • …


We split up into different groups to discuss group-specific topics and report back to everybody. We had “Localization”, “Developer Engagement / Programming”, “Community Building” and “Websites”.

We discussed the first outcomes of the groups together. Please refer to day 2 to see the results.

Markus, a local developer from Berlin, came by on Saturday. He’d like to organize regular events in Berlin to increase the presence of Mozilla in the city and to build a local community. We like this idea and will support him in 2015!

(Photo: Mario Behling)

After the group discussions Brian had further information: Participation. Please refer to Mark Surman’s blogpost to get more information about that.

At the end of the official part of the day we had a discussion about the “Community Tile”. When you open a new tab in a new Firefox profile, you’ll see an overview of different sites you can visit. One of these links is reserved for the community. We discussed our proposal and came to the conclusion that we should focus on telling everyone what the German-speaking community does and especially that there are local people working on Mozilla projects.


Community Tiles (Photo: Hagen Halbach)

Want to see who was there? See for yourself!

(Photo: Brian King)

You can find all pictures of the meetup on flickr.

Day 2

On Sunday we once again started at 10am at the Berlin Office.

Start | End | Topic | Duration | Who?
10:00 | 13:00 | Plan 2015 / Events / Goals / Roles etherpad | 45′ | Everyone
13:00 | 13:45 | Content | 45′ | Everyone
13:45 | 14:15 | IRC Meeting + Summary Meeting | 30′ | Everyone
14:00 | … | Departing or other discussions | … | Everyone

At first we had the same breakout groups again, this time to evaluate goals for 2015. After that we discussed those together with the whole group and decided on goals.


The l10n group has worked out a few points. First they updated multiple wiki pages. Second they discussed several other topics. You can find the overview of topics here.


  • Finish the documentation on the wiki
  • Get in touch with the “Localizers in Training”


SUMO has done an introduction into the new tools. Further they decided on a few goals.


  • Have 90% of all articles on SUMO translated all the time
  • For Firefox releases all of the top 100 articles should be translated



Developer Engagement / Programming

  • organize a “Mozilla Weekend” (this does not only cover developers)
  • give a talk on Jetpack
  • continue the Rust meetups
  • developer meetups in Berlin
  • recruit 5 new dev contributors

Community Building

In the community building group we talked about different topics. For example we looked at what’s working now and what’s not. Further we talked about Firefox Student Ambassadors and recognition. You can find the overview here.


  • have at least 10 FSA until the end of the year
  • have 2 new Reps in the north of Germany
  • get WoMoz started (this is a difficult task, let’s see)
  • finish the visual identity (logo) until end of Q2
  • have at least 5 events in cities where we never did events before
  • Mozilla Day / Weekend
  • define onboarding process
  • better format for the weekly meeting


All German Mozilla sites are currently hosted by Kadir. Since Kadir doesn’t have enough time to support them, the goal is to move them to Community IT. This was agreed upon at the community meetup. You can find the relevant bug here.


  • transfer all sites
  • refresh the content

All these plans and goals are summarized in our Trello board. All German speaking community members can self-assign a task and work on it. With this board we want to track and work on all our plans.

(Photo: Hagen Halbach)

After that we discussed what features should be on the website. In general, all the content will be updated.

  • product and project overview
  • landing page for the community tile
  • list of events
  • Download-Button
  • link to “contribute”
  • link to the mailing list (no support!)
  • link to the newsletter
  • Planet
  • Social Media
  • prominent link to SUMO for help
  • link to the dictionaries

(Photo: Hagen Halbach)

At the end we talked about our weekly meeting and drafted a proposal for making it more efficient. The following changes will be made once everything is clear (we’re discussing this on the mailing list). Until then everything stays the same.

  • biweekly instead of weekly
  • Vidyo instead of IRC
  • document everything on the Etherpad so everybody can join without Vidyo (Workflow: Etherpad -> Meeting -> Etherpad)
  • the final meeting notes will be copied to the Wiki from the Etherpad

Feedback / Lessons learned

  • planning long-term before events makes sense
  • the office is a good location for this kind of meetup, but not for bigger ones
  • there is never enough time to discuss everything together, so individual breakouts are necessary

I’d like to thank all attendees who participated in very informative and constructive discussions during the weekend. I think that we have a lot to do in 2015. If we can keep up the motivation from this meetup and work on our defined plans and goals, we’ll have a very successful year. You can find all pictures of the meetup on flickr.

India Open Data Summit, 2015

Open Knowledge India, with support from the National Council of Education Bengal and the Open Knowledge micro grants, organised the India Open Data Summit on February 28. It was the first ever Data Summit of this kind held in India and was attended by Open Data enthusiasts from all over India. The event was held at Indumati Sabhagriha, Jadavpur University. Talks and workshops were held throughout the day. The event succeeded in living up to its promise of being a melting pot of ideas.

The attendee list included people from all walks of life. Students, teachers, educationists, environmentalists, scientists, government officials, people’s representatives, lawyers, people from the tinseltown — everyone was welcomed with open arms to the event. The Chief Guests included the young and talented movie director Bidula Bhattacharjee, a prominent lawyer from the Kolkata High Court Aninda Chatterjee, educationist Bijan Sarkar and an important political activist Rajib Ghoshal. Each one of them added value to the event, making it into a free flow of ideas. The major speakers from the side of Open Knowledge India included Subhajit Ganguly, Priyanka Sen and Supriya Sen. Praloy Halder, who has been working for the restoration of the Sunderbans Delta, also attended the event. Environment data is a key aspect of the conservation movement in the Sunderbans and it requires special attention.

The talks revolved around Open Science, Open Education, Open Data and Open GLAM. Thinking local and going global was the theme from which the discourse followed. Everything was discussed from an Indian perspective, as many of the challenges faced by India are unique to this part of the world. There were discussions on how the Open Education Project, run by Open Knowledge India, can complement the government’s efforts to bring the light of education to everyone. The push was to build up a platform that would offer the Power of Choice to the children in matters of educational content. Greater use of Open Data platforms like CKAN was also discussed. Open governance, not only at the national level but also at the level of local governments, was discussed with seriousness. Everyone agreed that in order to reduce corruption, open governance is the way to go. Encouraging the common man to participate in the process of open governance was another key point that was stressed. India is the largest democracy in the world, and this democracy is very complex too. Greater use of the power of the crowd in matters of governance can help the democracy a long way by uprooting corruption from the very core.

Opening up research data of all kinds was another point that was discussed. India has recently passed legislation ensuring that all government funded research results will be in the open. A workshop was held to educate researchers about the existing ways of disseminating research results. Further enquiries were made into finding newer and better ways of doing this. Every researcher who had gathered resolved to enrich the spirit of Open Science and Open Research. Overall, the India Open Data Summit, 2015 was a grand success in bringing likeminded individuals together and in giving them a shared platform where they can join hands to empower themselves. The first major Open Data Summit in India ended with the promise of keeping the ball rolling. Hopefully, in the near future we will see many more such events all over India.

Across the Atlantic: Journalism++ opens its first chapter outside of Europe


Journalism++, the data journalism agency, opens its first chapter outside of Europe: Jornalismo++ São Paulo, also the first data journalism agency in Brazil. The Brazilian office will strengthen current data journalism teams and lead data-storytelling projects for news media organisations in the region, adding to J++’s portfolio of award-winning projects such as Datawrapper and Broken Promises.

Brazilian newsrooms are catching up to the data journalism revolution, although most of them still don’t have the resources to hire professionals from different backgrounds, such as Computer and Data Science, Design and Social Network Analysis, to lead data-driven investigations. Jornalismo++ São Paulo is an effort to fill this gap with a handpicked team of experts with an extensive experience in major Brazilian newsrooms and data journalism projects. “We want to bring data journalism to Brazil, helping newsrooms that want to do good journalism with data, but don’t have the manpower to do it in the short term”, says Marco Túlio Pires, journalist and programmer, one of the founders of the chapter in São Paulo.

Besides Marco Túlio Pires, who also coordinates School of Data Brazil, the team in São Paulo is led by four other professionals: Juan Torres, editor of the city desk at the Correio newspaper, the biggest in Salvador; Natália Mazotte, teaching assistant at the Knight Center for Journalism in the Americas and also School of Data Brazil coordinator; Tiago Mali, Training Director at Brazil’s Association for Investigative Journalism; and Thomaz Rezende, who worked as a programmer and designer for VEJA Magazine.

The name of the agency is a play on a common operator in programming languages and journalism itself. “The operator ‘++’ means ‘plus one’ to a certain numeric variable. In other words, we want Jornalismo++ to go beyond traditional journalism, even beyond what’s already on the web. In our work, we increment journalism with skills from other areas, such as Computer Science, Design and Data Analysis”, explains Natália.

Jornalismo++ São Paulo will also maintain a blog about Data Journalism with the latest updates in the field for a Portuguese-speaking audience. For more information about J++ São Paulo visit their website:

Open Data in the Philippines: Best practices from disaster relief and transportation mapping

While attending Geeks on a Beach last month, we also spent some time in Manila to visit a few labs and agencies, and had several discussions on the state of open-data in the Philippines.

Quick reminder: open-data is a recent trend for governments, companies and institutions to release their datasets freely, so that users, developers, citizens or consumers can make use of them and create new services (check FixmyStreet for a “citizen 2.0” stint or FlyonTime for a more commercial approach).

The Philippines, a country of 100 million people that we have been exploring, hosts quite a few very good applications of open-data, and they also have strong support from the government side. Here are some of their creations, with explanations from Ivory Ong, Outreach Lead of Open Data for the Department of Budget and Management of the government of the Philippines.

Key milestones of the open-data in the Philippines

The major milestone for open-data in the Philippines was the official launch of the data portal on January 16, 2014, after a 6-month development period. “We have had 500,000 page views as of June this year. We published 650 datasets at the time and had infographics (static data visualizations and interactive dashboards) already which was the unique selling point of our data portal. We were able to push out an additional 150 datasets by May 5, 2014”, says Ivory.


The team also led two government-organized hackathons, #KabantayNgBayan (on budget transparency) and Readysaster (on disaster preparedness), to build awareness of the use and benefits of open government data. Another milestone is a Data Skills Training for civil society organizations, media, and government to build capacity in data scraping, cleaning, and visualizing.


“Back in June, we likewise conducted our first Open Data training at a city level (Butuan City, Agusan del Norte) where local civil society organizations and local government units created offline visualizations from data disclosed by the Department of Interior and Local Government (DILG) via the Full Disclosure Policy Portal”, she adds.

Mapping the transports in Metro Manila with students embedded on all routes

While talking with Ivory and Levi Tan Ong, one of the co-founders of By Implication, a digital agency, I heard quite a funny story.

Just as in so many emerging markets, the transportation system has grown organically. Except for the MRT or subway systems, where an official map helps to navigate the city, most routes by local bus (dubbed jeepneys in the Philippines, but one can think of Nairobi’s matatu as well) are unwritten. People just know them; stations are all over the road and nowhere at the same time.


So the Department of Transport launched two initiatives to solve the issue. First, by putting students with GPS plotting software in all the jeepneys and local buses to map their actual routes, and then by releasing the data to have the communities of developers build an app for that. “From the little that I know, this was done because Department of Transport and Communication and its attached agencies have clashing statistics on the exact number of routes”, adds Ivory.
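As an illustration of what such a GPS trace could become once released as open data, here is a minimal Python sketch that turns a list of recorded positions into a GeoJSON feature with an approximate route length. This is only an assumption about how such data might be processed, not the Department's actual workflow; the function names, the route name and the coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a rough estimate


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def trace_to_route(name, points):
    """Turn a GPS trace, a list of (lat, lon) tuples, into a GeoJSON Feature."""
    # Drop consecutive duplicates, e.g. while the vehicle is standing still.
    cleaned = [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]
    length = sum(haversine_km(p, q) for p, q in zip(cleaned, cleaned[1:]))
    return {
        "type": "Feature",
        "properties": {"name": name, "length_km": round(length, 2)},
        # GeoJSON stores positions as [longitude, latitude].
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lat, lon in cleaned],
        },
    }


# Made-up trace with three readings, two of them identical.
route = trace_to_route(
    "Example jeepney route",
    [(14.60, 120.98), (14.60, 120.98), (14.61, 120.98)],
)
```

A real trace would need more cleaning (GPS noise, gaps, wrong turns), but a list of such features is already something developers can load into a map or a routing app.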


Creative agency By Implication then won the Open Community Award at the Philippines Transit App Challenge with an app which helps you to know which combination of transportation to use to go from A to B… quite convenient for the foreigner I am in the gigantic Metro Manila area! The app has been recording about 50,000 requests per month since inception, and even if there are still some glitches in the data, it’s the first real online map and direction service for Manila.

Where is the Foreign Aid for disaster going? Open Reconstruction will tell

The same agency is also behind Open Reconstruction, an open-data platform which tracks where the aid money went after typhoon Yolanda hit the archipelago in November 2013.



It’s not just a storytelling of where funds are allocated, as Levi says: “Several towns asked for money to rebuild infrastructure and housing, but at that time, it was a long process in 5 steps at least to get funding, and all was in paper. So what we provide is a digitalisation of the aid process. First, by streamlining the process of applying for money and making all steps digital, traceable, and in a second step, by releasing this data to the public to increase transparency of the overall aid effort”.



The connection between the agency’s work and the government open-data team seems to work on the topic of foreign aid. Ivory adds that “Context at the time was that there were a lot of news releases saying that humanitarian aid was coming in specifically for Yolanda. There were assumptions that government agencies might be getting funds yet are not using it for its intended purpose. When we finally launched the site and finished the scoping of the information-goods-cash flow [see infographic from the FAITH site below], we found out that only a small portion went to government and a vast majority went to multilateral agencies such as the UN and the Philippine Red Cross. Public demand died down because of it”.


Open Reconstruction is the other half of what the open-data team wanted FAiTH data to be connected to: how the money was spent and if it was used for the intended purpose. It gives anyone, by bringing data to light, a chance to be a watchdog to hold government to account.

What’s next for open-data in the Philippines? Training, training, training

In just a few months, the open-data community did hit quite a few convincing milestones, both with government support and the involvement of the community of developers. There’s still a lot to do, as Ivory tells us, because as in any digitalisation, training, change management and making sure the administration and the public understand and accept this new policy is key.

“I guess this goes back to our first time to run the training to create offline data visualizations back in June. Local government unit representatives who were intimately familiar with local budget data had an easier time to create visualizations and explain it. After the crash course training for free online tools they can use, we went into a workshop proper where they select PDF files from the Full Disclosure Policy Portal (based on the city/municipality they lived in) and proceeded to discuss with their groupmates on how best to visualize it using colored paper and pentel pens.

These actors at a local level are important since they serve as potential information intermediaries who can communicate data into digestible stories that citizens can relate to based on their needs. Citizens who reside in remote or rural areas and are not familiar with government jargon/processes can be informed and empowered if intermediaries exist.

From our initial experience, I think I can propose 4 important must-have skills for intermediaries:

  • technological capacity (i.e. use of ICT) to clean/structure/visualize data
  • good understanding of government vocabulary and process (for data analysis and interpretation)
  • deep knowledge of local / community needs and priorities
  • communication skills, particularly storytelling with data

The last skill is important because stories are easier to understand versus listening to technical jargon. Filipinos are very much into knowing what’s what in the lives of family, friends, celebrities, and politicians. Stories trump statistics in this case so learning how to narrate what dataset/s mean can be more useful. If Open Data is to make an impact in the lives of citizens, it must be in a language that is relatable and understandable.”

Written by Martin Pasquier from Innovation Is Everywhere

7 Predictions for “Open Data” in 2015

What’s going to happen to the “open data” movement in 2015? Here are Dennis D. McDonald’s predictions:

  1. Some high profile open data web sites are going to die. At some sites the lack of updates and lack of use will catch up with them. Others will see highly publicized discussions of errors and omissions. For some in the industry this will be a black eye. For others it will be an “I told you so” moment causing great soul-searching and a re-emphasis on the need for effective program planning.
  2. Greater attention paid to cost, governance, and sustainability. In parallel with the above there will be more attention paid to open data costs, governance, and program sustainability.  Partly this will be in response to the issues raised in (1) and partly because the “movement” is maturing.  As people move beyond the low-hanging-fruit and cherry-picking stage they will be giving more thought to what it takes to manage an open data program effectively.
  3. Greater emphasis on standards, open source, and APIs. This is another aspect of the natural evolution of the movement. Much of the open data movement has relied on “bottom up” innovation and the enthusiasm of a developer community accustomed to operating on the periphery of the tech establishment. Some of this is generational as younger developers move into positions of authority. Some is due to the ease with which data and tools can be obtained and combined by individuals and groups working remotely and collaborating via systems like GitHub.
  4. More focus on economic impacts of open data in developed and developing countries alike. While many open data programs have been justified on the basis of laudable goals such as “transparency” and “civic engagement,” sponsors will inevitably ask questions about “impact” as update costs begin to roll in.  Some of the most important questions are also the simplest to ask but the hardest to answer, such as, “Are the people we hoped would use the data actually using the data?” and “Is using the data doing any good?”
  5. More blurring of the distinctions between public sector and private sector data. One of the basic ideas behind making government data “open” is to allow the public and entrepreneurs to use and combine public data with other data in new and useful ways. It is inevitable that private sector data will come into the mix. When public and private data are combined some interesting intellectual property, ownership, and pricing questions will be raised. Managers must be ready to address questions such as, “Why should I have to pay for a product that contains data I paid to collect via my tax dollars?”
  6. Inclusion of open data features in mainstream ERP, database, middleware, and CRM products. Just as vendors have incorporated social networking and collaboration features with older products, so too will open data features be added to mainstream enterprise products to enable access via file downloads, visualization, and documented APIs. Such features will be justified by the extra utility and engagement they support. Some vendors will incorporate monetization features to make it easier to track and charge for data the new tools expose.
  7. Continued challenges to open data ROI and impact measurement. As those experienced with usage metrics will tell you, it’s not just usage that’s important; it’s the impact of usage that really counts. In the coming year this focus on open data impact measurement will continue to grow. I take that as a good sign. I also predict that open data impact measurement will continue to be a challenge. Just as in the web site world it’s easier to measure pageviews than to measure the impacts of the information communicated via the pageviews, so too will it continue to be easier to measure data file downloads and API calls than the impacts the use of the data thus obtained will have.

By Dennis D. McDonald, Ph.D.

Exploring Open Science n°4: DNAdigest interviews Nowomics

This week I would like to introduce you to Richard Smith, founder and software developer of Nowomics. He kindly agreed to answer some questions for our blog post series, and here it is: first-hand information on Nowomics. Keep reading to find out more about this company.


Richard Smith, founder and software developer of Nowomics

1. Could you please give us a short introduction to Nowomics (goals, interests, mission)?

Nowomics is a free website to help life scientists keep up with the latest papers and data relevant to their research. It lets researchers ‘follow’ genes and keywords to build their own news feed of what’s new and popular in their field. The aim is to help scientists discover the most useful information and avoid missing important journal articles, but without spending a lot of their time searching websites.

2. What makes Nowomics unique?

Nowomics tracks new papers, but also other sources of curated biological annotation and experimental data. It can tell you if a gene you work on has new annotation added or has been linked to a disease in a recent study. The aim is to build knowledge of these biological relationships into the software to help scientists navigate and discover information, rather than recommending papers simply by text similarity.

3. When did you realise that a tool such as Nowomics would be of a great help to the genomic research community?

I’ve been building websites and databases for biologists for a long time and have heard from many scientists how hard it is to keep up with the flood of new information. There are around 20,000 biomedical journal articles published every week and hundreds of sources of data online, so receiving lots of emails with lists of paper titles isn’t a great solution. In social media, interactive news feeds that adapt to an individual are now commonly used as an excellent way to consume large amounts of new information. I wanted to apply these principles to tracking biology research.

4. Which part of developing the tool did you find most challenging?

As with a lot of software, making sure Nowomics is as useful as possible to users has been the hardest part. It’s quite straightforward to identify a problem and build some software, but making sure the two are correctly aligned to provide maximum value to users has been the difficult part. It has meant trying many things, demonstrating ideas and listening to a lot of feedback. Handling large amounts of data and writing text mining software to identify thousands of biological terms is simple by comparison!

5. What are your plans for the future of Nowomics? Are you working on adding new features/apps?

There are lots of new features planned. Currently Nowomics focuses on genes/proteins and selected organisms. We’ll soon make this much broader, so scientists will be able to follow diseases, pathways, species, processes and many other keywords. We’re working on how these terms can be combined together for fine grained control of what appears in news feeds. It’s also important to make sharing with colleagues and recommending research extremely simple.

6. Can you think of examples of how Nowomics supports data access and knowledge dissemination within the genomics community?

The first step to sharing data sets and accessing research is for the right people to know they exist. This is exactly what Nowomics was set up to achieve, to benefit both scientists who need to be alerted to useful information and for those generating or funding research to reach the best possible audience. Hopefully Nowomics will also alert people to relevant shared genomics data in future.

7. What does ethical data sharing mean to you?

For data that can advance scientific and medical research the most ethical thing to do is to share it with other researchers to help make progress. This is especially true for data resulting from publicly funded research. However, with medical and genomics data the issues of confidentiality and privacy must take priority, and individuals must be aware what their information may be used for.

8. What are the most important things that you think should be done in the field of genetic data sharing?

The challenge is to find a way to unlock the huge potential of sharing genomics data for analysis while respecting the very real privacy concerns. A platform that enables sharing in a secure, controlled manner which preserves privacy and anonymity seems essential. I’m very interested in what DNAdigest are doing in this regard.

Exploring Open Science n°2: DNAdigest interviews SolveBio

DNAdigest continues with its series of interviews. Here we would like to introduce you to Mr Mark Kaganovich, CEO of SolveBio, who agreed to an interview with us. He shared a lot about what SolveBio does and discussed with us the importance of genomic data sharing.


Mark Kaganovich, CEO of SolveBio

1) Could you describe what SolveBio does?

SolveBio delivers the critical reference data used by hospitals and companies to run genomic applications. These applications use SolveBio’s data to predict the effects of slight DNA variants on a person’s health. SolveBio has designed a secure platform for the robust delivery of complex reference datasets. We make the data easy to access so that our customers can focus on building clinical grade molecular diagnostics applications, faster.

2) How did you come up with the idea of building a system that integrates genomic reference data into diagnostic and research applications? And what was the crucial moment when you realised the importance of creating it?

As a graduate student I spent a lot of time parsing, re-formatting, and integrating data just to answer some basic questions in genomics. At the same time (this was about two years ago) it was becoming clear that genomics was going to be an important industry with a yet unsolved IT component. David Caplan (SolveBio’s CTO) and I started hacking away at ways to simplify genome analysis in the anticipation that interpreting DNA would be a significant problem in both research and the clinic. One thing we noticed was that there were no companies or services out there to help out guys like us – people that were programming with genomic data. There were a few attempts at kludgy interfaces for bioinformatics and a number of people were trying to solve the read mapping computing infrastructure problem, but there were no “developer tools” for integrating genomic data. In part, that was because a couple years ago there wasn’t that much data out there, so parsing, formatting, cleaning, indexing, updating, and integrating data wasn’t as big of a problem as it is now (or will be in a few years). We set out to build an API to the world’s genomic data so that other programmers could build amazing applications with the data without having to repeat painful meaningless tasks.

As we started talking to people about our API we realized how valuable a genomic data service is for the clinic. Genomics is no longer solely an academic problem. When we started talking to hospitals and commercial diagnostic labs, that’s when we realized that this is a crucial problem. That’s also when we realized that an API to public data is just the tip of the iceberg. Access to clinical genomic information that can be used as reference data is the key to interpreting DNA as a clinical metric.

3) After the molecular technology revolution made it possible for us to collect large amounts of precise medical data at low cost, another problem appeared. How do you see a solution to the problem that the data are not in a language doctors can understand?

The molecular technology revolution will make it possible to move from “Intuitive Medicine” to “Precision Medicine”, in the language of Clay Christensen and colleagues in “The Innovator’s Prescription”. Molecular markers are much closer to being unique fingerprints of the individual than whatever can be expressed by the English language in a doctor’s note. If these markers can be conclusively associated with diagnosis and treatment, medicine will be an order of magnitude better, faster, cheaper than it is now. Doctors can’t possibly be expected to read the three billion base pairs or so that make up the genome of every patient and recall which diagnosis and treatment is the best fit in light of the genetic information. This is where the digital revolution – i.e. computing – comes in. Aggregating silo’ed data while maintaining the privacy of the patients using bleeding edge software will allow doctors to use clinical genomic data to better medicine.

4) What are your plans for the future of SolveBio? Are you working on developing more tools/apps?

Our goal is to be the data delivery system for genomic medicine. We’ve built the tools necessary to integrate data into a genomic medical application, such as a diagnostic tool or variant annotator. We are now building some of these applications to make life easier for people running genetic tests.

5) Do you recognise the problem of limited sharing of genomics data for research and diagnosis? Can you think of an example of how the work of SolveBio supports data access and knowledge sharing within the genomics community?

The information we can glean from DNA sequence is only as good as the reference data that is used for research and diagnostic applications. We are particularly interested in genomics data from the perspective of how linking data from different sources creates the best possible reference for clinical genomics. This is, in a way, a data sharing problem.

I would add though that a huge disincentive to distributing data is the privacy, security, liability, and branding concern that clinical and commercial outfits are right to take into account. As a result, we are especially tailoring our platform to address those concerns.

However, even the data that is currently being “shared” openly, largely as a product of the taxpayer funded academic community, is very difficult and costly to access. Open data isn’t free. It involves building and maintaining substantial infrastructure to make sure the data is up-to-date and to verify quality. SolveBio solves that problem. Developers building DNA interpretation tools no longer have to worry about setting up their data infrastructure. They can integrate data with a few lines of code through SolveBio.

6) Which is the most important thing that should be done in the field of genetic data sharing and what does ethical data sharing mean to you?

Ethical data sharing means keeping patient data private and secure. If data is used for research or diagnostic purposes and needs to be transferred among doctors, scientists, or engineers then privacy and security is a key concern. Without privacy and security controls genomic data will never benefit from the aggregate knowledge of programmers and clinicians because patients will be rightly opposed to measuring, let alone distributing, their genomic information. Patient data belongs to the patient. Sometimes clinicians and researchers forget that. I definitely think the single most important thing to get right is the data privacy and security standard. The entire field depends upon it.