Data, Privacy, & Trust in Open Source: 10 Lessons from Wikipedia

Accepted Session
Short Form
Scheduled: Thursday, June 26, 2014 from 2:30 – 3:15pm in B304


Most people today are concerned with how data is used to enhance or subvert individual privacy. This is especially true on the Web, where open source technologies are behind much of what we interact with and use on a daily basis. As the most fundamental aspects of our lives become networked (social relationships, work, finance, and even how we get our food), how can we make sure that open source technologies foster a sense of trust with users, protect their privacy, and still give data scientists the tools they need to gain insight?


Wikipedia is a classic example of both “big data” and a free software community, one that builds the MediaWiki engine powering many open, collaborative communities. With nearly 500 million visitors a month and 75,000+ active contributors, the site is a treasure trove of data of interest to researchers across many fields. It is also one of the websites specifically called out in leaked NSA documentation, highlighting how Internet activity is of key interest to those engaged in surveillance of the public. The privacy and analytics systems built into Wikipedia affect a huge swath of the world, including those who reuse its software.

Given this state of affairs, how do the engineers, designers, product managers and data scientists responsible for shepherding this system protect individual privacy and maintain the trust of users? With Wikipedia as an example, we will derive 10 lessons that might be applied to any FOSS community when it comes to data, privacy and trust.


privacy, data, community, Wikis, data science, analytics

Speaking experience

I have spoken at Open Source Bridge 2010, Ignite Portland, WordCamps, Wikimania (Wikipedia's global meetup), and more.