Rebecca Weiss, Head of Data Science at Mozilla, is deep in the data and the decisions Mozilla makes about it. Her team’s role in a nutshell: find value in data. We talked about how privacy, openness and data collection can co-exist so that data is both useful to business and respectful of people’s privacy.
~ ~ ~ ~
How has data science at Mozilla evolved in the six years you’ve been here?
We didn’t have a data science team when I started. We were sort of against the thought of it, given what it was associated with: it seemed like the only reason to have a “data science” team was for nefarious data mining purposes, and we didn’t want to be responsible for a telemetry platform that enabled data mining. We really wanted to avoid individual-level observation as much as possible. This came from a place of good intent; we simply didn’t want to blindly trust ourselves to “do the right thing” with data without any obvious oversight. One of the biggest changes over the time I’ve been here is how Mozilla has adapted to a world where data collection is ubiquitous in the industry. To be honest, we still don’t trust ourselves, but we acknowledge that concern head-on and try to practice openness by default. You can see all the data we collect on you, and we’re trying to work on ways to make it easier to see how we use this data.
The Firefox Public Data Report was just released. What can you tell me about that project?
The idea of the Public Data Report is that it’s a topline view of the way people are using Firefox to browse the web. It gives people a chance to see the web the way we see it, as a browser only can. We are starting from this idea that people want to know more about the state of the web, and the browser should be a fundamental part of the conversation because the browser sees everything.
The Public Data Report is also a transparency initiative. There is a base level of data collection that’s just going to happen forever. It’s not going anywhere. We are a company that builds a browser that has privacy as one of its core values, but you should know what that means. It does not mean that we do not collect data. We do, and we enable user control over our data collection. Some of our data collection is opt-out, but you always have controls in place to make the final decision about whether we should have your data or not. That’s no joke. We take that seriously. It’s a big part of our conversation.
What kind of tension is there between being an open organization and standing for privacy at the same time, especially with regard to data collection?
At Mozilla I also have this role where I’m the “lead data steward.” Data review is this process where we document responses to questions about what new data collection we’re proposing in Firefox and why. A data steward produces an assessment as to whether a proposed form of data collection, or a request to collect some kind of data, is supported by the privacy policy, because ultimately this is a very gray area.
So when you talk about the tension, it is often really about these diametrically opposed belief systems that are concerned with control of information, privacy and openness. Everyone naively says these two things are diametrically opposed, but in my experience that is not true because I can collect data, but I have data review, which forces me to commit to a statement in public about what I am collecting and why (you can find data reviews in any proposed measurement bug thread), in order to make sure that it’s aligned with our vision of privacy. And so data review is our effort to live up to openness, while also trying to keep awareness of privacy as a top-level concern. We’re protecting your privacy because we’re showing you what we are measuring and why.
How do we reach equilibrium?
We are living in a world where data collection is inevitable. It’s proven to be too valuable. People find utility in it. It’s not going away. So as a society if we care about privacy, the question is how do you balance this with privacy? There is a path through here that we have to navigate ourselves, and I don’t think our belief systems are going to help us much. It’s going to be case by case decision making and mapping it out.
What do we do at Firefox that sets us apart?
One of the things that I think is good about the way Mozilla has done things data-wise is that we’re thrifty in the way we think about data. We have to be. We can’t afford it.
Facebook gets data from you because they track you everywhere and that passive background observation has led to the richest market research data set that they own 100%. They’ve hired some of the best analysts in the world to use that data and they sell ads based off that data. It’s a brilliant demonstration of how to make money with data science, if you want to put aside the morality for a second. The same thing is true of Google. They are data science companies when it comes down to it. The fact that they produce software is a reflection of the time and place in which the technology industry was at that moment. But as the zeitgeist shifts more and more to this cultural conversation about data collection and information communication technologies and the role we expect of that sector as a society, we’re going to start seeing more and more of these companies being seen as data science companies rather than just software.
The difference between us and a lot of these other companies, and where I think we are right now, is that because we are thrifty, because we can’t afford to be a data science company like that (they make billions of dollars, we do not), we have to be more careful about the questions that we ask. We cannot answer every question. We have to think about what we can do as an analytical organization with a thrifty budget mindset. The way that we use privacy to inform that thrift has two benefits.
First, the consumer benefits because we ask these questions. And just the act of asking questions early in the process, of saying something like, “Hey, should we even be doing this?” Or, “Does this violate user expectation or trust?” Just asking those questions out loud, earlier, has downstream positive privacy implications for the user. And second, it also encourages this sense of thrift in us because if you can’t defend against that level of questioning, you probably shouldn’t be going down that path in the first place.
Most companies, in my experience, don’t ask those questions until much, much later. And yes, it’s a compliance thing, but they really treat it as a thing on a checklist rather than fundamental to the early stages of planning. That is a good thing that we’re doing. It’s a good habit that we have, and it’s a good thing that more companies should develop.
You spend a lot of time with data and technology. How do you disconnect?
I really only do two things. I either go into some extravagant attempt to try to get into nature as quickly as I can. A few weeks ago I took a boat and went inter-island sailing in Hawaii. The alternative is that I play a ridiculous amount of video games.
What’s your favorite video game?
No. I can’t do that. I can’t say favorite because it changes with time!
OK, top three.
I can tell you what I’ve recently played. Breath of the Wild. And Horizon Zero Dawn. And recently I’ve been playing a lot of Heroes of the Storm, HOTS, which is a MOBA. To be honest, I had to stop playing video games for a bit because I was trying to finish my dissertation, and so I needed to stop wasting time. But that’s done now, so I’m back to looking at all the games coming out.
Have you gotten caught up in the Fortnite trend?
So, I played PUBG… But when Fortnite came out, I was deep into dissertation mode, and it was cartoony, so I was eh. I missed the whole phenomenon. I’m also terrible at first-person shooters. I would play that game, but only for the social aspect.
What’s a typical breakfast?
Black coffee.
Cats or dogs?
That’s like Sophie’s Choice. They’re both fuzzy and so I like them both equally.
Android or iOS?
Resentfully Android.
Where do you get your news?
A lot of word of mouth. A lot of organic discovery via search. Also a really embarrassingly large amount of Reddit. I don’t use Twitter. I’ve never had a handle. I have a Facebook profile, but I don’t use it. OK, I don’t use Facebook because I enabled two-factor authentication, and on that second stage, I’m always like, wait, I don’t want to log in to Facebook. Everyone likes to think I have this huge crusade about privacy, but no, I just don’t like it.
What’s the last internet find you shared with someone?
Some kind of meme or a thread on Reddit.
When it comes to GIFs, hard or soft G?
Listen. It’s not a unit of time, and it’s not peanut butter. It’s [hard-G] GIF, and I don’t care what other people say. It’s GRAPHIC.
What’s something about yourself that people would be surprised to know?
People who know me really well will roll their eyes, but I have evidence that this surprises people. I am half Filipino. Most people don’t realize that. They don’t realize I’m half anything. Most people don’t know much about Filipinos, in my experience. It’s not a culture that a lot of people have much exposure to or understanding of, even though there are Filipinos everywhere.
~ ~ ~ ~
Ed note: You might be wondering about the image we selected for this article. Rebecca is a “red lanyard person,” which at Mozilla means she prefers not to publish photos of herself online.