Last week I wrote about the need for transparency in inequalities research – how hidden research both weakens the trustworthiness of its claims and works against the collective nature of social science. This week I want to finish off my argument by dealing with the objections to transparent social science, and in particular by responding to Kat Smith’s wonderful warning. As she wrote on openDemocracy, when it comes to tobacco you can argue that ‘freedom of information reduces transparency’ rather than increases it.
Despite this, I think that all inequalities researchers should be as transparent as possible, and that the users of this research should demand transparency. Here’s why.
Practical objections to transparency
1. Why would it help anyone to be transparent?
The most common argument against transparency is that researchers put a huge amount of effort into collecting data – why would they bother if they then had to share it with everyone else? (This is the response Firebaugh found when he floated compulsory transparency for the American Sociological Review in the 1980s, and Abbott agrees in the SM&R debates.)
But transparency doesn’t mean removing the incentives to collect data – data collectors always get the first publication(s) from their data, so the headlines are theirs. Moreover, sharing data may even make your research more high-impact. Gary King has argued that articles that share data are cited twice as often as those that don’t. Andrew Abbott’s SM&R piece points out that this claim is over-stated – using transparently available data! – and that the main drivers of citations go wider than data sharing. Still, there is a plausible argument that sharing data raises your public profile, gets your name known to lots of people, and leads more people to pay close attention to your research as a starting point for their own.
I would go further than this though. Fundamentally, I think this argument is selfishness, pure and simple. And while selfishness is understandable for individuals, I can see no reason why we (as a scientific community) should tolerate it. Social science is a collective enterprise, which depends on people working together rather than on heroic individuals working alone. It’s also (primarily) publicly funded, the justification for which is that it helps society. This is why, for example, the main social science research council in the UK, the ESRC, requires any data collection it funds to be shared ‘to the maximum extent possible’. To me, ‘I collected the data, so only I should reap its rewards’ is a non-argument.
2. Where do I find the time to be transparent?
The other main complaint is that sharing data and code is time-consuming (see Firebaugh). To which I can only say – yes…
…BUT…
…if you follow good-practice principles in your research, you can have data + code ready to share and actually save time! Scott Long’s wonderful book sets out the principles of good data management, and personally I think that any analysis you can’t follow when you come back to it in two years shouldn’t be published anyway.
So rather than demanding extra effort, transparency is an incentive to do our analyses properly in the first place.
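To make this concrete, here is a minimal sketch of what a ‘share-ready’ analysis script can look like – in Python purely for illustration, since the same principles apply in Stata or R. All file and variable names here are hypothetical; the point is that every step from raw data to published table lives in code that anyone, including your future self, can re-run:

```python
# analysis.py -- a hypothetical, minimal 'share-ready' analysis script.
# Raw data are read-only; everything derived is rebuilt by re-running this
# file, so the analysis documents itself.

import numpy as np
import pandas as pd

RAW = "data/raw/survey.csv"          # as collected; never edited by hand
DERIVED = "data/derived/clean.csv"   # rebuilt from RAW on every run
OUTPUT = "output/table1.csv"         # the table that appears in the paper

def clean(path: str) -> pd.DataFrame:
    """Every recoding decision lives in code, with a comment explaining it."""
    df = pd.read_csv(path)
    df = df.dropna(subset=["income"])                      # drop item non-response
    df["log_income"] = np.log(df["income"].clip(lower=1))  # clip to avoid log(0)
    return df

def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Reproduce the published table exactly from the cleaned data."""
    return df.groupby("social_class")["log_income"].agg(["mean", "std", "count"])

if __name__ == "__main__":
    clean_df = clean(RAW)
    clean_df.to_csv(DERIVED, index=False)
    summarise(clean_df).to_csv(OUTPUT)
```

If the script runs from raw data to final table in one pass, ‘sharing your code’ costs nothing beyond what good practice already demands.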
3. Do I have permission to be transparent?
One caveat is that sometimes we don’t have permission to be transparent. I primarily do analyses using other people’s data – so it’s their decision to share the data, not mine. For example, I use the influential Whitehall II cohort of civil servants in two of my PhD chapters, but anyone wanting to replicate those analyses will have to go through the committee that guards access to the data. And sometimes research participants were never asked at the time whether their data could be shared. So there are limits to what we can do here.
A related point is whether it’s possible to share data and code while maintaining research participants’ confidentiality. Huge amounts have been written about this, and the main consensus seems to be that most data can be shared, but that it’s sensible to restrict access to trustworthy individuals (in the UK, this basically means anyone working at a university), and to get users to agree to terms and conditions – on pain of losing their right of access in future. For some data, though (e.g. date of birth, local area identifiers), access probably needs to be restricted further still – so, for example, some analyses I’m doing on British Social Attitudes probably won’t be replicable unless other researchers approach the data holders, NatCen. Still, I can make the process of obtaining the data as transparent as possible to make replication easier.
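For what this can look like in practice, here is a minimal, hypothetical sketch of preparing a dataset for wider sharing: direct identifiers are dropped, and the indirect identifiers mentioned above (date of birth, local area) are coarsened rather than released in full. The column names are invented for illustration:

```python
# anonymise.py -- a hypothetical sketch of preparing survey data for sharing.
# Direct identifiers are dropped; indirect identifiers are coarsened so that
# individuals are harder to re-identify. Column names are invented.

import pandas as pd

def anonymise(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Coarsen indirect identifiers rather than releasing them in full.
    if "date_of_birth" in out.columns:
        out["birth_year"] = pd.to_datetime(out["date_of_birth"]).dt.year
    if "local_area_code" in out.columns:
        out["region"] = out["local_area_code"].str[:2]  # keep only a broad prefix
    # Drop direct identifiers (and the raw versions of coarsened fields).
    return out.drop(
        columns=["name", "address", "date_of_birth", "local_area_code"],
        errors="ignore",
    )
```

Genuinely sensitive fields still belong behind a restricted-access agreement; a sketch like this only handles the easy cases.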
Kat Smith and Heather Lanthorn mentioned the need to look at qualitative research in a comment on last week’s post, and I should mention it here. I think all of the same principles apply to qualitative research (the UK Data Archive has a ‘Qualidata’ section) – it’s just that the confidentiality issues are harder. Personally, I ask all qualitative interview participants for permission to share anonymised data, and I anonymise my interviews at the point of transcription, so it’s relatively straightforward to make them available when I publish. But in some circumstances – particularly when interviewing elites who are more easily identifiable (which I’ve done), or people such as drug dealers or benefit cheats who would be uncomfortable with sharing – there are inevitable limits to how far data can be shared. And this is fine.
Political objections to transparency
Finally, to Kat’s main points. I don’t agree that consent is implicitly restricted to people on the side of what we think of as ‘good’ – surely the point is that the data are being collected to find the truth, rather than to demonstrate what the people taking part want to show! So if any bona fide researcher wants to do research, then they should be able to access the data, even if they’re working for the tobacco industry, the BNP, revolutionary communists (like some of my colleagues) or whatever.
The political consequences of transparency are a trickier issue. I recommend reading Kat’s comment or post – they show how Big Tobacco uses transparency against the public interest, by (i) creating endless hassle for researchers they don’t like through freedom of information requests; and (ii) re-analysing studies they don’t like until they get the result they want, then using these re-analyses to create doubt in public debate. At the same time, they refuse to share any of their own data – which, Kat argues, creates an unfair power imbalance.
While this is a critical point to raise, I don’t agree that the solution is to keep research secret [ADDED: not that this is what Kat is suggesting! See her comment underneath this post]:
- Re time-wasting – this is only a problem because researchers aren’t transparent to begin with; it is hidden research that attracts freedom of information requests. If we get into the habit of being transparent, then there’s nothing left for tobacco companies or other lobby groups to request.
- Re manufacturing doubt – this reflects a wider problem about the relationship between commercial interests and science. If we don’t have the right institutions in place to try and get truth into public debate, then we get a mess of claim and counter-claim, all driven by opposing political interests. The right response is to say, ‘these people have no place in scientific debate’. If we get this wrong, then even without transparency we have what I’ve elsewhere called the ‘evidence game’ – all the rhetoric of truth, with none of the substance.
This is a short answer to a large question. But I don’t think it gets us anywhere to sabotage science as a defensive reaction to aggressive corporate interests. By doing so, we undermine the credibility of science in the public domain (as the ‘Climategate’ scandal showed) – and it is this credibility we need if we are to fight off attempts to distort science. There’s nothing wrong with making (anonymised) data available to tobacco-industry-paid researchers; the problem comes when we listen to their results.
Conclusions…
For all these objections, transparency can and does work. Since 2004 the American Economic Review has committed to “publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication” (taken from Freese), and it has since been followed by Econometrica, the Journal of Political Economy, and the Review of Economic Studies. The Center for Global Development is managing it too.
I will try to prompt every organisation and colleague I come into contact with to do the same. And whether you’re a researcher or a user of research, I think you should expect transparency too – because the objections don’t stand up (except where confidentiality is genuinely at stake); because hidden research undermines the collective nature of science; and because research is simply less credible without transparency.
The famous political scientist Gary King argued recently in Science that “when we teach we should explain that data sharing and replication is an integral part of the scientific process. Students need to understand that one of the biggest contributions they or anyone is likely to be able to make is through data sharing”. I agree – and I hope you’ll follow suit.
