Editors who are new to health-related content on Wikipedia are often surprised when their edits are reverted with the rationale of "Fails WP:MEDRS", a shorthand reference to Wikipedia's guideline about sources considered reliable for health-related content. This essay explains why these standards exist.
Summary of the long content below
- The guideline, Wikipedia:Identifying reliable sources (medicine) (MEDRS), is not different from other guidelines, because all Wikipedia content should be generated by summarizing high quality sources (independent, secondary sources written by experts in the field, published by respected publishers). MEDRS (MEDical Reliable Source) explains how to find such sources for biomedical or health-related content. It can be treated as an extension of, and help page for, the already existing policies.[a] Wikipedia represents viewpoints in proportion to their prominence in the reliable sources, especially the secondary sources. Mainstream views, as determined by the sources, carry the most weight. Significant minority views are given less weight, and fringe views (often identified by omission from the secondary sources) carry little to none. It is very easy to engage in original research by cherry-picking primary sources. Editors should cite primary sources rarely, and then only with good reason and care![b]
Why is this especially important for biomedical content in Wikipedia?
- What we understand about human health and medicine is based on the basic science of biology, and biology is complex.
- Biological processes occur on a microscopic scale, with thousands of different kinds of molecules interacting all the time. At the same time, the organism interacts with the environment and other organisms. This becomes mind-bogglingly complex at the level of the actual molecules.
- Most research is on model organisms (cultured cells, worms, flies, lab mice). (All models are wrong, but some are useful.) Isolated cells in culture often behave differently than they do in whole organisms. Furthermore, there are significant differences in genetics and physiology between animals and humans, so the results from animal studies often do not translate to humans. Finally, model systems are often genetically pure, and don't reflect the genetic diversity of the real world. Only a small portion of research is done on actual people, and even then it is done in the artificial context of a clinical trial where the researchers try to change only one thing at a time. This is all part of the scientific process, and rarely produces results that transfer painlessly to the real world. (You might have heard the joke about searching for keys where the light is better. This is a good analogy to how experiments in biology are performed. Sometimes it's so dark that we can't even begin to look where we want to.) The artificiality of the studies presents a serious problem in translating results to any natural setting, much less translating them to human health or biology.
- Biology is still a young science, and even our knowledge of basic things is fragmented; our big-picture ideas change all the time. Those from physical sciences or technology sometimes have an especially hard time understanding this — biological systems can't be mapped out and expected to act rationally.
- Human biology is even harder because there are constraints as to what we can do. We can't cut up healthy living humans and examine them microscopically to see what is going on in real time (unless you were Josef Mengele). Nor is it even possible to collect data on certain processes such as surgery. To really assess quality of care it would be necessary to perform placebo surgery, which is widely considered unethical.
- Translating insights from biology into medicine (in other words, applying the basic biological research to create technology) is another level of difficulty:
- Even scientific breakthroughs in biology take many years, and a great deal of money, to turn into anything useful. Even then, many new drugs, medical devices or diagnostic tests based on the best of science fail in adequately powered clinical trials, when actually tested on humans. (The "adequately powered" part is important: it means the treatment has been tested in enough people that we have a good sense of whether the outcome is probably true or is a fluke. You can flip a coin 5 times and get heads every time.) This uncertainty and difficulty also goes for basic research that suggests that X or Y might be toxic or bad for you. It is hard to figure out what is true in the field of toxicology; toxicologists think very carefully about things like how people are exposed to a substance, at what dose, and over what amount of time, and try to come up with useful ways to model that in lab studies. Dumping a ton of a chemical directly on cells and killing them tells you nothing about whether skin contact will harm you, nor at what dose!
- There is much we don't know even about existing therapeutics and other treatments. For example, there was a ton of great basic research (done in cells and model organisms) that showed a connection between oxidative damage and cancer; this research suggested that taking antioxidants might prevent cancer. So the NIH funded a huge (adequately powered!) clinical trial, the Selenium and Vitamin E Cancer Prevention Trial, so we could really learn if taking Vitamin E (a great antioxidant) could actually prevent cancer in actual humans. Guess what? People taking Vitamin E got more cancer, and the trend was so clear that they had to stop the clinical trial early. Mind-blowing. Welcome to biology. Welcome to medicine.
- The primary scientific literature is very exploratory, and not reliable.[1] The use of WP:PRIMARY sources is really dangerous in the context of health.
- The primary literature is written by scientists, for scientists. It is not intended to be taken as health advice by everyday people – it is not meant to be taken out of context and applied.
- Because the work is exploratory, much of it turns out to be false leads or dead ends and is simply ignored by other scientists. The ignored articles are not marked in any way, so we cannot tell which ones they are.[2][3][4] The replication crisis is being discussed and addressed throughout the health and biomedical sciences; it is especially acute in the field of psychology.[5][6][7]
- The mapping work — determining how the exploration is going and where we stand — is done in review articles. (Ignored articles aren't usually explicitly mentioned in reviews either — they are busy with building up accurate maps, not debunking dead ends.)
- When clinical trials (experiments done on humans) are published, those papers are also primary sources. Some clinical trials are very well done (have enough subjects and are appropriately randomized, blinded, etc.) and some are very poorly done. It is very common to find publications of small clinical trials, the results of which cannot be generalized and which often contradict each other.[8] It is not easy for nonscientists to tell which is which. So we still look for review articles to help us understand even this kind of primary source.
- A lot of people have strong opinions about health-related matters.
- It is about us and our loved ones, after all.
- People take things that happen to them very seriously, and try to generalize from them. But this is not valid scientifically (tiny sample and cognitive biases) and people too easily mistake correlation for causation.
- There is a lot of money involved in health-related matters — everything from hype around basic research to drive donations to universities or sales of newspapers/attention to TV news, to companies trying to sell treatments of all kinds. With our 24-hour news cycle and the need to keep people coming back to websites, there is huge hype around basic science that is not ready for showtime in any responsible way. This is why you see the phrase so often: "If you have questions, discuss them with your doctor." Your doctor (if they are not a quack) will almost always tell you, "We don't know yet."
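The coin-flip remark above, about why "adequately powered" matters, can be made concrete with a little arithmetic. This is only a sketch: it assumes a perfectly fair coin and independent flips, and the function name `chance_of_all_heads` is made up for this illustration.

```python
from fractions import Fraction

def chance_of_all_heads(flips: int) -> Fraction:
    """Probability that a fair coin lands heads on every one of `flips` tosses."""
    return Fraction(1, 2) ** flips

# A short run of heads is not rare; a long one is.
for flips in (5, 20, 100):
    p = chance_of_all_heads(flips)
    print(f"{flips} flips: 1 chance in {p.denominator}")
```

Five heads in a row turns up about once in every 32 tries, so a result seen in only five people can easily be a fluke. The same streak over many more flips would be far harder to explain by chance, which is the intuition behind testing in enough people.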
The result of all of this is that the world is awash with content about health. All kinds of media holler at us every day, about "new THIS" and "shocking THAT". Very often that content is dead wrong, or dramatically overstates what we can confidently say, based on the science. And many people have strong ideas that are not based on science at all.
But as an encyclopedia, Wikipedia is committed to providing reliable information to the public. We have nothing to do with hype or eyeballs or the 24-hour news cycle. We go slow, and say what is certain (which includes saying "we don't know" or "there is insufficient evidence to say X"). All of Wikipedia stands on, and is based on, the consensus of whatever field a given article falls within. We always have to think carefully about what sources we use to generate content, and this is especially true for health-related content. For health-related content, the field is evidence-based medicine. And per WP:MEDRS – which the community created after long and arduous discussion – we reach for review articles published in the biomedical literature, or statements by major medical or scientific bodies.
Secondary sources generally
Wikipedia is an encyclopedia. It is not a newspaper (we aren't in a hurry, and we don't have to report the latest and best). It is not a journal or a book, pulling together all the primary sources into a coherent picture — that is what scientists and other scholars do in review articles in journals, and what historians do in their books.
Our mission is to express the sum of human knowledge – "accepted knowledge", in the words of WP:NOT. We are all editors. Our role is to read and understand the reliable secondary and tertiary sources, in which experts have pulled the basic research together into a coherent picture, and summarize and compile what those sources say, in clear English that any reader with a decent education can understand.[c]
In articles related to health, editors who want to cite primary sources and create extensive or strong content based on them generally fall into one of three groups.
- Sometimes they are scientists, who treat Wikipedia articles like they themselves are literature reviews and these editors want to synthesize a story from primary sources. But articles here are encyclopedia articles, which is a different genre. Each article is meant to be "a summary of accepted knowledge regarding its subject". Secondary sources are where conclusions stated in primary sources are "accepted" or not.
- Sometimes they are everyday people, who don't understand that the scientific literature is where science happens – it is where scientists talk to each other. The scientific literature is not really intended for the general public. The Internet has made it more available to the public, as has the open access movement. Both are a mixed blessing. The downside is that everyday people may take research papers out of context from the ongoing and always-developing discussion among scientists, and take individual results as some kind of gospel truth, when each paper is really just a stepping stone (sometimes a false one) as we (humanity) apply the scientific method to understanding the world around us. Non-scientists may not know that many research articles in biology turn out to be dead ends, or unreplicable, or even withdrawn. When a research paper is published, we cannot know whether it will eventually turn out to be replicable, accepted, and built on by the relevant field, or whether it will not. Reviews tell us that.[d]
- Sometimes editors wanting to use primary sources are agenda-driven — there is something in the real world that is very important to them, and they want that idea expressed in WP and given strong WP:WEIGHT. In the very act of doing that — in selecting a given primary source and giving it a lot of weight (or any weight at all, actually) — they are performing original research. It is sometimes hard to get people to see this.
Wikipedia is not about what you think is important, right now, nor even about what the media is hyping today. It is about what we know, as expressed in reliable, secondary sources. (Independent ones!) It is so hard for people to differentiate what they see and what they "know" from what humanity — as expressed by experts in a given field — knows.
It is hard for people to think like scholars, with discipline, and actually listen to and be taught by reliable, independent, secondary sources instead of acting like barroom philosophers who shoot from the hip or letting media hype drive them.
NPOV depends mightily upon editors' grasp of secondary sources. We have to find good ones – recent, independent ones – and absorb them, and see what the mainstream positions are in the field, what are "significant minority opinions", and what views are just plain WP:FRINGE. We have to let the best sources teach us. And yes, it takes commitment – both in time, and to the values of Wikipedia – to really try to find the best secondary sources, access them, absorb them, and learn from them how to distribute WEIGHT in a Wikipedia article.
What makes this even more challenging is that because this is a volunteer project, Wikipedia editors often come here and stay here due to some passion. This passion is a double-edged sword. It drives engagement and the creation of content, but too often brings with it advocacy for one position or another. This is a quandary. The discipline of studying secondary sources and editing content based on those sources, in putting egos aside and letting the secondary sources speak, is the key that saves Wikipedia from our personal, limited perspectives.
- While WP:OR allows primary sources to be used, it is "only with care, because it is easy to misuse them."
- WP:NPOV says "Neutrality assigns weight to viewpoints in proportion to their prominence. However, when reputable sources contradict one another and are relatively equal in prominence, describe both approaches and work for balance. This involves describing the opposing views clearly, drawing on secondary or tertiary sources that describe the disagreement from a disinterested viewpoint."
- WP:VERIFY says, "Base articles largely on reliable secondary sources. While primary sources are appropriate in some cases, relying on them can be problematic. For more information, see the Primary, secondary, and tertiary sources section of the NOR policy, and the Misuse of primary sources section of the BLP policy."
The call to use independent secondary sources is deep in the guts of Wikipedia. This is a meta-issue — a question of what it means to be an editor on Wikipedia.
Secondary sources about health matters
Biology is difficult
Biology is difficult. It is still a young science, and our knowledge of even basic things is fragmentary, and even our big-picture ideas are changing all the time. Human biology — our understanding of what is going on inside healthy people and inside sick people — is even harder, and there are serious barriers to furthering our understanding. People in the physical sciences or technology seem to have an especially hard time understanding this.
The physical sciences have given us deep insight into material reality, and because the science there has progressed so far, we can do amazing things. For example, Moore's law is a direct result of our advances in physics and materials science and our ability to apply science — to create technology to serve us, to the point where we now have amazing things like smartphones — computers we can hold in our hands and interact with in intuitive ways, capabilities that just a couple decades ago would have taken an entire room full of equipment to provide and that only cutting-edge scientists could operate.
Physics deals with dead matter. We can poke and prod without doing harm, and what we are looking at is what we are looking at. Life (made of physical matter, of course) is way more complicated. In comparison, "dead" is easy; life is hard.
Biology remains primarily an observational science. Don't get me wrong — biologists do experiments — they poke and prod living things in various ways, to help them try to flesh out the pictures we are still forming about what is going on in living things. But we are not in possession of a set of "laws of nature" such as those that govern physics. Even what we once called the central dogma of molecular biology — that DNA "makes" RNA which "makes" proteins — has turned out to be far more complicated than biologists originally thought. We still don't fully understand what something as basic as aspirin does in the human body, much less what it does in a particular person's body. We understand a lot, but our knowledge is far from perfect. Medicine like aspirin is technology — we are doing our best to apply the findings of biological science to solve problems. We understand what aspirin itself is, very well (the chemistry, not the biology), but what happens when you put it into an average human body, or a particular person's body, is another question altogether. The science is too weak in biology, especially human biology, to apply and evolve technology with anywhere near the speed of information technology.
These fields are different worlds, scientifically speaking. (I am not even getting into structural differences that make the markets so different — innovators in medicine have to deal with regulators and whether insurance companies will pay for things, with serious ethical issues involved in experimenting on animals and humans, and with the huge amounts of money and time and risk in bringing new products to market. All these make medicine a different universe from information technology.)
Going a little deeper into the science...
Most everybody has heard of "DNA", but what is it? It is a polymer — it is a chemical made up of many subunits all connected in a chain. Each of those subunits is a chemical called a nucleotide. In simple terms, there are four different nucleotides: adenine, guanine, thymine, and cytosine, and we often describe the chain they make when they link together by using the first letters of their names: A, G, T, and C. So DNA is a long chain of As, Gs, Ts, and Cs. We can describe a given instance of a DNA molecule as a chain of letters: AAGTCTTGACT, etc.
A "genome" is, basically, all the DNA in a cell. (A given species will have a pretty consistent genome on a high level, but every instance of that species will be slightly different — there will be many small variants — some of them a single nucleotide change, some of them being whole deletions or rearrangements of DNA segments. But genomes remain consistent enough from organism to organism within a species that we can indeed meaningfully talk about "species"). The simplest bacteria (which are some of the simplest living organisms) have DNA that is a chain 139,000 nucleotides long (ATCTG, etc., times ~139,000). Microscopic, mind you!
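As a rough illustration of the letter notation and of the "small variants" just mentioned, the sketch below compares two short chains. The first string is the example chain given above; the second is a hypothetical variant invented here, and the function name is likewise made up. The strings are only a notation for a physical molecule.

```python
reference = "AAGTCTTGACT"    # the example chain from the text
individual = "AAGTCTCGACT"   # hypothetical variant, invented for illustration

def single_letter_differences(a: str, b: str) -> list:
    """List (position, letter_in_a, letter_in_b) wherever two equal-length chains differ.

    Positions are counted from 0, as is usual in Python.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares chains of equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(len(reference), "letters in the example chain")
print(single_letter_differences(reference, individual))
```

A real genome comparison also has to cope with deletions and rearrangements, which this equal-length comparison deliberately ignores.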
But ... who cares? Why does that matter? Well, DNA is kind of the "blueprint" of the cell.
(We need to be careful here — people use a lot of metaphors in biology, and they are starting to slip into thinking about "genomes" as pure information—as literally some kind of code, like software. But in the real world, any genome is DNA, which is not abstract information. DNA is an actual, physical thing in every cell in every living thing. It interacts physically with other actual chemicals, which in turn interact with other chemicals, and so on and so on. The sum of those actual interactions is what we call "life", and even "consciousness".)
Within the long, long chemical chains of DNA, certain segments function as a kind of code (we call these segments "genes"). The cell has machinery (yes! — actual mini-machinery that is amazing to behold and consider — see this youtube video for an animation) that creates a different polymer, RNA, by reading off the nucleotides on DNA. The RNA can in turn function as a kind of code that other machinery "reads" and builds proteins from, or it can go off and do things on its own—like become part of a ribosome, or directly interfere with other molecules, or do many other things. We used to think there was a lot of junk DNA in the regions of DNA between genes — we used to think the DNA in those regions was just inert. But we are learning more and more that all kinds of interesting things are happening there.
All those things (DNA, RNA, proteins, and many other things) all interact with each other. Zillions of interactions, all happening on a microscopic scale, and changing all the time.
Now, think about science. Newton could stand in his back yard, drop an apple, and measure how long it took the apple to hit the ground. He could change the experiment — climb a ladder and drop it from higher, and time that. But of course, all that is pretty... uncontrolled. What if it's windy one day, or really humid so the air is thicker? How high exactly was his hand above the ground? He could take it inside, where there is no wind. He could put the apple in a holder attached to a ruler, and release it from a precisely determined height. What he is doing there, is thinking about how to design a controlled experiment, so he is only testing one thing so that he will know exactly what is changing and can later make sense of the results of his experiment.
Turn back to a bacterium. More than a hundred thousand nucleotides in even the simplest genome, dozens of genes coding for many different proteins, etc., etc., and everything interacting with everything else, and the whole bacterium interacting with whatever is floating around it, including...other bacteria. Alive, and constantly changing. And all microscopic and invisible to the naked eye. Think about trying to do an experiment and trying to change just one thing. Think about how easy it would be to contaminate the experiment: to have the tiniest jot of some chemical on one of the instruments you are trying to use to manipulate the bacterium. It is really, really hard, just physically, to perform an experiment in biology; it takes a lot of skill and training, and it is really hard to even design an experiment where you are reasonably certain you are only changing one thing.
And remember, this is just a bacterium. You can kill it, chop it, dump chemicals on it...pretty much whatever you want to try to create a controlled experiment and then see what happens. What if you want to understand an organism that has multiple cells, like a human? And remember, our cells also interact with each other (each has its own DNA that is producing mRNA, etc., all the time), and we have organs and organ systems that interact with one another on a meta-level, and systems like hormones that act across our entire body on a meta-meta level. The complexity is absolutely mind-blowing. Add to that the fact that everyone is different, because each of us is the result of a unique blend of our parents' DNA, and each of us grew up and exists in a different environment. So you can give one person a dose of the drug coumadin and it will perform just as you expect it to, but if you give the same dose to another person, it can be wildly more potent or less potent. (Some drugs are more sensitive to individual differences than others, but all drugs act differently in different people.)
Human biology is harder still
So, trying to understand normal basic human biology is hard. Trying to figure out what is going on in a disease is really hard, too. For example, everybody knows that Alzheimer's is a terrible disease, and we have spent gobs of money trying to figure out what causes it. One of the bad actors is a piece of a protein. The piece is called "A beta" and the whole protein is called "APP". Well, with all the money we have spent, we still do not know what APP does in normal brains, and we still don't understand why the A beta piece gets cut out of it. We don't understand why neurons die in the brains of Alzheimer's patients, nor how to stop them from dying. That is crazy, right? It starts to make some sense when you realize that we have no way (really!) of looking inside a living human being's skull and seeing in detail — way down at the cellular level — what is going on. It's a serious problem! Anyway, we are scrabbling around in the dark. Humans are really, really complicated biological things. There's so much going on.
Since we can't chop up living human beings or do crazy experiments on them, how do we try to figure out human biology? We use models: mostly other organisms on which we can do experiments, and based on the results, we can then try to make guesses about human biology. You might have heard the joke about searching for keys where the light is better. This is what biology is like. People do research in mice, or in cells in petri dishes, or they cut up dead people. We do controlled experiments that make sense, and we can start to put stories together about what is going on. And while we are making progress, our answers are still pretty crappy, pretty fragmentary. (This is why we do experiments on animals. A lot of people, including scientists, struggle ethically with whether it is acceptable to do experiments on animals, and if so, how. It is not an easy question. How will we learn about biology if we cannot do experiments with living beings, especially ones that are similar to us? How do we actually see what is going on inside a living being if we do not cut it open and look? We do not have any technology that allows us to non-invasively look deep inside a living thing on a microscopic level in real time. That technology just doesn't exist in the real world; we have no tricorders. These are real questions, and very hard ones.)
Scientists also do "epidemiological studies". These are studies of a lot of living people where you measure a bunch of things and try to find correlations. But correlations are dangerous. For example, say a study found that college kids who sleep in their clothes tend to wake up with headaches. That's a correlation. But what does this really mean? Does wearing clothes while you sleep make you sleep poorly, or maybe cut off blood to your head or something? Well...the study didn't measure how much beer people drank the night before! Right? Now it all makes sense. In this case, the beer drinking is what we call a "confounder", and suddenly we can see that the correlation we saw before is really meaningless. Just because the two things happen to occur together does not mean one caused the other. So while it is tempting to say that the correlation implies causation, it is a very dangerous thing to assume (see correlation does not mean causation if you want to dig into that whole thing more).
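The sleeping-in-clothes story can be played out as a small simulation. Everything here is invented for illustration: the students, the probabilities, and the function names are assumptions, not data from any study. In this made-up world, drinking is the only thing that influences either sleeping clothed or waking with a headache.

```python
import random

def simulate_students(n=20000, seed=0):
    """Invent n students as (drank, slept_clothed, headache) tuples."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        drank = rng.random() < 0.3                              # the confounder
        slept_clothed = rng.random() < (0.7 if drank else 0.1)  # depends only on drinking
        headache = rng.random() < (0.6 if drank else 0.1)       # depends only on drinking
        students.append((drank, slept_clothed, headache))
    return students

def headache_rate(students):
    return sum(headache for _, _, headache in students) / len(students)

def headache_gap(students):
    """Headache rate among clothed sleepers minus the rate among everyone else."""
    clothed = [s for s in students if s[1]]
    others = [s for s in students if not s[1]]
    return headache_rate(clothed) - headache_rate(others)

students = simulate_students()
drinkers = [s for s in students if s[0]]
abstainers = [s for s in students if not s[0]]

print("everyone:  ", round(headache_gap(students), 2))
print("drinkers:  ", round(headache_gap(drinkers), 2))
print("abstainers:", round(headache_gap(abstainers), 2))
```

Across everyone, clothed sleepers should show a much higher headache rate. Within the drinkers alone, or the abstainers alone, the gap should all but vanish, because clothing never caused anything here. A real study rarely gets to see its confounders this cleanly, which is the whole problem.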
Scientists also conduct clinical trials. These are also scientific experiments where the scientists are trying as hard as they can to change just one thing, again so they can actually make sense of the results. There are intense ethical issues involved in doing medical experiments with humans, and large bodies of international and national law and regulation concerning this. A principle that all these bodies of law and regulation share is that there must be some clear benefit to society from any experiment done on a human, and the research subject must be protected as much as possible from any risk of harm.[e] For this reason, scientists don't test things like pesticides directly on humans. Instead, we rely on toxicology studies in animals and cells to try to understand the risks of substances that have beneficial uses outside of medicine. Scientists do test new drugs, medical devices, and diagnostic tests on humans to determine if they work well enough (that is, are "effective"), and are safe enough, to justify their release into the market and subsequent widespread use. Only after doing as much work as possible in cells and animals (and with many medical devices, in human cadavers) can testing on humans begin. Clinical trials of drugs start with small Phase I studies to explore how much of the drug can be used and to get an initial understanding of whether it is safe enough to continue testing. These tests are important and are dangerous. Terrible surprises happen (rarely, but they happen), such as what occurred in the Phase I trial of an immunomodulatory cancer drug called TGN1412; it unexpectedly caused a cytokine storm and multiple organ failure in the six volunteers to whom it was given. While all of them survived, they required treatment in intensive care and the long-term effects on them are unknown.[11] Phase I disasters like that are rare, but the point is, they do happen, even with the most careful planning.
Putting a drug in a human for the first time is one of the scariest and most intellectually and ethically challenging experiments imaginable.
In any case, after Phase I trials establish dose levels and give a high-level insight into safety, Phase II trials start. These are conducted on larger (say 10–100) groups of patients who are actually sick, and the goal is to get an insight into safety and efficacy in actual patients. Companies will often conduct multiple Phase II trials (Phase IIa, Phase IIb, etc.) with various drug formulations and also in different patient populations or for different diseases, further exploring whether and how it makes sense to take on the much bigger expense and challenge of a Phase III trial. Phase III trials test the drug in large numbers of sick patients, with the goal of getting definitive data about safety and efficacy. These trials, which cost tens of millions of dollars to run, are carefully designed; the goal is to have a big enough "N" (see above!) and to follow patients long enough to get a reliable answer, but not too big an N such that patients are endangered unnecessarily and so as not to waste money and time.
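To get a feel for why a "big enough N" matters, here is a rough simulation of many imaginary two-arm trials. The response rates (30% on the comparison arm, 40% on the drug), the trial sizes, and the simple two-proportion z-test are all assumptions chosen for illustration. Real trials are sized by statisticians using much more careful methods.

```python
import math
import random

def trial_sees_difference(n_per_arm, p_control, p_treated, rng):
    """Simulate one trial; True if a two-sided z-test at roughly the 5% level finds a difference."""
    control = sum(rng.random() < p_control for _ in range(n_per_arm))
    treated = sum(rng.random() < p_treated for _ in range(n_per_arm))
    pooled = (control + treated) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:
        return False  # nobody or everybody responded, so there is nothing to compare
    z = ((treated - control) / n_per_arm) / se
    return abs(z) > 1.96

def estimated_power(n_per_arm, p_control=0.30, p_treated=0.40, runs=1000, seed=1):
    """Fraction of simulated trials that detect the (real, built-in) benefit."""
    rng = random.Random(seed)
    hits = sum(trial_sees_difference(n_per_arm, p_control, p_treated, rng)
               for _ in range(runs))
    return hits / runs

for n in (20, 100, 500):
    print(f"{n} patients per arm: benefit detected in about {estimated_power(n):.0%} of trials")
```

In this sketch the drug always works, yet the smallest trials should miss the benefit most of the time, while the largest should catch it in the great majority of runs. That is also why small published trials so often contradict each other.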
All three phases are experiments that are limited in time and in the number of patients who are treated, and there is often more to learn about drugs after they are on the market, and are used by millions of people over years and years. Post-marketing surveillance of drugs is important, and is difficult—again because you are back to doing epidemiological studies that are not controlled, and it is hard to determine whether problems that arise in the population taking the drug (who are sick!) are caused by the drug or not (the correlation and causation problem).
Sometimes (rarely, but probably more commonly as we move into the 2010s and beyond) companies test new drugs against existing drugs. They do that because payors (insurance companies, and national health payment systems like Medicare in the US or the NHS in the UK) are starting to demand this kind of information to justify drug pricing. This testing provides really valuable data. Outside of that, there are two ways we get insight into what available treatment ~might~ be best for a given patient with a given problem. One is that federal agencies like the US Agency for Healthcare Research and Quality or, less frequently, the National Institutes of Health, sometimes fund head-to-head trials comparing treatments. Another way is that doctors and scientists sometimes pull together all the published clinical trials for a given condition, with all the various treatments that were used, and try to compare how well the treatments worked, and how safe they are, using complex statistical methods. These articles are called systematic reviews, and they are some of the most reliable sources we have for medical information. The Cochrane Collaboration is an example of a group that does this.
So, it is hard to tell what is going on. We use models, we do big studies and make correlations...and all of these are experiments. Scientists also analyze published results and try to make sense of them, often with complex statistical modelling. All of these efforts show us stretching, reaching out, into the microscopic, churning darkness where life happens as well as into the mass of data we have built up about how groups of people respond to various treatments—to try to understand, bit by bit, what we are and how we are affected by diseases and by the drugs meant to treat them.