(Article under construction/translation)
( Category "Academic Science," with its pros and cons. See the related articles: 1 – Before judging the Matrix/DNA Theory, read this. 2 – List of discoverers who were ridiculed and later recognized)
It has happened, as the article below reports, that a discovery about Nature made through a scientific experiment, and therefore accepted, begins to show different results in subsequent experiments. Many of these cases are due to human error in the experiments, but others suggest that hidden variables exist. One possibility is that Nature is pure chance, so that no experiment could ever be replicated with the exact same result; if it were, that would itself be chance. But there is a very tenuous hypothesis that can be raised from the Matrix/DNA Theory viewpoint. I still cannot describe it properly in words, but since I intend to investigate it, I am leaving it all on record here. The hypothesis is that reality is composed of several fractal layers of time. An example: a human being takes nine months to give birth, but the same process in a mosquito takes, say, three days. The process is the fractal; it is always repeated, and only its time scale varies. If so, all phenomena occurring at the microscopic level must be changing constantly relative to our level of perception and to our own reality. (This is better explained by the diagram/software of the Matrix at my website: http://theuniversalmatrix.com .) While we take ten years as a system to go from Function 2 (where we are children) to Function 3 (where we are adolescents), our cells, which are under the same vital cycle, pass through those two functions thousands of times. Thus experiments based on actions and reactions at levels of magnitude smaller than our own (such as the action of antidepressants, or even antibiotics) may give result "A" in an experiment done today and the same result in an experiment done a year from now, because we happened to hit the same phase of the vital cycle in different generations. Or several experiments performed chronologically in consecutive weeks may give slightly different results, because the target under test is changing with its vital cycle.
In this case, the causes of the "decline effect" could be all or some of those cited by the author of the article below, plus this hypothesis raised by Matrix/DNA Theory.
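The hypothesis above can be sketched as a toy simulation. Purely for illustration, assume a measured effect oscillates sinusoidally with the target organism's life cycle: two experiments spaced a whole number of cycles apart then agree, while experiments in consecutive weeks drift. The cycle length, amplitude, and sinusoidal form are all invented for this sketch, not claims about any real organism.

```python
import math

# Toy model: an effect whose size oscillates with the target's life
# cycle. Cycle length, amplitude, and shape are invented for illustration.
CYCLE_DAYS = 73  # hypothetical cycle; 365 days = exactly 5 cycles

def measured_effect(day, base=1.0, amplitude=0.5):
    """Effect size observed on a given calendar day."""
    phase = 2 * math.pi * (day % CYCLE_DAYS) / CYCLE_DAYS
    return base + amplitude * math.sin(phase)

# Experiments exactly one year apart happen to land on the same phase...
same_result = abs(measured_effect(0) - measured_effect(365)) < 1e-9

# ...while experiments in consecutive weeks drift as the cycle advances.
weekly_results = [measured_effect(7 * week) for week in range(4)]
```

Under this assumption, whether two experiments "replicate" depends on how their spacing relates to the hidden cycle length, which is exactly the kind of variable no protocol standardizes.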
Annals of Science
The Truth Wears Off
Is there something wrong with the scientific method?
by Jonah Lehrer December 13, 2010
On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of drugs known as atypical or second-generation antipsychotics, which came on the market in the early nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the subjects’ psychiatric symptoms. As a result, second-generation antipsychotics had become one of the fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly’s Zyprexa was generating more revenue than Prozac. It remains the company’s top-selling drug.
But the data presented at the Brussels meeting made it clear that something strange was happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study showed an effect that was less than half of that documented in the first trials, in the early nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t any better than first-generation antipsychotics, which have been in use since the fifties. “In fact, sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of Illinois at Chicago, told me.
Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.
But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
For many scientists, the effect is especially troubling because of what it exposes about the scientific process. If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved? Which results should we believe? Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to “put nature to the question.” But it appears that nature often gives us different answers.
Jonathan Schooler was a young graduate student at the University of Washington in the nineteen-eighties when he discovered a surprising new fact about language and memory. At the time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it were much less likely to recognize the face when shown it later than those who had simply looked at it. Schooler called the phenomenon “verbal overshadowing.”
The study turned him into an academic star. Since its initial publication, in 1990, it has been cited more than four hundred times. Before long, Schooler had extended the model to a variety of other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and solving difficult creative puzzles. In each instance, asking people to put their perceptions into words led to dramatic decreases in performance.
But while Schooler was publishing these results in highly reputable journals, a secret worry gnawed at him: it was proving difficult to replicate his earlier findings. “I’d often still see an effect, but the effect just wouldn’t be as strong,” he told me. “It was as if verbal overshadowing, my big new idea, was getting weaker.” At first, he assumed that he’d made an error in experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his research. He then concluded that his initial batch of research subjects must have been unusually susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake was trying to replicate my work. He told me doing that was just setting myself up for disappointment.”
Schooler tried to put the problem out of his mind; his colleagues assured him that such things happened all the time. Over the next few years, he found new research questions, got married and had kids. But his replication problem kept on getting worse. His first attempt at replicating the 1990 study, in 1995, resulted in an effect that was thirty per cent smaller. The next year, the size of the effect shrank another thirty per cent. When other labs repeated Schooler’s experiments, they got a similar spread of data, with a distinct downward trend. “This was profoundly frustrating,” he says. “It was as if nature gave me this great result and then tried to take it back.” In private, Schooler began referring to the problem as “cosmic habituation,” by analogy to the decrease in response that occurs when individuals habituate to particular stimuli. “Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like the cosmos was habituating to my ideas. I took it very personally.”
Schooler is now a tenured professor at the University of California at Santa Barbara. He has curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes away from his favorite beach. When he speaks, he tends to get distracted by his own digressions. He might begin with a point about memory, which reminds him of a favorite William James quote, which inspires a long soliloquy on the importance of introspection. Before long, we’re looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of memory.
Although verbal overshadowing remains a widely accepted theory—it’s often invoked in the context of eyewitness testimony, for instance—Schooler is still a little peeved at the cosmos. “I know I should just move on already,” he says. “I really should stop talking about this. But I can’t.” That’s because he is convinced that he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology.
One of the first demonstrations of this mysterious phenomenon came in the early nineteen-thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a special deck of twenty-five cards printed with one of five different symbols: a card was drawn from the deck and the subject was asked to guess the symbol. Most of Rhine’s subjects guessed about twenty per cent of the cards correctly, as you’d expect, but an undergraduate named Adam Linzmayer averaged nearly fifty per cent during his initial sessions, and pulled off several uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are about one in two million. Linzmayer did it three times.
Rhine documented these stunning results in his notebook and prepared several papers for publication. But then, just as he began to believe in the possibility of extrasensory perception, the student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of another several thousand cards, but his success rate was now barely above chance. Rhine was forced to conclude that the student’s “extra-sensory perception ability has gone through a marked decline.” And Linzmayer wasn’t the only subject to experience such a drop-off: in nearly every case in which Rhine and others documented E.S.P. the effect dramatically diminished over time. Rhine called this trend the “decline effect.”
Schooler was fascinated by Rhine’s experimental struggles. Here was a scientist who had repeatedly documented the decline of his data; he seemed to have a talent for finding results that fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a parapsychological phenomenon known as precognition. The experiment itself was straightforward: he flashed a set of images to a subject and asked him or her to identify each one. Most of the time, the response was negative—the images were displayed too quickly to register. Then Schooler randomly selected half of the images to be shown again. What he wanted to know was whether the images that got a second showing were more likely to have been identified the first time around. Could subsequent exposure have somehow influenced the initial results? Could the effect become the cause?
The craziness of the hypothesis was the point: Schooler knows that precognition lacks a scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t believe the amount of precognition we were finding. But then, as we kept on running subjects, the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The scientists eventually tested more than two thousand undergraduates. “In the end, our results looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it disappeared on us.”
The most likely explanation for the decline is an obvious one: regression to the mean. As the experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over time. And yet Schooler has noticed that many of the data sets that end up declining seem statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random are typically quite remote, like one in a million. This means that the decline effect should almost never happen. But it happens all the time! Hell, it’s happened to me multiple times.” And this is why Schooler believes that the decline effect deserves more attention: its ubiquity seems to violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he says. “But I still want to know what happened to my results. Like most scientists, I assumed that it would get easier to document my effect over time. I’d get better at doing the experiments, at zeroing in on the conditions that produce verbal overshadowing. So why did the opposite happen? I’m convinced that we can use the tools of science to figure this out. First, though, we have to admit that we’ve got a problem.”
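Regression to the mean, the explanation the article names first, is easy to demonstrate in a few lines. In this sketch (all numbers invented), a hundred labs each take one noisy measurement of a small true effect; the literature celebrates the most extreme first result, and the follow-up replications drift back toward the truth.

```python
import random

# Regression to the mean: the most extreme of many noisy first results
# is not representative, so replications shrink toward the true effect.
# TRUE_EFFECT and NOISE_SD are invented for illustration.
random.seed(42)
TRUE_EFFECT = 0.1
NOISE_SD = 1.0

def run_experiment():
    """One noisy estimate of the true effect."""
    return TRUE_EFFECT + random.gauss(0.0, NOISE_SD)

first_round = [run_experiment() for _ in range(100)]  # 100 labs try once
headline = max(first_round)                           # the "exciting" finding

replications = [run_experiment() for _ in range(50)]  # follow-up studies
mean_replication = sum(replications) / len(replications)
```

The headline result is large only because it was selected for being large; the replication average is an unbiased estimate, which is why it looks like a "decline."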
In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.
In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies—females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.
Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.
And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for—Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis—there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
What happened? Leigh Simmons, a biologist at the University of Western Australia, suggested one explanation when he told me about his initial enthusiasm for the theory: “I was really excited by fluctuating asymmetry. The early studies made the effect look very robust.” He decided to conduct a few experiments of his own, investigating symmetry in male horned beetles. “Unfortunately, I couldn’t find the effect,” he said. “But the worst part was that when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” For Simmons, the steep rise and slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those intellectual fads that both guide and constrain research: after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.
Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for. A “significant” result is defined as any data point that would be produced by chance less than five per cent of the time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher, who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
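Sterling's observation can be reproduced with a simulated journal filter. In this sketch (sample size and noise invented), two thousand studies of a treatment with no real effect are run, and only results clearing Fisher's conventional two-sided 5% threshold get "published": about five per cent of studies pass, and every one of them reports a sizable spurious effect.

```python
import random

# Publication-bias filter: under a true null, only false positives
# clear the significance bar, so the published record shows an
# inflated effect that later work will fail to replicate.
random.seed(0)
N_SUBJECTS = 30
SD = 1.0
Z_CRIT = 1.96  # two-sided 5% cutoff

def observed_mean():
    """Mean outcome of one study under the null (true effect = 0)."""
    samples = [random.gauss(0.0, SD) for _ in range(N_SUBJECTS)]
    return sum(samples) / N_SUBJECTS

studies = [observed_mean() for _ in range(2000)]
std_error = SD / N_SUBJECTS ** 0.5

published = [m for m in studies if abs(m) / std_error > Z_CRIT]
share_published = len(published) / len(studies)       # roughly 5%
mean_published_size = sum(abs(m) for m in published) / len(published)
```

Every published effect here exceeds 1.96 standard errors by construction, even though the true effect is exactly zero; that gap is the seed of a later decline.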
While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts. Richard Palmer, a biologist at the University of Alberta, who has studied the problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the selective reporting of results—the data that scientists choose to document in the first place. Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a large number of studies have been done on a single subject, the data should follow a pattern: studies with a large sample size should all cluster around a common value—the true result—whereas those with a smaller sample size should exhibit a random scattering, since they’re subject to greater sampling error. This pattern gives the graph its name, since the distribution resembles a funnel.
The funnel graph visually captures the distortions of selective reporting. For instance, after Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive results. Palmer has since documented a similar problem in several other contested subject areas. “Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. “As a researcher, you’re always aware that there might be some nonrandom patterns, but I had no idea how widespread it is.” In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
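The funnel logic Palmer relies on can be shown numerically rather than graphically. In this sketch (true effect, sample sizes, and study counts all invented), large studies cluster tightly around the true value while small studies fan out; censoring the negative small-study tail, as selective reporting does, then biases the apparent effect upward.

```python
import random

# Funnel-plot logic in numbers: precision grows with sample size, and
# selectively reporting positive small studies inflates the literature.
random.seed(1)
TRUE_EFFECT = 0.2

def study(n):
    """Observed mean effect from one study with n subjects."""
    return sum(random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)) / n

small_studies = [study(10) for _ in range(300)]    # wide scatter
large_studies = [study(400) for _ in range(300)]   # narrow cluster

def spread(xs):
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5

# Selective reporting: suppose only positive small-study results appear.
reported_small = [x for x in small_studies if x > 0]
reported_bias = sum(reported_small) / len(reported_small) - TRUE_EFFECT
```

A skew like `reported_bias` concentrated in the small-sample end of the funnel is precisely the signature Palmer found in the fluctuating-asymmetry literature.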
Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the problem seems to be one of subtle omissions and unconscious misperceptions, as researchers struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning” process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts of perception biases. That’s not a cynical statement. That’s just the way human beings work.”
One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. While acupuncture is widely accepted as a medical treatment in various Asian countries, its use is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that acupuncture was an effective treatment. During the same period, there were ninety-four clinical trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.
John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.” In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials—the “gold standard” of medical evidence—they tended to have a significant impact on clinical practice, and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, forty-one per cent had either been directly contradicted or had their effect sizes significantly downgraded.
The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control the differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when he looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given that most of them were done badly, I wouldn’t hold my breath.”
According to Ioannidis, the main problem is that too many researchers engage in what he calls “significance chasing,” or finding ways to interpret the data so that it passes the statistical test of significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately provocative title: “Why Most Published Research Findings Are False.”
( The text below serves as a warning for me and my theory: I keep finding and seeing the Matrix in everything; but am I not selecting and filtering the data, ignoring the cases where the Matrix does not appear?)
The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,” Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career depends upon it. And that’s why, even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
That’s why Schooler argues that scientists need to become more rigorous about data collection before they publish. “We’re wasting too much time chasing after bad studies and underpowered experiments,” he says. The current “obsession” with replicability distracts from the real problem, which is faulty design. He notes that nobody even tries to replicate most science papers—there are simply too many. (According to Nature, a third of all studies never even get cited, let alone repeated.) “I’ve learned the hard way to be exceedingly careful,” Schooler says. “Every researcher should have to spell out, in advance, how many subjects they’re going to use, and what exactly they’re testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments.”
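Schooler's demand that researchers "spell out, in advance, how many subjects they're going to use" is, in effect, a power analysis. A minimal sketch, using the standard normal-approximation formula for a two-group comparison; the 5% significance and 80% power conventions are the usual statistical defaults, not figures from the article.

```python
import math

# Per-group sample size for a two-sample comparison of means,
# normal approximation: n = 2 * ((z_alpha + z_beta) / d) ** 2,
# where d is the standardized effect size (Cohen's d).
Z_ALPHA = 1.96    # two-sided 5% significance
Z_BETA = 0.8416   # 80% power

def subjects_per_group(effect_size_d):
    """Subjects needed per group to detect a standardized effect d."""
    return math.ceil(2 * ((Z_ALPHA + Z_BETA) / effect_size_d) ** 2)
```

The point of pre-committing to such a number is that underpowered studies mostly detect effects only when noise happens to exaggerate them, which is another route to the decline pattern.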
In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
( But there is another way to test the experiments: testing them against other approaches, such as that of Matrix/DNA Theory.)
Although such reforms would mitigate the dangers of publication bias and selective reporting, they still wouldn’t erase the decline effect. This is largely because scientific research will always be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little research has been done on the experimental dangers of chance and happenstance, the research that exists isn’t encouraging.
In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science University, conducted an experiment that showed how unknowable chance events can skew tests of replicability. He performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.
The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.
The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand. The problem, of course, is that such dramatic findings are also the most likely to get published in prestigious journals, since the data are both statistically significant and entirely unexpected. Grants get written, follow-up studies are conducted. The end result is a scientific accident that can take years to unravel.
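The publication dynamic described above, where dramatic and statistically significant findings get written up while unremarkable ones are shelved, can be sketched in a few lines of code. The simulation below is a hedged toy model, not a reconstruction of any study mentioned here: it assumes a small true effect, runs many underpowered two-group experiments, and "publishes" only those that clear a conventional significance threshold. The published effects come out far larger than the truth, which is exactly the kind of overestimate that later, better-powered replications would appear to erode.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.3   # assumed true group difference, in standard-deviation units
N_PER_GROUP = 10    # a deliberately small, underpowered sample
EXPERIMENTS = 20000

def run_experiment():
    """Simulate one small two-group study; return the estimated effect."""
    treatment = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    return statistics.mean(treatment) - statistics.mean(control)

# Standard error of a difference of two means of size n is sqrt(2/n).
se = (2 / N_PER_GROUP) ** 0.5
threshold = 1.96 * se  # two-sided p < 0.05 cutoff (normal approximation)

estimates = [run_experiment() for _ in range(EXPERIMENTS)]
# Only "significant" positive results get written up for publication.
published = [e for e in estimates if e > threshold]

print(f"true effect:           {TRUE_EFFECT:.2f}")
print(f"mean of all estimates: {statistics.mean(estimates):.2f}")
print(f"mean published effect: {statistics.mean(published):.2f}")
```

Under these assumptions the average published effect lands at several times the true effect; the subsequent "decline" is just that gap closing once replications stop conditioning on statistical significance.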
This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian mechanics in an afternoon—the process turns out to be much messier than that. Many scientific theories continue to be considered true even after failing numerous experimental tests. Verbal overshadowing might exhibit the decline effect, but it remains extensively relied upon within the field. The same holds for any number of phenomena, from the disappearing benefits of second-generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test, physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite these findings, second-generation antipsychotics are still widely prescribed, and our model of the neutron hasn’t changed. The law of gravity remains the same.
Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe. ♦
More Thoughts on the Decline Effect
In “The Truth Wears Off,” I wanted to explore the human side of the scientific enterprise. My focus was on a troubling phenomenon often referred to as the “decline effect,” which is the tendency of many exciting scientific results to fade over time. This empirical hiccup afflicts fields from pharmacology to evolutionary biology to social psychology. There is no simple explanation for the decline effect, but the article explores several possibilities, from the publication biases of peer-reviewed journals to the “selective reporting” of scientists who sift through data.
This week, the magazine published four very thoughtful letters in response to the piece. The first letter, like many of the e-mails, tweets, and comments I’ve received directly, argues that the decline effect is ultimately a minor worry, since “in the long run, science prevails over human bias.” The letter, from Howard Stuart, cites the famous 1909 oil-drop experiment performed by Robert Millikan and Harvey Fletcher, which sought to measure the charge of the electron. It’s a fascinating experimental tale, as subsequent measurements gradually corrected the data, steadily nudging the charge upwards. In his 1974 commencement address at Caltech, Richard Feynman described why the initial measurement was off, and why it took so long to fix:
Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It’s a little bit off, because he had the incorrect value for the viscosity of air. It’s interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan’s, and the next one’s a little bit bigger than that, and the next one’s a little bit bigger than that, until finally they settle down to a number which is higher.
Why didn’t they discover that the new number was higher right away? It’s a thing that scientists are ashamed of—this history—because it’s apparent that people did things like this: When they got a number that was too high above Millikan’s, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan’s value they didn’t look so hard. And so they eliminated the numbers that were too far off, and did other things like that.
That’s a pretty perfect example of selective reporting in science. One optimistic takeaway from the oil-drop experiment is that our errors get corrected, and that the truth will always win out. Like Mr. Stuart, this was the moral Feynman preferred, as he warned the Caltech undergrads to be rigorous scientists, because their lack of rigor would be quickly exposed by the scientific process. “Other experimenters will repeat your experiment and find out whether you were wrong or right,” Feynman said. “Nature’s phenomena will agree or they’ll disagree with your theory.”
But that’s not always the case. For one thing, a third of scientific papers never get cited, let alone repeated, which means that many errors are never exposed. But even those theories that do get replicated are shadowed by uncertainty. After all, one of the more disturbing aspects of the decline effect is that many results we now believe to be false have been replicated numerous times. To take but one example I cited in the article: After fluctuating asymmetry, a widely publicized theory in evolutionary biology, was proposed in the early nineteen-nineties, nine of the first ten independent tests confirmed the theory. In fact, it took several years before an overwhelming majority of published papers began rejecting it. This raises the obvious problem: If false results can get replicated, then how do we demarcate science from pseudoscience? And how can we be sure that anything—even a multiply confirmed finding—is true?
These questions have no easy answers. However, I think the decline effect is an important reminder that we shouldn’t simply reassure ourselves with platitudes about the rigors of replication or the inevitable corrections of peer review. Although we often pretend that experiments settle the truth for us—that we are mere passive observers, dutifully recording the facts—the reality of science is a lot messier. It is an intensely human process, shaped by all of our usual talents, tendencies, and flaws.
Many letters chastised me for critiquing science in such a public venue. Here’s an example, from Dr. Robert Johnson of Wayne State Medical School:
Creationism and skepticism of climate change are popularly-held opinions; Lehrer’s closing words play into the hands of those who want to deny evolution, global warming, and other realities. I fear that those who wish to persuade Americans that science is just one more pressure group, and that the scientific method is a matter of opinion, will be eager to use his conclusion to advance their cause.
This was a concern I wrestled with while writing the piece. One of the sad ironies of scientific denialism is that we tend to be skeptical of precisely the wrong kind of scientific claims. Natural selection and climate change have been verified in thousands of different ways by thousands of different scientists working in many different fields. (This doesn’t mean, of course, that such theories won’t change or get modified—the strength of science is that nothing is settled.) Instead of wasting public debate on solid theories, I wish we’d spend more time considering the value of second-generation antipsychotics or the verity of the latest gene-association study.
Nevertheless, I think the institutions and mechanisms of the scientific process demand investigation, even if the inside view isn’t flattering. We know science works. But can it work better? There is too much at stake to not ask that question. Furthermore, the public funds a vast majority of basic research—it deserves to know about any problems.
And this brings me to another category of letters, which proposed new ways of minimizing the decline effect. Some readers suggested reducing the acceptable level of p-values or starting a Journal of Negative Results. Andrew Gelman, a professor of statistics at Columbia University, proposed the use of “retrospective power analyses,” in which experimenters are forced to calculate their effect size using “real prior information,” and not just the data distilled from their small sample size.
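Gelman's point about power is easier to appreciate with a quick calculation. The sketch below uses the textbook normal-approximation formula for the power of a two-sample comparison of means; the 0.3-standard-deviation effect size and the sample sizes are illustrative assumptions, not figures drawn from any study discussed here.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power(effect_size, n_per_group):
    """Approximate power of a two-sided, 5%-level two-sample test
    for a difference of means (normal approximation)."""
    z_crit = 1.96  # two-sided 5% critical value
    # Noncentrality: expected z-statistic given the true effect.
    delta = effect_size * math.sqrt(n_per_group / 2.0)
    return (1.0 - normal_cdf(z_crit - delta)) + normal_cdf(-z_crit - delta)

# Illustrative numbers: a modest true effect of 0.3 SD.
for n in (10, 50, 200):
    print(f"n = {n:3d} per group -> power = {power(0.3, n):.2f}")
```

At ten subjects per group, a study of a modest effect has roughly a one-in-ten chance of detecting it; a power analysis informed by real prior information would flag that design as inadequate before a single mouse is run.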
I also received an intriguing e-mail from a former academic scientist now working for a large biotech company:
When I worked in a university lab, we’d find all sorts of ways to get a significant result. We’d adjust the sample size after the fact, perhaps because some of the mice were outliers or maybe they were handled incorrectly, etc. This wasn’t considered misconduct. It was just the way things were done. Of course, once these animals were thrown out [of the data] the effect of the intervention was publishable.
He goes on to say that standards are typically more rigorous in his corporate lab:
Here we have to be explicit, in advance, of how many mice we are going to use, and what effect we expect to find. We can’t fudge the numbers after the experiment has been done… That’s because companies don’t want to begin an expensive clinical trial based on basic research that is fundamentally flawed or just a product of randomness.
Of course, once that basic research enters clinical trials, there’s plenty of evidence that the massive financial incentives often start warping the data, leading to the suppression of negative results and the misinterpretation of positive ones. (This helps explain, at least in part, why such a large percentage of randomized clinical trials cannot be replicated.) ClinicalTrials.gov has tried to fix this problem by mandating the public registration of every clinical trial in advance.
The larger point, though, is that there is nothing inherently mysterious about why the scientific process occasionally fails or the decline effect occurs. As Jonathan Schooler, one of the scientists featured in the article, told me, “I’m convinced that we can use the tools of science to figure this [the decline effect] out. First, though, we have to admit that we’ve got a problem.”
Third-party comments:
Though Jonah Lehrer shows that the truth is slippery, he also demonstrates that, eventually, the truth comes out (“The Truth Wears Off,” December 13th). Robert Millikan’s famous oil-drop experiment, designed to measure the charge of an electron, is an example. Millikan’s value for the electron charge was too small. Over time, other scientists replicated his work, and the measured value gradually increased to the one we accept today. The value increased only slowly because scientists were easily biased toward rejecting results that deviated too far from expectations, based on Millikan’s number. It took many experiments, but the process showed that, in the long run, science prevails over human bias.
Glen Ridge, N.J.
Lehrer concludes that the “decline effect” is “troubling because it reminds us how difficult it is to prove anything.” But scientific hypotheses, no matter how firmly established, are never “proved” right. They are inherently provisional. Scientists know that the door is always open for new evidence and stronger theories. Newton’s law of gravitation was supported by countless empirical observations. It held the stage, unquestioned, for more than two hundred years. Then Einstein’s general relativity showed that Newton’s law, while a good approximation, was wrong. In this historical context, it’s no surprise that some recently formulated biological theories will not stand the test of time.
New York City
One development that Lehrer does not discuss, and that we believe has the potential to exacerbate the “decline effect,” is the trend toward the publication of brief reports in mainstream psychology journals. Traditionally, an article in experimental psychology consisted of at least three experiments, the first demonstrating an important finding, and the others replicating it and establishing its boundary conditions. Today, there is a move away from such articles and toward one-experiment papers reporting a single unexpected result. Short reports demand less time of authors, reviewers, and readers, but an undesirable consequence is that findings are less likely to be replicated even by the investigators who first publish them. This increases the likelihood that a result initially received as exciting and novel will turn out to be nothing but a fluke.
Editor, Journal of Experimental Psychology: General
My comment posted on the NewYorker.com article
Maybe the decline effect is real and not only the product of human mistakes. There is a very tenuous hypothesis that can be raised from the Matrix/DNA Theory viewpoint: that reality is composed of several layers of fractal time. An example: the human species takes 9 months to give birth, but the process is the same for a mosquito, which takes, say, 3 days. The process is fractal, meaning it is always repeated; the only variable is time. If so, all phenomena that occur at the microscopic level must be constantly changing in relation to our level of perception and our own reality. Things are better explained by the diagram/software of the Matrix at my website: http://theuniversalmatrix.com . While we take 10 years to go from Function 2 (where we are children) to Function 3 (where we are teenagers), our cells, which are under the same life cycle, pass through these two functions thousands of times. Thus, experiments based on actions and reactions at levels of magnitude smaller than our own (such as the action of anti-depressants, or antibiotics) may give result “A” in an experiment done today and the same result in an experiment done a year later, because we happened to hit the same stage of the life cycle in different generations. Or several experiments performed in consecutive weeks may give slightly different results, because the target under test is changing quickly as it moves through its life cycle. In this case, the causes of the decline effect could be any or all of those mentioned by the author of this article, plus this further one hypothesized by the Matrix/DNA Theory.
Mr. Lehrer, when I read your original article, it seemed to me that you had taken a good story about inadequate experiment designs–particularly poor controls–and weak statistical analysis and buried it in sensationalist language and muddled thinking. The truth does not, as your headline claimed, wear off. Nor is there a real “decline effect.” In another part of Feynman’s “Cargo Cult” lecture that you quoted from, he talks about the difficulty of creating controlled conditions, using as his example experiments on rats. In your article, you allude to this problem as contributing to the “decline effect,” but rather than make a clear point that the so-called “decline effect” can be attributed to poor controls and, generally, poor experiment design, you instead “explain” the problem through sloppy thinking and sensationalist language. The tools to overcome the illusion of the “decline effect” have been a part of the scientist’s toolkit for decades: careful experiment design; rigorous statistical analysis; replication by third parties. Above all, scientists must be carefully and clearly honest with themselves. It is not always easy to be so honest as Feynman admonished (at the least, you just don’t think of everything), and we can be sure that scientists do not all have the solid grounding in experiment design and data analysis to always achieve accurate and reproducible results. Your article, while touching on each of these factors, failed to highlight them clearly, instead focusing on the more sensational (or perhaps “outrageous”) explanation that “the truth wears off.”
Posted 1/3/2011, 10:16:39am by DeliciousLemur
(The article below points to several important links worth following up.)
Is the scientific method not what it used to be?
10/03/2011 11:21
Welcome to Questões da Ciência. To inaugurate this space, the proposal is to discuss a controversial text that points out vulnerabilities in the scientific method and sparked considerable debate in the American blogosphere. The article “The Truth Wears Off” was published in The New Yorker by journalist Jonah Lehrer, known to some Brazilian readers as the author of the books “Proust foi um neurocientista” (Proust Was a Neuroscientist) and “O momento decisivo” (How We Decide).
The article is available online (in English). Readers who make it to the end of its nearly 5,000 words will be rewarded with interesting reflections on how researchers reach their conclusions and on how the scientific consensuses that many take for truth are built.
In his text, Lehrer presents studies whose results thrilled scientists at first but could not be successfully reproduced later. From psychology to genetics, by way of physics and zoology, the cases he gathers span several fields of science.
The most emblematic example is that of a set of psychiatric drugs approved after promising results in several rounds of clinical trials which, years later, saw their efficacy markedly reduced, as if they had suddenly stopped working.
Cases like these are worrying because they contradict one of the pillars of good science: results obtained by one team of researchers must be reproducible in other laboratories. Is there, then, something wrong with the scientific method, as the subtitle of Lehrer’s article suggests?
The decline effect
What the examples Lehrer cites have in common is what he calls the “decline effect,” which could be defined as the tendency of some scientific claims to receive less and less support from experimental results as time passes.
The author raises a few hypotheses to explain this effect. In some cases it could be credited to a statistical distortion: if an inadequate sample was chosen in the initial tests, the encouraging results will not repeat as the study is replicated on a larger scale. Also contributing to the confusion is scientists’ tendency to publish only satisfactory results, consigning failed experiments to oblivion.
Not everyone was satisfied with the explanations Lehrer put forward. The publication of his article prompted numerous criticisms on science blogs. The author was accused of ignoring basic notions of statistics and of giving too much weight to a phenomenon that scientists deal with routinely, as suggested by Orac, the pseudonym of the physician-blogger behind Respectful Insolence.
“The decline effect is something any physician who works in clinical research knows in practice, even if he may not refer to it in those terms,” the blogger writes in a long post refuting the arguments of Lehrer’s article.
Steven Novella, of the blog Neurologica, in turn accused the author of behaving like science denialists. “Lehrer refers to aspects of science that skeptics have been pointing out for years (…) and arrives at the nihilistic conclusion that it is hard to prove anything and that, ultimately, ‘we still have to choose what to believe,’” he argues.
Many of the reactions to the article were rather extreme. Perhaps it is more appropriate to adopt the moderate tone of blogger Matthew Nisbet, of Age of Engagement, who sees in the text a great opportunity to show lay audiences how science works and how tenuous and transitory scientific truths are.
“Lehrer’s article is a notable example of science journalism in a tradition that explains complex realities about the social nature of scientific practice and about how scientific discoveries are reported and perceived by the public,” he argues.
It is a good discussion. We will return to it soon.