(Empirical) science can never know for certain that something is true. Scientific knowledge is about there being a high probability that something is true.
Some years ago I had a conversation with a layman about flying saucers. Because I am scientific, I know all about flying saucers! I said, "I don't think there are flying saucers." So my antagonist said, "Is it impossible that there are flying saucers? Can you prove that it's impossible?"
"No," I said, "I can't prove it's impossible. It's just very unlikely." At that he said, "You are very unscientific. If you can't prove it's impossible, then how can you say that it's unlikely?" But that is the way that is scientific. It is scientific only to say what is more likely and what is less likely, and not to be proving all the time the possible and impossible.
To define what I mean, I might have said to him, "Listen, I mean that from my knowledge of the world that I see around me, I think that it is much more likely that the reports of flying saucers are the results of the known irrational characteristics of terrestrial intelligence than of the unknown rational efforts of extra-terrestrial intelligence."
-- Richard Feynman
Note: we are not asking about the causal relationship between evaluations and learning.
Motivation: we want to know if we can use evaluations to predict who teaches well.
“…the global ratings of teacher effectiveness ….were most highly related to mean exam performance.”
What is going on here?
A meta-analysis is a statistical analysis that combines the results of multiple scientific studies.
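One common way to combine study results is inverse-variance (fixed-effect) pooling: each study's estimate is weighted by the inverse of its squared standard error, so more precise studies count for more. The sketch below is only an illustration of that idea, not the method used in any study cited here; the function name `pool_fixed_effect` and all numbers are made up.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical effect estimates and standard errors from three studies
# (made-up numbers, for illustration only).
estimates = [0.30, 0.10, 0.20]
std_errors = [0.20, 0.10, 0.05]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the appeal of pooling. The pooled estimate is only as good as the studies that go into it.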
More observations $\Rightarrow$ smaller effects.
For small samples, many more positive results than negative results.
Unbiasedness: the estimator's expected value equals the true value (it is correct on average).
Consistency: the estimator converges to the true value as the sample size increases to infinity.
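The two properties are different, and a small simulation can make this concrete. In the sketch below (all names and parameter values are assumptions chosen for illustration), the "first observation" estimator is unbiased but not consistent, while the sample mean is both: its spread around the true value shrinks as the sample grows.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN = 5.0  # hypothetical population mean, chosen for the illustration

def draw_sample(n):
    """Draw n observations from a normal population with sd = 2."""
    return [random.gauss(TRUE_MEAN, 2.0) for _ in range(n)]

def first_observation(sample):
    # Unbiased, but not consistent: it ignores all additional data.
    return sample[0]

def sample_mean(sample):
    # Unbiased and consistent.
    return statistics.fmean(sample)

def summarize(estimator, n, replications=500):
    """Average and spread of the estimator across repeated samples of size n."""
    estimates = [estimator(draw_sample(n)) for _ in range(replications)]
    return statistics.fmean(estimates), statistics.pstdev(estimates)

for n in (10, 400):
    for estimator in (first_observation, sample_mean):
        avg, spread = summarize(estimator, n)
        print(f"n={n:4d} {estimator.__name__:18s} average={avg:.2f} spread={spread:.2f}")
```

Both estimators average out close to the true mean at every sample size, but only the spread of the sample mean falls as n grows.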
Some results are more likely to get published than others.
Guess which ones?!
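To see what selective publication can do, here is a minimal simulation sketch. It assumes the true effect is exactly zero and that a study is "published" only if it finds a positive effect that is significant at the 5% level (z > 1.96). That publication rule, the sample sizes, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(1)

def run_study(n, true_effect=0.0):
    """Return (estimate, z-statistic) for one simulated study with known sd = 1."""
    sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
    estimate = statistics.fmean(sample)
    z = estimate / (1.0 / math.sqrt(n))
    return estimate, z

def published_estimates(n, studies=2000):
    """Keep only the studies that clear the assumed publication filter."""
    results = [run_study(n) for _ in range(studies)]
    return [est for est, z in results if z > 1.96]

for n in (20, 200):
    published = published_estimates(n)
    mean_effect = statistics.fmean(published)
    print(f"n={n}: {len(published)} published, mean published effect = {mean_effect:.2f}")
```

Even though the true effect is zero, every published estimate is positive, and the published effects are larger when the studies are small. This matches the pattern of smaller effects in studies with more observations.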
If you test multiple hypotheses, some of them will be statistically significant even if there is no true relationship.
Why? Chance!
There are statistical ways to adjust your p-values for multiple hypothesis testing.
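A short sketch of the arithmetic, assuming independent tests where no true relationship exists: the chance of at least one false positive across $m$ tests at level $\alpha$ is $1 - (1 - \alpha)^m$. The Bonferroni correction is one simple (and conservative) adjustment: multiply each p-value by the number of tests. The function names and p-values below are made up for illustration.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

for m in (1, 5, 20):
    rate = familywise_error_rate(m)
    print(f"{m:2d} tests: P(at least one false positive) = {rate:.2f}")

# Hypothetical p-values from five tests (made-up numbers).
p_values = [0.004, 0.03, 0.04, 0.20, 0.60]
adjusted = bonferroni_adjust(p_values)
print([round(p, 3) for p in adjusted])
```

With 20 tests the chance of at least one spurious "significant" result is well over one half. After the adjustment, only the smallest of the five example p-values stays below 0.05.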
[Source: Open Science Collaboration. (2015, Science). Estimating the reproducibility of psychological science.]
(These are focused on fields where experiments are appropriate.)
Related book: https://www.ucpress.edu/book/9780520296954/transparent-and-reproducible-social-science-research
Science has its flaws but is still the best game in town when it comes to finding the truth.
There are mechanisms (peer review, meta-analysis, replications) that help us get closer to the truth.
I am usually careful when interpreting the results of a single study.
I am more convinced if a discipline has achieved consensus about a specific result.
For single studies outside my discipline (e.g. that I read about in the news), I am less able to evaluate their quality and probably more sceptical than the average person.
Science has its flaws but is still the best game in town when it comes to finding the truth. (Same)
I am usually careful when interpreting the results of a single study. (Same)
I am more convinced if a discipline has achieved consensus about a specific result. (Same. Replication is an important aspect of this.)
I put a large emphasis on how the result was arrived at. (Is it just correlation? Is the sample large?) As a result I am much more sceptical of the use of meta-analysis.
Theory remains important to understand how to interpret the evidence and whether the results of a given study apply to the problem at hand. (If a study finds that deworming works in Kenya, will it work in Bangladesh?)
In economics (and many other sciences), we typically focus on one very narrow question at a time.
For example, we are asking:
We are not:
In disputes, we aim to represent the opposing views as accurately and positively as possible. This is called steel manning.
(Rob: use of models/theory is partly about making sure we cannot disagree on the 'views')
We try not to build and attack a straw man – a misrepresentation of someone's position or argument that is easy to defeat.
We try to stay focused on the discussion at hand and avoid attacks based on irrelevant factors such as,
Such attacks are called ad hominem (Latin for “to the man”).
As a general rule: we try to assume that the person we’re arguing with is intelligent and has good intentions.
Discussing emotional topics can be difficult. When doing this, I try to be humble and calm.
It is not a sign of defeat to change my mind.
Sometimes both points might be true at the same time or in different situations.
I hold many beliefs.