Saw your most recent thread on twitter. Does your reasoning for why have to do with the twin method assuming that instead of phenotypic r^2 scaling linearly with kinship, phenotypic r^1 does so?
I don't think there's any relationship between the phenotypic null hypothesis and the thing I wrote on twitter about the linear vs quadratic scaling. But I might be missing some subtle clever connection.
Oh I just wrote this here cuz I'm banned on twitter atm. This thought didn't have anything to do with the thing where genes->phenotype1->phenotype 2 disqualifies phenotype 2 from having direct genetic effects in the way that phenotype 1 does.
My idea was just that, if taking 2 length=n vectors with equal variances which correlate at 0.0 (vectors A & B), then if you add vector A + vector A + vector B to create vector C, then the r^2 for the B<->C association will be .33 rather than .33^2=.1089. The twin method observes that '0.5' kinship -> '1.0' kinship results in a 0.x phenotypic correlation -> 0.y phenotypic correlation, meaning that h^2 = 0.y-0.x / 1.0-0.5. But if it's the squared correlation which scales linearly with kinship rather than the unsquared correlation, then h^2 = (0.y)^2-(0.x)^2 / 1.0-0.5.
This would fit with the thread since 0.y - 0.x < (0.y)^2 - (0.x)^2:
(0.y)^2 - (0.x)^2 would be the true genetic effect while 0.y - 0.x would be the 'heritability' which doesn't linearly respond to said effect.
I was wanting to know if this sort of thing was why you thought H^2 and C^2 don't properly scale with genetic effects.
"My idea was just that, if taking 2 length=n vectors with equal variances which correlate at 0.0 (vectors A & B), then if you add vector A + vector A + vector B to create vector C, then the r^2 for the B<->C association will be .33 rather than .33^2=.1089."
If you set C = A + A + B, then that's the same as setting C = 2A + B. But scaling A by 2 is equivalent to scaling A's variance by a factor of 4 (since variance is quadratic and 2^2 = 4). So B would only make up 1/5th of the variance, and therefore the r^2 would be 0.2 (and the r would be sqrt(0.2)=0.45).
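This is easy to check by simulation. Here's a minimal sketch in Python with NumPy (not the R code under discussion); the seed and sample size are arbitrary, and A and B are drawn as independent standard normals so that they have equal variances and correlate at roughly 0.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
n = 200_000                      # arbitrary sample size

a = rng.standard_normal(n)       # vector A
b = rng.standard_normal(n)       # vector B, independent of A
c = a + a + b                    # same as 2*A + B

# Doubling A multiplies its variance by 2^2 = 4, not by 2.
var_ratio = np.var(2 * a) / np.var(a)

# B contributes 1 of the 4 + 1 = 5 units of variance in C.
r_bc = np.corrcoef(b, c)[0, 1]
r2_bc = r_bc ** 2

print(f"var(2A)/var(A) = {var_ratio:.3f}")
print(f"r(B, C) = {r_bc:.3f}, r^2(B, C) = {r2_bc:.3f}")
```

With a sample this large, the r^2 should land close to 0.2 and the r close to 0.45, up to sampling noise.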
"The twin method observes that '0.5' kinship -> '1.0' kinship results in a 0.x phenotypic correlation -> 0.y phenotypic correlation, meaning that h^2 = 0.y-0.x / 1.0-0.5. But if it's the squared correlation which scales linearly with kinship rather than the unsquared correlation, then h^2 = (0.y)^2-(0.x)^2 / 1.0-0.5.
This would fit with the thread since 0.y - 0.x < (0.y)^2 - (0.x)^2:
(0.y)^2 - (0.x)^2 would be the true genetic effect while 0.y - 0.x would be the 'heritability' which doesn't linearly respond to said effect."
The basic issue with family-based genetic studies is that you "pass through" the genetic effect twice.
That is, why do genes lead to a correlation between e.g. identical twins? Because of the causal structure: twin 1 phenotype <- shared genotype -> twin 2 phenotype.
Each of the arrows carries a causal weight of A, and since you've got 2 arrows you have to pass through to get from one to the other, that yields a total correlation of A^2.
Because traditionally it has not been possible to observe genotypes, pretty much everything involving genetics has gotten a lot of A^2s, rather than raw As. For instance with the breeders' equation Δ = A^2*s, if you phenotypically select with a strength of s, then that (through passing through the arrow once) will give you the individuals with A*s higher genetic scores, and those genes will (through passing through the arrow once more) yield A^2*s higher phenotype in the offspring. But if you instead selected on the genotype directly, instead of just on the phenotype, then you could improve your effect size to Δ = A*s, because you don't have to go all the way through the proxy of the phenotype.
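To make the two passes through the arrow concrete, here's a rough simulation sketch in Python with NumPy. Everything in it is an assumption for illustration: the path weight A = 0.6, keeping the top 20%, standardized genotype and phenotype, and the simplification that offspring get the mean genetic score of the selected parents with nothing else transmitted.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n = 400_000                      # arbitrary population size
A = 0.6                          # hypothetical genotype -> phenotype path weight
top = 0.2                        # hypothetical: keep the top 20%

g = rng.standard_normal(n)                  # standardized genetic score
e = rng.standard_normal(n)                  # independent environment
p = A * g + np.sqrt(1 - A**2) * e           # phenotype with unit variance

# Selecting on the phenotype: pass through the arrow twice.
keep_p = p >= np.quantile(p, 1 - top)
s = p[keep_p].mean()                        # selection strength s
g_after_p = g[keep_p].mean()                # roughly A * s
delta_pheno_sel = A * g_after_p             # roughly A^2 * s

# Selecting on the genotype directly: pass through the arrow once.
keep_g = g >= np.quantile(g, 1 - top)
g_after_g = g[keep_g].mean()                # roughly s (same fraction kept)
delta_geno_sel = A * g_after_g              # roughly A * s

print(f"s = {s:.3f}")
print(f"phenotypic selection: {delta_pheno_sel:.3f} vs A^2*s = {A**2 * s:.3f}")
print(f"genotypic selection:  {delta_geno_sel:.3f} vs A*s   = {A * s:.3f}")
```

Under these assumptions the response to phenotypic selection tracks A^2*s and the response to genotypic selection tracks A*s, up to sampling noise.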
"But if it's the squared correlation which scales linearly with kinship rather than the unsquared correlation, then h^2 = (0.y)^2-(0.x)^2 / 1.0-0.5."
(Here 0.y and 0.x refer to rMZ and rDZ respectively, right?)
I don't think this equation is likely meaningful. The reason we use h^2 = 2 (rMZ - rDZ) is that a path-tracing argument shows rMZ = A^2 + C^2 and rDZ = 0.5 A^2 + C^2. If you square these instead, you get rMZ^2 = A^4 + C^4 + 2 A^2 C^2 and rDZ^2 = 0.25 A^4 + C^4 + A^2 C^2, so rMZ^2 - rDZ^2 = 0.75 A^4 + A^2 C^2, which is uninterpretable. The correct equation is A^2 = 2 (rMZ - rDZ) (or equivalently, as you write, h^2 = (0.y - 0.x) / (1.0 - 0.5)).
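Here's a small numeric sketch of that in Python. The values A^2 = 0.5 and C^2 = 0.2 are made up for illustration, and the function name is just something I picked.

```python
def twin_correlations(a2, c2):
    """Path-tracing predictions for MZ and DZ twin correlations."""
    r_mz = a2 + c2
    r_dz = 0.5 * a2 + c2
    return r_mz, r_dz

a2, c2 = 0.5, 0.2                 # hypothetical A^2 and C^2
r_mz, r_dz = twin_correlations(a2, c2)

# The usual formula recovers A^2.
falconer = 2 * (r_mz - r_dz)

# The squared-correlation version gives a mix of A^4 and A^2 * C^2 terms.
squared_version = (r_mz**2 - r_dz**2) / (1.0 - 0.5)
mixed_terms = 2 * (0.75 * a2**2 + a2 * c2)

print(f"rMZ = {r_mz:.3f}, rDZ = {r_dz:.3f}")
print(f"2 (rMZ - rDZ) = {falconer:.3f} (true A^2 = {a2})")
print(f"(rMZ^2 - rDZ^2) / 0.5 = {squared_version:.3f}")
print(f"2 (0.75 A^4 + A^2 C^2) = {mixed_terms:.3f}")
```

With these made-up inputs the first formula returns the A^2 that went in, while the squared version returns a number that depends on C^2 as well.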
"But if it's the squared correlation which scales linearly with kinship rather than the unsquared correlation"
My tweet thread was not about what scaled with kinship, it was about the specific case of parent-child relations, which have fixed kinship of 0.5.
If we want to generalize it to more general kinship relations, we could perhaps take n generations of inheritance. So like, parent-child as well as grandparent-child etc.
In that case, path-tracing would tell us that if the variable is shared due to vertical phenotypic transmission of strength v, then the n-generational correlation would be r=v^n. Meanwhile, if it is shared due to a genetic influence of A, then the n-generational correlation would be r=A^2 * 0.5^n.
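As a sketch of how the two cases diverge, here's a bit of Python. The values v = 0.25 and A^2 = 0.5 are hypothetical, chosen so that both models give the same parent-child correlation at n = 1.

```python
def r_vertical(v, n):
    """n-generational correlation under vertical phenotypic transmission."""
    return v ** n

def r_genetic(a2, n):
    """n-generational correlation under a genetic influence with A^2 = a2."""
    return a2 * 0.5 ** n

v, a2 = 0.25, 0.5   # hypothetical values that match at n = 1

for n in range(1, 5):
    print(f"n = {n}: vertical r = {r_vertical(v, n):.4f}, "
          f"genetic r = {r_genetic(a2, n):.4f}")
```

Both give 0.25 for parent-child, but from grandparent-child onward the vertical-transmission correlation falls off faster, since it shrinks by a factor of v per generation rather than 0.5.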
---
Not sure I understood your comment properly, hope this helps?
for some reason when I scale A by 2 in R that just scales A's variance by 2 for me :/
How did you scale it by 2? Can you share the code?
welp, misunderstood my own code, does not do so. Yep, you were also right. This answers what I wanted to know as well as some stuff I was curious about but didn't ask about.
I now get that I was being a crazy person and that this was not relevant to your thread. Now though, I'm not actually sure this is a critique of the twin results, as I think this is just pointing out sound theoretical application with regards to parent-child associations. Of course, taking heritability figures as a measure of how much genes contribute to parent-child associations will lead you to overestimate that contribution, as it is equal to heritability times the parent-child kinship coefficient. The expected contribution of environment though is a little bit more clever, as like you say, this isn't quadratic; in my view, after the raw phenotypic association has its direct genetic component removed, the remaining correlation should be taken as being the square of the environmental effect of a parent's phenotype on their child's phenotype.
pMZ1 = phenotype of first MZ twin (pMZ2 for phenotype of second)
gMZ1 = genotype of first MZ twin
to predict phenotypic similarity of MZ twins from their genotypic similarity and known twin-derived heritability (h^2); "-h->" = sqrt(h^2):
pMZ1 -h-> gMZ1 -1.0-> gMZ2 -h-> pMZ2
for predicting phenotypic similarity of DZ from genotypic similarity
pDZ1 -h-> gDZ1 -0.5-> gDZ2 -h-> pDZ2
for predicting phenotypic similarity of a parent and child from their genotypic similarity,
pP -h-> gP -0.5-> gC -h-> pC
for predicting phenotypic similarity of a parent and child from the effect of a parent's phenotype on their child's environment:
pP -?-> eC -e-> pC (note that 'e' here could theoretically encompass both things that count in the twin studies as shared environment and others counted as non-shared environment; we're unsure to what degree parents pass on identical environments to both siblings, but we are pretty sure from EEA evidence that, in any causally relevant sense, the degree to which they do is constant for MZ and DZ twin scenarios; I have no idea how this would be measured here unless reverse engineered from the overall phenotypic parent-child correlation and from twin-derived heritability)
Really, I can't see any way to know -e-> or to know pP -> eC in this w/out twin-based heritability, so
pP1 -?-> eC -e-> pC =
pP1 -> pC minus
pP1 -h-> gP1 -0.5-> gC -h-> pC -
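A rough numeric version of this subtraction, as a sketch in Python. The parent-child correlation of 0.40 and the h^2 of 0.50 are made-up numbers, and the function name is just for illustration.

```python
def environmental_residual(r_parent_child, h2, relatedness=0.5):
    """Parent-child correlation left over after removing the genetic path.

    The genetic path is h * relatedness * h = h^2 * relatedness.
    """
    genetic_path = h2 * relatedness
    return r_parent_child - genetic_path

r_pc = 0.40   # hypothetical raw parent-child phenotypic correlation
h2 = 0.50     # hypothetical twin-derived heritability

residual = environmental_residual(r_pc, h2)
print(f"genetic path = {h2 * 0.5:.3f}, residual = {residual:.3f}")
```

How that residual should then be read (as an effect or as the square of one) is the open question here; the sketch only does the subtraction.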
"But if it's the squared correlation which scales linearly with kinship rather than the unsquared correlation"
To expand on my other comment:
"In that case, path-tracing would tell us that if the variable is shared due to vertical phenotypic transmission of strength v, then the n-generational correlation would be r=v^n. Meanwhile, if it is shared due to a genetic influence of A, then the n-generational correlation would be r=A^2 * 0.5^n."
In the equation r=A^2 * 0.5^n, the value 0.5^n represents the relatedness after n generations, so I agree that r is linearly related to relatedness with a coefficient of A^2. And therefore that 2 (rMZ - rDZ) is an appropriate way of getting A^2. It's just that my twitter thread was not about scaling with relatedness, but instead scaling with causal effect size.
Doesn't the question contain the answer?