Pruning relatives from a large dataset

Hi all,

Sanad and I have a problem, and we wonder whether anyone has encountered something similar (and knows a solution). In his spiny mouse dataset we used microsatellites to estimate relatedness between all genotyped individuals. Most downstream population genetic analyses (e.g. testing for departures from HWE, or clustering in STRUCTURE) assume that individuals are unrelated. Violating this assumption can cause real problems; see, for example, the recent paper in Molecular Ecology Resources (MER) from Jinliang Wang's group on what it does to STRUCTURE analyses.

In the spiny mice we have quite a lot of pairs (more than 500) with r > 0.25. We therefore wish to prune individuals from the dataset so that nobody has an r >= 0.25 with anyone else. This sounds straightforward, but in practice it is quite tricky: there are so many pairwise combinations, and many individuals belong to several related dyads, so removing one member of each dyad at random can throw away far more data than necessary.
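One way to see why random removal wastes data is to treat the dataset as a graph: each individual is a node, and each dyad with r at or above the threshold is an edge. Individuals in different connected components share no high-r pairs, so each component can be pruned on its own, which keeps the problem small, and individuals with no high-r pairs never need to be touched. A minimal Python sketch, assuming the pairwise estimates have been exported as hypothetical `(id, id, r)` tuples; `related_graph` and `components` are illustrative names, not functions from any existing package:

```python
from collections import defaultdict

def related_graph(pairs, threshold=0.25):
    """Adjacency sets linking every dyad whose r is at or above threshold."""
    graph = defaultdict(set)
    for a, b, r in pairs:
        if r >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def components(graph):
    """Split the graph into connected components (sets of individuals)."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # depth-first walk over high-r links
            node = stack.pop()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(comp)
    return result

# Hypothetical pairwise estimates: (individual, individual, r)
pairs = [
    ("A1", "A2", 0.31), ("A2", "A3", 0.27), ("A1", "A3", 0.05),
    ("B1", "B2", 0.50), ("C1", "C2", 0.12),
]
graph = related_graph(pairs)
for comp in components(graph):
    print(sorted(comp))
```

Here C1 and C2 fall below the threshold, never enter the graph, and are simply kept.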

So our question is this:
Does anyone know of an efficient way (i.e. a program) to remove the fewest possible individuals while ensuring that no dyad has an r above a given threshold (we chose 0.25 fairly arbitrarily)?
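In graph terms, choosing the fewest individuals to drop so that no pair at or above the threshold survives is the minimum vertex cover problem; the individuals kept form a maximum independent set. It is NP-hard in general, so exact answers are only practical for small groups of relatives. The sketch below contrasts a common greedy heuristic (repeatedly drop the most-connected individual) with brute-force search. The names `greedy_prune`, `exact_prune` and `is_pruned`, and the adjacency-dict layout, are hypothetical choices for illustration:

```python
from itertools import combinations

def is_pruned(graph, removed):
    """True if every high-r link has at least one member removed."""
    dropped = set(removed)
    return all(a in dropped or b in dropped for a in graph for b in graph[a])

def greedy_prune(graph):
    """Heuristic: repeatedly drop the individual with the most remaining
    high-r links until none are left. Fast, but not guaranteed minimal."""
    g = {node: set(nbs) for node, nbs in graph.items()}  # work on a copy
    removed = []
    while any(g.values()):
        # Most-connected individual; ties go to the first name in sorted order
        worst = max(sorted(g), key=lambda n: len(g[n]))
        for nb in g.pop(worst):
            g[nb].discard(worst)
        removed.append(worst)
    return removed

def exact_prune(graph):
    """Smallest possible removal set by brute force. Cost grows roughly as
    2**n, so use it only on small connected components."""
    nodes = sorted(graph)
    for k in range(len(nodes) + 1):
        for drop in combinations(nodes, k):
            if is_pruned(graph, drop):
                return list(drop)

# Hypothetical adjacency: each individual maps to its relatives with r >= 0.25
graph = {
    "A1": {"A2"}, "A2": {"A1", "A3"}, "A3": {"A2"},
    "B1": {"B2"}, "B2": {"B1"},
    "X": {"Y1", "Y2", "Y3"}, "Y1": {"X"}, "Y2": {"X"}, "Y3": {"X"},
}
print("greedy removes:", greedy_prune(graph))
print("exact removes: ", exact_prune(graph))
```

On this toy graph both approaches remove three individuals, whereas dropping one member of each of the six dyads at random could remove as many as six (Y1, Y2 and Y3 instead of X; A1 and A3 instead of A2). Greedy can be worse than optimal on other graphs, so one reasonable compromise is to run the exact search on small components (a dozen or so individuals) and fall back to the heuristic on larger ones.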

Many thanks
Jon & Sanad
