Getting Things Done in Genetics & Bioinformatics Research
Tuesday, November 23, 2010
Randomly Select Subsets of Individuals from a Binary Pedigree .fam File
I'm working on imputing GWAS data against the 1000 Genomes Project reference panel using MaCH. The model estimation phase only needs about 200 individuals. Here's a one-line bash command that pulls out 200 samples at random from a binary pedigree .fam file called myfamfile.fam (it relies on bash's $RANDOM variable, so run it in bash rather than a plain sh such as dash):
for i in $(cut -d ' ' -f 1-2 myfamfile.fam | sed 's/ /,/g'); do echo "$RANDOM $i"; done | sort -n | cut -d ' ' -f 2 | sed 's/,/ /g' | head -n 200
Redirect this output to a file, then pass that file to PLINK with the --keep option. --keep expects one family ID and individual ID pair per line, which is exactly what the command prints.
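The one-liner assumes the family and individual ID columns are separated by single spaces, and because $RANDOM is reseeded on every run it gives a different subset each time. Here is a minimal sketch of an alternative, assuming only a POSIX shell with awk, sort, head and cut. It splits on any whitespace (spaces or tabs) and takes an optional seed so the same subset can be regenerated later. The function name sample_fam and its arguments are my own invention for illustration, not part of PLINK or MaCH.

```shell
# sample_fam FAM N [SEED]   (hypothetical helper, not a PLINK/MaCH tool)
# Print "FID IID" for N randomly chosen individuals from a .fam file,
# one pair per line, in the format PLINK's --keep option reads.
sample_fam() {
    fam=$1
    n=$2
    seed=${3:-}
    awk -v seed="$seed" '
        # A fixed seed gives a reproducible sample; with no seed,
        # srand() falls back to the time of day.
        BEGIN { if (seed != "") srand(seed); else srand() }
        # Prefix each individual with a random key in [0, 1).
        # awk splits on any run of spaces/tabs; blank lines are skipped.
        NF >= 2 { printf "%.10f %s %s\n", rand(), $1, $2 }
    ' "$fam" |
        sort -n -k1,1 |    # shuffle by sorting on the random key
        head -n "$n" |     # keep the first N individuals
        cut -d ' ' -f 2-3  # drop the key, leaving "FID IID"
}
```

For example, sample_fam myfamfile.fam 200 2010 > keep200.txt writes the list, and plink --bfile myfamfile --keep keep200.txt --make-bed --out myfamfile_200 would then extract those individuals (keep200.txt, myfamfile_200 and the seed 2010 are just example values). A given seed only reproduces the same sample with the same awk, since awk implementations may use different random number generators.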