Wednesday, May 4, 2011

PLINK/SEQ for Analyzing Large-Scale Genome Sequencing Data

PLINK/SEQ is an open-source C/C++ library for analyzing large-scale genome sequencing data. The library can be accessed via the pseq command-line tool or through an R interface. The project is developed independently of PLINK, but its syntax will be familiar to PLINK users.

PLINK/SEQ boasts an impressive feature set for a project still in the beta testing phase. It supports several data types (multiallelic, phased, imputation probabilities, indels, and structural variants) and can handle datasets much larger than can fit into memory. PLINK/SEQ also comes bundled with several reference databases of gene transcripts and sequence and variation projects, including dbSNP and 1000 Genomes Project data.

As with PLINK, the documentation is good, and there's a tutorial using 1000 Genomes Project data.

PLINK/SEQ - A library for the analysis of genetic variation data


  1. Thanks, Stephen and Will, for pointing out such a good tool for analyzing genomic variant data.

    I have a few simple questions that I cannot answer myself. Could you please share any of your experiences or tricks with me?

    The PLINK/SEQ website says that "the PLINK/Seq library can be compiled as an R extension library (and is available for download)". That is very attractive. But:

    (1) I cannot find an R library for PLINK/SEQ on the download page. How can I get it? Also, after installing PLINK/SEQ, I cannot find an R library in its installation folder.

    (2) Do you think they have not released the R library yet, even though they say they have?

  2. According to their downloads page: "We are actively developing an R interface to the PLINK/SEQ library. A beta download will be available shortly."

    Incidentally, does anyone know if pseq can read VCF files directly from an FTP server? I want to work with 1000 Genomes VCF files without having to download a local copy.


