Linear predictive coding of histone modifications measured by ChIP-seq

The expression of a gene is tightly regulated at multiple biological levels. These include epigenetic regulation through post-translational modifications of core histone proteins, DNA methylation and nucleosome positioning. These factors control the accessibility of DNA to regulatory proteins and therefore have a profound influence on gene expression. Depending on the histone modification and the experimental condition of interest, datasets can be highly variable, and the peak profiles to be analyzed differ in length, shape and frequency, as well as in location. For example, signals from H3K4me3 or H2A.Z typically localize at the transcription start site (TSS), while others such as H3K36me3 and H4K20me1 often spread over the promoter and gene region. In addition, variations in nucleosome-free regions lead to shifts in peak location.

However, despite the biological importance of these signal profiles, current bioinformatics tools are limited to the analysis of signal intensities. We present a new analysis strategy and pipeline that is tailored to epigenetic ChIP-seq data and focuses on leveraging frequency, shape and peak location information. The method uses Linear Predictive Coding (LPC) to model and parameterise ChIP-seq signals. The derived coefficients (parameters) of the LPC model form a multidimensional spectral feature that can be used in place of the signal intensity.
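To illustrate what "LPC coefficients as features" means, the following Python sketch fits an LPC model with the autocorrelation method and the Levinson-Durbin recursion, one standard way of estimating LPC coefficients. It is an assumption that this matches the authors' Matlab pipeline, and the helper names (`autocorrelation`, `levinson_durbin`, `lpc_features`) are hypothetical. A synthetic AR(2) signal stands in for a real ChIP-seq profile so the recovered coefficients can be checked against known values.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the mean-removed signal x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])


def levinson_durbin(r, order):
    """Solve the Yule-Walker (normal) equations by the Levinson-Durbin recursion.

    Convention: x[n] is predicted as -sum_{k=1..order} a[k] * x[n-k], i.e. the
    prediction-error filter is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Returns (a, err): a[0] == 1 and err is the residual prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current order-(i-1) residual and x[n-i]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_features(signal, order):
    """Hypothetical helper: LPC coefficients a[1..order] used as a feature vector."""
    a, _ = levinson_durbin(autocorrelation(signal, order), order)
    return a[1:]


# Toy check on a synthetic AR(2) signal x[n] = 1.5 x[n-1] - 0.7 x[n-2] + noise,
# whose true coefficients in this convention are a[1] = -1.5, a[2] = 0.7.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
x = np.zeros_like(noise)
for n in range(2, len(x)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + noise[n]

coeffs = lpc_features(x, order=2)
print(coeffs)  # approximately [-1.5, 0.7]
```

Applied to many genomic regions, each profile would be reduced to a short vector like `coeffs`, which can then be compared or clustered instead of raw intensities.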

An example Matlab script implementing the LPC analysis pipeline is available here: LPC_example.zip (for non-commercial users only).

An example ChIP-seq signal is included in the archive. To run the example, simply run the script from Matlab. This will produce the figures shown below.

A detailed description of the method is given in:

Beck D, Brandl MB, Boelen L, Unnikrishnan A, Pimanda JE, Wong JWH (2012). Signal analysis for genome-wide maps of histone modifications measured by ChIP-seq. Bioinformatics. doi:10.1093/bioinformatics/bts085

Strand bias and merging
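In ChIP-seq, reads from the forward and reverse strands pile up on either side of the bound fragment, offset by roughly the fragment length. One common correction, assumed here and not necessarily what the LPC script does, shifts forward-strand reads downstream and reverse-strand reads upstream by half the fragment length and then merges both strands into a single count profile. The helper `merge_strands` and the 200 bp fragment length in the example are hypothetical.

```python
import numpy as np


def merge_strands(plus_starts, minus_ends, fragment_length, genome_length):
    """Hypothetical helper: shift reads towards the fragment centre and merge strands.

    plus_starts: 5' positions of forward-strand reads (leftmost coordinate).
    minus_ends:  5' positions of reverse-strand reads (rightmost coordinate).
    Returns per-base counts of estimated fragment centres on [0, genome_length).
    """
    shift = fragment_length // 2
    plus = np.asarray(plus_starts, dtype=np.int64) + shift   # move downstream
    minus = np.asarray(minus_ends, dtype=np.int64) - shift   # move upstream
    centres = np.concatenate([plus, minus])
    # Discard centres that fall outside the region of interest
    centres = centres[(centres >= 0) & (centres < genome_length)]
    return np.bincount(centres, minlength=genome_length)


# Two fragments: forward reads at 100 and 110, matching reverse reads ending at 300 and 310
counts = merge_strands([100, 110], [300, 310], fragment_length=200, genome_length=1000)
print(counts[200], counts[210])  # 2 2
```

After shifting, both strands of the same fragment land on one position, so the strand-specific offset no longer distorts the peak shape seen by later steps.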

Kernel density estimation
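A kernel density estimate turns discrete read positions into a smooth, continuous profile, presumably the kind of signal the LPC model is then fitted to. A minimal NumPy sketch with a Gaussian kernel follows; the kernel choice, the 20 bp bandwidth and the function name `gaussian_kde_profile` are assumptions, not taken from the script.

```python
import numpy as np


def gaussian_kde_profile(positions, grid, bandwidth):
    """Gaussian kernel density estimate of read positions, evaluated on grid.

    Each read contributes a Gaussian bump of standard deviation `bandwidth`;
    averaging the bumps gives a density that integrates to about 1.
    """
    positions = np.asarray(positions, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - positions[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Toy example: 1000 simulated read centres around a peak at position 500
rng = np.random.default_rng(1)
reads = rng.normal(loc=500, scale=30, size=1000)
grid = np.arange(0, 1000)  # 1 bp resolution
density = gaussian_kde_profile(reads, grid, bandwidth=20.0)
```

The bandwidth trades noise against resolution: a small value keeps narrow peaks sharp but noisy, a large one smooths away shape detail. SciPy's `scipy.stats.gaussian_kde` provides the same estimator with automatic bandwidth selection.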

Comparison of LPC and actual density estimates

LPC parameters with different numbers of coefficients
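The number of coefficients (the model order) controls how much detail the LPC model captures. One way to judge it, sketched here under the assumption that prediction error is an adequate criterion, is to watch how the one-step prediction error drops as the order grows. This sketch fits each order by ordinary least squares (the covariance method) rather than the Levinson-Durbin recursion; the function name `lpc_residual_by_order` and the AR(2) test signal are hypothetical.

```python
import numpy as np


def lpc_residual_by_order(x, max_order):
    """Mean squared one-step prediction error for linear predictors of order
    1..max_order, each fitted by least squares on the same target samples."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    target = x[max_order:]  # x[n] for n = max_order .. len(x)-1
    errors = []
    for p in range(1, max_order + 1):
        # Column k-1 holds x[n-k] aligned with target x[n]
        lagged = np.column_stack(
            [x[max_order - k : len(x) - k] for k in range(1, p + 1)]
        )
        coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
        residual = target - lagged @ coeffs
        errors.append(np.mean(residual**2))
    return np.array(errors)


# Synthetic AR(2) signal: two coefficients should be enough to model it
rng = np.random.default_rng(2)
noise = rng.standard_normal(5000)
x = np.zeros_like(noise)
for n in range(2, len(x)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + noise[n]

errors = lpc_residual_by_order(x, max_order=4)
print(errors)
```

Because every order is fitted on the same samples, adding a coefficient can never increase the error. For this signal the error falls sharply from order 1 to order 2 and barely changes afterwards, suggesting two coefficients suffice for it; real ChIP-seq profiles will usually need more.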