Deep bisulfite sequencing

The deep bisulfite sequencing (Deep-BIS) and ChIP-Deep-BIS assays are described in detail in Landan et al. (in press, 2012). Below are explanations of how the data were mapped and analyzed, as well as downloadable methylation profiles and linkage diagrams.


Deep-BIS paired-end and single-end reads describing ligated and converted amplification products from one or more of the profiled genomic regions (amplicons) were processed as follows:

Step 1: An index of converted 26-mers from the genomic sequences was constructed, including sequences converted on the plus strand and their reverse complements. We filtered the index so that it included only unique hits (the majority of 26-mers).
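
As a rough illustration of Step 1, the following Python sketch builds an index of unique converted k-mers. It is not the code used for the published analysis. The function names, the in-silico conversion of every C to T (including cytosines in CpGs), and the (amplicon, offset, strand) tuples stored in the index are all assumptions made for the example.

```python
from collections import defaultdict

K = 26  # k-mer length given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert(seq):
    """In-silico bisulfite conversion of the plus strand (assumed: every C becomes T)."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def build_unique_index(amplicons, k=K):
    """Map each converted k-mer to (amplicon, offset, strand), keeping unique k-mers only.

    amplicons: dict of amplicon name -> plus-strand genomic sequence.
    For strand -1 the offset refers to the reverse-complemented converted sequence.
    """
    hits = defaultdict(list)
    for name, seq in amplicons.items():
        converted = convert(seq)
        for strand, s in ((1, converted), (-1, reverse_complement(converted))):
            for i in range(len(s) - k + 1):
                hits[s[i:i + k]].append((name, i, strand))
    # Discard k-mers that occur more than once anywhere in the reference set.
    return {kmer: locs[0] for kmer, locs in hits.items() if len(locs) == 1}
```

For example, build_unique_index({"x": "ACGTTGCA"}, k=4) keeps all ten 4-mers from the two converted strands, whereas a low-complexity sequence such as "AAAAA" contributes nothing because its 4-mers repeat.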

Step 2: We screened sequence reads and filtered out all reads with incomplete non-CpG conversion. Potential non-CpG methylation was therefore ignored. 26 bp subsequences were then matched against the reference index, and perfect or one-mismatch hits were recorded.
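
The two operations in Step 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the original pipeline: the function names are hypothetical, the read orientation is assumed to be known when screening, a cytosine at the very end of a read is treated as ambiguous and not flagged, and a one-mismatch hit is accepted only if all matching variants point to the same index entry.

```python
def has_unconverted_non_cpg(read, strand=1):
    """True if the read keeps a cytosine outside a CpG context.

    strand=1: read of the converted strand, where a surviving C should be followed by G.
    strand=-1: reverse complement, where a surviving G should be preceded by C.
    Bases whose neighbour lies outside the read are not flagged (assumption).
    """
    read = read.upper()
    if strand == 1:
        return any(read[i] == "C" and read[i + 1] != "G"
                   for i in range(len(read) - 1))
    return any(read[i] == "G" and read[i - 1] != "C"
               for i in range(1, len(read)))


def lookup_with_one_mismatch(kmer, index):
    """Return the index entry for kmer, allowing at most one substitution.

    index: dict of k-mer -> location (any hashable value).
    Returns None when there is no hit or when one-mismatch hits are ambiguous.
    """
    if kmer in index:
        return index[kmer]
    found = set()
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                candidate = kmer[:i] + alt + kmer[i + 1:]
                if candidate in index:
                    found.add(index[candidate])
    return found.pop() if len(found) == 1 else None
```

Enumerating the 78 single-substitution variants of a 26-mer keeps the lookup a plain dictionary access, which is one simple way to tolerate a single mismatch without a full aligner.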

Step 3: Reference hits were extended on both sides as long as the sequences matched (i.e. until the read end, the ligation point, or an insertion or deletion error was reached). CpG sites in the matching sequence were identified and their methylation states were called, generating a partial methylation profile for the amplicon. CpG sites with low base-call quality (error probability > 0.01), or those not covered by the read, were recorded as missing data.
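
The methylation-calling part of Step 3 might look like the Python sketch below. It assumes an ungapped alignment of a converted-strand read (strand 1) at a known offset in the amplicon, per-base error probabilities supplied as a list, and "*" as the symbol for missing data. The function name and arguments are hypothetical, and the extension of the seed hit itself is not shown.

```python
MAX_ERROR = 0.01  # error-probability threshold given in the text


def call_cpg_profile(reference, read, offset, error_probs, max_error=MAX_ERROR):
    """Call the methylation state of every CpG in the amplicon from one aligned read.

    reference:   plus-strand amplicon sequence (unconverted).
    read:        converted-strand read aligned without gaps.
    offset:      reference position of the first read base.
    error_probs: per-base error probabilities, same length as read.
    Returns a string with one character per reference CpG:
    '1' methylated, '0' unmethylated, '*' missing.
    """
    reference = reference.upper()
    read = read.upper()
    calls = []
    for p in range(len(reference) - 1):
        if reference[p:p + 2] != "CG":
            continue
        i = p - offset  # position of the CpG cytosine within the read
        if not 0 <= i < len(read) or error_probs[i] > max_error:
            calls.append("*")  # not covered, or low base-call quality
        elif read[i] == "C":
            calls.append("1")  # cytosine protected from conversion
        elif read[i] == "T":
            calls.append("0")  # cytosine converted
        else:
            calls.append("*")  # unexpected base, treated as missing (assumption)
    return "".join(calls)
```

With the reference "ACGTTCGAACGT" and the read "ACGTTTGA" aligned at offset 0, the first CpG is called methylated, the second unmethylated, and the third is outside the read.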

Step 4: For paired-end reads, we combined the partial methylation profiles extracted from both ends if they mapped to opposite sides of the same reference amplicon, since in that case we could assume that no ligation event occurred between them. Paired reads provided longer profiles and enabled correlation analysis of CpG pairs separated by more than 70 bp.
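
Combining the two ends in Step 4 can be illustrated with a short Python sketch. It assumes that both partial profiles are strings over the same amplicon's CpGs, with "*" marking missing data, and that a CpG called differently by the two ends is recorded as missing. The text does not say how such conflicts were resolved, so that rule is an assumption, as is the function name.

```python
def merge_paired_profiles(first, second):
    """Merge two partial profiles ('1', '0', '*') covering the same amplicon."""
    if len(first) != len(second):
        raise ValueError("profiles must cover the same set of CpGs")
    merged = []
    for a, b in zip(first, second):
        if a == "*":
            merged.append(b)      # only the second end (or neither) covers this CpG
        elif b == "*" or a == b:
            merged.append(a)      # only the first end covers it, or both agree
        else:
            merged.append("*")    # conflicting calls, treated as missing (assumption)
    return "".join(merged)
```

For instance, merging "10***" with "***01" gives "10*01", a single profile spanning CpGs at both ends of the amplicon.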

Download Deep-BIS methylation profiles
Download ChIP-Deep-BIS methylation profiles

Each .prof file consists of the CpG methylation status for all reads obtained for a given sample. Each line represents data from one read. The columns are as follows:

1. Region name – One of the profiled amplicons, whose details can be found in the Supplementary Tables.
2. CpG methylation status – Methylated CpGs are indicated as 1s, unmethylated CpGs as 0s. Asterisks (*) indicate CpGs that were not part of the sequenced read.
3. Strand – Indicates whether the read represented the converted strand (1) or the reverse-complement of the converted strand (-1).
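
A minimal Python sketch for reading one line of a .prof file is given below. The column order follows the list above, but the delimiter is not specified here, so the sketch assumes whitespace-separated columns. The ProfRecord class and parse_prof_line function are hypothetical names, not part of any released tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfRecord:
    region: str                  # amplicon name
    calls: List[Optional[int]]   # 1 methylated, 0 unmethylated, None missing
    strand: int                  # 1 or -1


def parse_prof_line(line):
    """Parse one read from a .prof file (assumed whitespace-separated columns)."""
    region, status, strand = line.split()
    calls = []
    for c in status:
        if c == "*":
            calls.append(None)
        elif c in "01":
            calls.append(int(c))
        else:
            raise ValueError(f"unexpected methylation symbol: {c!r}")
    return ProfRecord(region, calls, int(strand))
```

Calling parse_prof_line("ampA 10*1 -1") would yield a record for the hypothetical region "ampA" with calls [1, 0, None, 1] on strand -1.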

Correlation analysis

To generate methylation linkage diagrams we employed standard correlation analysis of the methylation profiles of each pair of CpGs (in a manner analogous to the computation of genetic linkage between SNPs). Specifically, for each pair of CpGs we counted the number of reads with common methylation (n_11), common lack of methylation (n_00) and different methylation states (n_01, n_10). We ignored reads in which one of the two methylation states was not available. We then computed the average methylation of each CpG (m1 = (n_11+n_10)/N, m2 = (n_11+n_01)/N, where N = n_11+n_10+n_01+n_00 is the total number of reads counted for the pair) and the variance of methylation of each CpG (v1 = m1*(1-m1), v2 = m2*(1-m2)). The correlation is then defined as cor = (n_11/N - m1*m2)/sqrt(v1*v2). Note that correlation values are subject to considerable noise when coverage is low, or when m1 or m2 is close to 0 or 1. Methylation linkage data are therefore interpreted over groups of spatially coupled CpGs rather than specific pairs. All diagrams can be obtained below in .png format.
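
The correlation defined above can be computed with a few lines of Python. The sketch assumes profiles are given as strings of '1', '0' and '*' (as in the .prof files) and returns None when no read covers both CpGs or when either variance is zero, in which case the correlation is undefined. That return convention and the function name are assumptions for the example.

```python
from math import sqrt


def cpg_correlation(profiles, i, j):
    """Correlation between the methylation states of CpGs i and j across reads."""
    n11 = n10 = n01 = n00 = 0
    for profile in profiles:
        a, b = profile[i], profile[j]
        if a == "*" or b == "*":
            continue  # ignore reads lacking either state
        if a == "1" and b == "1":
            n11 += 1
        elif a == "1":
            n10 += 1
        elif b == "1":
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    if n == 0:
        return None
    m1 = (n11 + n10) / n
    m2 = (n11 + n01) / n
    v1 = m1 * (1 - m1)
    v2 = m2 * (1 - m2)
    if v1 == 0 or v2 == 0:
        return None  # correlation undefined for a constant CpG
    return (n11 / n - m1 * m2) / sqrt(v1 * v2)
```

As a sanity check, four reads with profiles "11", "11", "00", "00" give a correlation of 1, while "10", "01", "10", "01" give -1.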

Download Deep-BIS linkage diagrams
Download ChIP-Deep-BIS linkage diagrams

