Target gene expression levels and competition between transfected and endogenous microRNAs are strong confounding factors in microRNA high-throughput experiments
© Saito and Sætrom; licensee BioMed Central Ltd. 2012
Received: 2 August 2011
Accepted: 10 February 2012
Published: 10 February 2012
MicroRNA (miRNA) target genes tend to have relatively long and conserved 3' untranslated regions (UTRs), but to what degree these characteristics contribute to miRNA targeting is poorly understood. Different high-throughput experiments have, for example, shown that miRNAs preferentially regulate genes with both short and long 3' UTRs and that target site conservation is both important and irrelevant for miRNA targeting.
We have analyzed several gene context-dependent features, including 3' UTR length, 3' UTR conservation, and messenger RNA (mRNA) expression levels, reported to have conflicting influence on miRNA regulation. By taking into account confounding factors such as technology-dependent experimental bias and competition between transfected and endogenous miRNAs, we show that two factors - target gene expression and competition - could explain most of the previously reported experimental differences. Moreover, we find that these and other target site-independent features explain about the same amount of variation in target gene expression as the target site-dependent features included in the TargetScan model.
Our results show that it is important to consider confounding factors when interpreting miRNA high throughput experiments and urge special caution when using microarray data to compare average regulatory effects between groups of genes that have different average gene expression levels.
KeywordsmicroRNA targets siRNA microarray proteomics PAR-CLIP
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs (ncRNAs) that negatively regulate protein-coding genes [1, 2]. MicroRNAs are involved in many important regulatory roles [3–5], and current estimates indicate that miRNAs regulate at least 60% of the human protein-coding genes .
In animals, functional miRNA sites preferentially reside in 3' UTRs , and these sites are generally well conserved . Moreover, some ubiquitously expressed genes, such as housekeeping genes, have shorter 3' UTRs to potentially avoid miRNA regulation [2, 8], whereas proliferating cells express mRNAs with shortened 3' UTRs to avoid miRNA regulation . Hence, miRNA target genes are likely to have relatively long and conserved 3' UTRs. However, to what degree the length and conservation of 3' UTR contribute to miRNA targeting is still poorly understood. To illustrate, data from Argonaute RNA immunoprecipitation (RIP) in human and fly indicate that miRNAs target short 3' UTRs [10, 11], whereas microarray data from miRNA transfection experiments and sequence data from Argonaute cross-linked immunoprecipitation (CLIP) experiments indicate that miRNAs target long 3' UTRs . Wen and colleagues also found that target site conservation was more important for CLIP-supported target sites than for targets that were down-regulated in the transfection experiments . Moreover, our previous study  showed that genes with a 3' UTR longer than 4,000 nucleotides were less affected by ectopically expressed miRNAs than were genes with a shorter 3' UTR, and that target site conservation had little or no effect on the performance of our miRNA target prediction algorithm.
In addition to 3' UTR length and conservation, several other gene characteristics also affect miRNA regulation. For example, many miRNAs are known to regulate genes involved in cell development processes . Another example is that miRNAs appear to preferentially target genes with high CpG promoters . Also, as highly expressed genes transcribe a large number of mRNAs, the miRNA regulation of those mRNAs can be different from those of weakly expressed genes, although current analyses disagree on whether miRNAs affect highly expressed genes more or less than medium or lowly expressed genes [15, 16].
High throughput experiments based on microarrays or proteomics have been important for characterizing miRNA regulation [17–20]. Although these and more recent comparative studies [10, 12] found that some features such as seed complementarity and seed strength are consistently important for miRNA targeting, other features such as 3' UTR length and site conservation mentioned above differ between studies and technologies. There are at least three potential features that may contribute to the difference between miRNA high-throughput experiments. First, the number of genes that cover a microarray experiment is usually much larger than that of a proteomics experiment. For instance, Baek et al.  used both microarray and proteomics for their miRNA target gene analysis, and the number of genes detected for microarray and proteomics samples were about 20,000 and 2,000, respectively. Second, transfected (exogenous) miRNAs compete with endogenous miRNAs for the protein complex needed for miRNA regulation . Therefore, genes targeted by endogenous miRNAs but not by the exogenous miRNA can be up-regulated. Third, the effect of miRNA regulation can be diluted by target abundance, which means that each target gene is less down-regulated when the miRNA has many highly expressed compared with a few lowly expressed target genes . Common for these features is that they are target site-independent, but gene context-dependent.
In this study, we have investigated the effects on miRNA targeting of several such target site-independent but gene context-dependent features. We categorized these features into three types: (i) target mRNA features, such as 3' UTR length, 3' UTR sequence conservation, and mRNA expression level; (ii) sample features, such as the competition and dilution effects; and (iii) platform features, such as different types of technologies and experimental methods. We found that two features - the competition effect between endogenous miRNAs and transfected miRNAs, and mRNA expression level - have a strong impact on results from high throughput experiments. Both features are confounding factors that explain many of the previously reported differences between different studies and high throughput technologies. It is important to consider these confounding factors in order to analyze accurately and robustly different types of miRNA high-throughput experiments and to infer correctly the characteristics of miRNA regulation.
Results and discussion
Target mRNA features: ectopic miRNA expression differentially affects subgroups of genes with differing 3' UTR length, 3' UTR conservation, and mRNA expression level
As we expected that mRNAs targeted by miRNA have long and conserved 3' UTRs, we wanted to examine how these characteristics actually affect miRNA regulation. Specifically, we wondered whether there was a difference in how different gene groups, such as genes with long, medium, or short 3' UTRs or genes with high, medium, or low 3' UTR conservation, were affected by ectopic miRNA expression. To address this question, we used microarray and proteomics data from five and two miRNA transfection experiments, respectively, and microarray data from two miRNA inhibition experiments and analyzed the differences in gene expression log ratio values of predicted targets in the different gene groups (see Methods). We also included microarray data from a small interfering RNA (siRNA) transfection experiment because siRNAs behave as miRNAs in terms of target recognition [23, 24]. In contrast to evolutionary selected miRNA targets, however, targets for artificially designed exogenous siRNAs should be evolutionary unbiased. The siRNA dataset, therefore, served as an estimate of the general regulatory effects of over-expressing small RNAs. In total, we used 10 different types of miRNA high-throughput experiments, which covered 140 samples and 70 miRNAs and siRNAs (Additional file 1, Table S1 and S2).
Target mRNA features: predicted target genes with short 3' UTRs are more down-regulated than genes with long 3' UTRs
We have previously reported that genes with very long 3' UTRs (> 4,000 nts) are poor targets for ectopically expressed miRNAs or siRNAs . Consistent with this, there was no experiment that showed significant down-regulation for the Very Long sub-group (Figure 1C, Additional file 1, Table S3). Interestingly, genes with short or medium 3' UTRs (Short, Med Short, and Med Long) were significantly down-regulated compared to genes with long 3' UTRs (Long and Very Long) among nearly all of the experiments. We saw the strongest effect for 3' UTRs with sub-groups Med Short and Med Long, as one or both groups were significantly down-regulated compared to other predicted targets in eight of ten experiments (Additional file 1, Table S3). Between these two sub-groups, Med Short had lower P-values than Med Long in seven of ten experiments. Using the one-sided Kolmogorov-Smirnov test as an alternative test method also supported the same trends (Additional file 1, Table S6). These results indicate that miRNA or siRNA target genes with short 3' UTRs were generally more down-regulated than genes with long 3' UTRs.
Target mRNA features: conservation in 3' UTR regions has inconsistent regulatory effects
Although the sub-group of highly conserved 3' UTR regions was strongly down-regulated in five of ten experiments, the non-conserved sub-group was down-regulated in two of ten experiments (Figure 1D; Additional file 1, Table S4). One possible explanation for this inconsistent pattern could be the conservation levels of the miRNAs used in the experiments. Since highly conserved miRNAs tend to have more target genes than less-conserved miRNA, highly conserved miRNAs may be more affected by the reported target dilution effect . However, we found no significant correlations between the degree of miRNA conservation and the down-regulation effects (data not shown). Moreover, the siRNAs used in the Jackson experiment should be unaffected by 3' UTR conservation, but these siRNAs strongly down-regulated highly conserved 3' UTRs compared with other 3' UTRs . We observed the same inconsistency for 3' UTR conservation when testing with the one-sided Kolmogorov-Smirnov test (Additional file 1, Table S7). Thus, although highly conserved 3' UTRs in some cases can be better targets for ectopically expressed small RNAs, the inconsistent regulatory effects suggest that other factors are more important.
Target mRNA features: predicted target genes with high mRNA expression levels are more down-regulated than the genes with low mRNA expression levels
A recent study showed that mRNA expression affects siRNA efficacy such that lowly expressed mRNAs are less affected by siRNAs than are highly expressed mRNAs . Consistent with these results, siRNA and miRNA target genes with high or medium expression as measured by RNA-seq  were significantly down-regulated compared to the rest of the sub-groups in the microarray experiments (Figure 1E). We saw a very strong down-regulatory effect on mRNA expression with sub-groups Very High, High, and Medium, as one or more sub-groups were significantly down-regulated compared to other predicted targets in seven of ten experiments (Additional file 1, Table S5). Moreover, there was no experiment that showed significant P-values for sub-groups Very Low and NoExp. One-sided Kolmogorov-Smirnov tests also gave similar results (Additional file 1, Table S8). These results indicate that predicted miRNA or siRNA target genes with high or moderate expression levels are generally more down-regulated than genes with low expression levels.
Target mRNA features: comparison tests on individual samples strongly support that the differences between subgroups are common for many miRNAs
Multiple comparison tests at an individual sample level support the experimental level test results
3' UTR Length
3' UTR Consv
Sample features: competition with endogenous miRNAs impacts exogenous miRNAs' targeting of genes with long 3' UTRs
Small RNA transfection perturbs endogenous miRNA regulation such that genes targeted by endogenous miRNAs can become up-regulated . We, therefore, hypothesized that genes with long 3' UTRs had a net unaffected expression or reduced response because these genes potentially had more endogenous miRNA target sites than genes with short 3' UTRs. To test this hypothesis, we separated the genes into four groups based on whether the genes were predicted to be targeted by the exogenous miRNAs and by highly expressed endogenous miRNAs. Both the first (T +Endo) and the second (T -Endo) groups consist of genes targeted by exogenous miRNAs, whereas the third (NT +Endo) and the fourth (NT -Endo) groups consist of genes without exogenous miRNA targets. The second word in the group names indicates that the group contains either genes targeted by endogenous miRNAs (+Endo) or genes without endogenous miRNA targets (-Endo) (Additional file 1, Table S12; see Methods). Specifically, we used two of the four groups, T -Endo and T +Endo, for statistical analysis. We only used eight transfected experiments and excluded two inhibition experiments because Selbach locked nucleic acid (LNA) and Linsley 2'-O-methyl (OME) experiments inhibited endogenous miRNAs.
Sample features: PAR-CLIP data show that endogenous miRNAs target most mRNAs with long 3' UTR
PAR-CLIP data show that endogenous miRNA tend to target mRNAs with long 3' UTR
Sample features: competition with endogenous miRNAs is not affected by evolutionary conservation levels of 3' UTRs
The test results of mRNA target features showed no consistent evidence that genes in any sub-group of 3' UTR conservation were significantly more down-regulated than were the genes in the rest of the sub-groups (Additional file 1, Table S4). We, therefore, did not expect to find consistent patterns of interaction between the competition effect and 3' UTR conservation. Indeed, when we tested whether T -Endo genes were more down-regulated than T +Endo genes in the sub-groups of 3' UTR conservation, the test showed no consistent trends across sub-groups; rather, the tests indicated consistent differences between experiments, as all sub-groups were significant on the Lim, Jackson, and Selbach datasets (Additional file 1, Table S14). These results suggest that competition between endogenous and exogenous miRNAs is unaffected by the levels of evolutionary conservation on 3' UTRs.
Sample features: competition with endogenous miRNAs has a strong impact on genes with medium or higher mRNA expression levels
As endogenous miRNA regulation mostly reduces target mRNA expression [8, 26, 27], miRNA target genes with low mRNA expression levels would more likely be under strong regulation by endogenous miRNAs. Similarly, miRNA target genes with high mRNA expression would less likely be under strong endogenous miRNA regulation. Consequently, we expected lowly expressed mRNAs to be less affected by competition with exogenous miRNAs than were highly expressed mRNAs. Indeed, T -Endo genes were significantly more down-regulated than were T +Endo genes for high or moderate mRNA expression levels (Very High, High, and Medium; Figure 2B; Additional file 1, Table S15). Moreover, no experiments showed significant down-regulation for low or no mRNA expression levels (Very Low and NoExp). As, according to our results, endogenous miRNAs preferentially target genes with long 3' UTRs, taken together, these results support that the genes with a very long 3' UTR are less affected by exogenous miRNAs because most of them are under the influence of endogenous miRNAs.
Sample features: tests on individual samples support that the competition effect is strongest for genes with strong mRNA expression levels
To analyze further the trends of competition effects on 3' UTR length, 3' UTR conservation, and mRNA expression level, we tested the difference of miRNA down-regulation between T -Endo and T +Endo genes on the 140 individual samples instead of the collective experiments. To compare the trends of the test results between experiments and samples, we again created two types of counts: (i) the number of experiments that had significant P-values from the test results of the experiments, and (ii) the number of experiments that had their Sample level scores greater than either 0.5 or 0 (Additional file 1, Table S16). With a strict threshold of Sample level scores (> 0.5), the tests on samples showed no strong support for the trend observed when tested on the experiments. With a less strict threshold of Sample level scores (> 0), the tests on samples supported the trend for mRNA expression level (Pearson r = 0.86; P-value = 0.03), whereas the 3' UTR length showed some, but insignificant, support of the experiment-level results (r = 0.47; P-value = 0.42). The 3' UTR conservation showed no correlation with the experiment-level results (r = -0.10; P-value = 0.9). Together, the results indicated that some samples support the result from the collective experiments that the competition effect more strongly affects genes with high compared with low mRNA expression level.
Platform features: microarray datasets can have cryptic bias towards detecting differential expression in highly expressed genes
A recent study showed that mRNA expression affects siRNA efficacy such that lowly expressed mRNAs are less affected by siRNAs than are highly expressed mRNAs . Consistent with these results, siRNA and miRNA target genes with high or medium expression, as measured by RNA-Seq , were significantly down-regulated compared to the rest of the sub-groups in the microarray experiments (Figure 1E). However, these trends were not apparent in the proteomics datasets, which instead showed significant effects on the lowly expressed genes (two of three experiments; Additional file 1, Table S5).
As the proteomics experiments relied on detecting and identifying individual proteins, whereas microarrays use hybridization signals to infer gene expression, we reasoned that the differences might be explained by differences in the sensitivity of the two methods to detect highly and lowly expressed genes or to detect expression changes for highly and lowly expressed genes. Whereas the microarray signals were evenly distributed among all subgroups of gene expression levels, the proteomics data showed bias towards highly expressed genes and detected few very lowly expressed genes (Figure 3C). These trends were apparent in the other microarray and proteomics datasets as well (Additional file 1, Figure S1). When considering expression changes, however, the proteomics data showed little bias and detected down-regulated genes independently of their expression level (Figure 3D). The microarray data, in contrast, showed strong expression-related bias, such that down-regulated genes were enriched among the highly expressed genes and depleted among the lowly expressed genes.
Importantly, these results were independent of miRNA targeting, as subdividing the genes into four groups based on whether the genes contained predicted target sites for the exogenous miRNAs and highly expressed endogenous miRNAs gave similar results (Additional file 1, Figure S2). This grouping further illustrated the effects of miRNA targeting, however. Specifically, consistent with miRNAs inhibiting mRNA expression, the most highly expressed genes constituted a smaller percentage of the genes predicted to be targets for endogenous miRNA (T +Endo and NT +Endo, Figure 3C) than of the genes predicted not to be targets (T -Endo and NT -Endo; Figure 3C).
In summary, the exogenous miRNAs' apparent strong effects on highly expressed genes within the microarray data can be explained by technology-related artifacts. Specifically, our results show that although microarrays detect lowly expressed genes, arrays have lower sensitivity for identifying differential expression for such genes than for highly expressed genes. This is consistent with previous results . Proteomics data, in contrast, are biased towards highly expressed genes, but detect differential expression independently of gene expression levels. In other words, proteomics fails to detect many genes with a low expression level but the sensitivity in detecting differentially expressed genes is similar among different expression levels, whereas microarrays can detect genes with a low expression level but the sensitivity in detecting differential expression is low for these genes compared with highly expressed genes. Because of these differences, microarray but not proteomics data will show that miRNAs on average have a stronger effect on highly than on lowly expressed genes.
Platform and Sample features: competition with endogenous miRNAs has a stronger impact on regulation than has dilution from high overall target expression
Targeting by endogenous miRNAs influences both the genes' expression levels before transfection (Figure 3D) and response to exogenous miRNAs after transfection (Figure 2). We therefore reasoned that part of the observed correlation could be related to interactions between endogenous miRNAs and the exogenous, ectopically expressed miRNA because some exogenous miRNA target genes also targeted by endogenous miRNAs were potentially up-regulated because of the competition effects. Such interactions could be further compounded by the microarrays' bias towards detecting differential expression among highly expressed genes (Figure 3E). Arvey and colleagues based their conclusions on microarray data and they also reported that for most transfected miRNAs or siRNAs (166 of 181 tested; P-value = 2e-33, sign test), highly expressed genes are more down-regulated than are lowly expressed genes . To eliminate such interactions between the endogenous and exogenous miRNAs, we calculated the correlation between the total number of miRNA sites and the average log ratio of the genes that were predicted only to be targets for the exogenous miRNAs. The correlation was not significant (r = 0.22; P-value = 0.067; Figure 4B), indicating that when considering the average effects of exogenous, ectopically expressed miRNAs, endogenous miRNA regulation (competition ) is more important than overall target expression levels (dilution).
Regression analysis: linear regression confirmed trends from individual feature analyses
Nine factors for linear regression
Target's 3' UTR length
Target's overall 3' UTR conservation
Target's mRNA expression level as measured by RNA-Seq.
The number of miRNA or siRNA target sites in the target's 3' UTR
The number of target sites for endogenous miRNAs in the target's 3' UTR
The total number of miRNA or siRNA target sites in all the potential target 3' UTRs
Binary variable stating whether the target's fold change was measured by microarray or proteomics
Binary variable stating whether the target's fold change was measured after miRNA transfection or inhibition
mRNA level TargetScan scores. Only used in the last regression model.
Regression analysis: factor crossing in the linear model confirmed that several factors have combined effects on miRNA down-regulation
Based on our analyses of the individual target, sample, and experiment features, we expected that some factor pairs such as mRNA expression (exp) and proteomics compared with microarray experiments (p_ma), the number of target sites for the exogenous small RNA (#site_m) and endogenous miRNA (#endo_m), and 3' UTR conservation (cs3) and transfection compared with inhibition experiments (e_oe) had strong combined effects. To investigate this possibility, we extended our simple linear model to include all second-order interactions, such as ln3 * cs3, ln3 * exp, and so on. This extension slightly increased the R2 of the model (R2 = 0.047; adjusted R2 = 0.046).
Although directly comparing coefficients was less straightforward than for the simpler model without factor interactions because of different range distributions for combined factors, there were still several factors that had significantly higher or lower coefficients than the others (Additional file 1, Figure S4 and Table S18). The number of target sites for the ectopically expressed miRNA or siRNA (#site_m) had the highest coefficient, whereas the interaction between #site_m and the number of target sites for endogenous miRNAs (#endo_m) had the most negative and most significant coefficient. Consistent with the importance of the number of target sites for regulation, most factor combinations that included #site_m were significant. These results confirm that the number of target sites (#site_m) is the most important factor to explain log ratio changes, but that several other interacting factors and especially competition with endogenous miRNAs (#endo_m) influence the resulting target knock-down.
Of the other factors, target expression (exp) was still among the most significant single factor. Moreover, consistent with our previous results, the coefficients for exp's interaction with the two experiment factors (p_ma and e_oe) were strong, significant, and positive. In contrast, exp by itself or interacting with 3' UTR length (ln3) had a significant negative coefficient. Consequently, our results suggest that when experiment-related features are factored out, miRNAs do more strongly affect lowly than highly expressed genes.
As for the remaining features, both ln3 and cs3 showed significant interactions with expected features. Especially cs3 showed strong and significant interactions with many factors (all except ln3, exp, and p_ma), which likely explains the conflicting results for 3' UTR conservation in the individual feature analyses.
Regression analysis: target site-dependent features, as modeled by TargetScan scores, show strong interactions with site-independent features
So far, our regression models and analyses considered all stringent seed sites as equally important target sites (see Methods), but different features of individual target sites, such as the seed type, the site's AU context, the site's location in 3' UTR, and additional pairing between the miRNA 3' end and mRNA, do affect miRNA targeting . We therefore extended our regression model to include TargetScan  scores (see Methods) to determine how the predicted effects of individual miRNA target sites affect and interact with the mRNA, sample, and platform level features in our model.
Taking all second-order interactions into account further improved the model (R2 = 0.093; adjusted R2 = 0.092) and TargetScan score combined with most other factors (except #site_m) showed significant coefficients (Additional file 1, Figure S5 and Table S20). Together, these results show that although target site-dependent features, such as those modeled by TargetScan, are important for miRNA targeting, target mRNA, sample, and platform level features are also important for correctly interpreting miRNA high-throughput experiments. Specifically, in our analyses, target site-dependent and -independent features explain about the same amount of variation in target gene expression.
CpG frequency, and developmental and housekeeping genes: overall gene expression is a major confounding factor when analyzing microarray data
P-values of multiple Wilcoxon rank-sum tests on three miRNA target features for CpGH, House-keeping, and Developmental genes
3' UTR Length
3' UTR Consv
3' UTR Length
3' UTR Consv
3' UTR Length
3' UTR Consv
We analyzed the average regulatory effects that ectopically expressed miRNAs or siRNAs have on large gene sets and identified two strong factors. First, competition between endogenous miRNAs and the ectopically expressed RNAs have a strong impact on the targets' regulatory response. Genes with very long 3' UTRs, for example, are likely targeted by endogenous miRNAs and therefore are less affected by exogenous miRNAs than are genes with shorter 3' UTRs. Second, target gene expression is a strong confounding factor when analyzing microarray experiments. Target genes with strong expression levels were significantly down-regulated compared to other expressed genes only in the microarray experiments. One explanation is that some genes are highly expressed simply because they are less targeted by endogenous miRNAs, and, therefore, ectopically expressed miRNAs have stronger down-regulating effects on these genes. Consistent with this explanation, miRNA transfection experiments give stronger target expression changes than do miRNA inhibition experiments.
Another explanation is that microarrays, compared with high throughput proteomics, report significant expression data for many more lowly expressed genes. This technological difference means that microarrays can detect more differentially expressed genes than can high throughput proteomics. However, microarrays generally detect less differential expression for lowly expressed genes than for highly expressed genes. Consequently, microarrays will find a smaller fraction of the lowly expressed genes than of the highly expressed genes to be differentially expressed. The result is that when using microarray data to compare average down-regulation of miRNA-targets among housekeeping genes and developmentally regulated genes - genes that have high and low average expression levels, respectively - the average down-regulation is stronger for housekeeping genes than for developmental genes. This difference is contrary to the general consensus within the field  and to our analyses of high-throughput proteomics data which indicate that developmentally regulated genes are more likely miRNA targets than are housekeeping genes. Supporting this, our regression analysis shows that when such experiment-related confounding factors have been taken into account, miRNAs more effectively knock down lowly expressed than highly expressed genes.
These results suggest that it is important to consider multiple factors when it comes to assessing miRNA targeting effects. One example of this relates to the so-called target dilution effect. A previous analysis has reported that the total expression level of target candidates for ectopically expressed miRNAs affects the targets' average knockdown such that high total expression gives low average knockdown . Our analyses, however, show that some of this dilution effect can be explained by interactions with endogenous miRNAs. Consequently, it is very important to consider what genes are already targeted by endogenous miRNAs when designing and interpreting high throughput miRNA or siRNA experiments.
In summary, our results can explain the results from several recent studies that have analyzed features that are important for miRNA regulation and found that the importance of 3' UTR length, conservation, and target gene expression depend on the technology used to measure miRNA targeting. Our results urge special caution when using microarray data to compare average regulatory effects between groups of genes that have different average gene expression levels, such as high and low CpG genes and housekeeping and developmentally regulated genes.
miRNA annotation and miRNA seed types
We downloaded the annotations of human miRNAs, mature miRNAs, and miRNA families from miRBase (release 12.0) .
Microarray and proteomics datasets
We downloaded four microarray datasets, the Jackson , Lim , Grimson , and Linsley  from the Gene Expression Omnibus (GEO) database [GEO:GSE5814, GEO:GSE2075, GEO:GSE8501, GEO:GSE6838] , and two proteomics datasets, Selbach  and Baek , from the original publications along with the corresponding microarray datasets. In total, we used six microarray and two proteomics datasets. Samples from both microarray and proteomics used in this study are listed in Additional file 1, Table S2.
3' UTR sequence and conservation
We downloaded the RefSeq transcripts (hg18), human chromosome sequences, and multiz 17-way  for conserved sequences from the University of California, Santa Cruz (UCSC) Genome Browser . The positional data used to generate nucleotide sequences were the exon positions from RefSeq for the 3' UTR region. We selected the longest 3' UTRs when a RefSeq entry had multiple transcripts. In addition to the human sequences, we generated 3' UTR sequences conserved in human, mouse, and rat (HMR) from multiz 17-way.
mRNA expression in HeLa
We used replicate 1 of the ENCODE Caltech RNA-Seq data  from USCS for the mRNA expression in HeLa.
Housekeeping and developmental genes
Endogenous miRNAs in HeLa
We used the Mammalian microRNA Expression Atlas  to define the endogenous miRNAs in HeLa. For endogenous miRNAs, we selected the top ten highly expressed miRNA families and used the miRNAs that belong to these miRNA families. We used three stringent seed types - 8mer, 7mer-A1, and 7mer-m8 - to search the candidate sites of the endogenous miRNAs.
We obtained PAR-CLIP high-throughput sequencing data of all four AGO proteins from the GEO database [GEO:GSE21918] . We used positional information downloaded with the sequence reads and mapped these positions on 3' UTR regions of all RefSeq genes. We considered 3' UTRs that had at least one site with the number of mapped reads greater than or equal to five to have an AGO binding site.
Data preparation for microarray and proteomics data
We used pre-processed data either from GEO or from the original publications. All log-ratio values that were pre-computed in log2 were transformed to log10. Log-ratio values of two inhibition experiments, Selbach LNA and Linsley OME, were negated because genes with positive log-ratio values were potential miRNA targets for these experiments.
Predicted miRNA or siRNA targets
We separated the genes of each high throughput experiment by target prediction into 'Target' and 'Non-target' genes. The target prediction method we used was a simple stringent seed search on the 3' UTRs and a gene was defined as a 'Target' when the gene contained a seed site for the miRNA or siRNA used in the experiment. We used the previously described three stringent seed types, 8mer, 7mer-A1, and 7mer-m8, to define miRNA targets .
3' UTR sequence length
For the 3' UTR sequence length, we made five sub-groups, Very Long (> 4,000 nts), Long (1,373 to approximately 4,000 nts), Medium Long (630 to approximately 1,372 nts), Medium Short (248 to approximately 629 nts), and Short (0 to approximately 247 nts). The first group, Very Long, was decided from our previous study  because it showed that the genes with 3' UTR length longer than 4,000 nucleotides were less targeted by miRNAs. We sorted the rest of the sequences by 3' UTR length and divided them into four equally sized sub-groups.
3' UTR sequence conservation
For the 3' UTR sequence conservation, we calculated the conservation scores for each sequence by counting the number of conserved nucleotides in the HMR sequences and then divided the resulting number by the length of the sequence. We sorted the sequences with non-zero scores and divided them into three equal sub-groups; High (> 0.238), Medium (0.054 to approximately 0.238), and Low (0 to approximately 0.054). The sequences with zero scores were categorized as NoConsv.
mRNA expression level
We used the average number of tags from Caltech RNA-Seq as a measure (score) of mRNA expression levels. We sorted the mRNAs with non-zero expression scores by score and then divided them into five equally sized sub-groups; Very High (> 0.2007), High (0.0795 to approximately 0.2007), Medium (0.0344 to approximately 0.0795), Low (0.008 to approximately 0.0344), and Very Low (0 to approximately 0.008). The mRNAs with zero scores were categorized as NoExp.
CpG frequency in promoters
We defined the promoter regions as 1,000 nucleotides upstream from the transcription start site. We used a moving window approach (500 nt window moving 5 nt at a time) to compute the CpG frequency and classified the CpG frequency as 'high' when at least one 500-nucleotide-window contained > 55% GC content and > 75% CpG content, 'low' when none of the windows contained > 48% CpG content, and 'medium' for the rest .
Housekeeping and developmental genes
We mapped housekeeping and developmental genes to RefSeq genes based on gene IDs. Non-housekeeping and non-developmental genes were the rest of the RefSeq genes that were not mapped.
Endogenous miRNA targets
We split 'Target' and 'Non-Target' genes into 'With endogenous' and 'Without endogenous' to make the T +Endo, T -Endo, NT +Endo, and NT -Endo gene groups, where 'T' and 'NT' represent 'Target' and 'Non-Target', whereas '+Endo' and '-Endo' represent 'With endogenous' and 'Without endogenous'. We defined a gene as 'With endogenous' when the gene was a predicted targeted for one or more of the top ten most highly expressed endogenous miRNA families. The same approach as for predicting miRNA and siRNA targets was used to predict target genes for endogenous miRNAs.
Total mRNA expression levels and total number of target sites for the dilution effects
We calculated both the total mRNA expression and the number of target transcripts as previously described . Specifically, the total mRNA expression was a sample level sum of the average number of tags from Caltech RNA-Seq for predicted miRNA or siRNA target genes. The total number of target sites was a sample level sum of the number of target sites in predicted miRNA or siRNA target genes.
mRNA level TargetScan scores
The stand alone version of TargetScan was downloaded from the TargetScan website http://www.targetscan.org. We then ran TargetScan on the 3' UTR sequences of the genes from the ten miRNA high-throughput experiments with corresponding miRNA or siRNA sequences. The scores of target sites were aggregated by miRNA:mRNA pairs, and the aggregated scores were negated. Therefore, a high mRNA-level TargetScan score indicates that the mRNA is a strong candidate for miRNA down-regulation.
We used the log-ratio values from 10 different experiments (Additional file 1, Table S1) that contain 140 different samples (Additional file 1, Table S2) to measure the contributions of different groups to miRNA targeting efficacy. To test the significance level between multiple groups, we performed both one-sided Wilcoxon rank-sum and one-sided Kolmogorov-Smirnov non-parametric multiple comparison tests on the log-ratio values.
Sample level scores
We performed non-parametric tests on 140 samples and counted the number of samples that had a significant P-value (< 0.05) to calculate the proportion per experiment as Sample level score.
We merged the five sub-groups of 3' UTR lengths into the two bigger groups long (Very Long and Long) and short (Med Long, Med Short, and Short) for all RefSeq genes and identified potential miRNA target genes bound by the four AGOs. The numbers of bound genes in the long and short groups were compared between each AGO and all RefSeq genes by Fisher's exact test.
Log2 enrichment of down-regulated genes
The enrichment was calculated as the average log ratio values of down-regulated genes (P-value < 0.01 and log ratio < -0.01 for microarray, and log ratio < -0.01 for proteomics) divided by the average log-1ratio values of all genes.
All factors were normalized to the [0, 1] value range before building linear regression models. The normalization was linear; that is, (feature value - min)/(max - min), where min and max values were defined for each factor as ln (min: -2.0, max: 2.0), ln3 (min: 0, max: 1000), cs3 (min:0, max: 1), exp (min:6.1e-5, max: 64.0), #site_m (min: 0, max: 20), #endo_m (min: 0, max: 30), #endo_s (min: 0, max: 6000), and ts_score (min: 0.0, max: 2.0).
Gene Expression Omnibus
locked nucleic acid
Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation
small interfering RNA
University of California: Santa Cruz
We thank Morten Rye for assistance with RNA-Seq data analysis. This work was supported by the Functional Genomics Program (FUGE) of the Norwegian Research Council.
- Ambros V: The functions of animal microRNAs. Nature. 2004, 431: 350-355. 10.1038/nature02871.View ArticlePubMedGoogle Scholar
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116: 281-297. 10.1016/S0092-8674(04)00045-5.View ArticlePubMedGoogle Scholar
- Stefani G, Slack FJ: Small non-coding RNAs in animal development. Nat Rev Mol Cell Biol. 2008, 9: 219-230. 10.1038/nrm2347.View ArticlePubMedGoogle Scholar
- Croce CM: Causes and consequences of microRNA dysregulation in cancer. Nat Rev Genet. 2009, 10: 704-714. 10.1038/nrg2634.PubMed CentralView ArticlePubMedGoogle Scholar
- Hwang HW, Mendell JT: MicroRNAs in cell proliferation, cell death, and tumorigenesis. Br J Cancer. 2007, 96 (Suppl): R40-44.PubMedGoogle Scholar
- Friedman RC, Farh KK, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009, 19: 92-105.PubMed CentralView ArticlePubMedGoogle Scholar
- Bartel DP: MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136: 215-233. 10.1016/j.cell.2009.01.002.PubMed CentralView ArticlePubMedGoogle Scholar
- Stark A, Brennecke J, Bushati N, Russell RB, Cohen SM: Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell. 2005, 123: 1133-1146. 10.1016/j.cell.2005.11.023.View ArticlePubMedGoogle Scholar
- Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB: Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science. 2008, 320: 1643-1647. 10.1126/science.1155390.PubMed CentralView ArticlePubMedGoogle Scholar
- Hausser J, Landthaler M, Jaskiewicz L, Gaidatzis D, Zavolan M: Relative contribution of sequence and structure features to the mRNA binding of Argonaute/EIF2C-miRNA complexes and the degradation of miRNA targets. Genome Res. 2009, 19: 2009-2020. 10.1101/gr.091181.109.PubMed CentralView ArticlePubMedGoogle Scholar
- Hong X, Hammell M, Ambros V, Cohen SM: Immunopurification of Ago1 miRNPs selects for a distinct class of microRNA targets. Proc Natl Acad Sci USA. 2009, 106: 15085-15090. 10.1073/pnas.0908149106.PubMed CentralView ArticlePubMedGoogle Scholar
- Wen J, Parker BJ, Jacobsen A, Krogh A: MicroRNA transfection and AGO-bound CLIP-seq data sets reveal distinct determinants of miRNA action. RNA. 2011, 17: 820-834. 10.1261/rna.2387911.PubMed CentralView ArticlePubMedGoogle Scholar
- Saito T, Saetrom P: A two-step site and mRNA-level model for predicting microRNA targets. BMC Bioinformatics. 2010, 11: 612-10.1186/1471-2105-11-612.PubMed CentralView ArticlePubMedGoogle Scholar
- Sinha S, Adler AS, Field Y, Chang HY, Segal E: Systematic functional characterization of cis-regulatory motifs in human core promoters. Genome Res. 2008, 18: 477-488. 10.1101/gr.6828808.PubMed CentralView ArticlePubMedGoogle Scholar
- Larsson E, Sander C, Marks D: mRNA turnover rate limits siRNA and microRNA efficacy. Mol Syst Biol. 2010, 6: 433-PubMed CentralView ArticlePubMedGoogle Scholar
- Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T: Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010, 141: 129-141. 10.1016/j.cell.2010.03.009.PubMed CentralView ArticlePubMedGoogle Scholar
- Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature. 2005, 433: 769-773. 10.1038/nature03315.View ArticlePubMedGoogle Scholar
- Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP: MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell. 2007, 27: 91-105. 10.1016/j.molcel.2007.06.017.PubMed CentralView ArticlePubMedGoogle Scholar
- Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N: Widespread changes in protein synthesis induced by microRNAs. Nature. 2008, 455: 58-63. 10.1038/nature07228.View ArticlePubMedGoogle Scholar
- Baek D, Villén J, Shin C, Camargo FD, Gygi SP, Bartel DP: The impact of microRNAs on protein output. Nature. 2008, 455: 64-71. 10.1038/nature07242.PubMed CentralView ArticlePubMedGoogle Scholar
- Khan AA, Betel D, Miller ML, Sander C, Leslie CS, Marks DS: Transfection of small RNAs globally perturbs gene regulation by endogenous microRNAs. Nat Biotechnol. 2009, 27: 549-555.PubMed CentralPubMedGoogle Scholar
- Arvey A, Larsson E, Sander C, Leslie CS, Marks DS: Target mRNA abundance dilutes microRNA and siRNA activity. Mol Syst Biol. 2010, 6: 363-PubMed CentralView ArticlePubMedGoogle Scholar
- Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, Fedorov Y, Baskerville S, Maksimova E, Robinson K, Karpilow J, Marshall WS, Khvorova A: 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods. 2006, 3: 199-204. 10.1038/nmeth854.View ArticlePubMedGoogle Scholar
- Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, Linsley PS: Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA. 2006, 12: 1179-1187. 10.1261/rna.25706.PubMed CentralView ArticlePubMedGoogle Scholar
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447: 799-816. 10.1038/nature05874.View ArticlePubMedGoogle Scholar
- Shkumatava A, Stark A, Sive H, Bartel DP: Coherent but overlapping expression of microRNAs and their targets during vertebrate development. Genes Dev. 2009, 23: 466-481. 10.1101/gad.1745709.PubMed CentralView ArticlePubMedGoogle Scholar
- Guo H, Ingolia NT, Weissman JS, Bartel DP: Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010, 466: 835-840. 10.1038/nature09267.PubMed CentralView ArticlePubMedGoogle Scholar
- Evans SJ, Watson SJ, Akil H: Evaluation of Sensitivity, Performance and Reproducibility of Microarray Technology in Neuronal Tissue. Integr Comp Biol. 2003, 43: 780-785. 10.1093/icb/43.6.780.View ArticlePubMedGoogle Scholar
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36: D154-158. 10.1093/nar/gkn221.PubMed CentralView ArticlePubMedGoogle Scholar
- Linsley PS, Schelter J, Burchard J, Kibukawa M, Martin MM, Bartz SR, Johnson JM, Cummins JM, Raymond CK, Dai H, Chau N, Cleary M, Jackson AL, Carleton M, Lim L: Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol Cell Biol. 2007, 27: 2240-2252. 10.1128/MCB.02005-06.PubMed CentralView ArticlePubMedGoogle Scholar
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles-database and tools update. Nucleic Acids Res. 2007, 35: D760-D765. 10.1093/nar/gkl887.PubMed CentralView ArticlePubMedGoogle Scholar
- Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W: Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004, 14: 708-715. 10.1101/gr.1933104.PubMed CentralView ArticlePubMedGoogle Scholar
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006.PubMed CentralView ArticlePubMedGoogle Scholar
- De Ferrari L, Aitken S: Mining housekeeping genes with a Naive Bayes classifier. BMC Genomics. 2006, 7: 277-10.1186/1471-2164-7-277.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.PubMed CentralView ArticlePubMedGoogle Scholar
- Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, Lin C, Socci ND, Hermida L, Fulci V, Chiaretti S, Foa R, Schliwka J, Fuchs U, Novosel A, Muller RU, Schermer B, Bissels U, Inman J, Phan Q, Chien M, Weir DB, Choksi R, De Vita G, Frezzetti D, Trompeter HI, et al: A mammalian microRNA expression atlas based on small RNA library sequencing. Cell. 2007, 129: 1401-1414. 10.1016/j.cell.2007.04.040.PubMed CentralView ArticlePubMedGoogle Scholar
- Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, Schubeler D: Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007, 39: 457-466. 10.1038/ng1990.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.