All multicellular organisms embed endogenous circadian oscillators or clocks that rhythmically regulate a wide variety of processes in response to daily environmental cycles. Previous molecular studies using rhythmic mutants for several model systems have identified a set of genes responsible for rhythmic activities and illustrated the molecular mechanisms underlying how disruptions in circadian rhythms are associated with the sort of aberrant cell cycling. However, the wide use of these forward genetic studies is impaired by a limited number of mutations that can be identified or induced only in a single genome, limiting the identification of many other conserved or non-conserved clock genes. Genetic linkage or association mapping provides an unprecedented glimpse into the genome-wide scanning and characterization of genes underlying circadian rhythms. The implementation of sophisticated statistical models into genetic mapping studies can not only identify key clock genes or clock quantitative trait loci (cQTL) but also, more importantly, reveal a complete atlas of the genetic control mechanisms constituted by gene interactomes. Here, we introduce and review an advanced statistical mechanics framework for coalescing all possible clock genes into intricate but well-organized interaction networks that regulate rhythmic cycles. The application of this framework to widely available mapping populations will reshape and further our understanding of the genetic signatures behind circadian rhythms for an enlarged range of species including microbes, plants, and humans.

The 24-h rotation of the Earth causes predictable changes in light and temperature in our natural environment. Accordingly, all living organisms from microorganisms to insects, plants, and mammals exhibit circadian rhythms, i.e., sustained oscillations with a period close to 24 h.1 Circadian rhythms are mediated by an internal body clock, which appears nearly ubiquitous in life and regulates a wide array of metabolic and physiological functions, such as hormone production, cell regeneration, brain wave activity, and organism behavior.2–5 Disruptions in biological rhythms can be associated with aberrant cell cycling, ultimately leading to disease such as tumorigenesis, cardiovascular disease, and neurodegenerative disorders,6–8 as well as reduced productivity in plants.9–12 

The biological process of circadian rhythms involves three fundamental components: input pathways that transmit environmental cues to the circadian clock; the clock gene itself, which generates the biological rhythm; and output pathways that entrain the clock's information regarding phase and periodicity to the rest of the organism.13 Extensive molecular studies have successfully identified specific clock genes that regulate an organism's cyclic response to its surrounding environment.14–16 The first clock gene, per, was characterized and cloned in Drosophila,17–19 which was subsequently found to regulate circadian rhythms through its protein product PER.20–23 However, the question of how the PER protein enters the nucleus to act as a transcriptional factor was not answered until the second clock gene, tim, was discovered.24,25 Takahashi and his group found that the transcription of tim and per were crucial for sustaining an autonomous oscillation that is activated by a positive input, clock, the first clock gene detected in mammals,26–28 that functions through the CLOCK-BMAL1 heterodimeric transcription factor.29 Tremendous efforts have been made to understand the molecular basis of how the clock genes receive input signals, drive their entrainment, and regulate cellular aspects of circadian rhythms.1,8

With the continuous improvement of molecular and cloning techniques, an increasing number of clock genes have been detected and characterized.1,30–34 These genes were revealed to encode proteins through multiple interconnected transcriptional and translational feedback loops, having various impacts on physiological and behavioral rhythmicity.1,16,34–38 Despite this progress, a complete characterization of clock genes and their rhythmic functions is still far from clear. First, the current identification of molecular components for circadian clocks is mostly based on forward genetics approaches that utilize mutants with abnormal behavioral cycles to map genes.39 In practice, only a limited number of rhythm mutations can be detected or induced in a single genome for several well-studied model systems, such as cyanobacteria, Neurospora crassa, Arabidopsis thaliana, Drosophila melanogaster, and Mus musculus (mice).1,40,41 It is largely unknown how clock genes occur and function in those organisms for which mutants are hardly available. Second, although circadian rhythms are omnipresent, their underlying molecular mechanisms are not conserved among evolutionarily divergent organisms,42–45 which makes it difficult or even impossible to chart a complete picture of clock machineries from only model systems.

Linkage or association mapping is a forward genetic tool that can serve as an alternative approach for clock gene detection.46 This approach, not relying on rhythmic mutants, can take advantage of increasingly available genotypic and sequencing data collected at unprecedented resolution for almost all species and make full use of considerable allelic variation in clock function that has been accumulated during evolution in natural populations.47,48 It displays a formidable ability to map a complete set of quantitative trait loci (QTLs) throughout the genome that control a rhythmic trait. For example, using linkage mapping or association studies, important clock QTLs (cQTLs) that mediate rhythmic activities have been mapped in several species.49–55 These studies perform association analysis between marker genotypes and chronophenotypes to identify and map significant genetic loci. To leverage the biological relevance of cQTL detection, several dynamic mapping approaches have been proposed to characterize how QTLs globally regulate the periodic pattern and form of circadian rhythms expressed in various stages from gene expression to protein turnover to metabolic rhythm and ultimately to cell cycles.56–63 These dynamic approaches, referred to collectively as functional mapping or systems mapping (reviewed in Refs. 64–66) integrate mathematical aspects of circadian rhythms into a mapping setting, and provide a capacity to test the temporal trajectories of genetic effects, exerted by cQTLs, on rhythmic patterns.

Existing mapping models were developed to detect individual significant QTLs from a large pool of genome-wide molecular markers. These models work well in specific situations, but may not work for rhythmic mapping because circadian clocks involve a number of heterogeneous genes that act singly and work together via local or non-local interactions. The past three decades have seen the tremendous development of statistical models for reconstructing interaction networks from gene expression data (see a number of excellent reviews67–70 from different perspectives). Reconstruction of genetic networks at the QTL level is much more challenging, although a genome-wide QTL-QTL interaction network can provide direct insight into the genetic control of complex biological processes. More recently, Jiang et al.71 have proposed a statistical model for mapping QTL networks underlying developmental trajectories by integrating functional mapping and evolutionary game theory. We argue that this model can be modified to map new cQTLs for rhythmic processes and unveil how these cQTLs interact with each other through an intrinsic but well-orchestrated network. Here, by reviewing the fundamental utility of this model to study clock genetics, we augment it into a generic paradigm in which genome-wide interactome networks can be inferred at any dimension. We also integrate QTL control networks and gene regulatory networks to establish an intertwined bidirectional, signed, and weighted circuit that can better reveal key organizing principles of how biological rhythms are regulated through a web of interacting cQTLs and transcriptomic networks.

Different from traditional forward genetics, reverse genetics based on mapping or association studies can simultaneously detect and map a wide array of new genes regulating rhythmic pattern and function for any species, without relying on the characterization of rhythmic mutants. Also, unlike the notion of clock genes limited to transcriptional genes and their products in forward genetics, clock mapping studies based on reverse genetic thinking are concerned with gene detection that covers an entire domain of the central dogma of biology ranging from DNA to RNA to proteins to cellular physiology and behavior. To clarify some issues, we provide several relevant notations. In a mapping population, we genotype DNA markers, e.g., single nucleotide polymorphisms (SNPs), measure the profiles of genes and their products, and phenotype complex traits, aimed at illustrating a DNA to RNA to phenotype pathway. Networks composed of DNA markers are called genetic or SNP-SNP interaction networks, those composed of transcriptional genes gene (regulatory) networks, and those composed of phenotypic traits phenotypic networks. We use positive or negative epistasis to define interactions between different markers and synergism and antagonism to define the pattern of how transcriptional genes (or phenotypic traits) are co-regulated. We assume that phenotypic traits are causally regulated by transcriptional genes, which are controlled by DNA markers. We refer to the DNA markers that are significantly associated with gene expression or phenotypic variation QTLs.

Circadian oscillators function through highly interconnected, autoregulatory gene networks that contain transcription-translation feedback loop motifs.1,16,34,72 To accommodate this complexity of rhythmic processes, we describe a general framework for reconstructing SNP-SNP interaction and gene regulatory networks that cover all genome-wide genes in order to chart a complete atlas of the molecular mechanisms underlying circadian rhythms. Suppose there is a circadian clock constituted by a number of genes that are rhythmically expressed to regulate behavior and physiological traits in a way that conforms to the daily environmental cycle of light/dark. All these processes are encoded by an unknown number of DNA variants distributed throughout the genome. An oscillating clock is large in dimension, complex in structure, heterogeneous in organization, and diverse in function. Despite these recalcitrant characteristics, we can view it as a multiplayer game. Originating in economic research,73 game theory studies and models the payoff of one player based on the strategy implemented by the other player. The application of game theory has been largely popularized by the concept of the Nash equilibrium, a proposed solution of a non-cooperative game, in which each rational agent tends to choose an optimal strategy to maximize its payoff, conditional on the strategies of its opponents, as long as the latter remains unchanged.74 By combining game theory and evolutionary biology, Smith and Price75 formulated evolutionary game theory to interpret how frequency-dependent fitness drives strategies to evolution.76 This theory's core is the concept of an evolutionarily stable strategy regarded as an equilibrium refinement of the Nash equilibrium and its extension to population evolution. However, Smith and Price's evolutionary game theory serves as the static analysis tool of evolutionary stability because it does not attempt to model how strategies change in a population. By adding the time dimension, we expand evolutionary game theory to its dynamic domain, making it possible to explicitly model the change of strategy frequencies in the population. Such a dynamic evolutionary game theory (dEGT) does not need to specifically define a notion of evolutionary stability because, by specifying a population dynamic model, all of the standard stability concepts may be used to characterize dynamical systems. As such, if a dynamic model is developed, we can implement dEGT to characterize how a player achieves its payoff differently over time through its own strategy and the strategies implemented by other players.

We interpret a circadian clock through the lens of dEGT. We view entities comprising the clock as interactive players. The magnitude at which an entity regulates rhythmicity depends on the intrinsic capacity of this entity and the extrinsic influences of other entities on it. Let gj(t) denote the net effect of entity j on a rhythmic trait at time t (t = t1, …, tT), which can be characterized by a nonlinear Lotka-Volterra ODE. A whole rhythmic network is composed of q such equations, expressed as

gj(t)=Qjgjt:Θj+j=1,jjqQjjgjt:Θjj,j=1,,q,
(1)

where the net effect of entity j includes two components: independent effect, Qj(gjt:Θj), which is expected to occur when this entity is assumed to be socially isolated, and accumulated dependent effect ΣQjj(gjt:Θjj), which results from the influence of other entities j (j = 1, …, j–1, j + 1, …, m) on the focal entity. The pattern and strength of how an entity acts independently are determined by its own innate strategy, whereas how and how much the action of this entity is affected depend on the strategies of other entities. Thus, we express the independent effect of entity j as a function of gj(t) and its dependent effect as a function of gjt. Equation (1) represents a mathematically formulated dEGT framework for systematically characterizing inter-entity interactions and their impacts on circadian changes. Given the uniqueness of the above derivation procedure, we call networks reconstructed under this procedure systems evolutionary game networks (SEGNs).

To reconstruct the SEGN at the DNA level, we need to obtain genetic effects of individual SNPs on biological rhythmicity, which can be used to formulate the nonlinear equations, as described by Eq. (1). At this time, j denotes a SNP, q denotes the number of SNPs, and gjt denotes the genetic effect of SNP j on circadian rhythms at time t. Functional mapping is a dynamic mapping approach that can estimate the temporal pattern of genetic effect or genetic variance due to single significant SNPs or single SNP pairs chosen from a genome-wide pool of markers for any mapping or association populations.64–66 This approach has proven itself to be powerful for QTL mapping in a wide variety of species.77–84 We initiate a mapping or association study composed of n individuals that are genotyped at q genome-wide SNPs and phenotyped for a rhythmic trait repeatedly at a series of T discrete time points. Let yi = (yi(t1), …, yi(tT)) denote a vector of measured values of a rhythmic trait for individual i at T discrete time points. Consider a SNP with K genotypes whose observations are denoted by nk (k = 1, …, K), respectively. Functional mapping formulates a likelihood for n trait vectors at this SNP, expressed as:

Lμ;Σ=k=1Ji=1nkfkyi|μk;Σ,
(2)

where fkyi|μk;Σ is a T-dimensional normal distribution for individual i with mean vector for genotype j (μk) and covariance matrix Σ. Functional mapping implements biologically meaningful mathematical equations of trait formation to model genotype-typical mean vectors.77 Many parametric approaches, such as the first-order autoregressive [AR(1)] model,77,78 the first-order structured antedependence [SAD(1)] model,84,85 the autoregressive moving average (ARMA) model,86 and Brownian motion process, have been used to fit the structure of the covariance matrix. Compared with parametric approaches, nonparametric approaches based on B-splines or Legendre Orthogonal Polynomials (LOP) may better model the covariance structure.87–89 The best approach that structures the longitudinal covariance matrix of real data can be chosen based on information criteria. Joint mean-covariance modeling in functional mapping can enhance the biological relevance of QTL detection and its statistical power.

In rhythmic biology, the cyclic change of trait values can be approximated by mathematical functions.90,91 Fourier series are considered one of the universal approximators for rhythmic models.58,59,92–95 We model time-varying genotypic values in μk by the Fourier signal, expressed as

μk(t)=ak0+r=1Rakrcos2πrtTk+r=1Rbkrsin2πrtTr,
(3)

where ak0 is the trait mean of genotype k over time; Tk is the period of rhythmic cycle for genotype k; akr and bkr are the coefficients of cosines and sines, respectively, at the rth harmonic, from which the amplitude and phase of rhythmic change for genotype k are calculated as Akr=akr2+bkr2 and ϕkr=tan1(bkr/akr); and R is the number of harmonics that best fits the observed data by statistical reasoning. A number of parametric, nonparametric, or semiparametric approaches have been developed to model the covariance matrix.59,87

Statistical algorithms are implemented to solve the likelihood of Eq. (2) and obtain the maximum likelihood estimates (MLEs) of the Fourier series parameters (ak0, Tk, akr, bkr) for each genotype j. We plug these estimates into Eq. (3) to calculate time-varying genotypic values of each genotype from which we calculate time-varying genetic effects or variances explained by the SNP under consideration. In a hypothetical example of rhythmic mapping (Fig. 1), we demonstrate how functional mapping can be used to detect genotypic differences in rhythmic curve. At SNP 1, three genotypes AA, Aa, and aa were detected to differ in amplitude but not in phase and period [Fig. 1(a)]. Genotypic differences at SNP 2 follow a different pattern; genotypes AA and Aa differ in amplitude but are similar in phase and period; both differ from genotype aa in the three rhythmic parameters [Fig. 1(b)]. For a backcross or recombinant inbred line mapping population, there are only two genotypic curves at a SNP, whose difference is used to describe the genetic effect of this SNP. For an F2 mapping population or human association study population, three genotypes at a SNP may produce two effects, additive and dominant. These estimated effects at different SNPs are implemented into nonlinear equations for network reconstruction. In quantitative genetics, the contribution of a gene to trait variation can be described by the genetic variance that is explained by this gene. Here, we estimate and use the genetic variance of each SNP for the subsequent modeling. It is interesting to find that genetic variance at a SNP also changes rhythmically, although the pattern of change differs between SNPs [Fig. 1(c)]. Note that gj(t)'s building up ODE equations are time-varying genetic variances accounted for by a SNP j (j = 1, …, p), estimated by functional mapping. Using these SEGN equations, we reconstruct SNP-SNP interaction networks for circadian rhythms.

FIG. 1.

Simulated examples showing functional mapping of circadian rhythms as a first step of SEGN reconstruction. (A) and (B) Estimated rhythmic curves of three genotypes AA, Aa, and aa at each of two randomly chosen SNPs, showing genotypic differences in rhythmic features including phase, period, and amplitude. (C) Rhythmic pattern of genetic variance explained by each SNP. (D) Plot of the genetic variance of SNP 2 against SNP 1 over rhythmic cycles. Data were simulated by mimicking the features of rhythmic data reported in Ref. 137.

FIG. 1.

Simulated examples showing functional mapping of circadian rhythms as a first step of SEGN reconstruction. (A) and (B) Estimated rhythmic curves of three genotypes AA, Aa, and aa at each of two randomly chosen SNPs, showing genotypic differences in rhythmic features including phase, period, and amplitude. (C) Rhythmic pattern of genetic variance explained by each SNP. (D) Plot of the genetic variance of SNP 2 against SNP 1 over rhythmic cycles. Data were simulated by mimicking the features of rhythmic data reported in Ref. 137.

Close modal

In a mapping study, Jiang et al.71 showed that the estimated net genetic variances by functional mapping implemented in Eq. (1) can reconstruct a gene network for growth trajectories. This implementation can characterize the detailed interaction pattern of genes, but it only illustrates the partial architecture of epistasis among a limited set of significant loci that are first chosen from single marker analysis. It is possible that virtually all genes in the genome participate in mediating a complex trait or disease96 such that it is essential to reconstruct a genome-wide SEGN that covers the interactome of all genes.

Although we attempt to encapsulate all genes into circadian networks, this does not mean that we would reconstruct a completely interconnected network in which each gene is linked with every other gene. Instead, we will need to reconstruct sparsely connected networks, in which most genes have a low number of links. Ample evidence from a variety of data analyses suggests that biological networks, ranging from metabolic gene-regulatory to species interaction networks, are sparse, i.e., the percentage of active interactions scales inversely with the network size.97–101 This is different from inanimate networks and telecommunication networks that are connected by a complete number of links. Several studies have begun to explore why interaction networks in living systems universally possess a nonrandom architecture and sparsity.102 It has been generally suggested that sparsity is an emergent property that enables the interactive system to better adapt to newly intervening changes and remain stable after perturbations of the underlying dynamics.103,104

Reconstructing a sparse circadian network based on the ODEs of Eq. (1) is equivalent to identifying and choosing a small number of SNPs that regulate a focal SNP from a huge pool of genome-wide SNPs. To do so, we formulate a regression model to describe the genetic variance value of each SNP as a linear combination of the genetic variance of all other SNPs across time points (equivalent to samples for a traditional regression model). However, because the number of SNPs (q) is significantly larger than the number of time points (T), we encounter the “curse of dimensionality” for model overfitting. We implement least absolute shrinkage and selection operator (LASSO)-based variable selection105 to choose a subset of the most significant SNPs (predictors) that interact with a focal SNP. LASSO can only select at most T SNPs before it saturates, but several versions of its modifications, such as elastic net,106 group LASSO,107 and adaptive group LASSO,108 can address the technical issue of q ≫ T. We can also address this issue by augmenting “sample size” through the curve fitting of genetic variance over time.

The Fourier approximations of genotype values by Eq. (3) allows us to calculate and interpolate an infinite number of genetic variance values of each SNP expressed during the rhythmic cycle, which provide an infinite number of “samples” to perform LASSO-based variable selection with any dimension of SNP data. Using the interpolated genetic variance values denoted as zjt (t = 1, …, T), we formulate a regression model to characterize how a SNP (say j) is affected by all other SNPs (j), expressed as

zjt=αj+j=1,jjqβjzjt+εjt,j=1,,q,
(4)

where αj and βj are the constant and the regression coefficient of SNP j as a predictor, respectively, and εjt is the residual error. The basic principle of using LASSO to disentangle the q ≫ T problem is to minimize squared error loss (L2 loss) under a penalty on the sum of the absolute values of the coefficients (an L1 penalty). Under this principle, the solution of Eq. (4) tends to find estimates of βj (j = 1, …, j–1, j + 1, …, m) that are mostly zero. By plotting the genetic variance of SNP 2 against that of SNP 1 (used in Fig. 1's example), we found that their relationship can be better fitted by a nonparametric approach [Fig. 1(c)]. Thus, we implement a general nonparametric approach for variable selection on a regression model in Eq. (4). In the end, through variable selection procedure, we will find a small subset of the most significant SNPs (say dj) that link with each focal SNP j (j = 1, …, m), which allows us to rewrite the ODEs of Eq. (1) as

gj(t)=Qjgjt:Θj+j=1,jjdjQjjgjt:Θjj,j=1,,m,
(5)

where Qj(gjt:Θj) and Qjj(gjt:Θjj) are defined as above. Although a majority of regression coefficients in Eq. (4) are shrunk to be zero, we pose no constraints on the number of ODEs such that the dimension of networks remains unchanged. Equation (5) affords an interdisciplinary platform on which evolutionary game theory and network theory are cross-pollinated through statistical variable selection to reconstruct a series of gene networks. These networks are high-dimensional (q SNPs), highly sparse (djq), and mobile (as a function of t).

The ODEs in Eq. (5) can be solved by implementing mathematical and statistical algorithms. Genetic variance gjt is calculated from genotypic values μk(t) that have an explicit periodic form, fitted by a Fourier series approximation in Eq. (2), but we do not know about the form of Qj(gjt:Θj) and Qjj(gjt:Θjj). Thus, these two functions can be better fitted by a nonparametric approach. Because of its favorable property as an infinitely differentiable function, we implement the LOP-based approach for smoothing time-varying independent and dependent genetic variances through the parameters Θj and Θjj, respectively. Many mature mathematical techniques have been available for studying numerical or theoretical properties of ODE models (e.g., sensitivity and bifurcation analysis).109–111 In the past decade, many statistical algorithms have emerged for estimating ODE parameters from the noisy data. These methods include Ramsay et al.'s generalized profiling approach,112 Liang and Wu's two-step derivative-based local polynomial regression approach,113 Cao et al.'s penalized least squares method,114 Brunel et al.'s gradient matching approach,115 Li et al.'s regularization estimation approach,116 and Chen et al.'s derivative-free approach.117 Each of these methods has its own advantages and disadvantages in parameter estimation, power, and computational efficiency. For example, the derivative-free approach is very flexible to handle noisy data. Because gene expression data often contain unknown noises, we hybridize the derivative-free approach with the simplex (Nelder-Mead) algorithm under a likelihood setting to obtain the maximum likelihood estimates (MLEs) of ODE parameters in Eq. (5).

Let Pj(t) and Pjjt denote the integrals of the MLEs of Qj(gjt:Θj) and Qjj(gjt:Θjj), respectively. Then, we code Pj(t) as a node and Pjjt as an edge into q-node networks. Such networks, i.e., SEGNs, cover all possible genes that take a part in circadian rhythms along direct or indirect pathways. SEGNs can contextualize bidirectional, signed, and weighted gene interactions into fully informative graphs and, thereby, own many favorable features that are unavailable to commonly used correlation- and Bayesian-based networks.

The phenomenon by which the impact of one gene on a phenotype is determined by other genes is called epistasis.118 Classic quantitative genetic theory can estimate the size of epistasis, but fails to characterize its causality and the direction of its causality.119 The SEGN is the interdisciplinary integration that combines separate perspectives through the development of mechanistic connections among them to establish a more cognitive and empirical approach toward the epistatic identification of clock genes. The pattern of how and how strongly SNP j is affected by SNP j′ can be assessed by Pjjt. If this value is positive, zero, or negative, then this suggests that SNP j′ activates, is neutral to, or inhibits SNP j, respectively. By comparing Pjjt and Pjjt, we can classify SNP-SNP interactions into five qualitatively different types:

  • Positive epistasis by which two interactive SNPs activate each other. This can be seen if both Pjjt and Pjjt are positive.

  • Negative epistasis by which two interactive SNPs inhibit each other. This can be seen if both Pjjt and Pjjt are negative.

  • Directional positive epistasis by which SNP j′ activates SNP j but the latter is neutral to the former. This can be seen if Pjjt is positive but Pjjt is zero.

  • Directional negative epistasis by which SNP j′ inhibits SNP j but the latter is neutral to the former. This can be seen if Pjjt is negative but Pjjt is zero.

  • Altruistic/exploitation epistasis in which one SNP activates the other but the latter inhibits the former. If Pjjt is positive whereas Pjjt is negative, this suggests that SNP j′ offers altruism to SNP j, or say, SNP j exploits SNP j′.

It is possible that the two SNPs may peacefully coexist when they do not affect each other. This can be seen if both Pjjt and Pjjt are zero. The SEGN is also a quantitative network, because each activation or inhibition is quantified by a value. If Pjjt and Pjjt are positive and their values are equal, the positive epistasis of two SNPs j and j′ is regarded as symmetrical positive epistasis. If Pjjt and Pjjt are positive but their values are not equal, then positive epistasis becomes asymmetrical positive epistasis. Similarly, we can distinguish between symmetrical negative epistasis and asymmetrical negative epistasis. Table I condenses the salient features of a SEGN. Taken together, the definitions and interpretations of various patterns of gene interactions can facilitate the exploration of the mass, energetic, or signal basis for each interaction, surpassing the traditional notation of epistasis in terms of its biological relevance.

TABLE I.

Qualitative definition of epistasis and its quantitative characterization by the SEGN model. Note: PjjtandPjjt are the dependent genetic variances of SNP j by SNP j' and SNP j' by SNP j, respectively.

Quantitative description
NoQualitative definitionPjjtPjjtPjjt
Symmetric positive epistasis 
Asymmetric positive epistasis ≠ 
Directional positive epistasis toward j 
Directional positive epistasis toward j' 
Altruism toward j or exploitation by j  
Altruism toward j' or exploitation by j'  
Symmetric negative epistasis 
Asymmetric negative epistasis ≠ 
Directional negative epistasis toward j  
10 Directional negative epistasis toward j'  
11 Coexistence  
Quantitative description
NoQualitative definitionPjjtPjjtPjjt
Symmetric positive epistasis 
Asymmetric positive epistasis ≠ 
Directional positive epistasis toward j 
Directional positive epistasis toward j' 
Altruism toward j or exploitation by j  
Altruism toward j' or exploitation by j'  
Symmetric negative epistasis 
Asymmetric negative epistasis ≠ 
Directional negative epistasis toward j  
10 Directional negative epistasis toward j'  
11 Coexistence  

The central themes of network reconstruction include sparsity, stability, and causality.120–122 As described above, the implementation of ODEs meets the causality property of a network by determining the direction of gene interaction. As shown in Box 1, the statistical procedure for learning the SEGN is formulated under the maximum likelihood and convex optimality setting. Thus, various strategies each SNP chooses to interact with different SNPs can be thought to achieve the maximum stability of the network.121 As predicted by network theory, there is a limit to the number of links owned by each node in a network.123 We can implement variable selection to detect the number of the most significant SNPs that affect a focal SNP. Taken together, we can reconstruct sparse, stable, and casual gene networks for circadian rhythms.

Networks are regarded as snapshots of biological systems at different times. Uncovering the dynamic nature of genetic networks can shed light on the genomic mechanisms that drive circadian rhythms. As a function of time t, Pjjt can be calculated at any time point from t = 0 to T. In an example illustrated by Fig. 2, we simulated 10 SNPs, each with genetic variance spinning cyclically with time in a different manner. We can reconstruct mobile SEGNs using these SNPs along the time axis. We show such SEGNs at three representative time points, 15, 30, and 60 h, in a rhythmic cycle. Although network topologies do not change from time 15 to 60, the quantitative organization of the network dramatically varies from time to time. For example, SNP 3 and SNP 8 establish a relationship of weak symmetric negative epistasis at time 15, but this relationship is changed to sizeable altruistic/repressive epistasis with SNP 8 promoting SNP 3 but with SNP 3 inhibiting SNP 8 at time 30, which becomes even stronger at time 60. In general, the strength of SNP-SNP interactions increases with time. Indeed, SEGNs can be reconstructed instantaneously, which are equipped with a capacity to establish a real-time visualization of genetic networks during biological processes. Such momentary monitoring facilitates the detection of genetic disruption in circadian rhythms, thereby providing a quantitative approach for rhythmically related disorder prediction.

FIG. 2.

Real-time genetic networks reconstructed from 10 simulated SNPs. (A): Genetic variance of each SNP changes rhythmically over time in a different way. (B): Instantaneous SEGNs at times 15, 30, and 60 during rhythmic cycles, showing temporal changes in topological structure and organization. Circles denote SNPs as nodes, whose size is proportional to the magnitude of genetic variance explained by a given SNP. Red and blue arrowed lines denote promotion and inhibition from one SNP to the next, respectively, with strength proportional to the thickness of lines. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

FIG. 2.

Real-time genetic networks reconstructed from 10 simulated SNPs. (A): Genetic variance of each SNP changes rhythmically over time in a different way. (B): Instantaneous SEGNs at times 15, 30, and 60 during rhythmic cycles, showing temporal changes in topological structure and organization. Circles denote SNPs as nodes, whose size is proportional to the magnitude of genetic variance explained by a given SNP. Red and blue arrowed lines denote promotion and inhibition from one SNP to the next, respectively, with strength proportional to the thickness of lines. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

Close modal

It has been widely recognized that biological networks across nearly an entire range of scales from molecules all the way up to the whole organism can be divided into smaller communities or modules that have strong internal interactions but are relatively autonomous with respect to each other.124,125 This phenomenon, called network modularity, has received considerable attention in biological and biomedical research.126–128 Genes within modules function similarly and vary together, but they are independent from the function of other genes. Such structural and functional diversity of gene networks enhance the robustness of biological systems to environmental perturbations, showing a widespread implication for mediating developmental and evolutionary processes. A number of computational algorithms have been developed to detect and characterize modular structure in networks by revealing the occurrence of densely connected groups of vertices, with only sparser connections between groups.129–133 

Genes that display a similar pattern of time-varying gene expression profiles are attributed to the same group. These similarly differentiated genes form the same modules, which are less similar in expression pattern to those from different modules. We implement functional clustering59,60,134 to classify SNPs into an optimal number of distinct groups, each representing a different module within clock networks. Let gj=(gjt1,,gj(tT)) denote a vector of genetic variances due to SNP j at time points (t1, …, tT). To group q SNPs into L modules according to how they act with time, functional clustering formulates a mixture-based likelihood as

Ly=j=1qπ1f1gj++πLfLgj,
(6)

where πl is the proportion of SNPs within module l (l = 1, …, L) to all SNPs, and fl(gj) is the T-variate normal distribution of SNP j over time with mean vector μl=μlt1,,μltT and covariance matrix Σ. The form of rhythmic genetic variance curves may be unknown so that a nonparametric smoothing approach can be used to model mean vector μl. In a simulated example derived from the emulated real data, we found that genetic variance explained by a SNP obeys the pattern of periodic change with time [Fig. 1(C)], in which case Fourier series approximation can be used to model μl for more efficient fitting. Many approaches can be used to model the matrix Σ, including AR(1), SAD(1), and AR(u,v)MA models86 and nonparametric models.87,88 A specific optimal procedure must be formulated to select a model that structures the covariance matrix for a given dataset.

Module proportions and parameters that model mean vectors and the covariance matrix can be estimated by implementing an EM-(Nelder-Mead) simplex hybrid algorithm. In the E step, we calculate the posterior probabilities of each SNP j that belongs to a particular module r, by equation

Πrj=πlflgjπ1f1gj++πLfLgj,
(7)

and in the M step, we estimate the proportion of module l among all genes by equation

πl=1qj=1qΠlj.
(8)

In the M step, the vector-covariance modeling parameters are estimated by the simplex algorithm. The E and M steps are repeated until the estimates are stable. The optimal number of modules, Lo, can be determined by information criteria, such as Akaike information criterion (AIC) or Bayesian information criterion (BIC). Based on the posterior probabilities of each SNP estimated by Eq. (7), we can assign SNPs into these L distinct modules. We used a simulated example to show how functional clustering can be used to classify 35,000 SNPs based on their rhythmic patterns of genetic variation. Under three different orders of the Fourier series approximation, we calculated AIC values when different number of modules are assumed. An optimal number of modules occurs at 24 for the third order where the AIC value is minimized (Fig. 3). We chose 10 of the 24 modules to show that the time-varying trends of genetic variances are dramatically different among rather than within modules. For example, genetic variances of SNPs within modules 1, 6, 7, and 10 spin slightly rhythmically with time, although their values are highly module-dependent. There are remarkable periodic changes in genetic variance for other modules, but the phase, period, and amplitude of rhythmic cycles are largely different among the modules.

FIG. 3.

A diagram of tridimensional SEGN across multiple layers. Top tier: Coarse-grained genetic networks among five modules detected from a complete set of SNPs for a mapping study. Second tier: Some modules are furthered classified into submodules to form genetic networks at higher resolution. Third tier: Some submodules need to be decomposed into different sub-submodules, forming fine-grained genetic networks at the individual SNP level. Red and blue arrows denote promotion and inhibition from one SNP to the next, respectively.

FIG. 3.

A diagram of tridimensional SEGN across multiple layers. Top tier: Coarse-grained genetic networks among five modules detected from a complete set of SNPs for a mapping study. Second tier: Some modules are furthered classified into submodules to form genetic networks at higher resolution. Third tier: Some submodules need to be decomposed into different sub-submodules, forming fine-grained genetic networks at the individual SNP level. Red and blue arrows denote promotion and inhibition from one SNP to the next, respectively.

Close modal

Let ql denote the number of SNPs that belong to module l (l = 1, …, L). We take the time-varying means of genetic variance over all SNPs within the same modules and use these means to reconstruct an L-node interaction network among modules. We can further reconstruct L interaction networks among SNPs within each of the L modules. Thus, a large SNP network is decomposed into multiple functionally different but interconnected network communities based on the theory of biological modularity. If the number of SNPs within a module is still large, a further analysis using functional clustering can be conducted to identify more fine-grained network communities. In the end, we will reconstruct a multilayer SNP interactome network, i.e., SEGN, that encapsulates all types of interactions for a complete set of genome-wide SNPs from a circadian study. Figure 4 illustrates the hierarchical structure of such a multiplayer SEGN, at the top tier of which a network was reconstructed with five modules. Each module contains SNPs that display a similar rhythmic pattern of genetic variance according to a certain criterion, but some modules can be further classified if a more stringent criterion is used. For example, module 1 is dissected into four submodules that build a network at the second tier, among which submodule 3 is further split into 8 sub-submodules to form the third-tier network. Similarly, modules 3 and 5 can be decomposed into more fine-grained networks. Multiplayer networks facilitate the fine detection of gene-gene interactions, but at the same time, cover all SNPs from an association study, thus affording a platform to test a variety of hypotheses regarding the molecular mechanisms of biological clocks at unprecedented resolution and coverage.

FIG. 4.

Fourier series-based functional clustering classifies 35 000 simulated SNPs into different modules. On the basis of AIC, the optimal number of modules is determined to be 22 under the third order Fourier series approximation. Ten modules were chosen to show how different modules are expressed rhythmically in various manners. Thin curves in each plot represent rhythmic changes of the genetic variance of individual SNPs, and the thick curve is the mean fitting curve of all SNPs within a module. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

FIG. 4.

Fourier series-based functional clustering classifies 35 000 simulated SNPs into different modules. On the basis of AIC, the optimal number of modules is determined to be 22 under the third order Fourier series approximation. Ten modules were chosen to show how different modules are expressed rhythmically in various manners. Thin curves in each plot represent rhythmic changes of the genetic variance of individual SNPs, and the thick curve is the mean fitting curve of all SNPs within a module. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

Close modal

Circadian clocks can be regarded as a molecular apparatus composed of clock genes at the transcriptional level. Some of these transcriptional genes regulate rhythmic patterns, whereas others cyclically change their expression in response to the regulation of biological cycles.135–137 Network alteration of these regulating or regulated processes may interpret causes of circadian clock dysregulation for diseased patients.138 Reconstructing such gene regulatory networks (GRNs) from expression data has not only attracted the attention of computational biologists67–70 but also captivated the interest of engineers.120–122 However, although a plethora of computational approaches have been developed, fully informative networks that capture all fundamental properties of interactions, including causality, the direction of causality, feedback cycle, strength, and mobility, are quite lacking. More recently, Chen et al.139 proposed a new computational model for reconstructing mechanistic and dynamic interaction networks from gene expression data at any dimension, showing its potential application to disentangle biological complexities.140 By accommodating the cyclic property of clock genes, this model can be modified and implemented to identify oscillating gene networks.

There has been ample evidence for the genetic control of gene co-expression related to various biological processes.141–143 To our knowledge, there have been no studies thus far that can characterize whether and how specific QTLs control large-scale GRNs for circadian rhythms, but we argue that such QTLs exist, which may affect biological systems at various scales. Our tasks are not only to reconstruct GRNs from gene expression data, as reported in previous studies,67–70,120–122 but also to develop a methodology for mapping QTLs that modulate these GRNs. The identification of QTLs involved in gene networks allow us to gain new insight into the molecular mechanisms of rhythmic processes.

1. Likelihood model

In a mapping or association study population, n individuals are genotyped, monitored for m transcriptional genes, and phenotyped for p rhythmic traits. Transcriptional profiles and phenotypic traits are measured repeatedly at a series of T discrete time points. Let yli = (yli(t1), …, yli(tT)) denote the expression vector of gene l measured for individual i at T time points. Consider a SNP of K genotypes each with a size denoted as nk (k = 1, …, K). The likelihood of gene expression data at this SNP is formulated as

L1μ;Σ=k=1Ki=1nkfky1i,,ymi|μ1k,,μmk;Σ,
(9)

where fk(⋅) is the n-dimensional m-variate normal distribution for m genes across T time points with the mean vector μk for SNP genotype k and covariance matrix Σ. Specifically, we have

μk=μ1k;;μmk=μ1kt1,,μ1ktT;;μmkt1,,μmktT,
(10)

where μlkt is the mean expression level of genotype k for gene l (l = 1, …, m) at time t. The expression amount of each gene includes independent and dependent expression components. The independent component is one that occurs when this gene is assumed to be in isolation, whereas the dependent component is formed due to the regulation of other genes for the focal gene. This argument can be mathematically formulated by a system of genotype-specific ODEs, expressed as

μlkt=Wlkμlkt:Θlk+l=1,lldlkWllkμlkt:Θllk,
(11)

where Wlkμlkt:Θlk is the independent expression component of gene l, determined by this gene's innate capacity expressed as a function of μlkt; Wllkμlkt:Θllk is the dependent expression component of gene l that results from the strategy implemented by gene l, expressed as a function of μlkt; and dlk (≪ m) is the number of genes that significantly influence gene l for genotype k. The determination of dlk is made by a LASSO-based variable selection built on a regression model of yli(t) as a response on yli(t) (l = 1, …, l-1, l + 1, …, m) as predictors over all individuals carrying the same genotype across time points. We allow dlk to vary across SNP genotypes for the same gene, because a focal gene may be regulated by different sets of genes for different genotypes.

We implement a nonparametric approach to smooth Wlkμlkt:Θlk and Wllkμlkt:Θllk, specified by the unknown parameters Θlk and Θllk, respectively. These independent and dependent expression components constitute the time-varying mean vector of gene l for SNP genotype k. Since the basis function is built on μlkt or μlkt, which contains time information, we can still implement autoregressive models, such as AR(1) or SAD(1), to fit the structure of the covariance matrix Σ in the likelihood of Eq. (9). Under the mean-covariance modeling as described above, we implement mathematical and statistical algorithms to solve the likelihood of Eq. (9) and obtain the MLEs of ODE parameters for each SNP genotype. By plugging these MLEs into the independent and dependent expression components in Eq. (11), we reconstruct a series of oscillating m-node sparse gene networks, denoted as Qk(t), for each genotype k.

To determine whether the SNP determines oscillating gene networks, we need to formulate a procedure for hypothesis testing. In the case of no significant QTL, we formulate the null hypothesis that gene networks characterized by ODE parameters in Eq. (11) are invariant among genotypes, i.e., H0: Qk(t)Q(t), under which the likelihood is written as

L0μ;Σ=i=1nfy1i,,ymi|μ1,,μm;Σ,
(12)

where f(⋅) is parameterized by the mean vector of time-varying expression of m genes over all individuals and covariance matrix modeled by an autoregressive process. The likelihood under the alternative hypothesis, i.e., all individuals with the same SNP genotype share the same network, but individuals with different genotypes have different networks, is formulated by Eq. (9). The log-likelihood ratio (LR) calculated under the null and alternative hypotheses is used as a test statistic. The genome-wide critical threshold is determined by permutation tests. A SNP is significant for gene networks if the LR value is beyond a threshold value.

2. Network dissection

Beyond GRNs reconstructed by commonly used correlation-based and Bayesian-based network approaches, Qk(t) can capture all network features by characterizing bidirectional, signed, and weighted gene co-regulation. As shown by Eq. (11), the pattern of gene co-regulation can be assessed by Wll(t). The sign and size of this value reflect the promotion, inhibition, or lack of regulation of gene l′ on gene l. Based on Wll(t) and Wll(t), we can classify gene co-regulation into different types as follows:

  • Synergism by which two genes activate each other if both Wll(t) and Wll(t) are positive.

  • Antagonism by which two genes inhibit each other if both Wll(t) and Wll(t) are negative.

  • Directional synergism by which gene l′ activates gene l, but the latter is neutral to the former if Wll(t) is positive but Wll(t) is zero or vice versa.

  • Directional antagonism by which gene l′ inhibits gene l, but the latter is neutral to the former if Wll(t) is negative but Wll(t) is zero or vice versa.

  • Altruism by which gene l activates gene l′, but the latter inhibits the former if Wll(t) is positive but Wll(t) is negative. Here, the altruism of gene l is the egoism of gene l′.

If both Wll(t) and Wll(t) are zero, then this indicates that the two genes l and l′ peacefully coexist. For synergism and antagonism, we can further define symmetrical synergism and symmetrical antagonism if Wll(t) and Wll(t) of the same sign are equal in size, and asymmetrical synergism and asymmetrical antagonism if Wll(t) and Wll(t) of the same sign are not equal in size.

3. Result interpretation

To explain how a QTL determines gene networks, we consider a simulated example with results illustrated in Fig. 5. Suppose there is a genotyped mapping population designed to characterize the genetic architecture of rhythmic co-regulation of genes in response to the environmental cycle. As an example, we assume that 10 genes are monitored, but the model can handle any number of genes by incorporating network modularity theory. Based on the above LR test, we identify a set of significant QTLs for rhythmic GRNs from a pool of genome-wide SNPs. By reconstructing 10-node gene networks for three different genotypes at a QTL, we found remarkable genotypic differences in network features. First, three genotypes display pronounced differences in the rhythmic pattern of gene expression profiles [Fig. 5(a)]. Notably, 10 genes are expressed more rhythmically in genotype AA than genotype Aa, with both genotypes having a stronger rhythmic pulse than genotype aa. Second, GRNs reconstructed from 10 genes are sparser in genotype AA than genotypes Aa and aa. The three gene networks considerably differ in both structure and organization among genotypes. For example, in QAA(t), genes 9 and 6 form an asymmetric antagonistic relationship, but this relationship does not exist in QAa(t) and Qaa(t). Gene 5 exerts a directional synergistic effect on gene 9 in QAa(t), whereas these genes co-exist peacefully in QAA(t) and Qaa(t). By closely investigating how gene-gene relationships differ among genotypes, one can decipher a global and detailed view into the genetic control mechanisms of specific QTLs on rhythmic GRNs.

FIG. 5.

A simulated example showing how a QTL controls GRNs. A total of 10 genes are assumed to regulate rhythmic activities through their interaction networks. A SNP is regarded as a significant QTL if its genotypes have different gene networks. (A) Rhythmic expression curves of 10 genes varying among three QTL genotype AA, Aa, and aa. (B) Gene networks reconstructed with 10 genes, individually for three different genotypes. Red and blue arrows denote promotion and inhibition from one gene to the next, respectively, with strength proportional to the thickness of lines. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

FIG. 5.

A simulated example showing how a QTL controls GRNs. A total of 10 genes are assumed to regulate rhythmic activities through their interaction networks. A SNP is regarded as a significant QTL if its genotypes have different gene networks. (A) Rhythmic expression curves of 10 genes varying among three QTL genotype AA, Aa, and aa. (B) Gene networks reconstructed with 10 genes, individually for three different genotypes. Red and blue arrows denote promotion and inhibition from one gene to the next, respectively, with strength proportional to the thickness of lines. Data were simulated by mimicking the features of rhythmic data reported in Ref. 136.

Close modal

The procedure described in Sec. IV B can identify and map network QTLs that mediate the overall structure and organization of oscillating networks. Here, we develop a framework to test how a QTL affects the emergent properties of gene networks. The properties of a network can be described by many parameters, including connectivity, describing the number of nodes with which a node links within a network;144closeness, describing the degree of linkage of one node to other genes;145betweenness, reflecting the importance of a node as a bridge across the network;146eccentricity, defined as the longest distance of one node to other nodes;147eigenvector centrality, describing the importance of node to neighboring nodes;148 and PageRank, evaluating the quality and quantity of links to a network.149 As defined, these parameters determine the property of a network from different topological aspects.

For each genotype-specific oscillating gene network, we calculate connectivity, closeness, betweenness, eccentricity, eigenvector, and PageRank at different time points and plot these network parameters against time. By comparing the genotype-specific curves of network properties, we can address the following questions: (1) How does a QTL affect network structure in terms of network parameters? (2) How does a QTL pleiotropically affect different network parameters? Answers to these questions can help understand how the human genome encodes instructions of gene regulation for circadian rhythms and find genetic variants that drive genomic differences distinguishing a healthy status from a diseased status.

Clocks contain cyclic genes that drive behavior and physiology to change rhythmically in response to daily cycles. This process operates through high-dimensional, complex casual networks, and, more likely, is controlled by QTLs. The identification of such QTLs can facilitate our mechanistic understanding of genotype-phenotype relationships. Let (yli; zli) = (yli(t1), …, yli(tT); zsi(t1), …, zsi(tT)) denote the expression vector of gene l (l = 1, …, m) and the phenotypic vector of trait s (s = 1, …, p) measured for individual i at T time points. We extend the likelihood function given in Eq. (9) for a given SNP to include information about both gene expression and rhythmic traits. Under the expanded likelihood, we model the expanded mean vector for SNP genotype k by

μlkt=Wlkμlkt:Θlk+l=1,lldlkWllkμlkt:Θllk13aμskt=Rskμskt:Θsk+s=1,ssbskRsskμskt:Θssk+l=1dskRslkμlkt:Θslk,(13b)
(13)

where Eq. (13a) is described as Eq. (11), which models m-node gene networks, and Eq. (13b) characterizes how p rhythmic traits interact with each other to form phenotypic networks and how gene networks causally regulate rhythmic processes. In Eq. (13b), Rsk(⋅) is the independent phenotypic value of trait s (under the assumption that this trait is isolated), Rssk(⋅) is the dependent phenotypic value of traits s that forms due to the influence of other trait s′, and Rslk(⋅) is the dependent phenotypic value of trait s resulting from the regulation of gene l. The first two terms can reconstruct phenotypic networks that reflect trait-trait interactions, and the third term is used to reconstruct casual networks from genes to phenotypes. These three terms are a function of μskt, the time-varying genotypic mean of trait s; μskt, the time-varying genotypic mean of trait s′; and μlkt, the time-varying genotypic mean of gene l, specified by parameters Θsk, Θssk, and Θslk, respectively, in a nonparametric way.

The expanded likelihood contains a bivariate covariance matrix, composed of m-dimensional covariance submatrix (genes), p-dimensional covariance submatrix (traits), and (m × p)-dimensional covariance submatrix (genes vs traits). A bivariate autoregressive model, such as bivariate SAD(1), has proven to be powerful for modeling the longitudinal structure of large covariance matrices.85 Moreover, the existence of closed forms for the determinant and inverse of the bivariate SAD(1) matrix can increase the computational efficiency, despite the high dimensionality of the covariance matrix.

A similar log-likelihood procedure can be implemented to test whether a SNP is significant in affecting causal networks from genes to rhythmic phenotypes. The advantage of network analysis is that one can identify the roadmap of each node through direct and indirect paths toward a targeted phenotype. A further testing procedure can be developed to identify which genes and which paths play a critical role in mediating casual relationships.

Traditional approaches for mapping genotype-phenotype relationships are mostly based on a marginal single-marker analysis. These approaches are not particularly powerful for studying rhythmic behaviors that contain a large number of genes and their complex genetic interaction networks. In Sec. III, we describe a statistical model for reconstructing QTL networks that can systematically characterize the genetic control of circadian rhythms. To reveal the genomic internal workings behind biological processes from genotype to rhythmic phenotype, we describe a second statistical model for mapping individual QTLs that mediate gene regulatory networks of circadian rhythms. Here, we formulate a framework that unifies the above two models to identify and reconstruct QTL networks of gene networks. Let glj(t) denote the genetic variance of gene l (l = 1, …, m) due to SNP j (j = 1, …, q) at time t (t = 1, …, T). We argue that the effect of SNP j on gene l is different in terms of three scenarios as follows. In scenario 1, SNP j only affects gene l but not any other genes, in a way that is independent from other SNPs. In scenario 2, SNP j receives epistatic interactions in a way that its effect on gene l is influenced by other SNPs. In scenario 3, SNP j pleiotropically affects multiple genes so that its effect on gene l is regulated by other genes. The value of glj(t) that contributes to the genetic architecture is the net consequence of these three scenarios. Using the dEGT model, we decompose glj(t) using the following equations,

gljt=Qljgljt:Θlj+j=1,jjdljQljjgljt:Θljj+l=1,llbljWlljgljt:Θllj,
(14)

where Qljgljt:Θlj is the independent genetic variance of gene l due to SNP j, which is expected to occur when both SNP j and gene l are assumed to be in isolation. Qljjgljt:Θljj is the epistatic dependent genetic variance of gene l due to interactions of other dlj SNPs with SNP j, and Wlljgljt:Θllj is the pleiotropic dependent variance of gene l due to influences of other blj genes under the control of SNP j. Note that the number of the most significant SNPs that interact with SNP j (dljp) and the number of the most significant genes that regulate gene l (bljm) are determined by LASSO-based variable selection. A likelihood is formulated to solve the ODEs of Eq. (14) by implementing LOP-based nonparametric models to smooth the independent genetic variance, the epistatic dependent genetic variance, and pleiotropic dependent variance, with ODE parameters Θlj, Θljj, and Θllj, respectively, and autoregressive models to fit the covariance structure. The MLEs of ODE parameters allow us to reconstruct a tridimensional QTL interaction network of gene networks.

The nonlinear equations can be expanded to include rhythmic phenotypes. Let gsj(t) denote the genetic variance of rhythmic trait s (s = 1, …, p) explained by SNP j (j = 1, …, q) at time t (t = 1, …, T). By adding gsj(t) to the replicator equations, we can reconstruct a tridimensional QTL interaction network of causal gene-phenotype networks for circadian rhythms. To explain how such a tridimensional network works, we hypothesize a clock mediated by three QTLs that regulate three genes and two rhythmic traits (Fig. 6). This tridimensional network includes intertwined pleiotropic networks and epistatic networks across DNA sequences. In the pleiotropic networks, we can characterize how a QTL pleiotropically affects the expression of multiples genes and multiple rhythmic traits to form dynamic causal regulatory networks from genes to phenotypes. Under the control of SNP 1, gene 1 inhibits gene 3 and also promotes gene 2 that promotes phenotype 1, but these causal roadmaps change under the control of the other SNPs. For example, SNP 2 changes the relationship between gene 1 and 3; under its control, gene 1 is inhibited by gene 3, which promotes phenotype 1.

FIG. 6.

A hypothetical clock demonstrating how epistatic networks intertwine with pleiotropic networks during rhythmic cycles. Assume that three genes G1, G2, and G3 regulate rhythmic phenotypes P1 and P2, and this process is controlled by three SNPs. Under the control of each SNP, genes are causally linked with phenotypes through a pleiotropic network (shown on yellow ellipses). For the same gene or phenotype, different SNPs control their expression through epistatic networks (indicated by a dotted triangle). Pleiotropic networks differ in topological structure among SNPs, indicating that different SNPs affect causal gene-phenotypic networks in different ways. SNPs affect a gene or phenotype through different networks (green for G1, purple for G2, green-blue for G3, blue for P1, and black for P2), suggesting that each gene or phenotype is encoded by a different genetic system. Red and blue arrows denote promotion and inhibition from one gene to the next, respectively.

FIG. 6.

A hypothetical clock demonstrating how epistatic networks intertwine with pleiotropic networks during rhythmic cycles. Assume that three genes G1, G2, and G3 regulate rhythmic phenotypes P1 and P2, and this process is controlled by three SNPs. Under the control of each SNP, genes are causally linked with phenotypes through a pleiotropic network (shown on yellow ellipses). For the same gene or phenotype, different SNPs control their expression through epistatic networks (indicated by a dotted triangle). Pleiotropic networks differ in topological structure among SNPs, indicating that different SNPs affect causal gene-phenotypic networks in different ways. SNPs affect a gene or phenotype through different networks (green for G1, purple for G2, green-blue for G3, blue for P1, and black for P2), suggesting that each gene or phenotype is encoded by a different genetic system. Red and blue arrows denote promotion and inhibition from one gene to the next, respectively.

Close modal

In the epistatic networks (Fig. 6), we can characterize how different SNPs interact with each other to determine the expression of a gene or a phenotypic trait. Gene 1 is affected by an epistatic network in which SNP 1 promotes SNP 3 and SNP 2, whereas SNP 2 inhibits SNP 3. Phenotype 1 is affected by an epistatic network that is structurally the same as but quantitatively different from the epistatic network of gene 1. For other genes and phenotypes, we found distinct differences in epistatic network topology. Taken together, the tridimensional network charts the change of the pleiotropic landscape of genes and phenotypes from SNP to SNP and the change of the epistatic landscape of SNPs from genes to phenotypes.

A clock contains numerous cyclically expressed genes that mediate biological rhythms, a process encoded by the genome. Over the past decades, mutagenesis-based molecular genetic analysis has considerably contributed to the identification of clock genes that are required for rhythmic oscillations in response to the light/dark cycle, establishing the fundamental understanding of the genetic mechanisms underlying circadian rhythms. The 2017 Nobel Prize in Physiology or Medicine was awarded to Jeffrey C. Hall, Michael Rosbash, and Michael W. Young for their pioneering work in this establishment.20 With the advent of advanced sequencing techniques, current molecular studies have shifted from the identification of individual rhythmic genes to the genome-wide landscaping of transcriptomic genes.136,137,150 This paradigm shift has led to the discoveries of a number of new genes that rhythmically synchronize cellular metabolism and organismal behavior through the internal oscillators, or clocks.

The statistical models reviewed in this article can facilitate the promotion of this shift to a generic and wide domain without relying on the use of rhythmic mutants. As a routine genetic approach, linkage or association mapping populations have been produced worldwide for a wide array of species during the past three decades. These populations provide a rich biobank of genetic variants that may be responsible for rhythmic variation.49,151,152 More importantly, mapping approaches can stimulate the discoveries of new or non-conserved clock genes that are involved in circadian rhythms at the transcriptional level and beyond.

A number of statistical methods have been widely developed and applied to map circadian rhythms. One model, called functional mapping, integrates the mathematical aspects of circadian rhythms to map how a cQTL regulates molecular and physiological profiles rhythmically and to test by which parameter, period, phase, or amplitude the cQTL determines the temporal pattern of circadian rhythms.56–62 Because of a full use of longitudinal measures across multiple points, functional mapping can increase the power of QTL detection. However, most existing models aim to find individual cQTLs, failing to characterize the genetic complexity of rhythmic physiology and behavior. The omnigenic theory even suggests that a complex trait is determined by all genome-wide distributed genes carried by an organism.96 Thus, it is highly crucial for reconstructing a systematic network of how each gene acts and interacts with every other gene to contribute to phenotypic variation. The inference of such an omnigenic network is statistically challenging, but once reconstructed, it can provide a powerful tool to extract and excavate the new organizing principles of circadian rhythms.

In this article, we assemble and integrate advanced approaches for QTL functional mapping and gene network reconstruction through high-dimensional statistical modeling into a unified framework for inferring large-scale genetic networks that encompass circadian clocks. Different from GRNs widely reviewed in a range of biological, physical, and engineering literature, this review represents the first among its kind regarding SNP interactome networks. The reviewed statistical methodology overcomes several technical issues, typical of SNP-SNP network reconstruction. First, GRNs are reconstructed from continuous or semi-continuous abundance data that directly reflect the expression levels of different genes, whereas SNP data describe discrete genotypes of different individuals, which become meaningful for network reconstruction only after genotypes are associated with phenotypes of interest. We translate genetic information carried by individual SNPs into their continuous genetic effects by functional mapping, with which a series of nonlinear Lotka-Volterra equations are derived on the basis of evolutionary game theory. These equations establish a basis for a graph theory that codes SNP-SNP interactions into mathematical networks, i.e., SEGNs.

Second, since a complex trait may be controlled by all genes an organism carries, there is a necessity to reconstruct interaction networks that cover a complete set of genes. However, reconstructing a completely linked network from high-dimensional gene data is highly challenging in practice and also is not meaningful from an evolutionary perspective. We implement two advanced statistical approaches, functional clustering and variable selection, to classify all SNPs into distinct modules according to their similarity of temporal genetic effects and select a small set of the most significant SNPs that influence a focal SNP. This type of implementation facilitates the reconstruction of multilayer, sparse, and large-scale genetic networks filled by all SNPs from a mapping or association study. Third, a SNP is determined to be insignificant by commonly used marginal statistical approaches, such as functional mapping, but this detection is the net consequence of the intrinsic action of this SNP and its interactions with other SNPs. The statistical model reviewed in this article can decompose the net effect of a SNP into its independent effect, expected to occur when this SNP is assumed to be in isolation, and dependent effect, resulting from the interactions of other SNPs with the focal SNP. Thus, the insignificance of a SNP by marginal mapping does not necessarily indicate that this SNP is not important in mediating circadian rhythms since it may be confounded by its negative epistasis triggered by other SNPs. Thus, by knocking out these epistatic SNPs, we may clearly understand and better use the genetic effect of this SNP on rhythmic activities.

Fourth, to reveal the causal links from genotype to high-order phenotype, increasing studies have begun to integrate transcriptional, proteomic, and metabolomic profiles into mapping paradigms. However, most of these studies model the individual roles of different genes, proteins, or metabolites, rather the synthetic role of all these entities as a cohesive whole, in mediating genotype-phenotype relationships. By reconstructing a series of networks of networks, the statistical methods reviewed in this article leverage networks as a backbone of linking genotype to phenotype. Networks inferred at a single level of biological organization have been used in the past, but charting information flows of horizontal networks from lower microscopic organization levels (upstream) of molecules to higher macroscopic levels (downstream) of the whole organism via vertical networks has not been explored. Reconstructing intertwined networks is founded on the fundamental premise that networks at different scales share similar global statistical features and structural design principles. The reviewed methods will potentially fill the gap in the systematic, mechanistic characterization of holistic genotype-phenotype relationships for network biology and network medicine.

From the networks reconstructed at different scales, we can not only characterize key significant QTLs responsible for circadian rhythms but also chart a global picture of how each (significant or insignificant) QTL interacts with every other possible QTL to regulate rhythmic phase, amplitude, and period. A detailed analysis of rhythmic networks enables the discovery of intricate but well-orchestrated structural design principles underlying circadian rhythms. We describe and review the advanced statistical methods for reconstructing genetic networks underlying circadian rhythms, but these methods can be readily extended to study any other types of biological rhythms. The methods can be improved in several aspects for a broader use. One prominent research direction is the incorporation of environmental factors, spatial scales (such as different tissues or cells), and physiological states into network reconstruction at different levels, allowing the fundamental questions of how environmental and developmental agents affect network topology or how alteration of gene networks mediates rhythmic changes in physiology and behavior to be addressed. We deploy a general statistical procedure to establish a network framework, but mathematical, statistical, computational solutions of framework are likely to remain data-dependent. Optimal techniques should be developed to suit the given datasets, especially when the data are heterogeneous. Finally, and not least, we need to closely collaborate with experimental biologists or clinicians to justify the networks reconstructed by the statistical models reviewed by performing in-vitro or in-vivo experiments. Gene discoveries made by these justified methods will greatly advance the study of chronobiology and the development of chronotherapies.

Let yj = (yj(t1), …, yj(tT)) denote a vector of estimated time-varying genetic variance values explained by SNP j (j = 1, …, m). Given these data, the likelihood function of model parameters ϕ = (μ, Σ)Φ is written as

L(μ,Σ)=f(y1,,ym:μ,Σ),
(15)

where f(⋅) is the m-variate tT-dimensional longitudinal multivariate normal distribution with mean vector g = (g1, …, gm) with gi = (gi(t1), …, gi(tT)) and the covariance matrix of ei(t), Σ. As described below, we will model mean-covariance structures.

The time-varying genetic variance of each SNP is modeled by a system of ODEs described by Eq. (1) containing the independent and dependent components. Each component is fitted by a nonparametric approach, such as B-splines, regression B-spline, penalized B-spline, local polynomials, and Legendre orthogonal polynomials (LOP). Because of its advantage in orthogonality and efficient convergence, LOP has been used to model the curves of any complex form using sparse data in quantitative genetic studies.88,153 The LOP is a solution of the Legendre differential equation, (1 – v2)(d2u/dv2) – 2v(du/dv) + r(r + 1)u = 0. Let PjR(t) = (Pj1(t), …, PjR(t)) denote a vector of LOP including the first R orders for SNP j at time t, and αj = (αj1, …, αjR) denote a vector of basis values of time-invariant independent genetic variance of SNP j. Then, the independent genetic variance of SNP j is expressed as

git=PiRTtαi.
(16)

An optimal order of LOP may be SNP-specific; i.e., each SNP may have a different LOP order. The optimal order for each SNP j can be determined via an information criterion, such as AIC or BIC. Similarly, we use LOP to model time-varying dependent genetic variances. Time-invariant independent and dependent genetic variances, arrayed in Θj and Θjj, respectively, are the unknown ODE parameters to be estimated [see Eq. (1)].

Since the residual covariance matrix Σ contains an autocorrelative structure, its structural modeling using a parsimonious time-series approach can increase the precision of ODE parameter estimation and computational efficiency. Given its power to structure the longitudinal covariance of quantitative traits in genetic mapping studies,84,85 the structured antedependence (SAD) model, developed by Zimmerman and Núñez-Antón,154 is implemented into likelihood [see Eq. (15)]. The SAD assumes that residual errors at time t are not only composed of innovation errors specifically produced at this time point but also contain a proportion of the residual errors from the preceding time points. The size of this proportion, i.e., the degree of antedependence (ρ), decays with time lag. The first-order SAD [SAD(1)] only considers the dependence of errors at the immediate time point. The innovation error for SNP j is iid with mean zero and variance δj2, which is assumed to be constant across time points. Two parameters, ρj and δj2, can well model the structure of Σ.

With joint ODE and SAD(1) modeling, the parameters involved in likelihood [see Eq. (15)] are re-written as ϕ = Θj,Θjj,ρj,δj2jj=1m. We can obtain the optimal solution of these parameters by maximizing the likelihood of Eq. (15), expressed as

ϕ̂argmaxϕΦLμ,Σ.

We implement a hybrid algorithm of the fourth-order Runge-Kutta (RK4) algorithm and simplex approach to solve the likelihood incorporated by a system of LOP-transformed ODEs [Eq. (1)]. Intuitively, this maximization that makes the data most probable implies an optimal topological structure and organization by which genes interact with each other to maximize the joint expression of all genes. This solution is therefore consistent with the basic principle of evolutionary game theory.75 

L.S. conceived of the biological topic and contributed to writing. A.D. performed data analysis and wrote the code. C.G. critically reviewed the manuscript and provided insight into the integration of game theory. R.W. supervised the project and drafted the manuscript with inputs from L.S. and C.G.

L.S.'s work was supported by the National Natural Science Foundation of China (Grant No. 31870689), Forestry and Grassland Science and Technology Innovation Youth Top Talent Project of China (Grant No. 2020132608), and the National Key Research and Development Program of China (Grant No. 2018YFD1000401). C.G.'s work was supported in part by the National Science Foundation (Grant No. DMS-1814876).

The data and code can be available upon request from the corresponding author.

1.
M. W.
Young
and
S. A.
Kay
, “
Time zones: A comparative genetics of circadian clocks
,”
Nat. Rev. Genet.
2
(
9
),
702
715
(
2001
).
2.
U.
Albrecht
, “
Timing to perfection: The biology of central and peripheral circadian clocks
,”
Neuron
74
,
246
260
(
2012
).
3.
S.
Sahar
and
P.
Sassone-Corsi
, “
Regulation of metabolism: The circadian clock dictates the time
,”
Trends Endocrinol. Metab.
23
(
1
),
1
8
(
2012
).
4.
G. K.
Paschos
and
G. A.
FitzGerald
, “
Circadian clocks and metabolism: Implications for microbiome and aging
,”
Trends Genet.
33
(
10
),
760
769
(
2017
).
5.
Y.
Serin
and
N.
Acar Tek
, “
Effect of circadian rhythm on metabolic processes and the regulation of energy balance
,”
Ann. Nutr. Metab.
74
(
4
),
322
330
(
2019
).
6.
J.
Bass
and
M. A.
Lazar
, “
Circadian time signatures of fitness and disease
,”
Science
354
(
6315
),
994
999
(
2016
).
7.
J. T.
Bass
, “
The circadian clock system's influence in health and disease
,”
Genome Med.
9
(
1
),
94
(
2017
).
8.
F.
Rijo-Ferreira
and
J. S.
Takahashi
, “
Genomics of circadian rhythms in health and disease
,”
Genome Med.
11
(
1
),
82
(
2019
).
9.
Y.
Ouyang
,
C. R.
Andersson
,
T.
Kondo
,
S. S.
Golden
, and
C. H.
Johnson
, “
Resonating circadian clocks enhance fitness in cyanobacteria
,”
Proc. Natl. Acad. Sci.
95
,
8660
8664
(
1998
).
10.
A. N.
Dodd
,
N.
Salathia
,
A.
Hall
 et al, “
Plant circadian clocks increase photosynthesis, growth, survival, and competitive advantage
,”
Science
309
,
630
633
(
2005
).
11.
J. A.
Kim
,
H. S.
Kim
,
S. H.
Choi
 et al, “
The importance of the circadian clock in regulating plant metabolism
,”
Intl. J. Mol. Sci.
18
(
12
),
2680
(
2017
).
12.
D.
Srivastava
,
M.
Shamim
,
M. A.
Kumar
 et al, “
Role of circadian rhythm in plant system: An update from development to stress response
,”
Environ. Exp. Bot.
162
,
256
271
(
2019
).
13.
E. D.
Buhr
and
J. S.
Takahashi
, “
Molecular components of the mammalian circadian clock
,”
Handb. Exp. Pharmacol.
217
,
3
27
(
2013
).
14.
P. L.
Lowrey
and
J. S.
Takahashi
, “
Genetics of circadian rhythms in mammalian model organisms
,”
Adv. Genet.
74
,
175
230
(
2011
).
15.
R. C.
Huang
, “
The discoveries of molecular mechanisms for the circadian rhythm: The 2017 Nobel Prize in Physiology or Medicine
,”
Biomed. J.
41
(
1
),
5
8
(
2018
).
16.
K. H.
Cox
and
J. S.
Takahashi
, “
Circadian clock genes and the transcriptional architecture of the clock mechanism
,”
J. Mol. Endocrinol.
63
(
4
),
R93
R102
(
2019
).
17.
T. A.
Bargiello
and
M. W.
Young
, “
Molecular genetics of a biological clock in Drosophila
,”
Proc. Natl. Acad. Sci.
81
,
2142
2146
(
1984
).
18.
T. A.
Bargiello
,
F. R.
Jackson
, and
M. W.
Young
, “
Restoration of circadian behavioural rhythms by gene transfer in Drosophila
,”
Nature
312
,
752
754
(
1984
).
19.
P.
Reddy
,
W. A.
Zehring
,
D. A.
Wheeler
,
V.
Pirrotta
,
C.
Hadfield
, and
J. C.
Hall
, “
Molecular analysis of the period locus in Drosophila melanogaster and identification of a transcript involved in biological rhythms
,”
Cell
38
,
701
710
(
1984
).
20.
C.
Ibäñez
, “
Scientific background: Discoveries of molecular mechanisms controlling the circadian rhythm
,” Advanced Information, Nobelprize.org. http://www.nobelprize.org/nobel_prizes/medicine/laureates/2017/advanced.html.
21.
M. A.
Rosbash
, “
50-year personal journey: Location, gene expression, and circadian rhythms
,”
Cold Spring Harb. Perspect. Biol.
9
,
a032516
(
2017
).
22.
S. T.
Crews
,
J. B.
Thomas
, and
C. S.
Goodman
, “
The Drosophila single-minded gene encodes a nuclear protein with sequence similarity to the per gene product
,”
Cell
52
,
143
152
(
1988
).
23.
K. K.
Siwicki
,
C.
Eastman
,
G.
Petersen
,
M.
Rosbash
, and
J. C.
Hall
, “
Antibodies to the period gene product of Drosophila reveal diverse tissue distribution and rhythmic changes in the visual system
,”
Neuron
1
,
141
150
(
1988
).
24.
M. P.
Myers
,
K.
Wager-Smith
,
C. S.
Wesley
,
M. W.
Young
, and
A.
Sehgal
, “
Positional cloning and sequence analysis of the Drosophila clock gene timeless
,”
Science
270
,
805
808
(
1995
).
25.
A.
Sehgal
,
A.
Rothenfluh-Hilfiker
,
M.
Hunter-Ensor
,
Y.
Chen
,
M.
Myers
, and
M. W.
Young
, “
Rhythmic expression of timeless: A basis for promoting circadian cycles in period gene autoregulation
,”
Science
270
,
808
810
(
1995
).
26.
M. H.
Vitaterna
,
D. P.
King
,
A. M.
Chang
 et al, “
Mutagenesis and mapping of a mouse gene, Clock, essential for circadian behavior
,”
Science
264
,
719
725
(
1994
).
27.
D. P.
King
,
Y.
Zhao
,
A. M.
Sangoram
 et al, “
Positional cloning of the mouse circadian clock gene
,”
Cell
89
(
4
),
641
653
(
1997
).
28.
M. P.
Antoch
,
E. J.
Song
,
A. M.
Chang
 et al, “
Functional identification of the mouse circadian Clock gene by transgenic BAC rescue
,”
Cell
89
(
4
),
655
667
(
1997
).
29.
N.
Gekakis
,
D.
Staknis
,
H. B.
Nguyen
 et al, “
Role of the CLOCK protein in the mammalian circadian mechanism
,”
Science
280
,
1564
1569
(
1998
).
30.
M. E.
Eriksson
and
A. J.
Millar
, “
The circadian clock: A plant's best friend in a spinning world
,”
Plant Physiol.
132
,
732
738
(
2003
).
31.
M.
Loza-Correa
,
L.
Gomez-Valero
, and
C.
Buchrieser
, “
Circadian clock proteins in prokaryotes: Hidden rhythms?
,”
Front. Microbiol.
1
,
130
(
2010
).
32.
M.
Kamioka
,
S.
Takao
,
T.
Suzuki
 et al, “
Direct repression of evening genes by CIRCADIAN CLOCK-ASSOCIATED1 in the Arabidopsis circadian clock
,”
Plant Cell
28
(
3
),
696
711
(
2016
).
33.
A. K.
Michael
,
J. L.
Fribourgh
,
Y.
Chelliah
 et al, “
Formation of a repressive complex in the mammalian circadian clock is mediated by the secondary pocket of CRY1
,”
Proc. Natl. Acad. Sci.
114
(
7
),
1560
1565
(
2017
).
34.
J.
Takahashi
, “
Transcriptional architecture of the mammalian circadian clock
,”
Nat. Rev. Genet.
18
,
164
179
(
2017
).
35.
J.
Yan
,
H.
Wang
,
Y.
Liu
, and
C.
Shao
, “
Analysis of gene regulatory networks in the mammalian circadian rhythm
,”
PLoS Comput. Biol.
4
(
10
),
e1000193
(
2008
).
36.
N.
Koike
,
S. H.
Yoo
,
H. C.
Huang
 et al, “
Transcriptional architecture and chromatin landscape of the core circadian clock in mammals
,”
Science
338
(
6105
),
349
354
(
2012
).
37.
J. S.
Menet
,
J.
Rodriguez
,
K. C.
Abruzzi
, and
M.
Rosbash
, “
Nascent-Seq reveals novel features of mouse circadian transcriptional regulation
,”
eLife
1
,
e00011
(
2012
).
38.
O. A.
Podkolodnaya
,
N. N.
Podkolodnaya
, and
N. L.
Podkolodnyy
, “
The mammalian circadian clock: Gene regulatory network and computer analysis
,”
Russ. J. Genet. Appl. Res.
5
,
354
362
(
2015
).
39.
N.
Salathia
,
K.
Edwards
, and
A. J.
Millar
, “
QTL for timing: A natural diversity of clock genes
,”
Trends Genet.
18
(
3
),
115
118
(
2002
).
40.
R. J.
Konopka
and
S.
Benzer
, “
Clock mutants of Drosophila melanogaster
,”
Proc. Natl. Acad. Sci.
68
(
9
),
2112
2116
(
1971
).
41.
N.
Cermakian
,
L.
Monaco
,
M. P.
Pando
 et al, “
Altered behavioral rhythms and clock gene expression in mice with a targeted mutation in the Period1 gene
,”
EMBO J.
20
(
15
),
3967
3974
(
2001
).
42.
D.
Bell-Pedersen
,
V.
Cassone
,
D.
Earnest
 et al, “
Circadian rhythms from multiple oscillators: Lessons from diverse organisms
,”
Nat. Rev. Genet.
6
,
544
556
(
2005
).
43.
Y.
Niwa
,
T.
Matsuo
,
K.
Onai
 et al, “
Phase-resetting mechanism of the circadian clock in Chlamydomonas reinhardtii
,”
Proc. Natl. Acad. Sci.
110
(
33
),
13666
13671
(
2013
).
44.
H.
Oike
, “
Modulation of circadian clocks by nutrients and food factors
,”
Biosci. Biotechnol. Biochem.
81
(
5
),
863
870
(
2017
).
45.
E.
Poliner
,
C.
Cummings
,
L.
Newton
, and
E. M.
Farré
, “
Identification of circadian rhythms in Nannochloropsis species using bioluminescence reporter lines
,”
Plant J.
99
(
1
),
112
127
(
2019
).
46.
C.
Miyoshi
,
S. J.
Kim
,
T.
Ezaki
 et al, “
Methodology and theoretical basis of forward genetic screening for sleep/wakefulness in mice
,”
Proc. Natl. Acad. Sci.
116
(
32
),
16062
16067
(
2019
).
47.
K.
Swarup
,
C.
Alonso-Blanco
,
J. R.
Lynn
 et al, “
Natural allelic variation identifies new genes in the Arabidopsis circadian system
,”
Plant J.
20
(
1
),
67
77
(
1999
).
48.
M. U.
Anwer
and
S. J.
Davis
, “
An overview of natural variation studies in the Arabidopsis thaliana circadian clock
,”
Semin. Cell. Dev. Biol.
24
(
5
),
422
429
(
2013
).
49.
K.
Shimomura
,
S. S.
Low-Zeddies
,
D. P.
King
 et al, “
Genome-wide epistatic interaction analysis reveals complex genetic determinants of circadian behavior in mice
,”
Genome Res.
11
(
6
),
959
980
(
2001
).
50.
R. E.
Kerwin
,
J. M.
Jimenez-Gomez
,
D.
Fulop
 et al, “
Network quantitative trait loci mapping of circadian clock outputs identifies metabolic pathway-to-clock linkages in Arabidopsis
,”
Plant Cell
23
(
2
),
471
485
(
2011
).
51.
S. E.
Jones
,
J. M.
Lane
,
A. R.
Wood
 et al, “
Genome-wide association analyses of chronotype in 697,828 individuals provides insights into circadian rhythms
,”
Nat. Commun.
10
,
343
(
2019
).
52.
S. E.
Jones
,
J.
Tyrrell
,
A. R.
Wood
 et al, “
Genome-wide association analyses in 128,266 individuals identifies new morningness and sleep duration loci
,”
PLoS Genet.
12
(
8
),
e1006125
(
2016
).
53.
A.
Jagannath
,
L.
Taylor
,
Z.
Wakaf
 et al, “
The genetics of circadian rhythms, sleep and health
,”
Hum. Mol. Genet.
26
(
R2
),
R128
R138
(
2017
).
54.
A.
Ferguson
,
L. M.
Lyall
,
J.
Ward
 et al, “
Genome-wide association study of circadian rhythmicity in 71,500 UK biobank participants and polygenic association with mood instability
,”
EBioMedicine
35
,
279
287
(
2018
).
55.
M. J.
Rubin
,
M. T.
Brock
,
S. J.
Davis
, and
C.
Weinig
, “
QTL underlying circadian clock parameters under seasonally variable field settings in Arabidopsis thaliana
,”
G3
9
(
4
),
1131
1139
(
2019
).
56.
T.
Liu
,
X. L.
Liu
,
Y. M.
Chen
, and
R. L.
Wu
, “
A computational model for functional mapping of genes that regulate intra-cellular circadian rhythms
,”
Theor. Biol. Med. Model.
4
,
5
(
2007
).
57.
N.
Li
,
T.
McMurry
,
A.
Berg
 et al, “
Functional clustering of periodic transcriptional profiles through ARMA(p,q)
,”
PLoS ONE
5
(
4
),
e9894
(
2010
).
58.
B.-R.
Kim
,
R. C.
Littell
, and
R. L.
Wu
, “
Clustering periodic patterns of gene expression based on Fourier approximations
,”
Curr. Genom.
7
,
197
203
(
2006
).
59.
B.-R.
Kim
,
L.
Zhang
,
A.
Berg
,
J.
Fan
, and
R. L.
Wu
, “
A computational approach to the functional clustering of periodic gene expression profiles
,”
Genetics
180
,
821
834
(
2008
).
60.
B. R.
Kim
,
T.
McMurry
,
W.
Zhao
,
A.
Berg
, and
R. L.
Wu
, “
Wavelet-based functional clustering for high-dimensional dynamic gene expression patterns
,”
J. Comp. Biol.
17
,
1067
1080
(
2010
).
61.
G. F.
Fu
,
J.
Luo
,
A.
Berg
 et al, “
A dynamic model for functional mapping of biological rhythms
,”
J. Biol. Dyn.
4
,
1
10
(
2010
).
62.
G. F.
Fu
,
Z.
Wang
,
J. H.
Li
, and
R. L.
Wu
, “
A mathematical framework for functional mapping of complex phenotypes using delay differential equations
,”
J. Theor. Biol.
289
,
206
216
(
2011
).
63.
K.
Wei
,
Q.
Wang
,
J. W.
Gan
 et al, “
Mapping genes for drug chronotherapy
,”
Drug Discov. Today
23
,
1883
1888
(
2018
).
64.
R. L.
Wu
and
M.
Lin
, “
Functional mapping – how to map and study the genetic architecture of dynamic complex traits
,”
Nat. Rev. Genet.
7
,
229
237
(
2006
).
65.
L. D.
Sun
and
R. L.
Wu
, “
Mapping complex traits as a dynamic system
,”
Phys. Life Rev.
13
,
155
185
(
2015
).
66.
Z.
Li
and
M. J.
Sillanpää
, “
Dynamic quantitative trait locus analysis of plant phenomic data
,”
Trends Plant Sci.
20
,
822
833
(
2015
).
67.
N.
Vijesh
,
S. K.
Chakrabarti
, and
J.
Sreekumar
, “
Modeling of gene regulatory networks: A review
,”
J. Biomed. Sci. Eng.
6
,
223
231
(
2013
).
68.
Y. X.
Wang
and
H.
Huang
, “
Review on statistical methods for gene network reconstruction using expression data
,”
J. Theor. Biol.
362
,
53
61
(
2014
).
69.
V.
Huynh-Thu
and
G.
Sanguinetti
, “
Gene regulatory network inference: An introductory survey
,”
Methods Mol. Biol.
1883
,
1
23
(
2019
).
70.
H.
Yaghoobi
,
S.
Haghipour
,
H.
Hamzeiy
, and
M.
Asadi-Khiavi
, “
A review of modeling techniques for genetic regulatory networks
,”
J. Med. Signals. Sens.
2
(
1
),
61
70
(
2012
).
71.
L. B.
Jiang
,
H.
Shi
,
M.
Sang
 et al, “
A computational model for inferring QTL control networks underlying developmental covariation
,”
Front. Plant Sci.
10
,
1557
(
2019
).
72.
E. E.
Zhang
and
S. A.
Kay
, “
Clocks not winding down: Unravelling circadian networks
,”
Nat. Rev. Mol. Cell Biol.
11
,
764
776
(
2010
).
73.
J. V.
von Neumann
and
O.
Morgenstern
,
Theory of Games and Economic Behavior
(
Princeton University Press
,
Princeton, NJ
,
1944
).
74.
J. F.
Nash
, “
Equilibrium points in N-person games
,”
Proc. Natl. Acad. Sci.
36
,
48
49
(
1950
).
75.
J. M.
Smith
and
G. R.
Price
, “
Logic of animal conflict
,”
Nature
246
(
5427
),
15
18
(
1973
).
76.
R. A.
Fisher
,
The Genetic Theory of Natural Selection
(
Clarendon Press
,
Oxford
,
1930
).
77.
C. X.
Ma
,
G.
Casella
, and
R. L.
Wu
, “
Functional mapping of quantitative trait loci underlying the character process: A theoretical framework
,”
Genetics
161
,
1751
1762
(
2002
).
78.
R. L.
Wu
,
C. X.
Ma
,
M.
Lin
 et al, “
A general framework for analyzing the genetic architecture of developmental characteristics
,”
Genetics
166
,
1541
1551
(
2004
).
79.
C.
Wu
,
G.
Li
,
J.
Zhu
, and
Y. H.
Cui
, “
Functional mapping of dynamic traits with robust t-distribution
,”
PLoS ONE
6
(
9
),
e24902
(
2011
).
80.
G.
Liu
,
M.
Li
,
J.
Wen
,
Y.
Du
, and
Y.-M.
Zhang
, “
Functional mapping of quantitative trait loci associated with rice tillering
,”
Mol. Genet. Genom.
284
,
263
271
(
2010
).
81.
Z.
Li
,
R.
Henrik
,
S.
Hallingbäck
 et al, “
Functional multi-locus QTL mapping of temporal trends in Scots pine wood traits
,”
G3
4
(
12
),
2365
2379
(
2014
).
82.
I. Y.
Kwak
,
C. R.
Moore
,
E. P.
Spalding
, and
K. W.
Broman
, “
Mapping quantitative trait loci underlying function-valued traits using functional principal component analysis and multi-trait mapping
,”
G3
6
(
1
),
79
86
(
2016
).
83.
D. H.
Lyra
,
N.
Virlet
,
P.
Sadeghi-Tehran
 et al, “
Functional QTL mapping and genomic prediction of canopy height in wheat measured using a robotic field phenotyping platform
,”
J. Exp. Bot.
71
(
6
),
1885
1898
(
2020
).
84.
W.
Zhao
,
Y. Q.
Chen
,
G.
Casella
,
J. M.
Cheverud
, and
R. L.
Wu
, “
A non-stationary model for functional mapping of complex traits
,”
Bioinformatics
21
,
2469
2477
(
2005
).
85.
W.
Zhao
,
W.
Hou
,
R. C.
Littell
, and
R. L.
Wu
, “
Structured antedependence models for functional mapping of multivariate longitudinal traits
,”
Stat. Methods Mol. Genet. Biol.
4
(
1
),
33
(
2005
).
86.
M.
Karanasos
, “
A new method for obtaining the autocovariance of an ARMA model: An exact form solution
,”
Econometric Theory
14
,
622
640
(
1998
).
87.
J.
Yap
,
J.
Fan
, and
R. L.
Wu
, “
Nonparametric modeling of covariance structure in functional mapping of quantitative trait loci
,”
Biometrics
65
,
1068
1077
(
2009
).
88.
K.
Das
,
J. H.
Li
,
Z.
Wang
 et al, “
A dynamic model for genome-wide association studies
,”
Hum. Genet.
129
,
629
639
(
2011
).
89.
R.
Yang
and
S.
Xu
, “
Bayesian shrinkage analysis of quantitative trait loci for dynamic traits
,”
Genetics
176
,
1169
1185
(
2007
).
90.
E. B.
Klerman
and
M. A.
St Hilaire
, “
On mathematical modeling of circadian rhythms, performance, and alertness
,”
J. Biol. Rhyth.
22
(
2
),
91
102
(
2007
).
91.
A.
Asgari-Targhi
and
E. B.
Klerman
, “
Mathematical modeling of circadian rhythms
,”
Wiley Interdiscip. Rev. Syst. Biol. Med.
11
(
2
),
e1439
(
2019
).
92.
P. T.
Spellman
,
G.
Sherlock
,
M. Q.
Zhang
 et al, “
Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization
,”
Mol. Biol. Cell
9
(
12
),
3273
3297
(
1998
).
93.
J. K.
Dale
,
M.
Maroto
,
M. L.
Dequeant
 et al, “
Periodic notch inhibition by lunatic fringe underlies the chick segmentation clock
,”
Nature
421
,
275
278
(
2003
).
94.
T.
Konopka
and
M.
Rooman
, “
Gene expression model (in)validation by Fourier analysis
,”
BMC Syst. Biol.
4
,
123
(
2010
).
95.
R.
Yang
and
Z.
Su
, “
Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation
,”
Bioinformatics
26
,
i168
i174
(
2010
).
96.
E. A.
Boyle
,
Y. I.
Li
, and
J. K.
Pritchard
, “
An expanded view of complex traits: From polygenic to omnigenic
,”
Cell
169
,
1177
1186
(
2017
).
97.
D.
Garlaschelli
,
G.
Caldarelli
, and
L.
Pietronero
, “
Universal scaling relations in food webs
,”
Nature
423
,
165
168
(
2003
).
98.
Y.-Y.
Liu
,
J.-J.
Slotine
, and
A.-L.
Barabási
, “
Controllability of complex networks
,”
Nature
423
,
167
173
(
2011
).
99.
J. C.
Nacher
and
T.
Akutsu
, “
Structural controllability of unidirectional bipartite networks
,”
Sci. Rep.
3
,
1647
(
2013
).
100.
S.
Suweis
,
F.
Simini
,
J. R.
Banavar
, and
A.
Maritan
, “
Emergence of structural and dynamical properties of ecological mutualistic networks
,”
Nature
500
,
449
452
(
2013
).
101.
J.
Grilli
,
M.
Adorisio
,
S.
Suweis
 et al, “
Feasibility and coexistence of large ecological communities
,”
Nat. Commun.
8
,
14389
(
2017
).
102.
D. M.
Busiello
,
S.
Suweis
,
J.
Hidalgo
 et al, “
Explorability and the origin of network sparsity in living systems
,”
Sci. Rep.
7
,
12323
(
2017
).
103.
R. M.
May
, “
Will a large complex system be stable?
,”
Nature
238
,
413
414
(
1972
).
104.
S.
Allesina
and
S.
Tang
, “
Stability criteria for complex ecosystems
,”
Nature
483
,
205
208
(
2012
).
105.
R. J.
Tibshirani
, “
Regression shrinkage and selection via the Lasso
,”
J. Roy. Stat. Soc. B
58
,
267
288
(
1996
).
106.
H.
Zou
and
T.
Hastie
, “
Regularization and variable selection via the elastic net
,”
J. Roy. Stat. Soc. B
67
,
301
320
(
2005
).
107.
M.
Yuan
and
Y.
Lin
, “
Model selection and estimation in regression with grouped variables
,”
J. Roy. Stat. Soc. B
68
,
49
67
(
2006
).
108.
H.
Wang
and
C.
Leng
, “
A note on adaptive group Lasso
,”
Comput. Stat. Data. Analy.
52
,
5277
5286
(
2008
).
109.
G. A. K.
Van Voorn
and
B. W.
Kooi
, “
Combining bifurcation and sensitivity analysis for ecological models
,”
Eur. Phys. J. Spec. Top.
226
,
2101
2118
(
2017
).
110.
A.
Donzé
,
G.
Clermont
, and
C. J.
Langmead
, “
Parameter synthesis in nonlinear dynamical systems: Application to systems biology
,”
J. Comput. Biol.
17
(
3
),
325
336
(
2010
).
111.
H.
Miao
,
H.
Wu
, and
H.
Xue
, “
Generalized ordinary differential equation models
,”
J. Am. Stat. Assoc.
109
(
508
),
1672
1682
(
2014
).
112.
J. O.
Ramsay
,
G.
Hooker
,
D.
Campbell
 et al, “
Parameter estimation for differential equations: A generalized smoothing approach
,”
J. Roy. Stat. Soc. Ser. B-Stat. Method.
69
,
741
770
(
2007
).
113.
H.
Liang
and
H.
Wu
, “
Parameter estimation for differential equation models using a framework of measurement error in regression models
,”
J. Am. Stat. Assoc.
103
(
484
),
1570
1583
(
2008
).
114.
J.
Cao
,
J. Z.
Huang
, and
H.
Wu
, “
Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations
,”
J. Comput. Graph. Stat.
21
(
1
),
42
56
(
2012
).
115.
N. J. B.
Brunel
,
Q.
Clairon
, and
F.
d'Alché-Buc
, “
Parametric estimation of ordinary differential equations with orthogonality conditions
,”
J. Am. Stat. Assoc.
109
(
505
),
173
185
(
2014
).
116.
Y.
Li
,
J.
Zhu
, and
N.
Wang
, “
Regularized semiparametric estimation for ordinary differential equations
,”
Technometrics
57
(
3
),
341
350
(
2015
).
117.
S. Z.
Chen
,
A.
Shojaie
, and
D. M.
Witten
, “
Network reconstruction from high-dimensional ordinary differential equations
,”
J. Am. Stat. Assoc.
112
(
520
),
1697
1707
(
2017
).
118.
W.
Bateson
,
Mendel's Principles of Heredity
(
Cambridge University Press
,
Cambridge
,
1909
).
119.
H.
Cordell
, “
Epistasis: What it means, what it doesn't mean, and statistical methods to detect it in humans
,”
Human Mol. Genet.
11
,
2463
2468
(
2002
).
120.
G.
Michailidis
and
F.
d'Alché-Buc
, “
Autoregressive models for gene regulatory network inference: Sparsity, stability and causality issues
,”
Math. Biosci.
246
(
2
),
326
334
(
2013
).
121.
M. M.
Zavlanos
,
A. A.
Julius
,
S. P.
Boyd
, and
G. J.
Pappas
, “
Inferring stable genetic networks from steady-state data
,”
Automatica
47
,
1113
1122
(
2011
).
122.
J. E.
Larvie
,
M. G.
Sefidmazgi
,
A.
Homaifar
 et al, “
Stable gene regulatory network modeling from steady-state data
,”
Bioengineering
3
(
2
),
12
(
2016
).
123.
R. I. M.
Dunbar
, “
Neocortex size as a constraint on group size in primates
,”
J. Hum. Evol.
22
,
469
493
(
1992
).
124.
W.
Callebaut
and
D.
Rasskin-Gutman
,
Understanding the Development of Evolution of Natural Complex Systems Cambridge
(
MIT Press
,
Cambridge, MA
,
2005
).
125.
G. P.
Wagner
,
M.
Pavlicev
, and
J. M.
Cheverud
, “
The road to modularity
,”
Nat. Rev. Genet.
8
(
12
),
921
931
(
2007
).
126.
M. E.
Newman
, “
Modularity and community structure in networks
,”
Proc. Natl. Acad. Sci.
103
(
23
),
8577
8582
(
2006
).
127.
D.
Mehrle
,
A.
Strosser
, and
A.
Harkin
, “
Walk-modularity and community structure in networks
,”
Netw. Sci.
3
(
3
),
348
360
(
2015
).
128.
M.
Serban
, “
Exploring modularity in biological networks
,”
Phil. Trans. R. Soc.
375
,
20190316
(
2020
).
129.
J.
Yoon
,
A.
Blumer
, and
K.
Lee
, “
An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality
,”
Bioinformatics
22
(
24
),
3106
3108
(
2006
).
130.
H. M.
Kaltenbach
and
J.
Stelling
, “
Modular analysis of biological networks
,”
Adv. Exp. Med. Biol.
736
,
3
17
(
2012
).
131.
C. R.
Tosh
and
L.
McNally
, “
The relative efficiency of modular and non-modular networks of different size
,”
Proc. Biol. Sci.
282
(
1802
),
20142568
(
2015
).
132.
B.
Al-Anzi
,
S.
Gerges
,
N.
Olsman
 et al, “
Modeling and analysis of modular structure in diverse biological networks
,”
J. Theor. Biol.
422
,
18
30
(
2017
).
133.
G.
Didier
,
A.
Valdeolivas
, and
A.
Baudot
, “
Identifying communities from multiplex biological networks by randomized optimization of modularity
,”
F1000Res
7
,
1042
(
2018
).
134.
Y. Q.
Wang
,
M.
Xu
,
Z.
Wang
 et al, “
How to cluster gene expression dynamics in response to environmental signals
,”
Brief Bioinform.
13
,
162
174
(
2012
).
135.
J. Z.
Li
,
B. G.
Bunney
,
F.
Meng
 et al, “
Circadian patterns of gene expression in the human brain and disruption in major depressive disorder
,”
Proc. Natl. Acad. Sci.
110
(
24
),
9950
9955
(
2013
).
136.
L. M.
Smith
,
F. C.
Motta
,
G.
Chopra
 et al, “
An intrinsic oscillator drives the blood stage cycle of the malaria parasite Plasmodium falciparum
,”
Science
368
(
6492
),
754
759
(
2020
).
137.
F.
Rijo-Ferreira
,
V. A.
Acosta-Rodriguez
,
J. H.
Abel
 et al, “
The malaria parasite has an intrinsic clock
,”
Science
368
(
6492
),
746
753
(
2020
).
138.
W. H.
Walker
,
J. C.
Walton
,
A. C.
DeVries
 et al, “
Circadian rhythm disruption and mental health
,”
Transl. Psychiatry
10
,
28
(
2020
).
139.
C. X.
Chen
,
L. B.
Jiang
,
G. F.
Fu
 et al, “
An omnidirectional visualization model of personalized gene regulatory networks
,”
NPJ: Syst. Biol. Appl.
5
,
38
(
2019
).
140.
L.
Sun
,
L.
Jiang
,
C. N.
Grant
,
H.-G.
Wang
,
C.
Gragnoli
,
Z.
Liu
, and
R.
Wu
, “
Computational identification of gene networks as a biomarker of neuroblastoma risk
,”
Cancers
12
,
2086
(
2020
).
141.
K.
Karczewski
and
M.
Snyder
, “
Integrative omics for health and disease
,”
Nat. Rev. Genet.
19
,
299
310
(
2018
).
142.
Y.
Ye
,
Z.
Zhang
,
Y.
Liu
 et al, “
Multi-omics perspective of quantitative trait loci in precision medicine
,”
Trends Genet.
36
(
5
),
318
336
(
2020
).
143.
M.
Kang
and
J.
Gao
, “
Integration of multi-omics data for expression quantitative trait loci (eQTL) analysis and eQTL epistasis
,”
Methods Mol. Biol.
2082
,
157
171
(
2020
).
144.
S.
Horvath
and
J.
Dong
, “
Geometric interpretation of gene coexpression network analysis
,”
PLoS Comp. Biol.
4
,
e1000117
(
2008
).
145.
L. C.
Freeman
, “
Centrality in social networks conceptual clarification
,”
Social Networks
1
,
215
239
(
1978
).
146.
U.
Brandes
, “
A faster algorithm for betweenness centrality
,”
J. Math. Sociol.
25
,
163
177
(
2001
).
147.
P.
Hage
and
F.
Harary
, “
Eccentricity and centrality in networks
,”
Social Networks
17
,
57
63
(
1995
).
148.
P.
Bonacich
, “
Some unique properties of eigenvector centrality
,”
Social Networks
29
,
555
564
(
2007
).
149.
L.
Page
, “
The PageRank citation ranking: Bringing order to the web
,”
Stanford Digital Libraries Work. Paper
9
,
1
14
(
1998
).
150.
K.
Greenham
,
R. C.
Sartor
,
S.
Zorich
, et al “
Expansion of the circadian transcriptome in Brassica rapa and genome-wide diversification of paralog expression patterns
,”
Elife
9
,
e58993
(
2020
).
151.
D. J.
Gottlieb
,
G. T.
O'Connor
, and
J. B.
Wilk
, “
Genome-wide association of sleep and circadian phenotypes
,”
BMC Med. Genet.
8
,
S9
(
2007
).
152.
S. T.
Harbison
,
S.
Kumar
,
W.
Huang
 et al, “
Genome-wide association study of circadian behavior in Drosophila melanogaster
,”
Behav. Genet.
49
,
60
82
(
2019
).
153.
L. B.
Jiang
,
J. Y.
Liu
,
X. L.
Zhu
 et al, “
2HiGWAS: A unifying high-dimensional platform to infer the global genetic architecture of trait development
,”
Brief Bioinform.
16
,
905
911
(
2015
).
154.
D. L.
Zimmerman
and
V. A.
Núñez-Antón
,
Antedependence Models for Longitudinal Data
(
Chapman & Hall/CRC
,
Boca Raton, FL
,
2009
).