We present a systematic computational framework, eCodonOpt, for designing parental DNA sequences for directed evolution experiments through codon usage optimization. Given a set of homologous parental proteins to be recombined at the DNA level, the optimal DNA sequences encoding these proteins are sought for a given diversity objective. We find that the free energy of annealing between the recombining DNA sequences is a much better descriptor of the extent of crossover formation than sequence identity. Three different diversity targets are investigated for the DNA shuffling protocol to showcase the utility of the eCodonOpt framework: (i) maximizing the average number of crossovers per recombined sequence; (ii) minimizing bias in family DNA shuffling so that each parental sequence pair contributes a similar number of crossovers to the library; and (iii) maximizing the relative frequency of crossovers in specific structural regions. Each of these design challenges is formulated as a constrained optimization problem that uses 0-1 binary variables as on/off switches to model the selection of a codon for each residue position. Computational results suggest that many-fold improvements in crossover frequency, location and specificity are possible, providing valuable insights for the engineering of directed evolution protocols.
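The codon-selection idea can be illustrated with a toy sketch. It is not the eCodonOpt formulation: it enumerates codon choices by brute force instead of solving a constrained optimization problem, and it scores a pair of sequences by nucleotide identity purely for simplicity, even though the abstract reports that the free energy of annealing is the better descriptor. The codon table is a small subset of the standard genetic code, and the names `CODONS`, `encodings`, `identity` and `best_pair` are hypothetical.

```python
from itertools import product

# Assumption: a truncated codon table (subset of the standard genetic code).
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "E": ["GAA", "GAG"],
    "D": ["GAT", "GAC"],
}


def encodings(protein):
    """Yield every DNA sequence that encodes `protein`.

    Picking one codon per residue mirrors setting exactly one 0-1
    variable to 1 at each residue position.
    """
    for choice in product(*(CODONS[aa] for aa in protein)):
        yield "".join(choice)


def identity(a, b):
    """Count matching nucleotides between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b))


def best_pair(p1, p2):
    """Return (score, dna1, dna2) with maximal identity over all encodings.

    Assumption: identity is a stand-in objective, not the annealing
    free energy referred to in the abstract.
    """
    candidates = (
        (identity(s1, s2), s1, s2)
        for s1 in encodings(p1)
        for s2 in encodings(p2)
    )
    return max(candidates, key=lambda t: t[0])


if __name__ == "__main__":
    score, dna1, dna2 = best_pair("MKF", "MEF")
    print(score, dna1, dna2)
```

Exhaustive enumeration is workable only for very short sequences, since the number of encodings grows multiplicatively with protein length; this is the reason a formulation with binary selection variables and a dedicated solver becomes attractive at realistic sizes.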