It is widely assumed that new proteins derive from old ones, either through duplication followed by adaptation of one of the copies, or by combining smaller, already viable fragments with well-defined structures. However, it has recently become clear that (i) many functional proteins do not adopt well-defined structures, (ii) proteins from random sequences can also adapt and acquire novel functions, and (iii) in modern organisms many functional proteins emerge “de novo”, i.e. from previously non-coding DNA regions that have not been subject to prior selection. This project will explore the boundaries between viable and non-viable protein sequences in the context of contemporary forms of life.


To mimic the process of “de novo” protein evolution, random protein sequences will be used as proxies for spontaneously emerging polypeptides. Initially, large random sequence libraries (>10^6 variants) will be generated and expressed using modern methods of synthetic biology (automated design, DNA assembly, expression by cell-free translation systems). Using a semi-automated pipeline, the expression/solubility profile of the library will then be evaluated in a 96-well format. The expression/solubility characteristics, together with the corresponding feature set, will serve as input for machine learning optimization. Sequential optimization through prediction/analysis cycles will provide a trajectory of features (such as amino acid composition and secondary structure properties) that are important for proteins to become tolerated and viable in contemporary life. Such a “manual” will help to define what sets viable protein sequences apart from artificially generated random sequences and will find applications in protein design initiatives.
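As a purely illustrative sketch of the first computational step, the snippet below generates a small random sequence library and computes amino acid composition as one possible feature set. The function names, the uniform sampling over the 20 canonical amino acids, the sequence length of 100 and the library size of 96 are all assumptions made for the example, not the design or feature set the project will actually use.

```python
import random
from collections import Counter

# The 20 canonical amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_library(n_variants, length, seed=0):
    """Return n_variants random sequences of the given length.

    Assumption: residues are drawn uniformly and independently;
    a real library design may use a biased composition.
    """
    rng = random.Random(seed)  # seeded so the library is reproducible
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(n_variants)
    ]


def composition(seq):
    """Return the fraction of each amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


# Hypothetical example: one 96-well plate worth of 100-residue variants.
library = random_library(n_variants=96, length=100)
features = [composition(seq) for seq in library]
```

Each row of `features` is a 20-element vector that could be paired with a measured expression/solubility value and passed to a machine learning model of choice. Further descriptors, such as predicted secondary structure content, would be appended in the same way.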


Five relevant publications of the research group:


Tretyachenko V, Vymětal J, Bednárová L, Kopecký V, Hofbauerová K, Jindrová H, Hubálek M, Souček R, Konvalinka J, Vondrášek J, Hlouchová K. (2017) Random protein sequences can form defined secondary structure and are well-tolerated in vivo. Scientific Reports 7, 15449.

Tretyachenko V, Voráček V, Souček R, Fujishima K, Hlouchová K. (2020) CoLiDe: Combinatorial Library Design tool for probing protein sequence space. Bioinformatics. DOI: 10.1093/bioinformatics/btaa804

Bornberg-Bauer E, Hlouchova K, Lange A. (2021) Structure and function of naturally evolved de novo proteins. Current Opinion in Structural Biology 68, 175-183. DOI: 10.1016/

Tretyachenko V, Vymetal J, Neuwirthova T, Vondrasek J, Fujishima K, Hlouchova K. (2021) Unevolved proteins from modern and prebiotic amino acids manifest distinct structural profiles.

Heames B, Buchel F, Aubel M, Tretyachenko V, Lange A, Bornberg-Bauer E, Hlouchova K. (2021) Experimental characterization of de novo proteins and their unevolved random-sequence counterparts. Submitted.
