Genotype Data

Genomic selection is based on relationships between animals’ genotypes and their recorded phenotypes. A continuous flow of accurate data – both genotypic and phenotypic – is necessary to establish and refresh the reference population, measure performance and calibrate genomic results.

U.S. National Cooperator Database

The National Cooperator Database is the world’s largest animal database, comprising phenotypic and genotypic data from individual dairy cattle. This database was established by USDA and is now managed by the Council on Dairy Cattle Breeding (CDCB). Genetic and genomic evaluations, as well as management and performance benchmarks, are derived from this database.
The word “Cooperator” is central because the database exists through the commitment of many groups – dairy producers, U.S. dairy organizations and like-minded international partners. CDCB accepts genotypes into the U.S. database from CDCB-certified genotyping laboratories and from evaluation centers in partnering countries.

International Collaboration

Since the start of genomic selection, there has been active global collaboration in research, development and data exchange. In 2011, agreements were established between organizations in the U.S., Canada, Italy and the United Kingdom to exchange genotypes for Holstein bulls, enlarging each partner’s reference population. More recently, collaborations have developed between the U.S. and organizations in Switzerland, Japan, Scandinavia and Germany. In the Brown Swiss breed, eight countries collaborate through the Intergenomics program.

Reference Population

In genomic selection, a reference population of highly reliable animals that have both genotypes and phenotypes is necessary as a base for comparison.

To develop new genomic traits, relevant phenotypic data can be measured on a smaller group of genotyped animals. This reference population can then be used to predict genomic breeding values for all animals that have been genotyped.
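The idea of learning from a reference population and then predicting for genotyped animals can be sketched with a small simulation. The example below is only an illustration under stated assumptions, not the method used for official evaluations: it uses ridge regression on marker genotypes (often called SNP-BLUP), simulated data, and hypothetical values for the number of markers, the heritability and the population sizes. It also assumes the variance components are known, which is not the case in practice.

```python
import numpy as np

def draw_genotypes(n_animals, allele_freq, rng):
    """Draw 0/1/2 allele counts per animal and marker (simulated, not real data)."""
    return rng.binomial(2, allele_freq, size=(n_animals, allele_freq.size)).astype(float)

def fit_marker_effects(genotypes, phenotypes, ridge_penalty):
    """Ridge regression of phenotypes on centered genotypes (SNP-BLUP style)."""
    marker_means = genotypes.mean(axis=0)
    centered = genotypes - marker_means
    # Solve (Z'Z + penalty * I) b = Z'(y - mean(y)) for the marker effects b.
    lhs = centered.T @ centered + ridge_penalty * np.eye(centered.shape[1])
    rhs = centered.T @ (phenotypes - phenotypes.mean())
    return marker_means, np.linalg.solve(lhs, rhs)

def predict_genomic_values(genotypes, marker_means, marker_effects):
    """Genomic value = sum of estimated marker effects for an animal's genotype."""
    return (genotypes - marker_means) @ marker_effects

rng = np.random.default_rng(0)
n_markers, heritability = 200, 0.3  # assumed, illustrative values
allele_freq = rng.uniform(0.1, 0.9, size=n_markers)
true_effects = rng.normal(0.0, 1.0, size=n_markers)  # marker-effect variance of 1

# Reference population: animals with genotypes AND phenotypes.
ref_genotypes = draw_genotypes(500, allele_freq, rng)
ref_true_values = (ref_genotypes - 2 * allele_freq) @ true_effects
noise_var = ref_true_values.var() * (1 - heritability) / heritability
ref_phenotypes = ref_true_values + rng.normal(0.0, np.sqrt(noise_var), size=500)

# Candidates: animals with genotypes only.
cand_genotypes = draw_genotypes(300, allele_freq, rng)
cand_true_values = (cand_genotypes - 2 * allele_freq) @ true_effects

# Penalty = residual variance / marker-effect variance (the latter is 1 here).
marker_means, marker_effects = fit_marker_effects(
    ref_genotypes, ref_phenotypes, ridge_penalty=noise_var / 1.0
)
cand_predictions = predict_genomic_values(cand_genotypes, marker_means, marker_effects)

accuracy = np.corrcoef(cand_predictions, cand_true_values)[0, 1]
print(f"Correlation of predicted and true values in candidates: {accuracy:.2f}")
```

The candidates never contribute a phenotype; their predictions come entirely from marker effects estimated in the reference group, which is why the size and quality of that group matter.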

Prediction of genomic values is more reliable, or accurate, with more animals in the reference population. Genomic evaluations are also more effective when the reference population is continually updated. The U.S. has established systems to ensure that new phenotypes – including milk recording, type classification, health and breeding information, and feed intake – continue to flow into the National Cooperator Database to maintain a strong reference population.

Data Impacts on Genomic Reliabilities

Size of the reference population is one factor affecting the reliability of genomic evaluations. The number of animals in the reference population varies by breed. Naturally, the less populous dairy breeds have smaller reference populations.

Breed          Animals in Reference Population (Males and Females Combined)
Ayrshire*      1,000 – 2,000
Guernsey       1,000 – 2,000
Brown Swiss    12,000+
Jersey         150,000+
Holstein       Nearly 1 million

*The U.S. Ayrshire reference population includes only animals considered purebred Ayrshire; no other red breeds included.

Reference populations are refreshed often and grow over time, which is necessary for optimal accuracy of evaluations.

Genomic reliabilities also differ between traits, depending on heritability and the amount of available phenotypic data. Generally, traits with lower heritability, such as health and feed efficiency, have lower genomic reliabilities (30-50%). Reliabilities are higher for fertility and calving traits (50-70%), and production and type traits are often 70-90% reliable. Reliabilities also differ between breeds and among animals within a breed.
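How reference population size and heritability work together can be illustrated with a widely cited deterministic approximation from the quantitative genetics literature: reliability ≈ N·h² / (N·h² + Me), where N is the number of reference animals, h² is heritability and Me is the number of independent chromosome segments. The sketch below is a simplification, and the Me value is a hypothetical placeholder. It ignores pedigree information and the fact that many reference animals are bulls with highly reliable daughter-based records, so its output is not expected to match the published reliability ranges above.

```python
def approx_genomic_reliability(n_reference, heritability, n_effective_segments):
    """Approximate reliability as N*h2 / (N*h2 + Me); a simplified textbook formula."""
    if n_reference < 0:
        raise ValueError("n_reference must be non-negative")
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must be between 0 and 1")
    if n_effective_segments <= 0:
        raise ValueError("n_effective_segments must be positive")
    signal = n_reference * heritability
    return signal / (signal + n_effective_segments)

ASSUMED_SEGMENTS = 5000  # hypothetical Me, chosen only for illustration

# Reference sizes loosely echo the scale of the breed table; h2 values are examples.
for n_reference in (1_500, 12_000, 150_000):
    for heritability in (0.05, 0.30):
        reliability = approx_genomic_reliability(n_reference, heritability, ASSUMED_SEGMENTS)
        print(f"N={n_reference:>7,}  h2={heritability:.2f}  reliability={reliability:.2f}")
```

Under this approximation, reliability rises with either a larger reference population or a higher heritability, and a low-heritability trait needs many more reference animals to reach the same reliability.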

Growth of the U.S. Database

The foundational genotype database was established by USDA in the 1990s with early research on genomic scanning and markers. The first U.S. Holstein sires were genotyped in 2008, and genomic evaluations were first officially released in 2009. The first genotype exchanges began between the U.S., Canada, Italy and the United Kingdom in 2011, followed by similar collaborations in later years.

By May 2015 – seven years after the first bulls were genotyped – the National Cooperator Database had grown to 1 million animal genotypes. Since then, the U.S. database has grown rapidly as genomic testing of females has been readily adopted for mating and culling decisions. In March 2021, the 5 millionth animal genotype was added.
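The pace of that growth can be worked out from the two milestones above. The short calculation below is a rough sketch: the source gives only months, so the first day of each month is an assumed placeholder, and the result is an average that hides year-to-year variation.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

first_million = date(2015, 5, 1)   # May 2015; day of month is an assumption
five_millionth = date(2021, 3, 1)  # March 2021; day of month is an assumption

elapsed_months = months_between(first_million, five_millionth)
genotypes_added = 5_000_000 - 1_000_000
average_per_month = genotypes_added / elapsed_months

print(f"{elapsed_months} months, about {average_per_month:,.0f} genotypes per month")
# prints "70 months, about 57,143 genotypes per month"
```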