This release contains genotypes for 14,505 N/NIH heterogeneous stock (HS) outbred rats from the breeding colonies at the Medical College of Wisconsin (MCW; NMcwi:HS #2314009, RRID:RGD_2314009) and Wake Forest University (WFU; NMcwiWFsm:HS #13673907, RRID:RGD_13673907), and from the University of California San Diego (RRID:RGD_155269102), the University of Tennessee Health Science Center, and Oregon Health & Science University.

Description of contents:

PLINK binary format
* round10.bed
* round10.bim
* round10.fam

Variant Call Format
* Heterogenous-stock_n15552_02222023_stitch2_QC_Sex_Het_pass_n14505.vcf.gz
* Heterogenous-stock_n15552_02222023_stitch2_QC_Sex_Het_pass_n14505.vcf.gz.tbi

Variant Call Format (VCF): a text file format for storing DNA sequence variation. A header at the start of the file provides metadata describing the body. The body is tab-separated into 8 mandatory columns, followed by an unlimited number of optional columns holding per-sample information. Here the VCF is bgzip-compressed (.vcf.gz) and accompanied by a tabix index (.tbi).

PLINK binary format: .bed is the PLINK binary biallelic genotype table, the primary representation of genotype calls at biallelic variants. It is accompanied by a .bim file (variant information) and a .fam file (sample information).

Technical details:
Rat genome assembly: mRatBN7.2 (NCBI: GCF_015227675.2)

All software versions used to generate the data in this object are listed below and in the Methods section of the associated publication:
* fastx_toolkit 0.0.14
* cutadapt 4.1
* fgbio 1.3.0
* bbDuk 38.94
* BWA 0.7.17
* samtools 1.14
* picard 2.25.7
* STITCH 1.6.6
* GATK 4.2.0
* bcftools 1.14
* PLINK 1.9
* Python 3.10

ddGBS sequences were demultiplexed using fastx_toolkit v0.0.14. Barcode, adapter and quality trimming were then performed with Cutadapt v4.1, using a minimum read length of 25 bp and a minimum base quality of 20.
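The PLINK .bed layout described above can be decoded with standard-library Python alone. This sketch follows the PLINK 1 binary format: a 3-byte header (0x6C, 0x1B, then 0x01 for variant-major order), followed by one block of ceil(n_samples / 4) bytes per variant in .bim order, with each sample's genotype packed into 2 bits, lowest-order bits first. The function names (`decode_variant`, `read_bed_variant`, `count_fam_samples`) are hypothetical helpers, not part of PLINK; genotypes are returned as counts of the A1 allele (.bim column 5), with `None` for a missing call.

```python
import math
from typing import List, Optional

# First three bytes of a PLINK 1 .bed file; 0x01 marks variant-major order.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit genotype codes -> copies of the A1 allele (.bim column 5); None = missing.
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_variant(block: bytes, n_samples: int) -> List[Optional[int]]:
    """Decode one variant's packed genotypes (4 samples per byte)."""
    calls = []
    for s in range(n_samples):
        byte = block[s // 4]
        # The first sample of each byte sits in its two lowest-order bits.
        code = (byte >> (2 * (s % 4))) & 0b11
        calls.append(CODE_TO_A1_COUNT[code])
    return calls


def read_bed_variant(path: str, n_samples: int, variant_index: int) -> List[Optional[int]]:
    """Return genotypes for the variant on line `variant_index` (0-based) of the .bim."""
    bytes_per_variant = math.ceil(n_samples / 4)
    with open(path, "rb") as fh:
        if fh.read(3) != BED_MAGIC:
            raise ValueError(f"{path} is not a variant-major PLINK 1 .bed file")
        fh.seek(3 + variant_index * bytes_per_variant)
        block = fh.read(bytes_per_variant)
    if len(block) != bytes_per_variant:
        raise IndexError(f"variant {variant_index} is past the end of {path}")
    return decode_variant(block, n_samples)


def count_fam_samples(fam_path: str) -> int:
    """Each non-empty line of a .fam file describes one sample."""
    with open(fam_path) as fh:
        return sum(1 for line in fh if line.strip())
```

A typical call would be `read_bed_variant("round10.bed", count_fam_samples("round10.fam"), 0)`. If round10.fam lists all 14,505 rats, each variant block occupies ceil(14,505 / 4) = 3,627 bytes, so any single variant can be read with one seek instead of scanning the file.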
BWA-mem v0.7.17 was used to align ddGBS sequences, requiring an alignment score greater than 20, and the aligned BAM files were sorted by coordinate and indexed using SAMtools v1.14 for fast random access.

lcWGS sequences were demultiplexed using fgbio v1.3.0. BBDuk v38.94 (ktrim=r, k=23, mink=11, hdist=1, trimpolyg=50, tpe, tbo) was used to trim adapters, and Cutadapt v4.1 was used to trim bases with Phred quality below 5 and to discard reads shorter than 70 bp. The lcWGS sequences were then aligned using BWA-mem v0.7.17. Duplicate reads were marked using Picard v2.25.7, and the BAM files were indexed by coordinate using SAMtools v1.14 for fast random access.

Aligned ddGBS and lcWGS sequences were jointly used to impute biallelic SNPs at given positions with STITCH v1.6.6 (niterations=2, k=16, nGen=100). During imputation, a reference panel built from the genotypes of the eight inbred founder strains of the HS rats, together with the SNP position file mentioned above, was provided to STITCH to construct haplotypes. To increase computational efficiency, imputation was run in parallel on chromosome chunks with a 1 Mb buffer on each end. Each chunk was at least 7 Mb long and contained at least 1,000 SNPs. BCFtools v1.14 was then used to concatenate the chunks back into individual chromosomes.

Quality control was conducted using Python. After imputation, we applied a filtering procedure to improve genotype quality. Our genotyping pipeline imputed a total of 10,684,883 biallelic SNPs. Of these, we removed SNPs with an imputation info score below 0.9 using BCFtools v1.14, leaving 7,947,141 SNPs. We then identified and removed a subset of SNPs with low concordance with the ground truth dataset described above.
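The chunking rule described above (core regions of at least 7 Mb and at least 1,000 SNPs, each padded by a 1 Mb buffer on both ends) can be sketched as a greedy planner. This is a minimal illustration under stated assumptions, not the pipeline's own code: the left-to-right greedy strategy, folding a short trailing remainder into the previous chunk, and the names `plan_chunks` and `Chunk` are all hypothetical. It also assumes that only the core regions, not the buffers, are carried forward when the chunks are concatenated.

```python
from dataclasses import dataclass
from typing import List

MB = 1_000_000


@dataclass
class Chunk:
    core_start: int    # first SNP position in the region whose calls are kept
    core_end: int      # last SNP position in that region
    buffer_start: int  # core_start minus the buffer, clipped to 1
    buffer_end: int    # core_end plus the buffer, clipped to the chromosome length
    n_snps: int        # number of SNPs inside the core region


def plan_chunks(positions: List[int], chrom_length: int,
                min_length: int = 7 * MB, min_snps: int = 1000,
                buffer: int = 1 * MB) -> List[Chunk]:
    """Greedily split one chromosome's SNP positions into core regions that are
    at least `min_length` bp long and contain at least `min_snps` SNPs."""
    positions = sorted(positions)
    bounds = []  # [first_index, last_index] of each core region
    first = 0
    for i, pos in enumerate(positions):
        enough_snps = i - first + 1 >= min_snps
        long_enough = pos - positions[first] + 1 >= min_length
        if enough_snps and long_enough:
            bounds.append([first, i])
            first = i + 1
    if first < len(positions):
        if bounds:
            # The tail is too short or too sparse for its own chunk: merge it.
            bounds[-1][1] = len(positions) - 1
        else:
            # The whole chromosome fails the thresholds: use it as one chunk.
            bounds.append([first, len(positions) - 1])
    chunks = []
    for lo, hi in bounds:
        core_start, core_end = positions[lo], positions[hi]
        chunks.append(Chunk(
            core_start=core_start,
            core_end=core_end,
            buffer_start=max(1, core_start - buffer),
            buffer_end=min(chrom_length, core_end + buffer),
            n_snps=hi - lo + 1,
        ))
    return chunks
```

For SNPs every 1 kb from position 1,000 to 20,000,000 on a 20.5 Mb chromosome, this sketch returns two chunks: a core from 1,000 to 7,001,000 and a core from 7,002,000 to 20,000,000, the short remainder having been folded into the second. Merging the tail keeps every chunk above both thresholds, at the cost of an occasionally longer final chunk.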
As a result, we retained 7,323,260 SNPs: 7,148,654 on the autosomes, 174,374 on chromosome X, 118 on chromosome Y and 114 on the mitochondrial genome.

After imputation, sample quality control was performed. First, we excluded samples whose ratios of reads mapped to chromosomes X and Y were inconsistent with their reported sex. Samples with a genotype missing rate above 0.1, or a heterozygosity rate outside the mean ± 4 standard deviations, were also removed. Finally, where the same sample had been sequenced in multiple runs, we kept only the run with the highest number of sequence reads. This quality control process retained 14,505 distinct HS rats (7,283 males, 7,222 females): 5,745 from ddGBS (2,903 males, 2,842 females) and 8,760 from lcWGS (4,380 males, 4,380 females).

Pipeline DOI: https://doi.org/10.5281/zenodo.10002191
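The sample-level filters described above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the `SampleRun` fields and the `sample_qc` name are hypothetical, the X/Y sex check is represented by a precomputed `sex_ok` flag because no cut-offs are given here, and the heterozygosity mean and sample standard deviation are assumed to be computed over the runs that passed the sex check. The missing-rate threshold (0.1), the ±4 SD window and the keep-the-run-with-most-reads rule come from the text.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Dict, List


@dataclass
class SampleRun:
    rat_id: str          # animal identifier (hypothetical field)
    run_id: str          # sequencing run identifier (hypothetical field)
    n_reads: int         # total sequence reads for this run
    missing_rate: float  # fraction of SNPs without a genotype call
    het_rate: float      # fraction of called SNPs that are heterozygous
    sex_ok: bool         # assumed precomputed: X/Y read ratios match reported sex


def sample_qc(runs: List[SampleRun], max_missing: float = 0.1,
              n_sd: float = 4.0) -> List[SampleRun]:
    # 1. Drop runs whose X/Y read ratios contradict the reported sex.
    kept = [r for r in runs if r.sex_ok]
    # 2. Drop high-missingness runs and heterozygosity outliers (mean +/- n_sd SD).
    #    stdev() needs at least two runs.
    het = [r.het_rate for r in kept]
    mu, sd = mean(het), stdev(het)
    kept = [r for r in kept
            if r.missing_rate <= max_missing and abs(r.het_rate - mu) <= n_sd * sd]
    # 3. For rats sequenced more than once, keep the run with the most reads.
    best: Dict[str, SampleRun] = {}
    for r in kept:
        if r.rat_id not in best or r.n_reads > best[r.rat_id].n_reads:
            best[r.rat_id] = r
    return list(best.values())
```

Because the outlier contributes to its own mean and SD, a single value can lie at most (n - 1)/sqrt(n) sample SDs from the mean, so a ±4 SD window cannot exclude anything among fewer than 18 runs; the rule only takes effect at cohort scale, as in this release.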