====== RNA-Seq ======
====Tutorials====
  * Submitting data to SRA [[stat:rnaseq:srasubmit]]
  * Zhihu search https://www.zhihu.com/search?q=rnaseq%20fastq%20star&type=content
  * RNA-Seq data normalization methods https://zhuanlan.zhihu.com/p/37196518
  * RNA-seq step-by-step tutorial: differential expression analysis (part 1) https://zhuanlan.zhihu.com/p/585176027
  * RNA-seq: transcriptome data analysis and processing (part 1) https://zhuanlan.zhihu.com/p/61847802
  * Common File Formats Used by the ENCODE Consortium [[https://www.encodeproject.org/help/file-formats/]]
  * FAA RNA Seq Compare [[https://www.faa.gov/sites/faa.gov/files/GEN_21001B_diffExp_TechReport.pdf]]

=====Workflow=====
[[https://bioconductor.org/packages/release/BiocViews.html#___GeneExpressionWorkflow]]\\
Raw data: FASTQ (.fq.gz), processed in this order:
  - QC: FastQC
  - Filter
  - Alignment: STAR
  - Count
  - Cluster
  - Heatmap
  - Differential expression

====1 QC ====
===FastQC===
  apt install fastqc
  fastqc --noextract RawData/I409/I409_1.fq.gz -o results/1_initial_qc/

====2 Alignment====
===Genome annotation data===
  * Ensembl: [[https://ensemblgenomes.org/]] [[https://useast.ensembl.org/index.html]]
  * FTP: [[https://ftp.ensembl.org/pub/rapid-release/]]
>Rat [[https://useast.ensembl.org/Rattus_norvegicus/Info/Index]]
>>GTF [[https://ftp.ensembl.org/pub/release-109/gtf/rattus_norvegicus/]]
>>GFF [[https://ftp.ensembl.org/pub/release-109/gff3/rattus_norvegicus/]]

===Alignment Indexing===
[[https://registry.opendata.aws/jhu-indexes/]]

===Alignment tools===
  * kallisto [[https://github.com/pachterlab/kallisto]]
  * STAR
  * HISAT2
  * Salmon [[https://combine-lab.github.io/salmon/getting_started/]]

===HISAT2===
  * Download: [[https://daehwankimlab.github.io/hisat2/download/]]
  * Genome mapping notebook: [[https://notebook.community/ssjunnebo/pathogen-informatics-training/Notebooks/RNA-Seq/genome-mapping]]

  # Build the index once under the prefix hisat_index/hisat_index, then align the paired-end reads
  mkdir -p hisat_index
  hisat2-build -p 32 fasta/Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa hisat_index/hisat_index
  hisat2 -x hisat_index/hisat_index -1 I409/I409_1.fq.gz -2 I409/I409_2.fq.gz -S I409.sam -p 32

===STAR===
Version 2.7.10b
  * GitHub: [[https://github.com/alexdobin/STAR]]
  * Ubuntu package: [[https://ubuntu.pkgs.org/22.04/ubuntu-universe-arm64/rna-star_2.7.10a+dfsg-1_arm64.deb.html]]

  apt install rna-star
  # Build the index. This only needs to be done once. Download the FASTA and GTF files for the target species from Ensembl first.
  STAR --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles fasta/* --sjdbGTFfile gtf/* --runThreadN 14
  # Run the alignment (for gzipped FASTQ input, add --readFilesCommand zcat)
  STAR --genomeDir star_index --readFilesIn filtered/sample_filtered.fq --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --runThreadN 14
  STAR --genomeDir star_index --readFilesIn rna4/RawData/I409/I409_1.fq --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --runThreadN 14

=== kallisto ===
  # bat
  kallisto bus [arguments] FASTQ-files
  kallisto quant -i rattus_index_ki/transcriptome.idx -o reads.kallisto_quant -t 64 --fusion --pseudobam --genomebam --gtf gtf\Rattus_norvegicus.mRatBN7.2.109.gtf rna4\RawData\I409\I409_1.fq.gz rna4\RawData\I409\I409_2.fq.gz

====3 Count ====
>Tools for counting reads [[https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/05_counting_reads.html]]
>>1 Subread [[https://subread.sourceforge.net/]]
>>>featureCounts [[https://rnnh.github.io/bioinfo-notebook/docs/featureCounts.html]]

  # The input may be a SAM or a BAM file
  featureCounts -p -M -O -T 32 -a gtf/Rattus_norvegicus.mRatBN7.2.109.gtf -o output.txt data.sam
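
The QC, alignment and counting commands on this page can be chained per sample. The following is a minimal sketch, not a tested pipeline. It assumes paired-end gzipped reads laid out as RawData/<sample>/<sample>_1.fq.gz and _2.fq.gz, an existing star_index directory, and a hypothetical results/ directory layout. The helper names ''run'' and ''process_sample'' are made up for illustration, and the Filter step is left out because no filtering tool is named here. By default it only prints the commands (DRY_RUN=1), so they can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: print (DRY_RUN=1, the default) or execute the per-sample steps.
# All paths below are assumptions about the local layout.
set -eu

DRY_RUN="${DRY_RUN:-1}"
THREADS="${THREADS:-14}"

run() {
  # In dry-run mode, echo the command; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

process_sample() {
  sample="$1"
  r1="RawData/${sample}/${sample}_1.fq.gz"
  r2="RawData/${sample}/${sample}_2.fq.gz"
  prefix="results/star/${sample}_"

  # Output directories must exist before FastQC and featureCounts write to them.
  run mkdir -p results/1_initial_qc results/star results/counts

  # 1 QC
  run fastqc --noextract "$r1" "$r2" -o results/1_initial_qc/

  # 2 Alignment: gzipped input is decompressed on the fly with zcat.
  run STAR --genomeDir star_index --readFilesIn "$r1" "$r2" \
    --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate \
    --quantMode GeneCounts --outFileNamePrefix "$prefix" --runThreadN "$THREADS"

  # 3 Count: STAR names the sorted BAM <prefix>Aligned.sortedByCoord.out.bam.
  run featureCounts -p -T "$THREADS" \
    -a gtf/Rattus_norvegicus.mRatBN7.2.109.gtf \
    -o "results/counts/${sample}.txt" "${prefix}Aligned.sortedByCoord.out.bam"
}

process_sample I409
```

Running the script with DRY_RUN=0 executes the commands instead of printing them. Adding further ''process_sample'' calls, or a loop over sample names, covers more samples.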