Python scripts for the practical course on Biodatabase. The course aims to teach SQL and the Ensembl biodatabase.
assignment1: Given a gene name, find the corresponding Ensembl Gene ID using SQL.
assignment2: Given an Ensembl Gene ID, find all corresponding transcript and translation IDs.
assignment3: Given a gene name, find the length of the gene.
assignment4: Given input gene names, use SQL to query for the IDs of the corresponding transcripts that also have a protein translation, read the matching entries from the FASTA file containing all transcripts as cDNA, convert the cDNAs into protein sequences, perform pairwise local alignment for each pair of proteins encoded by different genes, and output the score of the best local alignment for each pair of genes.
assignment5: Given a gene name, print the names of the transcripts from this gene along with the number of exons in each transcript.
assignment6: Given a marker name, print the list of genes that overlap the position of this marker.
assignment7: Given a gene name, find all GO terms associated with the protein translations from this gene and print the list of GO terms.
assignment8: Given a set of gene names, read the upstream regions and write them to a FASTA file with gene names as the headers and upstream sequences as the content. Then feed the output to a motif finder program and describe the results.
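
The lookups in assignment1 to assignment3 can be sketched as follows. This is a minimal illustration and not the course solution. It uses an in-memory SQLite database as a stand-in for the Ensembl MySQL server, and the table and column names (gene.stable_id, gene.display_xref_id, xref.display_label, transcript.gene_id, translation.transcript_id) are assumptions modeled on the Ensembl core schema that should be checked against the release you query. All rows and IDs are made up, and the function names are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a tiny, made-up subset of gene data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, display_label TEXT);
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY, stable_id TEXT, display_xref_id INTEGER,
            seq_region_start INTEGER, seq_region_end INTEGER);
        CREATE TABLE transcript (
            transcript_id INTEGER PRIMARY KEY, gene_id INTEGER, stable_id TEXT);
        CREATE TABLE translation (
            translation_id INTEGER PRIMARY KEY, transcript_id INTEGER, stable_id TEXT);

        -- Made-up rows for illustration only; not real Ensembl records.
        INSERT INTO xref VALUES (1, 'TOYGENE1');
        INSERT INTO gene VALUES (10, 'ENSG_TOY_0001', 1, 1000, 2499);
        INSERT INTO transcript VALUES
            (100, 10, 'ENST_TOY_0001'), (101, 10, 'ENST_TOY_0002');
        INSERT INTO translation VALUES (1000, 100, 'ENSP_TOY_0001');
        """
    )
    return conn


def find_gene_ids(conn, gene_name):
    """assignment1: gene name -> list of gene stable IDs."""
    rows = conn.execute(
        "SELECT g.stable_id FROM gene g "
        "JOIN xref x ON g.display_xref_id = x.xref_id "
        "WHERE x.display_label = ? ORDER BY g.stable_id",
        (gene_name,),
    ).fetchall()
    return [row[0] for row in rows]


def transcripts_with_translations(conn, gene_stable_id):
    """assignment2: gene stable ID -> (transcript ID, translation ID or None)."""
    return conn.execute(
        "SELECT t.stable_id, p.stable_id FROM gene g "
        "JOIN transcript t ON t.gene_id = g.gene_id "
        "LEFT JOIN translation p ON p.transcript_id = t.transcript_id "
        "WHERE g.stable_id = ? ORDER BY t.stable_id",
        (gene_stable_id,),
    ).fetchall()


def gene_length(conn, gene_name):
    """assignment3: gene name -> length in bases (coordinates are inclusive)."""
    row = conn.execute(
        "SELECT g.seq_region_end - g.seq_region_start + 1 FROM gene g "
        "JOIN xref x ON g.display_xref_id = x.xref_id "
        "WHERE x.display_label = ?",
        (gene_name,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_toy_db()
    print(find_gene_ids(conn, "TOYGENE1"))
    print(transcripts_with_translations(conn, "ENSG_TOY_0001"))
    print(gene_length(conn, "TOYGENE1"))
```

The LEFT JOIN keeps transcripts that have no protein translation, reporting None for them; switching to an inner JOIN would keep only the translated transcripts that assignment4 asks for. The queries pass the gene name as a bound parameter rather than formatting it into the SQL string. SQLite uses `?` as the placeholder, while Python MySQL drivers typically use `%s`.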