I'm a PhD candidate at the School of Computer Science, McGill University, under the supervision of Prof. Mathieu Blanchette and Prof. Doina Precup. My interests are in developing advanced machine learning techniques for genomics and sequence analysis. You can find my latest resume here.
Advanced Machine Learning for Biological Sequence Analysis and Gene Regulation: The study of gene regulation is an active research area and holds the key to understanding many biological mechanisms and to preventing diseases. The transcription phase is particularly crucial: proteins called transcription factors bind to specific genomic sequences and initiate gene regulation.
Various experimental procedures have been developed to identify the specific genomic regions that are bound by transcription factors. These experiments are costly and time-consuming, which has led to the development of computational approaches to predict these protein-DNA interactions. Recently, machine learning methods, especially deep learning techniques, have been shown to outperform classical computational approaches in various research areas.
However, the sequence data required to train such machine learning models is often insufficient. In such circumstances, the data can be augmented with extant orthologous sequences and ancestral sequences. The resulting models, which combine sequences from different species, can take advantage of advanced machine learning techniques and open the door to the evolutionary study of functional genomic regions. Thus, the goal of this research project is to summarize the topics related to computational analysis of transcriptional regulation, with a focus on cutting-edge machine learning approaches.
Cell Type Prediction of Transcription Factor Binding Sites using Machine Learning: For my master's research project, we proposed a machine learning approach to predict the particular cell type in which a given transcription factor can bind a DNA sequence. The learning models were trained on DNA sequences from the publicly available ChIP-seq experiments of the ENCODE project for 52 transcription factors across the GM12878, K562, HeLa, H1-hESC and HepG2 cell lines. We used three different feature extraction methods, based on k-mer representations, counts of known motifs, and a new model called the skip-gram model, which has become very popular in the analysis of text. We used SVM, K-means and logistic regression for the classification task. We found that predictors based on known motif counts detect cell-type specific signatures better than a previously published method, with a mean AUC improvement of 0.18, and can be used to identify interactions between transcription factors. Remarkably, the skip-gram approach, which can be used without any prior knowledge about transcription factor binding sites, performs almost as well as the motif-based method. Overall, our family of predictors will be useful both to better predict cell-type specific transcription factor occupancy and to understand the mechanisms underlying this phenomenon.
The thesis was approved in 2016 and the results were published at the ACM-BCB 2016 conference.
Ahsan, Faizy, Doina Precup, and Mathieu Blanchette. "Prediction of Cell Type Specific Transcription Factor Binding Site Occupancy." Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 2016. [ pdf ]
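As a rough illustration of the k-mer representation mentioned above, a DNA sequence can be turned into a fixed-length vector of k-mer counts that a classifier can consume. This is a minimal sketch, not the code used in the thesis: the function name `kmer_counts`, the choice of k, and the decision to skip k-mers containing characters other than A, C, G and T are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k=3):
    """Return a list of k-mer counts in a fixed (lexicographic) order.

    Hypothetical helper for illustration only. k-mers containing
    characters outside A/C/G/T (e.g. N) are not part of the vocabulary
    and therefore do not contribute to the output vector.
    """
    sequence = sequence.upper()
    # All 4**k possible k-mers over the DNA alphabet, in a stable order.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Slide a window of width k over the sequence and count each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts[kmer] for kmer in vocabulary]


# Example: 2-mer features for a short toy sequence.
features = kmer_counts("ACGTACGT", k=2)
```

The resulting vector has 4**k entries regardless of sequence length, which is what makes it usable as input to standard classifiers such as SVM or logistic regression.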
Master of Science Sep, 2013 - May, 2016
School of Computer Science, McGill University
Bachelor of Technology July, 2008 - April, 2012
TandemLaunch Inc., Montreal
Machine Learning Researcher (May - August, 2015)
With SensingDynamics, we developed the front end and back end of a software system to recognize smells using Microsoft Azure, RShiny, C and PHP.
Desautels Faculty of Management
Operations Management (Feb - August, 2015)
Under the guidance of Prof. Saibal Ray and Prof. Shanling Li, we developed software to forecast the quantity of goods to be kept in retail stores, using machine learning and data mining tools with Microsoft Access, R and Shell.
National Institute of Informatics, Tokyo
Data Mining Internship (Feb - August, 2014)
Under the supervision of Prof. Michael Houle, we developed clustering algorithms for high-dimensional datasets using C, C++, Shell and Matlab.
Center for Artificial Intelligence and Robotics, Bangalore
Scientist B (July, 2012 - July, 2013)
We were involved in the development and implementation of Classification and Regression Trees using the Gini and Twoing criteria in Matlab and C.
Computational Research Lab, Pune
Summer Intern (May - July 2011)
Developed the front end and back end of Chipmunk (Secure Data Transfer Appliance of CRL) using C, Shell, Apache, MySQL, PHP and HTTP.
At McGill University, I've been a teaching assistant for the following courses:
Room 3140, Trottier Building, 3630 University
Montreal, QC, Canada, H3A 0C6