This project involved processing nearly 0.5 TB of data using Pig and Hadoop on Amazon Web Services. The data obtained represented the semantic web and was in RDF format. The inherent graph structure of the data was similar to the web graph. The out-degree of nodes was observed to follow a power law, as is the case with the web. The code files for this project can be accessed here
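The out-degree measurement described above can be illustrated with a minimal Python sketch. It is not the project's Pig code: the function name `out_degree_distribution` and the toy triples are hypothetical, and it assumes the RDF data has already been parsed into (subject, predicate, object) tuples.

```python
from collections import Counter

def out_degree_distribution(triples):
    """Return a Counter mapping out-degree -> number of nodes with that out-degree.

    Nodes that never appear as a subject (out-degree 0) are not counted.
    """
    # One outgoing edge per triple, attributed to the triple's subject.
    out_degree = Counter(subject for subject, _predicate, _obj in triples)
    # Histogram of the degrees themselves.
    return Counter(out_degree.values())

# Hypothetical toy triples, for illustration only (not the project's data).
triples = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("a", "likes", "d"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
]

distribution = out_degree_distribution(triples)
for degree in sorted(distribution):
    print(degree, distribution[degree])
```

On a real graph, plotting this histogram on log-log axes is a common first check for power-law behaviour: an approximately straight line is consistent with one, although it is not proof.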
- Twitter Analysis
This is an ongoing project in which various types of analyses are done on data obtained from Twitter. I collected some of the data myself and took the rest directly from outside sources. Sentiment analysis was done by stemming the words in each tweet and looking them up in an existing dataset of words and their sentiment. I am currently working on language analysis of the Twitter data. The code files for this project can be accessed here
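The stem-then-look-up approach to sentiment scoring can be sketched roughly as follows. Everything here is an illustrative assumption: `WORD_SCORES` is a made-up mini-lexicon rather than the dataset used in the project, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score (illustration only).
WORD_SCORES = {"love": 1, "great": 1, "happy": 1, "hate": -1, "terrible": -1, "sad": -1}

# Longer suffixes are listed first so that "es" is tried before "s" or "e".
SUFFIXES = ("ing", "ed", "es", "s", "e")

def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Stem the lexicon with the same function so that lookups line up.
STEM_SCORES = {crude_stem(word): score for word, score in WORD_SCORES.items()}

def score_tweet(text):
    """Sum the sentiment scores of the stemmed words in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(STEM_SCORES.get(crude_stem(word), 0) for word in words)

print(score_tweet("I loved the show, great acting"))
print(score_tweet("Hating this terrible weather"))
```

A stemmer this crude conflates unrelated words (for example, "hat" and "hate" share a stem here), which is one reason to use an established stemmer in practice.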
- Kaggle - Titanic
Kaggle is an online platform where data scientists from all over the world come to solve data-driven problems. I participated in an introductory challenge in which the task was to predict which passengers of the Titanic survived the sinking. The data included standard passenger details such as sex, age, ticket price, and information about family members. I started by exploring the data in Excel with Pivot Tables to find the key features. Not surprisingly, being female or being a child made a passenger 5 times more likely to survive. There were many other small but interesting patterns: for example, male passengers travelling in 2nd class were half as likely to survive as male passengers travelling in 3rd class. People travelling with family either survived together or died together 90% of the time. People travelling alone had a dismal survival rate of 32%. These observations helped me set up the random forest, whose optimal parameters were obtained by grid search using the scikit-learn library in Python. The prediction accuracy of my model was 95%, and it currently stands 12th out of the 6000 teams that participated. The code for this project will be uploaded after the competition ends. The results can be confirmed at this page. Look for the team name "crossvalidator" at 12th position.
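The pivot-table exploration described above amounts to grouping passengers by one or more attributes and computing the survival rate per group. A minimal Python sketch of that idea follows. The helper `survival_rate_by`, the column names, and the six toy rows are all hypothetical; they are not the competition data and do not reproduce the figures quoted above.

```python
from collections import defaultdict

def survival_rate_by(rows, *keys):
    """Return {group: survival rate}, where group is a tuple of the key values."""
    totals = defaultdict(lambda: [0, 0])  # group -> [survivors, passengers]
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group][0] += row["survived"]
        totals[group][1] += 1
    return {group: survivors / count for group, (survivors, count) in totals.items()}

# Made-up toy rows, for illustration only.
rows = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
]

by_sex = survival_rate_by(rows, "sex")
by_sex_and_class = survival_rate_by(rows, "sex", "pclass")
for group, rate in sorted(by_sex_and_class.items()):
    print(group, round(rate, 2))
```

Groups whose rates differ sharply from one another are natural candidates for features in a model such as a random forest.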
- Neural networks
This project was done as part of an internship at the University of Alberta, Edmonton during the summer of 2010. The project involved modifying an existing neural network that was designed to predict univariate time series. I refined the definition of the activation functions in the inner layers of the network to improve its performance on univariate time series predictions, and also extended the structure to handle multivariate time series. The neural network used fuzzy rules as part of its activation functions.
This project was done as part of the graduate course on Distributed Simulation at McGill University. Both a single-processor version and a Time Warp version of a queuing network simulation were implemented. Message passing between the logical processors in the network was implemented using MPI in a multithreaded environment. A Global Virtual Time algorithm was used to keep track of the time differences between logical processors. The code for this project can be found here.
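For readers unfamiliar with Global Virtual Time (GVT), the textbook definition is the minimum over every logical processor's local clock and the timestamps of all messages still in transit. The sketch below shows only that definition in Python. The function name and arguments are hypothetical, and it sidesteps the hard part of a real distributed implementation, which is obtaining a consistent snapshot across processes; it is not the MPI-based algorithm used in the project.

```python
def global_virtual_time(local_clocks, in_transit_timestamps):
    """Return GVT: a lower bound on the timestamp of any future rollback.

    local_clocks          -- local virtual time of each logical processor
    in_transit_timestamps -- timestamps of messages sent but not yet received
    """
    candidates = list(local_clocks) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("at least one logical processor is required")
    return min(candidates)

# Hypothetical snapshot: three logical processors and two messages in flight.
gvt = global_virtual_time([12.0, 7.5, 9.0], [6.0, 20.0])
print(gvt)
```

In Time Warp, state saved for virtual times below GVT can never be rolled back to, so it is safe to reclaim. That is why in-transit messages must be included: a message with an earlier timestamp could still force a rollback on its receiver.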
- A.I. projects and Game of Hex
These were a series of projects I completed during my undergraduate days, when I was starting to learn machine learning and artificial intelligence. One of them was NLP-based: together with 2 teammates, I developed a Hidden Markov Model for part-of-speech tagging in C++. The tags were limited to noun, verb, adjective and adverb. We achieved an accuracy of 88% with our model. The model was inspired by this paper by Dr. Rabiner. I also worked with 3 other teammates on developing a game similar to chess, called Game of Hex. We developed and coded the algorithm for the 2-player and 1-player versions of the game. The files for the game have been uploaded here.
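Decoding the most likely tag sequence from a Hidden Markov Model is typically done with the Viterbi algorithm. The sketch below is a minimal Python illustration, not the project's C++ code. The probability tables are invented toy values, and the `floor` used for unseen words is an arbitrary assumption.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for words under a first-order HMM."""
    if not words:
        return []

    def log(p):
        # Floor zero probabilities so unseen words do not produce log(0).
        return math.log(max(p, floor))

    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (log(start_p[t]) + log(emit_p[t].get(words[0], 0.0)), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + log(trans_p[p][t])) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + log(emit_p[t].get(word, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final tag using the stored previous-tag pointers.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Invented toy model, for illustration only.
TAGS = ["noun", "verb", "adjective", "adverb"]
START = {"noun": 0.4, "verb": 0.1, "adjective": 0.4, "adverb": 0.1}
TRANS = {
    "noun": {"noun": 0.2, "verb": 0.6, "adjective": 0.1, "adverb": 0.1},
    "verb": {"noun": 0.3, "verb": 0.1, "adjective": 0.2, "adverb": 0.4},
    "adjective": {"noun": 0.7, "verb": 0.1, "adjective": 0.1, "adverb": 0.1},
    "adverb": {"noun": 0.1, "verb": 0.4, "adjective": 0.3, "adverb": 0.2},
}
EMIT = {
    "noun": {"dogs": 0.4, "cats": 0.4, "run": 0.2},
    "verb": {"run": 0.6, "bark": 0.4},
    "adjective": {"big": 0.6, "small": 0.4},
    "adverb": {"fast": 0.7, "loudly": 0.3},
}

tagged = viterbi(["big", "dogs", "run", "fast"], TAGS, START, TRANS, EMIT)
print(tagged)
```

Working in log space avoids numerical underflow on long sentences. In a trained tagger the three tables would be estimated from a tagged corpus instead of being written by hand.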
- Store website
Designed and developed a website for a health store with support for online orders. Development was carried out in HTML5 and PHP. A link to the website will be added shortly.