Codeswitching (CS) is a widely observed phenomenon in social media, where people communicate in two or more languages interchangeably (Spanish and English, for example). It is common among bilingual speakers, both in speech and in writing. Identifying the language of each token in a codeswitched input is a crucial first step before applying other natural language processing algorithms.
I participated in the shared task of the Second Workshop on Computational Approaches to Codeswitching last year.
The system, built with a Conditional Random Field (CRF) and fastText word vectors, identifies English and Spanish tokens in a codeswitched sentence with high token-level accuracy.
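To make the idea concrete, here is a minimal sketch of the kind of per-token feature extraction that typically feeds a CRF tagger. It is not the project's actual code: the feature names, the `en`/`es` labels, the example sentence, and the tiny `TOY_VECTORS` table (a stand-in for real fastText vectors, which have hundreds of dimensions) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for fastText vectors: a toy 2-dimensional lookup table.
# In a real system these values would come from a trained fastText model.
TOY_VECTORS: Dict[str, List[float]] = {
    "i": [0.9, 0.1],
    "need": [0.8, 0.2],
    "agua": [0.3, 0.7],
    "por": [0.1, 0.9],
    "favor": [0.2, 0.8],
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i], including its immediate neighbors."""
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": lower,
        "suffix3": lower[-3:],
        "is_title": word.istitle(),
        "has_non_ascii": any(ord(ch) > 127 for ch in word),
    }
    # One real-valued feature per embedding dimension; unknown words get none.
    for d, value in enumerate(TOY_VECTORS.get(lower, [])):
        feats[f"vec_{d}"] = value
    # Neighboring words give the sequence model context around switch points.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Extract features for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative codeswitched sentence with hypothetical token-level labels.
tokens = "I need agua por favor".split()
labels = ["en", "en", "es", "es", "es"]
features = sentence_features(tokens)
```

Each sentence becomes a list of feature dicts paired with a list of labels, which is the input shape that CRF libraries such as sklearn-crfsuite accept for training. The CRF then learns both how features relate to labels and how labels tend to follow one another, which helps around the points where the language switches.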
For more information, check out the project page.