Hindi songs make up a major share of Indian music sales and are now widely available in digital format on the Internet. While most music lovers choose what to listen to based on their mood, G. Drushti Apoorva asked herself, "Why not do something cooler?" That question led her to design a way of sorting the songs in a database by the sentiment of their lyrics.
A paper by 23-year-old Drushti on the subject, BolLy: Annotation of Sentiment Polarity in Bollywood Lyrics Dataset, won the best student paper award at the Pacific Association for Computational Linguistics (PACLING) 2017 conference in Myanmar, where 37 papers from around the world were presented.
Drushti's paper provides a high-quality dataset and encourages further research into emotion extraction, particularly from lyrics. It can serve as a reference dataset for evaluating research, help newcomers to Hindi lyrics analysis, and kick-start studies on code mixing in Bollywood songs. "I am interested in literature, so lyrics caught my attention. I was looking for a project that involved sentiment analysis, and creating a resource that would be useful for everyone in the field was the first step towards it. I started concentrating on analysing Bollywood lyrics about a year ago," Drushti says.
Drushti, a dual-degree student pursuing a B.Tech in computer science and an MS (research) in computational linguistics, adds, "The paper presents a dataset of Bollywood song lyrics classified as positive or negative based on the emotion they evoke. It presents basic algorithms to do this computationally. The dataset can be used in various applications, such as extracting emotion polarity, which can then be used to build systems for automatic playlist generation and recommendation. The dataset will also aid in music library management. Since it includes words from other major languages, it will help in code-mixing research too."
Another distinctive feature of the resource is that it was created in Devanagari script, sparing researchers the pre-processing cost of text normalisation. The lyrics and metadata of 1,055 Hindi film songs will support research in lyrics analysis and emotion polarity detection....