PhD Student, Intelligent Systems Program, University of Pittsburgh
20 Nov 2020 - Arun Balajiee
Today’s talk was about using simple principles of topic modelling and sentiment analysis to process text from different sections of three leading newspapers over a span of 20 years, in order to identify the “mood” of the US general populace they represent. The method labeled groups of words picked from the newspaper texts over the years as positive or negative to classify the sentiment. The net sentiment of each section was the log of the ratio of positive sentiment to negative sentiment, and the net sentiment for every day of publication over the 20 years was plotted in a scatterplot. Several factors affected the net sentiment, such as the section from which the text was collected and the topics it covered. Texts on the economic and financial circumstances of the time generally contributed more to the overall sentiment than texts from other sections such as editorials or opinions.
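The net sentiment measure described above can be sketched in a few lines. This is only a minimal illustration, not the speakers' implementation: the word lists `POSITIVE` and `NEGATIVE`, the function name `net_sentiment`, and the `smoothing` term (added so that a text with no negative words does not cause a division by zero) are all assumptions made for the example.

```python
import math

# Hypothetical mini-lexicons; the talk did not list the actual word groups.
POSITIVE = {"growth", "gain", "strong", "recovery"}
NEGATIVE = {"loss", "crisis", "weak", "decline"}


def net_sentiment(text, smoothing=1.0):
    """Return log(positive count / negative count) for a piece of text.

    `smoothing` is an assumption of this sketch: it is added to both
    counts so the ratio is defined when either count is zero.
    """
    # Lowercase each whitespace-separated token and strip common punctuation.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return math.log((pos + smoothing) / (neg + smoothing))
```

A value above zero means positive words outnumber negative ones, a value below zero means the opposite, and zero means they are balanced. Computing this per section per day would give the points of the scatterplot mentioned above.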
To reinforce their use of net sentiment to gauge public sentiment, they further posed two research questions: does economic news coverage respond to economic indicators such as the stock market, and does the net sentiment reflect the public mood? For both, Dr. Soroka et al. fit mixed linear regression models and found a significant linear relationship between the two factors involved. A higher net sentiment signified a happier public mood, and economic news coverage tracked the performance of the different financial indicators over time.
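To make the idea of a linear relationship concrete, here is a simplified sketch that fits a straight line by ordinary least squares. It is not the mixed linear regression used in the talk, which would also account for grouping in the data (by newspaper or section, for example). The function name `fit_line` and the idea of pairing a daily economic indicator with a daily net sentiment value are assumptions for illustration.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares.

    A plain OLS stand-in for illustration only; the talk described
    mixed linear regression models, which this does not implement.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Here `x` could be a series of daily indicator values and `y` the net sentiment for the same days. A positive slope would then correspond to coverage becoming more positive as the indicator rises.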
The scope for future work is to use informal information, such as that gathered from social media, alongside the newspaper sentiment to predict the public mood more accurately. Since this model builds on data that is collected every day, it tends to be more accurate in predicting public mood swings in the near future than popular polling applications, which collect data over several days before making predictions.