Twitter is an online social networking service with more than 300 million users, generating a huge amount of information
every day. Twitter’s most important characteristic is that it lets users tweet about events, situations, feelings, opinions, or
even something totally new, in real time. Social media tweet text has been mined to identify complaints regarding
road transportation issues such as traffic, accidents, and potholes. To identify and segregate tweets related to different
issues, keyword-based approaches have been used previously, but these methods depend solely on manually supplied seed keywords,
and such a set of keywords is not sufficient to cover all tweets. To overcome this issue, a novel approach
has been proposed that captures semantic context through dense word embeddings by employing the Word2Vec model. However,
the process of tweet segregation on the basis of semantically similar keywords may suffer from the problem of pragmatic ambiguity.
To handle this, the Word2Vec model has been applied to match semantically similar tweets with respect to each category.
Furthermore, hotspots have been identified for each category. However, because geo-tagged tweets are scarce,
we propose a hybrid method that combines Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and Regular
Expression (RE) to extract location information from the tweet text. Because no ground-truth dataset
is available, the feasibility of the model has been validated against existing data records (i.e., those published by government
official accounts and reported in news media), and the evaluation results indicate that the stated approach identifies a few
additional hotspots compared with the existing reports when analyzing the tweets.
Keywords: Twitter, Social Media, Tweet, Travel Habits, Road Condition, Road Traffic
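
The regular-expression part of the location extraction step described above can be illustrated with a minimal sketch. It is not the hybrid NER/POS/RE method itself: the pattern, the preposition list, and the function name `extract_locations` are assumptions made for illustration, and a real pipeline would combine such matches with NER and POS output.

```python
import re
from typing import List

# Hypothetical rule (an assumption, not the rule set used in this work):
# a location cue such as "near" or "at", followed by a run of capitalized tokens.
LOCATION_PATTERN = re.compile(
    r"\b(?:at|near|on|in|towards|from)\s+"
    r"([A-Z][A-Za-z0-9-]*(?:\s+[A-Z][A-Za-z0-9-]*)*)"
)


def extract_locations(tweet: str) -> List[str]:
    """Return candidate location phrases found in the tweet text."""
    return [match.group(1) for match in LOCATION_PATTERN.finditer(tweet)]


if __name__ == "__main__":
    # Illustrative tweet text, not taken from any dataset.
    print(extract_locations("Huge pothole near MG Road causing traffic"))
```

A rule of this kind only fires on capitalized place names that follow a cue word, so it misses lowercase or abbreviated mentions; this is one reason a regex is better treated as a complement to NER and POS tagging than as a replacement.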