Hashtag suggester
3/9/2023

Hashtags are used in almost all social networks as a way to classify topics. Twitter hashtags in particular are used extensively in trends and search, and are part of the social media economy. This project tries to suggest hashtags for tweets that have none. It is a way to appreciate users' creativity in coining interesting hashtags, and it also helps boost tweets' visibility.

The basic idea is to take advantage of tweets that already have hashtags and make suggestions based on the hashtags of similar tweets. The problem is how to find similar tweets. A naive way is to calculate the cosine similarity between the new tweet and each of the old tweets, sort by similarity, and pick the most similar ones. This approach does not scale, since the corpus can easily reach millions of tweets.

The approach implemented here is inspired by research in First Story Detection (FSD), which tries to find the earliest news item among all the news on a topic. In short, Locality-Sensitive Hashing (LSH) is used to put tweets that have hashtags into buckets; because of how LSH works, tweets in the same bucket are likely to be similar. The same procedure is then applied to a tweet without hashtags, and suggestions are made based only on the tweets in its bucket, which number fewer than a few hundred.

The project is built on top of Storm, which is a natural fit for streaming data. The Twitter Streaming API is the data source. All raw tweets are stored in HDFS as historical data. The text and hashtags are fed into Redis, which acts as a task queue. Two Storm topologies run here. A Trident topology handles tweets that already have hashtags: it continuously reads tweets from Redis and, after cleaning them, transforming them into TF-IDF vectors, and applying LSH, puts them into buckets. A DRPC topology waits for requests from API calls, which can be sent either from tweet streaming or from the web interface, and returns hashtag suggestions. A topology is similar to a job in Hadoop MapReduce; the difference is that a topology never ends.
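To make the bucketing idea concrete, here is a minimal Python sketch of LSH-based hashtag suggestion. It is not the project's Storm code. It assumes a random-hyperplane LSH family (the post does not say which family is used), an 8-bit signature, a whitespace tokenizer, a smoothed IDF, and an in-memory dictionary in place of Redis and the topologies. All function names (`build_index`, `suggest`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

NUM_BITS = 8  # assumed signature length, so at most 2**8 buckets


def tokenize(text):
    """Lowercase, split on whitespace, and drop hashtag tokens."""
    return [w for w in text.lower().split() if not w.startswith("#")]


def tfidf(tokens, doc_freq, num_docs):
    """Sparse TF-IDF vector (term -> weight) using a smoothed IDF."""
    counts = Counter(tokens)
    return {
        t: c * (math.log((1 + num_docs) / (1 + doc_freq.get(t, 0))) + 1)
        for t, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hyperplane_weight(bit, term):
    """Deterministic Gaussian component of hyperplane `bit` along `term`."""
    return random.Random(f"{bit}:{term}").gauss(0.0, 1.0)


def signature(vec):
    """One bit per hyperplane: the side of the plane the vector falls on."""
    return tuple(
        sum(w * hyperplane_weight(b, t) for t, w in vec.items()) >= 0
        for b in range(NUM_BITS)
    )


def build_index(tagged_tweets):
    """Bucket (text, hashtags) pairs by the LSH signature of their TF-IDF vector."""
    docs = [(tokenize(text), tags) for text, tags in tagged_tweets]
    doc_freq = Counter(t for tokens, _ in docs for t in set(tokens))
    buckets = defaultdict(list)
    for tokens, tags in docs:
        vec = tfidf(tokens, doc_freq, len(docs))
        buckets[signature(vec)].append((vec, tags))
    return {"doc_freq": doc_freq, "num_docs": len(docs), "buckets": buckets}


def suggest(index, text, neighbours=10, top_n=3):
    """Rank hashtags using only the tweets that share the new tweet's bucket."""
    vec = tfidf(tokenize(text), index["doc_freq"], index["num_docs"])
    candidates = index["buckets"].get(signature(vec), [])
    scored = sorted(
        ((cosine(vec, other), tags) for other, tags in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, tags in scored[:neighbours]:
        if sim > 0:
            for tag in tags:
                votes[tag] += sim  # each neighbour votes with its similarity
    return [tag for tag, _ in votes.most_common(top_n)]


if __name__ == "__main__":
    demo = build_index([
        ("storm topology processes streaming tweets", ["storm", "bigdata"]),
        ("redis works well as a task queue", ["redis"]),
    ])
    print(suggest(demo, "storm topology processes streaming tweets"))
```

The cosine ranking is the same as in the naive approach, but it runs only over one bucket instead of the whole corpus. A single hash table can miss similar tweets that land on the other side of one hyperplane; using several independent tables is a common remedy, which this sketch leaves out.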