Text Mining in Social Media

“Mining is the extraction of valuable minerals or other resources from the geological materials from Earth.”

And you thought this blog post was about text mining. It is!

We’ll just take from this definition the part that concerns us most: it’s the process of extracting valuable resources from a whole lot of unwanted material.

Q: So in our context, what’s the most valuable resource?

Ans: Data

But not just any data. There’s a lot of it out there; we only want what’s most important to us in a particular context.

What is Text Mining?

It’s the process of deriving high-quality information from text. It involves “the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources.”

It is a variation of data mining, and some argue it is even more important than data mining itself [3].

Put more simply, it’s the process of extracting the information that concerns us most from large collections of written text. It takes data from textual sources and evaluates it using various NLP (natural language processing) algorithms.

These written sources often include websites, blogs, news articles, emails, reviews and, most of all, social media.

It usually involves structuring the raw data, finding patterns within it, and then evaluating and interpreting the output.

Why use Text Mining?

So why should we use this “Text mining” in the first place?

Text mining is very helpful when a large amount of data is created in a very short time and needs to be analyzed, which is humanly impossible in the case of social media posts.

It helps you track and interpret the textual information generated online from the sources mentioned above. This helps when you want to check how your brand is doing, how people are reacting to it, what the reach of your latest marketing campaign was, how people received the latest feature you added to your software, and much more.
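As a rough illustration of checking how people react to a brand, here is a minimal lexicon-based sentiment sketch in Python. The tiny word lists, the brand name `Acme` and the scoring rule (positive words minus negative words) are all assumptions made for the example; real tools use far larger lexicons or trained models.

```python
import re

# Hypothetical, tiny sentiment lexicons (assumption); real ones have thousands of entries.
POSITIVE = {"love", "great", "awesome", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "bad", "crash"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text: str) -> int:
    """Score = number of positive words minus number of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def brand_report(posts: list[str], brand: str) -> dict[str, int]:
    """Count positive, negative and neutral posts that mention the brand."""
    report = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        if brand.lower() not in tokenize(post):
            continue  # ignore posts that don't mention the brand at all
        score = sentiment(post)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        report[label] += 1
    return report

if __name__ == "__main__":
    posts = [
        "I love the new Acme app, it is so fast!",
        "Acme keeps crashing, the update is broken",
        "Just installed Acme today",
        "Nothing to do with the brand",
    ]
    print(brand_report(posts, "Acme"))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

Note that "crashing" is not counted as negative because only the exact word "crash" is in the lexicon; handling word variants like this is one reason preprocessing (stemming, lemmatization) matters.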

Social media is a treasure trove, the holy grail of people’s opinions. Using text mining, one can do mind-blowing things with this information. It can efficiently process large volumes of posts, along with the follower and like counts for your brand, so that you get a better understanding of people’s reactions.

It will also help you see what’s trending and what’s not among your specific target audience, so that you can decide on your next strategy.
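A very simple way to see what’s trending is to count hashtags in two time windows and compare them. The sketch below assumes posts have already been grouped by week; the sample posts are invented for illustration.

```python
import re
from collections import Counter

# A hashtag is '#' followed by word characters, not preceded by a word character.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

def count_hashtags(posts: list[str]) -> Counter:
    """Count hashtags case-insensitively across a list of posts."""
    return Counter(tag.lower() for post in posts for tag in HASHTAG.findall(post))

def trend_changes(previous: list[str], current: list[str]) -> dict[str, int]:
    """Change in each hashtag's count between two windows (positive = rising)."""
    before, after = count_hashtags(previous), count_hashtags(current)
    # Counter returns 0 for missing keys, so tags new in one window are handled.
    return {tag: after[tag] - before[tag] for tag in before.keys() | after.keys()}

if __name__ == "__main__":
    last_week = ["#summer sale is on", "Loving the #Summer vibes", "#backtoschool deals"]
    this_week = ["#BackToSchool already?", "#backtoschool shopping done", "last #summer weekend"]
    print(count_hashtags(this_week).most_common(1))           # [('backtoschool', 2)]
    print(sorted(trend_changes(last_week, this_week).items()))  # [('backtoschool', 1), ('summer', -1)]
```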

Traditionally, companies hired people to go through social media and compile reports; now, with the help of text mining, we can extract a lot of information in a very short period of time.

Text Available on the Web [1]

As shown in the figure above, only 20% of the textual data on the internet and social media is structured; the remaining 80% is unstructured [1]. This is why, for research purposes, it is essential to structure this data properly, which is often the first step in text mining operations [2].
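To give a feel for what “structuring” a post can mean, here is a small Python sketch that turns a raw post into a record with separate hashtag, mention and URL fields. The field names and regular expressions are illustrative assumptions, not a standard schema.

```python
import re
from dataclasses import dataclass, field

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")  # lookbehind skips e-mail addresses like a@b.com

@dataclass
class StructuredPost:
    """Hypothetical structured record for one social media post."""
    text: str
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)

def structure_post(raw: str) -> StructuredPost:
    """Split a raw post into its text, hashtags, mentions and URLs."""
    urls = URL_RE.findall(raw)
    without_urls = URL_RE.sub("", raw)  # so '#' fragments inside URLs aren't read as hashtags
    return StructuredPost(
        text=" ".join(without_urls.split()),  # collapse leftover whitespace
        hashtags=[t.lower() for t in HASHTAG_RE.findall(without_urls)],
        mentions=[m.lower() for m in MENTION_RE.findall(without_urls)],
        urls=urls,
    )

if __name__ == "__main__":
    print(structure_post("Loving the new phone from @AcmeCorp! #Tech #gadgets https://example.com/review"))
```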

But make no mistake: text mining can also work on unstructured or semi-structured data (e.g. emails, HTML documents) [4], which makes it a lucrative solution for companies.
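For semi-structured sources like HTML, a common first step is to pull out the visible text. Below is a minimal sketch using Python’s built-in `html.parser` module; skipping only `<script>` and `<style>` content is an assumption, and real pages usually need more cleanup (navigation menus, ads, comment widgets).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)

if __name__ == "__main__":
    page = ("<html><head><style>p {color: red}</style></head><body>"
            "<h1>Great product!</h1><p>Fast delivery &amp; friendly support.</p>"
            "<script>track()</script></body></html>")
    print(html_to_text(page))  # Great product! Fast delivery & friendly support.
```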

Text Mining Process

Text Mining Process [7]

Gaikwad et al., in their paper “Text Mining Methods and Techniques”, describe a five-stage text mining process, elaborated as follows.

  1. Document Collection: The process starts with collecting relevant documents/text from various sources. Many free and paid tools and services are available for this purpose; a simple Google search will turn up many such applications.
  2. Preprocessing: As mentioned earlier, most of the text posted online is unstructured, and the language we use to communicate was developed for humans; computers cannot comprehend natural language directly [4]. Thus, various text analysis techniques are used to preprocess the data and remove noise from it. This step mainly involves checking the format and character set of the collected text.
  3. Applying Text Mining techniques: Many text mining techniques are available. Depending on the scenario and the organization’s goals, a particular technique is applied to the preprocessed data; sometimes multiple techniques are applied until the information is extracted. The following are some of the techniques used in text mining:
  • Term Based Method
  • Phrase Based Method
  • Concept Based Method
  • Pattern Taxonomy Method
  4. Analysis of Text: After successfully extracting the relevant information, we are left with an abundance of information that can then be analyzed, for example by running statistical models.
  5. Discovery of Knowledge: The final result. After analyzing the text, we gain knowledge about the data collected.
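To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch of preprocessing followed by a term-based method. It assumes TF-IDF as the term-weighting scheme (one common choice, not necessarily the one Gaikwad et al. describe), and the stop-word list and sample posts are made up for illustration.

```python
import math
import re
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list (assumption); real pipelines use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "this", "i", "my", "on"}

def preprocess(text: str) -> list[str]:
    """Step 2: normalize the character set, drop noise, lowercase, tokenize."""
    text = unicodedata.normalize("NFKC", text)      # unify equivalent Unicode forms (e.g. ligatures)
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Step 3 (term-based): weight each term by tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # number of docs containing each term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

def top_terms(weights: dict[str, float], k: int = 2) -> list[str]:
    """Highest-weighted terms of one document (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

if __name__ == "__main__":
    posts = [
        "The battery life of this phone is amazing https://t.co/x",
        "@acme the battery died after one day",
        "Camera quality is amazing, battery is fine",
    ]
    docs = [preprocess(p) for p in posts]
    for post, w in zip(posts, tf_idf(docs)):
        print(top_terms(w), "<-", post)
```

Note how “battery”, which appears in every post, gets a weight of zero: TF-IDF favors terms that distinguish one post from the others.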

Rinse and Repeat: These steps can be repeated several times to meet the organization’s goals in a timely manner.

Other use cases

This blog primarily focuses on use cases where a corporation can harness the power of text mining to evaluate the value of its brand; however, there are many other use cases in which text mining can prove highly useful.

  1. In systems biology [5]
  2. Detecting user posts that indicate abuse, using social media as a source for the automatic monitoring of drug/medication abuse [6]
  3. In biology and biomedicine [8]

Challenges in Text Mining

The complexity of natural language: computers are not (yet) capable of understanding the natural language we use to communicate, which is a major barrier in text preprocessing. However, several natural language processing (NLP) techniques can be used to tackle this issue.

There is also the issue of language ambiguity: in any natural language, one word can have different meanings depending on the context of the sentence in which it is used. We cannot completely eliminate ambiguity from text, as it is what gives language its flexibility and usability [7]. A number of studies over the past few decades have tried to provide a solution to the ambiguity issue.
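One classic and simple approach to word ambiguity is the simplified Lesk algorithm: pick the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. The sketch below uses a tiny hand-written sense inventory for “apple”, a hypothetical stand-in for a real dictionary such as WordNet, just to show the idea.

```python
import re
from typing import Optional

# Hypothetical sense inventory: each sense of a word has a short gloss.
SENSES = {
    "apple": {
        "apple#fruit": "edible fruit of a tree with a sweet taste, eaten raw or baked in a pie",
        "apple#company": "technology company that makes the iphone, mac computers and software",
    }
}
STOP_WORDS = {"the", "a", "an", "of", "and", "that", "with", "in", "or", "i", "my", "on", "she"}

def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words of text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def simplified_lesk(word: str, sentence: str) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the sentence (first sense wins ties)."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

if __name__ == "__main__":
    print(simplified_lesk("apple", "I updated the software on my Apple iPhone"))  # apple#company
    print(simplified_lesk("apple", "She baked an apple pie with sweet fruit"))    # apple#fruit
```

In practice the hand-written glosses would be replaced by a full lexical resource, and the overlap score is often refined (for example by weighting rare words more heavily).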


Scraping social media for users’ views sounds sketchy, as we do not get users’ explicit permission to use their posts. One could also argue that whatever is posted on social media is part of the public record and that people are (presumably) aware of that. This could be a topic for debate.

But we certainly cannot ignore the high value Text Mining delivers for businesses.


Makarand Bhale


