Contributing to a Better Society with Text Mining
The massive daily overflow of data - or big data - creates the need for information seekers to digest and organize this data to make it useful and meaningful. Text mining techniques meet this need by extracting desired information and discovering knowledge from large, unstructured data.
A leading expert in text mining, Professor Min Song of the Department of Library and Information Science at Yonsei University in Seoul, Korea, focuses his research on the discovery of knowledge from large natural language data such as social media channels, doctors’ notes, and scientific publications. Professor Song discusses his current research interests and new initiatives in a one-on-one interview.
Please introduce yourself.
I am Min Song, a Professor in the Department of Library and Information Science as well as the Department of Digital Analytics at Yonsei University. I am a text mining guy. I am interested not only in the theoretical aspects of text mining but also in practical applications of state-of-the-art text mining techniques to human-generated text. Before Yonsei, I was a tenured Associate Professor in the Information Systems Department at the New Jersey Institute of Technology. Before moving to academia, I was a senior software engineer at Thomson Reuters.
What are your research interests?
My current research focuses on understanding unstructured texts, which are the result of people expressing their thoughts and ideas and which represent - by and large - culture, society, and the scientific community at a macro level. A gigantic amount of human-generated data is being generated at an exponential rate. To automatically discover the hidden golden nuggets buried in such big data, text mining comes into play, and that is my research interest.
Tell us more about the TSMM laboratory.
I manage the TSMM research lab, which stands for Text and Social Media Mining Lab. It consists of two tracks. One is biomedical text mining, and the other is social media data mining.
Biomedical text mining focuses on extracting biomedical entities such as genes, diseases, and metabolites, along with their relations, from biomedical text including doctors’ notes, clinical data, healthcare-related social media sites, and scientific publications.
Social media data mining deals with studying and understanding people’s opinions on social issues. This track is particularly interested in analyzing people’s sentiments and polarity. Polarity is “positive vs. negative” or “like vs. dislike.” It is a binary expression of people’s opinion, whereas sentiment analysis can vary by category, being expressed on a five-point scale (e.g., very much dislike, dislike, neutral, like, very much like) or as various moods (e.g., happiness, anger, sadness).
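The difference between binary polarity and a five-point sentiment scale can be illustrated with a minimal Python sketch. The names `Sentiment` and `polarity` are hypothetical and chosen only for illustration; this is not the lab's code, and it assumes a sentiment label has already been assigned to each post.

```python
from enum import Enum


class Sentiment(Enum):
    """Hypothetical five-point sentiment scale, as in the example above."""
    VERY_MUCH_DISLIKE = -2
    DISLIKE = -1
    NEUTRAL = 0
    LIKE = 1
    VERY_MUCH_LIKE = 2


def polarity(sentiment):
    """Collapse a five-point label into binary polarity.

    Returns "positive" or "negative"; a neutral label has no
    polarity in this sketch, so None is returned.
    """
    if sentiment.value > 0:
        return "positive"
    if sentiment.value < 0:
        return "negative"
    return None


if __name__ == "__main__":
    for label in Sentiment:
        print(label.name, "->", polarity(label))
```

Collapsing the scale this way loses intensity: "like" and "very much like" both become "positive," which is why the finer-grained scale carries more information than polarity alone.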
Tell us about your research team.
The students on my team are quite diverse and belong to one of these two tracks. I have three postdoctoral fellows, four Ph.D. students, eight master’s students, and one exchange student. They come from all over the globe, including the U.S., Thailand, China, Mongolia, and the Philippines. 40% are international, and 60% are Korean.
How can your research benefit society?
That is a difficult and important question. I believe that the advancement of science benefits society regardless of the specifics of the research, such as its goal and the domain it targets. For me, by discovering hidden relations or patterns that cannot be seen without big data analytics, my research aims to promote understanding across diverse groups so that we can reduce conflict among people and minimize the waste of resources.
Recently, I established the Socialomics Research Center with multi-million-dollar, multi-year funding provided by the National Research Foundation of Korea. Socialomics is a compound term that consists of “social” and “omics,” where “omics” refers to big data. Socialomics is about finding the regularities and mechanisms embedded in social and cultural big data. We are looking into ways of applying biological theories and findings to social and cultural big data.
Can you give a specific example of socialomics?
One example would be a presidential election with several candidates competing against each other. People have different opinions on the various policies that each candidate promotes. Socialomics is about understanding which topics or aspects of a particular candidate’s policies people like or dislike, and for what reason. So socialomics is about discovering that hidden information from social media data.
Do you have any interesting projects in mind for the research center?
The Socialomics Research Center is newly established, and I am personally interested in studying social diseases, such as suicide and depression. Why do people have depression? Is it a personal disease or an outcome of social interaction? I am interested in understanding people who have depression or have committed suicide and finding out what factors play a major role in the disease.
Tell us about recent trends in text mining and your current projects.
Recent trends in text mining involve applying deep learning or AI techniques to human-generated texts. AlphaGo was an iconic event that showed us the impact of AI on our daily lives. My team is trying hard to lead this trend in text mining.
One of our recent collaborative works in this respect is with Professor Christian Lovis in the Biomedical Informatics Department at the University of Geneva. This collaboration was first facilitated by a joint research initiative between the University of Geneva and Yonsei University. I contacted Professor Lovis and proposed my idea of collecting social media data related to specific diseases - for instance, chronic diseases discussed on Twitter or Reddit - and analyzing how people talk about those diseases. That is how we started working together.
We are currently working on automatically mining doctors’ notes and clinical data written in French that are collected at the University Hospitals of Geneva. They are considering replacing their legacy system with PKDE4J, the biomedical text mining tool I developed. We are very excited to see how this new tool can help improve the quality of healthcare services. I visited Geneva two weeks ago and showed them how easily this tool could be applied to their settings, and they were very surprised by the quality of the outcomes and the flexibility of the tool.
The main finding from our collaboration is that people are very anxious about their symptoms or disease. They express their concerns and their anxiety in their own terminology - not in medical terminology but in plain terms. So I believe the outcome of this collaborative work will help us understand how to treat patients and cure their diseases.
Any words of advice for future students or young researchers interested in the field of text mining?
I want future students in the field of text mining to have enthusiasm and passion for their work. I strongly believe that our current work in this field actually helps others and benefits society. So if students commit themselves with good ideas, passion, and enthusiasm, they will transform society. They will make our society better, and people will live happier lives.
What drives you? What propels you to excel in your research field?
I am self-motivated. My motivation is based on the belief that I can make a difference. I can’t do this alone, but I can with others – collaborators and students. I want to contribute to society, to make it better. I benefit greatly from society, and now it is my turn to show my gratitude. I can give back by dedicating myself to my work, pursuing strong research and generating meaningful outcomes. That is my motivation.