Bachelor dissertation
Alina Varfolomeeva
Application of Web-based Text Mining to Analysis of
Hotel Customer Comments
Institute of Hospitality Management in Prague
Department of Hospitality Management
Major field of study: Hospitality Management
Dissertation advisor: doc. RNDr. Zdena Lustigová, CSc., MBA
Date of submission: 2014-11-13
Date of defense: 2015-01
I hereby declare that the bachelor dissertation titled “Application of Web-based Text Mining to Analysis of Hotel Customer Comments” was written by me independently, that all literature and additional material used are cited in the bibliography, and that this version is exactly the same as the work submitted electronically.
In accordance with §47b law no. 111/1998 coll. on higher education institutions, I agree to my dissertation being published in its complete form in the publicly accessible electronic database of the Institute of Hospitality Management in Prague.
……………………………………
Alina Varfolomeeva
In Prague on 13.11.2014
Abstract
VARFOLOMEEVA, Alina. Application of web-based text-mining to analysis of hotel customer comments. [Bachelor dissertation] Institute of Hospitality Management in Prague. Prague: 2014. 40 pages.
With the development of internet reservation portals, booking systems and social media platforms, there is no shortage of customer feedback, especially in the hospitality sector. The ability to monitor and carefully analyze such feedback is crucial for the success of a hotel business. Consequently, text mining, a method used to extract meaningful patterns from unstructured data, is gaining popularity. This study discusses the application of text mining to the analysis of hotel customer comments. The chosen framework combines case study research with quantitative data analysis. The results of the study show that, while performing well in some parts of the research, text mining techniques still cannot be used on their own for a proper analysis and require additional instruments of analysis.
Key words: customer comments, feedback, manual analysis, matrix, text mining
Table of Contents
Chapter 1. Introduction
1.1 Background and Rationale
1.2 Research Question and Objectives
1.3 Outline of The Thesis
Chapter 2. Literature Review
2.1 The Roots of Text Mining
2.2 Text Mining Vs Data Mining
2.3 Text Mining Techniques
2.3.1 Information Retrieval
2.3.2 Information Extraction
2.3.4 Tokenization and Stemming
2.4 Text Mining in Hospitality and Tourism Industry
2.4.1 Sentiment Analysis
2.4.2 Other Cases of Text Mining in Hospitality
2.5 Customers Feedback
Chapter 3. Methodology
3.1 Research Design
3.2 Data Collection Procedure
3.2.1 The Object of Case Study
3.2.2 Collection and Preparation of Reviews
3.3 Text Mining Analysis
Chapter 4. Findings and Results
4.1 Comparison of Manual and Machine Analysis
4.2 Encountered Difficulties
4.2.1 Language Barrier
4.2.2 Web Crawling Complications
Chapter 5. Conclusions and Recommendations
5.1 Conclusions in Relation to Hypothesis
5.2 Conclusions to the Research Question
5.3 Recommendations and Limitations
List of References
Chapter 1. Introduction
1.1 Background and Rationale
Hospitality is one of the most highly competitive industries in the modern world. Hotels have to operate in a rapidly changing environment, where clients’ attitudes towards any particular accommodation establishment can make it or break it.
With the pervasive use of computers and the Internet it has become extremely important to maintain guests’ satisfaction with the overall hotel experience. Nowadays any hotel client can share their positive or negative comments with the whole world via social media, discussion forums and internet reservation portals. For any hotel’s success it is crucial to collect and analyse the data contained in those reviews in order to improve the hotel’s performance.
However, as reviews mostly come in the form of text, that is, unstructured data, it becomes more and more difficult for hotel personnel to read through and manually analyze the number of comments the establishment receives. Even more difficult is to continuously monitor how the reviews change over time and how this affects the hotel’s image. While smaller hotels might receive only a couple of reviews daily, bigger and chain hotels have to deal with enormous amounts of comments, making it virtually impossible to process them without the use of computers and specially designed programs.
"We are drowning in information but are starving for knowledge," says Mani Shabrang, technical leader in research and development at Dow Chemical Co.'s business intelligence center in Midland, Mich.¹ This saying is extremely appropriate for the hotel industry too. Only processed information, obtained from carefully analyzed reviews, can bring insight into what needs to be improved in order for a business to retain old customers and attract new ones.
¹ Robb, D. Text mining tools take on unstructured data, Computerworld, June 21, 2004
Text mining, a rather young area of computer science, might turn out to be the solution to processing large amounts of unstructured data from customers’ reviews. It has strong connections with natural language processing, information retrieval, data mining and knowledge management. Text mining seeks to extract useful information from unstructured textual data through the identification and exploration of interesting patterns (Feldman and Sanger, 2007).
Nowadays some of the bigger hoteliers² already exploit text mining solutions, and more and more hotels understand the need for such software for more efficient analysis of customers’ feedback. Therefore the availability and ease of use of appropriate text mining software is crucial for a hotel’s high performance. The aim of this study is to provide insight into the use of text mining software and its possibilities in the area of hotel customer feedback, using the example of Hotel Assenzio**** in Prague and the text mining module of the Statistica³ software.
1.2 Research Question and Objectives
The prosperity of any hotel in the hospitality industry depends on its customers. Without customers the hotel does not exist, and without understanding what makes customers return or not return, what makes them recommend a hotel or discourage friends and relatives from visiting, and what items, services, and features create value for customers, it is impossible to have a successful business.
In order to understand the efficiency of analyzing and monitoring customers’ feedback using text mining software, the following research question is addressed:
How efficient is it to apply text mining techniques to analysis of customer comments of a particular hotel?
After having identified the research problem, two hypotheses (H1 and H2) were generated from the literature review conducted in Chapter 2:
H1.: Text mining is a more precise and efficient method of analysis of customer feedback than manual analysis.
H2.: Text mining can be used separately from other analysis techniques and present good results.
² For more information see http://clarabridge.com/about-us/news-and-events/press-release/choicehotels-deploys-clarabridge-text-mining-solution/ on Choice Hotels International, Inc., and http://www.informationweek.com/software/information-management/gaylord-hotels-taps-textmining-to-boost-guest-satisfaction/d/d-id/1067486? for Gaylord Hotels’ use of text mining
³ For more information on StatSoft and Statistica software see statsoft.com
In order to achieve the aim of the study, the following objectives need to be met to answer the research question:
1. Identify the idea and process of text mining.
2. Determine the ways in which text mining can be used.
3. Analyze hotel customers’ feedback using Statistica software.
4. Uncover the differences and similarities in results of research based on using manual or text mining approach to analyze hotel customer comments.
The first objective will be addressed through a review of the literature on text mining and hotel customer feedback, and of the findings made in each of these areas before the current study.
The second objective will be covered partly in the literature review and mainly in the methodology part of the thesis.
The third objective will be reached by using text mining software and analyzing its results.
The fourth objective will be addressed in the results and findings, based on the outcomes of the case study conducted using both manual and machine methods of analysis.
1.3 Outline of The Thesis
The study comprises five chapters and appendices. Chapter Two provides the theoretical framework of the study, contributing to the understanding of text mining, its applications and ways of use. The hypotheses are generated from the literature reviewed there, in order to test the theory and search for the answer to the research question.
Chapter Three formulates the hypotheses, which are then tested to reveal whether they hold in practice. It also describes the methodology used in this research, starting with the research method and continuing with the data collection and analysis procedure. At the end of the chapter, a text mining spreadsheet is shown to describe the text mining process and to allow its results to be compared with the outcomes of the manual research.
Chapter Four presents the findings obtained from the text mining research. Its results are compared to the results of the manual research, followed by an analysis focused on the hypotheses set out for the study. It also mentions the difficulties and limitations met in the process of the research.
Chapter Five completes the study by presenting conclusions to the research hypotheses as well as to the research question. Implications and suggestions are provided in summarizing the conclusions.
Chapter 2. Literature Review
Text mining can be defined as a process of identifying and extracting useful information from textual documents. Feldman and Sanger (2007) characterize text-mining as a knowledge-intensive process in which a user interacts with a document collection over time by using a suite of analysis tools. According to StatSoft, the purpose of text mining is processing of unstructured information and extraction of meaningful numerical data from the text, which makes the information contained in the text more accessible to various data mining techniques. Using text mining one can derive summaries from the documents in the set and retrieve key concepts for the whole set of documents. The documents can be analyzed in order to determine similarities between them, which can help with classification and structuring of data. In general, text mining turns text into numbers, which can be later incorporated in other data analyses to reveal interesting statistical results.
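As a minimal illustration of how text can be turned into numbers, the following Python sketch builds a simple document-term matrix from three short comments and compares the documents using cosine similarity. The sample comments, the function names and the simple word-splitting rule are assumptions made for this illustration only; the sketch does not reproduce the procedure implemented in Statistica.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical comments, invented for illustration only.
comments = [
    "The room was clean and the staff was friendly",
    "Friendly staff and a clean room",
    "Breakfast was cold",
]

vocabulary, matrix = document_term_matrix(comments)
sim_first_second = cosine_similarity(matrix[0], matrix[1])
sim_first_third = cosine_similarity(matrix[0], matrix[2])

print(vocabulary)
for row in matrix:
    print(row)
print(sim_first_second > sim_first_third)
```

Each row of the matrix is the numerical representation of one comment. In this sketch the first two comments share most of their words, so their similarity is higher than that between the first and the third, which is the kind of signal that classification and structuring of documents can build on.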
2.1 The Roots of Text Mining
The roots of text mining lie in the area of information retrieval. The process of document classification, which is used in text mining, is similar to document indexing (Weiss et al., 2005). Document indexing was already studied in depth in the late 1950s and 1960s (Luhn, 1959; Maron and Kuhns, 1960).
However, the term text mining itself was first brought up about 15 years ago by M.A. Hearst. According to Hearst (1999), the field of “text data mining” had “a name and a fair amount of hype, but as yet almost no practitioners.” Even the meaning of the term was unclear, as Hearst analyzed and defined anything but text mining.
Around the same time, Feldman (1998) discusses some of the distinctions which exist between classic data mining preprocessing operations and those of text mining systems. Feldman, Fresko, Hirsh et al. (1998) and Nahm and Mooney (2000) indicate text mining’s dependence on information extraction methodologies, especially in terms of preprocessing operations.
Steps towards understanding the importance of text-mining were made by Lent, Agrawal, and Srikant (1997) in their seminal early work on identifying trends in large amounts of textual data. Feldman, Kloesgen, Ben-Yehuda, et al. (1997) provide an early analysis of knowledge discovery based on co-occurrence relationships between concepts in documents within a document set.
With the growing amount of unstructured textual information and the accessibility of online textual sources, the need for both text-mining and web-mining techniques increases. In 2003 Hearst formulates her definition of text mining as "the discovery of new, previously unknown information, by automatically extracting information from different written sources." She distinguishes the terms data mining and text mining when she notes that in "text mining the patterns are extracted from natural language rather than from structured database of facts."
2.2 Text Mining Vs Data Mining
There is no shortage of data nowadays. A consequence of the pervasive use of computers is that most data originate in digital form. Numerous transactions, which once were made on paper, are now in paperless digital form, making large amounts of data available for further analysis.
In data mining, data usually comes in numerical form. Text mining, however, works with unstructured information in the form of text, where the contents are readable and the meaning is obvious. This is the first distinction between data mining and text mining (Weiss et al., 2005). However, this does not mean that the two concepts are entirely separate from each other. They might use different approaches, but in the end the text is processed and transformed into data.