«Bachelor dissertation Alina Varfolomeeva Institute of Hospitality Management in Prague Department of Hospitality Management Major field of study: ...»
text mining using a term-document frequency matrix is rather precise in determining how often a word appears in each document in the set; compared to manual analysis, text mining is less time-consuming, provided the textual information has been prepared for analysis beforehand; text mining is more efficient than manual analysis at identifying the most talked-about topics; a read-through combined with manual analysis proved better at discovering the full range of topics that guests discussed in their reviews; analyzing the term-document frequency matrix in many cases does not help to distinguish between positive and negative reviews of a particular key feature.
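The term-document frequency matrix referred to above can be illustrated with a minimal sketch. It is not the procedure used by Statistica Text Miner; the function names and the two review snippets are hypothetical and were invented for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_document_matrix(documents):
    """Return (terms, matrix); matrix[i][j] counts terms[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

# Hypothetical review snippets, not taken from the case study data.
reviews = [
    "The breakfast was great and the staff was friendly",
    "Breakfast was poor but the location was great",
]

terms, matrix = term_document_matrix(reviews)
col = terms.index("breakfast")
breakfast_counts = [row[col] for row in matrix]
print(terms)
print(breakfast_counts)
```

Both snippets receive the same count for "breakfast" although one praises it and the other criticizes it, which is the limitation named in the last point above.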
4.2 Encountered Difficulties
As stated in Chapter 3, the case study was based on 44 English-language reviews retrieved from TripAdvisor.com. The difficulties and problems encountered while preparing this research study are described in the following paragraphs.
4.2.1 Language Barrier
While English comments can be easily processed by StatSoft STATISTICA Text Miner, we encountered difficulties when importing Russian-language comments into the software database.
First, Russian text is not always imported into the text miner correctly: letters are displayed as random symbols. Second, even when Russian documents are successfully added to the set of analyzed documents, the program fails to run the text mining process and crashes.
These problems were encountered while working with the Statistica software and might not be representative of other text mining solutions. In our research, however, the inability to use Russian comments resulted in lost data, which might otherwise have improved the performance of the text mining process and made the final matrix more representative of the real situation in the hotel Assenzio****.
It was decided not to use reviews in any other language in the case study, as a thorough manual analysis would not have been possible for most of them, which would have resulted in discrepancies in the final results.
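Russian letters appearing as random symbols is the typical symptom of a character encoding mismatch. Whether this was the actual cause of the problem in Statistica is not known, so the sketch below is only an assumption-based illustration: it shows how Cyrillic text becomes garbled when UTF-8 bytes are read with the wrong encoding, and how a loader could try several candidate encodings. The helper `decode_review` and the sample phrase are hypothetical.

```python
def decode_review(raw, encodings=("utf-8", "cp1251")):
    """Try candidate encodings in order and return the first clean decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fits the data")

text = "Отличный отель"            # sample phrase, invented for illustration
raw = text.encode("utf-8")

# Reading UTF-8 bytes as Latin-1 never fails, but yields unreadable symbols.
garbled = raw.decode("latin-1")

# The damage is reversible as long as no bytes were lost on the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
print(decode_review(text.encode("cp1251")))
```

Trying UTF-8 first is a deliberate choice: Cyrillic text stored in Windows-1251 is almost never valid UTF-8, so the failed decode signals that the next candidate should be tried.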
4.2.2 Web Crawling Complications
At the beginning of the research process, web crawling was considered as a means of information retrieval. The Statistica software has a built-in module for that purpose.
A query was formed to search the TripAdvisor website for comments on the hotel Assenzio****. The process took several minutes and produced a database containing an enormous number of hyperlinks, most of which were not even related to the search query. Many of them would have had to be analyzed and discarded in order to retrieve only the necessary information. Several further attempts were made to run the web crawling process and integrate the obtained textual information into the research, but all of the results needed additional sifting to become at least partly usable in the case study.
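The sifting step described above could in principle be automated with a simple filter. The following sketch is not part of the Statistica crawler; it assumes that relevant links can be recognized by their domain and by a keyword in the path. The function name, the default parameters, and all example URLs are hypothetical.

```python
from urllib.parse import urlparse

def is_relevant(url, domain="tripadvisor.com", keyword="assenzio"):
    """Keep links on the target domain whose path mentions the keyword."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    on_domain = host == domain or host.endswith("." + domain)
    return on_domain and keyword in parsed.path.lower()

# Hypothetical crawl output, invented for illustration only.
crawled = [
    "https://www.tripadvisor.com/Hotel_Review-Hotel_Assenzio-Prague.html",
    "https://www.tripadvisor.com/Restaurants-Prague.html",
    "https://ads.example.com/track?ref=assenzio",
]

relevant = [url for url in crawled if is_relevant(url)]
print(relevant)
```

A filter of this kind only narrows the set of links; the remaining pages would still need to be checked before their text is added to the document set.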
Chapter 5. Conclusions and Recommendations
5.1 Conclusions in Relation to Hypothesis
This study aimed to provide insight into the use of text mining software and its possibilities for analyzing hotel customer feedback, taking Hotel Assenzio**** in Prague as an example and using the text mining module of the Statistica software. The first hypothesis derived from the literature review (H1) was: "Text mining is a more precise and efficient method of analysis of customer feedback, than manual analysis".
The comparison of text mining with manual analysis, described in Chapter 3 and presented in Chapter 4, uncovered differences between the results of the two methods. Text mining appeared to be less precise than manual research in those parts of the case study where reviews were both positive and negative about a particular aspect of the hotel experience, or described that aspect in different ways. Text mining appears to be more efficient than manual research only as long as the data presented in the term-document frequency matrix can be linked together visually without additional manipulation. Based on these findings, the first hypothesis was rejected.
The second hypothesis derived from the literature review stated: "Text mining can be used separately from other analysis techniques and present good results".
However, the outcomes of the research show that it is rather difficult to use text mining alone for the analysis of customers’ feedback. Looking for patterns in the text becomes very inefficient without the aid of data mining or without first reading through the documents. Therefore, the second hypothesis is also rejected. Nevertheless, the objectives set for the research were met. We identified the idea and process of text mining by conducting the literature review. The following ways to use text mining were identified:
Natural language processing
Information retrieval
Information extraction
Sentiment analysis
The hotel customers’ feedback was analyzed using both the manual method and the Statistica Text Miner software. We uncovered the differences between the results obtained with the manual and the text mining approaches to analyzing hotel customer comments.
5.2 Conclusions to the Research Question
The research question proposed in this study was: "How efficient is it to apply text mining techniques to analysis of customer comments of a particular hotel?" Based on the analysis of the results, which rejected both hypotheses, the findings of this study indicate that applying text mining methods to the analysis of customer comments for a particular hotel is potentially useful, but not very efficient without the use of other techniques.
The case study showed that manual analysis conducted before the text mining process helped considerably in understanding the matrix and finding patterns in it.
Without it, some of the correlations found during the research might have been more confusing and less precise.
However, analysis of the spreadsheet showed that text mining is able to discover hidden information in the text and to generate assumptions that can later be tested in practice when applied to a particular hotel.
5.3 Recommendations and Limitations
The conducted research shows that there is still much to be done in the area of text mining and its applications to hospitality, and to the analysis of customer comments in particular.
One particular way of continuing this research lies in the area of sentiment analysis, which was discussed in the literature review section of this thesis. It can be used to handle problem cases in which one key feature is reviewed both negatively and positively. Applied to the term-document frequency matrix, sentiment analysis should provide a way to distinguish between such reviews, thus making the concepts behind the reviewed document set clearer and more precise.
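The idea can be sketched with a very small lexicon-based classifier that splits the frequency of a key feature by the polarity of the review in which it appears. This is only an illustration under simplifying assumptions and was not part of the case study: the word lists, the function names, and the sample reviews are hypothetical, and real sentiment analysis would need a far richer lexicon and handling of negation.

```python
# Tiny hypothetical sentiment lexicons, invented for illustration.
POSITIVE = {"great", "friendly", "clean", "excellent"}
NEGATIVE = {"poor", "dirty", "noisy", "rude"}

def review_polarity(text):
    """Label a review by comparing positive and negative word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def feature_by_polarity(reviews, feature):
    """Count the reviews mentioning a feature, split by review polarity."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for text in reviews:
        if feature in text.lower().split():
            counts[review_polarity(text)] += 1
    return counts

# Hypothetical review snippets, not taken from the case study data.
reviews = [
    "breakfast was great",
    "breakfast was poor and the room was noisy",
    "great location",
]

breakfast = feature_by_polarity(reviews, "breakfast")
print(breakfast)
```

Instead of a single frequency for "breakfast", the matrix would then hold one column per polarity, which separates praise from complaints about the same feature.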
Another possibility is to collect reviews from several internet reservation systems (IRS), such as TripAdvisor.com and Booking.com, in order to compare whether differences in the structure of the reviews affect the results of text mining analysis, or whether the results are consistent regardless of the type of IRS considered. Reviews from different web domains can also broaden the base for text analysis and make the results more precise and visible.
Another way to continue this research is to use data mining methods to work with the results produced by text mining. Data mining will be able to find correlations and patterns more quickly, as the text is turned into figures and presented in a form that computers can process.
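One simple example of such a method is the Pearson correlation between two term columns of the term-document frequency matrix, which indicates whether two words tend to occur in the same reviews. The sketch below is an assumption-based illustration, not a result of this study; the term names and the counts are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long lists of term counts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)

# Hypothetical term counts across four reviews, invented for illustration.
columns = {
    "breakfast": [1, 0, 2, 0],
    "buffet":    [2, 0, 4, 0],
    "noise":     [0, 1, 0, 2],
}

r_same_topic = pearson(columns["breakfast"], columns["buffet"])
r_other_topic = pearson(columns["breakfast"], columns["noise"])
print(round(r_same_topic, 3), round(r_other_topic, 3))
```

A value close to 1 suggests that two terms belong to the same topic, while a negative value suggests that they appear in different reviews; with only 44 documents, such figures would have to be interpreted with caution.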
The case study was carried out for a particular hotel in Prague, the hotel Assenzio****. However, this does not mean that the results of the study are of importance only to the management and staff of the hotel Assenzio****. The research may be used as an example for any other hotel with similar characteristics, especially for hotels situated in the same area of Prague as the hotel Assenzio****. It provides a basis and a direction for any similar study.
Although text mining did not prove its efficiency in this research, the area has great potential for further research, and its methods can be applied not only to the analysis of customer comments but also to research on competitors and to acquiring the intelligence needed for successful business development.