«Bachelor dissertation Alina Varfolomeeva Institute of Hospitality Management in Prague Department of Hospitality Management Major field of study: ...»
However, those negative comments seem subjective, as about the same number of reviews state that the location is nice, only several minutes away from the metro and tram stop, and within short walking distance of Wenceslas Square.
Another very subjective point in the reviews is the comments made about the hotel staff.
While some guests describe front desk workers as rude and arrogant, others say that the receptionist was “helpful and courteous”. It is important to mention that sometimes such polar reviews are made about the same person. It can mean only one thing: we are all people, and even the most courteous and ethical personnel cannot endure all the customers who come to the hotel.
Next, we captured some negative comments on the quality of breakfast, the size of the rooms and the overall hotel category. A couple of guests mention that breakfast was cold and had no vegetables. Eight reviews complain that the room was too small.
Several reviews stated that on the whole the hotel is OK but does not deserve a 4-star rating, as it lacks some features that are expected in a hotel of this level. In relation to that, the following things were mentioned:
- The hotel is lacking tea sets in each room, complimentary bathrobes and slippers
- Rooms are rather plain and basic
- Some parts of the hotel need renovation, as they are not well maintained
Overall, those comments are true to life, but the price of most of the rooms is quite affordable.
Speaking of price, we move on to the positive points found during the manual analysis. About 90% of the collected reviews state that the hotel is good in terms of value for money. No review said that the hotel is too pricey. Around 60% of them are content or even happy with the overall hotel experience, stating that they would stay in the hotel Assenzio again or that they would recommend it.
Other positive feedback was given on the following:
Overall, the majority of the guests noticed that rooms and bathrooms are very clean and neat. Only one of the analyzed reviews stated the opposite.
Describing the rooms, all of the guests from the analyzed set of comments agree that the hotel has very good beds with very comfortable mattresses, so that some guests “could barely get out in the morning”.
Around 85% of the reviewers mentioned that breakfast was at least good, with most comments saying that it was great, with a nice selection of both cold and hot dishes.
As was already mentioned, positive feedback was also received on the location of the hotel and the work of the hotel staff.
In general, more negative comments were received from couples than from other groups of customers; however, not all couples left negative reviews.
3.3 Text Mining Analysis
As was previously mentioned, the text mining module of StatSoft’s Statistica software will be used for machine analysis of customer comments. STATISTICA Text Miner transforms free-form text and creates numeric indices and components that can be used to broaden the analysis of customer comments.
The basic idea of the text mining tool is to index every individual word in the set of documents. By doing so, documents are represented as a “bag of words”, meaning the set of words that they contain, along with a count of how often each of them appears in the document. The count of each word is called its frequency. This representation loses the sequential information given by the word order, but at the same time it creates a solid basis for information retrieval techniques.
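The “bag of words” idea described above can be illustrated with a minimal Python sketch. It is not the STATISTICA Text Miner implementation; the function name `bag_of_words`, the simplified tokenizer and the sample sentence are assumptions made only for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Map each lowercased word in `text` to its frequency."""
    # Simplified tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Invented sample text, not one of the 44 collected reviews.
bag = bag_of_words("The room was clean and the staff was helpful")
print(bag["the"], bag["room"])  # frequencies of two individual words

# Word order is lost: both texts produce the same bag.
same = bag_of_words("helpful staff") == bag_of_words("staff helpful")
print(same)
```

The comparison at the end shows the loss of sequential information: two texts with the same words in a different order cannot be told apart.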
As we mentioned at the beginning of this chapter, each review was saved as a separate txt document for the purpose of research. Those 44 files were used in STATISTICA Text Miner to create the “bag of words”, or document vocabulary, containing all the words from those documents.
To make the “bag of words” even more useful, the program can discard common words, or “stop words”, mainly for efficiency reasons, although suitable compression techniques obviate the need for this (Witten et al., 1999).
It was decided to use the stop word list provided with the Statistica software, keeping in mind that more words might be added to this list during the course of research.
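Stop word removal can be sketched as follows. The short `STOP_WORDS` set is a hypothetical stand-in for the much longer list shipped with Statistica, and the `extra` parameter mimics adding further words to the list during the research.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); the real list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "in", "of", "to"}

def bag_without_stop_words(text, stop_words=STOP_WORDS, extra=()):
    """Count words in `text`, skipping stop words and any `extra` words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    excluded = set(stop_words) | set(extra)
    return Counter(t for t in tokens if t not in excluded)

# Invented sample text, not one of the collected reviews.
text = "The breakfast was good and the room was clean"
bag = bag_without_stop_words(text)
print(sorted(bag))

# Words can be added to the stop list later, for example "good".
smaller_bag = bag_without_stop_words(text, extra={"good"})
print(sorted(smaller_bag))
```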
To reduce the size of the vocabulary even further, it is possible to use:
1) inclusion words: words other than those from the list are discarded,
2) phrases: word combinations which are treated as a single word,
3) synonyms: words which are replaced or combined.
We chose to create a list of phrases based on the results of the manual research, which contains phrases like: would recommend, clean rooms, helpful staff, a bit far, value for money, etc. The whole list of phrases is presented in Appendix A.
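Treating a phrase as a single word can be sketched as below. The phrases are the examples named in the text; the function `tokenize_with_phrases` and its greedy longest-match strategy are assumptions for illustration, not a description of how Statistica handles phrases.

```python
import re

# Example phrases taken from the text; the full list is in Appendix A.
PHRASES = ["would recommend", "clean rooms", "helpful staff",
           "a bit far", "value for money"]

def tokenize_with_phrases(text, phrases=PHRASES):
    """Split `text` into tokens, keeping listed phrases as single tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    # Try longer phrases first so that the longest match wins.
    candidates = sorted((tuple(p.split()) for p in phrases),
                        key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(words):
        for phrase in candidates:
            if tuple(words[i:i + len(phrase)]) == phrase:
                tokens.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position: keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens

# Invented sample text, not one of the collected reviews.
tokens = tokenize_with_phrases("Good value for money and I would recommend it")
print(tokens)
```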
STATISTICA Text Miner uses stemming to reduce the vocabulary of indexed words. The program reduces similar words, like travel and travelling, to their common stem, for which the frequency is then counted.
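The effect of stemming on frequencies can be shown with a deliberately naive suffix stripper. This toy function `naive_stem` is an assumption for illustration only: it is not the stemming algorithm used by Statistica, and it will produce wrong stems for many English words.

```python
from collections import Counter

def naive_stem(word):
    """Toy stemmer: strip one common suffix, then undo a doubled consonant."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            # "travell" -> "travel": drop a doubled final consonant.
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

# Invented word list, not taken from the collected reviews.
words = ["travel", "travelling", "travelled", "rooms", "room"]
stem_counts = Counter(naive_stem(w) for w in words)
print(stem_counts)
```

After stemming, the five word forms collapse into two vocabulary entries, and the frequency is counted per stem rather than per word form.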
It is possible to select the minimum percentage of files in which each word must appear, in order to avoid using very rare words and misspellings. It was decided to set a minimum threshold of 11% for the appearance of each particular word. This means that a word should appear in at least 11% of the 44 documents retrieved for analysis, that is, in at least 5 files.
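The arithmetic behind this threshold, 11% of 44 documents rounded up to 5 files, and the resulting filter can be sketched as follows. The function names and the small sample documents are hypothetical.

```python
import math
from collections import Counter

def min_document_count(total_documents, min_share):
    """Smallest whole number of documents that reaches `min_share`."""
    # round() guards against floating point noise before rounding up.
    return math.ceil(round(total_documents * min_share, 9))

def filter_by_document_frequency(documents, min_share):
    """Keep words that appear in at least `min_share` of the documents.

    `documents` is a list of token lists, one list per document.
    """
    threshold = min_document_count(len(documents), min_share)
    document_frequency = Counter()
    for tokens in documents:
        document_frequency.update(set(tokens))  # count each word once per document
    return {w for w, n in document_frequency.items() if n >= threshold}

# 11% of 44 documents is 4.84, so a word must appear in at least 5 files.
print(min_document_count(44, 0.11))

# Invented miniature example with four documents and a 50% threshold.
docs = [["room", "clean"], ["room", "small"], ["breakfast"], ["room", "clean"]]
kept = filter_by_document_frequency(docs, 0.5)
print(sorted(kept))
```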
Moreover, the maximum number of words that could be selected was set at 200, to make the text mining result even more useful.
After running the text miner, 1077 meaningful words and phrases were found in the initial set of documents, of which 150 were used to create a spreadsheet. Most of the words that appeared in the documents only 1 to 4 times were discarded, unless they were synonyms of the same word. Some very frequently repeated words, such as review and contributor, were also removed from the vocabulary.
Figure 1 - Term-document Frequency Matrix
The spreadsheet, part of which is presented in Figure 1 – Term-document Frequency Matrix, is formed by rows, representing the documents in the set, and columns, representing the key words of the document vocabulary. Another name for this spreadsheet is a term-document frequency matrix, where each cell is populated by the frequency of a particular word in a particular document.
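A term-document frequency matrix of this shape, with one row per document and one column per vocabulary word, can be built in a few lines. This is a minimal sketch with invented sample texts; the function name `term_document_matrix` is an assumption.

```python
import re
from collections import Counter

def term_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] is the frequency of
    vocabulary[j] in documents[i]."""
    bags = [Counter(re.findall(r"[a-z']+", d.lower())) for d in documents]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, matrix

# Invented sample texts, not the collected reviews.
vocabulary, matrix = term_document_matrix([
    "Clean room, clean bathroom",
    "Small room",
])
print(vocabulary)
for row in matrix:
    print(row)
```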
The work on the spreadsheet involves analyzing word frequencies and looking for patterns in the appearance of the same set of words in different documents.
A word summary can be created based on the information in the spreadsheet to show the most frequently appearing words, some of which might give insight into the most popular points of customers’ reviews.
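Such a word summary can be derived from the matrix by summing each column. The sketch below assumes the matrix is held as a list of rows; the function name `word_summary` and the numbers are invented for illustration and do not reproduce Figure 2.

```python
def word_summary(vocabulary, matrix):
    """Return (word, total frequency, number of documents) tuples,
    most frequent word first."""
    summary = []
    for col, word in enumerate(vocabulary):
        column = [row[col] for row in matrix]
        total = sum(column)
        documents_with_word = sum(1 for count in column if count > 0)
        summary.append((word, total, documents_with_word))
    # Sort by total frequency (descending), then alphabetically.
    summary.sort(key=lambda item: (-item[1], item[0]))
    return summary

# Invented matrix: three documents (rows) and three words (columns).
vocabulary = ["breakfast", "hotel", "room"]
matrix = [
    [1, 2, 1],
    [0, 1, 3],
    [2, 1, 0],
]
summary = word_summary(vocabulary, matrix)
for entry in summary:
    print(entry)
```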
Figure 2 – Word Summary
Some of the stemmed words are accompanied by the most common word forms that can be found in the text of the reviews. Figure 2 shows that the most frequent words are star, hotel and room, which can be found in practically every document, and more than once. We can discard those words, but before doing so we note which review does not say anything about the room, as it is quite unusual for a review of a hotel experience to say nothing about the room.
Going back to the term-document frequency matrix, we find document (review) #37, which does not say anything about a room. However, it is still informative. It contains the following words: august, breakfast, clean, Czech, free, good, hotel, lobby, located, place, Prague, quiet and Wi-Fi. Based on this set of words we can assume that a guest from the Czech Republic was staying in the hotel Assenzio in August, and that Wi-Fi was free in the lobby. Most probably, if we returned the word “only”, which was present in fewer than 5 documents, to the vocabulary of the spreadsheet, this word would belong to review #37. This would mean that the initial review was complaining about the availability of free Wi-Fi only in the hotel lobby.
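Finding the reviews that never mention a given word amounts to looking for zeros in that word's column. A minimal sketch, with an invented matrix and a hypothetical function name `documents_without`:

```python
def documents_without(word, vocabulary, matrix):
    """Return 1-based numbers of documents in which `word` never appears."""
    col = vocabulary.index(word)
    return [i + 1 for i, row in enumerate(matrix) if row[col] == 0]

# Invented matrix: three documents (rows) and three words (columns).
vocabulary = ["breakfast", "hotel", "room"]
matrix = [
    [1, 2, 1],
    [0, 1, 3],
    [2, 1, 0],
]
missing_room = documents_without("room", vocabulary, matrix)
print(missing_room)
```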
After discarding the most and the least frequent words, we can move on to the analysis of the term-document frequency matrix. Rows and columns are sorted so that common patterns in the joint appearance of words in the documents can be found.
During the analysis of the resulting matrix it was noticed that the words room, cleanliness or clean, and couple almost always appear together. Based on this we can assume that the group of guests represented by couples commented on how clean the rooms in the hotel are. Other words that appear in the same documents as the category couples are breakfast and cold, which might mean that complaints about cold breakfast were received only from couples.
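Patterns of words appearing together can also be counted directly instead of being spotted by sorting. The sketch below counts, for every pair of words, the number of documents that contain both; the function name `cooccurrence_counts` and the sample word sets are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count in how many documents each pair of words appears together.

    `documents` is a list of word collections, one per document.
    Pairs are stored as alphabetically ordered tuples.
    """
    pairs = Counter()
    for words in documents:
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented word sets, not the collected reviews.
docs = [
    {"room", "clean", "couple"},
    {"room", "clean", "couple", "breakfast"},
    {"room", "located"},
]
pairs = cooccurrence_counts(docs)
print(pairs[("clean", "couple")], pairs[("breakfast", "couple")])
```

A high count for a pair only shows that the words share documents; as in the manual reading of the matrix, it does not reveal whether the comment was positive or negative.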
The words city, center and located can also almost always be found in the same sentences, but without additional analysis and more correlated words it is impossible to tell whether the reviews containing those words are positive or negative.
Chapter 4. Findings and Results
4.1 Comparison of Manual and Machine Analysis
In testing our hypothesis, we compared the results of the text mining research to the results of the manual research in order to find out how precise and efficient text mining is and whether or not it can be used separately from other methods of analysis.
On the most general level, the term-document frequency matrix (Appendix B) can present us with information on the groups of travelers, the month in which they visited the hotel Assenzio, and their country of origin.
The four groups of travelers discovered by the text mining software are: families, groups of friends, couples and solo travelers. The matrix shows that there were 7 reviews from family travelers, 14 reviews made by friends who travelled together, 15 from couples and only 2 from solo reviewers. However, from the manual research we know that one more group, business travelers, is missing; also, some of the collected reviews did not mention to which group their authors belong.
Nevertheless, the result on identifying the type of traveler is quite consistent with the manual research conducted previously.
As the language of the selected reviews was English, it was expected that customers would mainly originate from the United Kingdom. The text mining tool also noticed that 3 guests came from Italy, 2 from the Czech Republic and 2 from Finland. However, because of vocabulary resizing, a couple of other countries of origin, such as the USA and Poland, got discarded.
Based on the data received from the word summary, the most popular topics of comments are the following (see Figure 2 Word Summary, Chapter 3.2):
Those topics are expected to be the most discussed for any hotel. The spreadsheet will help to see whether the reviews related to those words are positive or negative, and can even give insight into each review.
As was already mentioned in the previous chapter, analysis of the text mining spreadsheet can show correlations between key words and the documents in which they appear. Some of those correlations can be easily detected. The appearance of such words as price and value in the same sentences forms a pattern, most likely indicating that the guests are satisfied with the price they pay for the services they receive in the hotel. The words quality and value appear together even more often. Even though quality can relate to different aspects of the hotel experience, it is possible to assume that those words have a positive linear correlation.
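The assumed linear correlation between two words could be checked by computing the Pearson coefficient of their columns in the term-document frequency matrix. The sketch below uses invented frequencies, not the values from Appendix B, and the function name `pearson` is an assumption; this calculation was not part of the analysis described here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long frequency columns.

    Returns None when either column has no variation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        return None
    return covariance / math.sqrt(variance_x * variance_y)

# Invented per-document frequencies of two words in five documents.
quality = [1, 0, 2, 0, 1]
value = [2, 0, 4, 0, 2]
r = pearson(quality, value)
print(round(r, 3))
```

A coefficient close to 1 would support the assumption of a positive linear correlation, while a value close to 0 would indicate that the two words appear independently of each other.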
Regarding the problem with air conditioning, which was described during the manual analysis, the spreadsheet shows 5 occurrences of the phrase air conditioning and 9 occurrences of the word air. Together with those words appear window and small, or didn’t and small. The word spacious is used only once in relation to all of them. So most probably the negative comments about air conditioning were received mainly from guests who were accommodated in small rooms, where they had to open windows, as the air conditioning was not helping in such a small space. Among the reviews on this subject there are also positive ones, as frequencies are found for the phrases would stay and would return.
One of the more difficult concepts to analyze through text mining spreadsheet is the location. Only 5 out of 44 reviews do not contain related information.
Nevertheless, some of the related words help us understand which reviews mentioned location as a plus. Notice how the words located, just and walk appear together in the sentences in Figure 3 below. We can assume that those reviewers were content with the location of the hotel and considered it to be just a short walk from the metro, other transport or Wenceslas Square. On the other hand, 3 reviews clearly state that the hotel is far from the center of the city, which is also confirmed by the term-document frequency matrix.
Figure 3 – Analyzing Location Concept
Figure 3 – Analyzing Location Concept shows the column containing the word just sorted in descending order of frequency, to show its correlation with the location of the hotel.
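Sorting the matrix rows by the frequency of one word, as was done for the word just, can be sketched as follows. The matrix values are invented and the function name `sort_rows_by_word` is hypothetical.

```python
def sort_rows_by_word(word, vocabulary, matrix):
    """Return (document number, row) pairs sorted by the frequency of
    `word`, highest first. Document numbers are 1-based."""
    col = vocabulary.index(word)
    order = sorted(range(len(matrix)),
                   key=lambda i: matrix[i][col], reverse=True)
    return [(i + 1, matrix[i]) for i in order]

# Invented matrix: three documents (rows) and three words (columns).
vocabulary = ["just", "located", "walk"]
matrix = [
    [0, 1, 0],
    [2, 1, 1],
    [1, 0, 1],
]
ranked = sort_rows_by_word("just", vocabulary, matrix)
for number, row in ranked:
    print(number, row)
```

Once the rows are ordered this way, documents in which just is frequent sit next to each other, which makes it easier to see whether located and walk are frequent in the same rows.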
Most of the remaining cases need deeper analysis, as they are not directly connected to the concept of location, and we cannot tell just by looking at the matrix whether those remaining comments were positive or negative reviews of the location. This means that attitudes towards the hotel’s location can be more easily identified through manual analysis.
In terms of staff, the findings of text mining were quite similar to those of the manual analysis. The words friendly, front and desk appeared together in the sentences, sometimes replaced by the synonyms nice or helpful and receptionist or staff, respectively.
Some other concepts found during the manual analysis got discarded from the text mining spreadsheet due to their low frequencies. That means we would not be able to receive information on guests’ requests and suggestions of what needs to be improved in the hotel, as most of them are worded differently, even though the idea behind them might be the same. This indicates the need for other kinds of analysis for better outcomes.
As a result of comparing text mining analysis to the manual one, the following was