Visualisation of #EURegionsWeek 2017 on Twitter
In the Visualisation of #EURegionsWeek 2017 on Twitter, you can see the social graph of all Twitter exchanges around the #EURegionsWeek hashtag during the European Week of Regions and Cities that took place in Brussels on 9-12 October 2017.
If you are interested in knowing more about the approach that conditioned my choices, about the process and which tools I used, what follows is for you.
I pursued two goals in this visualisation exercise:
- To learn how to visualise data in a beautiful social graph that conveys a message otherwise hidden in the data. This is part of my passion for visuals that convey meaning.
- To avoid having to write lines of code and to use free tools as much as possible
The starting point: collecting raw data
Any attempt to visualise data requires that you have a data set at your disposal. Extracting raw data from a social media platform, in a format you can work with, is less simple than it seems.
There are two ways to extract data from social media. One is to query the platform API for items posted in the past. The other is to listen to items as they are posted, then intercept and save them. Tools are available for both approaches: free or paid, of varying quality, with or without coding.
TAGS is a free Google Sheet template that allows you to query Twitter's API. I used TAGS for another similar visualisation on the #RegioStars hashtag, but I quickly came across the limitations of the API. Twitter states them as follows:
"The Search API is not complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets."
For this visualisation, I therefore discarded TAGS and used Digimind, which is a "Smart software to listen, engage, analyze, and report tool", according to their website. Access to its features is not free, but the European Commission has a license. Using a paid tool went a little against my second goal, but its availability and completeness justified its use. For my first steps with Digimind, I received the help of my colleague Corinne, @CocoDeBrux, who manages our @EU_Regional account. She put the tool in listening mode on several hashtags (see the list below) and showed me how to export raw data, sentiment analysis and much more.
Raw data processing
One big advantage of using Digimind is that the tool can listen to several social media platforms, not only Twitter, and to several keywords and hashtags. Thanks to Corinne, I was able to generate an Excel sheet containing the activity on all hashtags on all social media for the period from 8 October to 14 October 2017. After some manipulation to remove duplicated items, the final raw data set was ready to use, with a total of 10615 posts collected on Twitter, Facebook, Instagram, Google+, and other platforms.
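I removed the duplicates by hand in Excel, but for readers who prefer a script, the same step could be sketched as follows. This is only an illustration: the column names (`platform`, `author`, `text`) and the sample rows are assumptions, not the actual layout of the Digimind export.

```python
def remove_duplicates(posts):
    """Keep the first occurrence of each (platform, author, text) triple.

    The keys 'platform', 'author' and 'text' are hypothetical column names.
    """
    seen = set()
    unique = []
    for post in posts:
        # Normalise case and whitespace so trivial variations still match.
        key = (
            post["platform"].lower(),
            post["author"].lower(),
            " ".join(post["text"].split()),
        )
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


# Made-up sample rows, for illustration only.
sample_posts = [
    {"platform": "Twitter", "author": "alice", "text": "Hello #EURegionsWeek"},
    {"platform": "Twitter", "author": "Alice", "text": "Hello  #EURegionsWeek"},
    {"platform": "Facebook", "author": "alice", "text": "Hello #EURegionsWeek"},
]

unique_posts = remove_duplicates(sample_posts)
print(len(sample_posts), "posts before,", len(unique_posts), "after")
```

Whether two posts count as duplicates is a judgment call; here the same text by the same author on two different platforms is kept as two posts.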
From this point, I continued with Microsoft Excel to manipulate the data containing account names and text in order to generate the 2 data source files for Gephi (see below). I also used Google Sheets to manipulate the anonymised data and send it to Google Data Studio.
With 97% of the overall activity on all social media, Twitter can be considered a good representative of what happened on social media during #EURegionsWeek 2017. Therefore, only Twitter activity is visualised in the social graphs.
The dominance of Twitter over other platforms is clear on the chart (note the logarithmic scale used on the X-axis to keep other platforms visible). Twitter alone hosted 97% of the overall activity on all social media, with 10265 posts.
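For the curious, here is a minimal sketch of how an edges table could be derived from tweets with a short script instead of Excel formulas. It is not the method I used. It assumes each tweet is available as an (author, text) pair, treats every @mention (including the one in "RT @name:") as one exchange, and writes the `Source`, `Target` and `Weight` column headers that Gephi recognises when importing a spreadsheet. The handles and tweets are made up.

```python
import csv
import io
import re
from collections import Counter

# Simplified pattern: it would also match the domain part of an email address.
MENTION = re.compile(r"@(\w+)")


def build_edges(tweets):
    """Count how often each author mentions or retweets each other account."""
    weights = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            if mentioned.lower() != author.lower():  # ignore self-mentions
                weights[(author, mentioned)] += 1
    return weights


def edges_to_csv(weights):
    """Serialise the edge weights as CSV text with Gephi-style headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up tweets, for illustration only.
tweets = [
    ("alice", "RT @bob: Great session at #EURegionsWeek"),
    ("alice", "Thanks @bob for the slides"),
    ("carol", "Looking forward to #EURegionsWeek with @alice"),
]

edges = build_edges(tweets)
print(edges_to_csv(edges))
```

The nodes table can be built the same way, by collecting every distinct account name that appears as a source or a target.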
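As a quick check of the figures above, the share can be recomputed from the two totals given in this article. The snippet also shows why a logarithmic scale helps: on a linear axis, the bar for everything that is not Twitter would be almost invisible. Only the two totals come from my data; grouping all other platforms into a single "other" figure is a simplification for the example.

```python
import math

total_posts = 10615    # all platforms, after removing duplicates
twitter_posts = 10265  # Twitter only
other_posts = total_posts - twitter_posts  # every other platform combined

twitter_share = 100 * twitter_posts / total_posts
print(f"Twitter share: {twitter_share:.1f}%")

# Relative bar lengths on a linear axis versus a logarithmic (base 10) axis.
linear_ratio = other_posts / twitter_posts
log_ratio = math.log10(other_posts) / math.log10(twitter_posts)
print(f"Linear axis: other bar is {linear_ratio:.0%} of the Twitter bar")
print(f"Log axis:    other bar is {log_ratio:.0%} of the Twitter bar")
```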
Visualisation with Gephi
The open-source and free Gephi software is at the core of my visualisation process. "Gephi is the leading visualization and exploration software for all kinds of graphs and networks" and I used it to create the zoomable social graphs of the Twitter activity.
How to produce visualisations with Gephi is beyond the scope of this article. I suggest a Google search to find articles and tutorials about it.
Gephi allowed me to visualise how the 3727 Twitter accounts (nodes) are interconnected by 5434 lines (edges), indicating who is retweeting and citing whom in the 10265 collected tweets. Thanks to the Gephi modularity function, nodes are grouped in clusters of the same colour, making it easier to see how they divide into distinct groups. The size of a node indicates its number of connections with other nodes, weighted by the number of exchanges on each connection. The thickness of an edge indicates the importance of the exchanges between its two nodes.
For those who would like to play with my data source files and generate other visualisations, you can download here the nodes table and the edges table in CSV format. I would be glad if you shared your results with me, @jihan65.
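The node sizing described above corresponds, as far as I understand it, to what is usually called the weighted degree: for each account, the sum of the weights of all edges that touch it. Gephi computes this for you; the sketch below only illustrates the idea on a made-up edge list of (source, target, weight) triples and is not Gephi's own code.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (direction ignored)."""
    degree = defaultdict(int)
    for source, target, weight in edges:
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


# Made-up edge list, for illustration only.
sample_edges = [
    ("alice", "bob", 2),    # alice retweeted or cited bob twice
    ("carol", "alice", 1),
    ("carol", "bob", 3),
]

sizes = weighted_degree(sample_edges)
for account, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(account, size)
```

In this toy example bob gets the largest node, even though he never posts himself, because the other accounts exchange with him most often.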
As said before, my colleague Corinne set up Digimind to listen to various hashtags on social media, starting with the official one: #EURegionsWeek. Many participants used other hashtags in their social activities, often in conjunction with the official hashtag but not always. The final picture is quite good, with the #EURegionsWeek hashtag occupying 88% of the space, but there are still too many hashtags in use for the same conference.
Another killer feature of Digimind for social media analysts is its ability to offer sentiment analysis. From the raw data extracted from the tool, we can say that four fifths (79%) of what was posted on social media is considered neutral or without sentiment. The last fifth (21%) is entirely positive, since the negative sentiment is null. On Twitter specifically, we note 38% of positive sentiment, only 2% of negative, and 60% of neutral or without sentiment.
English remains the most used language across all social networks. Far behind, we find Dutch (unexpected, at least to me), followed by Italian. French comes only in 4th place.