To download chats (I think it tries to get your username and password and store them the first time you use it, but you can also pass them manually as arguments each time):
Code:
library(talkstats)
x <- ts_chatbox(splitDate = FALSE)
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
We were looking at associations between chatters. I'm working on an average distance measure function in R to go with the visual representation, a Gantt plot, shown below.
The code is pretty simple but relies on the qdap package, which you'd have to download from my GitHub. You'll need the talkstats package from there as well:
library(qdap); library(talkstats)
dat <- ts_chatbox()
# plot 1: colored
x <- with(dat, gantt_plot(dialogue, person))
# plot 2: black
x + scale_color_manual(values = rep("black", length(levels(dat$person))))
# or
with(dat, gantt_plot(dialogue, person, bar.color = "black"))
# plot 3: faceted
with(dat, gantt_plot(dialogue, person, date, space = "free"))
To get a PDF with all of the graphics together, -click here-.
Plot 1
Plot 2
Plot 3: too big to display. -click here- instead.
The level of detail in the PDF is well worth it. PNG tends to lose some of the smaller time durations.
The next step is to finish the distance function: first describe what's going on properly in math notation, then use outer with Vectorize to produce a matrix of average distances between users. If anyone wants to help, here's that thread (LINK).
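For anyone wondering what the outer plus Vectorize step might look like, here is a minimal sketch. The data frame, its column names (person, start, end) and the avg_dist function are all made up for illustration; in particular the distance used here (mean absolute gap between turn midpoints) is only a placeholder and not the measure being developed.

```r
# Hypothetical data: one row per turn of talk, with start/end positions
# measured in words (column names are assumptions, not qdap output).
spans <- data.frame(
  person = c("A", "B", "A", "C", "B"),
  start  = c(0, 10, 25, 40, 60),
  end    = c(10, 25, 40, 60, 70),
  stringsAsFactors = FALSE
)

# Midpoint of each turn on the word scale.
spans$mid <- (spans$start + spans$end) / 2

# Placeholder distance: mean absolute gap between every pair of turn
# midpoints for two people. Swap in the real measure once it is defined.
avg_dist <- function(p1, p2) {
  m1 <- spans$mid[spans$person == p1]
  m2 <- spans$mid[spans$person == p2]
  mean(abs(outer(m1, m2, "-")))
}

people <- sort(unique(spans$person))

# outer() hands whole vectors to its function, so the scalar avg_dist
# has to be wrapped with Vectorize() first.
dist_mat <- outer(people, people, Vectorize(avg_dist))
dimnames(dist_mat) <- list(people, people)
dist_mat
```

With this placeholder the diagonal is not zero, because a person's own turns are spread out over the word scale; whether that is desirable depends on how the final measure is defined.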
My apologies, I thought they were self-explanatory, probably because I've been working with them so much lately. I was hoping they were, because that's the mark of a good plot. Think of the y axis as time (the unit of measure, though, is words). Wherever you have a colored strip, you were conversing in the chat box.
In the unfaceted plots we have time (days) as one big continuum from left to right. In the faceted plot I broke the days apart. I could have gotten fancier by plotting the background colors by day but was lazy. To some extent, then, we can assume that people clustered in close proximity to each other were more conversant with one another. The distance measure I'm working on may capture this even better.
EDIT: I just realized that the xlab is set to a funky default. I changed that behavior but don't feel like fixing the graphics (lazy).
Can you explain the x-axis a little more? I don't know what the units of measurement are. And in the faceted plots I would have thought it was just a plot essentially of when we were chatting over time but all of the plots start at the left and most don't take up the full plotting region - what causes that?
Edit: I think I understand it now but I'd still like to hear your explanation. I was confused before because you said the y-axis was words but that doesn't make sense.
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich
Yeah, the x axis is time, but the unit of measure is actually words. So time is measured in words, and every day starts at 0 words. I suppose I have functions that could plot it in clock time, but I was looking to demonstrate something easier. The reason for the restriction on the scales ("all of the plots start at the left and most don't take up the full plotting region") is that it is unfair (IMO) to compare facets when scales are allowed to be free. I may relax this in the future though.
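To make the "time measured in words" idea concrete, here is a rough sketch of how start and end positions could be computed so that every day restarts at 0. This is not how gantt_plot does it internally (I'm only illustrating the idea); the chat data frame is invented, with column names chosen to mirror the date, person and dialogue columns used earlier.

```r
# Hypothetical chat log; the rows are made up for illustration.
chat <- data.frame(
  date     = c("day1", "day1", "day1", "day2", "day2"),
  person   = c("A", "B", "A", "B", "C"),
  dialogue = c("hi there", "hello how are you", "fine",
               "morning all", "hey"),
  stringsAsFactors = FALSE
)

# Words per turn: split on whitespace and count the pieces.
chat$n_words <- lengths(strsplit(trimws(chat$dialogue), "\\s+"))

# Running word count within each day, so every day restarts at 0.
chat$end   <- ave(chat$n_words, chat$date, FUN = cumsum)
chat$start <- chat$end - chat$n_words

chat[, c("date", "person", "start", "end")]
```

Each row's start and end then give the bar for that turn, and a short day simply ends further to the left than a long one when the facets share a scale.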
I'm currently working on some functions to deal with time measures rather than words as the units, but I didn't foresee this need until some of my recent work as an RA, so I didn't include this functionality in qdap initially.
No - it's fine that they don't take up the full plotting region. It was just that my intuition was that the x-axis was time and it didn't logically make sense the way the plots were laid out... which is why I was asking for clarification. But I think I get it now.
Dason challenged the scales free idea and I decided it shouldn't be up to me if you use it or not. I plotted three different versions playing with scales and colors. Click here to see:
One of the important things in the energy industry is looking at smart meter data (meters with wifi that report interval data, maybe at 15 or 60 minute intervals). We want to be able to tease out of the data certain phenomena that happen at regular intervals. For instance, one contract's algorithms could take a year's worth of data for a household and find their baseline by looking at the hourly data.
I bring that up because I looked at that last graph and it's entirely incomprehensible looking at daily graphs what the outcome is over all those days. For instance, when do I tend to talk the most? I'm thinking you could generate a single graph for each chatter that is sort of like a heat map where it's brightest when they talk the most and cold where they're most absent. Make sense? Implementing it, not so easy lol
The first plot here sort of does that but it has a larger wave or whatever for when someone talks the most on a given day. I think the idea is to aggregate that information over multiple days at a given time location. That way, you end up with a composite or aggregate time value, but that plot is actually just as informative as the heat map idea I had. In fact, it visually does a good job at showing you where someone is very active, especially if it's with respect to other chatters.
@BG I don't think that would be useful for what I'm trying to show, namely relationships. Time was already shown pretty well by Vinux. Secondly, the unit I used is words, so you couldn't really tell when you talk. I didn't use times. The Gantt plot is better for relationships, which is what we're after, in that you can see clusters.
Incomprehensible means that something can't be comprehended. That really is not an accurate assessment. Which graphic you use depends on what you're trying to show. If you're looking for when you're most active, then a line graph of hourly intervals would be better, or perhaps a heat map as you suggest, though I think the line graph would be better suited. But this idea would convey nothing about the relationship between chatters. One more thing that throws a monkey wrench into time is that a universal time zone is being used. So when it says I'm really active at 6 am, that's not true. For me it's probably 12 am, but time is a relative concept with chatters around the world.
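As a small illustration of the time zone problem, base R can show the same instant on two clocks. The timestamp below is made up, and "America/Denver" is only an example zone that happens to sit six hours behind universal time in summer; it is not meant to be anyone's actual location.

```r
# A chat timestamp recorded in universal time (UTC); the value is made up.
stamp_utc <- as.POSIXct("2012-08-01 06:00:00", tz = "UTC")

# The same instant shown on a local clock. The zone is an assumption
# chosen only to illustrate a six-hour offset.
format(stamp_utc, tz = "America/Denver", usetz = TRUE)

# Hour of day on each clock, e.g. for binning activity by local hour.
utc_hour   <- as.integer(format(stamp_utc, "%H", tz = "UTC"))
local_hour <- as.integer(format(stamp_utc, "%H", tz = "America/Denver"))
c(utc = utc_hour, local = local_hour)
```

Any hourly summary would need a conversion like this per chatter before "most active at 6 am" means anything locally.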
As far as implementing the heat map goes, it would be pretty easy: create a new variable that turns times into hours and then use geom_tile. I've done this as a calendar heat map with relative ease.
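Here is a rough sketch of that heat map idea, assuming a data frame with a person column and a POSIXct time column. The data are randomly generated and the column names are my own choices for illustration, not what ts_chatbox returns. The counting is done in base R; the plot is only drawn if ggplot2 is installed.

```r
# Made-up timestamps and chatters; real data would come from the chat log.
set.seed(1)
n <- 200
chat <- data.frame(
  person = sample(c("A", "B", "C"), n, replace = TRUE),
  time   = as.POSIXct("2012-08-01 00:00:00", tz = "UTC") +
    sample(0:(3 * 24 * 3600 - 1), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# New variable: hour of day (0-23) pulled from the timestamp.
chat$hour <- as.integer(format(chat$time, "%H", tz = "UTC"))

# Posts per person per hour; factor() keeps hours with no posts as zeros.
counts <- as.data.frame(
  table(person = chat$person, hour = factor(chat$hour, levels = 0:23)),
  responseName = "posts"
)
counts$hour <- as.integer(as.character(counts$hour))

# One tile per person/hour combination, filled by activity.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = hour, y = person, fill = posts)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "red")
  print(p)
}
```

Because the hours here are in universal time, each chatter's row would be shifted relative to their local clock unless the times are converted first.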