R bloggers

R news and tutorials contributed by 563 R bloggers

Microsoft acquires Revolution Analytics – news roundup

Mon, 2015-01-26 15:48

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

There was a lot of news coverage on Friday and over the weekend about Microsoft's upcoming acquisition of Revolution Analytics. Here are links to just a few of the articles published.

  • Wired: Microsoft is "heavily embracing the R programming language"; "the move deepens Microsoft's investments in open source".
  • TechCrunch: Microsoft's acquisition is to "bolster its analytics services".
  • GigaOm: R is "hugely popular amongst data scientists and research types"; "a big deal in the world [of] predictive analytics and machine learning ... an emerging market that Microsoft wants to get in on early".
  • Bloomberg BusinessWeek: Microsoft's acquisition is "to help the company build up its cloud-services business".
  • ComputerWorld: "In a move to beef up its portfolio of analysis software and services"; "Microsoft is pledging to contribute to the further development of R".
  • SiliconAngle: Acquisition "a double win for Microsoft"; "R has soared in popularity over recent years with the rise of data scientists ... an audience that Microsoft is actively trying to court"; "acquisition is as much about data scientists themselves as their algorithms".
  • VentureBeat: Deal is "latest proof [Microsoft] is serious about open source".
  • DataInformed: Move "strengthens [Microsoft's] commitment to big data analytics".

There's also a lively discussion on the deal in this Hacker News thread, and the Adventures in Advanced Analytics blog reviews Revolution Analytics' contributions to the R community over the last 7 years and the future prospects for the community.

To leave a comment for the author, please follow the link and comment on his blog: Revolutions. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Categories: Methodology Blogs

Presidential Approval and Applause

Mon, 2015-01-26 15:46

(This article was first published on More or Less Numbers, and kindly contributed to R-bloggers)

Some of you may have seen a Twitter post about spurious correlations that I and others shared. Basically, it was a joke about how correlations can be found between many things that certainly have no influence over each other. I mention this because this post may or may not fall into that category ;-)

About this time last year, I looked at the two most recent State of the Union speeches and talked about the political priorities ostensibly shown in each. For those who don't know, the State of the Union is the speech that the President of the United States delivers at the beginning of each year to a joint session of Congress (that is, both the House and the Senate). Traditionally, the aim of the speech is to outline the President's priorities for the coming year and to give a sense of where the United States stands in general: the "state of the union".

One of the more nuanced parts of the speech is the applause. At times the President is interrupted with applause by members who approve of what he is saying, and at other times he pauses to allow for applause (typically from his own party). The speeches are fairly lengthy; over the past several years they have averaged about an hour. It turns out applause is a big part of the speech (it's polite, after all). Across President Obama's speeches, the word "applause" appears more often than any other word (apart from articles). If applause lasts about 10 seconds on average, we're looking at somewhere around 12-13 minutes of total applause during his speeches. This is only an estimate: the transcripts mark each instance simply as "(applause)", so they record how often applause occurs but not how long it lasts.
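The applause estimate is simple arithmetic. You count the "(applause)" markers in a transcript and multiply by an assumed duration. Below is a minimal Python sketch of that calculation, not the author's code. The 10-second duration comes from the post. The function names, the sample text, and the decision to also match a "(Applause.)" variant with a trailing period are assumptions for illustration.

```python
import re

# Assumption taken from the post: one round of applause lasts about 10 seconds.
SECONDS_PER_APPLAUSE = 10

# Matches "(applause)" and "(Applause.)"; accepting the trailing period is an
# assumption about how some published transcripts mark applause.
APPLAUSE_MARKER = re.compile(r"\(applause\.?\)", re.IGNORECASE)


def estimate_applause_minutes(transcript: str, seconds_each: float = SECONDS_PER_APPLAUSE) -> float:
    """Count applause markers and convert the count to minutes of applause.

    Transcripts record only that applause happened, not how long it lasted,
    so the result scales directly with the assumed duration.
    """
    count = len(APPLAUSE_MARKER.findall(transcript))
    return count * seconds_each / 60


def markers_for_minutes(minutes: float, seconds_each: float = SECONDS_PER_APPLAUSE) -> float:
    """Invert the estimate: how many applause markers add up to `minutes`."""
    return minutes * 60 / seconds_each


# Hypothetical snippet, not taken from an actual State of the Union transcript.
snippet = "We must act now. (Applause.) And we will. (Applause.)"
print(estimate_applause_minutes(snippet))                 # 2 markers x 10 s = 20 s, in minutes
print(markers_for_minutes(12), markers_for_minutes(13))   # 72.0 78.0
```

Read in reverse, 12-13 minutes at 10 seconds each corresponds to roughly 72 to 78 applause markers. Any error in the assumed duration carries straight through to the total.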

So what's the point, other than that's a lot of clapping? I wanted to see whether there were any similarities between the amount of applause and the President's approval rating. Appropriately, I'll display this brief analysis with a popular graph theme from FiveThirtyEight, the political (and other) analysis site. The theme was put together in R by Austin Clemens (thanks!).


Just by looking at the two lines, we can see that the applause count doesn't change much, except in two years: 2010 and 2014. One line shows the number of times applause occurs during the speech, and the other shows the approval percentage (though the scale on the left is not in % terms). In 2010 he received 50% more applause than in the other years, and in 2014 about 30% more.

The question becomes: is the applause tactical, a way for the party to show support for the President during a period of declining approval? The correlation coefficient was -0.50. As with spurious correlations, though, this may well say nothing about any influence of approval on the amount of applause. Looking at the graph, the percentage changes in Applause and Approval are not the same size. Still, the year-over-year changes in the two clearly have an inverse relationship. For example, the Applause line moves up between 2009 and 2010 while the Approval line moves down over the same period.
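This paragraph makes two checks, which a few lines of Python can sketch. The first is a Pearson correlation coefficient. The second is whether the year-over-year changes move in opposite directions. This is an illustrative sketch, not the author's analysis. The series in the usage example are made-up toy numbers, not the actual applause counts or approval ratings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def opposite_moves(xs, ys):
    """Count consecutive years in which the two series change in opposite directions.

    Years where either series is unchanged are not counted.
    """
    count = 0
    for i in range(1, len(xs)):
        if (xs[i] - xs[i - 1]) * (ys[i] - ys[i - 1]) < 0:
            count += 1
    return count


# Made-up toy numbers for illustration only, NOT the real applause counts or approval ratings.
applause_toy = [1, 3, 1, 1, 1, 2]
approval_toy = [5, 3, 4, 4, 4, 3]
print(round(pearson(applause_toy, approval_toy), 2))
print(opposite_moves(applause_toy, approval_toy), "of", len(applause_toy) - 1, "year-over-year changes")
```

A negative coefficient plus many opposite moves matches the pattern in the graph. As the post stresses, neither number says anything about causation.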

Showing support and unity for a party leader by applauding is certainly reasonable, especially when support from the general public may be lacking. Whether the applause is planned in response to the approval rating is harder to say. Then again, it's more fun to think that members of Congress would use applause tactically:


Code for this will appear on my GitHub page in the near future.


xkcd on P-values

Mon, 2015-01-26 14:38

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

From the "statistician humour" department, today's xkcd cartoon will ring a bell for anyone who's ever published (or read!) a scientific article including a P-value for a statistical test:

If finding P-value excuses is a common activity for you (and let's hope not!), then R has you covered with the Significantly Improved Significance Test. This R code from Rasmus Bååth automatically annotates your P-values between 0.05 and 0.12 with excuses like "suggestive of statistical significance", "weakly non-significant" or "quasi-significant". Bonus points for links in the comments to real journal articles that actually use these excuses!
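The idea behind the joke is easy to mimic. Here is a hypothetical Python sketch, not Rasmus Bååth's R code. It attaches a randomly chosen excuse to any P-value in the 0.05 to 0.12 band and uses only the three phrases quoted above. Treating the band as inclusive at both ends is an assumption.

```python
import random

# The three excuses quoted in the post; a real list could be much longer.
EXCUSES = [
    "suggestive of statistical significance",
    "weakly non-significant",
    "quasi-significant",
]


def annotate_p_value(p, rng=None, low=0.05, high=0.12):
    """Format a P-value and, if it falls in [low, high], append a random excuse.

    Hypothetical Python re-imagining of the joke, not Rasmus Bååth's R code.
    """
    text = f"p = {p:.3f}"
    if low <= p <= high:
        if rng is None:
            rng = random.Random()
        text += f" ({rng.choice(EXCUSES)})"
    return text


rng = random.Random(2015)  # fixed seed so the example is reproducible
for p in (0.01, 0.06, 0.11, 0.30):
    print(annotate_p_value(p, rng))
```

Passing in a seeded `random.Random` keeps the output reproducible. Omitting it gives a fresh excuse on every call.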


Wine for Breakfast: Consumption Occasion as the Unit of Analysis

Mon, 2015-01-26 14:27

(This article was first published on Engaging Market Research, and kindly contributed to R-bloggers)

If the thought of a nice Chianti with that breakfast croissant is not that appealing, then I have made my point: occasion shapes consumption. Our tastes have been fashioned by culture and shared practice. Yet we often ignore the context and run our analyses as if consumers were not nested within situations. Contextual effects are attributed to the person, who is treated as both the unit of observation and the unit of analysis.

Obviously, it would be difficult to interview the occasion. We need informants to learn about wine occasions. Thus, we seek out consumers to tell us when and where they drink what kinds of wines, by themselves and with others. Even if one knows little about wine etiquette, the situation imposes such strong constraints that it makes sense to treat the consumption occasion as the unit of analysis. The person serves as the measuring instrument, but the focus is on the determining properties of the occasion.

Continuing with our example, there is a broad range of red and white wine varietals. They can be purchased in varying containers from a number of different retailers and served in various locations in the company of many different people. The list is long, and it is unlikely that we can ask about the details of more than a couple of consumption occasions before we fatigue our respondents. Yet it is the specifics that we seek, including the benefits sought and the features considered.

Clearly, there is a self-selection process so that we would expect to find certain types of individuals in each situation. However, the consumption occasion imposes its own rules over and above any selection effect. Therefore, we would anticipate that whatever the reasons for your presence, the occasion will dictate its own norms. In the end, it is reasonable to aggregate the responses of everyone reporting on each consumption occasion and run the analysis with those aggregate responses as the rows. The columns are formed using all the data gathered about the occasion.
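Here is a minimal Python sketch of that aggregation step. It assumes each respondent's report arrives as an (occasion, {benefit: rating}) pair. The data layout, the function name, and the choice to count an unmentioned benefit as 0 are all assumptions for illustration. The result has one row per occasion and one column per benefit.

```python
from collections import defaultdict


def aggregate_by_occasion(responses):
    """Average respondent ratings within each consumption occasion.

    responses: iterable of (occasion, {benefit: rating}) pairs, one per respondent report.
    Returns (occasions, benefits, matrix) with one row per occasion and one column
    per benefit. A benefit a respondent did not rate counts as 0 (an assumption).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    benefits = set()
    for occasion, ratings in responses:
        counts[occasion] += 1
        for benefit, value in ratings.items():
            sums[occasion][benefit] += value
            benefits.add(benefit)
    occasions = sorted(counts)
    benefits = sorted(benefits)
    matrix = [[sums[o][b] / counts[o] for b in benefits] for o in occasions]
    return occasions, benefits, matrix


# Hypothetical reports: two people describe wine with dinner, one describes a picnic.
reports = [
    ("dinner at home", {"pairs with food": 5, "easy to open": 2}),
    ("dinner at home", {"pairs with food": 4}),
    ("picnic", {"easy to open": 5}),
]
occasions, benefits, matrix = aggregate_by_occasion(reports)
for occasion, row in zip(occasions, matrix):
    print(occasion, dict(zip(benefits, row)))
```

Averaging is only one choice. Proportions or counts of respondents mentioning each benefit would fit the same occasion-by-benefit layout.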

And It Isn't Just About Wine and Breakfast (Benefit Structure Analysis)

There are occasions when you use your smartphone to take pictures. If you were thinking about purchasing a new smartphone, you would consider camera ease of use and picture quality. You would remember those low-light photos that were out of focus and those sunsets where the sun is a blur. Usage occasion seems to impact almost every purchase. If you pick your parents up at the airport, you need four doors, preferably with easy access to the rear seats. Usage is so important that the website or the salesperson always asks how you intend to use your new acquisition. Context matters whatever you buy (e.g., a washing machine, a garden hose, clothes, cosmetics, sporting equipment, or suitcases).

The goal is to uncover the major sources of variation differentiating among all the consumption occasions. Product differentiation and customer segmentation originate in the usage context. Since opportunities for increased profitability are found in the details, let's pretend we are journalists and ask who, what, where, when, why, and how. These six questions alone can generate a lot of rows. For instance, suppose the answers to each of the six questions could be classified into one of five categories. We would then obtain some 15,625 possible combinations (15,625 = 5x5x5x5x5x5). Of course, most of these rows will be empty because the responses to the six questions are not independent. Yet even if only 10% of the combinations occur, that is still over 1,500 rows, many of them sparse with zero or very small frequencies. Finally, the columns can contain any information collected about the consumption occasions in the rows, though one would expect questions about benefits sought and features preferred.
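The row count follows directly from the multiplication rule. The short Python check below enumerates every combination under the post's assumption of five answer categories per question. The category codes 0 to 4 are placeholders.

```python
from itertools import product

QUESTIONS = ["who", "what", "where", "when", "why", "how"]
CATEGORIES_PER_QUESTION = 5  # the post's assumption: five answer categories per question

# Every possible combination of answers, i.e. every potential occasion row.
rows = list(product(range(CATEGORIES_PER_QUESTION), repeat=len(QUESTIONS)))
print(len(rows))        # 15625, the same as 5 ** 6
print(len(rows) // 10)  # 1562 rows even if only 10% of combinations occur
```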

Now we have a large matrix revealing the linkages between many specific occasions and a wide range of benefits and features. It might be helpful to revisit the work on Benefit Structure Analysis from the 1970s to see how others have analyzed such a matrix. In Exhibit 5 of that Journal of Marketing article, we are presented with a matrix of 51 benefits wanted across 21 cleaning tasks. The solution was a simultaneous row and column linkage analysis. This seems similar to the biclustering one would achieve today with nonnegative matrix factorization (NMF). As noted in the article, when cleaning furniture, respondents wanted products that removed dust, dirt and film without leaving residues or scratches. On the one hand, the shared benefits reveal a structure underlying the cleaning tasks. On the other hand, the benefits are clustered together by their common association with similar cleaning tasks.
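For readers who want to see what NMF-based biclustering involves, here is a minimal pure-Python sketch. It uses the Lee and Seung multiplicative updates for squared (Frobenius) error. It is not the author's R code from the earlier posts. The helper names, the random initialisation, and the rule of assigning each row and column to its largest factor loading are assumptions chosen for illustration. In practice one would use an NMF library rather than hand-rolled matrix loops.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative V (occasions x benefits) as W (occasions x k) times H (k x benefits).

    Lee-Seung multiplicative updates for squared error; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def biclusters(w, h):
    """Assign each row (occasion) and column (benefit) to its largest factor loading."""
    row_labels = [max(range(len(row)), key=row.__getitem__) for row in w]
    col_labels = [max(range(len(h)), key=lambda a: h[a][j]) for j in range(len(h[0]))]
    return row_labels, col_labels


# Toy occasion x benefit matrix with two obvious blocks (hypothetical data).
v = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
w, h = nmf(v, k=2)
print("error:", round(frobenius_error(v, w, h), 6))
print(biclusters(w, h))
```

Rows (occasions) and columns (benefits) that receive the same label form one bicluster. This is the same row-and-column linkage the 1970s analysis sought.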

Following that line of reasoning, we can simulate a data matrix by specifying a set of common latent features linking the occasions and the benefits. As outlined in a prior post, the data generating process is an additive superpositioning of building blocks formed by the occasion-benefit linkages. We can begin with some product, for example, coffee. When do we drink coffee, and why do we drink it? Even the shortest list would include starting the day (occasion) in order to jump-start the brain (benefit). Is this a building block? If there were a sizable cohort of first-of-the-day kickstarters who did not drink coffee for the same reasons at other occasions, then we would have a building block.
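The additive data generating process can be sketched directly. Each building block adds a constant to every occasion-benefit cell it covers, and overlapping blocks simply sum. This Python sketch is illustrative only. The block definitions, strengths, and labels in the example are hypothetical, not survey data.

```python
def simulate_blocks(n_occasions, n_benefits, blocks):
    """Build an occasion x benefit matrix as an additive superposition of building blocks.

    blocks: list of (occasion_indices, benefit_indices, strength) triples. Each block
    adds `strength` to every cell it covers, so overlapping blocks simply sum.
    """
    v = [[0.0] * n_benefits for _ in range(n_occasions)]
    for occasions, benefits, strength in blocks:
        for i in occasions:
            for j in benefits:
                v[i][j] += strength
    return v


# Hypothetical coffee example; labels and strengths are illustrative, not survey data.
occasions = ["first cup of the day", "afternoon break", "after dinner"]
benefits = ["jump-start the brain", "take a break", "low-calorie refreshment"]
blocks = [
    ([0], [0], 1.0),        # first-of-the-day kickstarters
    ([1], [0, 1, 2], 0.5),  # afternoon: milder stimulation plus a break
]
v = simulate_blocks(len(occasions), len(benefits), blocks)
for name, row in zip(occasions, v):
    print(f"{name:22s}", row)
```

Adding random noise to each cell would make the simulated matrix closer to real survey data. It would also make the blocks harder to recover.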

The data matrix tells us what benefits are sought in each occasion. Neither the occasions nor the benefits are independent. There are times and places when specialty coffee replaces our regular cup. What occasions come to mind when you think about iced or frozen blended coffees? To help us understand this process, I have reproduced a figure from an earlier post.

The associations between the ten occasions labeled A to J and the seven benefits numbered 1 to 7 are indicated by filled squares in Section a. The rows and columns are interchanged as we move through Sections b and c until we see the building blocks in Section d. The solid black and white squares do not show the shades of gray indicating the degree to which coffee drinkers demand each benefit in each occasion. Specifically, Benefit 6 is wanted in two sets of occasions: A, C and H, and D, G, I and E. However, drinkers are likely not equally demanding in the two sets. For example, coffee that starts the day must energize, but coffee in the afternoon might be primarily a break or a low-calorie refreshment. In both cases we are seeking stimulation, just less of it in the afternoon than from the first cup of the day.
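A simple heuristic can imitate the row and column interchanges in Sections a to d. First sort the columns by their occasion patterns, then sort the rows by their benefit patterns, so that identical patterns end up next to each other. This Python sketch is an assumption about how one might do it, not the procedure used to draw the figure. It only groups rows or columns whose patterns match exactly. Graded or noisy data call for a biclustering method such as NMF.

```python
def reveal_blocks(matrix, row_labels, col_labels):
    """Reorder rows and columns so that identical patterns become adjacent.

    A simple lexicographic heuristic: columns are sorted by their pattern across
    the rows, then rows are sorted by their pattern across the reordered columns.
    Larger values sort first.
    """
    n_rows, n_cols = len(row_labels), len(col_labels)
    col_order = sorted(range(n_cols), key=lambda j: tuple(-matrix[i][j] for i in range(n_rows)))
    row_order = sorted(range(n_rows), key=lambda i: tuple(-matrix[i][j] for j in col_order))
    reordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return [row_labels[i] for i in row_order], [col_labels[j] for j in col_order], reordered


# Hypothetical 4 x 4 occasion x benefit pattern (1 = benefit wanted), scrambled on purpose.
matrix = [[1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1]]
rows, cols, blocks = reveal_blocks(matrix, ["A", "B", "C", "D"], ["1", "2", "3", "4"])
print(rows, cols)  # ['A', 'C', 'B', 'D'] ['1', '3', '2', '4']
for row in blocks:
    print(row)
```

After reordering, occasions A and C share benefits 1 and 3, while B and D share benefits 2 and 4. The two building blocks then sit on the diagonal.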

Benefit structure analysis remains a critical component of any marketing plan. Opportunity is found in the white spaces where benefits are not delivered by current offerings. Case studies and qualitative research findings fill the business shelves of online and retail booksellers. Now, advances in statistical modeling enable us to inquire at the deep level of detail that drives consumer product purchases. The R code needed to simultaneously cluster the rows and columns of such data matrices has been provided in a series of previous posts on music, cosmetics, personality inventories, scotch whiskey, feature usage, and the consumer purchase journey.
