Why it’s easier than ever to find Groups You May Like
August 23, 2011
This is part of our continuing series that delves into analytics, engineering, or design elements of LinkedIn features. If this isn’t your cup of Java, check back tomorrow for regular LinkedIn programming. - Ed.
Groups You May Like (GYML) is a personalized group recommendation feature powered by LinkedIn’s Recommendation Engine. Thanks to groups, members can not only easily find people with related interests but also engage with communities on specific topics related to their professional expertise.
Once they have found a group to their liking, users have always asked for similar groups that they can check out or participate in. Rather than making you repeat the same laborious search for another suitable group, GYML lets you find them instantly.
Given that many users have asked us how this feature works, here’s a simple explanation of the inner workings of "Groups You May Like" on LinkedIn.
GYML matches key member profile features against key group features. An optimal matching relies on relevant historical data, key features and a well-defined metric. More specifically:
Metric: We designed the metric to optimize for participation in the community and not only for group affinity. Indeed, what makes a group valuable, along with its members, is its members’ contributions to the professional dialogue. This design can be achieved by an approach we at LinkedIn call ‘data jiu jitsu’: first, match a group to a member based on content affinity, then optimize for the desired behavior (in our case “participation”), which can be done via social learning. Social learning theory tells us that individuals learn by observing others’ behavior and the outcomes of that behavior. Hence, someone who joins a group with high participation from its members is more likely to engage in the future.
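To make the two-step idea concrete, here is a minimal sketch in Python. It is not the production metric: the Group fields, the affinity threshold and the sort key are all hypothetical, and a plain filter-then-rerank stands in for the social learning step described above.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    affinity: float       # hypothetical content-affinity score in [0, 1]
    participation: float  # hypothetical share of active members in [0, 1]


def recommend(groups, min_affinity=0.5, top_k=3):
    """Keep groups that match on content, then order them by participation."""
    # Step 1: content affinity decides which groups are candidates at all.
    candidates = [g for g in groups if g.affinity >= min_affinity]
    # Step 2: among candidates, favor groups whose members participate most.
    ranked = sorted(
        candidates,
        key=lambda g: (g.participation, g.affinity),
        reverse=True,
    )
    return [g.name for g in ranked[:top_k]]


groups = [
    Group("Java Developers", affinity=0.9, participation=0.10),
    Group("Big Data Practitioners", affinity=0.7, participation=0.45),
    Group("Gardening Fans", affinity=0.1, participation=0.80),
]
print(recommend(groups))
```

In this toy data the gardening group is very active but never becomes a candidate, while the more active of the two relevant groups is ranked first even though its content match is weaker.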
Key features: One of the most interesting aspects of GYML is how the group features are defined. Beyond the usual suspects, such as group title and group description, the real DNA of a group resides within its members. Hence, using a construct from information theory called Mutual Information, we generate a “virtual” group profile which, following the homophily concept, can be matched against each member. Another source of information we use as a feature for matching is the popularity of the group in someone’s network. If many of your connections belong to a group, that group will probably be of interest to you.
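As an illustration of how Mutual Information could pick out the terms that characterize a group, the sketch below scores each profile term by the mutual information between “a member’s profile contains the term” and “the member belongs to the group”, then keeps the top-scoring terms as a “virtual” profile. The function names, the input format and the over-representation check are assumptions made for this example, not a description of the actual pipeline.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) of two binary events from a 2x2 table.

    n11: members in the group whose profile has the term
    n10: members outside the group whose profile has the term
    n01: members in the group whose profile lacks the term
    n00: members outside the group whose profile lacks the term
    """
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, matching term marginal, matching group marginal)
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_total, group_total in cells:
        if joint > 0:  # empty cells contribute nothing
            mi += (joint / n) * math.log2(joint * n / (term_total * group_total))
    return mi


def virtual_profile(term_counts, group_size, population_size, top_k=2):
    """term_counts maps term -> (count in group, count in whole population)."""
    scores = {}
    for term, (in_group, overall) in term_counts.items():
        # MI is symmetric, so keep only terms over-represented in the group.
        if in_group / group_size <= overall / population_size:
            continue
        outside = overall - in_group
        scores[term] = mutual_information(
            in_group,
            outside,
            group_size - in_group,
            population_size - group_size - outside,
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up counts: 1,000 members overall, 100 of them in the group.
counts = {"hadoop": (80, 100), "java": (50, 200), "manager": (30, 300)}
profile = virtual_profile(counts, group_size=100, population_size=1000)
print(profile)
```

Here “manager” is exactly as common inside the group as outside it, so it carries no information about membership and is left out of the profile.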
Two interesting edge cases arose with this initial approach: potential mismatches with alumni groups (which spurred strong reactions from members) and with location-specific groups such as “Yahoo India”. We resolved this by implementing filters that discard groups with an over-representation of a school (or location) that does not match the member’s school (or location).
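A filter of this kind can be sketched as follows. The 60% threshold and the function names are hypothetical; the same check works for schools or locations, depending on which attribute is passed in.

```python
from collections import Counter


def dominant_value(values, threshold=0.6):
    """Return the value shared by more than `threshold` of members, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) > threshold else None


def passes_filter(member_value, group_member_values, threshold=0.6):
    """Discard a group dominated by a school or location the member lacks."""
    dominant = dominant_value(group_member_values, threshold)
    return dominant is None or dominant == member_value


# Hypothetical location data for two groups.
yahoo_india = ["India"] * 8 + ["US"] * 2
mixed_group = ["India", "US", "UK", "US"]

print(passes_filter("US", yahoo_india))     # dominated by India: discarded
print(passes_filter("India", yahoo_india))  # member matches: kept
print(passes_filter("US", mixed_group))     # no dominant location: kept
```

Groups without a dominant school or location pass through untouched, so the filter only affects the alumni-style and location-style groups that caused the mismatches.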
Historical Data: To fine-tune the matching process, we leveraged historical data focusing on recent group joins on LinkedIn. To keep our matching algorithm as relevant as possible, we also applied some filtering. First, we filtered out groups which our members may find controversial. Second, we did not show group recommendations to spammers, that is, members who tried to join groups for the sole purpose of spamming them and were subsequently removed from those groups.
To keep recommendations fresh, group recommendations are updated in real time when members update their profiles, while group features are updated offline on a weekly basis using Hadoop. Note that the latter could be updated more frequently if necessary, but we have found weekly updates to be sufficient to ensure freshness of the results.
GYML is one of the many Recommendation products we work on here in the SNA group at LinkedIn. Be sure to also read Adil Aijaz’s post on the thinking behind the broader suite of LinkedIn Products “You May Like”.
As a fresh-out-of-school grad, implementing the backend of Groups You May Like gave me fantastic exposure to many of the open source projects that we use at LinkedIn. Understanding the inner workings of Lucene, creating huge data flows in Hadoop with our open-source job scheduler Azkaban, training our models in R and pushing all the data to our ultra-efficient, open-source Voldemort servers were just a few of the things I got a chance to explore. Having the opportunity to use all these tools to find gold nuggets in LinkedIn’s huge collection of data was exhilarating.