COMMUNITY DETECTION on SOCIAL MEDIA using GRAPH BASED APPROACH

Social media has grown exponentially over the past few years, incorporating features that once seemed impossible, and it has had an enduring effect on the thinking of the general populace. Given the diverse population that takes part in daily chatting, tagging, posting and uploading in the virtual world, the study of how such users coalesce into communities is of considerable interest. This paper addresses the mining and analysis of these communities, with a focus on the techniques used for detection. We discuss four methods of detection, beginning with node-centric and moving on to group-centric, network-centric and hierarchy-centric approaches.

The maximum clique of the given network is found as follows. Suppose we sample the sub-network with nodes {1-9} and find a clique {1, 2, 3} of size 3. In order to find a clique of size greater than 3, remove all nodes with degree ≤ 3 − 1 = 2, repeating the removal until no such node remains:
• Remove nodes 2 and 9
• Remove nodes 1 and 3
• Remove node 4

[Figure: the resulting subgraph]

Reachability-In node-centric community detection, reachability between two nodes is considered. Reachability can be defined using geodesic distance. A geodesic is the shortest path between two nodes. The geodesic distance is the number of hops in a geodesic between two nodes. The geodesic diameter is the maximal geodesic distance over any two nodes in a network [2]. Any node in a community should be reachable from any other in k hops. Based on this criterion, two types of substructures can be found.
a. k-clique is a maximal subgraph in which the largest geodesic distance between any two nodes is no greater than k. That is, $d(v_i, v_j) \leq k \;\; \forall v_i, v_j \in V_s$, where $V_s$ is the set of nodes in the subgraph. Note that the geodesic distance is defined on the original network. Thus, the geodesic is not necessarily included in the group structure. Therefore, a k-clique may have a diameter greater than k.
b. k-club restricts the geodesic distance within the group to be no greater than k. It is a maximal substructure of diameter k [2].

For the sub-network below, the 3-cliques and 3-clubs are found.

[Figure: sub-network with its 3-cliques and 3-clubs]

Group centric community detection:-It considers the connections within a group as a whole. Certain nodes in the group can have low connectivity, but the overall group should satisfy certain criteria. An example of group-centric community detection is finding density-based groups. A subgraph $G_s(V_s, E_s)$ is γ-dense (also called a quasi-clique [3]) if

$$\frac{|E_s|}{|V_s|(|V_s| - 1)/2} \geq \gamma$$

Network centric community detection:-A network-centric criterion needs to consider the connections within a network globally. Network-centric community detection partitions the whole network into several disjoint sets. There are various approaches to this type of community detection.
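Two of the criteria above lend themselves to a short sketch: the degree-based pruning from the maximum-clique example and the density test for γ-dense groups. The edge list below is an assumption, since the original figure is not available here; it is a hypothetical 9-node network chosen so that the pruning removes nodes in the order quoted in the text. The names `EDGES`, `build_adjacency`, `prune_for_larger_clique` and `density` are illustrative only.

```python
from fractions import Fraction

# Hypothetical 9-node edge list (an assumption; the original figure is not
# available). It was chosen to agree with the removal order quoted in the text.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def prune_for_larger_clique(adj, k):
    """Repeatedly drop nodes of degree <= k - 1.

    Such nodes cannot belong to a clique of size greater than k, so the
    surviving nodes are the only candidates for a larger clique.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while True:
        low = [u for u, vs in adj.items() if len(vs) <= k - 1]
        if not low:
            return adj
        for u in low:
            for v in adj.pop(u):
                if v in adj:
                    adj[v].discard(u)


def density(adj, nodes):
    """Share of possible edges present among `nodes` (the gamma of the group)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        raise ValueError("density needs at least two nodes")
    internal = sum(len(adj[u] & nodes) for u in nodes) // 2
    return Fraction(internal, n * (n - 1) // 2)


adj = build_adjacency(EDGES)
remaining = set(prune_for_larger_clique(adj, 3))
print(remaining)                    # candidates for a clique larger than 3
print(density(adj, remaining))      # a density of 1 means a full clique
print(density(adj, {1, 2, 3, 4}))   # a dense group that is not a clique
```

A group passes the γ-dense test when `density(adj, group) >= gamma` for a chosen threshold; the threshold itself is a modelling choice and is not fixed by the text.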
Vertex similarity-The similarity of two vertices is defined in terms of the similarity of their social circles, e.g., the number of friends the two share in common. Similarity measures used in practical networks include Jaccard similarity [4] and cosine similarity [5].
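As a minimal sketch of these two measures, the code below assumes their usual neighbor-set forms: the Jaccard similarity is the size of the shared neighborhood divided by the size of the combined neighborhood, and the cosine similarity divides the shared neighborhood by the geometric mean of the two degrees. The 9-node edge list is a hypothetical example, not the paper's figure, and the function names are illustrative.

```python
import math
from fractions import Fraction

# Hypothetical 9-node edge list (an assumption, not taken from the paper).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, i, j):
    """|N_i & N_j| / |N_i | N_j| over the neighbor sets of i and j."""
    union = adj[i] | adj[j]
    if not union:
        return Fraction(0)
    return Fraction(len(adj[i] & adj[j]), len(union))


def cosine(adj, i, j):
    """|N_i & N_j| / sqrt(|N_i| * |N_j|) over the neighbor sets of i and j."""
    denom = math.sqrt(len(adj[i]) * len(adj[j]))
    if denom == 0:
        return 0.0
    return len(adj[i] & adj[j]) / denom


adj = build_adjacency(EDGES)
print(jaccard(adj, 4, 6))  # nodes 4 and 6 share exactly one neighbor
print(cosine(adj, 4, 6))
```

Note that under these definitions two adjacent nodes with no common friends score zero; some formulations add each node to its own neighbor set to avoid this, which is a design choice outside what the text specifies.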
Jaccard similarity:

$$\mathrm{Jaccard}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$$

Cosine similarity:

$$\mathrm{Cosine}(v_i, v_j) = \frac{|N_i \cap N_j|}{\sqrt{|N_i| \cdot |N_j|}}$$

where $N_i$ denotes the set of neighbors of node $v_i$.

[Figure: example graph with its Jaccard and cosine similarity values]

Latent space models-A latent space model maps nodes in a network into a low-dimensional Euclidean space such that the proximity between nodes based on network connectivity is preserved in the new space [6] [7]; the nodes are then clustered in the low-dimensional space using methods like k-means [8]. One representative approach is multidimensional scaling (MDS) [9]. Typically, MDS requires as input a proximity matrix $P \in R^{n \times n}$, with each entry $P_{ij}$ denoting the distance between a pair of nodes i and j in the network. Let $S \in R^{n \times l}$ denote the coordinates of nodes in the l-dimensional space such that S is column orthogonal. It can be shown that

$$S S^T \approx -\frac{1}{2}\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)(P \circ P)\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) = \tilde{P}$$

where I is the identity matrix, $\mathbf{1}$ an n-dimensional column vector with each entry being 1, and $\circ$ the element-wise matrix multiplication. It follows that S can be obtained by minimizing the discrepancy between $\tilde{P}$ and $S S^T$:

$$\min_S \| S S^T - \tilde{P} \|_F^2$$

Suppose V contains the top l eigenvectors of $\tilde{P}$ with the largest eigenvalues, and Λ is a diagonal matrix of the top l eigenvalues, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_l)$. The optimal S is $S = V \Lambda^{1/2}$ [2]. The classical k-means algorithm can be applied to S to find community partitions.

Block model approximation-Block models approximate a given network by a block structure. Each block represents one community. Therefore, we approximate a given adjacency matrix A as follows.
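$$A \approx S \Sigma S^T$$

where $S \in \{0,1\}^{n \times k}$ is the block indicator matrix with $S_{ij} = 1$ if node i belongs to the j-th block, Σ is a k × k matrix indicating the block (group) interaction density, and k is the number of blocks. A natural objective is to minimize the following [2]:

$$\min_{S, \Sigma} \| A - S \Sigma S^T \|_F^2$$

If S is relaxed to be continuous and column orthogonal, the solution is given by the top k eigenvectors of A. For the given graph, the top two eigenvectors of the adjacency matrix are

[Matrix: top two eigenvectors of the adjacency matrix of the example graph]

As indicated by the sign of the second column of S, nodes {1, 2, 3, 4} form a community and {5, 6, 7, 8, 9} form another, which can be obtained by k-means clustering applied to S.

Spectral Clustering-Spectral clustering is derived from the problem of graph partition. Graph partition aims to find a partition such that the cut (the total number of edges between two disjoint sets of nodes) is minimized [10]. Two commonly used variants in community detection are the ratio cut and the normalized cut. Let $\pi = (C_1, C_2, \cdots, C_k)$ be a graph partition such that $C_i \cap C_j = \emptyset$ and $\cup_{i=1}^{k} C_i = V$. The ratio cut and the normalized cut are defined as:

$$\mathrm{Ratio\ Cut}(\pi) = \frac{1}{k} \sum_{i=1}^{k} \frac{cut(C_i, \bar{C}_i)}{|C_i|}$$

$$\mathrm{Normalized\ Cut}(\pi) = \frac{1}{k} \sum_{i=1}^{k} \frac{cut(C_i, \bar{C}_i)}{vol(C_i)}$$

where $\bar{C}_i$ is the complement of $C_i$, and $vol(C_i) = \sum_{v \in C_i} d_v$.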
[Figure: ratio cut and normalized cut values for the partition in red (1) and the partition in green (2)]
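The two cut criteria can be evaluated directly for any candidate partition. The sketch below assumes the averaged forms, in which each community contributes cut(C_i, complement of C_i) divided by |C_i| (ratio cut) or by vol(C_i) (normalized cut) and the sum is divided by the number of communities k. The edge list and the two partitions are hypothetical and are not claimed to be the red and green partitions of the original figure.

```python
from fractions import Fraction

# Hypothetical 9-node edge list (an assumption, not taken from the paper).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cut_size(adj, community):
    """Number of edges with exactly one endpoint inside `community`."""
    return sum(len(adj[u] - community) for u in community)


def volume(adj, community):
    """Sum of the degrees of the nodes in `community`."""
    return sum(len(adj[u]) for u in community)


def ratio_cut(adj, partition):
    """Average over communities of cut size divided by community size."""
    terms = [Fraction(cut_size(adj, c), len(c)) for c in partition]
    return sum(terms) / len(partition)


def normalized_cut(adj, partition):
    """Average over communities of cut size divided by community volume."""
    terms = [Fraction(cut_size(adj, c), volume(adj, c)) for c in partition]
    return sum(terms) / len(partition)


adj = build_adjacency(EDGES)
unbalanced = [{9}, {1, 2, 3, 4, 5, 6, 7, 8}]  # splits off a single node
balanced = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]    # splits the two dense groups
for name, part in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, ratio_cut(adj, part), normalized_cut(adj, part))
```

On this example the plain cut prefers the unbalanced partition (one cut edge against two), while both normalized criteria give the balanced partition the lower score, which is the motivation for normalizing by community size or volume.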

Modularity Maximization:-
Modularity is proposed specifically to measure the strength of a community partition for real-world networks by taking into account the degree distribution of nodes [11]. Given a network of n nodes and m edges, the expected number of edges between nodes $v_i$ and $v_j$ is $d_i d_j / 2m$, where $d_i$ and $d_j$ are the degrees of nodes $v_i$ and $v_j$, respectively. To see this, consider one edge from node $v_i$ connecting to a node in the network at random: it lands at node $v_j$ with probability $d_j / 2m$. As there are $d_i$ such edges, the expected number of connections between the two is $d_i d_j / 2m$ [2]. For the example graph, which has 14 edges and in which nodes 1 and 2 have degrees 3 and 2, the expected number of edges between nodes 1 and 2 is 3 × 2 / (2 × 14) = 3/14.
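The arithmetic above can be reproduced with a few lines of code. This is a sketch under the assumption of a hypothetical 14-edge network in which nodes 1 and 2 have degrees 3 and 2, matching the numbers quoted in the text; the edge list itself is not taken from the paper.

```python
from fractions import Fraction

# Hypothetical edge list (an assumption): 14 edges, deg(1) = 3, deg(2) = 2.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

# Degree of every node and the total number of edges m.
degree = {}
for u, v in EDGES:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
m = len(EDGES)


def expected_edges(i, j):
    """Expected number of edges between i and j under the degree-preserving
    random model: d_i * d_j / (2m)."""
    return Fraction(degree[i] * degree[j], 2 * m)


print(expected_edges(1, 2))
# Summing the expectation over all j recovers the degree of node i,
# because the degrees of all nodes add up to 2m.
print(sum(expected_edges(1, j) for j in degree) == degree[1])
```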

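Modularity maximization can be reformulated as

$$\max_S \; Q = \frac{1}{2m} \mathrm{Tr}(S^T B S), \qquad B = A - \frac{d d^T}{2m}$$

where S is the community indicator matrix, d is the vector of node degrees, and B is the modularity matrix with entries $B_{ij} = A_{ij} - d_i d_j / 2m$. With a spectral relaxation that allows S to be continuous, the optimal S can be computed as the top k eigenvectors of the modularity matrix B [11], i.e., those with the largest eigenvalues.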

For the example graph, the top two eigenvectors of the modularity matrix are:
[Matrix: top two eigenvectors of B for the example graph]
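The modularity objective can also be evaluated directly for a given partition, without solving the eigenvalue problem. The sketch below builds the modularity matrix with entries A_ij − d_i d_j / 2m and sums it over all node pairs that share a community, which equals (1/2m) Tr(S^T B S) for a 0/1 indicator matrix S. It does not perform the spectral relaxation. The edge list and partition are hypothetical.

```python
from fractions import Fraction

# Hypothetical 9-node edge list (an assumption, not taken from the paper).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def modularity_matrix(edges):
    """Return (B, m) with B[i][j] = A_ij - d_i * d_j / (2m) as exact fractions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    m = len(edges)
    B = {
        i: {
            j: Fraction(int(j in adj[i])) - Fraction(len(adj[i]) * len(adj[j]), 2 * m)
            for j in adj
        }
        for i in adj
    }
    return B, m


def modularity(edges, communities):
    """Q = (1 / 2m) * sum of B[i][j] over pairs i, j in the same community."""
    B, m = modularity_matrix(edges)
    total = sum(B[i][j] for c in communities for i in c for j in c)
    return total / (2 * m)


B, m = modularity_matrix(EDGES)
two_groups = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
everything = [set(range(1, 10))]
print(modularity(EDGES, two_groups))   # positive: denser than expected
print(modularity(EDGES, everything))   # a single community scores zero
```

Every row of B sums to zero, which is why putting all nodes in one community yields Q = 0 and why only a genuine grouping can score above it.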
Hierarchy centric community detection:-Another line of community detection research is to build a hierarchical structure of communities based on network topology. This facilitates the examination of communities at different granularities. There are mainly two types of hierarchical clustering: divisive and agglomerative.

Divisive:-One particular divisive clustering algorithm recursively removes the "weakest" tie in a network until the network is separated into two or more components. The general principle is as follows:
• At each iteration, find the edge with the least strength. This kind of edge is most likely to be a tie connecting two communities.
• Remove the edge and then update the strength of the links.
• Once the network is decomposed into two connected components, each component is considered a community.
The iterative process above can be applied to each community to find sub communities.
Newman and Girvan proposed a method to find weak ties using edge betweenness. Edge betweenness is defined as the number of shortest paths that pass along an edge (Brandes, 2001). The Newman-Girvan algorithm suggests progressively removing edges with the highest betweenness. This gradually disconnects the network, naturally leading to a hierarchical structure [2]. The edge betweenness of the figure below is shown in the table.

[Figure and table: example network and its edge betweenness values]

Agglomerative:-Agglomerative clustering begins with base communities and merges them successively into larger communities following a certain criterion. One such criterion is modularity (Clauset et al., 2004). Two communities are merged if doing so results in the largest increase of overall modularity.
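The divisive procedure based on edge betweenness can be sketched as follows. The betweenness routine uses breadth-first search with shortest-path counting in the style of Brandes; when several shortest paths join a pair of nodes, each path contributes an equal fraction, which is a common convention assumed here rather than stated in the text. The edge list is hypothetical, and `girvan_newman_split` performs only the first split into two components.

```python
from collections import deque

# Hypothetical 9-node edge list (an assumption, not taken from the paper).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):  # accumulate from the farthest nodes back
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    # Every unordered pair was visited from both endpoints, so halve.
    return {edge: value / 2.0 for edge, value in bet.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph falls apart."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while len(components(adj)) == 1:
        bet = edge_betweenness(adj)
        if not bet:
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


adj = build_adjacency(EDGES)
bet = edge_betweenness(adj)
parts = girvan_newman_split(adj)
print(sorted(sorted(p) for p in parts))
```

Applying `girvan_newman_split` again to each resulting component yields the next level of the hierarchy. Betweenness is recomputed after every removal because removing one edge reroutes shortest paths through the remaining ones.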

Applications of community detection:-
Detection of suspicious events in social media:-Social network analysis can be used to increase knowledge about customers' behavior, mostly in relation to the customers' connections and how they form communities according to their calls and text messages. By performing community detection, it is possible to recognize groups of customers with unexpected behavior in terms of usage and also with regard to types of social structures. Outlier groups might be pointed out as suspicious communities in terms of fraud events [12].
Recommendation systems:-Community detection can be used to build recommender systems, which recommend the most suitable products to customers by predicting their interests. When focusing on the problem of recommending items to a user (i.e., a customer of an e-store), the underlying transaction data can be seen as a complex network (specifically, a bipartite network): information about customer tastes is encoded inside this structure and can be put to good use for future suggestions [13].
Link prediction:-Community detection in complex networks can be used for link prediction between two actors. Link prediction evaluates the likelihood of future links between vertices by observing vertex and link attributes in the network. It is used to detect missing and fake links and to predict the future existence of links as the network develops [14].

Detection of terrorist groups:-
With the increasing popularity of social media over the last few years, terrorist groups have flocked to popular websites to spread their message and recruit new members. As terrorist groups establish a presence in these social networks, they do not rely on direct connections to influence sympathetic individuals. Instead, they leverage "friend of a friend" relationships, where existing members or sympathizers bridge the gap between potential recruits and terrorist leadership or influencers. These terrorist social networks in social media can be uncovered and mapped, providing an opportunity to apply social network analysis algorithms. Leveraging these algorithms, the main influencers can be identified along with the individuals bridging the gap between the sympathizers and influencers [15].

Anomaly detection in social media:-Anomalies in online social networks can signify irregular and often illegal behavior. Detection of such anomalies has been used to identify malicious individuals, including spammers, sexual predators, and online fraudsters. The detection of anomalies in online social networks is composed of two sub-processes: the selection and calculation of network features, and the classification of observations from this feature space [16].