Co-occurrence associations are widely observed in empirical data. Mining the information in co-occurrence data is essential for advancing our understanding of systems such as social networks, ecosystems, and brain networks. Measuring the similarity between entities is an important task in this context, and it is usually approached with network-based methods. Here, we show that traditional methods based on the aggregated network can introduce unwanted indirect relationships. To address this issue, we propose a similarity measure based on the ego network of each entity, which captures how an entity's centrality changes from one ego network to another. The proposed index is easy to compute and has a clear physical meaning. Using two different data sets, we compare the new index with existing ones. We find that it outperforms the traditional network-based similarity measures and can sometimes surpass the embedding method. Meanwhile, the similarities given by the new index are only weakly correlated with those given by other methods, so it provides a different dimension for quantifying similarity in co-occurrence data. Altogether, our work extends network-based similarity measures and could potentially be applied to several related tasks.
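The indirect-relationship problem with aggregated networks can be illustrated with a small sketch. The function names (`cooccurrence_network`, `jaccard`, `ego_records`) and the toy records are hypothetical. Jaccard similarity stands in for the traditional network-based measures, and building an entity's ego network only from the records that contain it is an assumption made for illustration. The sketch does not reproduce the proposed index itself.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(records):
    """Aggregated network: link two entities if they appear together in at least one record."""
    adj = defaultdict(set)
    for rec in records:
        for a, b in combinations(sorted(set(rec)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v (a traditional network-based measure)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def ego_records(records, ego):
    """Hypothetical helper (assumed ego-network construction): keep only records containing `ego`."""
    return [rec for rec in records if ego in rec]


records = [{"A", "B"}, {"B", "C"}]

# Aggregated network: A and C never co-occur, yet they share the neighbor B.
agg = cooccurrence_network(records)
print("C" in agg["A"])         # False: no direct co-occurrence
print(jaccard(agg, "A", "C"))  # 1.0: maximal similarity through B alone
print(jaccard(agg, "A", "B"))  # 0.0: A and B co-occur but share no neighbors

# Ego-network view (assumed construction): C never appears in A's records.
ego_A = cooccurrence_network(ego_records(records, "A"))
print("C" in ego_A)            # False
```

In the aggregated network, A and C receive the maximal Jaccard score solely because both co-occur with B, while A and B, which do co-occur, score zero. Restricting attention to A's own records removes C entirely, which is the kind of indirect link an ego-network view is meant to avoid.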
