The main difficulty in automating the retrieval of objects from an enterprise's heterogeneous distributed information bases is unifying disparate content that is presented from different points of view and under different paradigms for organizing data storage. The article formulates the problem of developing graphematic analysis for recognizing images of technical documentation and converting graphic information into machine-readable form, describes in detail the stop-word removal, stemming, and lemmatization mechanisms needed to solve this problem, and develops an algorithm for searching text structures using templates. The article proposes implementing the graphematic analysis algorithm as the first module in the automatic processing of natural-language texts, which makes it possible to extract semantically significant constructions from semi-structured resources using special graphematic descriptors. The proposed implementation can extract complex natural-language structures such as direct speech, and can detect and replace abbreviations.
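As a rough illustration of the pipeline outlined above, the following minimal sketch tags tokens with graphematic descriptors, replaces abbreviations, removes stop words, applies a crude suffix stemmer, and uses a template to find direct speech. It is not the article's implementation: the descriptor names (`WORD`, `NUMBER`, `PUNCT`, `ABBR`), the stop-word list, the abbreviation table, and all function names are assumptions chosen for illustration. Lemmatization is omitted because it would require a dictionary.

```python
import re
from dataclasses import dataclass

# Hypothetical resources; a real system would load these from dictionaries.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}
ABBREVIATIONS = {"fig.": "figure", "no.": "number", "approx.": "approximately"}

@dataclass
class Token:
    text: str
    descriptor: str  # assumed descriptor set: WORD, NUMBER, PUNCT, ABBR

# Known abbreviations are tried first so that "Fig." is not split into "Fig" + ".".
_abbr = "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
TOKEN_RE = re.compile(
    rf"(?P<ABBR>\b(?:{_abbr}))"
    r"|(?P<NUMBER>\d+(?:\.\d+)?)"
    r"|(?P<WORD>[A-Za-z]+)"
    r"|(?P<PUNCT>[^\w\s])",
    re.IGNORECASE,
)

# Template for direct speech: any text enclosed in straight double quotes.
DIRECT_SPEECH_RE = re.compile(r'"([^"]+)"')

def tokenize(text):
    """Split text into tokens, each labeled with the descriptor that matched."""
    return [Token(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]

def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(tokens):
    """Replace abbreviations, drop punctuation and stop words, stem the rest."""
    result = []
    for tok in tokens:
        if tok.descriptor == "PUNCT":
            continue
        if tok.descriptor == "ABBR":
            result.append(ABBREVIATIONS[tok.text.lower()])
        elif tok.descriptor == "NUMBER":
            result.append(tok.text)
        else:
            word = tok.text.lower()
            if word not in STOP_WORDS:
                result.append(stem(word))
    return result

def find_direct_speech(text):
    """Return the quoted fragments found by the direct-speech template."""
    return DIRECT_SPEECH_RE.findall(text)
```

For example, `normalize(tokenize('The operator said: "Close valve no. 2 in Fig. 3".'))` drops the stop words and punctuation and replaces `no.` and `Fig.` with their full forms. One known limitation of this sketch is that a sentence ending in the word "no." would be misread as an abbreviation; resolving such cases needs context that a purely graphematic pass does not have.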
