Naumen Intelligent Search — Search company’s digital repositories and public sources

Find what's important faster

with the Naumen Enterprise Search system

An intelligent search engine based on artificial intelligence and big data processing technologies

NAUMEN Intelligent Search reduces the time spent on daily information searches across diverse sources and gives business users accurate, detailed answers to complex questions about production processes, services rendered and applied research. The smart search engine is based on machine learning and natural language processing technologies. It is capable of processing vast volumes of data and delivers accumulated knowledge to the company’s employees.

Objectives you can achieve

Combining all information sources into a single search environment

Our solution lets you conveniently search dozens of data sources connected to the search engine, e.g. network files and folders, the company’s systems and portals, electronic libraries, etc.

Receiving high-quality search results for analysis and decision-making

The search engine helps you quickly find the information you need in large volumes of unstructured data from various sources and provides a response targeted to the user.

Expanding knowledge management areas

You can expand your company’s knowledge management by connecting knowledge bases and other sources that contain important business information to the search engine. Your employees can also be notified promptly about new materials of interest.

A single search box for all queries

Regardless of where information is located, users can easily access it via a universal search interface: a search box, search result pages and applicable filters. Connectors from the search engine to internal data sources are configured at the deployment phase. A spider bot (web crawler) collects information from external data sources.

Searching files of different formats

Full-text search is carried out across Microsoft Office files (doc, docx, xls, xlsx, ppt, pptx, etc.), web pages (html, htm), OpenDocument files (odt), as well as text documents stored in graphic formats (pdf, djvu, jpeg, etc.).

All supported formats

Understanding the meaning of documents and users’ queries

NAUMEN Intelligent Search understands the meaning of documents and provides users with advanced search results that are relevant to their search intent. It is also capable of understanding acronyms and terminology used by the company’s employees and in its industry in general. Moreover, machine learning technologies continuously improve the search engine’s understanding of the meaning of documents.

Semantic Search

The search engine uses extended document data obtained through semantic analysis. At the semantic analysis phase, the system determines document attributes that represent the document summary; these are used to group documents by content, highlight key words, assign tags, etc. Search algorithms that take these data into account significantly improve the quality of search results, even if the documents containing the answers to users’ queries do not contain words from the original queries (fuzzy search).
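The idea of finding documents that share no words with the query can be illustrated with a small vector-similarity sketch. This is not NAUMEN's implementation: it assumes that some semantic model has already mapped documents and queries to vectors, and the document names and vector values below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical semantic vectors; a real system would obtain them from a model.
doc_vectors = {
    "well_drilling_report": [0.9, 0.1, 0.0],
    "hr_vacation_policy": [0.0, 0.2, 0.9],
    "borehole_survey": [0.8, 0.3, 0.1],
}

def semantic_search(query_vector, vectors, top_k=2):
    """Return the names of the top_k documents closest to the query vector."""
    scored = [(cosine(query_vector, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical vector for a query about drilling.
query_vector = [1.0, 0.2, 0.0]
print(semantic_search(query_vector, doc_vectors))
```

Documents are ranked by closeness in the vector space rather than by shared words, so a document can be returned even when it uses different terminology from the query.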

Personalization

When delivering search results, the system takes into account users’ profiles, their areas of interest and query history, as well as some unique parameters generated by the system based on analysis of the users’ documents.

Self-Learning Search Engine

Machine learning maintains search quality and accuracy as the number of documents grows, new data sources are connected, new document versions are released and other changes occur related to the company’s Information Storage and Processing Policy.

Delivers all the features of modern search engines in one solution

Provides full-text search by content and attributes

When delivering search results, key words are searched for both in the document content and in the attributes (fields) of the document profile.

Uses Morphology-based and Exact Match Search

In morphology-based search, document key words are matched not only in the strictly specified form but also in all their morphological forms, such as gender, number and case inflections.
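The difference between the two modes can be sketched as follows. The lemma table here is a hand-made stand-in for a morphological analyzer, and all names are hypothetical; the sketch only shows the principle of reducing words to a base form before matching.

```python
import re

# Hypothetical lemma table; a real system would use a morphological analyzer.
LEMMAS = {
    "wells": "well",
    "drilled": "drill",
    "drilling": "drill",
    "drills": "drill",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(token):
    """Map a token to its base form, or return it unchanged if unknown."""
    return LEMMAS.get(token, token)

def matches(query, document, exact=False):
    """Check that every query word occurs in the document.

    With exact=True words are compared as written; otherwise both sides
    are reduced to base forms first.
    """
    doc_tokens = tokenize(document)
    query_tokens = tokenize(query)
    if not exact:
        doc_tokens = [lemmatize(t) for t in doc_tokens]
        query_tokens = [lemmatize(t) for t in query_tokens]
    return all(t in doc_tokens for t in query_tokens)

text = "Three wells were drilled in 2018"
print(matches("drill well", text))              # morphology-based
print(matches("drill well", text, exact=True))  # exact match
```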

Provides Facet Filter Search

Users can narrow the set of documents in the search results using a group of filters (facets) that identify different document features (type, author, creation date, etc.).
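A minimal sketch of faceted filtering, assuming each search result carries a small set of attributes. The documents and field names are invented for illustration and do not reflect the product's data model.

```python
from collections import Counter

# Hypothetical search results with a few attributes each.
documents = [
    {"title": "Drilling report", "type": "pdf", "author": "Ivanov", "year": 2018},
    {"title": "Core analysis", "type": "docx", "author": "Petrova", "year": 2018},
    {"title": "Seismic survey", "type": "pdf", "author": "Petrova", "year": 2017},
]

def facet_counts(docs, field):
    """Count how many documents fall under each value of a facet."""
    return Counter(doc[field] for doc in docs)

def apply_facets(docs, **selected):
    """Keep only the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]

print(facet_counts(documents, "type"))
print([d["title"] for d in apply_facets(documents, type="pdf", author="Petrova")])
```

The counts are what a results page would typically show next to each filter value, and each selected value narrows the result set further.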

Enables Thesaurus Search

The search uses thesauri and semantic similarity data obtained by distributional semantics methods.
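One common way to use a thesaurus in search is query expansion, sketched below. The thesaurus entries are hypothetical; in a real deployment they would come from a subject-domain thesaurus or from similarity data, and matching would be more sophisticated than a plain word overlap.

```python
# Hypothetical thesaurus mapping a term to related terms.
THESAURUS = {
    "well": {"borehole"},
    "oil": {"petroleum", "crude"},
}

def expand_query(terms, thesaurus):
    """Return the query terms together with their thesaurus neighbours."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded |= thesaurus.get(term, set())
    return expanded

def search(terms, docs, thesaurus):
    """Return documents containing any original or related term."""
    wanted = expand_query(terms, thesaurus)
    return [doc for doc in docs if wanted & set(doc.lower().split())]

docs = ["Borehole logging results", "Annual budget"]
print(search(["well"], docs, THESAURUS))
```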

Enables Contextual Search

With contextual search, documents are found by key words only if those words are separated by less than a specified distance.
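The distance condition can be sketched as a simple proximity check. Distance is measured here in word positions, which is an assumption; the source does not say which unit the product uses.

```python
def within_distance(text, first, second, max_distance):
    """Check that two key words occur closer than max_distance words apart."""
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) < max_distance
        for i in first_positions
        for j in second_positions
    )

text = "pressure measured at the bottom of the well"
print(within_distance(text, "pressure", "well", 8))
print(within_distance(text, "pressure", "well", 5))
```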

Uses Unified Document Catalogue

Cataloguing and categorization technologies are used to create a unified document catalogue from all data sources, with a user-friendly structure and convenient navigation.

Largest implementation projects

Cognitive search system
for Gazprom Neft Research and Engineering Center

100+

users at the pilot project phase

200+

data sources

100 000+

electronic documents available for search

Previously, search queries returned excess information, but the cognitive search system makes it possible to refine a query, obtain focused responses and create filters by particular aspects.

Boris Belozerov
Gazprom Neft Research and Engineering Center Digital Technology & Geology Expertise Manager

Global CIO Union of Russian IT Directors
Project of the Year 2018 Contest Award Special Nomination: Global CIO Choice

Top 10 Oil&Gas IT Projects
Contest Award
Nomination: Corporate Information System

Stages of implementing an Intelligent Search System in a company

Implementing the Intelligent Search System in a company is a full-scale project that involves NAUMEN’s team of experts. As a rule, the project has several main phases, depending on the objectives set.

1. Analysis of Data Sources, Types and Formats

The first stage is to study all data sources, document types and storage formats, contents and attributes. This stage is the most time-consuming, because as many details and data management features as possible must be identified to minimize the risk of unnecessarily costly changes to data extraction and storage algorithms in the future.

2. Source Integration and Data Pre-Processing

At the second stage, data sources are integrated and a unified search environment is created. To achieve this, our experts develop a data model that governs interaction with the data sources, and create a data bank for several categories of data. The data uploaded into the data bank are pre-processed to improve the quality of scanned documents, solve encoding issues, delete garbage characters, etc.

3. Language Modelling

The third stage is to construct a language model based on text data extracted from the documents. The model accounts for the specific features and wording standards of different types of documents: technical, scientific, etc. The language model then enables the search engine to understand the meaning of the documents.

4. Semantic Analysis and Document Structuring

After passing through the machine learning phase, the system uses the language model to identify special attributes of documents that reflect their summaries. Finally, a semantic space is constructed. It is the basis for further analysis and for the system’s intelligence, including document structuring tasks: grouping documents by content, identifying key words and assigning tags.

5. Configuring Search and Ranking Algorithms

The final stage is to configure search and ranking algorithms. The model for ranking documents in the search results can be adjusted using numerous parameters that ensure high relevance of results: document relevance, different priorities for document content and attributes, specific features of query wording, etc. Filters and subject-domain thesauri are also set up; the thesauri expand the search results by including documents with similar meaning.
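How different priorities for content and attributes affect ranking can be shown with a small scoring sketch. The scoring formula, weights and sample documents are illustrative assumptions, not the product's ranking model.

```python
def score(document, query_terms, content_weight=1.0, attribute_weight=2.0):
    """Weighted count of query-term hits in content and in attributes."""
    content_tokens = document["content"].lower().split()
    attribute_tokens = " ".join(document["attributes"].values()).lower().split()
    content_hits = sum(content_tokens.count(t) for t in query_terms)
    attribute_hits = sum(attribute_tokens.count(t) for t in query_terms)
    return content_weight * content_hits + attribute_weight * attribute_hits

def rank(documents, query_terms, **weights):
    """Sort documents by descending score for the given weights."""
    return sorted(
        documents,
        key=lambda d: score(d, query_terms, **weights),
        reverse=True,
    )

# Hypothetical documents: "a" matches in content, "b" matches in its title.
docs = [
    {"id": "a", "content": "seismic data for the northern field",
     "attributes": {"title": "Field notes"}},
    {"id": "b", "content": "annual summary",
     "attributes": {"title": "Seismic survey"}},
]

print([d["id"] for d in rank(docs, ["seismic"])])
print([d["id"] for d in rank(docs, ["seismic"], attribute_weight=0.5)])
```

Changing a single weight reverses the order of the two documents, which is the kind of adjustment this configuration stage deals with.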

Software platform

The main components of NAUMEN Intelligent Search are written in the Scala programming language. The relational PostgreSQL (indices) and non-relational MongoDB (content storage) are used as DBMS. The system also uses open source components: the Elasticsearch search engine and the Apache Spark framework for distributed processing of unstructured and semi-structured data.
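For readers unfamiliar with Elasticsearch, the sketch below builds a request body in its Query DSL that combines a phrase query with a proximity allowance (`slop`), a facet-style filter and an aggregation. The source does not describe how the product actually queries Elasticsearch; the field names `content`, `type` and `author` are hypothetical, and the `term` filter and `terms` aggregation assume those fields are indexed as keywords.

```python
import json

def build_query(phrase, doc_type, slop=3):
    """Build an Elasticsearch request body as a plain dictionary."""
    return {
        "query": {
            "bool": {
                # Phrase match that tolerates up to `slop` position moves.
                "must": [
                    {"match_phrase": {"content": {"query": phrase, "slop": slop}}}
                ],
                # Facet-style restriction on a document attribute.
                "filter": [{"term": {"type": doc_type}}],
            }
        },
        # Bucket counts per author, usable for displaying a facet.
        "aggs": {"by_author": {"terms": {"field": "author"}}},
    }

body = build_query("well pressure", "pdf")
print(json.dumps(body, indent=2))
```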