TEM Journal 5(2) 236–240

Pattern Recognition and Natural Language Processing: State of the Art

Mirjana Kocaleva (1,2), Done Stojanov (2), Igor Stojanovik (2), Zoran Zdravev (2)

(1) E-learning Center, University “Goce Delcev”, Krste Misirkov bb, Shtip, R. Macedonia
(2) Faculty of Computer Science, University “Goce Delcev”, Krste Misirkov bb, Shtip, R. Macedonia

Abstract – The development of information technologies is growing steadily. With the development of the latest software technologies and the application of artificial intelligence and machine learning methods embedded in computers, the expectation is that in the near future computers will be able to solve problems themselves, as people do. Artificial intelligence emulates human behavior on computers. Rather than executing instructions one by one, as they are programmed, machine learning employs prior experience/data that is used in the process of the system's training. In this state-of-the-art paper, common methods in AI, such as machine learning, pattern recognition and natural language processing (NLP), are discussed. The standard architecture of an NLP system and the levels needed for understanding NLP are also given. Lastly, statistical NLP and multi-word expressions are described.

Keywords – artificial intelligence, machine learning, pattern recognition, natural language processing.

1.  Introduction

Depending on the scope of application, there are many definitions of artificial intelligence. According to [2], [4], artificial intelligence maps human behaviour onto computers. Regardless of whether human behaviour is emulated or not, the goal of AI is to create intelligence. Without any doubt, the challenge yet to come in AI is to emulate general intelligence completely or near-perfectly. For different purposes, AI combines different methods from linguistics, statistics and computational intelligence. AI is an interdisciplinary branch of computer science that has connections to other sciences such as neuroscience, philosophy, linguistics and psychology. In addition to their application in industry, predictive methods in AI are nowadays also commonly used in social sciences, such as economics.

There are several areas of specialization of artificial intelligence, such as:

   •   Game playing: computers are programmed to oppose gamers.
   •   Expert systems: computers are programmed to make decisions about real-life situations (Mycin is a typical expert system, developed in the 1970s, that was used to identify bacteria and to recommend medications based on known symptoms).
   •   Natural language: computers are programmed to process sentences from spoken languages, analysing the morphology, lexicography and even the semantics of a whole sentence.
   •   Neural networks: combinations of artificial neurons modelled on the human neuron, primarily used for recognition purposes.
   •   Robotics: computers are programmed to receive signals from their surroundings and to generate intelligent reactions to them.
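
As an illustration of the neural network item in the list above, the following minimal Python sketch models a single artificial neuron as a weighted sum of its inputs followed by a threshold. The function name, the weights and the bias are assumptions chosen by hand for illustration; they do not come from the cited literature. Real networks combine many such neurons and learn their parameters from training data.

```python
from typing import Sequence


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Return 1 (fire) when the weighted sum of the inputs plus the bias is positive."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0


# Hypothetical hand-picked parameters: the neuron fires only when both inputs are on,
# so it "recognizes" the pattern (1, 1).
AND_WEIGHTS = (1.0, 1.0)
AND_BIAS = -1.5

if __name__ == "__main__":
    for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pattern, neuron(pattern, AND_WEIGHTS, AND_BIAS))
```

With these assumed parameters the neuron responds only to the input pattern (1, 1), which is a very small instance of the recognition task mentioned in the list.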
                                                                                     
DOI: 10.18421/TEM52-18
https://dx.doi.org/10.18421/TEM52-18

Corresponding author: Mirjana Kocaleva, E-learning Center, University “Goce Delcev”, Krste Misirkov bb, Shtip, R. Macedonia
Email: mirjana.kocaleva@ugd.edu.mk

© 2016 Mirjana Kocaleva et al, published by UIKTEN. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. The article is published with Open Access at www.temjournal.com

TEM Journal – Volume 5 / Number 2 / 2016.

Machine learning is a branch of AI that studies systems which are able to learn from training data, rather than following a strict order of execution of already programmed instructions. Machine learning aims to construct self-adjusting artificial systems, which can develop on their own and change over time by gathering experience from new training data sets [9]. Like data mining, machine learning also processes raw data in order to find patterns, but instead of extracting data which will undergo human analysis, it uses data to improve the system's own
capabilities. Detected patterns are used for the system's self-adjustment [4], [5], [6].

There are two different kinds of learning: learning without supervision and learning with supervision. Learning without supervision requires an ability to identify patterns in streams of inputs, whereas learning with supervision involves classification and numerical regression. Classification determines the category an object belongs to, while regression starts from a set of already recorded input/output samples and discovers a function that generates suitable outputs from the respective inputs.

Computational learning theory is a branch of modern computer science that deals with the mathematical analysis of the performance of machine learning algorithms. Essentially, machine learning can be defined as a self-learning, data-driven method for improving a computer's actions or behaviour. The training data set strictly depends on the domain of the problem under consideration.

Figure 1. Retrieved from http://whiteswami.wordpress.com/machine-learning/

A robot can learn to walk based on readings from force sensors, correcting its output for a specific input. Pattern recognition, or the process of teaching a program or system to recognize patterns [5], [6], is another way of thinking about machine learning.

2.  Pattern recognition

Patterns are a form of language. Pattern recognition is studied in many fields, including psychology, psychiatry, ethnology, cognitive sciences, computer science and traffic flow. Pattern recognition is a field in machine learning, but the term may also refer to pattern recognition in psychology: the identification of faces, objects, words, melodies, etc. [8], [10]. Since the scopes of machine learning, knowledge discovery, pattern recognition and data mining highly overlap, they are hard to separate. Most often, machine learning refers to methods based on supervised learning, while unsupervised learning is primarily explored by data mining and KDD (knowledge discovery). Unlike machine learning, which is focused on maximizing the rate of recognition [15], [17], [18], [19], pattern recognition models patterns and regularities found in data.

For our research, pattern recognition is important as a field in machine learning. Supervised learning employs a training data set, which is used to identify patterns that match or resemble already annotated regularities. Unlike supervised learning, unsupervised learning does not rely on labelled training data and it can be applied to detect unfamiliar regularities in data.

By analyzing training samples, supervised learning methods produce an inferred function. Applying this function, an output can be predicted for any valid input object. There are two kinds of inferred functions: if the output is discrete the function is called a classifier, and if the output is continuous the function is called a regression function. Classification methods (the k-nearest neighbours algorithm, support vector machines, the naive Bayes classifier) and artificial neural networks (ANNs) are common approaches to supervised learning.

According to [2], unsupervised learning does not rely on previously labelled data and it attempts to detect built-in patterns in data. It can then be used to compute an output for a new data instance. Data clustering (k-means clustering and hierarchical clustering), hidden Markov models and blind signal separation using feature extraction techniques for dimensionality reduction are common methods in unsupervised learning.

In pattern recognition, there are problems where distinct representations can be obtained for the same pattern, and depending on the type of classifier (statistical or structural), one type of representation is preferred over the others. By taking into consideration the statistical variance of all inputs, algorithms for pattern recognition aim to perform "most probable" matching.

3.  Natural language processing

Natural language processing (NLP), or computational linguistics, is a major field of computer-related research. By discovering language patterns, children learn how to distinguish between singular and plural, how to match templates in nouns, verbs and adjectives, and how to form different types of sentences, such as declarative or imperative sentences. If we can define, and moreover describe, patterns from natural language, then we can teach the computer about the way we speak and understand sentences from the spoken language. According to [9], [13], [14], much of the research in this field relies on methods from cognitive science and
linguistics. Natural language processing techniques train computers to understand what a human speaks.

Natural language processing gives machines the ability to read and understand the languages that humans speak. NLP research aims to answer the question of how people are able to comprehend the meaning of a spoken/written sentence and how people understand what happened, when and where it happened, or what is an assumption, belief or fact.

Figure 2. Retrieved from http://tex.stackexchange.com/questions/184099/logo-for-natural-language-processing

The common elements of any standard architecture of an NLP system are:

   •   Speech recognition: Turning spoken words into an array of words. Spoken words are composed of a series of parameters related to the sense of hearing.
   •   Language understanding: The goal of this element is to generate a meaning for the spoken words; that meaning will be used by the next element (dialogue management).
   •   Dialogue management: The main task of this element is to coordinate and hold together all parts of the system and the users, and to connect with other systems.
   •   Communication with external systems: such as an expert system, a database system, or another computer application.
   •   Response generation: Setting out the message that the system should deliver.
   •   Speech output: Use of different techniques to produce the message from the system [11].

Figure 3. Standard architecture of NLP system

Six levels of understanding are distinguished within the standard architecture of an NLP system, with the aim of finding the meaning of natural language. Not every level is used by every NLP system. Those levels are [1], [3]:

Phonetic level

The way in which words are produced, transmitted and understood in a language is known as phonetics [1]. This level is of great importance for understanding spoken language, but it is not important for written text [3].

Morphological level

A morpheme is the smallest unit of a word that carries meaning [3]. According to Jurafsky [13], there are two types of morphemes: stems and affixes. The stem is the primary part of a word and gives it its meaning (for example, happy), whereas the part of the word that adds further meaning is known as an affix. Affixes are usually suffixes (for example, in happ-ily) and prefixes (for example, in un-happy).

Syntactic level

The study of sentence structure is known as syntax. Syntax analysis divides a sentence into the components of which it is composed. The main interest is in the rules and conditions that are defined by grammar and designed to hold a sentence together. From the position of a word in a sentence, it can be established whether that word is an object or a subject in the sentence. NLP systems store a representation of every sentence, and they store the fact that a word is a verb and what kind of verb it is [1], [3], [11].

Semantic level

Semantics is concerned with the manner in which syntactic structures are constructed. The syntactic and semantic levels are inseparable and complement each other. Syntax analysis deals with the structure of sentences, while semantic analysis searches for the meaning in those sentences [1], [3]. According to semantics, most words have multiple meanings. Hence, we can identify the appropriate meaning of a word by looking at the rest of the sentence, that is, at the context [3].

Discourse level

The discourse level examines the meaning of a sentence in relation to the other sentences in the text or paragraph of the same document. The
structure of this level is predictable and, for that reason, it is used by NLP to understand the role of each piece of information in a document [1], [3].

Pragmatic level

This level deals with the analysis of sentences and how they are used in different situations, as well as how their meaning changes depending on the situation [3].

All the levels described here are inseparable and complement each other. The aim of NLP systems is to inject these definitions into a computer, and then use them to create a structured, unambiguous sentence with a well-defined meaning [3], [7], [12], [16].

3.1. Learning about Natural Language Processing

As a discipline of informatics, NLP deals with the relationship between computers and natural languages. Understanding natural languages is an AI problem because it requires knowledge from disciplines such as computer science, statistics, linguistics and many others. Today's modern algorithms for NLP are based mainly on statistical machine learning. Machine learning aims to have the computer learn automatically from a large number of different corpora, which differs from other methods of processing natural language. A corpus is a set of documents that are annotated by hand with the correct values, and it serves for learning purposes.

3.2. Statistical NLP

Statistical processing of natural language uses many different methods to solve problems. Some of those methods use probability, some use statistics, and others use other branches of mathematics. Problems may arise with words that have multiple meanings and with sentences that are too long. Usually, long sentences can be interpreted in several different ways. Methods for disambiguating sentences usually use corpora and Markov models. Statistical NLP is composed of many techniques (such as modelling based on probability, algebra, and information theory) for automated processing of natural or spoken language [7].

3.3. Multi-word expressions (MWE)

Multi-word expressions (MWEs) have received attention from the NLP community. They are lexical elements and they:

   a)  Can be decomposed into several lexemes, and
   b)  Can show lexical, syntactic, semantic, pragmatic and/or statistical idiomaticity [8].

When one or more parts of an MWE are not part of the conventional lexicon, we say that lexical idiomaticity occurs. For example, in the word Wi-Fi, the components (Wi and Fi) do not have a meaning when they are separated.

Figure 4. A classification for MWEs [8]

Syntactic idiomaticity happens when the syntax of the MWE cannot be derived from that of its building blocks [8], [16]. For example, consider the syntactically idiomatic "on the whole". This expression is adverbial but consists of a preposition (on) and an adjective (whole).

When the meaning of an MWE cannot be derived from its building blocks, we are talking about semantic idiomaticity [8]. Semantic idiomaticity also covers figurative language [8].

When an MWE is associated with a fixed location, position or situation, we talk about pragmatic idiomaticity [8]. Examples are good night and good luck. The former is a greeting associated specifically with nights and the latter expresses good wishes for people who work in mines.

Statistical idiomaticity occurs when a combination of words is repeated in texts markedly more often than would be expected from the words that make it up. An example is impeccable credentials, which occurs much more frequently than spotless credentials. Binomials are also part of statistical idiomaticity, for example true and false, where the reverse ordering of the adjectives does not keep the sense of the expression [8].

4.  Conclusion

The study of natural language processing is progressively shifting from semantics to narrative comprehension. In addition, the common opinion is that the processing of natural language represents a major AI difficulty. This is without any doubt a top challenge that is yet to be solved: to make computers as smart and intelligent as people are. The future of NLP is