Learning-based Identification of Coding Best Practices from Software Documentation

Neela Sawant, AWS AI, Amazon, Bangalore, India (nsawant@amazon.com)
Srinivasan H Sengamedu, AWS AI, Amazon, Seattle, USA (sengamed@amazon.com)

Abstract—Automatic identification of coding best practices can scale the development of code and application analyzers. We present Doc2BP, a deep learning tool to identify coding best practices in software documentation. Natural language descriptions are mapped to an informative embedding space, optimized under the dual objectives of binary and few shot classification. The binary objective powers general classification into known best practice categories using a deep learning classifier. The few shot objective facilitates example-based classification into novel categories by matching embeddings with user-provided examples at run-time, without having to retrain the underlying model. We analyze the effects of manually and synthetically labeled examples, context, and cross-domain information.

We have applied Doc2BP to Java, Python, AWS Java SDK, and AWS CloudFormation documentations. With respect to prior works that primarily leverage keyword heuristics and our own parts of speech pattern baselines, we obtain 3-5% F1 score improvement for Java and Python, and 15-20% for AWS Java SDK and AWS CloudFormation. Experiments with four few shot use-cases show promising results (5-shot accuracy of 99%+ for Java NullPointerException and AWS Java metrics, 65% for AWS CloudFormation numerics, and 35% for Python best practices). Doc2BP has contributed new rules and improved specifications in Amazon's code and application analyzers: (a) 500+ new checks in cfn-lint, an open-source AWS CloudFormation linter, (b) over 97% automated coverage of metrics APIs and related practices in Amazon DevOps Guru, (c) support for nullable AWS APIs in Amazon CodeGuru's Java NullPointerException (NPE) detector, (d) 200+ new best practices for Java, Python, and respective AWS SDKs in Amazon CodeGuru, and (e) 2% reduction in false positives in Amazon CodeGuru's Java resource leak detector.

Index Terms—natural language understanding, information extraction, embeddings, deep learning, few shot learning

I. INTRODUCTION

Creating quality software requires an in-depth knowledge of coding best practices on various aspects such as data structures, error handling, resource management, multiprocessing, and security. Coding best practices need to be identified before they can be incorporated in developer code or implemented as static analyzer checks. However, identification is non-trivial since best practice descriptions can be fragmented in documentation and hard to find due to significant differences in keywords, form, and semantics [1]-[3]. For example,

• "Document.getText Method Now Allows for Partial Returns. For more efficient use, callers should invoke segment.setPartialReturn(true) and be prepared to receive a portion at a time" (Java 11 Swing API reference)
• "It is good programming practice to not use mutable objects as default values. Instead, use None as the default value and inside the function, check if the parameter is None and create a new list/ dictionary/ whatever if it is" (Python 3.7 tutorial)
• "When using the DynamoDBMapper to add or edit signed (or encrypted and signed) items, configure it to use a save behavior, such as PUT, that includes all attributes. Otherwise, you might not be able to decrypt your data" (AWS Java SDK guide)
• "The minimum maintenance window is 60 minutes" (AWS CloudFormation user guide).

Our goal is to automate best practice identification from the documentation on various languages, frameworks, and applications to help scale the development of related code and application analyzers. Identified best practices can be implemented as new static analysis rules or used to enhance existing rules by covering more APIs and properties. Our primary use-case is Amazon CodeGuru (https://aws.amazon.com/codeguru/) [4], a developer tool that provides intelligent recommendations to improve code quality and identify an application's most expensive lines of code. The first three coding best practices described above were implemented as new rules in CodeGuru's Java, Python, and AWS Java SDK code analyzers, respectively. The fourth practice was used to update an existing rule in cfn-lint, a linter for AWS CloudFormation [5].

Prior works in automated best practice identification primarily rely on keyword heuristics curated on a case-by-case basis, for example, extracting warnings and recommendations by matching keywords such as 'must', 'should', 'require', 'encourage', and 'recommend' [2], [6], [7]. However, such heuristics fail to generalize for various reasons.

• Keyword mismatch - Keywords may differ across use-cases, for example in describing nullable APIs (null in Java and None in Python) or resource leaks (terminate or kill in Python and tear down in AWS CloudFormation instead of shutdown and close in Java and AWS Java).
• Context sensitivity - Best practices may be contextual. For example, AWS SDK for Java [8] describes over 2000 metrics APIs to monitor the health and behavior of AWS services. The text is not consistently structured, requiring the context of each metric to be inferred before extracting related best practice descriptions. For example, for the Amazon Lex^1 RuntimeSystemErrors metric, relevant practices include "The response code range for a system error is 500 to 599", "Valid dimension for the PostContent operation with the Text or Speech InputMode: BotName, BotAlias, Operation, InputMode", and "Unit: Count". For the AWS Lambda^2 Errors metric, relevant practices include "Sum statistic", "To calculate the error rate, divide the value of Errors by the value of Invocations", etc.
• Non-keyword patterns - Best practice descriptions may not be keyword based. For example, AWS CloudFormation [9] describes value constraints on resources and properties of AWS cloud services such as (a) "TargetValue range is 8.515920e-109 to 1.174271e+108 (Base 10) or 2e-360 to 2e360 (Base 2)", (b) "The minimum window is a 60 minute", (c) "Up to five VPC security group IDs, of the form sg-xxxxxxxx", (d) "The total number of allowed resources is 250". These practices are numeric patterns.

Our main contribution is Doc2BP, a deep learning tool to identify best practice descriptions from software documentation. The tool is aimed at reducing the overhead in maintaining multiple heuristics and simplifying new rule creation for different programming languages and frameworks. The tool supports two modes, general classification and example-based classification, powered by a common embedding space for natural language descriptions and jointly optimized under the dual objectives of binary and few shot classification respectively. The binary classification objective ensures coverage of known categories in available training data via a deep learning classifier, whereas the few shot objective allows classification into previously unseen categories based on the embedding similarity with a few user-labeled examples at inference time, without retraining the underlying deep learning model.

We have extensively applied Doc2BP on Java, Python, AWS Java SDK, and AWS CloudFormation documentations. The choice of documentations reflects the domains supported by Amazon CodeGuru at the time of writing this paper, and offers a good mix of general-purpose and specialized domains to study. Section III motivates the learning based approach using a case study on AWS CloudFormation. Sections IV and V present the representation learning formulation and overall Doc2BP system. Section VI covers extensive experiments with manually and synthetically labeled examples, context, and cross-domain information. With respect to prior keyword heuristics and our own parts of speech (POS) pattern baselines, we obtain 3-5% F1 score improvement in best practice detection for Java and Python and 15-20% for AWS Java SDK and AWS CloudFormation. We experiment with four use-cases in few shot setting with promising results (5-shot accuracy of 99%+ for Java NullPointerException and AWS Java metrics, 65% for AWS CloudFormation numerics, and 35% for Python best practices). These results indicate that Doc2BP is an effective solution for both general-purpose and specialized requirements. Section VII details the real-world impact of Doc2BP on multiple code and application analyzers such as cfn-lint - an AWS CloudFormation linter [5], Amazon DevOps Guru - a cloud operations service to improve application availability [10], and Amazon CodeGuru - an automated code review tool for multiple programming languages and frameworks including Java, Python, and respective AWS SDKs [4].

^1 https://docs.aws.amazon.com/lex/latest/dg/monitoring-aws-lex-cloudwatch.html
^2 https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html

II. RELATED WORK

We now present prior work in extracting information from software documentation as well as related work in deep learning, natural language understanding, and few shot learning.

A. Information Extraction from Software Documentation

Monperrus et al. conducted a formal study of the types of knowledge in software documentation [6]. They proposed a list of keywords based on a manual review of Java documentation, RFC 2119 - "Key words for use in RFCs to Indicate Requirement Levels" [11], Oracle technical reports [12], and research papers [13]. Use-cases include extraction of method call practices ("Subclasses should not call this internal method"), subclassing practices ("Subclasses may override any of the following methods: isLabelProperty, getImage, getText, dispose"), or synchronization practices ("If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally"). This approach has been reused in other general-purpose studies [2], [3], [14]-[16] and extended for specialized requirements such as interrupt conditions [17] and performance concerns [7], [18]. For performance concerns, keywords can be fast, slow, expensive, cheap, performance, speedup, efficient, etc. and their inflections (e.g., efficiency, efficiently) [7], resulting in findings such as "Raising this value decreases the number of seeds found, which makes mean shift computationally cheaper". Table I lists popular prior work.

Few studies have used specialized natural language processing for healthcare [19], resource and method handling [20], [21], bug report analysis [22], [23], and software categorization [24]. A recent survey [25] indicates that less than 5% of research in security patterns uses natural language processing, for example, to extract access control requirements [26], [27], privacy policy visualization and summarization [28], inconsistent security requirements detection [29], and mining cyber threats from online documents [30], [31], and logs [32].

TABLE I
KEYWORD HEURISTICS IN POPULAR SOFTWARE DOCUMENTATION MINING LITERATURE

Monperrus et al. [6]
  ControlFlow:Conditional   "(assum|only|debug|restrict|never|condition|strict|necessar|portab|strong)"
  ControlFlow:Temporal      "(call|invo|before|after|between|once|prior)"
  Recommend:Warning         "(warn|aware|error|note)"
  Recommend:Affirmative     "(must|mandat|require|shall|should|encourage|recommend|may)"
  Recommend:Alternative     "(desir|alternativ|addition)"
  Performance:Performance   "(performan|efficien|fast|quick|better|best)"
  Concurrency:Concurrency   "(concurren|synchron|lock|thread|simultaneous)"
  Subclassing:Subclassing   "(extend|overrid|overload|overwrit|re.?implement|sub.?class|super|inherit)"

Li et al. [2]
  ControlFlow:Conditional   "(under the condition|whether|if|when|assume that)"
  ControlFlow:Temporal      "(before|after)"
  Recommend:Warning         "(insecure|susceptible|error|null|exception|unavailable|not thread safe|illegal|inappropriate)"
  Recommend:Affirmative     "(must|should|have to|need to)"
  Recommend:Alternative     "(instead of|rather than|otherwise)"
  Recommend:Recommendation  "(deprecate|better to|best to|recommended|less desirable|discourage)"
  Recommend:Negative        "(do not|be not|never)"
  Recommend:Emphasis        "(none|only|always)"
  Recommend:Note            "(note that|notably|caution)"

Tao 2020 [7]
  Performance:Performance   "(fast|slow|expensive|cheap|performan|speedup|computation|accelerat|intensi|scalable|efficien)"

B. Related Work in Machine Learning

We now discuss concepts related to the Doc2BP formulation.

1) Deep Learning and Natural Language Understanding: Deep learning has achieved a major breakthrough in many fields [33]-[36]. The seminal survey by Allamanis et al. [37] covers many applications such as code search, code completion, code generation, and documentation improvement. Subsequently many powerful neural models such as CodeBERT [38], PLBART [39], and CodeT5 [40] have been applied to problems of bug detection [41], code review generation [42], and code and documentation synthesis [43], [44].
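The keyword heuristics of Table I amount to case-insensitive regular-expression alternations over word stems. A minimal sketch of how such a heuristic matcher could look (only a small illustrative subset of the Table I stems is included; category names follow the table, test sentences are our own):

```python
import re

# Keyword stems adapted from Table I (Monperrus et al. [6]); a partial,
# illustrative subset, not the full published heuristic set.
HEURISTICS = {
    "Recommend:Affirmative": r"\b(must|mandat|require|shall|should|encourage|recommend)\w*",
    "Recommend:Warning": r"\b(warn|aware|error|note)\w*",
    "Performance:Performance": r"\b(performan|efficien|fast|quick|better|best)\w*",
}

def match_categories(sentence):
    """Return the heuristic categories whose keyword pattern fires on the sentence."""
    return [cat for cat, pat in HEURISTICS.items()
            if re.search(pat, sentence, flags=re.IGNORECASE)]

print(match_categories("Callers should invoke setPartialReturn(true)."))
```

Note how a purely numeric constraint such as "The minimum maintenance window is 60 minutes" fires no category at all, which is exactly the generalization gap the paper targets.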
Detecting best practices, recommendations, and warnings is related to traditional natural language understanding tasks such as sentiment analysis [45]-[48] and suggestion mining [49]-[51]. In the general literature, these tasks have been modeled using classical approaches such as parts of speech [52]-[54] and deep learning [35], [36], [55]. We have not seen any prior work generally applying deep learning or advanced natural language understanding for best practice identification from software documentation. Section 5.6 of the Allamanis survey [37] states 'Also out-of-scope is work that combines natural language information with APIs' and refers readers to investigate work already discussed above.

2) Few Shot Learning: Few-shot learning classifies new data having seen only a few training examples [56]. Few shot learning can be made tractable by incorporating in pre-training [57] knowledge from similar tasks, useful parameters, or data [58]-[60]. Similarity based algorithms such as matching networks [61] or prototypical networks [62] learn embeddings from training tasks that allow classification of unseen classes with few examples. Our approach is inspired by matching networks [61] and weakly supervised training [58].

III. FROM KEYWORDS TO LEARNING SYNTAX PATTERNS

We conducted a case study with AWS CloudFormation (a specialized framework for 200+ AWS services) to motivate learning based approaches for detecting non-keyword patterns given a few examples. Based on a manual documentation review of two AWS services - CloudTrail^3 and CodeCommit^4 - we extracted about 50 best practice examples ranging from general recommendations (e.g., "You can only use this property to add code when creating a repository with an AWS CloudFormation template at creation time"; "This property cannot be used for updating code to an existing repository"), to alpha-numeric value constraints (e.g., "Be between 3 and 128 characters"; "Start with a letter or number, and end with a letter or number"). We chose parts of speech (POS) representations, mapping each word to its POS tag according to its syntactic role in the sentence (noun, pronoun, adjective, determiner, verb) [63]. We then applied PrefixSpan [64], a rule induction algorithm, to infer frequent POS subsequence patterns. Given two sequences x = (x_1, x_2, ..., x_m) and y = (y_1, y_2, ..., y_n), x is called a subsequence of y, denoted as x ⊆ y, if there exist integers 1 ≤ a_1 < a_2 < ... < a_m ≤ n such that x_1 = y_{a_1}, x_2 = y_{a_2}, ..., x_m = y_{a_m}. Figure 1 shows the frequent POS subsequence patterns learned from the selected examples.

Fig. 1. POS patterns learned from documentation of two AWS services.

We observed the following:

1) Ability to Replace Keyword-based Solutions: POS subsequences ['MD', 'VB'] and ['MD', 'RB'] occur in sentences whose POS sequence matches the regular expression {<MD> <.*>* <VB|RB>}. MD, VB, and RB are POS tags representing modal structure, base verbs, and adverbs respectively. The pattern triggers on all imperative sentences, for example, "The value must be no more than 255 characters". Comparing the detections from the imperative POS pattern with the keyword heuristics in Table I, we find a significant overlap, as seen in Figure 2. We find that the imperative pattern detects about 50% of all detections in the affirmative category and over 40% of detections by the conditional category.

^3 https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cloudtrail-trail.html
^4 https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-codecommit-repository.html
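The subsequence relation x ⊆ y defined above is straightforward to operationalize. A minimal sketch in Python, with the POS tags hand-assigned here for illustration (a real pipeline would obtain them from a POS tagger):

```python
def is_subsequence(x, y):
    """Check x ⊆ y: all items of x appear in y in order, gaps allowed."""
    it = iter(y)
    return all(item in it for item in x)  # `in` advances the iterator, preserving order

# POS tags for "The value must be no more than 255 characters",
# hand-assigned for illustration (a real pipeline would run a POS tagger).
tags = ["DT", "NN", "MD", "VB", "DT", "JJR", "IN", "CD", "NNS"]

print(is_subsequence(["MD", "VB"], tags))        # imperative pattern
print(is_subsequence(["DT", "NN", "CD"], tags))  # numeric-constraint pattern
```

Both patterns fire on this sentence, matching the paper's observation that a single sentence can trigger the imperative and the numeric subsequences at once.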
2) Ability to Capture Non-Keyword Information: The subsequence ['DT', 'NN', 'CD'] (in regular expression format, {<DT> <.*>* <NN> <.*>* <CD>}) is a pattern containing CD, i.e., a cardinal digit. It matches a wide variety of numeric value constraints, for example, (a) "The maximum length is 200 characters", (b) "The number of resources cannot exceed 250 across events", (c) "The count of allowed data resources is 250", and (d) "This can be a number from 1 - 1024". The ability to detect non-keyword patterns is an additional benefit of the learning based approach.

Fig. 2. Cooccurrence analysis between the imperative POS pattern and multiple keyword heuristics shows that a single learned rule can significantly replace detections from multiple heuristics. The imperative pattern detects about 50% of all detections in the affirmative category and over 40% of detections in the conditional category. The diagonal is suppressed to improve visual contrast.

To summarize, learning based algorithms can infer useful patterns from few examples, replace or augment keyword heuristics, and capture non-keyword requirements. This is possible because natural language constructs as well as software documentation exhibit reasonably consistent structures. This insight has led to our detailed deep learning formulation, described below.

IV. REPRESENTATION LEARNING FRAMEWORK

We are given a training dataset S containing M labeled examples, S = {(x_1, y_1), ..., (x_M, y_M)}, where x_i ∈ R^D is a D-dimensional feature vector and y_i ∈ {0, 1} is a best practice label. For a subset S_c containing M_c known best practices, S_c ⊂ S = {(x_j, y_j) | y_j = 1}_{j=1}^{M_c}, we are also given an additional label z_j ∈ {1, ..., N} to denote the category of best practice from N categories known at training time, for example, related to performance, security, subclassing, etc. For simplicity, we denote it as S_c = {(x_j, z_j)}_{j=1}^{M_c}.

The core idea is to learn a metric space where each example can be encoded into a smaller L-dimensional dense representation (i.e., an embedding) with function f_φ : R^D → R^L, L ≤ D, with φ representing the learnable embedding parameters. We optimize the embedding space under the dual objectives of binary and few shot classification. The binary classification objective ensures coverage of known categories in the available training data, by training a deep learning classifier for general classification. By default, such a classifier cannot adapt to categories not included in the training set. To avoid model retraining for emerging requirements, we introduce a few shot learning capability that performs example-based classification, i.e., predicting new classes based on embedding similarity with a few user-labeled examples at run time, without modifying model parameters. This objective encourages examples belonging to the same category to be co-located in the embedding space, thereby facilitating similarity based, non-parametric classification.

A. Binary Classification

Let g be any binary classifier parameterized via θ. If g is a logistic regression parameterized by θ = {w, b}, label y can be modeled as a function of the input embedding f_φ(x) as follows:

    ŷ = P(y = 1 | x; φ, θ) = σ(w^T f_φ(x) + b)    (1)

where σ(a) = 1/(1 + e^{-a}) is the logistic sigmoid function. Loss between the predicted and actual probability distributions (ŷ and y) is quantified using binary cross entropy (BCE):

    L_BCE = − Σ_{i=1}^{M} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]    (2)

B. Few Shot Classification

Our formulation of example-based classification is founded on two ideas. First, for the model to generalize to the test environment given a small number of new labeled examples (few shot), it should be trained under a similar setting. Secondly, the model should classify new test examples without any changes to the model parameters. For these purposes, we adopt the following episodic training strategy.

We create an n-way-k-shot episodic training, where the labeled dataset S_c = {(x_j, z_j)}_{j=1}^{M_c} is converted into several training episodes (e.g., mini-batches) by subsampling n training classes as well as k examples within each class. Each episode consists of n × k labeled examples (support set B) and an additional t examples (test set), also sampled from the same n classes. The test label z is modeled based on the embedding similarity of the test and the support examples. The similarity function between two embeddings, say a(., .), can be any attention kernel, such as a kernel density estimator or a k-nearest neighbor, that produces a similarity score. Similar to matching networks [61], we model a as a soft-max over the cosine similarity c(., .) of embeddings, i.e.,

    a(x, x'; φ) = e^{c(f_φ(x), f_φ(x'))} / Σ_{x''} e^{c(f_φ(x), f_φ(x''))}    (3)

The label distribution ẑ is a function of class similarities:

    ẑ = P(z | x; φ) = Σ_{(x', z') ∈ B} a(x, x'; φ) z'    (4)

Loss between the predicted and actual probability distributions (ẑ and z) is quantified using general cross entropy (CE):

    L_CE = − Σ_{i=1}^{M_c} z_i log(ẑ_i)    (5)
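The two objectives above can be sketched in a few lines of numpy. This is an illustrative toy (random data, a linear map standing in for the learned embedding f_φ, one-hot support labels), not the Doc2BP implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, n, k = 16, 4, 3, 5           # input dim, embedding dim, n-way, k-shot

W_embed = rng.normal(size=(D, L))  # toy stand-in for the learned f_phi
w, b = rng.normal(size=L), 0.0     # logistic-regression head, theta = {w, b}

def embed(x):
    return x @ W_embed             # f_phi(x): R^D -> R^L

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# --- Binary objective (Eqs. 1-2) ---
def bce_loss(X, y):
    y_hat = sigmoid(embed(X) @ w + b)                                # Eq. 1
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # Eq. 2

# --- Few shot objective (Eqs. 3-4), matching-network style ---
def cosine(u, V):
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-9)

def few_shot_predict(x, support_X, support_z):
    """Label distribution for x given an n*k support set B (Eqs. 3-4)."""
    sims = cosine(embed(x), embed(support_X))
    a = np.exp(sims) / np.exp(sims).sum()  # Eq. 3: softmax over cosine similarities
    onehot = np.eye(n)[support_z]          # support labels z' as one-hot rows
    return a @ onehot                      # Eq. 4: similarity-weighted label mix

support_X = rng.normal(size=(n * k, D))
support_z = np.repeat(np.arange(n), k)     # n-way-k-shot support labels
z_hat = few_shot_predict(rng.normal(size=D), support_X, support_z)
print(np.round(z_hat, 3))                  # a proper distribution over the n classes
```

Note that `few_shot_predict` touches no trainable state beyond the frozen embedding, which is the point of Eq. (4): new categories are handled purely by similarity to the user-provided support set.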