PyKokkos: Performance Portable Kernels in Python

Nader Al Awar (nader.alawar@utexas.edu), The University of Texas at Austin, Austin, Texas, USA
Neil Mehta (neilmehta@lbl.gov), NERSC, Berkeley, California, USA
Steven Zhu (stevenzhu@utexas.edu), The University of Texas at Austin, Austin, Texas, USA
George Biros (gbiros@acm.org), The University of Texas at Austin, Austin, Texas, USA
Milos Gligoric (gligoric@utexas.edu), The University of Texas at Austin, Austin, Texas, USA
ABSTRACT
As modern supercomputers have increasingly heterogeneous hardware, the need for writing parallel code that is both portable and performant across different hardware architectures grows. Kokkos is a C++ library that provides abstractions for writing performance portable code. Using Kokkos, programmers can write their code once and run it efficiently on a variety of architectures. However, the target audience of Kokkos, typically scientists, prefers dynamically typed languages such as Python over C++. We demonstrate a framework, dubbed PyKokkos, that enables performance portable code through Python. PyKokkos transparently translates code written in a subset of Python to C++ and Kokkos, and then connects the generated code to Python by automatically generating language bindings. PyKokkos achieves performance comparable to Kokkos in ExaMiniMD, a molecular dynamics mini-application of ∼3k lines of code. The demo video for PyKokkos can be found at https://youtu.be/1oFvhlhoDaY.

KEYWORDS
PyKokkos, Python, high performance computing, Kokkos

ACM Reference Format:
Nader Al Awar, Neil Mehta, Steven Zhu, George Biros, and Milos Gligoric. 2022. PyKokkos: Performance Portable Kernels in Python. In 44th International Conference on Software Engineering Companion (ICSE '22 Companion), May 21–29, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3510454.3516827

1 INTRODUCTION
Modern high-performance computing (HPC) systems are adopting increasingly heterogeneous hardware: the current TOP500 list [3], which ranks supercomputers based on a standard benchmark, shows that seven of the top ten include more than one kind of processor, typically a CPU and a GPU. This hardware is provided by various semiconductor chip vendors, including Intel, Nvidia, and AMD. This presents a challenge to end users, as targeting each kind of hardware requires that users learn specific programming interfaces and frameworks, such as OpenMP or CUDA, and learn architecture-specific details, such as optimal memory layouts, to extract optimal performance. Consequently, users end up re-writing code to achieve the same functionality on different hardware.

It is therefore desirable to write code once and be able to run it on different hardware without losing performance. Kokkos [10] is a framework and C++ library for writing performance portable code. Using Kokkos, users can write parallel, high-performance code that runs efficiently on different hardware without needing to re-write any code. Kokkos achieves this by providing high-level abstractions that generalize over different HPC frameworks, offering a unified syntax and hiding architecture-specific details.

Python has recently seen widespread use in the machine learning and scientific computing communities [9]. As the main implementation of Python is an interpreter, its performance is an issue when compared to C++. Python users have therefore turned to libraries and packages such as NumPy [7], which provides a high-performance array type, and SciPy [11], which includes native implementations of algorithms commonly used in scientific computing. These implementations are written in C or C++ and are exposed to Python. However, scientists typically need to write their own implementations of parallel high-performance functions (also known as kernels), ideally using Python.

We present PyKokkos, a Python framework for writing performance portable kernels entirely through Python [4, 12]. PyKokkos is a Python implementation of the Kokkos framework and allows users to write high-performance kernels that run efficiently on a variety of architectures. PyKokkos provides a domain-specific language (DSL for short) embedded in Python for writing these kernels. It translates this DSL into C++ and Kokkos, and then automatically generates language bindings to access the generated kernel code from Python.

We evaluated PyKokkos by porting existing Kokkos applications and kernels to Python and PyKokkos [4], finding that PyKokkos applications can achieve performance similar to their Kokkos counterparts while being more concise (i.e., requiring fewer lines of code).
PyKokkos is open source and is publicly available on GitHub as part of the official Kokkos organization at: https://github.com/kokkos/pykokkos.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ICSE '22 Companion, May 21–29, 2022, Pittsburgh, PA, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9223-5/22/05.
https://doi.org/10.1145/3510454.3516827
 1  import pykokkos as pk
 2
 3  @pk.functor
 4  class InnerProduct:
 5      def __init__(self, N: int, M: int):
 6          self.N: int = N
 7          self.M: int = M
 8          self.y: pk.View1D[int] = pk.View([N], dtype=int)
 9          self.x: pk.View1D[int] = pk.View([M], dtype=int)
10          self.A: pk.View2D[int] = pk.View([N, M], dtype=int)
11
12      @pk.workunit
13      def yAx(self, j: int, acc: pk.Acc[int]):
14          temp2: int = 0
15          for i in range(self.M):
16              temp2 += self.A[j][i] * self.x[i]
17          acc += self.y[j] * temp2
18
19  # Assume N, M are given on the command line and parsed before use
20  if __name__ == "__main__":
21      pk.set_default_space(pk.OpenMP)
22      t = InnerProduct(N, M)
23      policy = pk.RangePolicy(pk.Default, 0, N)
24      result = pk.parallel_reduce(policy, t.yAx)

Figure 1: An example of a matrix-weighted inner product kernel from the Kokkos tutorial, written in PyKokkos.

2 EXAMPLE
In this section, we first describe the main abstractions used in Kokkos, and then show an example of a PyKokkos kernel that illustrates these abstractions in Python.

2.1 Kokkos
The main goal of Kokkos is to allow writing high-performance code that is portable across different architectures. To this end, it provides abstractions for parallel execution and for data structures. The main abstractions for parallel execution include execution spaces, which represent the processors on a particular machine, such as CPUs and GPUs; execution patterns, which represent common parallel operations, such as parallel for, parallel reduce, and parallel scan; and execution policies, which specify how a kernel will run (i.e., execution space, number of threads, etc.). The main abstractions for data structures include memory spaces, which represent the memory accessible from these processors, and memory layouts, which specify how memory buffers are arranged in memory, such as row-major or column-major.
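As a concrete illustration of memory layouts, the following minimal sketch computes where element (j, i) of an N × M array is stored in a flat buffer under row-major and column-major order (Kokkos calls these LayoutRight and LayoutLeft). The function `flat_offset` and its layout strings are hypothetical helpers for this example, not part of Kokkos or PyKokkos.

```python
def flat_offset(j: int, i: int, n_rows: int, n_cols: int, layout: str) -> int:
    """Position of element (j, i) of an n_rows x n_cols array in a flat buffer."""
    if not (0 <= j < n_rows and 0 <= i < n_cols):
        raise IndexError("index out of bounds")
    if layout == "row_major":      # last index varies fastest (C order)
        return j * n_cols + i
    if layout == "column_major":   # first index varies fastest (Fortran order)
        return j + i * n_rows
    raise ValueError(f"unknown layout: {layout!r}")


# A 2 x 3 array: the same logical element lives at different buffer positions.
N, M = 2, 3
row_major = [[flat_offset(j, i, N, M, "row_major") for i in range(M)] for j in range(N)]
col_major = [[flat_offset(j, i, N, M, "column_major") for i in range(M)] for j in range(N)]
print(row_major)  # [[0, 1, 2], [3, 4, 5]]
print(col_major)  # [[0, 2, 4], [1, 3, 5]]
```

With column-major order, elements (j, i) and (j + 1, i) are neighbors in memory, which suits GPUs where consecutive threads j read the same column index i at the same time; with row-major order, a single thread's loop over i walks contiguous memory, which suits CPU threads that each own a row. This is one reason the preferred layout depends on the execution space.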
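2.2 PyKokkos
Figure 1 shows an example of a matrix-weighted inner product kernel written in Python and PyKokkos. The example was originally written in C++ and Kokkos as exercise 03 in the official Kokkos tutorials repository [1]; we ported it to Python and PyKokkos.

To use PyKokkos from Python, the user must first import the pykokkos module (line 1). The "as pk" clause means that pk can be used as an alias for pykokkos.

PyKokkos provides three styles for writing kernels. The style shown in Figure 1 is an example of the ClassSty style. In this style, the user first defines a class with a @pk.functor decorator (line 3), referred to as a functor. The user can then write each kernel as a method in the class decorated with @pk.workunit (line 12).

Inside the class, the user defines a constructor, which is the __init__ method in Python (line 5). In the constructor, the user defines all member variables that they wish to access from the kernels. As PyKokkos will translate kernels to C++, the user must specify the types of all variables that will be used in kernel code. This is accomplished through Python's type annotations [2]. Lines 6 and 7 show an example of member variables defined as integers using Python's int type annotation. Besides integers, PyKokkos allows other Python primitive types, such as bool and float, as well as NumPy primitive types. Another important datatype used in Kokkos and PyKokkos is the View. A View is an n-dimensional array that serves as the main data structure in Kokkos. PyKokkos provides type annotations for views that include the dimensionality and the datatype (lines 8-10). The View constructor accepts as input a list of dimensions and the datatype of the elements. Crucially, the user does not need to specify the memory layout (i.e., row-major or column-major), as it will be selected by PyKokkos based on the currently enabled execution space.

With the member variables defined, the user can begin writing kernels. Recall that a kernel is defined as a method decorated with @pk.workunit, yAx in this example (line 13). The first argument of a workunit is self, which simply refers to the class instance. This argument will not be translated to C++, as it is implicit in C++; a type annotation is therefore not needed. The second argument is an integer that represents a thread ID, which will have a unique value per thread at run time. Since this kernel performs a reduction, it needs a third argument, called an accumulator, to hold the result of that reduction. In C++ and Kokkos, it would be enough to pass a variable by reference to hold the result. Python, however, does not allow passing primitive types by reference (rebinding an int parameter inside a function does not change the caller's variable). Consequently, we introduce a new type annotation, pk.Acc, parameterized on the datatype of the accumulator; for example, pk.Acc[int] is equivalent to int& in C++.

The kernel's body also contains type annotations. We first define a temporary variable (line 14), then perform a sequential reduction (lines 15-16). Finally, we update the accumulator (line 17).

The user can now call the kernel. Starting from main (line 20), the user first sets the default execution space to OpenMP (line 21). This ensures that, by default, all views will be allocated in a memory space accessible from the CPU with the appropriate memory layouts. The user then creates an object of the functor class (line 22) and a RangePolicy, specifying the execution space (pk.Default evaluates to OpenMP in this case), the starting thread ID (0), and the number of threads to launch (N) (line 23). The user can then call pk.parallel_reduce, passing in the execution policy and the kernel to be executed. When the kernel finishes execution, the result is returned (line 24).

To run this kernel with CUDA, the only change necessary is passing pk.Cuda to pk.set_default_space on line 21.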
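The reason for pk.Acc can be seen in plain Python. The sketch below is a sequential, pure-Python stand-in for the Figure 1 kernel and does not use PyKokkos (which translates the workunit to C++, where pk.Acc[int] becomes int&). The names `add_to_int`, `Accumulator`, `InnerProductRef`, and `sequential_reduce` are hypothetical, and NumPy arrays stand in for views. Rebinding an int parameter inside a function leaves the caller's value unchanged, whereas a mutable object whose `__iadd__` returns itself accumulates across calls; running the workunit once per thread ID j in 0..N-1 should therefore reproduce the matrix-weighted inner product y^T A x.

```python
import numpy as np


def add_to_int(acc: int, value: int) -> None:
    acc += value  # rebinds the local name only; the caller's int is unchanged


class Accumulator:
    """Hypothetical mutable box playing the role of pk.Acc[int]."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def __iadd__(self, other: int) -> "Accumulator":
        self.value += other  # update in place ...
        return self          # ... so `acc += v` keeps referring to this object


class InnerProductRef:
    """Plain-Python mirror of the InnerProduct functor from Figure 1."""

    def __init__(self, y: np.ndarray, A: np.ndarray, x: np.ndarray) -> None:
        self.N, self.M = A.shape
        self.y, self.A, self.x = y, A, x

    def yAx(self, j: int, acc: Accumulator) -> None:
        temp2 = 0
        for i in range(self.M):        # sequential dot product of row j with x
            temp2 += self.A[j][i] * self.x[i]
        acc += self.y[j] * temp2       # this thread's contribution


def sequential_reduce(n: int, workunit) -> int:
    """Call workunit(j, acc) for j = 0, ..., n-1 in order and return the total."""
    acc = Accumulator(0)
    for j in range(n):
        workunit(j, acc)
    return acc.value


plain = 0
add_to_int(plain, 5)                   # plain is still 0

y = np.array([1, 2])
A = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([1, 1, 1])
t = InnerProductRef(y, A, x)
result = sequential_reduce(t.N, t.yAx)
print(result, y @ A @ x)               # 36 36
```

Because integer addition is associative and commutative, a parallel implementation may combine the per-thread contributions in any order and still obtain the same total.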
3 TECHNIQUE AND IMPLEMENTATION

In this section, we describe the implementation and workflow of the PyKokkos framework [4, 12].
The workflow of PyKokkos can be divided into two phases: an ahead-of-time (AOT) phase and a run-time phase. During the AOT phase, PyKokkos translates kernel code to C++ and Kokkos, then generates language binding code to allow inter-operation between Python and the generated kernel code, and finally compiles the generated code. During the run-time phase, PyKokkos imports the compiled code from Python and calls it. Additionally, PyKokkos makes use of existing Python language bindings for C++ Kokkos views from the PyKokkos-Base repository.

3.1 AOT Phase
Figure 2 [12] shows a high-level overview of the implementation and workflow of PyKokkos. First, the user provides the Python files containing the PyKokkos kernel code to PKC (step 1 in Figure 2). PKC, short for PyKokkos compiler, is the main component of the framework: it handles translation and language binding code generation and is accessible through a command line script.

PKC parses the user-provided Python files to extract a Python abstract syntax tree (AST for short) (step 2) using the Python standard library module ast. The translator component of PKC walks through this tree and translates it to a C++ AST that contains the functor and kernel code (step 3).

Once the kernel code is generated, PKC must do additional work to make it accessible from Python. This is accomplished through language bindings, which allow for inter-operation between different languages. For PyKokkos, we are interested in calling C++ from Python, so we make use of pybind11, a library for creating Python bindings of C++ code. PKC generates a wrapper function that instantiates the functor and calls the kernel, and then generates pybind11 code to bind the wrapper function.

The output of the translator is a C++ AST that includes both the functor and the language binding code. PKC serializes the AST into a C++ source file (step 4) and compiles it into a shared object file (step 5), which it caches on the filesystem for use at run time.
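A rough sketch of the parsing step (step 2), using only the standard-library ast module: it finds classes decorated with @pk.functor, records the annotated self.<name> assignments in __init__, and lists each @pk.workunit's parameters (skipping self) with their annotations. This is not PKC's code; `summarize`, `has_pk_decorator`, and the `CPP_PARAM_TYPES` table are hypothetical, and the table only covers the two parameter types whose C++ meaning the text gives (int, and pk.Acc[int] as int&). The source string is parsed but never executed, so pykokkos need not be installed; ast.unparse requires Python 3.9 or later.

```python
import ast

# Source text in the style of Figure 1; it is parsed, never executed.
SOURCE = """
import pykokkos as pk

@pk.functor
class InnerProduct:
    def __init__(self, N: int, M: int):
        self.N: int = N
        self.M: int = M
        self.y: pk.View1D[int] = pk.View([N], dtype=int)
        self.x: pk.View1D[int] = pk.View([M], dtype=int)
        self.A: pk.View2D[int] = pk.View([N, M], dtype=int)

    @pk.workunit
    def yAx(self, j: int, acc: pk.Acc[int]):
        temp2: int = 0
        for i in range(self.M):
            temp2 += self.A[j][i] * self.x[i]
        acc += self.y[j] * temp2
"""

# Hypothetical mapping, limited to the parameter types discussed in the text.
CPP_PARAM_TYPES = {"int": "int", "pk.Acc[int]": "int&"}


def has_pk_decorator(node: ast.AST, name: str) -> bool:
    """True if a class or function node carries the decorator @pk.<name>."""
    return any(
        isinstance(d, ast.Attribute)
        and isinstance(d.value, ast.Name)
        and d.value.id == "pk"
        and d.attr == name
        for d in node.decorator_list
    )


def summarize(source: str) -> dict:
    """Map each functor class to its annotated members and workunit parameters."""
    functors = {}
    for cls in ast.walk(ast.parse(source)):
        if not (isinstance(cls, ast.ClassDef) and has_pk_decorator(cls, "functor")):
            continue
        members, workunits = {}, {}
        for fn in cls.body:
            if not isinstance(fn, ast.FunctionDef):
                continue
            if fn.name == "__init__":
                # Annotated assignments of the form `self.<name>: <type> = ...`
                for stmt in ast.walk(fn):
                    if (isinstance(stmt, ast.AnnAssign)
                            and isinstance(stmt.target, ast.Attribute)
                            and isinstance(stmt.target.value, ast.Name)
                            and stmt.target.value.id == "self"):
                        members[stmt.target.attr] = ast.unparse(stmt.annotation)
            elif has_pk_decorator(fn, "workunit"):
                # Drop `self`: it is implicit in C++ and needs no annotation.
                workunits[fn.name] = [
                    (a.arg, ast.unparse(a.annotation) if a.annotation else None)
                    for a in fn.args.args[1:]
                ]
        functors[cls.name] = {"members": members, "workunits": workunits}
    return functors


summary = summarize(SOURCE)
params = summary["InnerProduct"]["workunits"]["yAx"]
signature = ", ".join(f"{CPP_PARAM_TYPES[t]} {name}" for name, t in params)
print(summary["InnerProduct"]["members"])
print(signature)  # int j, int& acc
```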
3.2 Run-Time Phase
During the run-time phase, the user calls their kernel code as if it were normal Python (line 24 in Figure 1). At this stage, PyKokkos checks whether the kernel code has already been translated and compiled in the AOT phase by looking for the shared object file. If PyKokkos does not find it, it internally calls PKC to generate it at run time (step 6). Note that this incurs significant overhead due to calling the C++ compiler; however, once the shared object file has been generated, subsequent calls to the kernel simply re-use it instead of re-compiling, even across different runs.

PyKokkos then imports the shared object file and calls the requested kernel (step 7), returning the result if the kernel performed a parallel reduce or scan operation (step 8).
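The compile-once behavior can be pictured as a small on-disk cache. In the minimal sketch below, the cache key (a SHA-256 hash of the kernel source), the file naming scheme, and `compile_fn` (a hypothetical stand-in for invoking PKC and the C++ compiler) are all assumptions; the text only states that PyKokkos looks for the shared object file and reuses it when present. The sketch stops before importing the file.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_shared_object(
    source: str,
    cache_dir: Path,
    compile_fn: Callable[[str, Path], None],
) -> Path:
    """Return the path of the compiled kernel, compiling only on a cache miss."""
    # Cache key: a hash of the kernel source (an assumption for this sketch).
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
    so_path = cache_dir / f"kernel_{digest}.so"
    if not so_path.exists():
        # Miss: pay the expensive translate-and-compile cost once.
        cache_dir.mkdir(parents=True, exist_ok=True)
        compile_fn(source, so_path)
    # Hit (or freshly built): the file persists, so later runs reuse it too.
    return so_path


compile_calls = []


def fake_compile(source: str, out_path: Path) -> None:
    """Hypothetical stand-in for PKC plus the C++ compiler."""
    compile_calls.append(source)
    out_path.write_bytes(b"placeholder, not a real shared object")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "pk_cache"
    first = cached_shared_object("yAx kernel source", cache, fake_compile)
    second = cached_shared_object("yAx kernel source", cache, fake_compile)
    other = cached_shared_object("another kernel source", cache, fake_compile)

print(len(compile_calls))  # 2
```

Keying on the source text is a design choice of this sketch: an edited kernel maps to a new file instead of silently reusing a stale one, while an unchanged kernel is compiled only once.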
PyKokkos additionally makes use of existing Python language bindings for C++ Kokkos views. These bindings allow calling the C++ constructor of the views, which returns a View object to Python that behaves as a regular NumPy array. As in Kokkos, PyKokkos automatically selects the memory space and layout according to the default execution space, although the user is allowed to override these manually. If the selected memory space is not accessible from Python (e.g., GPU memory), PyKokkos instead allocates the View in main memory and automatically copies data to the necessary memory space prior to kernel execution. This saves the user from reasoning about data copying and synchronization, and also allows PyKokkos to support any architecture as long as it supports copying data to and from main memory.
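The fallback can be modeled with ordinary NumPy arrays. In this hypothetical sketch, one array is the main-memory allocation that Python reads and writes, and a temporary copy stands in for memory Python cannot access, such as GPU memory. `MirroredView`, `launch`, and the `HOST_ACCESSIBLE` table are illustrative names, not PyKokkos or Kokkos APIs (HostSpace and CudaSpace are borrowed from Kokkos only as labels). What the sketch shows is the control flow: copy to the kernel's memory space before launch only when needed, run the kernel, then copy results back to main memory.

```python
import numpy as np

# Which (hypothetical) memory spaces Python can read and write directly.
HOST_ACCESSIBLE = {"HostSpace": True, "CudaSpace": False}


class MirroredView:
    """Hypothetical view that always keeps its data in main memory."""

    def __init__(self, shape, dtype, memory_space: str) -> None:
        self.memory_space = memory_space
        self.host = np.zeros(shape, dtype=dtype)  # visible to Python like an array
        self.copies = 0                           # number of simulated transfers

    def to_kernel_space(self) -> np.ndarray:
        if HOST_ACCESSIBLE[self.memory_space]:
            return self.host                      # kernel can use host memory as-is
        self.copies += 1
        return self.host.copy()                   # stand-in for a host-to-device copy

    def from_kernel_space(self, buf: np.ndarray) -> None:
        if buf is not self.host:
            self.copies += 1
            self.host[...] = buf                  # stand-in for a device-to-host copy


def launch(view: MirroredView, kernel) -> None:
    """Copy in if needed, run the kernel on the chosen buffer, copy out."""
    buf = view.to_kernel_space()
    kernel(buf)
    view.from_kernel_space(buf)


def double_in_place(buf: np.ndarray) -> None:
    buf *= 2


gpu_view = MirroredView((3,), np.int64, "CudaSpace")
gpu_view.host[:] = [1, 2, 3]           # the user fills the view from Python
launch(gpu_view, double_in_place)
print(gpu_view.host, gpu_view.copies)  # [2 4 6] 2

cpu_view = MirroredView((3,), np.int64, "HostSpace")
cpu_view.host[:] = [1, 2, 3]
launch(cpu_view, double_in_place)
print(cpu_view.host, cpu_view.copies)  # [2 4 6] 0
```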
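4 INSTALLATION
In this section, we describe the steps needed to install PyKokkos.
Required software and libraries. PyKokkos requires the Conda [5] package manager and compilers supported by Kokkos (e.g., NVCC for CUDA). Each Kokkos execution space additionally requires the corresponding framework's software (e.g., a CUDA installation).

The first step is to clone the PyKokkos-Base repository and install the necessary dependencies into a new Conda environment.

    $ git clone https://github.com/kokkos/pykokkos-base/
    $ cd pykokkos-base
    $ conda create --name pyk --file requirements.txt

This creates an environment called pyk. Afterwards, the user can activate the environment and install PyKokkos-Base into it.

    $ conda activate pyk
    $ python setup.py install -- -DKokkos_ENABLE_OPENMP=ON \
          -DKokkos_ENABLE_CUDA=ON -DENABLE_LAYOUTS=ON

This command calls the Python setup script, which compiles the C++ View constructor bindings. The arguments following install specify the execution spaces to enable and also enable memory layouts in the View constructors. The next step is to clone and install PyKokkos itself.

    $ git clone https://github.com/kokkos/pykokkos/
    $ cd pykokkos
    $ pip install --user -e .

5 USAGE
We briefly describe how PyKokkos applications can be executed. The first step is to invoke the pkc.py script, passing in one or more files containing the kernels and specifying the execution space. Since the PyKokkos code is embedded in regular Python code, the application can then be launched normally.

    $ pkc.py 03.py -spaces OpenMP
    $ python 03.py

Figures 3 and 4 show screenshots of the output of these commands, respectively. Alternatively, users can skip the call to pkc.py and launch the application directly, causing PyKokkos to translate and compile the kernels at run time.

6 EVALUATION
In this section, we summarize a performance evaluation of PyKokkos using ExaMiniMD [4], a molecular dynamics mini-application of ∼3k lines of code. ExaMiniMD was originally written in C++ and Kokkos, and we ported it to Python and PyKokkos.

Figure 5 plots the number of atoms (x-axis) against total ExaMiniMD execution time (y-axis). We show data for both PyKokkos and Kokkos, using both OpenMP and CUDA. The plots show that Python and PyKokkos with OpenMP introduce only a minimal, constant overhead that does not scale with the size of the input data, even as the number of atoms increases. For CUDA, we do observe extra overhead. By profiling ExaMiniMD further, we found that the PyKokkos kernels themselves achieved performance identical to the original Kokkos kernels. The additional constant overhead can

[Figure 2 diagram: CLI → .py files → PKC (Parser → Python AST → Translator → C++ AST → Serializer → C++ source → Compiler) → .so files; Runtime → Import + Call → Results; steps 1-8.]
Figure 2: An overview of the PyKokkos framework implementation.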
Figure 3: Screenshot of using PKC from the command line.

C++ code; the developers were able to generate bindings for a library of pre-existing kernels written in C++ and Kokkos. PyKokkos allows users to write new kernels entirely through Python. Our evaluation showed that PyKokkos can match the performance of Kokkos, even for larger applications such as ExaMiniMD.

ACKNOWLEDGMENTS
We thank Martin Burtscher, Mattan Erez, Ian Henriksen, Damien Lebrun-Grandie, Jonathan R. Madsen, Arthur Peters, Keshav Pin-
                                                                                                                                                                                                                                                                                                                                              gali, David Poliakoff, Sivasankaran Rajamanickam, Christopher J.
                                                                                  Figure 4: Screenshot of running the 03 exercise.                                                                                                                                                                                                            Rossbach, Joseph B. Ryan, Karl W. Schulz, and Christian Trott. This
                                                                                                                                                                                                                                                                                                                                              work was partially supported by the US National Science Foun-
                                                                                                                      PyKokkos (OpenMP)                                                                                                                                                                                                       dation under Grant Nos. CCF-1652517 and CCF-1817048, and the
                                                                                       6                                                                                                                                                                                                                                                      Department of Energy, National Nuclear Security Administration
                                                                                                                      Kokkos (OpenMP)                                                                                                                                                                                                         under Award Number DE-NA0003969.
                                                                                       5                              PyKokkos (CUDA)
                                                                                                                      Kokkos (CUDA)                                                                                                                                                                                                           REFERENCES
                                                                                       4                                                                                                                                                                                                                                                         [1] 2015. Kokkos Tutorials. https://github.com/kokkos/kokkos-tutorials.
                                                                                       3                                                                                                                                                                                                                                                         [2] 2020. typing - Support for type hints. https://docs.python.org/3/library/typing.
                                                                                                                                                                                                                                                                                                                                                              html.
                                                                                    Time [s]                                                                                                                                                                                                                                                     [3] 2021. Top 500 November 2021. https://www.top500.org/lists/top500/2021/11/.
                                                                                       2                                                                                                                                                                                                                                                         [4] Nader Al Awar, Steven Zhu, George Biros, and Milos Gligoric. 2021. A Perfor-
                                                                                                                                                                                                                                                                                                                                                              mancePortabilityFrameworkforPython.InProceedingsoftheACMInternational
                                                                                       1                                                                                                                                                                                                                                                                      Conference on Supercomputing. 467ś478.
                                                                                                                                                                                                                                                                                                                                                 [5] Inc. Anaconda. 2021. Conda. https://docs.conda.io/projects/conda/en/latest/.
                                                                                                                                                                                                                                                                                                                                                 [6] Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Selje-
                                                                                       0                                                                                                                                                                                                                                                                      botn, and Kurt Smith. 2011. Cython: The Best of Both Worlds. In Computing in
                                                                                                4000400040004000                        32000320003200032000                     108000108000108000108000                  256000256000256000256000                   500000500000500000500000                                                                Science and Engineering. 31ś39.
                                                                                                                                                                                Atoms                                                                                                                                                            [7] Charles R. Harris, K. Jarrod Millman, Stefan J. van der Walt, Ralf Gommers,
                                                                                                                                                                                                                                                                                                                                                              Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg,
                                                                                             Figure 5: ExaMiniMDtotal execution time.                                                                                                                                                                                                                         Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van
                                                                                                                                                                                                                                                                                                                                                              Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernandez del Rio, Mark Wiebe,
                                                         be attributed to the startup time of the Python interpreter. Further-                                                                                                                                                                                                                                Pearu Peterson, Pierre Gerard-Marchant, Kevin Sheppard, Tyler Reddy, Warren
                                                                                                                                                                                                                                                                                                                                                              Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. 2020.
                                                         more, the extra overhead for CUDA can be attributed to Kokkos                                                                                                                                                                                                                                        Array programming with NumPy. Nature 585, 7825 (2020), 357ś362.
                                                         prefetching memory, which is currently not available in PyKokkos                                                                                                                                                                                                                        [8] SiuKwanLam,AntoinePitrou,andStanleySeibert.2015. Numba:ALLVM-Based
                                                         (although support for this is being added currently).                                                                                                                                                                                                                                                Python JIT Compiler. In Workshop on the LLVM Compiler Infrastructure in HPC.
                                                                                                                                                                                                                                                                                                                                                              1ś6.
                                                                   Insummary,PyKokkosachievesperformanceonparwithKokkos                                                                                                                                                                                                                          [9] Travis E. Oliphant. 2007. Python for Scientific Computing. Computing in Science
                                                         with only small overhead. Our ICS’21 paper [4] includes a more                                                                                                                                                                                                                                       and Engineering 9, 3 (2007), 10ś20.
                                                         extensive evaluation on numerous smaller kernels, showing simi-                                                                                                                                                                                                                      [10] ChristianTrott,LucBerger-Vergiat,DavidPoliakoff,SivasankaranRajamanickam,
                                                                                                                                                                                                                                                                                                                                                              DamienLebrun-Grandie,JonathanMadsen,NaderAlAwar,MilosGligoric,Galen
                                                         lar results, as well as a study of code complexity that shows that                                                                                                                                                                                                                                   Shipman, and Geoff Womeldorff. 2021. The Kokkos EcoSystem: Comprehensive
                                                         PyKokkoscodeismoreconciseandlessverbosethanKokkos.                                                                                                                                                                                                                                                   Performance Portability for High Performance Computing. Computing in Science
                                                                                                                                                                                                                                                                                                                                                              Engineering 23, 5 (2021), 10ś18.
                                                                                                                                                                                                                                                                                                                                              [11] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler
                                                         7 CONCLUSION                                                                                                                                                                                                                                                                                         Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser,
                                                                                                                                                                                                                                                                                                                                                              Jonathan Bright, Stefan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jar-
                                                        We presented PyKokkos, a framework for writing performance                                                                                                                                                                                                                                            rod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern,
                                                         portablekernelsusingPython.ExistingapproachesincludeCython[6],                                                                                                                                                                                                                                       Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas,
                                                         whichprovides C-like language extensions and statically compiles                                                                                                                                                                                                                                     Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero,
                                                                                                                                                                                                                                                                                                                                                              Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa,
                                                         code for better performance; Cython, however, currently has lim-                                                                                                                                                                                                                                     Paul van Mulbregt, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamen-
                                                         ited support for parallelism. Numba [8] is a just-in-time compiler                                                                                                                                                                                                                                   tal Algorithms for Scientific Computing in Python. Nature Methods 17 (2020),
                                                         that compiles a subset of Python to LLVM IR. Numba supports                                                                                                                                                                                                                                          261ś272.
                                                                                                                                                                                                                                                                                                                                              [12] Steven Zhu, Nader Al Awar, Mattan Erez, and Milos Gligoric. 2021. Dynamic
                                                         parallelism, but does not provide performance portability. Way-                                                                                                                                                                                                                                      Generation of Python Bindings for HPC Kernels. In International Conference on
                                                         Out [12] automatically generates language bindings for existing                                                                                                                                                                                                                                      Automated Software Engineering (ASE). 92ś103.