Space: DD0101/IDSF-JointBERT_CRF

DD0101 committed on Apr 28, 2023

Commit c938124

1 Parent(s): ce2151a

first commit

Files changed (16)

JointModel.png +0 -0
LICENSE +661 -0
data_loader.py +269 -0
dataset_statistic.png +0 -0
early_stopping.py +64 -0
gradio_demo.py +250 -0
main.py +139 -0
predict.py +232 -0
predict.sh +3 -0
requirements.txt +11 -0
run_jointBERT-CRF_PhoBERTencoder.sh +23 -0
run_jointBERT-CRF_XLM-Rencoder.sh +23 -0
run_jointIDSF_PhoBERTencoder.sh +30 -0
run_jointIDSF_XLM-Rencoder.sh +30 -0
trainer.py +300 -0
utils.py +115 -0

JointModel.png ADDED

LICENSE ADDED

	@@ -0,0 +1,661 @@

+                    GNU AFFERO GENERAL PUBLIC LICENSE
+                       Version 3, 19 November 2007
+ Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+                            Preamble
+  The GNU Affero General Public License is a free, copyleft license for
+software and other kinds of works, specifically designed to ensure
+cooperation with the community in the case of network server software.
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+our General Public Licenses are intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+  Developers that use our General Public Licenses protect your rights
+with two steps: (1) assert copyright on the software, and (2) offer
+you this License which gives you legal permission to copy, distribute
+and/or modify the software.
+  A secondary benefit of defending all users' freedom is that
+improvements made in alternate versions of the program, if they
+receive widespread use, become available for other developers to
+incorporate.  Many developers of free software are heartened and
+encouraged by the resulting cooperation.  However, in the case of
+software used on network servers, this result may fail to come about.
+The GNU General Public License permits making a modified version and
+letting the public access it on a server without ever releasing its
+source code to the public.
+  The GNU Affero General Public License is designed specifically to
+ensure that, in such cases, the modified source code becomes available
+to the community.  It requires the operator of a network server to
+provide the source code of the modified version running there to the
+users of that server.  Therefore, public use of a modified version, on
+a publicly accessible server, gives the public access to the source
+code of the modified version.
+  An older license, called the Affero General Public License and
+published by Affero, was designed to accomplish similar goals.  This is
+a different license, not a version of the Affero GPL, but Affero has
+released a new version of the Affero GPL which permits relicensing under
+this license.
+  The precise terms and conditions for copying, distribution and
+modification follow.
+                       TERMS AND CONDITIONS
+  0. Definitions.
+  "This License" refers to version 3 of the GNU Affero General Public License.
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+  1. Source Code.
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+  The Corresponding Source for a work in source code form is that
+same work.
+  2. Basic Permissions.
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+  4. Conveying Verbatim Copies.
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+  5. Conveying Modified Source Versions.
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+  6. Conveying Non-Source Forms.
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+  7. Additional Terms.
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+  8. Termination.
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+  9. Acceptance Not Required for Having Copies.
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+  10. Automatic Licensing of Downstream Recipients.
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+  11. Patents.
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+  12. No Surrender of Others' Freedom.
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+  13. Remote Network Interaction; Use with the GNU General Public License.
+  Notwithstanding any other provision of this License, if you modify the
+Program, your modified version must prominently offer all users
+interacting with it remotely through a computer network (if your version
+supports such interaction) an opportunity to receive the Corresponding
+Source of your version by providing access to the Corresponding Source
+from a network server at no charge, through some standard or customary
+means of facilitating copying of software.  This Corresponding Source
+shall include the Corresponding Source for any work covered by version 3
+of the GNU General Public License that is incorporated pursuant to the
+following paragraph.
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the work with which it is combined will remain governed by version
+3 of the GNU General Public License.
+  14. Revised Versions of this License.
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU Affero General Public License from time to time.  Such new versions
+will be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU Affero General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU Affero General Public License, you may choose any version ever published
+by the Free Software Foundation.
+  If the Program specifies that a proxy can decide which future
+versions of the GNU Affero General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+  15. Disclaimer of Warranty.
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+  16. Limitation of Liability.
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+  17. Interpretation of Sections 15 and 16.
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+                     END OF TERMS AND CONDITIONS
+            How to Apply These Terms to Your New Programs
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as published
+    by the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+Also add information on how to contact you by electronic and paper mail.
+  If your software can interact with users remotely through a computer
+network, you should also make sure that it provides a way for users to
+get its source.  For example, if your program is a web application, its
+interface could display a "Source" link that leads users to an archive
+of the code.  There are many ways you could offer source, and different
+solutions will be better for different programs; see section 13 for the
+specific requirements.
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU AGPL, see
+<https://www.gnu.org/licenses/>.

data_loader.py ADDED

	@@ -0,0 +1,269 @@

+import copy
+import json
+import logging
+import os
+import torch
+from torch.utils.data import TensorDataset
+from utils import get_intent_labels, get_slot_labels
+logger = logging.getLogger(__name__)
+class InputExample(object):
+    """
+    A single training/test example for joint intent detection and slot filling.
+    Args:
+        guid: Unique id for the example.
+        words: list. The words of the sequence.
+        intent_label: (Optional) int. Index of the intent label in the intent label list.
+        slot_labels: (Optional) list of int. Slot label indices, one per word.
+    """
+    def __init__(self, guid, words, intent_label=None, slot_labels=None):
+        self.guid = guid
+        self.words = words
+        self.intent_label = intent_label
+        self.slot_labels = slot_labels
+    def __repr__(self):
+        return str(self.to_json_string())
+    def to_dict(self):
+        """Serializes this instance to a Python dictionary."""
+        output = copy.deepcopy(self.__dict__)
+        return output
+    def to_json_string(self):
+        """Serializes this instance to a JSON string."""
+        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"
+class InputFeatures(object):
+    """A single set of features of data."""
+    def __init__(self, input_ids, attention_mask, token_type_ids, intent_label_id, slot_labels_ids):
+        self.input_ids = input_ids
+        self.attention_mask = attention_mask
+        self.token_type_ids = token_type_ids
+        self.intent_label_id = intent_label_id
+        self.slot_labels_ids = slot_labels_ids
+    def __repr__(self):
+        return str(self.to_json_string())
+    def to_dict(self):
+        """Serializes this instance to a Python dictionary."""
+        output = copy.deepcopy(self.__dict__)
+        return output
+    def to_json_string(self):
+        """Serializes this instance to a JSON string."""
+        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"
+class JointProcessor(object):
+    """Processor for the JointBERT data set."""
+    def __init__(self, args):
+        self.args = args
+        self.intent_labels = get_intent_labels(args)
+        self.slot_labels = get_slot_labels(args)
+        self.input_text_file = "seq.in"
+        self.intent_label_file = "label"
+        self.slot_labels_file = "seq.out"
+    @classmethod
+    def _read_file(cls, input_file, quotechar=None):
+        """Reads a text file and returns its lines with surrounding whitespace stripped."""
+        with open(input_file, "r", encoding="utf-8") as f:
+            lines = []
+            for line in f:
+                lines.append(line.strip())
+            return lines
+    def _create_examples(self, texts, intents, slots, set_type):
+        """Creates examples for the train, dev, and test sets, mapping unseen labels to "UNK"."""
+        examples = []
+        for i, (text, intent, slot) in enumerate(zip(texts, intents, slots)):
+            guid = "%s-%s" % (set_type, i)
+            # 1. input_text
+            words = text.split()  # Some are spaced twice
+            # 2. intent
+            intent_label = (
+                self.intent_labels.index(intent) if intent in self.intent_labels else self.intent_labels.index("UNK")
+            )
+            # 3. slot
+            slot_labels = []
+            for s in slot.split():
+                slot_labels.append(
+                    self.slot_labels.index(s) if s in self.slot_labels else self.slot_labels.index("UNK")
+                )
+            assert len(words) == len(slot_labels)
+            examples.append(InputExample(guid=guid, words=words, intent_label=intent_label, slot_labels=slot_labels))
+        return examples
+    def get_examples(self, mode):
+        """
+        Args:
+            mode: train, dev, test
+        """
+        data_path = os.path.join(self.args.data_dir, self.args.token_level, mode)
+        logger.info("LOOKING AT {}".format(data_path))
+        return self._create_examples(
+            texts=self._read_file(os.path.join(data_path, self.input_text_file)),
+            intents=self._read_file(os.path.join(data_path, self.intent_label_file)),
+            slots=self._read_file(os.path.join(data_path, self.slot_labels_file)),
+            set_type=mode,
+        )
+processors = {"syllable-level": JointProcessor, "word-level": JointProcessor}
+def convert_examples_to_features(
+    examples,
+    max_seq_len,
+    tokenizer,
+    pad_token_label_id=-100,
+    cls_token_segment_id=0,
+    pad_token_segment_id=0,
+    sequence_a_segment_id=0,
+    mask_padding_with_zero=True,
+):
+    # Setting based on the current model type
+    cls_token = tokenizer.cls_token
+    sep_token = tokenizer.sep_token
+    unk_token = tokenizer.unk_token
+    pad_token_id = tokenizer.pad_token_id
+    features = []
+    for (ex_index, example) in enumerate(examples):
+        if ex_index % 5000 == 0:
+            logger.info("Writing example %d of %d" % (ex_index, len(examples)))
+        # Tokenize word by word (for NER)
+        tokens = []
+        slot_labels_ids = []
+        for word, slot_label in zip(example.words, example.slot_labels):
+            word_tokens = tokenizer.tokenize(word)
+            if not word_tokens:
+                word_tokens = [unk_token]  # For handling the bad-encoded word
+            tokens.extend(word_tokens)
+            # Use the real label id for the first token of the word, and padding ids for the remaining tokens
+            slot_labels_ids.extend([int(slot_label)] + [pad_token_label_id] * (len(word_tokens) - 1))
+        # Account for [CLS] and [SEP]
+        special_tokens_count = 2
+        if len(tokens) > max_seq_len - special_tokens_count:
+            tokens = tokens[: (max_seq_len - special_tokens_count)]
+            slot_labels_ids = slot_labels_ids[: (max_seq_len - special_tokens_count)]
+        # Add [SEP] token
+        tokens += [sep_token]
+        slot_labels_ids += [pad_token_label_id]
+        token_type_ids = [sequence_a_segment_id] * len(tokens)
+        # Add [CLS] token
+        tokens = [cls_token] + tokens
+        slot_labels_ids = [pad_token_label_id] + slot_labels_ids
+        token_type_ids = [cls_token_segment_id] + token_type_ids
+        input_ids = tokenizer.convert_tokens_to_ids(tokens)
+        # The mask has 1 for real tokens and 0 for padding tokens. Only real
+        # tokens are attended to.
+        attention_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)
+        # Zero-pad up to the sequence length.
+        padding_length = max_seq_len - len(input_ids)
+        input_ids = input_ids + ([pad_token_id] * padding_length)
+        attention_mask = attention_mask + ([0 if mask_padding_with_zero else 1] * padding_length)
+        token_type_ids = token_type_ids + ([pad_token_segment_id] * padding_length)
+        slot_labels_ids = slot_labels_ids + ([pad_token_label_id] * padding_length)
+        assert len(input_ids) == max_seq_len, "Error with input length {} vs {}".format(len(input_ids), max_seq_len)
+        assert len(attention_mask) == max_seq_len, "Error with attention mask length {} vs {}".format(
+            len(attention_mask), max_seq_len
+        )
+        assert len(token_type_ids) == max_seq_len, "Error with token type length {} vs {}".format(
+            len(token_type_ids), max_seq_len
+        )
+        assert len(slot_labels_ids) == max_seq_len, "Error with slot labels length {} vs {}".format(
+            len(slot_labels_ids), max_seq_len
+        )
+        intent_label_id = int(example.intent_label)
+        if ex_index < 5:
+            logger.info("*** Example ***")
+            logger.info("guid: %s" % example.guid)
+            logger.info("tokens: %s" % " ".join([str(x) for x in tokens]))
+            logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
+            logger.info("attention_mask: %s" % " ".join([str(x) for x in attention_mask]))
+            logger.info("token_type_ids: %s" % " ".join([str(x) for x in token_type_ids]))
+            logger.info("intent_label: %s (id = %d)" % (example.intent_label, intent_label_id))
+            logger.info("slot_labels: %s" % " ".join([str(x) for x in slot_labels_ids]))
+        features.append(
+            InputFeatures(
+                input_ids=input_ids,
+                attention_mask=attention_mask,
+                token_type_ids=token_type_ids,
+                intent_label_id=intent_label_id,
+                slot_labels_ids=slot_labels_ids,
+            )
+        )
+    return features
+def load_and_cache_examples(args, tokenizer, mode):
+    processor = processors[args.token_level](args)
+    # Load data features from cache or dataset file
+    cached_features_file = os.path.join(
+        args.data_dir,
+        "cached_{}_{}_{}_{}".format(
+            mode, args.token_level, list(filter(None, args.model_name_or_path.split("/"))).pop(), args.max_seq_len
+        ),
+    )
+    if os.path.exists(cached_features_file):
+        logger.info("Loading features from cached file %s", cached_features_file)
+        features = torch.load(cached_features_file)
+    else:
+        # Load data features from dataset file
+        logger.info("Creating features from dataset file at %s", args.data_dir)
+        if mode == "train":
+            examples = processor.get_examples("train")
+        elif mode == "dev":
+            examples = processor.get_examples("dev")
+        elif mode == "test":
+            examples = processor.get_examples("test")
+        else:
+            raise Exception("Only train, dev, and test modes are available")
+        # Use cross entropy ignore index as padding label id so that only real label ids contribute to the loss later
+        pad_token_label_id = args.ignore_index
+        features = convert_examples_to_features(
+            examples, args.max_seq_len, tokenizer, pad_token_label_id=pad_token_label_id
+        )
+        logger.info("Saving features into cached file %s", cached_features_file)
+        torch.save(features, cached_features_file)
+    # Convert to Tensors and build dataset
+    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
+    all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long)
+    all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long)
+    all_intent_label_ids = torch.tensor([f.intent_label_id for f in features], dtype=torch.long)
+    all_slot_labels_ids = torch.tensor([f.slot_labels_ids for f in features], dtype=torch.long)
+    dataset = TensorDataset(
+        all_input_ids, all_attention_mask, all_token_type_ids, all_intent_label_ids, all_slot_labels_ids
+    )
+    return dataset
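The core of `convert_examples_to_features` is aligning word-level slot labels with sub-word tokens: only the first sub-token of each word keeps its real label, while continuation sub-tokens, `[CLS]`, `[SEP]` and padding all receive `pad_token_label_id`, so they are ignored by the loss. The sketch below reproduces that alignment in isolation. It is a minimal illustration, not the repository's code: `toy_tokenize` is a hypothetical stand-in for `tokenizer.tokenize` (it just cuts words into 3-character chunks), and string special tokens replace real vocabulary ids.

```python
from typing import Callable, List, Tuple

PAD_LABEL_ID = 0  # mirrors main.py's default --ignore_index=0


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical sub-word tokenizer: splits a word into 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


def align_labels(
    words: List[str],
    slot_label_ids: List[int],
    max_seq_len: int,
    tokenize: Callable[[str], List[str]] = toy_tokenize,
    cls_token: str = "<s>",
    sep_token: str = "</s>",
    pad_token: str = "<pad>",
    unk_token: str = "<unk>",
    pad_label_id: int = PAD_LABEL_ID,
) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, label ids, attention mask), all of length max_seq_len."""
    tokens: List[str] = []
    labels: List[int] = []
    for word, label in zip(words, slot_label_ids):
        # Fall back to the unknown token if the tokenizer returns nothing
        word_tokens = tokenize(word) or [unk_token]
        tokens.extend(word_tokens)
        # Real label on the first sub-token, ignored label on the rest
        labels.extend([label] + [pad_label_id] * (len(word_tokens) - 1))
    # Truncate to leave room for the two special tokens
    tokens = tokens[: max_seq_len - 2]
    labels = labels[: max_seq_len - 2]
    # Special tokens never carry a real slot label
    tokens = [cls_token] + tokens + [sep_token]
    labels = [pad_label_id] + labels + [pad_label_id]
    attention_mask = [1] * len(tokens)
    # Pad every sequence to exactly max_seq_len; padding is not attended to
    padding = max_seq_len - len(tokens)
    tokens += [pad_token] * padding
    labels += [pad_label_id] * padding
    attention_mask += [0] * padding
    return tokens, labels, attention_mask
```

For example, `align_labels(["from", "danang"], [5, 7], max_seq_len=8)` labels only `fro` (5) and `dan` (7); every other position, including the continuation chunks `m` and `ang`, carries the ignored id.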

dataset_statistic.png ADDED

early_stopping.py ADDED

	@@ -0,0 +1,64 @@

+import os
+import numpy as np
+import torch
+class EarlyStopping:
+    """Stops training early when the tuning metric (loss, or a score such as accuracy/F1) has not improved for `patience` consecutive evaluations."""
+    def __init__(self, patience=7, verbose=False):
+        """
+        Args:
+            patience (int): Number of consecutive evaluations without improvement to wait before stopping.
+                            Default: 7
+            verbose (bool): If True, prints a message each time the tuning metric improves.
+                            Default: False
+        """
+        self.patience = patience
+        self.verbose = verbose
+        self.counter = 0
+        self.best_score = None
+        self.early_stop = False
+        self.val_loss_min = np.inf  # np.Inf was removed in NumPy 2.0
+    def __call__(self, val_loss, model, args):
+        if args.tuning_metric == "loss":
+            score = -val_loss
+        else:
+            score = val_loss
+        if self.best_score is None:
+            self.best_score = score
+            self.save_checkpoint(val_loss, model, args)
+        elif score < self.best_score:
+            self.counter += 1
+            print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
+            if self.counter >= self.patience:
+                self.early_stop = True
+        else:
+            self.best_score = score
+            self.save_checkpoint(val_loss, model, args)
+            self.counter = 0
+    def save_checkpoint(self, val_loss, model, args):
+        """Saves model when validation loss decreases or accuracy/f1 increases."""
+        if self.verbose:
+            if args.tuning_metric == "loss":
+                print(f"Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ...")
+            else:
+                print(
+                    f"{args.tuning_metric} increased ({self.val_loss_min:.6f} --> {val_loss:.6f}).  Saving model ..."
+                )
+        model.save_pretrained(args.model_dir)
+        torch.save(args, os.path.join(args.model_dir, "training_args.bin"))
+        self.val_loss_min = val_loss
+        # # Save model checkpoint (Overwrite)
+        # if not os.path.exists(self.args.model_dir):
+        #     os.makedirs(self.args.model_dir)
+        # model_to_save = self.model.module if hasattr(self.model, 'module') else self.model
+        # model_to_save.save_pretrained(self.args.model_dir)
+        # # Save training arguments together with the trained model
+        # torch.save(self.args, os.path.join(self.args.model_dir, 'training_args.bin'))
+        # logger.info("Saving model checkpoint to %s", self.args.model_dir)
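`EarlyStopping.__call__` turns every tuning metric into a "higher is better" score by negating the loss, then counts consecutive evaluations that fall below the best score so far; a tie counts as an improvement because only `score < self.best_score` increments the counter. The sketch below isolates that bookkeeping from checkpointing. `PatienceTracker` and its `step` method are hypothetical names, and instead of calling `save_checkpoint` it simply returns whether the caller should save.

```python
from typing import Optional


class PatienceTracker:
    """Hypothetical, checkpoint-free version of the EarlyStopping counter logic."""

    def __init__(self, patience: int, tuning_metric: str = "loss") -> None:
        self.patience = patience
        self.tuning_metric = tuning_metric
        self.best_score: Optional[float] = None
        self.counter = 0
        self.early_stop = False

    def step(self, value: float) -> bool:
        """Record one evaluation; return True if it is a new best (i.e. save a checkpoint)."""
        # Negate the loss so that "higher is better" holds for every metric
        score = -value if self.tuning_metric == "loss" else value
        if self.best_score is None or score >= self.best_score:
            # First evaluation, an improvement, or a tie: reset the patience counter
            self.best_score = score
            self.counter = 0
            return True
        # No improvement: count it and stop once patience is exhausted
        self.counter += 1
        if self.counter >= self.patience:
            self.early_stop = True
        return False
```

With `patience=2` and validation losses `[0.9, 0.7, 0.8, 0.75]`, the first two evaluations are saved and the last two exhaust the patience, so `early_stop` becomes `True`.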

gradio_demo.py ADDED

	@@ -0,0 +1,250 @@

+import gradio as gr
+import argparse
+import logging
+import os
+import numpy as np
+import torch
+from torch.utils.data import DataLoader, SequentialSampler, TensorDataset
+from tqdm import tqdm
+from utils import MODEL_CLASSES, get_intent_labels, get_slot_labels, init_logger, load_tokenizer
+logger = logging.getLogger(__name__)
+def get_device(pred_config):
+    return "cuda" if torch.cuda.is_available() and not pred_config.no_cuda else "cpu"
+def get_args(pred_config):
+    args = torch.load(os.path.join(pred_config.model_dir, "training_args.bin"))
+    args.model_dir = pred_config.model_dir
+    args.data_dir = 'PhoATIS'
+    return args
+def load_model(pred_config, args, device):
+    # Check whether model exists
+    if not os.path.exists(pred_config.model_dir):
+        raise Exception("Model doesn't exist! Train first!")
+    try:
+        model = MODEL_CLASSES[args.model_type][1].from_pretrained(
+            args.model_dir, args=args, intent_label_lst=get_intent_labels(args), slot_label_lst=get_slot_labels(args)
+        )
+        model.to(device)
+        model.eval()
+        logger.info("***** Model Loaded *****")
+    except Exception as e:
+        raise Exception("Some model files might be missing...") from e
+    return model
+def convert_input_file_to_tensor_dataset(
+    lines,
+    pred_config,
+    args,
+    tokenizer,
+    pad_token_label_id,
+    cls_token_segment_id=0,
+    pad_token_segment_id=0,
+    sequence_a_segment_id=0,
+    mask_padding_with_zero=True,
+):
+    # Setting based on the current model type
+    cls_token = tokenizer.cls_token
+    sep_token = tokenizer.sep_token
+    unk_token = tokenizer.unk_token
+    pad_token_id = tokenizer.pad_token_id
+    all_input_ids = []
+    all_attention_mask = []
+    all_token_type_ids = []
+    all_slot_label_mask = []
+    for words in lines:
+        tokens = []
+        slot_label_mask = []
+        for word in words:
+            word_tokens = tokenizer.tokenize(word)
+            if not word_tokens:
+                word_tokens = [unk_token]  # For handling the bad-encoded word
+            tokens.extend(word_tokens)
+            # Mark the first sub-token of each word with a non-pad value and the remaining sub-tokens with the pad id
+            slot_label_mask.extend([pad_token_label_id + 1] + [pad_token_label_id] * (len(word_tokens) - 1))
+        # Account for [CLS] and [SEP]
+        special_tokens_count = 2
+        if len(tokens) > args.max_seq_len - special_tokens_count:
+            tokens = tokens[: (args.max_seq_len - special_tokens_count)]
+            slot_label_mask = slot_label_mask[: (args.max_seq_len - special_tokens_count)]
+        # Add [SEP] token
+        tokens += [sep_token]
+        token_type_ids = [sequence_a_segment_id] * len(tokens)
+        slot_label_mask += [pad_token_label_id]
+        # Add [CLS] token
+        tokens = [cls_token] + tokens
+        token_type_ids = [cls_token_segment_id] + token_type_ids
+        slot_label_mask = [pad_token_label_id] + slot_label_mask
+        input_ids = tokenizer.convert_tokens_to_ids(tokens)
+        # The mask has 1 for real tokens and 0 for padding tokens. Only real tokens are attended to.
+        attention_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)
+        # Zero-pad up to the sequence length.
+        padding_length = args.max_seq_len - len(input_ids)
+        input_ids = input_ids + ([pad_token_id] * padding_length)
+        attention_mask = attention_mask + ([0 if mask_padding_with_zero else 1] * padding_length)
+        token_type_ids = token_type_ids + ([pad_token_segment_id] * padding_length)
+        slot_label_mask = slot_label_mask + ([pad_token_label_id] * padding_length)
+        all_input_ids.append(input_ids)
+        all_attention_mask.append(attention_mask)
+        all_token_type_ids.append(token_type_ids)
+        all_slot_label_mask.append(slot_label_mask)
+    # Change to Tensor
+    all_input_ids = torch.tensor(all_input_ids, dtype=torch.long)
+    all_attention_mask = torch.tensor(all_attention_mask, dtype=torch.long)
+    all_token_type_ids = torch.tensor(all_token_type_ids, dtype=torch.long)
+    all_slot_label_mask = torch.tensor(all_slot_label_mask, dtype=torch.long)
+    dataset = TensorDataset(all_input_ids, all_attention_mask, all_token_type_ids, all_slot_label_mask)
+    return dataset
+def predict(text):
+    lines = text
+    dataset = convert_input_file_to_tensor_dataset(lines, pred_config, args, tokenizer, pad_token_label_id)
+    # Predict
+    sampler = SequentialSampler(dataset)
+    data_loader = DataLoader(dataset, sampler=sampler, batch_size=pred_config.batch_size)
+    all_slot_label_mask = None
+    intent_preds = None
+    slot_preds = None
+    for batch in tqdm(data_loader, desc="Predicting"):
+        batch = tuple(t.to(device) for t in batch)
+        with torch.no_grad():
+            inputs = {
+                "input_ids": batch[0],
+                "attention_mask": batch[1],
+                "intent_label_ids": None,
+                "slot_labels_ids": None,
+            }
+            if args.model_type != "distilbert":
+                inputs["token_type_ids"] = batch[2]
+            outputs = model(**inputs)
+            _, (intent_logits, slot_logits) = outputs[:2]
+            # Intent Prediction
+            if intent_preds is None:
+                intent_preds = intent_logits.detach().cpu().numpy()
+            else:
+                intent_preds = np.append(intent_preds, intent_logits.detach().cpu().numpy(), axis=0)
+            # Slot prediction
+            if slot_preds is None:
+                if args.use_crf:
+                    # decode() in `torchcrf` returns list with best index directly
+                    slot_preds = np.array(model.crf.decode(slot_logits))
+                else:
+                    slot_preds = slot_logits.detach().cpu().numpy()
+                all_slot_label_mask = batch[3].detach().cpu().numpy()
+            else:
+                if args.use_crf:
+                    slot_preds = np.append(slot_preds, np.array(model.crf.decode(slot_logits)), axis=0)
+                else:
+                    slot_preds = np.append(slot_preds, slot_logits.detach().cpu().numpy(), axis=0)
+                all_slot_label_mask = np.append(all_slot_label_mask, batch[3].detach().cpu().numpy(), axis=0)
+    intent_preds = np.argmax(intent_preds, axis=1)
+    if not args.use_crf:
+        slot_preds = np.argmax(slot_preds, axis=2)
+    slot_label_map = {i: label for i, label in enumerate(slot_label_lst)}
+    slot_preds_list = [[] for _ in range(slot_preds.shape[0])]
+    for i in range(slot_preds.shape[0]):
+        for j in range(slot_preds.shape[1]):
+            if all_slot_label_mask[i, j] != pad_token_label_id:
+                slot_preds_list[i].append(slot_label_map[slot_preds[i][j]])
+    return (lines, slot_preds_list, intent_preds)
+def text_analysis(text):
+    text = [text.strip().split()]
+    lines, slot_preds_list, intent_preds = predict(text)  # run inference once instead of three times
+    words, slot_preds, intent_pred = lines[0], slot_preds_list[0], intent_preds[0]
+    slot_tokens = []
+    for word, pred in zip(words, slot_preds):
+        if pred == 'O':
+            slot_tokens.extend([(word, None), (" ", None)])
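+        elif pred.startswith("I-") and len(slot_tokens) >= 2 and slot_tokens[-2][1] == pred[2:]:
+            # Continue the current span: append the word to the previous highlighted token
+            added_tokens = list(slot_tokens[-2])
+            added_tokens[0] += f' {word}'
+            slot_tokens[-2] = tuple(added_tokens)
+        else:
+            # A B- tag (or an I- tag that does not continue a same-type span) starts a new span
+            slot_tokens.extend([(word, pred[2:]), (" ", None)])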
+    intent_label = intent_label_lst[intent_pred]
+    return slot_tokens, intent_label
+if __name__ == "__main__":
+    init_logger()
+    parser = argparse.ArgumentParser()
+    # parser.add_argument("--input_file", default="sample_pred_in.txt", type=str, help="Input file for prediction")
+    # parser.add_argument("--output_file", default="sample_pred_out.txt", type=str, help="Output file for prediction")
+    parser.add_argument("--model_dir", default="./atis_model", type=str, help="Path to save, load model")
+    parser.add_argument("--batch_size", default=32, type=int, help="Batch size for prediction")
+    parser.add_argument("--no_cuda", action="store_true", help="Avoid using CUDA when available")
+    pred_config = parser.parse_args()
+    # load model and args
+    args = get_args(pred_config)
+    device = get_device(pred_config)
+    model = load_model(pred_config, args, device)
+    logger.info(args)
+    intent_label_lst = get_intent_labels(args)
+    slot_label_lst = get_slot_labels(args)
+    # Convert input file to TensorDataset
+    pad_token_label_id = args.ignore_index
+    tokenizer = load_tokenizer(args)
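+    # Sample Vietnamese queries. English glosses:
+    #   1. "I want to fly a round trip from Da Nang to Da Lat"
+    #   2. "round-trip fares from Da Nang to Vinh under 2 million dong, from Quy Nhon to Vinh under 3 million dong,
+    #       from Buon Ma Thuot to Vinh under 4.5 million"
+    #   3. "tell me the flights to Da Nang on June 14"
+    #   4. "which flights depart from Ho Chi Minh City for Frankfurt, connect in Singapore, and land before 9 pm"
+    examples = ["tôi muốn bay một chuyến khứ_hồi từ đà_nẵng đến đà_lạt",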
+                ("giá vé khứ_hồi từ đà_nẵng đến vinh dưới 2 triệu đồng giá vé khứ_hồi từ quy nhơn đến vinh dưới 3 triệu đồng giá vé khứ_hồi từ"
+                " buôn_ma_thuột đến vinh dưới 4 triệu rưỡi"),
+                "cho tôi biết các chuyến bay đến đà_nẵng vào ngày 14 tháng sáu",
+                "những chuyến bay nào khởi_hành từ thành_phố hồ_chí_minh bay đến frankfurt mà nối chuyến ở singapore và hạ_cánh trước 9 giờ tối"]
+    demo = gr.Interface(
+        text_analysis,
+        gr.Textbox(placeholder="Enter sentence here...", label="Input"),
+        [gr.HighlightedText(label='Highlighted Output'), gr.Textbox(label='Intent Label')],
+        examples=examples,
+    )
+    demo.launch(share=True)
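After inference, `predict` keeps only the positions whose `slot_label_mask` differs from `pad_token_label_id` (the first sub-token of each word), and `text_analysis` folds the resulting BIO tags into `(text, label)` pairs for `gr.HighlightedText`. The self-contained sketch below shows both steps. `select_word_labels` and `bio_to_highlights` are hypothetical names, the label map is made up for illustration, and non-BIO tags (such as `O`) are mapped to an unlabeled span; an `I-` tag that does not continue a span of the same type starts a new one.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np


def select_word_labels(
    slot_preds: np.ndarray,
    slot_label_mask: np.ndarray,
    label_map: Dict[int, str],
    pad_label_id: int = 0,
) -> List[List[str]]:
    """Keep one predicted tag per word: the positions whose mask is not the pad id."""
    results: List[List[str]] = []
    for preds_row, mask_row in zip(slot_preds, slot_label_mask):
        results.append([label_map[int(p)] for p, m in zip(preds_row, mask_row) if m != pad_label_id])
    return results


def bio_to_highlights(words: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
    """Merge BIO tags into (text, label) pairs, each followed by a (" ", None) separator."""
    spans: List[Tuple[str, Optional[str]]] = []
    for word, tag in zip(words, tags):
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        # spans[-2] is the last real span (spans[-1] is always the separator)
        continues_span = tag.startswith("I-") and len(spans) >= 2 and spans[-2][1] == label
        if continues_span:
            text, prev_label = spans[-2]
            spans[-2] = (f"{text} {word}", prev_label)
        else:
            spans.extend([(word, label), (" ", None)])
    return spans
```

Keeping the separator tuples mirrors the demo's output shape, so the highlighted text still renders with spaces between spans.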

main.py ADDED

	@@ -0,0 +1,139 @@

+import argparse
+from data_loader import load_and_cache_examples
+from trainer import Trainer
+from utils import MODEL_CLASSES, MODEL_PATH_MAP, init_logger, load_tokenizer, set_seed
+def main(args):
+    init_logger()
+    set_seed(args)
+    tokenizer = load_tokenizer(args)
+    train_dataset = load_and_cache_examples(args, tokenizer, mode="train")
+    dev_dataset = load_and_cache_examples(args, tokenizer, mode="dev")
+    test_dataset = load_and_cache_examples(args, tokenizer, mode="test")
+    trainer = Trainer(args, train_dataset, dev_dataset, test_dataset)
+    if args.do_train:
+        trainer.train()
+    if args.do_eval:
+        trainer.load_model()
+        trainer.evaluate("test")
+    if args.do_eval_dev:
+        trainer.load_model()
+        trainer.evaluate("dev")
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    # parser.add_argument("--task", default=None, required=True, type=str, help="The name of the task to train")
+    parser.add_argument("--model_dir", default=None, required=True, type=str, help="Path to save, load model")
+    parser.add_argument("--data_dir", default="./PhoATIS", type=str, help="The input data dir")
+    parser.add_argument("--intent_label_file", default="intent_label.txt", type=str, help="Intent Label file")
+    parser.add_argument("--slot_label_file", default="slot_label.txt", type=str, help="Slot Label file")
+    parser.add_argument(
+        "--model_type",
+        default="phobert",
+        type=str,
+        help="Model type selected in the list: " + ", ".join(MODEL_CLASSES.keys()),
+    )
+    parser.add_argument("--tuning_metric", default="loss", type=str, help="Metric to monitor for early stopping and checkpoint selection")
+    parser.add_argument("--seed", type=int, default=1, help="random seed for initialization")
+    parser.add_argument("--train_batch_size", default=32, type=int, help="Batch size for training.")
+    parser.add_argument("--eval_batch_size", default=64, type=int, help="Batch size for evaluation.")
+    parser.add_argument(
+        "--max_seq_len", default=50, type=int, help="The maximum total input sequence length after tokenization."
+    )
+    parser.add_argument("--learning_rate", default=5e-5, type=float, help="The initial learning rate for Adam.")
+    parser.add_argument(
+        "--num_train_epochs", default=10.0, type=float, help="Total number of training epochs to perform."
+    )
+    parser.add_argument("--weight_decay", default=0.0, type=float, help="Weight decay if we apply some.")
+    parser.add_argument(
+        "--gradient_accumulation_steps",
+        type=int,
+        default=1,
+        help="Number of update steps to accumulate before performing a backward/update pass.",
+    )
+    parser.add_argument("--adam_epsilon", default=1e-8, type=float, help="Epsilon for Adam optimizer.")
+    parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
+    parser.add_argument(
+        "--max_steps",
+        default=-1,
+        type=int,
+        help="If > 0: set total number of training steps to perform. Override num_train_epochs.",
+    )
+    parser.add_argument("--warmup_steps", default=0, type=int, help="Linear warmup over warmup_steps.")
+    parser.add_argument("--dropout_rate", default=0.1, type=float, help="Dropout for fully-connected layers")
+    parser.add_argument("--logging_steps", type=int, default=200, help="Log every X update steps.")
+    parser.add_argument("--save_steps", type=int, default=200, help="Save checkpoint every X update steps.")
+    parser.add_argument("--do_train", action="store_true", help="Whether to run training.")
+    parser.add_argument("--do_eval", action="store_true", help="Whether to run eval on the test set.")
+    parser.add_argument("--do_eval_dev", action="store_true", help="Whether to run eval on the dev set.")
+    parser.add_argument("--no_cuda", action="store_true", help="Avoid using CUDA when available")
+    parser.add_argument(
+        "--ignore_index",
+        default=0,
+        type=int,
+        help="Specifies a target value that is ignored and does not contribute to the input gradient",
+    )
+    parser.add_argument("--intent_loss_coef", type=float, default=0.5, help="Coefficient for the intent loss.")
+    parser.add_argument(
+        "--token_level",
+        type=str,
+        default="word-level",
+        help="Tokens are at syllable level or word level (Vietnamese) [word-level, syllable-level]",
+    )
+    parser.add_argument(
+        "--early_stopping",
+        type=int,
+        default=50,
+        help="Number of consecutive evaluations without improvement to wait before stopping early",
+    )
+    parser.add_argument("--gpu_id", type=int, default=0, help="Select gpu id")
+    # CRF option
+    parser.add_argument("--use_crf", action="store_true", help="Whether to use CRF")
+    # init pretrained
+    parser.add_argument("--pretrained", action="store_true", help="Whether to init model from pretrained base model")
+    parser.add_argument("--pretrained_path", default="./viatis_xlmr_crf", type=str, help="The pretrained model path")
+    # Slot-intent interaction
+    parser.add_argument(
+        "--use_intent_context_concat",
+        action="store_true",
+        help="Whether to feed context information of intent into slots vectors (simple concatenation)",
+    )
+    parser.add_argument(
+        "--use_intent_context_attention",
+        action="store_true",
+        help="Whether to feed context information of intent into slots vectors (dot product attention)",
+    )
+    parser.add_argument(
+        "--attention_embedding_size", type=int, default=200, help="hidden size of attention output vector"
+    )
+    parser.add_argument(
+        "--slot_pad_label",
+        default="PAD",
+        type=str,
+        help="Pad token for slot label padding (ignored when calculating the loss)",
+    )
+    parser.add_argument(
+        "--embedding_type", default="soft", type=str, help="Embedding type for intent vector (hard/soft)"
+    )
+    parser.add_argument("--use_attention_mask", action="store_true", help="Whether to use attention mask")
+    args = parser.parse_args()
+    args.model_name_or_path = MODEL_PATH_MAP[args.model_type]
+    main(args)
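
The `--ignore_index` flag (default 0) is reused by the trainer as the padding label id for slots (`pad_token_label_id = args.ignore_index`), so padded and non-first sub-token positions should not contribute to the loss. The pure-Python sketch below illustrates those semantics: it mirrors the mean-reduction behaviour documented for `torch.nn.CrossEntropyLoss(ignore_index=...)`, where ignored positions count in neither the sum nor the denominator. The function name is hypothetical. `model.py` is not part of this view, so how the model actually wires the loss (including the CRF path) is an assumption.

```python
import math
from typing import Sequence


def masked_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    ignore_index: int = 0,
) -> float:
    """Mean cross-entropy over positions whose label != ignore_index.

    Ignored positions add nothing to the sum and are not counted in the
    denominator. If every position is ignored this sketch returns 0.0
    (an arbitrary choice; PyTorch would produce nan instead).
    """
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padding / non-first sub-token: no loss, no gradient
        # Numerically stable log-softmax: -log p(label) = logsumexp(row) - row[label]
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0
```

For example, with `labels = [1, 0, 2]` the middle position is treated as padding, so its logits can be anything without changing the result.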

predict.py ADDED

	@@ -0,0 +1,232 @@

+import argparse
+import logging
+import os
+import numpy as np
+import torch
+from torch.utils.data import DataLoader, SequentialSampler, TensorDataset
+from tqdm import tqdm
+from utils import MODEL_CLASSES, get_intent_labels, get_slot_labels, init_logger, load_tokenizer
+logger = logging.getLogger(__name__)
+def get_device(pred_config):
+    return "cuda" if torch.cuda.is_available() and not pred_config.no_cuda else "cpu"
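+def get_args(pred_config):
+    # training_args.bin is a pickled argparse.Namespace, so it cannot be loaded with weights_only=True
+    args = torch.load(os.path.join(pred_config.model_dir, "training_args.bin"), weights_only=False)
+    # Load the model from the directory given on the command line rather than a hard-coded path
+    args.model_dir = pred_config.model_dir
+    args.data_dir = 'PhoATIS'
+    return args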
+def load_model(pred_config, args, device):
+    # Check whether model exists
+    if not os.path.exists(pred_config.model_dir):
+        raise Exception("Model doesn't exist! Train first!")
+    try:
+        model = MODEL_CLASSES[args.model_type][1].from_pretrained(
+            args.model_dir, args=args, intent_label_lst=get_intent_labels(args), slot_label_lst=get_slot_labels(args)
+        )
+        model.to(device)
+        model.eval()
+        logger.info("***** Model Loaded *****")
+    except Exception as e:
+        raise Exception("Some model files might be missing...") from e
+    return model
+def read_input_file(pred_config):
+    lines = []
+    with open(pred_config.input_file, "r", encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            words = line.split()
+            lines.append(words)
+    return lines
+def convert_input_file_to_tensor_dataset(
+    lines,
+    pred_config,
+    args,
+    tokenizer,
+    pad_token_label_id,
+    cls_token_segment_id=0,
+    pad_token_segment_id=0,
+    sequence_a_segment_id=0,
+    mask_padding_with_zero=True,
+):
+    # Setting based on the current model type
+    cls_token = tokenizer.cls_token
+    sep_token = tokenizer.sep_token
+    unk_token = tokenizer.unk_token
+    pad_token_id = tokenizer.pad_token_id
+    all_input_ids = []
+    all_attention_mask = []
+    all_token_type_ids = []
+    all_slot_label_mask = []
+    for words in lines:
+        tokens = []
+        slot_label_mask = []
+        for word in words:
+            word_tokens = tokenizer.tokenize(word)
+            if not word_tokens:
+                word_tokens = [unk_token]  # For handling the bad-encoded word
+            tokens.extend(word_tokens)
+            # Use the real label id for the first token of the word, and padding ids for the remaining tokens
+            slot_label_mask.extend([pad_token_label_id + 1] + [pad_token_label_id] * (len(word_tokens) - 1))
+        # Account for [CLS] and [SEP]
+        special_tokens_count = 2
+        if len(tokens) > args.max_seq_len - special_tokens_count:
+            tokens = tokens[: (args.max_seq_len - special_tokens_count)]
+            slot_label_mask = slot_label_mask[: (args.max_seq_len - special_tokens_count)]
+        # Add [SEP] token
+        tokens += [sep_token]
+        token_type_ids = [sequence_a_segment_id] * len(tokens)
+        slot_label_mask += [pad_token_label_id]
+        # Add [CLS] token
+        tokens = [cls_token] + tokens
+        token_type_ids = [cls_token_segment_id] + token_type_ids
+        slot_label_mask = [pad_token_label_id] + slot_label_mask
+        input_ids = tokenizer.convert_tokens_to_ids(tokens)
+        # The mask has 1 for real tokens and 0 for padding tokens. Only real tokens are attended to.
+        attention_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)
+        # Zero-pad up to the sequence length.
+        padding_length = args.max_seq_len - len(input_ids)
+        input_ids = input_ids + ([pad_token_id] * padding_length)
+        attention_mask = attention_mask + ([0 if mask_padding_with_zero else 1] * padding_length)
+        token_type_ids = token_type_ids + ([pad_token_segment_id] * padding_length)
+        slot_label_mask = slot_label_mask + ([pad_token_label_id] * padding_length)
+        all_input_ids.append(input_ids)
+        all_attention_mask.append(attention_mask)
+        all_token_type_ids.append(token_type_ids)
+        all_slot_label_mask.append(slot_label_mask)
+    # Change to Tensor
+    all_input_ids = torch.tensor(all_input_ids, dtype=torch.long)
+    all_attention_mask = torch.tensor(all_attention_mask, dtype=torch.long)
+    all_token_type_ids = torch.tensor(all_token_type_ids, dtype=torch.long)
+    all_slot_label_mask = torch.tensor(all_slot_label_mask, dtype=torch.long)
+    dataset = TensorDataset(all_input_ids, all_attention_mask, all_token_type_ids, all_slot_label_mask)
+    return dataset
+def predict(pred_config):
+    # load model and args
+    args = get_args(pred_config)
+    device = get_device(pred_config)
+    model = load_model(pred_config, args, device)
+    logger.info(args)
+    intent_label_lst = get_intent_labels(args)
+    slot_label_lst = get_slot_labels(args)
+    # Convert input file to TensorDataset
+    pad_token_label_id = args.ignore_index
+    tokenizer = load_tokenizer(args)
+    lines = read_input_file(pred_config)
+    dataset = convert_input_file_to_tensor_dataset(lines, pred_config, args, tokenizer, pad_token_label_id)
+    # Predict
+    sampler = SequentialSampler(dataset)
+    data_loader = DataLoader(dataset, sampler=sampler, batch_size=pred_config.batch_size)
+    all_slot_label_mask = None
+    intent_preds = None
+    slot_preds = None
+    for batch in tqdm(data_loader, desc="Predicting"):
+        batch = tuple(t.to(device) for t in batch)
+        with torch.no_grad():
+            inputs = {
+                "input_ids": batch[0],
+                "attention_mask": batch[1],
+                "intent_label_ids": None,
+                "slot_labels_ids": None,
+            }
+            if args.model_type != "distilbert":
+                inputs["token_type_ids"] = batch[2]
+            outputs = model(**inputs)
+            _, (intent_logits, slot_logits) = outputs[:2]
+            # Intent Prediction
+            if intent_preds is None:
+                intent_preds = intent_logits.detach().cpu().numpy()
+            else:
+                intent_preds = np.append(intent_preds, intent_logits.detach().cpu().numpy(), axis=0)
+            # Slot prediction
+            if slot_preds is None:
+                if args.use_crf:
+                    # decode() in `torchcrf` returns list with best index directly
+                    slot_preds = np.array(model.crf.decode(slot_logits))
+                else:
+                    slot_preds = slot_logits.detach().cpu().numpy()
+                all_slot_label_mask = batch[3].detach().cpu().numpy()
+            else:
+                if args.use_crf:
+                    slot_preds = np.append(slot_preds, np.array(model.crf.decode(slot_logits)), axis=0)
+                else:
+                    slot_preds = np.append(slot_preds, slot_logits.detach().cpu().numpy(), axis=0)
+                all_slot_label_mask = np.append(all_slot_label_mask, batch[3].detach().cpu().numpy(), axis=0)
+    intent_preds = np.argmax(intent_preds, axis=1)
+    if not args.use_crf:
+        slot_preds = np.argmax(slot_preds, axis=2)
+    slot_label_map = {i: label for i, label in enumerate(slot_label_lst)}
+    slot_preds_list = [[] for _ in range(slot_preds.shape[0])]
+    for i in range(slot_preds.shape[0]):
+        for j in range(slot_preds.shape[1]):
+            if all_slot_label_mask[i, j] != pad_token_label_id:
+                slot_preds_list[i].append(slot_label_map[slot_preds[i][j]])
+    # Write to output file
+    with open(pred_config.output_file, "w", encoding="utf-8") as f:
+        for words, word_slot_preds, intent_pred in zip(lines, slot_preds_list, intent_preds):
+            line = ""
+            for word, pred in zip(words, word_slot_preds):
+                if pred == "O":
+                    line = line + word + " "
+                else:
+                    line = line + "[{}:{}] ".format(word, pred)
+            f.write("<{}> -> {}\n".format(intent_label_lst[intent_pred], line.strip()))
+    logger.info("Prediction Done!")
+if __name__ == "__main__":
+    init_logger()
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--input_file", default="sample_pred_in.txt", type=str, help="Input file for prediction")
+    parser.add_argument("--output_file", default="sample_pred_out.txt", type=str, help="Output file for prediction")
+    parser.add_argument("--model_dir", default="./atis_model", type=str, help="Path to save, load model")
+    parser.add_argument("--batch_size", default=32, type=int, help="Batch size for prediction")
+    parser.add_argument("--no_cuda", action="store_true", help="Avoid using CUDA when available")
+    pred_config = parser.parse_args()
+    predict(pred_config)
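
`convert_input_file_to_tensor_dataset` marks only the first sub-token of each word as real (`pad_token_label_id + 1`). Every other position, including the two special tokens and the padding, gets `pad_token_label_id`, and `predict` keeps only the predictions at marked positions, so each word receives exactly one slot label. The sketch below reproduces that bookkeeping in plain Python. `toy_tokenize` is a hypothetical stand-in for the real PhoBERT/XLM-R tokenizer (it cuts words into 3-character chunks), and the label map used in the example is made up.

```python
from typing import Callable, Dict, List, Sequence


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a sub-word tokenizer: 3-character chunks."""
    return [word[i : i + 3] for i in range(0, len(word), 3)]


def build_slot_label_mask(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    max_seq_len: int,
    pad_token_label_id: int = 0,
) -> List[int]:
    """pad_token_label_id + 1 on each word's first sub-token, pad elsewhere."""
    mask: List[int] = []
    for word in words:
        pieces = tokenize(word) or ["<unk>"]  # fallback for words that tokenize to nothing
        mask.extend([pad_token_label_id + 1] + [pad_token_label_id] * (len(pieces) - 1))
    mask = mask[: max_seq_len - 2]  # leave room for the two special tokens
    mask = [pad_token_label_id] + mask + [pad_token_label_id]  # CLS ... SEP positions
    return mask + [pad_token_label_id] * (max_seq_len - len(mask))  # right padding


def word_level_predictions(
    pred_ids: Sequence[int],
    mask: Sequence[int],
    id2label: Dict[int, str],
    pad_token_label_id: int = 0,
) -> List[str]:
    """Keep exactly one prediction per (non-truncated) word."""
    return [id2label[p] for p, m in zip(pred_ids, mask) if m != pad_token_label_id]
```

Because of the truncation step, a word whose first sub-token falls past `max_seq_len - 2` gets no prediction, which is also why `zip(words, ...)` in the output loop silently drops trailing words of over-long inputs.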

predict.sh ADDED

	@@ -0,0 +1,3 @@

+python3 predict.py --input_file data/viatis/test/seq.in \
+                              --output_file predictions.txt \
+                              --model_dir viatis_phobert_crf_attn/4e-5/0.15
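
`predict.sh` writes one line per input utterance to `predictions.txt`, in the format built at the end of `predict()`: `<intent> -> ...`, where words predicted as `O` are written bare and every other word is written as `[word:slot]`. If you need to post-process that file, a minimal parser could look like the following. The function name is hypothetical, and it assumes slot labels never contain `:` and that untagged words are never themselves of the form `[...:...]`.

```python
from typing import List, Tuple


def parse_prediction_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse one line of predict.py output into (intent, [(word, slot), ...])."""
    head, sep, body = line.rstrip("\n").partition(" -> ")
    if not sep or not (head.startswith("<") and head.endswith(">")):
        raise ValueError(f"unexpected line format: {line!r}")
    pairs: List[Tuple[str, str]] = []
    for token in body.split():
        if token.startswith("[") and token.endswith("]") and ":" in token:
            # Tagged word: split on the last ':' so the word itself may contain ':'
            word, slot = token[1:-1].rsplit(":", 1)
            pairs.append((word, slot))
        else:
            pairs.append((token, "O"))  # bare words were predicted as "O"
    return head[1:-1], pairs
```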

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+torch==2.0.0
+transformers
+seqeval
+pytorch-crf
+tensorflow
+sentencepiece
+tensorboard
+numpy>=1.21.2
+tqdm
+typing_extensions
+protobuf<5,>=3.20.3

run_jointBERT-CRF_PhoBERTencoder.sh ADDED

	@@ -0,0 +1,23 @@

+export lr=3e-5
+export c=0.6
+export s=100
+echo "${lr}"
+export MODEL_DIR=JointBERT-CRF_PhoBERTencoder
+export MODEL_DIR=$MODEL_DIR"/"$lr"/"$c"/"$s
+echo "${MODEL_DIR}"
+python3 main.py --token_level word-level \
+                  --model_type phobert \
+                  --model_dir $MODEL_DIR \
+                  --data_dir PhoATIS \
+                  --seed $s \
+                  --do_train \
+                  --do_eval \
+                  --save_steps 140 \
+                  --logging_steps 140 \
+                  --num_train_epochs 50 \
+                  --tuning_metric mean_intent_slot \
+                  --use_crf \
+                  --gpu_id 0 \
+                  --embedding_type soft \
+                  --intent_loss_coef $c \
+                  --learning_rate $lr
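
The training scripts build the output directory as `$MODEL_DIR/$lr/$c/$s`, and the JointIDSF scripts later pass exactly that directory to `--pretrained_path`. The following hypothetical helper (not part of the repo) reproduces the shell concatenation, which makes it easy to check that the two scripts agree. Values are kept as strings because the shell uses them verbatim.

```python
import posixpath


def checkpoint_dir(base: str, lr: str, intent_loss_coef: str, seed: str) -> str:
    """Mirror MODEL_DIR=$MODEL_DIR"/"$lr"/"$c"/"$s from the run_*.sh scripts.

    Strings on purpose: str(3e-5) is "3e-05", which would not match the
    "3e-5" directory the shell script creates.
    """
    return posixpath.join(base, lr, intent_loss_coef, seed)


# Values copied from run_jointBERT-CRF_PhoBERTencoder.sh
base_dir = checkpoint_dir("JointBERT-CRF_PhoBERTencoder", "3e-5", "0.6", "100")
print(base_dir)  # JointBERT-CRF_PhoBERTencoder/3e-5/0.6/100
```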

run_jointBERT-CRF_XLM-Rencoder.sh ADDED

	@@ -0,0 +1,23 @@

+export lr=4e-5
+export c=0.45
+export s=10
+echo "${lr}"
+export MODEL_DIR=JointBERT-CRF_XLM-Rencoder
+export MODEL_DIR=$MODEL_DIR"/"$lr"/"$c"/"$s
+echo "${MODEL_DIR}"
+python3 main.py --token_level syllable-level \
+                  --model_type xlmr \
+                  --model_dir $MODEL_DIR \
+                  --data_dir PhoATIS \
+                  --seed $s \
+                  --do_train \
+                  --do_eval \
+                  --save_steps 140 \
+                  --logging_steps 140 \
+                  --num_train_epochs 50 \
+                  --tuning_metric mean_intent_slot \
+                  --use_crf \
+                  --gpu_id 0 \
+                  --embedding_type soft \
+                  --intent_loss_coef $c \
+                  --learning_rate $lr

run_jointIDSF_PhoBERTencoder.sh ADDED

	@@ -0,0 +1,30 @@

+# JointIDSF is initialized from a trained JointBERT-CRF checkpoint, so the base model must be trained first
+./run_jointBERT-CRF_PhoBERTencoder.sh
+#Train JointIDSF
+export lr=4e-5
+export c=0.15
+export s=100
+echo "${lr}"
+export MODEL_DIR=JointIDSF_PhoBERTencoder
+export MODEL_DIR=$MODEL_DIR"/"$lr"/"$c"/"$s
+echo "${MODEL_DIR}"
+python3 main.py --token_level word-level \
+                  --model_type phobert \
+                  --model_dir $MODEL_DIR \
+                  --data_dir PhoATIS \
+                  --seed $s \
+                  --do_train \
+                  --do_eval \
+                  --save_steps 140 \
+                  --logging_steps 140 \
+                  --num_train_epochs 50 \
+                  --tuning_metric mean_intent_slot \
+                  --use_intent_context_attention \
+                  --attention_embedding_size 200 \
+                  --use_crf \
+                  --gpu_id 0 \
+                  --embedding_type soft \
+                  --intent_loss_coef $c \
+                  --pretrained \
+                  --pretrained_path JointBERT-CRF_PhoBERTencoder/3e-5/0.6/100 \
+                  --learning_rate $lr
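
The two stages run `main.py` with almost the same flags. Stage 2 adds the intent-slot attention options and warm-starts from the stage-1 checkpoint via `--pretrained` / `--pretrained_path`. The sketch below is a hypothetical Python driver that assembles the same argument lists as the shell scripts without running them. To execute one, you would pass the list to `subprocess.run(cmd, check=True)`. The flag order differs from the scripts, which does not matter to `argparse`.

```python
from typing import List, Optional


def main_py_command(
    model_type: str,
    token_level: str,
    model_dir: str,
    seed: int,
    lr: str,
    intent_loss_coef: str,
    pretrained_path: Optional[str] = None,
) -> List[str]:
    """Build the main.py argument list used by the run_*.sh scripts."""
    cmd = [
        "python3", "main.py",
        "--token_level", token_level,
        "--model_type", model_type,
        "--model_dir", model_dir,
        "--data_dir", "PhoATIS",
        "--seed", str(seed),
        "--do_train", "--do_eval",
        "--save_steps", "140",
        "--logging_steps", "140",
        "--num_train_epochs", "50",
        "--tuning_metric", "mean_intent_slot",
        "--use_crf",
        "--gpu_id", "0",
        "--embedding_type", "soft",
        "--intent_loss_coef", intent_loss_coef,
        "--learning_rate", lr,
    ]
    if pretrained_path is not None:
        # Stage 2 (JointIDSF): intent-slot attention, warm-started from stage 1
        cmd += [
            "--use_intent_context_attention",
            "--attention_embedding_size", "200",
            "--pretrained",
            "--pretrained_path", pretrained_path,
        ]
    return cmd
```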

run_jointIDSF_XLM-Rencoder.sh ADDED

	@@ -0,0 +1,30 @@

+# JointIDSF is initialized from a trained JointBERT-CRF checkpoint, so the base model must be trained first
+./run_jointBERT-CRF_XLM-Rencoder.sh
+#Train JointIDSF
+export lr=3e-5
+export c=0.25
+export s=10
+echo "${lr}"
+export MODEL_DIR=JointIDSF_XLM-Rencoder
+export MODEL_DIR=$MODEL_DIR"/"$lr"/"$c"/"$s
+echo "${MODEL_DIR}"
+python3 main.py --token_level syllable-level \
+                  --model_type xlmr \
+                  --model_dir $MODEL_DIR \
+                  --data_dir PhoATIS \
+                  --seed $s \
+                  --do_train \
+                  --do_eval \
+                  --save_steps 140 \
+                  --logging_steps 140 \
+                  --num_train_epochs 50 \
+                  --tuning_metric mean_intent_slot \
+                  --use_intent_context_attention \
+                  --attention_embedding_size 200 \
+                  --use_crf \
+                  --gpu_id 0 \
+                  --embedding_type soft \
+                  --intent_loss_coef $c \
+                  --pretrained \
+                  --pretrained_path JointBERT-CRF_XLM-Rencoder/4e-5/0.45/10 \
+                  --learning_rate $lr

trainer.py ADDED

	@@ -0,0 +1,300 @@

+import logging
+import os
+import numpy as np
+import torch
+from early_stopping import EarlyStopping
+from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
+from torch.utils.tensorboard import SummaryWriter
+from tqdm.auto import tqdm, trange
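+from torch.optim import AdamW  # transformers.AdamW is deprecated; torch's AdamW takes the same param groups, lr and eps
+from transformers import get_linear_schedule_with_warmup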
+from utils import MODEL_CLASSES, compute_metrics, get_intent_labels, get_slot_labels
+logger = logging.getLogger(__name__)
+class Trainer(object):
+    def __init__(self, args, train_dataset=None, dev_dataset=None, test_dataset=None):
+        self.args = args
+        self.train_dataset = train_dataset
+        self.dev_dataset = dev_dataset
+        self.test_dataset = test_dataset
+        self.intent_label_lst = get_intent_labels(args)
+        self.slot_label_lst = get_slot_labels(args)
+        # Use cross entropy ignore index as padding label id so that only real label ids contribute to the loss later
+        self.pad_token_label_id = args.ignore_index
+        self.config_class, self.model_class, _ = MODEL_CLASSES[args.model_type]
+        # self.config = self.config_class.from_pretrained(model_path, finetuning_task=args.task)
+        if args.pretrained:
+            logger.info("Initializing model from pretrained checkpoint: %s", args.pretrained_path)
+            self.model = self.model_class.from_pretrained(
+                args.pretrained_path,
+                args=args,
+                intent_label_lst=self.intent_label_lst,
+                slot_label_lst=self.slot_label_lst,
+            )
+        else:
+            self.config = self.config_class.from_pretrained(args.model_name_or_path, finetuning_task=args.token_level)
+            self.model = self.model_class.from_pretrained(
+                args.model_name_or_path,
+                config=self.config,
+                args=args,
+                intent_label_lst=self.intent_label_lst,
+                slot_label_lst=self.slot_label_lst,
+            )
+        # GPU or CPU (only select a CUDA device when CUDA will actually be used)
+        self.device = "cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu"
+        if self.device == "cuda":
+            torch.cuda.set_device(self.args.gpu_id)
+            logger.info("Using GPU %d", torch.cuda.current_device())
+        self.model.to(self.device)
+    def train(self):
+        train_sampler = RandomSampler(self.train_dataset)
+        train_dataloader = DataLoader(self.train_dataset, sampler=train_sampler, batch_size=self.args.train_batch_size)
+        writer = SummaryWriter(log_dir=self.args.model_dir)
+        if self.args.max_steps > 0:
+            t_total = self.args.max_steps
+            self.args.num_train_epochs = (
+                self.args.max_steps // (len(train_dataloader) // self.args.gradient_accumulation_steps) + 1
+            )
+        else:
+            t_total = len(train_dataloader) // self.args.gradient_accumulation_steps * self.args.num_train_epochs
+        logger.info("Evaluating the initial model on the dev set")
+        results = self.evaluate("dev")
+        logger.info(results)
+        # Prepare optimizer and schedule (linear warmup and decay)
+        no_decay = ["bias", "LayerNorm.weight"]
+        optimizer_grouped_parameters = [
+            {
+                "params": [p for n, p in self.model.named_parameters() if not any(nd in n for nd in no_decay)],
+                "weight_decay": self.args.weight_decay,
+            },
+            {
+                "params": [p for n, p in self.model.named_parameters() if any(nd in n for nd in no_decay)],
+                "weight_decay": 0.0,
+            },
+        ]
+        optimizer = AdamW(optimizer_grouped_parameters, lr=self.args.learning_rate, eps=self.args.adam_epsilon)
+        scheduler = get_linear_schedule_with_warmup(
+            optimizer, num_warmup_steps=self.args.warmup_steps, num_training_steps=t_total
+        )
+        # Train!
+        logger.info("***** Running training *****")
+        logger.info("  Num examples = %d", len(self.train_dataset))
+        logger.info("  Num Epochs = %d", self.args.num_train_epochs)
+        logger.info("  Total train batch size = %d", self.args.train_batch_size)
+        logger.info("  Gradient Accumulation steps = %d", self.args.gradient_accumulation_steps)
+        logger.info("  Total optimization steps = %d", t_total)
+        logger.info("  Logging steps = %d", self.args.logging_steps)
+        logger.info("  Save steps = %d", self.args.save_steps)
+        global_step = 0
+        tr_loss = 0.0
+        self.model.zero_grad()
+        train_iterator = trange(int(self.args.num_train_epochs), desc="Epoch")
+        early_stopping = EarlyStopping(patience=self.args.early_stopping, verbose=True)
+        for _ in train_iterator:
+            epoch_iterator = tqdm(train_dataloader, desc="Iteration", position=0, leave=True)
+            print("\nEpoch", _)
+            for step, batch in enumerate(epoch_iterator):
+                self.model.train()
+                batch = tuple(t.to(self.device) for t in batch)  # GPU or CPU
+                inputs = {
+                    "input_ids": batch[0],
+                    "attention_mask": batch[1],
+                    "intent_label_ids": batch[3],
+                    "slot_labels_ids": batch[4],
+                }
+                if self.args.model_type != "distilbert":
+                    inputs["token_type_ids"] = batch[2]
+                outputs = self.model(**inputs)
+                loss = outputs[0]
+                if self.args.gradient_accumulation_steps > 1:
+                    loss = loss / self.args.gradient_accumulation_steps
+                loss.backward()
+                tr_loss += loss.item()
+                if (step + 1) % self.args.gradient_accumulation_steps == 0:
+                    torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.args.max_grad_norm)
+                    optimizer.step()
+                    scheduler.step()  # Update learning rate schedule
+                    self.model.zero_grad()
+                    global_step += 1
+                    if self.args.logging_steps > 0 and global_step % self.args.logging_steps == 0:
+                        print("\nTuning metrics:", self.args.tuning_metric)
+                        results = self.evaluate("dev")
+                        # Log against global_step: several evaluations can happen within one epoch
+                        writer.add_scalar("Loss/validation", results["loss"], global_step)
+                        writer.add_scalar("Intent Accuracy/validation", results["intent_acc"], global_step)
+                        writer.add_scalar("Slot F1/validation", results["slot_f1"], global_step)
+                        writer.add_scalar("Mean Intent Slot", results["mean_intent_slot"], global_step)
+                        writer.add_scalar("Sentence Accuracy/validation", results["semantic_frame_acc"], global_step)
+                        early_stopping(results[self.args.tuning_metric], self.model, self.args)
+                        if early_stopping.early_stop:
+                            print("Early stopping")
+                            break
+                    # if self.args.save_steps > 0 and global_step % self.args.save_steps == 0:
+                    #     self.save_model()
+                if 0 < self.args.max_steps < global_step:
+                    epoch_iterator.close()
+                    break
+            if 0 < self.args.max_steps < global_step or early_stopping.early_stop:
+                train_iterator.close()
+                break
+            writer.add_scalar("Loss/train", tr_loss / global_step, _)
+        writer.close()
+        return global_step, tr_loss / global_step
+    def write_evaluation_result(self, out_file, results):
+        os.makedirs(self.args.model_dir, exist_ok=True)
+        out_file = os.path.join(self.args.model_dir, out_file)
+        with open(out_file, "w", encoding="utf-8") as w:
+            w.write("***** Eval results *****\n")
+            for key in sorted(results.keys()):
+                w.write(" {key} = {value}\n".format(key=key, value=str(results[key])))
+    def evaluate(self, mode):
+        if mode == "test":
+            dataset = self.test_dataset
+        elif mode == "dev":
+            dataset = self.dev_dataset
+        else:
+            raise Exception("Only dev and test dataset available")
+        eval_sampler = SequentialSampler(dataset)
+        eval_dataloader = DataLoader(dataset, sampler=eval_sampler, batch_size=self.args.eval_batch_size)
+        # Eval!
+        logger.info("***** Running evaluation on %s dataset *****", mode)
+        logger.info("  Num examples = %d", len(dataset))
+        logger.info("  Batch size = %d", self.args.eval_batch_size)
+        eval_loss = 0.0
+        nb_eval_steps = 0
+        intent_preds = None
+        slot_preds = None
+        out_intent_label_ids = None
+        out_slot_labels_ids = None
+        self.model.eval()
+        for batch in tqdm(eval_dataloader, desc="Evaluating"):
+            batch = tuple(t.to(self.device) for t in batch)
+            with torch.no_grad():
+                inputs = {
+                    "input_ids": batch[0],
+                    "attention_mask": batch[1],
+                    "intent_label_ids": batch[3],
+                    "slot_labels_ids": batch[4],
+                }
+                if self.args.model_type != "distilbert":
+                    inputs["token_type_ids"] = batch[2]
+                outputs = self.model(**inputs)
+                tmp_eval_loss, (intent_logits, slot_logits) = outputs[:2]
+                eval_loss += tmp_eval_loss.mean().item()
+            nb_eval_steps += 1
+            # Intent prediction
+            if intent_preds is None:
+                intent_preds = intent_logits.detach().cpu().numpy()
+                out_intent_label_ids = inputs["intent_label_ids"].detach().cpu().numpy()
+            else:
+                intent_preds = np.append(intent_preds, intent_logits.detach().cpu().numpy(), axis=0)
+                out_intent_label_ids = np.append(
+                    out_intent_label_ids, inputs["intent_label_ids"].detach().cpu().numpy(), axis=0
+                )
+            # Slot prediction
+            if slot_preds is None:
+                if self.args.use_crf:
+                    # decode() in `torchcrf` returns list with best index directly
+                    slot_preds = np.array(self.model.crf.decode(slot_logits))
+                else:
+                    slot_preds = slot_logits.detach().cpu().numpy()
+                out_slot_labels_ids = inputs["slot_labels_ids"].detach().cpu().numpy()
+            else:
+                if self.args.use_crf:
+                    slot_preds = np.append(slot_preds, np.array(self.model.crf.decode(slot_logits)), axis=0)
+                else:
+                    slot_preds = np.append(slot_preds, slot_logits.detach().cpu().numpy(), axis=0)
+                out_slot_labels_ids = np.append(
+                    out_slot_labels_ids, inputs["slot_labels_ids"].detach().cpu().numpy(), axis=0
+                )
+        eval_loss = eval_loss / nb_eval_steps
+        results = {"loss": eval_loss}
+        # Intent result
+        intent_preds = np.argmax(intent_preds, axis=1)
+        # Slot result
+        if not self.args.use_crf:
+            slot_preds = np.argmax(slot_preds, axis=2)
+        slot_label_map = {i: label for i, label in enumerate(self.slot_label_lst)}
+        out_slot_label_list = [[] for _ in range(out_slot_labels_ids.shape[0])]
+        slot_preds_list = [[] for _ in range(out_slot_labels_ids.shape[0])]
+        for i in range(out_slot_labels_ids.shape[0]):
+            for j in range(out_slot_labels_ids.shape[1]):
+                if out_slot_labels_ids[i, j] != self.pad_token_label_id:
+                    out_slot_label_list[i].append(slot_label_map[out_slot_labels_ids[i][j]])
+                    slot_preds_list[i].append(slot_label_map[slot_preds[i][j]])
+        total_result = compute_metrics(intent_preds, out_intent_label_ids, slot_preds_list, out_slot_label_list)
+        results.update(total_result)
+        logger.info("***** Eval results *****")
+        for key in sorted(results.keys()):
+            logger.info("  %s = %s", key, str(results[key]))
+        if mode == "test":
+            self.write_evaluation_result("eval_test_results.txt", results)
+        elif mode == "dev":
+            self.write_evaluation_result("eval_dev_results.txt", results)
+        return results
+    def save_model(self):
+        # Save model checkpoint (Overwrite)
+        if not os.path.exists(self.args.model_dir):
+            os.makedirs(self.args.model_dir)
+        model_to_save = self.model.module if hasattr(self.model, "module") else self.model
+        model_to_save.save_pretrained(self.args.model_dir)
+        # Save training arguments together with the trained model
+        torch.save(self.args, os.path.join(self.args.model_dir, "training_args.bin"))
+        logger.info("Saving model checkpoint to %s", self.args.model_dir)
+    def load_model(self):
+        # Check whether model exists
+        if not os.path.exists(self.args.model_dir):
+            raise Exception("Model doesn't exist! Train first!")
+        try:
+            self.model = self.model_class.from_pretrained(
+                self.args.model_dir,
+                args=self.args,
+                intent_label_lst=self.intent_label_lst,
+                slot_label_lst=self.slot_label_lst,
+            )
+            self.model.to(self.device)
+            logger.info("***** Model Loaded *****")
+        except Exception as e:
+            raise Exception("Some model files might be missing...") from e
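
`Trainer.train` derives the total number of optimizer updates `t_total` from the dataloader length, `gradient_accumulation_steps` and `num_train_epochs`, or from `max_steps` when it is positive. It passes `t_total` to `get_linear_schedule_with_warmup` and calls `scheduler.step()` once per optimizer update. The pure-Python sketch below reproduces both computations. The multiplier follows the documented behaviour of that Transformers scheduler: a linear ramp from 0 to 1 over the warmup steps, then linear decay to 0 at `t_total`. This is an illustration, not the library code, and the function names and example numbers are hypothetical.

```python
from typing import Tuple


def total_steps(num_batches: int, grad_accum: int, num_epochs: int, max_steps: int = 0) -> Tuple[int, int]:
    """Return (t_total, num_epochs) the same way Trainer.train computes them."""
    updates_per_epoch = num_batches // grad_accum
    if max_steps > 0:
        # max_steps wins; epochs are recomputed so the loop runs long enough
        return max_steps, max_steps // updates_per_epoch + 1
    return updates_per_epoch * num_epochs, num_epochs


def linear_warmup_multiplier(step: int, warmup_steps: int, t_total: int) -> float:
    """Learning-rate multiplier: 0 -> 1 during warmup, then linearly down to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (t_total - step) / max(1, t_total - warmup_steps))
```

For instance, 100 batches per epoch for 50 epochs without accumulation gives `t_total = 5000`. With a hypothetical 500 warmup steps, the learning rate peaks at step 500 and is halved again by step 2750.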

utils.py ADDED

	@@ -0,0 +1,115 @@

+import logging
+import os
+import random
+import numpy as np
+import torch
+from model import JointPhoBERT, JointXLMR
+from seqeval.metrics import f1_score, precision_score, recall_score
+from transformers import (
+    AutoTokenizer,
+    RobertaConfig,
+    XLMRobertaConfig,
+    XLMRobertaTokenizer,
+)
+MODEL_CLASSES = {
+    "xlmr": (XLMRobertaConfig, JointXLMR, XLMRobertaTokenizer),
+    "phobert": (RobertaConfig, JointPhoBERT, AutoTokenizer),
+}
+MODEL_PATH_MAP = {
+    "xlmr": "xlm-roberta-base",
+    "phobert": "vinai/phobert-base",
+}
+def get_intent_labels(args):
+    with open(os.path.join(args.data_dir, args.token_level, args.intent_label_file), "r", encoding="utf-8") as f:
+        return [label.strip() for label in f]
+def get_slot_labels(args):
+    with open(os.path.join(args.data_dir, args.token_level, args.slot_label_file), "r", encoding="utf-8") as f:
+        return [label.strip() for label in f]
+def load_tokenizer(args):
+    return MODEL_CLASSES[args.model_type][2].from_pretrained(args.model_name_or_path)
+def init_logger():
+    logging.basicConfig(
+        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
+        datefmt="%m/%d/%Y %H:%M:%S",
+        level=logging.INFO,
+    )
+def set_seed(args):
+    random.seed(args.seed)
+    np.random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    if not args.no_cuda and torch.cuda.is_available():
+        torch.cuda.manual_seed_all(args.seed)
+def compute_metrics(intent_preds, intent_labels, slot_preds, slot_labels):
+    assert len(intent_preds) == len(intent_labels) == len(slot_preds) == len(slot_labels)
+    results = {}
+    intent_result = get_intent_acc(intent_preds, intent_labels)
+    slot_result = get_slot_metrics(slot_preds, slot_labels)
+    semantic_result = get_sentence_frame_acc(intent_preds, intent_labels, slot_preds, slot_labels)
+    mean_intent_slot = (intent_result["intent_acc"] + slot_result["slot_f1"]) / 2
+    results.update(intent_result)
+    results.update(slot_result)
+    results.update(semantic_result)
+    results["mean_intent_slot"] = mean_intent_slot
+    return results
+def get_slot_metrics(preds, labels):
+    assert len(preds) == len(labels)
+    return {
+        "slot_precision": precision_score(labels, preds),
+        "slot_recall": recall_score(labels, preds),
+        "slot_f1": f1_score(labels, preds),
+    }
+def get_intent_acc(preds, labels):
+    acc = (preds == labels).mean()
+    return {"intent_acc": acc}
+def read_prediction_text(args):
+    with open(os.path.join(args.pred_dir, args.pred_input_file), "r", encoding="utf-8") as f:
+        return [text.strip() for text in f]
+def get_sentence_frame_acc(intent_preds, intent_labels, slot_preds, slot_labels):
+    """For the cases that intent and all the slots are correct (in one sentence)"""
+    # Get the intent comparison result
+    intent_result = intent_preds == intent_labels
+    # Get the slot comparison result
+    slot_result = []
+    for preds, labels in zip(slot_preds, slot_labels):
+        assert len(preds) == len(labels)
+        one_sent_result = True
+        for p, l in zip(preds, labels):
+            if p != l:
+                one_sent_result = False
+                break
+        slot_result.append(one_sent_result)
+    slot_result = np.array(slot_result)
+    semantic_acc = np.multiply(intent_result, slot_result).mean()
+    return {"semantic_frame_acc": semantic_acc}
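
`compute_metrics` reports intent accuracy, entity-level slot precision/recall/F1 (via seqeval), and semantic frame accuracy: the share of utterances whose intent and every slot tag are all correct. It also reports `mean_intent_slot`, the plain average of intent accuracy and slot F1 that the run scripts pass as `--tuning_metric` for early stopping. The NumPy-free sketch below restates the two metrics computed in this file. Slot F1 itself is left to seqeval, and the function names and example labels are hypothetical.

```python
from typing import List, Sequence


def sentence_frame_accuracy(
    intent_preds: Sequence[int],
    intent_labels: Sequence[int],
    slot_preds: Sequence[List[str]],
    slot_labels: Sequence[List[str]],
) -> float:
    """Fraction of utterances whose intent AND every slot tag are correct."""
    assert len(intent_preds) == len(intent_labels) == len(slot_preds) == len(slot_labels)
    correct = sum(
        ip == il and sp == sl  # list equality: same length and same tag at every position
        for ip, il, sp, sl in zip(intent_preds, intent_labels, slot_preds, slot_labels)
    )
    return correct / len(intent_preds)


def mean_intent_slot(intent_acc: float, slot_f1: float) -> float:
    """Plain average used as the tuning metric in the run_*.sh scripts."""
    return (intent_acc + slot_f1) / 2
```

An utterance with the right intent but a single wrong slot tag counts as a miss, which is why semantic frame accuracy is usually the strictest of the reported numbers.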