Add new SentenceTransformer model

Files changed (11)

1_Pooling/config.json +10 -0
README.md +1638 -0
config.json +26 -0
config_sentence_transformers.json +10 -0
model.safetensors +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +65 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
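
Reviewer note: the flags above enable only `pooling_mode_mean_tokens`, so the sentence vector is presumably the average of the per-token vectors (384 dimensions each, per `word_embedding_dimension`). The following is a minimal plain-Python sketch of that idea, not the sentence-transformers implementation. The function name `mean_pool`, the attention-mask handling, and the toy 2-dimensional vectors are assumptions made for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]],
              attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding is ignored.

    Hypothetical helper for illustration only.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [t / count for t in total]


# Toy input: three tokens of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
```

With the toy input, only the first two tokens contribute, so the padded third vector does not affect the result. The real model would apply the same averaging over 384-dimensional token vectors.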

README.md ADDED

	@@ -0,0 +1,1638 @@

+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:3742
+- loss:SoftmaxLoss
+base_model: sentence-transformers/all-MiniLM-L6-v2
+widget:
+- source_sentence: 'As the year draws to a close, we have seen the number of emerging
+    threats like advance phishing attacks from the Syrian Electronic Army, financial
+    malware and exploit kits, Cryptolocker ransomware infections, massive Bitcoin
+    theft, extensive privacy breach from NSA and many more.
+    The financial malware''s were the most popular threat this year. Money is always
+    a perfect motivation for attackers and cyber criminals who are continually targeting
+    financial institutions.
+    On Tuesday, Antivirus firm Symantec has released a Threat report, called "The
+    State of Financial Trojans: 2013", which revealed that over 1,400 financial institutions
+    have been targeted and compromised millions of computers around the globe and
+    the most targeted banks are in the US with 71.5% of all analyzed Trojans.
+    Financial institutions have been fighting against malware for the last ten years
+    to protect their customers and online transactions from threat. Over the time
+    the attackers adapted to these countermeasures and sophisticated banking Trojans
+    began to emerge.
+    According to the report, the number of infections of the most common financial
+    Trojans grew to 337 percent in the first nine months of 2013. Nearly 1,500 institutions
+    in 88 countries were potential targets during 2013.
+    The financial fraud marketplace is also increasingly organized and Cyber criminals
+    are using advanced Trojans to commit large scale attacks.
+    Attackers of all skill levels can enter the arena of financial fraud, as the underground
+    marketplace is a service industry that provides an abundance of resources. Those
+    who lack expertise can simply purchase what they need. For as little as $100,
+    an attacker can avail of a leaked Zeus or Spyeye equipped with Web-injects.
+    The modern financial Trojan is extremely flexible, supporting a range of functionality
+    designed to facilitate fraudulent transactions across a variety of services.
+    Two dominant attack strategies are:
+    Focused attack: This approach suits attackers with limited resources but also
+    scales well to larger operations. If the distribution is accurate and the target
+    institution has a sizeable client base, a focused attack can provide an adequate
+    supply of targets. Shylock, Bebloh and Tilon all use this approach exclusively.
+    Broad strokes: In this attack strategy, Trojans are set to target large numbers
+    of institutions. Tilon, Cridex, and Gameover adopt these tactics and Zeus also
+    uses this approach in its default configuration.
+    According to Symantec, the main reason for the surge is weak authentication practices:
+    Unfortunately, in many situations, security implementations adopted by financial
+    institutions are inadequate to defend against the modern financial Trojan. Institutions
+    are starting to adopt strong security measures like chipTAN, but the adoption
+    rate is slow. Institutions that persist with weaker security measures will continue
+    to be exploited by attackers.
+    They need to maintain constant vigilance, apply software updates, maintain an
+    awareness of new threats and deploy complementary security solutions that can
+    defend against evolving malware attacks.
+    '
+  sentences:
+  - 'As the year draws to a close, we have seen the number of emerging threats like
+    advance phishing attacks from the Syrian Electronic Army, financial malware and
+    exploit kits, Cryptolocker ransomware infections, massive Bitcoin theft, extensive
+    privacy breach from NSA and many more.
+    The financial malware''s were the most popular threat this year. Money is always
+    a perfect motivation for attackers and cyber criminals who are continually targeting
+    financial institutions.
+    On Tuesday, Antivirus firm Symantec has released a Threat report, called "The
+    State of Financial Trojans: 2013", which revealed that over 1,400 financial institutions
+    have been targeted and compromised millions of computers around the globe and
+    the most targeted banks are in the US with 71.5% of all analyzed Trojans.
+    Financial institutions have been fighting against malware for the last ten years
+    to protect their customers and online transactions from threat. Over the time
+    the attackers adapted to these countermeasures and sophisticated banking Trojans
+    began to emerge.
+    According to the report, the number of infections of the most common financial
+    Trojans grew to 337 percent in the first nine months of 2013. Nearly 1,500 institutions
+    in 88 countries were potential targets during 2013.
+    The financial fraud marketplace is also increasingly organized and Cyber criminals
+    are using advanced Trojans to commit large scale attacks.
+    Attackers of all skill levels can enter the arena of financial fraud, as the underground
+    marketplace is a service industry that provides an abundance of resources. Those
+    who lack expertise can simply purchase what they need. For as little as $100,
+    an attacker can avail of a leaked Zeus or Spyeye equipped with Web-injects.
+    The modern financial Trojan is extremely flexible, supporting a range of functionality
+    designed to facilitate fraudulent transactions across a variety of services.
+    Two dominant attack strategies are:
+    Focused attack: This approach suits attackers with limited resources but also
+    scales well to larger operations. If the distribution is accurate and the target
+    institution has a sizeable client base, a focused attack can provide an adequate
+    supply of targets. Shylock, Bebloh and Tilon all use this approach exclusively.
+    Broad strokes: In this attack strategy, Trojans are set to target large numbers
+    of institutions. Tilon, Cridex, and Gameover adopt these tactics and Zeus also
+    uses this approach in its default configuration.
+    According to Symantec, the main reason for the surge is weak authentication practices:
+    Unfortunately, in many situations, security implementations adopted by financial
+    institutions are inadequate to defend against the modern financial Trojan. Institutions
+    are starting to adopt strong security measures like chipTAN, but the adoption
+    rate is slow. Institutions that persist with weaker security measures will continue
+    to be exploited by attackers.
+    They need to maintain constant vigilance, apply software updates, maintain an
+    awareness of new threats and deploy complementary security solutions that can
+    defend against evolving malware attacks.
+    '
+  - 'While Windows users are currently in fear of getting their systems hijacked by
+    the WannaCry ransomware outbreak, Apple users are sitting relaxed, thinking that
+    malware attacks are something that happens to Windows users, and not Apple.
+    But you are mistaken – Apple products are also not immune to the hack attacks
+    and malware infections, as an ebook can hack your Mac, iPhone, and iPad.
+    Apple on Monday pushed out software updates for iOS, macOS, Safari, tvOS, iCloud,
+    iTunes, and watchOS to fix a total of 67 unique security vulnerabilities, many
+    of which allows attackers to perform remote code execution on an affected system.
+    iOS is 10.3.2 for iPhone, iPad, and iPod
+    Apple''s mobile operating system iOS 10.3.2 for the iPhone, iPad and iPod touch
+    addresses 41 security flaws, 23 of which resides in WebKit, including 17 remote
+    code execution and 5 cross-site scripting (XSS) vulnerabilities.
+    Besides this, iOS 10.3.2 also addresses a pair of flaws in iBooks for iOS (CVE-2017-2497,
+    CVE-2017-6981) that could allow e-books to open arbitrary websites and execute
+    malicious code with root privileges.
+    Other flaws addressed in iOS 10.3.2 include a memory corruption issue in AVE Video
+    Encoder that could allow a malicious application to gain kernel-level privileges,
+    and a certificate validation issue in the certificate trust policy for handling
+    of untrusted certificates.
+    Apple users can install iOS 10.3.2 by connecting their iOS devices to iTunes or
+    downloading it directly by going to the Settings → General → Software Update.
+    macOS Sierra 10.12.5 for El Capitan and Yosemite
+    Apple''s Mac operating system macOS Sierra 10.12.5 addresses a total of 37 vulnerabilities,
+    including a pair of bugs in iBook that allow the execution of arbitrary code with
+    root privileges, and a separate bug in iBook that allows an application to escape
+    its secure sandbox.
+    Other flaws addressed in macOS Sierra 10.12.5 include a Wi-Fi networking issue
+    that allows the theft of network credentials, elevation of privilege bugs in both
+    the Intel and Nvidia graphics drivers, and four different arbitrary code execution
+    flaws in SQLite.
+    Mac users can download the update through the App Store → Updates. Alternatively,
+    macOS Sierra users can be download Sierra 10.12.5 as a stand-alone update, OS
+    X El Capitan users can download the update here, and OS X Yosemite users can get
+    the security update here.
+    Safari 10.1.1 for Apple Browser
+    Safari 10.1.1 addresses a total of 26 security issues, 23 of which resides in
+    WebKit, many of which are also patched in the iOS 10.3.2.
+    Rest three vulnerabilities are patched in the Safari browser itself.
+    The Safari 10.1.1 update can be downloaded by going to the App Store → Updates
+    on El Capitan and Yosemite systems.
+    watchOS 3.2.2 for Apple Watch
+    Apple Watch users should install watchOS 3.2.2 that patches a total of 12 security
+    vulnerabilities, four of which could be used by attackers to execute remote code
+    execution on the affected device.
+    Users of Apple Watch can download watchOS 3.2.2 by connecting their watch to its
+    charger, and opening the Apple Watch app → My Watch tab → General → Software Update
+    on their iPhone.
+    tvOS 10.2.1 for Apple TV
+    Apple has also released tvOS 10.2.1 to patch a total of 23 vulnerabilities, 12
+    of which resides in WebKit engine that could allow an attacker to perform cross-site
+    scripting and remote code execution attacks on a target device.
+    The tvOS 10.2.1 update can be downloaded directly from the Apple TV by going to
+    Settings → System → Update Software.
+    iTunes 12.6.1 for Windows and iCloud for Windows 6.2.1
+    Meanwhile, Apple also released patches for Windows users using iTunes and iCloud.
+    Both iTunes 12.6.1 and iCloud 6.2.1 patches a single remote code execution bug
+    in WebKit for Windows 7 and later.
+    Apple users are recommended to update all their operating systems for Apple products
+    and Safari as soon as possible before cyber criminals exploited them. Patches
+    are available through automatic updates.
+    '
+  - 'A really bad year for the world''s second-largest email service provider, Yahoo
+    Mail! The company announced today, ''we identified a coordinated effort to gain
+    unauthorized access to Yahoo Mail accounts'', user names and passwords of its
+    email customers have been stolen and are used to access multiple accounts.
+    Yahoo did not say how many accounts have been affected, and neither they are sure
+    about the source of the leaked users'' credentials. It appears to have come from
+    a third party database being compromised, and not an infiltration of Yahoo''s
+    own servers.
+    "We have no evidence that they were obtained directly from Yahoo''s systems. Our
+    ongoing investigation shows that malicious computer software used the list of
+    usernames and passwords to access Yahoo Mail accounts. The information sought
+    in the attack seems to be names and email addresses from the affected accounts''
+    most recent sent emails."
+    For now, Yahoo is taking proactive actions to protect their affected users, "We
+    are resetting passwords on impacted accounts and we are using second sign-in verification
+    to allow users to re-secure their accounts. Impacted users will be prompted (if
+    not, already) to change their password and may receive an email notification or
+    an SMS text if they have added a mobile number to their account."
+    People frequently use the same passwords on multiple accounts, so possibly hackers
+    are brute-forcing Yahoo accounts with the user credentials stolen from other data
+    breaches.
+    Yahoo users can prevent account hijacks by using a strong and unique password.
+    You can use ''Random strong password generator'' feature of DuckDuckGo search
+    engine to get a unique & strong password.
+    Users are also recommended to enable two-factor authentication, which requires
+    a code texted to the legitimate user''s mobile phone whenever a login attempt
+    is made from a new computer.
+    Yahoo! was hacked in July 2012, with attackers stealing 450,000 email addresses
+    and passwords from a Yahoo! contributor network.
+    Readers can also download two free Whitepaper related to the Email and account
+    security:
+    Cloud-Based Email Archiving
+    Email Data Loss Prevention
+    Well, Yahoo is now working with federal law enforcement as a part of its investigation.
+    '
+- source_sentence: 'Security researchers have spotted a new malware campaign in the
+    wild that spreads an advanced botnet malware by leveraging at least three recently
+    disclosed vulnerabilities in Microsoft Office.
+    Dubbed Zyklon, the fully-featured malware has resurfaced after almost two years
+    and primarily found targeting telecommunications, insurance and financial services.
+    Active since early 2016, Zyklon is an HTTP botnet malware that communicates with
+    its command-and-control servers over Tor anonymising network and allows attackers
+    to remotely steal keylogs, sensitive data, like passwords stored in web browsers
+    and email clients.
+    Zyklon malware is also capable of executing additional plugins, including secretly
+    using infected systems for DDoS attacks and cryptocurrency mining.
+    Different versions of the Zyklon malware has previously been found being advertised
+    on a popular underground marketplace for $75 (normal build) and $125 ( Tor-enabled
+    build).
+    According to a recently published report by FireEye, the attackers behind the
+    campaign are leveraging three following vulnerabilities in Microsoft Office that
+    execute a PowerShell script on the targeted computers to download the final payload
+    from its C&C server.
+    1) .NET Framework RCE Vulnerability (CVE-2017-8759)—this remote code execution
+    vulnerability exists when Microsoft .NET Framework processes untrusted input,
+    allowing an attacker to take control of an affected system by tricking victims
+    into opening a specially crafted malicious document file sent over an email. Microsoft
+    already released a security patch for this flaw in September updates.
+    2) Microsoft Office RCE Vulnerability (CVE-2017-11882)—it''s a 17-year-old memory
+    corruption flaw that Microsoft patched in November patch update allows a remote
+    attacker to execute malicious code on the targeted systems without requiring any
+    user interaction after opening a malicious document.
+    3) Dynamic Data Exchange Protocol (DDE Exploit)—this technique allows attackers
+    to leverage a built-in feature of Microsoft Office, called DDE, to perform code
+    execution on the targeted device without requiring Macros to be enabled or memory
+    corruption.
+    As explained by the researchers, attackers are actively exploiting these three
+    vulnerabilities to deliver Zyklon malware using spear phishing emails, which typically
+    arrives with an attached ZIP file containing a malicious Office doc file.
+    Once opened, the malicious doc file equipped with one of these vulnerabilities
+    immediately runs a PowerShell script, which eventually downloads the final payload,
+    i.e., Zyklon HTTP malware, onto the infected computer.
+    "In all these techniques, the same domain is used to download the next level payload
+    (Pause.ps1), which is another PowerShell script that is Base64 encoded," the FireEye
+    researchers said.
+    "The Pause.ps1 script is responsible for resolving the APIs required for code
+    injection. It also contains the injectable shellcode."
+    "The injected code is responsible for downloading the final payload from the server.
+    The final stage payload is a PE executable compiled with .Net framework."
+    Interestingly, the PowerShell script connects to a dotless IP address (example:
+    https://3627732942) to download the final payload.
+    What is Dotless IP Address? If you are unaware, dotless IP addresses, sometimes
+    referred as ''Decimal Address,'' are decimal values of IPv4 addresses (represented
+    as dotted-quad notation). Almost all modern web browsers resolve decimal IP address
+    to its equivalent IPV4 address when opened with "https://" following the decimal
+    value.
+    For example, Google''s IP address 216.58.207.206 can also be represented as https://3627732942
+    in decimal values (Try this online converter).
+    The best way to protect yourself and your organisation from such malware attacks
+    are always to be suspicious of any uninvited document sent via an email and never
+    click on links inside those documents unless adequately verifying the source.
+    Most importantly, always keep your software and systems up-to-date, as threat
+    actors incorporate recently discovered, but patched, vulnerabilities in popular
+    software—Microsoft Office, in this case—to increase the potential for successful
+    infections.
+    '
+  sentences:
+  - 'India-linked highly targeted mobile malware campaign, first unveiled two weeks
+    ago, has been found to be part of a broader campaign targeting multiple platforms,
+    including windows devices and possibly Android as well.
+    As reported in our previous article, earlier this month researchers at Talos threat
+    intelligence unit discovered a group of Indian hackers abusing mobile device management
+    (MDM) service to hijack and spy on a few targeted iPhone users in India.
+    Operating since August 2015, the attackers have been found abusing MDM service
+    to remotely install malicious versions of legitimate apps, including Telegram,
+    WhatsApp, and PrayTime, onto targeted iPhones.
+    These modified apps have been designed to secretly spy on iOS users, and steal
+    their real-time location, SMS, contacts, photos and private messages from third-party
+    chatting applications.
+    During their ongoing investigation, Talos researchers identified a new MDM infrastructure
+    and several malicious binaries – designed to target victims running Microsoft
+    Windows operating systems – hosted on the same infrastructure used in previous
+    campaigns.
+    Ios-update-whatsapp[.]com (new)
+    Wpitcher[.]com
+    Ios-certificate-update.com
+    "We know that the MDM and the Windows services were up and running on the same
+    C2 server in May 2018," researchers said in a blog post published today.
+    "Some of the C2 servers are still up and running at this time. The Apache setup
+    is very specific, and perfectly matched the Apache setup of the malicious IPA
+    apps."
+    Possible Connections with "Bahamut Hacking Group"
+    Besides this, researchers also found some potential similarities that link this
+    campaign with an old hacking group, dubbed "Bahamut," an advanced threat actor
+    who was previously targeting Android devices using similar MDM technique as used
+    in the latest iOS malware campaign.
+    The newly identified MDM infrastructure, which was created in January 2018, and
+    used from January to March of this year, targeted two Indian devices and one located
+    in Qatar with a British phone number.
+    According to the researchers, Bahamut also targeted similar Qatar-based individuals
+    during their Android malware campaign, as detailed by Bellingcat in a blog post.
+    "Bahamut shared a domain name with one of the malicious iOS applications mentioned
+    in our previous post," researchers said.
+    "The new MDM platform we identified has similar victimology with Middle Eastern
+    targets, namely Qatar, using a U.K. mobile number issued from LycaMobile. Bahamut
+    targeted similar Qatar-based individuals during their campaign."
+    Apart from distributing modified Telegram and WhatsApp apps with malicious functionalities,
+    the newly-identified server also distributes modified versions of Safari browser
+    and IMO video chatting app to steal more personal information on victims.
+    Attackers Using Malicious Safari Browser to Steal Login Credentials
+    According to the researchers, the malicious Safari browser has been pre-configured
+    to automatically exfiltrate the username and the password of the users for a variety
+    of other web services, Yahoo, Rediff, Amazon, Google, Reddit, Baidu, ProtonMail,
+    Zoho, Tutanota and more.
+    "The malware continuously monitors a web page, seeking out the HTML form fields
+    that hold the username and password as the user types them in to steal credentials.
+    The names of the inspected HTML fields are embedded into the app alongside the
+    domain names," the researchers said.
+    The malicious browser contains three malicious plugins—Add Bookmark, Add To Favourites,
+    and Add to Reading List—that just like the other apps, send stolen data to a remote
+    attacker-controlled server.
+    At this time, it''s unclear who is behind the campaign, who was targeted in the
+    campaign, and what were the motives behind the attack, but the technical elements
+    suggest the attackers are operating from India, and are well-funded.
+    Researchers said that those infected with this kind of malware need to enroll
+    their devices, which means "they should be on the lookout at all times to avoid
+    accidental enrollment."
+    The best way to avoid being a victim to such attacks is to always download apps
+    from official app store.
+    '
+  - 'Security researchers have spotted a new malware campaign in the wild that spreads
+    an advanced botnet malware by leveraging at least three recently disclosed vulnerabilities
+    in Microsoft Office.
+    Dubbed Zyklon, the fully-featured malware has resurfaced after almost two years
+    and primarily found targeting telecommunications, insurance and financial services.
+    Active since early 2016, Zyklon is an HTTP botnet malware that communicates with
+    its command-and-control servers over Tor anonymising network and allows attackers
+    to remotely steal keylogs, sensitive data, like passwords stored in web browsers
+    and email clients.
+    Zyklon malware is also capable of executing additional plugins, including secretly
+    using infected systems for DDoS attacks and cryptocurrency mining.
+    Different versions of the Zyklon malware has previously been found being advertised
+    on a popular underground marketplace for $75 (normal build) and $125 ( Tor-enabled
+    build).
+    According to a recently published report by FireEye, the attackers behind the
+    campaign are leveraging three following vulnerabilities in Microsoft Office that
+    execute a PowerShell script on the targeted computers to download the final payload
+    from its C&C server.
+    1) .NET Framework RCE Vulnerability (CVE-2017-8759)—this remote code execution
+    vulnerability exists when Microsoft .NET Framework processes untrusted input,
+    allowing an attacker to take control of an affected system by tricking victims
+    into opening a specially crafted malicious document file sent over an email. Microsoft
+    already released a security patch for this flaw in September updates.
+    2) Microsoft Office RCE Vulnerability (CVE-2017-11882)—it''s a 17-year-old memory
+    corruption flaw that Microsoft patched in November patch update allows a remote
+    attacker to execute malicious code on the targeted systems without requiring any
+    user interaction after opening a malicious document.
+    3) Dynamic Data Exchange Protocol (DDE Exploit)—this technique allows attackers
+    to leverage a built-in feature of Microsoft Office, called DDE, to perform code
+    execution on the targeted device without requiring Macros to be enabled or memory
+    corruption.
+    As explained by the researchers, attackers are actively exploiting these three
+    vulnerabilities to deliver Zyklon malware using spear phishing emails, which typically
+    arrives with an attached ZIP file containing a malicious Office doc file.
+    Once opened, the malicious doc file equipped with one of these vulnerabilities
+    immediately runs a PowerShell script, which eventually downloads the final payload,
+    i.e., Zyklon HTTP malware, onto the infected computer.
+    "In all these techniques, the same domain is used to download the next level payload
+    (Pause.ps1), which is another PowerShell script that is Base64 encoded," the FireEye
+    researchers said.
+    "The Pause.ps1 script is responsible for resolving the APIs required for code
+    injection. It also contains the injectable shellcode."
+    "The injected code is responsible for downloading the final payload from the server.
+    The final stage payload is a PE executable compiled with .Net framework."
+    Interestingly, the PowerShell script connects to a dotless IP address (example:
+    https://3627732942) to download the final payload.
+    What is Dotless IP Address? If you are unaware, dotless IP addresses, sometimes
+    referred as ''Decimal Address,'' are decimal values of IPv4 addresses (represented
+    as dotted-quad notation). Almost all modern web browsers resolve decimal IP address
+    to its equivalent IPV4 address when opened with "https://" following the decimal
+    value.
+    For example, Google''s IP address 216.58.207.206 can also be represented as https://3627732942
+    in decimal values (Try this online converter).
+    The best way to protect yourself and your organisation from such malware attacks
+    are always to be suspicious of any uninvited document sent via an email and never
+    click on links inside those documents unless adequately verifying the source.
+    Most importantly, always keep your software and systems up-to-date, as threat
+    actors incorporate recently discovered, but patched, vulnerabilities in popular
+    software—Microsoft Office, in this case—to increase the potential for successful
+    infections.
+    '
+  - 'Attention WordPress users!
+    Your website could easily get hacked if you are using "Ultimate Addons for Beaver
+    Builder," or "Ultimate Addons for Elementor" and haven''t recently updated them
+    to the latest available versions.
+    Security researchers have discovered a critical yet easy-to-exploit authentication
+    bypass vulnerability in both widely-used premium WordPress plugins that could
+    allow remote attackers to gain administrative access to sites without requiring
+    any password.
+    What''s more worrisome is that opportunistic attackers have already started exploiting
+    this vulnerability in the wild within 2 days of its discovery in order to compromise
+    vulnerable WordPress websites and install a malicious backdoor for later access.
+    Both vulnerable plugins, made by software development company Brainstorm Force,
+    are currently powering over hundreds of thousands of WordPress websites using
+    Elementor and Beaver Builder frameworks, helping website admins and designers
+    extend the functionality of their websites with more widgets, modules, page templates.
+    Discovered by researchers at web security service MalCare, the vulnerability resides
+    in the way both plugins let WordPress account holders, including administrators,
+    authenticate via Facebook and Google login mechanisms.
+    Image credit: WebARX
+    According to the vulnerability''s advisory, due to lack of checks in the authentication
+    method when a user login via Facebook or Google, vulnerable plugins can be tricked
+    into allowing malicious users to login as any other targeted user without requiring
+    any password.
+    "However, the Facebook and Google authentication methods did not verify the token
+    returned by Facebook and Google, and since they don''t require a password, there
+    was no password check," explained WebARX researchers, who also analysed the flaw
+    and confirmed its active exploitation.
+    "To exploit the vulnerability, the hacker needs to use the email ID of an admin
+    user of the site. In most cases, this information can be retrieved fairly easily,"
+    MalCare said.
+    In an email to The Hacker News, WebARX confirmed that attackers are abusing this
+    flaw to install a fake SEO stats plugin after uploading a tmp.zip file on the
+    targeted WordPress server, which eventually drops a wp-xmlrpc.php backdoor file
+    to the root directory of the vulnerable site.
+    MalCare discovered this vulnerability on Wednesday that affects below-listed versions
+    of the plugins and reported it to the developers on the same day, who then quickly
+    addressed the issue and released patched versions of both within just 7 hours.
+    Ultimate Addons for Elementor <= 1.20.0
+    Ultimate Addons for Beaver Builder <= 1.24.0
+    The authentication bypass vulnerability has been patched with the release of "Ultimate
+    Addons for Elementor version 1.20.1" and "Ultimate Addons for Beaver Builder version
+    1.24.1," which affected websites are highly recommended to install as soon as
+    possible.
+    '
+- source_sentence: 'Exclusive — If you have not updated your website to the latest
+    WordPress version 5.0.3, it''s a brilliant idea to upgrade the content management
+    software of your site now. From now, I mean immediately.
+    Cybersecurity researchers at RIPS Technologies GmbH today shared their latest
+    research with The Hacker News, revealing the existence of a critical remote code
+    execution vulnerability that affects all previous versions of WordPress content
+    management software released in the past 6 years.
+    The remote code execution attack, discovered and reported to the WordPress security
+    team late last year, can be exploited by a low privileged attacker with at least
+    an "author" account using a combination of two separate vulnerabilities—Path Traversal
+    and Local File Inclusion—that reside in the WordPress core.
+    The requirement of at least an author account reduces the severity of this vulnerability
+    to some extent, which could be exploited by a rogue content contributor or an
+    attacker who somehow manages to gain author''s credential using phishing, password
+    reuse or other attacks.
+    "An attacker who gains access to an account with at least author privileges on
+    a target WordPress site can execute arbitrary PHP code on the underlying server,
+    leading to a full remote takeover," Scannell says.
+    Video Demonstration — Here''s How the Attack Works
+    According to Simon Scannell, a researcher at RIPS Technologies GmbH, the attack
+    takes advantage of the way WordPress image management system handles Post Meta
+    entries used to store description, size, creator, and other meta information of
+    uploaded images.
+    Scannell found that a rogue or compromised author account can modify any entries
+    associated with an image and set them to arbitrary values, leading to the Path
+    Traversal vulnerability.
+    "The idea is to set _wp_attached_file to evil.jpg?shell.php, which would lead
+    to an HTTP request being made to the following URL: https://targetserver.com/wp-content/uploads/evil.jpg?shell.php,"
+    Scannell explains.
+    And, "it is still possible to plant the resulting image into any directory by
+    using a payload such as evil.jpg?/../../evil.jpg."
+    The Path Traversal flaw in combination with a local file inclusion flaw in theme
+    directory could then allow the attacker to execute arbitrary code on the targeted
+    server.
+    The attack, as shown in the proof-of-concept video shared by the researcher, can
+    be executed within seconds to gain complete control over a vulnerable WordPress
+    blog.
+    According to Scannell, the code execution attack became non-exploitable in WordPress
+    versions 5.0.1 and 4.9.9 after patch for another vulnerability was introduced
+    which prevented unauthorized users from setting arbitrary Post Meta entries.
+    However, the Path Traversal flaw is still unpatched even in the latest WordPress
+    version and can be exploited if any installed 3rd-party plugin incorrectly handles
+    Post Meta entries.
+    Scannell confirmed that the next release of WordPress would include a fix to completely
+    address the issue demonstrated by the researcher.
+    '
+  sentences:
+  - 'Exclusive — If you have not updated your website to the latest WordPress version
+    5.0.3, it''s a brilliant idea to upgrade the content management software of your
+    site now. From now, I mean immediately.
+    Cybersecurity researchers at RIPS Technologies GmbH today shared their latest
+    research with The Hacker News, revealing the existence of a critical remote code
+    execution vulnerability that affects all previous versions of WordPress content
+    management software released in the past 6 years.
+    The remote code execution attack, discovered and reported to the WordPress security
+    team late last year, can be exploited by a low privileged attacker with at least
+    an "author" account using a combination of two separate vulnerabilities—Path Traversal
+    and Local File Inclusion—that reside in the WordPress core.
+    The requirement of at least an author account reduces the severity of this vulnerability
+    to some extent, which could be exploited by a rogue content contributor or an
+    attacker who somehow manages to gain author''s credential using phishing, password
+    reuse or other attacks.
+    "An attacker who gains access to an account with at least author privileges on
+    a target WordPress site can execute arbitrary PHP code on the underlying server,
+    leading to a full remote takeover," Scannell says.
+    Video Demonstration — Here''s How the Attack Works
+    According to Simon Scannell, a researcher at RIPS Technologies GmbH, the attack
+    takes advantage of the way WordPress image management system handles Post Meta
+    entries used to store description, size, creator, and other meta information of
+    uploaded images.
+    Scannell found that a rogue or compromised author account can modify any entries
+    associated with an image and set them to arbitrary values, leading to the Path
+    Traversal vulnerability.
+    "The idea is to set _wp_attached_file to evil.jpg?shell.php, which would lead
+    to an HTTP request being made to the following URL: https://targetserver.com/wp-content/uploads/evil.jpg?shell.php,"
+    Scannell explains.
+    And, "it is still possible to plant the resulting image into any directory by
+    using a payload such as evil.jpg?/../../evil.jpg."
+    The Path Traversal flaw in combination with a local file inclusion flaw in theme
+    directory could then allow the attacker to execute arbitrary code on the targeted
+    server.
+    The attack, as shown in the proof-of-concept video shared by the researcher, can
+    be executed within seconds to gain complete control over a vulnerable WordPress
+    blog.
+    According to Scannell, the code execution attack became non-exploitable in WordPress
+    versions 5.0.1 and 4.9.9 after patch for another vulnerability was introduced
+    which prevented unauthorized users from setting arbitrary Post Meta entries.
+    However, the Path Traversal flaw is still unpatched even in the latest WordPress
+    version and can be exploited if any installed 3rd-party plugin incorrectly handles
+    Post Meta entries.
+    Scannell confirmed that the next release of WordPress would include a fix to completely
+    address the issue demonstrated by the researcher.
+    '
+  - 'Android Security Squad, the China-based group that uncovered a second Android
+    master key vulnerability that might be abused to modify smartphone apps without
+    breaking their digital signatures.
+    The whole point of digitally signing a document or file is to prove the file hasn''t
+    been modified. The process uses a form of public-key cryptography. In Chinese
+    version of hacking attack, malicious code can be added into the file headers,
+    but the method is limited because targeted files need to be smaller than 64K in
+    size.
+    APK files are packed using a version of the widespread ZIP archiving algorithm.
+    Most ZIP implementations won''t permit two same-named files in one archive, but
+    the algorithm itself doesn''t forbid that possibility. So basically, two versions
+    of the classes.dex file are placed inside of the package, the original and a hacked
+    alternative.
+    When checking an app''s digital signature, the Android OS looks at the first matching
+    file, but when actually executing and launching the file, it grabs the last one.
+    To Trojanize an app, then, all you need to do is shoehorn your malicious code
+    into it using a name that already exists within the app.
+    The flaw is very similar to the first master key vulnerability recently announced
+    by researchers from mobile security firm Bluebox Security. According to BlueBox,
+    99% of Android devices are vulnerable to this attack. Google has already patched
+    the flaw and posted it to the Android Open Source Project (AOSP).
+    You can use ReKey, a free mobile app that''s designed to patch the Android master
+    key vulnerability that''s present in an estimated 900 million devices that run
+    Android and that could be exploited by attackers to take full control of a device.
+    Always get your apps from legitimate sources, always check to make sure the developer
+    name is valid, and configure your phone so it doesn''t permit installing apps
+    from unknown sources.
+    '
+  - 'Cyber criminals are using popular note-taking app Evernote as Command-and-Control
+    Server to give commands to the malware installed on infected PCs using botnets.
+    TrendMicro uncovered a malware detected as "BKDR_VERNOT.A" tried to communicate
+    with Command-and-Control Server using Evernote.
+    Malware delivered via an executable file that installs the malware as a dynamic-link
+    library. The installer then ties the DLL into a legitimate running process, hiding
+    it from casual detection. Once installed, BKDR_VERNOT.A can perform several backdoor
+    commands such as downloading, executing, and renaming files. It then gathers information
+    from the infected system, including details about its OS, timezone, user name,
+    computer name, registered owner and organization.
+    Researchers also pointed out that the backdoor may have also used Evernote as
+    a location to upload stolen data. "Unfortunately, during our testing, it was not
+    able to login using the credentials embedded in the malware. This is possibly
+    a security measure imposed by Evernote following its recent hacking issue."
+    "Though this is a clever maneuver to avoid detection, this is not the first time
+    that a legitimate service like Evernote was used as a method of evasion."
+    Like Evernote, Google Docs, Twitter and others have been misused in the past.
+    '
+- source_sentence: 'U.S. has the top Security Agencies like NSA, FBI to tackle cyber
+    crime and terrorism with their high profile surveillance technologies, but even
+    after that U.S is proudly hosting 44% of the entire cloud based malware distribution.
+    With the enhancement in Internet technology, Cloud computing has shown the possibility
+    of existence and now has become an essential gradient for any Internet Identity.
+    Cloud services are designed in such a way that it is easy to maintain, use, configure
+    and can be scaled depending upon the requirement of the service being provided
+    using the CLOUD technology with cost effective manner.
+    Due to the Easy and Cost effective alternative of traditional computing, Malware
+    writers are using the big cloud hosting platforms to quickly and effectively serve
+    malware to Internet users, allowing them to bypass detection and geographic blacklisting
+    by serving from a trusted provider.
+    Hiding behind trusted domains and names is not something new. According to recently
+    published SERT Q4 2013 Threat Intelligence Report, the malware distributors are
+    using Cloud Services from Amazon, GoDaddy and Google like a legitimate customer,
+    allowing them to infect millions of computers and vast numbers of enterprise systems.
+    The Cloud-based hosting services let malware distributors to avoid the detection
+    because repeatedly changes IP addresses and domain names to avoid detection. Amazon
+    and GoDaddy were identified as the top malware-hosting providers, with a 16 percent
+    and a 14 percent share, respectively.
+    Major Additional findings include:
+    United States hosts 4.6 times more malware than the next leading country.
+    58% of malicious files obtained were identified as HTML files, 26% were directly
+    executable.
+    Many malware developers and distributors are utilizing social engineering tactics,
+    including the use of trusted keywords and services, to evade detection and increase
+    potential infection counts.
+    A single malicious domain was spread across 20 countries, 67 providers and 199
+    unique IPs evade detection.
+    The SERT Research team collected a large number of samples from more than 12,000
+    Registrars, 22,000 ISPs (Internet Service Providers) and tested all malicious
+    packages with more than 40 antivirus engines, output of which is concluded below:
+    The majority of the top malware sites is domains commonly associated with the
+    Potentially Unwanted Applications (PUA), more commonly known as adware, type of
+    malware distributions.
+    "Researchers found that a significant portion of the malware sampled consisted
+    of Microsoft Windows 32-bit Portable Executable (PE32) files being used to distribute
+    pay-per-install applications known as potentially unwanted applications (PUAs)."
+    The report claimed that these malware is undetectable from over 40 anti-virus
+    engines, that can act as a gateway for exploits and more than half of malware
+    found being distributed by HTML web pages.
+    '
+  sentences:
+  - 'U.S. has the top Security Agencies like NSA, FBI to tackle cyber crime and terrorism
+    with their high profile surveillance technologies, but even after that U.S is
+    proudly hosting 44% of the entire cloud based malware distribution.
+    With the enhancement in Internet technology, Cloud computing has shown the possibility
+    of existence and now has become an essential gradient for any Internet Identity.
+    Cloud services are designed in such a way that it is easy to maintain, use, configure
+    and can be scaled depending upon the requirement of the service being provided
+    using the CLOUD technology with cost effective manner.
+    Due to the Easy and Cost effective alternative of traditional computing, Malware
+    writers are using the big cloud hosting platforms to quickly and effectively serve
+    malware to Internet users, allowing them to bypass detection and geographic blacklisting
+    by serving from a trusted provider.
+    Hiding behind trusted domains and names is not something new. According to recently
+    published SERT Q4 2013 Threat Intelligence Report, the malware distributors are
+    using Cloud Services from Amazon, GoDaddy and Google like a legitimate customer,
+    allowing them to infect millions of computers and vast numbers of enterprise systems.
+    The Cloud-based hosting services let malware distributors to avoid the detection
+    because repeatedly changes IP addresses and domain names to avoid detection. Amazon
+    and GoDaddy were identified as the top malware-hosting providers, with a 16 percent
+    and a 14 percent share, respectively.
+    Major Additional findings include:
+    United States hosts 4.6 times more malware than the next leading country.
+    58% of malicious files obtained were identified as HTML files, 26% were directly
+    executable.
+    Many malware developers and distributors are utilizing social engineering tactics,
+    including the use of trusted keywords and services, to evade detection and increase
+    potential infection counts.
+    A single malicious domain was spread across 20 countries, 67 providers and 199
+    unique IPs evade detection.
+    The SERT Research team collected a large number of samples from more than 12,000
+    Registrars, 22,000 ISPs (Internet Service Providers) and tested all malicious
+    packages with more than 40 antivirus engines, output of which is concluded below:
+    The majority of the top malware sites is domains commonly associated with the
+    Potentially Unwanted Applications (PUA), more commonly known as adware, type of
+    malware distributions.
+    "Researchers found that a significant portion of the malware sampled consisted
+    of Microsoft Windows 32-bit Portable Executable (PE32) files being used to distribute
+    pay-per-install applications known as potentially unwanted applications (PUAs)."
+    The report claimed that these malware is undetectable from over 40 anti-virus
+    engines, that can act as a gateway for exploits and more than half of malware
+    found being distributed by HTML web pages.
+    '
+  - 'Windows 8 will be challenge for Malware writers
+    Microsoft''s security researcher believe that upcoming operating system, Windows
+    8 is a step forward in security and Windows 8 will be far better at protecting
+    against malware than it''s predecessors.
+    Chris Valasek, a senior security research scientist at development testing firm
+    Coverity, began examining the security features of Windows 8 last autumn, before
+    the consumer previews of the upcoming revamp of the new Microsoft OS came out.
+    "There are always going to be vulnerabilities but you can make it difficult to
+    leverage vulnerabilities to write exploits." One major change between Windows
+    7 and 8 is the addition of more exploit-mitigation technologies, however. Windows
+    Memory Managers (specifically the Windows Heap Manager and Windows Kernel Pool
+    Allocator) are designed to make it far harder for attackers to exploit buffer-overflow
+    vulnerabilities and the like to push malware onto vulnerable systems.
+    The "security sandbox" for applications for Windows 8 will also be a great step
+    forward. "These new Windows 8 Apps will be contained by a much more restrictive
+    security sandbox, which is a mechanism to prevent programs from performing certain
+    actions," Valasek explains.
+    "This new App Container provides the operating system with a way to make more
+    fine-grained decisions on what actions certain applications can perform, instead
+    of relying on the more broad ''Integrity Levels'' that debuted in Windows Vista/7.
+    Windows 8 also comes with a new version of Internet Explorer, Microsoft''s browser
+    software. Internet Explorer 10 will come with a mode that disables support for
+    third-party plug-ins such as Flash and Java.
+    '
+  - 'Ransomware, a threat to internet users that continues to grow in popularity with
+    cyber criminals due to its success and monetary potential. This is nothing new
+    and to be expected. I have noticed many discussions on underground hacking forums
+    about "How to create Ransomware like Cryptolocker malware" or "Malware - hacking
+    tool-kit with ransomware features".
+    Security intelligence provider, IntelCrawler has discovered a new ransomware variant
+    called Locker that demands $150 (£92) to restore files that it has encrypted.
+    Like Cryptolocker, this new ransomware is also nasty because infected users are
+    in danger of losing their personal files forever.
+    Locker mainly spreads by drive-by downloads from compromised websites, disguised
+    itself as MP3 files and use system software vulnerabilities to infect the end
+    user.
+    Once it has infected a system, malware first checks the infected machine has an
+    internet connection or not. Then it deletes any original files from the victim''s
+    computer after using AES-CTR for encrypting the files on infected devices and
+    add ". perfect" extension to them.
+    Locker''s encryption is based on an open source tool called ''TurboPower LockBox''
+    library. After encrypting all files, the malware place a "CONTACT.TXT" file in
+    each directory, which provides contact details of the author to buy the decryption
+    key and once the ransom is paid, each victim gets a key to unscramble files.
+    The good news is that the researchers are working on the universal decryption
+    software in order to help the victims. "It appears that the hackers are simply
+    comparing the list of infected IP addresses of users, along with their host names,"
+    according IntelCrawler.
+    IntelCrawler had discovered 50 different builds of the malware, which are being
+    sold in underground markets for pay-per install programs. One builds had just
+    under 6,000 infected machines. ZdNet reported.
+    Malware will encrypt all drives visible on an infected system, so you must be
+    sure that your backups are stored remotely or in a location that is not simply
+    another drive partition or mapping to another location.
+    The malware infects users from the United States, Turkey, Russia, Germany and
+    the Netherlands. Users should remain vigilant about their security. Please double
+    check the legitimacy of links received in emails and ensure you have your antivirus
+    up to date to help protect against such threats.
+    '
+- source_sentence: 'Security Event : Hack In Paris (16-17 June, 2011)
+    Hack In Paris is an international and corporate security event that will take
+    place in Disneyland Paris® fromJune 16th to 17th of 2011. Please refer to the
+    homepage to get up-to-date information about the event.
+    Topics
+    The following list contains major topics the conference will cover. Please consider
+    submitting even if the subject of your research is not listed here.
+    Advances in reverse engineering
+    Vulnerability research and exploitation
+    Penetration testing and security assessment
+    Malware analysis and new trends in malicous codes
+    Forensics, IT crime & law enforcement
+    Privacy issues: LOPPSI, HADOPI, …
+    Low-level hacking (console security & mobile devices)
+    Risk management and ISO 27001
+    Dates
+    January 20: CFP announced
+    March 30: Submission deadline
+    April 15: Notification sent to authors
+    April 17: Program announcement
+    June 16-17: Hack In Paris
+    June 18: Nuit du Hack
+    More Information: https://hackinparis.com
+    '
+  sentences:
+  - 'It''s just two weeks into the Trump presidency, but his decisions have caused
+    utter chaos around the country.
+    One such order signed by the president was banning both refugees and visa holders
+    from seven Muslim-majority countries (Iraq, Iran, Libya, Yemen, Somalia, Syria,
+    and Sudan) from entering the United States, resulting in unexpectedly arrest of
+    some travelers at airports.
+    Now, it seems like some anti-Trump protesters have publically declared their fight
+    against the president by exploiting a known flaw in low power FM (LPFM) radio
+    transmitters to play a song the radio stations didn''t intend to broadcast.
+    Radio stations in South Carolina, Indiana, Texas, Tennessee and Kentucky, were
+    hacked recently to broadcast the Bompton-based rapper YG and Nipsey Hussle''s
+    anti-Trump song "Fuck Donald Trump," which was already a radio hit in some parts
+    of the country last year, several sources report.
+    The song was repeatedly played on Monday night, according to the RadioInsight,
+    and the news of the incident began emerging shortly after Trump''s inauguration
+    on January 20, eight days before hackers hacked 70 percent of the police CCTV
+    cameras in Washington DC.
+    Hackers gained access to the radio stations by exploiting known vulnerabilities
+    in Barix Exstreamer devices which can decode audio file formats and send them
+    along for LPFM transmission.
+    Over a dozen radio stations experienced the hack in recent weeks, though some
+    of them shut down their airwaves as quickly as possible in an attempt to avoid
+    playing the inflammatory "FDT (Fuck Donald Trump)" song on loop.
+    The hackers or group of hackers behind the cyber attack is still unknown. The
+    affected stations so far include:
+    105.9 WFBS-LP Salem, S.C.
+    Radio 810 WMGC/96.7 W244CW Murfreesboro TN
+    101.9 Pirate Seattle
+    100.9 WCHQ-LP Louisville
+    100.5 KCGF-LP San Angelo TX
+    However, there are unconfirmed reports from radio stations in California, Indiana,
+    and Washington State that are believed to be affected as well.
+    Has any of the radio stations you listen to been hit by the hackers? Let us know
+    in the comments!
+    '
+  - 'Google is going to shut down its social media network Google+ after the company
+    suffered a massive data breach that exposed the private data of hundreds of thousands
+    of Google Plus users to third-party developers.
+    According to the tech giant, a security vulnerability in one of Google+''s People
+    APIs allowed third-party developers to access data for more than 500,000 users,
+    including their usernames, email addresses, occupation, date of birth, profile
+    photos, and gender-related information.
+    Since Google+ servers do not keep API logs for more than two weeks, the company
+    cannot confirm the number of users impacted by the vulnerability.
+    However, Google assured its users that the company found no evidence that any
+    developer was aware of this bug, or that the profile data was misused by any of
+    the 438 developers that could have had access.
+    "However, we ran a detailed analysis over the two weeks prior to patching the
+    bug, and from that analysis, the Profiles of up to 500,000 Google+ accounts were
+    potentially affected. Our analysis showed that up to 438 applications may have
+    used this API," Google said in blog post published today.
+    The vulnerability was open since 2015 and fixed after Google discovered it in
+    March 2018, but the company chose not to disclose the breach to the public—at
+    the time when Facebook was being roasted for Cambridge Analytica scandal.
+    Though Google has not revealed the technical details of the security vulnerability,
+    the nature of the flaw seems to be something very similar to Facebook API flaw
+    that recently allowed unauthorized developers to access private data from Facebook
+    users.
+    Besides admitting the security breach, Google also announced that the company
+    is shutting down its social media network, acknowledging that Google+ failed to
+    gain broad adoption or significant traction with consumers.
+    "The consumer version of Google+ currently has low usage and engagement: 90 percent
+    of Google+ user sessions are less than five seconds," Google said.
+    In response, the company has decided to shut down Google+ for consumers by the
+    end of August 2019. However, Google+ will continue as a product for Enterprise
+    users.
+    Google Introduces New Privacy Controls Over Third-Party App Permissions
+    As part of its "Project Strobe," Google engineers also reviewed third-party developer
+    access to Google account and Android device data; and has accordingly now introduced
+    some new privacy controls.
+    When a third-party app prompts users for access to their Google account data,
+    clicking "Allow" button approves all requested permissions at once, leaving an
+    opportunity for malicious apps to trick users into giving away powerful permissions.
+    But now Google has updated its Account Permissions system that asks for each requested
+    permission individually rather than all at once, giving users more control over
+    what type of account data they choose to share with each app.
+    Since APIs can also allow developers to access users'' extremely sensitive data,
+    like that of Gmail account, Google has limited access to Gmail API only for apps
+    that directly enhance email functionality—such as email clients, email backup
+    services and productivity services.
+    Google shares fell over 2 percent to $1134.23 after the data breach reports.
+    '
+  - 'Security Event : Hack In Paris (16-17 June, 2011)
+    Hack In Paris is an international and corporate security event that will take
+    place in Disneyland Paris® fromJune 16th to 17th of 2011. Please refer to the
+    homepage to get up-to-date information about the event.
+    Topics
+    The following list contains major topics the conference will cover. Please consider
+    submitting even if the subject of your research is not listed here.
+    Advances in reverse engineering
+    Vulnerability research and exploitation
+    Penetration testing and security assessment
+    Malware analysis and new trends in malicous codes
+    Forensics, IT crime & law enforcement
+    Privacy issues: LOPPSI, HADOPI, …
+    Low-level hacking (console security & mobile devices)
+    Risk management and ISO 27001
+    Dates
+    January 20: CFP announced
+    March 30: Submission deadline
+    April 15: Notification sent to authors
+    April 17: Program announcement
+    June 16-17: Hack In Paris
+    June 18: Nuit du Hack
+    More Information: https://hackinparis.com
+    '
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision fa97f6e7cb1a59073dff9e6b13e2715cf7475ac9 -->
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
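Read top to bottom, the module list above says that the token embeddings produced by the `BertModel` are mean-pooled (`pooling_mode_mean_tokens: True`) and then L2-normalized. The following is a minimal pure-Python sketch of those last two steps, not the library's implementation. It uses toy 3-dimensional vectors in place of real 384-dimensional outputs, and the helper names `mean_pool`, `l2_normalize`, and `dot` are illustrative assumptions that are not part of the sentence-transformers API.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1, skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / max(len(kept), 1) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy token vectors (3 dims instead of 384); the last slot is padding.
tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)    # the padding row is ignored
embedding = l2_normalize(pooled)    # unit-length sentence embedding
```

Because the final `Normalize()` step yields unit-length vectors, the cosine similarity listed under Model Description reduces to a plain dot product between two embeddings.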
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("sgadagin/fine_tuned_sbert")
+# Run inference
+sentences = [
+    'Security Event : Hack In Paris (16-17 June, 2011)\n\n\nHack In Paris is an international and corporate security event that will take place in Disneyland Paris® fromJune 16th to 17th of 2011. Please refer to the homepage to get up-to-date information about the event.\n\nTopics\nThe following list contains major topics the conference will cover. Please consider submitting even if the subject of your research is not listed here.\nAdvances in reverse engineering\nVulnerability research and exploitation\nPenetration testing and security assessment\nMalware analysis and new trends in malicous codes\nForensics, IT crime & law enforcement\nPrivacy issues: LOPPSI, HADOPI, …\nLow-level hacking (console security & mobile devices)\nRisk management and ISO 27001\nDates\nJanuary 20: CFP announced\nMarch 30: Submission deadline\nApril 15: Notification sent to authors\nApril 17: Program announcement\nJune 16-17: Hack In Paris\nJune 18: Nuit du Hack\nMore Information: https://hackinparis.com\n\n',
+    'Security Event : Hack In Paris (16-17 June, 2011)\n\n\nHack In Paris is an international and corporate security event that will take place in Disneyland Paris® fromJune 16th to 17th of 2011. Please refer to the homepage to get up-to-date information about the event.\n\nTopics\nThe following list contains major topics the conference will cover. Please consider submitting even if the subject of your research is not listed here.\nAdvances in reverse engineering\nVulnerability research and exploitation\nPenetration testing and security assessment\nMalware analysis and new trends in malicous codes\nForensics, IT crime & law enforcement\nPrivacy issues: LOPPSI, HADOPI, …\nLow-level hacking (console security & mobile devices)\nRisk management and ISO 27001\nDates\nJanuary 20: CFP announced\nMarch 30: Submission deadline\nApril 15: Notification sent to authors\nApril 17: Program announcement\nJune 16-17: Hack In Paris\nJune 18: Nuit du Hack\nMore Information: https://hackinparis.com\n\n',
+    'Google is going to shut down its social media network Google+ after the company suffered a massive data breach that exposed the private data of hundreds of thousands of Google Plus users to third-party developers.\n\nAccording to the tech giant, a security vulnerability in one of Google+\'s People APIs allowed third-party developers to access data for more than 500,000 users, including their usernames, email addresses, occupation, date of birth, profile photos, and gender-related information.\n\nSince Google+ servers do not keep API logs for more than two weeks, the company cannot confirm the number of users impacted by the vulnerability.\n\nHowever, Google assured its users that the company found no evidence that any developer was aware of this bug, or that the profile data was misused by any of the 438 developers that could have had access.\n"However, we ran a detailed analysis over the two weeks prior to patching the bug, and from that analysis, the Profiles of up to 500,000 Google+ accounts were potentially affected. Our analysis showed that up to 438 applications may have used this API," Google said in blog post published today.\nThe vulnerability was open since 2015 and fixed after Google discovered it in March 2018, but the company chose not to disclose the breach to the public—at the time when Facebook was being roasted for Cambridge Analytica scandal.\n\nThough Google has not revealed the technical details of the security vulnerability, the nature of the flaw seems to be something very similar to Facebook API flaw that recently allowed unauthorized developers to access private data from Facebook users.\n\nBesides admitting the security breach, Google also announced that the company is shutting down its social media network, acknowledging that Google+ failed to gain broad adoption or significant traction with consumers.\n"The consumer version of Google+ currently has low usage and engagement: 90 percent of Google+ user sessions are less than five seconds," Google said.\nIn response, the company has decided to shut down Google+ for consumers by the end of August 2019. However, Google+ will continue as a product for Enterprise users.\n\nGoogle Introduces New Privacy Controls Over Third-Party App Permissions\n\nAs part of its "Project Strobe," Google engineers also reviewed third-party developer access to Google account and Android device data; and has accordingly now introduced some new privacy controls.\n\nWhen a third-party app prompts users for access to their Google account data, clicking "Allow" button approves all requested permissions at once, leaving an opportunity for malicious apps to trick users into giving away powerful permissions.\nBut now Google has updated its Account Permissions system that asks for each requested permission individually rather than all at once, giving users more control over what type of account data they choose to share with each app.\n\nSince APIs can also allow developers to access users\' extremely sensitive data, like that of Gmail account, Google has limited access to Gmail API only for apps that directly enhance email functionality—such as email clients, email backup services and productivity services.\n\nGoogle shares fell over 2 percent to $1134.23 after the data breach reports.\n\n',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 384]
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### Unnamed Dataset
+* Size: 3,742 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                           | sentence_1                                                                           | label                                                                                 |
+  |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+  | type    | string                                                                               | string                                                                               | int                                                                                   |
+  | details | <ul><li>min: 37 tokens</li><li>mean: 252.46 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 252.46 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>0: ~35.20%</li><li>1: ~10.30%</li><li>2: ~17.90%</li><li>3: ~36.60%</li></ul> |
+* Samples:
+  | sentence_0 | sentence_1 | label |
+  |:-----------|:-----------|:------|
+  | <code>U.S. online fashion retailer SHEIN has admitted that the company has suffered a significant data breach after unknown hackers stole personally identifiable information (PII) of almost 6.5 million customers.<br><br>Based in North Brunswick and founded in 2008, SHEIN has become one of the largest online fashion retailers that ships to more than 80 countries worldwide. The site has been initially designed to produce "affordable" and trendy fashion clothing for women.<br><br>SHEIN revealed last weekend that its servers had been targeted by a "concerted criminal cyber-attack" that began in June this year and lasted until August 22, when the company was finally made aware of the potential theft.<br><br>Soon after that, the company scanned its servers to remove all possible backdoored entry points, leveraging which hackers could again infiltrate the servers. SHEIN assured its customers that the website is now safe to visit.<br><br>Hackers Stole Over 6.42 Million SHEIN Customers' Data<br><br>Although details about the inci...</code> | <code>U.S. online fashion retailer SHEIN has admitted that the company has suffered a significant data breach after unknown hackers stole personally identifiable information (PII) of almost 6.5 million customers.<br><br>Based in North Brunswick and founded in 2008, SHEIN has become one of the largest online fashion retailers that ships to more than 80 countries worldwide. The site has been initially designed to produce "affordable" and trendy fashion clothing for women.<br><br>SHEIN revealed last weekend that its servers had been targeted by a "concerted criminal cyber-attack" that began in June this year and lasted until August 22, when the company was finally made aware of the potential theft.<br><br>Soon after that, the company scanned its servers to remove all possible backdoored entry points, leveraging which hackers could again infiltrate the servers. SHEIN assured its customers that the website is now safe to visit.<br><br>Hackers Stole Over 6.42 Million SHEIN Customers' Data<br><br>Although details about the inci...</code> | <code>1</code> |
+  | <code>A location based Social Networking platform with 45 million users,'Foursquare' was vulnerable to the primary email address disclosed.<br><br>Foursquare is a Smartphone application that gives you details of nearby cafes, bars, shops, parks using GPS location and also tells about your friends nearby.<br><br>According to a Penetration tester and hacker 'Jamal Eddine', an attacker can extract email addresses of all 45 million users just by using a few lines of scripting tool.<br><br>Basically the flaw exists in the Invitation system of the Foursquare app. While testing the app, he found that invitation received on the recipient's end actually disclosing the sender's email address, as shown above.<br><br>Invitation URL:<br>https://foursquare.com/mehdi?action=acceptFriendship&expires=1378920415&src=wtbfe&uid=64761059&sig=mmlx96RwGrQ2fJAg4OWZhAWnDvc%3D<br>Where 'uid' parameter represents the sender's profile ID.<br><br>Hacker noticed that the parameter in the Invitation URL can be modified in order to spoof the sender profile i...</code> | <code>A location based Social Networking platform with 45 million users,'Foursquare' was vulnerable to the primary email address disclosed.<br><br>Foursquare is a Smartphone application that gives you details of nearby cafes, bars, shops, parks using GPS location and also tells about your friends nearby.<br><br>According to a Penetration tester and hacker 'Jamal Eddine', an attacker can extract email addresses of all 45 million users just by using a few lines of scripting tool.<br><br>Basically the flaw exists in the Invitation system of the Foursquare app. While testing the app, he found that invitation received on the recipient's end actually disclosing the sender's email address, as shown above.<br><br>Invitation URL:<br>https://foursquare.com/mehdi?action=acceptFriendship&expires=1378920415&src=wtbfe&uid=64761059&sig=mmlx96RwGrQ2fJAg4OWZhAWnDvc%3D<br>Where 'uid' parameter represents the sender's profile ID.<br><br>Hacker noticed that the parameter in the Invitation URL can be modified in order to spoof the sender profile i...</code> | <code>1</code> |
+  | <code>Earlier this week Dropbox team unveiled details of three critical vulnerabilities in Apple macOS operating system, which altogether could allow a remote attacker to execute malicious code on a targeted Mac computer just by convincing a victim into visiting a malicious web page.<br><br>The reported vulnerabilities were originally discovered by Syndis, a cybersecurity firm hired by Dropbox to conduct simulated penetration testing attacks as Red Team on the company's IT infrastructure, including Apple software used by Dropbox.<br><br>The vulnerabilities were discovered and disclosed to Apple security team in February this year, which were then patched by Apple just over one month later with the release of its March security updates. DropBox applauded Apple for its quick response to its bug report.<br><br>According to DropBox, the vulnerabilities discovered by Syndis didn't just affect its macOS fleet, but also affected all Safari users running the latest version of the web browser and operating system at t...</code> | <code>Earlier this week Dropbox team unveiled details of three critical vulnerabilities in Apple macOS operating system, which altogether could allow a remote attacker to execute malicious code on a targeted Mac computer just by convincing a victim into visiting a malicious web page.<br><br>The reported vulnerabilities were originally discovered by Syndis, a cybersecurity firm hired by Dropbox to conduct simulated penetration testing attacks as Red Team on the company's IT infrastructure, including Apple software used by Dropbox.<br><br>The vulnerabilities were discovered and disclosed to Apple security team in February this year, which were then patched by Apple just over one month later with the release of its March security updates. DropBox applauded Apple for its quick response to its bug report.<br><br>According to DropBox, the vulnerabilities discovered by Syndis didn't just affect its macOS fleet, but also affected all Safari users running the latest version of the web browser and operating system at t...</code> | <code>3</code> |
+* Loss: [<code>SoftmaxLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)
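
The README names SoftmaxLoss but does not show how the loss is formed. The sketch below is a plain-Python illustration, not the training code from this commit. It assumes the library's default feature construction (the two sentence embeddings plus their element-wise absolute difference) and four classes, inferred from the labels 0 to 3 in the statistics table. The random vectors and the all-zero classifier are hypothetical stand-ins.

```python
import math
import random

EMBED_DIM = 384   # word_embedding_dimension from 1_Pooling/config.json
NUM_LABELS = 4    # assumption: labels 0..3, as listed in the statistics table


def pair_features(u, v):
    """Concatenate u, v and |u - v| into one feature vector of length 3 * dim."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(u, v, label, weights, bias):
    """Cross-entropy of a linear classifier applied to the pair features."""
    feats = pair_features(u, v)
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(weights, bias)]
    return -math.log(softmax(logits)[label])


# Toy inputs: random vectors stand in for real sentence embeddings.
rng = random.Random(0)
u = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
v = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
weights = [[0.0] * (3 * EMBED_DIM) for _ in range(NUM_LABELS)]  # untrained classifier
bias = [0.0] * NUM_LABELS

loss = softmax_loss(u, v, label=1, weights=weights, bias=bias)
print(round(loss, 4))  # 1.3863, i.e. ln(4): zero weights predict all classes equally
```

An untrained classifier over four classes therefore starts at a loss of about 1.386, which gives a reference point for reading the logged training loss.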
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `multi_dataset_batch_sampler`: round_robin
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 8
+- `per_device_eval_batch_size`: 8
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 3
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: False
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+</details>
+### Training Logs
+| Epoch  | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 1.0684 | 500  | 1.2186        |
+| 2.1368 | 1000 | 1.145         |
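
The fractional epoch values in the log follow from the dataset size and batch size reported above. The short calculation below reproduces them. It assumes a single training device, which the README does not state; `gradient_accumulation_steps` is 1 and `dataloader_drop_last` is False, so the last partial batch counts as a step.

```python
import math

train_samples = 3742   # "Size: 3,742 training samples"
batch_size = 8         # per_device_train_batch_size
num_epochs = 3         # num_train_epochs
num_devices = 1        # assumption: the device count is not given in the README

# The final partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)                    # 468
print(total_steps)                        # 1404
print(round(500 / steps_per_epoch, 4))    # 1.0684
print(round(1000 / steps_per_epoch, 4))   # 2.1368
```

Under that assumption the two logged rows sit at steps 500 and 1000 of roughly 1404 total optimizer steps.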
+### Framework Versions
+- Python: 3.12.9
+- Sentence Transformers: 3.4.1
+- Transformers: 4.49.0
+- PyTorch: 2.6.0
+- Accelerate: 1.4.0
+- Datasets: 3.3.2
+- Tokenizers: 0.21.0
+## Citation
+### BibTeX
+#### Sentence Transformers and SoftmaxLoss
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->

config.json ADDED Viewed

	@@ -0,0 +1,26 @@

+{
+  "_name_or_path": "fine_tuned_sbert",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.49.0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

config_sentence_transformers.json ADDED Viewed

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.4.1",
+    "transformers": "4.49.0",
+    "pytorch": "2.6.0"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
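
The `similarity_fn_name` entry above selects cosine similarity as the default comparison. As a minimal sketch of what that measure computes, the plain-Python version below builds a pairwise similarity matrix. The function names and the toy 3-dimensional vectors are illustrative only and are not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row and one column per input vector."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


# Toy 3-dimensional vectors stand in for the 384-dimensional embeddings.
toy = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
matrix = similarity_matrix(toy)

print(len(matrix), len(matrix[0]))   # 3 3
print(round(matrix[0][1], 4))        # 0.0
print(round(matrix[0][2], 4))        # 0.7071
```

Three inputs give a 3 by 3 matrix, matching the shape printed in the README usage example.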

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a69349eb8ec33b8bdaeefc01305d25ac13a85d3bbe606a8d6a4316d4c370061c
+size 90864192

modules.json ADDED Viewed

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
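
The module list above chains a Transformer, a Pooling step (mean pooling, per `1_Pooling/config.json`) and a Normalize step. The sketch below illustrates the last two stages in plain Python on toy data. It assumes padding tokens are excluded through an attention mask; the helper names are hypothetical and not the library's own.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy example: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)

print(pooled)      # [3.0, 4.0]
print(embedding)   # [0.6, 0.8]
```

Because the final vectors have unit length, their dot product equals their cosine similarity.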

sentence_bert_config.json ADDED Viewed

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 256,
+  "do_lower_case": false
+}

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,65 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 128,
+  "model_max_length": 256,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

vocab.txt ADDED Viewed

The diff for this file is too large to render. See raw diff