|
Metadata-Version: 2.1 |
|
Name: defusedxml |
|
Version: 0.7.1 |
|
Summary: XML bomb protection for Python stdlib modules |
|
Home-page: https://github.com/tiran/defusedxml |
|
Author: Christian Heimes |
|
Author-email: [email protected] |
|
Maintainer: Christian Heimes |
|
Maintainer-email: [email protected] |
|
License: PSFL |
|
Download-URL: https://pypi.python.org/pypi/defusedxml |
|
Keywords: xml bomb DoS |
|
Platform: all |
|
Classifier: Development Status :: 5 - Production/Stable |
|
Classifier: Intended Audience :: Developers |
|
Classifier: License :: OSI Approved :: Python Software Foundation License |
|
Classifier: Natural Language :: English |
|
Classifier: Programming Language :: Python |
|
Classifier: Programming Language :: Python :: 2 |
|
Classifier: Programming Language :: Python :: 2.7 |
|
Classifier: Programming Language :: Python :: 3 |
|
Classifier: Programming Language :: Python :: 3.5 |
|
Classifier: Programming Language :: Python :: 3.6 |
|
Classifier: Programming Language :: Python :: 3.7 |
|
Classifier: Programming Language :: Python :: 3.8 |
|
Classifier: Programming Language :: Python :: 3.9 |
|
Classifier: Topic :: Text Processing :: Markup :: XML |
|
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.* |
|
|
|
=================================================== |
|
defusedxml -- defusing XML bombs and other exploits |
|
=================================================== |
|
|
|
.. image:: https://img.shields.io/pypi/v/defusedxml.svg |
|
:target: https://pypi.org/project/defusedxml/ |
|
:alt: Latest Version |
|
|
|
.. image:: https://img.shields.io/pypi/pyversions/defusedxml.svg |
|
:target: https://pypi.org/project/defusedxml/ |
|
:alt: Supported Python versions |
|
|
|
.. image:: https://travis-ci.org/tiran/defusedxml.svg?branch=master |
|
:target: https://travis-ci.org/tiran/defusedxml |
|
:alt: Travis CI |
|
|
|
.. image:: https://codecov.io/github/tiran/defusedxml/coverage.svg?branch=master |
|
:target: https://codecov.io/github/tiran/defusedxml?branch=master |
|
:alt: codecov |
|
|
|
.. image:: https://img.shields.io/pypi/dm/defusedxml.svg |
|
:target: https://pypistats.org/packages/defusedxml |
|
:alt: PyPI downloads |
|
|
|
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg |
|
:target: https://github.com/psf/black |
|
:alt: Code style: black |
|
|
|
.. |
|
|
|
"It's just XML, what could probably go wrong?" |
|
|
|
Christian Heimes <[email protected]> |
|
|
|
Synopsis |
|
======== |
|
|
|
The results of an attack on a vulnerable XML library can be fairly dramatic. |
|
With just a few hundred **Bytes** of XML data an attacker can occupy several |
|
**Gigabytes** of memory within **seconds**. An attacker can also keep |
|
CPUs busy for a long time with a small to medium size request. Under some |
|
circumstances it is even possible to access local files on your |
|
server, to circumvent a firewall, or to abuse services to rebound attacks to |
|
third parties. |
|
|
|
The attacks use and abuse less common features of XML and its parsers. The |
|
majority of developers are unacquainted with features such as processing |
|
instructions and entity expansions that XML inherited from SGML. At best |
|
they know about ``<!DOCTYPE>`` from experience with HTML but they are not |
|
aware that a document type definition (DTD) can generate an HTTP request |
|
or load a file from the file system. |
|
|
|
None of the issues is new. They have been known for a long time. Billion |
|
laughs was first reported in 2003. Nevertheless some XML libraries and |
|
applications are still vulnerable and even heavy users of XML are |
|
surprised by these features. It's hard to say whom to blame for the
situation. It's too short-sighted to shift all the blame onto XML parsers and
XML libraries for using insecure default settings. After all, they
properly implement the XML specifications. Application developers must not
assume that a library is configured for security and for potentially harmful
data by default.
|
|
|
|
|
.. contents:: Table of Contents |
|
:depth: 2 |
|
|
|
|
|
Attack vectors |
|
============== |
|
|
|
billion laughs / exponential entity expansion |
|
--------------------------------------------- |
|
|
|
The `Billion Laughs`_ attack -- also known as exponential entity expansion -- |
|
uses multiple levels of nested entities. The original example uses 9 levels |
|
of 10 expansions in each level to expand the string ``lol`` to a string of |
|
3 * 10 :sup:`9` bytes, hence the name "billion laughs". The resulting string |
|
occupies 3 GB (2.79 GiB) of memory; intermediate strings require additional |
|
memory. Because most parsers don't cache the intermediate result of each
expansion, the same work is repeated over and over again, which increases the
CPU load even more.
|
|
|
An XML document of just a few hundred bytes can disrupt all services on a |
|
machine within seconds. |
|
|
|
Example XML:: |
|
|
|
<!DOCTYPE xmlbomb [ |
|
<!ENTITY a "1234567890" > |
|
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;"> |
|
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;"> |
|
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;"> |
|
]> |
|
<bomb>&d;</bomb> |
|
|
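The sizes quoted above follow from simple arithmetic. The sketch below is only
an illustration (the helper name ``expanded_size`` is made up here); it computes
the length of the fully expanded text for a chain of nested entities, once for
the classic nine-level example and once for the smaller example document shown
above.

```python
def expanded_size(base_len, fanout, levels):
    """Length of the text produced by `levels` layers of nested entities.

    Each layer references the previous entity `fanout` times, so the
    length grows by a factor of `fanout` per layer.
    """
    return base_len * fanout ** levels


# Classic billion laughs: "lol" expanded through 9 levels of 10 references.
classic = expanded_size(base_len=3, fanout=10, levels=9)
print(classic, "bytes =", round(classic / 2**30, 2), "GiB")

# Example document above: entity a holds 10 characters; b, c and d each
# reference the previous entity 8 times.
print(expanded_size(base_len=10, fanout=8, levels=3), "characters")
```

The example document is therefore harmless on its own; the attack only becomes
dangerous once more levels or a larger fan-out are used.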
|
|
|
quadratic blowup entity expansion |
|
--------------------------------- |
|
|
|
A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses |
|
entity expansion, too. Instead of nested entities it repeats one large entity |
|
with a couple of thousand chars over and over again. The attack isn't as |
|
efficient as the exponential case but it avoids triggering countermeasures of |
|
parsers against heavily nested entities. Some parsers limit the depth and |
|
breadth of a single entity but not the total amount of expanded text |
|
throughout an entire XML document. |
|
|
|
A medium-sized XML document with a couple of hundred kilobytes can require a |
|
couple of hundred MB to several GB of memory. When the attack is combined |
|
with some level of nested expansion an attacker is able to achieve a higher |
|
ratio of success. |
|
|
|
:: |
|
|
|
<!DOCTYPE bomb [ |
|
<!ENTITY a "xxxxxxx... a couple of ten thousand chars"> |
|
]> |
|
<bomb>&a;&a;&a;... repeat</bomb> |
|
|
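To make the ratio between document size and expanded size concrete, here is a
small sketch that builds a deliberately tiny quadratic blowup document and
parses it with the standard library. The helper name ``build_quadratic_doc`` and
the numbers are chosen for illustration only; they are small enough to be
harmless, and recent expat releases may reject much larger inputs.

```python
import xml.etree.ElementTree as ET


def build_quadratic_doc(entity_len, references):
    """Build a document with one large entity that is referenced many times."""
    entity = "x" * entity_len
    body = "&a;" * references
    return f'<!DOCTYPE bomb [<!ENTITY a "{entity}">]><bomb>{body}</bomb>'


# Deliberately tiny numbers: 1,000 characters referenced 200 times.
doc = build_quadratic_doc(entity_len=1000, references=200)
root = ET.fromstring(doc)  # the stdlib parser expands internal entities

print("document size:", len(doc))
print("expanded text:", len(root.text))
```

Every additional reference costs three characters in the document but adds the
full entity length to the parsed text, which is why the attack scales so well.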
|
|
|
external entity expansion (remote) |
|
---------------------------------- |
|
|
|
Entity declarations can contain more than just text for replacement. They can |
|
also point to external resources by public identifiers or system identifiers. |
|
System identifiers are standard URIs. When the URI is a URL (e.g. a
``http://`` locator) some parsers download the resource from the remote
location and embed it into the XML document verbatim.
|
|
|
Simple example of a parsed external entity:: |
|
|
|
<!DOCTYPE external [ |
|
<!ENTITY ee SYSTEM "http://www.python.org/some.xml"> |
|
]> |
|
<root>&ee;</root>
|
|
|
Parsed external entities work only if the referenced resource is valid XML
content. The XML standard also supports unparsed external entities with an
``NDATA`` declaration.
|
|
|
External entity expansion opens the door to plenty of exploits. An attacker |
|
can abuse a vulnerable XML library and application to rebound and forward |
|
network requests with the IP address of the server. Which exploits are
possible depends heavily on the parser and the application. For
example:
|
|
|
* An attacker can circumvent firewalls and gain access to restricted |
|
resources as all the requests are made from an internal and trustworthy |
|
IP address, not from the outside. |
|
* An attacker can abuse a service to attack, spy on or DoS your servers but |
|
also third party services. The attack is disguised with the IP address of |
|
the server and the attacker is able to utilize the high bandwidth of a big |
|
machine. |
|
* An attacker can exhaust additional resources on the machine, e.g. with |
|
requests to a service that doesn't respond or responds with very large |
|
files. |
|
* An attacker may learn when, how often and from which IP address
  an XML document is accessed.
|
* An attacker could send mail from inside your network if the URL handler |
|
supports ``smtp://`` URIs. |
|
|
|
|
|
external entity expansion (local file) |
|
-------------------------------------- |
|
|
|
External entities with references to local files are a sub-case of external |
|
entity expansion. It's listed as an extra attack because it deserves extra |
|
attention. Some XML libraries such as lxml disable network access by default |
|
but still allow entity expansion with local file access by default. Local |
|
files are either referenced with a ``file://`` URL or by a file path (either |
|
relative or absolute). |
|
|
|
An attacker may be able to access and download all files that can be read by |
|
the application process. This may include critical configuration files, too. |
|
|
|
:: |
|
|
|
<!DOCTYPE external [ |
|
<!ENTITY ee SYSTEM "file:///PATH/TO/simple.xml"> |
|
]> |
|
<root>&ee;</root>
|
|
|
|
|
DTD retrieval |
|
------------- |
|
|
|
This case is similar to external entity expansion, too. Some XML libraries |
|
like Python's xml.dom.pulldom retrieve document type definitions from remote |
|
or local locations. Several attack scenarios from the external entity case |
|
apply to this issue as well. |
|
|
|
:: |
|
|
|
<?xml version="1.0" encoding="utf-8"?> |
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" |
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
|
<html> |
|
<head/> |
|
<body>text</body> |
|
</html> |
|
|
|
|
|
Python XML Libraries |
|
==================== |
|
|
|
.. csv-table:: vulnerabilities and features |
|
:header: "kind", "sax", "etree", "minidom", "pulldom", "xmlrpc", "lxml", "genshi" |
|
:widths: 24, 7, 8, 8, 7, 8, 8, 8 |
|
:stub-columns: 0 |
|
|
|
"billion laughs", "**True**", "**True**", "**True**", "**True**", "**True**", "False (1)", "False (5)" |
|
"quadratic blowup", "**True**", "**True**", "**True**", "**True**", "**True**", "**True**", "False (5)" |
|
"external entity expansion (remote)", "**True**", "False (3)", "False (4)", "**True**", "False", "False (1)", "False (5)"
"external entity expansion (local file)", "**True**", "False (3)", "False (4)", "**True**", "False", "**True**", "False (5)"
"DTD retrieval", "**True**", "False", "False", "**True**", "False", "False (1)", "False"
|
"gzip bomb", "False", "False", "False", "False", "**True**", "**partly** (2)", "False" |
|
"xpath support (7)", "False", "False", "False", "False", "False", "**True**", "False" |
|
"xsl(t) support (7)", "False", "False", "False", "False", "False", "**True**", "False" |
|
"xinclude support (7)", "False", "**True** (6)", "False", "False", "False", "**True** (6)", "**True**" |
|
"C library", "expat", "expat", "expat", "expat", "expat", "libxml2", "expat" |
|
|
|
1. Lxml is protected against billion laughs attacks and doesn't do network |
|
lookups by default. |
|
2. libxml2 and lxml are not directly vulnerable to gzip decompression bombs |
|
but they don't protect you against them either. |
|
3. xml.etree doesn't expand entities and raises a ParseError when an entity
   occurs.
|
4. minidom doesn't expand entities and simply returns the unexpanded entity |
|
verbatim. |
|
5. genshi.input of genshi 0.6 doesn't support entity expansion and raises a
   ParseError when an entity occurs.
|
6. Library has (limited) XInclude support but requires an additional step to |
|
process inclusion. |
|
7. These are features but they may introduce exploitable holes, see |
|
`Other things to consider`_ |
|
|
|
|
|
Settings in standard library |
|
---------------------------- |
|
|
|
|
|
xml.sax.handler Features |
|
........................ |
|
|
|
feature_external_ges (http://xml.org/sax/features/external-general-entities) |
|
disables external entity expansion |
|
|
|
feature_external_pes (http://xml.org/sax/features/external-parameter-entities) |
|
the option is ignored and doesn't modify any functionality |
|
|
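As a rough illustration of the first setting, the following sketch switches
off ``feature_external_ges`` on a standard library SAX parser before parsing.
The ``TextCollector`` handler is a made-up helper that exists only so the result
can be inspected.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collect character data so the parse result can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


parser = xml.sax.make_parser()
# Features must be set before parsing starts.
parser.setFeature(feature_external_ges, False)

handler = TextCollector()
parser.setContentHandler(handler)
parser.feed("<root>hello</root>")
parser.close()

print(parser.getFeature(feature_external_ges), "".join(handler.chunks))
```

Setting the feature explicitly documents the intent and does not rely on the
default of a particular Python version.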
|
DOM xml.dom.xmlbuilder.Options |
|
.............................. |
|
|
|
external_parameter_entities |
|
ignored |
|
|
|
external_general_entities |
|
ignored |
|
|
|
external_dtd_subset |
|
ignored |
|
|
|
entities |
|
unsure |
|
|
|
|
|
defusedxml |
|
========== |
|
|
|
The `defusedxml package`_ (`defusedxml on PyPI`_) |
|
contains several Python-only workarounds and fixes |
|
for denial of service and other vulnerabilities in Python's XML libraries. |
|
In order to benefit from the protection you just have to import and use the |
|
listed functions / classes from the right defusedxml module instead of the |
|
original module. Only `defusedxml.xmlrpc`_ is implemented as a monkey patch.
|
|
|
Instead of:: |
|
|
|
>>> from xml.etree.ElementTree import parse |
|
>>> et = parse(xmlfile) |
|
|
|
alter code to:: |
|
|
|
>>> from defusedxml.ElementTree import parse |
|
>>> et = parse(xmlfile) |
|
|
|
Additionally the package has an **untested** function to monkey patch |
|
all stdlib modules with ``defusedxml.defuse_stdlib()``. |
|
|
|
All functions and parser classes accept three additional keyword arguments. |
|
They return either the same objects as the original functions or compatible |
|
subclasses. |
|
|
|
forbid_dtd (default: False) |
|
    disallow XML with a ``<!DOCTYPE>`` declaration and raise a
    *DTDForbidden* exception when a DTD is found.
|
|
|
forbid_entities (default: True) |
|
disallow XML with ``<!ENTITY>`` declarations inside the DTD and raise an |
|
*EntitiesForbidden* exception when an entity is declared. |
|
|
|
forbid_external (default: True) |
|
    disallow any access to remote or local resources in external entities
    or DTD and raise an *ExternalReferenceForbidden* exception when a DTD
    or entity references an external resource.
|
|
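The idea behind the three keyword arguments can be sketched with nothing but
the standard library's expat bindings. This is a simplified illustration and
not defusedxml's actual code: the function ``check_document`` and the three
``...Sketch`` exception classes are hypothetical names, and the real package
performs more checks than shown here.

```python
from xml.parsers import expat


class DTDForbiddenSketch(ValueError):
    """Hypothetical stand-in for a 'DTD forbidden' error."""


class EntitiesForbiddenSketch(ValueError):
    """Hypothetical stand-in for an 'entities forbidden' error."""


class ExternalReferenceForbiddenSketch(ValueError):
    """Hypothetical stand-in for an 'external reference forbidden' error."""


def check_document(data, forbid_dtd=False, forbid_entities=True,
                   forbid_external=True):
    """Parse `data` and raise as soon as a forbidden construct shows up."""
    parser = expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        raise DTDForbiddenSketch(name)

    def on_entity_decl(name, is_parameter_entity, value, base, sysid,
                       pubid, notation_name):
        raise EntitiesForbiddenSketch(name)

    def on_external_ref(context, base, sysid, pubid):
        raise ExternalReferenceForbiddenSketch(sysid)

    # Only install the handlers for the checks that were requested.
    if forbid_dtd:
        parser.StartDoctypeDeclHandler = on_doctype
    if forbid_entities:
        parser.EntityDeclHandler = on_entity_decl
    if forbid_external:
        parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(data, True)


try:
    check_document(b'<!DOCTYPE b [<!ENTITY a "x">]><b>&a;</b>')
except EntitiesForbiddenSketch as exc:
    print("rejected entity declaration:", exc)
```

Because the handlers raise while the declaration is being read, the document
is rejected before any entity is expanded.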
|
|
|
defusedxml (package) |
|
-------------------- |
|
|
|
DefusedXmlException, DTDForbidden, EntitiesForbidden, |
|
ExternalReferenceForbidden, NotSupportedError |
|
|
|
defuse_stdlib() (*experimental*) |
|
|
|
|
|
defusedxml.cElementTree |
|
----------------------- |
|
|
|
**NOTE** ``defusedxml.cElementTree`` is deprecated and will be removed in a |
|
future release. Import from ``defusedxml.ElementTree`` instead. |
|
|
|
parse(), iterparse(), fromstring(), XMLParser |
|
|
|
|
|
defusedxml.ElementTree |
|
----------------------- |
|
|
|
parse(), iterparse(), fromstring(), XMLParser |
|
|
|
|
|
defusedxml.expatreader |
|
---------------------- |
|
|
|
create_parser(), DefusedExpatParser |
|
|
|
|
|
defusedxml.sax |
|
-------------- |
|
|
|
parse(), parseString(), make_parser() |
|
|
|
|
|
defusedxml.expatbuilder |
|
----------------------- |
|
|
|
parse(), parseString(), DefusedExpatBuilder, DefusedExpatBuilderNS |
|
|
|
|
|
defusedxml.minidom |
|
------------------ |
|
|
|
parse(), parseString() |
|
|
|
|
|
defusedxml.pulldom |
|
------------------ |
|
|
|
parse(), parseString() |
|
|
|
|
|
defusedxml.xmlrpc |
|
----------------- |
|
|
|
The fix is implemented as a monkey patch for the stdlib's xmlrpc package (3.x)
or xmlrpclib module (2.x). The function ``monkey_patch()`` enables the fixes,
``unmonkey_patch()`` removes the patch and puts the code in its former state.
|
|
|
The monkey patch protects against XML related attacks as well as |
|
decompression bombs and excessively large requests or responses. The default |
|
setting is 30 MB for requests, responses and gzip decompression. You can |
|
modify the default by changing the module variable `MAX_DATA`. A value of |
|
`-1` disables the limit. |
|
|
|
|
|
defusedxml.lxml |
|
--------------- |
|
|
|
**DEPRECATED** The module is deprecated and will be removed in a future |
|
release. |
|
|
|
The module acts as an *example* of how you could protect code that uses
|
lxml.etree. It implements a custom Element class that filters out |
|
Entity instances, a custom parser factory and a thread local storage for |
|
parser instances. It also has a check_docinfo() function which inspects |
|
a tree for internal or external DTDs and entity declarations. In order to |
|
check for entities lxml > 3.0 is required. |
|
|
|
parse(), fromstring() |
|
RestrictedElement, GlobalParserTLS, getDefaultParser(), check_docinfo() |
|
|
|
|
|
defusedexpat |
|
============ |
|
|
|
The `defusedexpat package`_ (`defusedexpat on PyPI`_) |
|
comes with binary extensions and a |
|
`modified expat`_ library instead of the standard `expat parser`_. It's |
|
basically a stand-alone version of the patches for Python's standard |
|
library C extensions. |
|
|
|
Modifications in expat |
|
---------------------- |
|
|
|
new definitions:: |
|
|
|
XML_BOMB_PROTECTION |
|
XML_DEFAULT_MAX_ENTITY_INDIRECTIONS |
|
XML_DEFAULT_MAX_ENTITY_EXPANSIONS |
|
XML_DEFAULT_RESET_DTD |
|
|
|
new XML_FeatureEnum members:: |
|
|
|
XML_FEATURE_MAX_ENTITY_INDIRECTIONS |
|
XML_FEATURE_MAX_ENTITY_EXPANSIONS |
|
XML_FEATURE_IGNORE_DTD |
|
|
|
new XML_Error members:: |
|
|
|
XML_ERROR_ENTITY_INDIRECTIONS |
|
XML_ERROR_ENTITY_EXPANSION |
|
|
|
new API functions:: |
|
|
|
int XML_GetFeature(XML_Parser parser, |
|
enum XML_FeatureEnum feature, |
|
long *value); |
|
int XML_SetFeature(XML_Parser parser, |
|
enum XML_FeatureEnum feature, |
|
long value); |
|
int XML_GetFeatureDefault(enum XML_FeatureEnum feature, |
|
long *value); |
|
int XML_SetFeatureDefault(enum XML_FeatureEnum feature, |
|
long value); |
|
|
|
XML_FEATURE_MAX_ENTITY_INDIRECTIONS |
|
Limit the amount of indirections that are allowed to occur during the |
|
expansion of a nested entity. A counter starts when an entity reference |
|
is encountered. It resets after the entity is fully expanded. The limit |
|
protects the parser against exponential entity expansion attacks (aka |
|
billion laughs attack). When the limit is exceeded the parser stops and |
|
fails with `XML_ERROR_ENTITY_INDIRECTIONS`. |
|
A value of 0 disables the protection. |
|
|
|
Supported range |
|
0 .. UINT_MAX |
|
Default |
|
40 |
|
|
|
XML_FEATURE_MAX_ENTITY_EXPANSIONS |
|
Limit the total length of all entity expansions throughout the entire |
|
document. The lengths of all entities are accumulated in a parser variable. |
|
The setting protects against quadratic blowup attacks (lots of expansions |
|
of a large entity declaration). When the sum of all entities exceeds |
|
the limit, the parser stops and fails with `XML_ERROR_ENTITY_EXPANSION`. |
|
A value of 0 disables the protection. |
|
|
|
Supported range |
|
0 .. UINT_MAX |
|
Default |
|
8 MiB |
|
|
|
XML_FEATURE_RESET_DTD |
|
    Reset all DTD information after the <!DOCTYPE> block has been parsed. When
    the flag is set (default: false) all DTD information is reset after the
    endDoctypeDeclHandler has been called. The flag can be set inside the
    endDoctypeDeclHandler. Without DTD information any entity reference in
    the document body leads to `XML_ERROR_UNDEFINED_ENTITY`.
|
|
|
Supported range |
|
0, 1 |
|
Default |
|
0 |
|
|
|
|
|
How to avoid XML vulnerabilities |
|
================================ |
|
|
|
Best practices |
|
-------------- |
|
|
|
* Don't allow DTDs |
|
* Don't expand entities |
|
* Don't resolve externals |
|
* Limit parse depth |
|
* Limit total input size |
|
* Limit parse time |
|
* Favor a SAX or iterparse-like parser for potentially large data
* Validate and properly quote arguments to XSL transformations and
  XPath queries
* Don't use XPath expressions from untrusted sources
* Don't apply XSL transformations that come from untrusted sources
|
|
|
(based on Brad Hill's `Attacking XML Security`_) |
|
|
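Some of the limits in this list can be enforced in application code. The
following sketch caps the input size and the nesting depth while streaming
through a document with ``iterparse``. The function name and the two default
limits are arbitrary choices for illustration, and the sketch does nothing
about entity expansion or external references.

```python
import io
import xml.etree.ElementTree as ET

MAX_BYTES = 1_000_000  # illustrative cap on the raw input size
MAX_DEPTH = 32         # illustrative cap on element nesting


def count_elements_bounded(stream, max_bytes=MAX_BYTES, max_depth=MAX_DEPTH):
    """Count elements in a binary stream, enforcing size and depth limits."""
    # Read one byte more than allowed so oversized input can be detected.
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("input too large")

    depth = 0
    count = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            depth += 1
            if depth > max_depth:
                raise ValueError("nesting too deep")
        else:
            depth -= 1
            count += 1
            elem.clear()  # drop content that is no longer needed
    return count


print(count_elements_bounded(io.BytesIO(b"<a><b/><c/></a>")))
```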
|
|
|
Other things to consider |
|
======================== |
|
|
|
XML, XML parsers and processing libraries have more features and possible
issues that could lead to DoS vulnerabilities or security exploits in
|
applications. I have compiled an incomplete list of theoretical issues that |
|
need further research and more attention. The list is deliberately pessimistic |
|
and a bit paranoid, too. It contains things that might go wrong under daffy |
|
circumstances. |
|
|
|
|
|
attribute blowup / hash collision attack |
|
---------------------------------------- |
|
|
|
XML parsers may use an algorithm with quadratic runtime O(n :sup:`2`) to |
|
handle attributes and namespaces. If it uses hash tables (dictionaries) to |
|
store attributes and namespaces the implementation may be vulnerable to |
|
hash collision attacks, thus reducing the performance to O(n :sup:`2`) again. |
|
In either case an attacker is able to forge a denial of service attack with |
|
an XML document that contains thousands upon thousands of attributes in |
|
a single node. |
|
|
|
I haven't researched yet if expat, pyexpat or libxml2 are vulnerable. |
|
|
|
|
|
decompression bomb |
|
------------------ |
|
|
|
The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML libraries
that can parse compressed XML streams like gzipped HTTP streams or LZMA-ed
files. For an attacker it can reduce the amount of transmitted data by three
orders of magnitude or more. Gzip is able to compress 1 GiB of zeros to roughly
1 MB, lzma is even better::
|
|
|
$ dd if=/dev/zero bs=1M count=1024 | gzip > zeros.gz |
|
$ dd if=/dev/zero bs=1M count=1024 | lzma -z > zeros.xy |
|
$ ls -sh zeros.* |
|
1020K zeros.gz |
|
148K zeros.xy |
|
|
|
None of Python's standard XML libraries decompress streams except for
``xmlrpclib``. The module is `vulnerable <https://bugs.python.org/issue16043>`_
to decompression bombs.
|
|
|
lxml can load and process compressed data through libxml2 transparently. |
|
libxml2 can handle even very large blobs of compressed data efficiently |
|
without using too much memory. But it doesn't protect applications from |
|
decompression bombs. A carefully written SAX or iterparse-like approach can |
|
be safe. |
|
|
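One way to guard against decompression bombs is to cap the amount of data a
decompressor may produce. The sketch below does this for gzip data with the
standard library. The function ``bounded_gunzip`` is a hypothetical helper, and
the default limit merely mirrors the 30 MB figure mentioned for
``defusedxml.xmlrpc``; it is not the package's implementation.

```python
import gzip
import zlib

MAX_DATA = 30 * 1024 * 1024  # illustrative default, 30 MB


def bounded_gunzip(blob, limit=MAX_DATA):
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # Adding 16 to the window size selects the gzip container format.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Ask for at most limit + 1 bytes: a single byte over the limit is
    # enough to know that the payload is too large.
    data = decompressor.decompress(blob, limit + 1)
    if len(data) > limit:
        raise ValueError("decompressed data exceeds %d bytes" % limit)
    return data


bomb = gzip.compress(b"\0" * 1_000_000)
print("compressed size:", len(bomb))
try:
    bounded_gunzip(bomb, limit=10_000)
except ValueError as exc:
    print("rejected:", exc)
```

The decompressor stops as soon as the cap is reached, so the full payload is
never held in memory.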
|
|
|
Processing Instruction |
|
---------------------- |
|
|
|
`PI`_'s like:: |
|
|
|
<?xml-stylesheet type="text/xsl" href="style.xsl"?> |
|
|
|
may pose additional threats to XML processing. It depends on whether and how a
processor handles processing instructions. The issue of URL retrieval with
network or local file access applies to processing instructions, too.
|
|
|
|
|
Other DTD features |
|
------------------ |
|
|
|
`DTD`_ has more features like ``<!NOTATION>``. I haven't researched how |
|
these features may be a security threat. |
|
|
|
|
|
XPath |
|
----- |
|
|
|
XPath statements may introduce DoS vulnerabilities. Code should never execute |
|
queries from untrusted sources. An attacker may also be able to create an XML |
|
document that makes certain XPath queries costly or resource hungry. |
|
|
|
|
|
XPath injection attacks |
|
----------------------- |
|
|
|
XPath injection attacks pretty much work like SQL injection attacks.
Arguments to XPath queries must be quoted and validated properly, especially
when they are taken from the user. The page `Avoid the dangers of XPath injection`_
lists some ramifications of XPath injections.

Python's standard library doesn't have XPath support. Lxml supports
parameterized XPath queries, which do proper quoting. You just have to use
its xpath() method correctly::
|
|
|
|
|
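    # Unsafe: the value is interpolated into the query string
    >>> tree.xpath("/tag[@id='%s']" % value)

    # Safe: the value is passed as an XPath variable and quoted by lxml
    >>> tree.xpath("/tag[@id=$tagid]", tagid=value)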
|
|
|
|
|
XInclude |
|
-------- |
|
|
|
`XML Inclusion`_ is another way to load and include external files:: |
|
|
|
<root xmlns:xi="http://www.w3.org/2001/XInclude"> |
|
<xi:include href="filename.txt" parse="text" /> |
|
</root> |
|
|
|
This feature should be disabled when XML files from an untrusted source are |
|
processed. Some Python XML libraries and libxml2 support XInclude but don't |
|
have an option to sandbox inclusion and limit it to allowed directories. |
|
|
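With the standard library's ``xml.etree.ElementInclude``, inclusion is a
separate step that accepts a custom loader, so an application can approximate
a sandbox itself. The following is a minimal sketch under that assumption; the
names ``make_sandboxed_loader`` and ``process_includes`` are made up, and the
path check is a simple containment test, not a hardened implementation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude


def make_sandboxed_loader(allowed_dir):
    """Return an ElementInclude loader that only reads files below allowed_dir."""
    root = Path(allowed_dir).resolve()

    def loader(href, parse, encoding=None):
        # Resolve the target first so that "../" tricks are normalised away.
        target = (root / href).resolve()
        if root not in target.parents:
            raise PermissionError("XInclude target outside sandbox: %r" % href)
        if parse == "xml":
            return ET.parse(str(target)).getroot()
        return target.read_text(encoding=encoding or "utf-8")

    return loader


def process_includes(xml_text, allowed_dir):
    """Parse xml_text and resolve xi:include elements inside allowed_dir only."""
    tree_root = ET.fromstring(xml_text)
    ElementInclude.include(tree_root, loader=make_sandboxed_loader(allowed_dir))
    return tree_root
```

A call such as ``process_includes(document, "/srv/xml-includes")`` (the
directory is a placeholder) then fails with ``PermissionError`` for any ``href``
that points outside that directory.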
|
|
|
XMLSchema location |
|
------------------ |
|
|
|
A validating XML parser may download schema files from the information in a |
|
``xsi:schemaLocation`` attribute. |
|
|
|
:: |
|
|
|
<ead xmlns="urn:isbn:1-931666-22-9" |
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" |
|
xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd"> |
|
</ead> |
|
|
|
|
|
XSL Transformation |
|
------------------ |
|
|
|
You should keep in mind that XSLT is a Turing complete language. Never
process XSLT code from unknown or untrusted sources! XSLT processors may
allow you to interact with external resources in ways you can't even imagine.
Some processors even support extensions that allow read/write access to the
file system, access to JRE objects or scripting with Jython.
|
|
|
Example from `Attacking XML Security`_ for Xalan-J:: |
|
|
|
<xsl:stylesheet version="1.0" |
|
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" |
|
xmlns:rt="http://xml.apache.org/xalan/java/java.lang.Runtime" |
|
xmlns:ob="http://xml.apache.org/xalan/java/java.lang.Object" |
|
exclude-result-prefixes= "rt ob"> |
|
<xsl:template match="/"> |
|
<xsl:variable name="runtimeObject" select="rt:getRuntime()"/> |
|
<xsl:variable name="command" |
|
select="rt:exec($runtimeObject, 'c:\Windows\system32\cmd.exe')"/> |
|
<xsl:variable name="commandAsString" select="ob:toString($command)"/> |
|
<xsl:value-of select="$commandAsString"/> |
|
</xsl:template> |
|
</xsl:stylesheet> |
|
|
|
|
|
Related CVEs |
|
============ |
|
|
|
CVE-2013-1664 |
|
Unrestricted entity expansion induces DoS vulnerabilities in Python XML |
|
libraries (XML bomb) |
|
|
|
CVE-2013-1665 |
|
External entity expansion in Python XML libraries inflicts potential |
|
security flaws and DoS vulnerabilities |
|
|
|
|
|
Other languages / frameworks |
|
============================= |
|
|
|
Several other programming languages and frameworks are vulnerable as well. A |
|
couple of them are affected by the fact that libxml2 up to 2.9.0 has no |
|
protection against quadratic blowup attacks. Most of them have potentially
dangerous default settings for entity expansion and external entities, too.
|
|
|
Perl |
|
---- |
|
|
|
Perl's XML::Simple is vulnerable to quadratic entity expansion and external |
|
entity expansion (both local and remote). |
|
|
|
|
|
Ruby |
|
---- |
|
|
|
Ruby's REXML document parser is vulnerable to entity expansion attacks |
|
(both quadratic and exponential) but it doesn't do external entity |
|
expansion by default. In order to counteract entity expansion you have to |
|
disable the feature:: |
|
|
|
REXML::Document.entity_expansion_limit = 0 |
|
|
|
libxml-ruby and hpricot don't expand entities in their default configuration. |
|
|
|
|
|
PHP |
|
--- |
|
|
|
PHP's SimpleXML API is vulnerable to quadratic entity expansion and loads |
|
entities from local and remote resources. The option ``LIBXML_NONET`` disables |
|
network access but still allows local file access. ``LIBXML_NOENT`` seems to |
|
have no effect on entity expansion in PHP 5.4.6. |
|
|
|
|
|
C# / .NET / Mono |
|
---------------- |
|
|
|
Information in `XML DoS and Defenses (MSDN)`_ suggests that .NET is
vulnerable with its default settings. The article contains code snippets
showing how to create a secure XML reader::
|
|
|
XmlReaderSettings settings = new XmlReaderSettings(); |
|
settings.ProhibitDtd = false; |
|
settings.MaxCharactersFromEntities = 1024; |
|
settings.XmlResolver = null; |
|
XmlReader reader = XmlReader.Create(stream, settings); |
|
|
|
|
|
Java |
|
---- |
|
|
|
Untested. The documentation of Xerces and its `Xerces SecurityMananger`_
suggests that Xerces is also vulnerable to billion laughs attacks with its
|
default settings. It also does entity resolving when an |
|
``org.xml.sax.EntityResolver`` is configured. I'm not yet sure about the |
|
default setting here. |
|
|
|
Java specialists suggest using a custom builder factory::
|
|
|
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
    builderFactory.setXIncludeAware(false);
    builderFactory.setExpandEntityReferences(false);
    builderFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    // either forbid DOCTYPE declarations entirely
    builderFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    // or, if DTDs are needed, disable external entities and external DTD loading
    builderFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    builderFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    builderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
|
|
|
|
|
TODO |
|
==== |
|
|
|
* DOM: Use xml.dom.xmlbuilder options for entity handling |
|
* SAX: take feature_external_ges and feature_external_pes (?) into account |
|
* test experimental monkey patching of stdlib modules |
|
* improve documentation |
|
|
|
|
|
License |
|
======= |
|
|
|
Copyright (c) 2013-2017 by Christian Heimes <[email protected]> |
|
|
|
Licensed to PSF under a Contributor Agreement. |
|
|
|
See https://www.python.org/psf/license for licensing details. |
|
|
|
|
|
Acknowledgements |
|
================ |
|
|
|
Brett Cannon (Python Core developer) |
|
review and code cleanup |
|
|
|
Antoine Pitrou (Python Core developer) |
|
code review |
|
|
|
Aaron Patterson, Ben Murphy and Michael Koziarski (Ruby community) |
|
Many thanks to Aaron, Ben and Michael from the Ruby community for their |
|
report and assistance. |
|
|
|
Thierry Carrez (OpenStack) |
|
Many thanks to Thierry for his report to the Python Security Response |
|
Team on behalf of the OpenStack security team. |
|
|
|
Carl Meyer (Django) |
|
Many thanks to Carl for his report to PSRT on behalf of the Django security |
|
team. |
|
|
|
Daniel Veillard (libxml2) |
|
Many thanks to Daniel for his insight and assistance with libxml2. |
|
|
|
semantics GmbH (https://www.semantics.de/) |
|
Many thanks to my employer semantics for letting me work on the issue |
|
during working hours as part of semantics's open source initiative. |
|
|
|
|
|
References |
|
========== |
|
|
|
* `XML DoS and Defenses (MSDN)`_ |
|
* `Billion Laughs`_ on Wikipedia |
|
* `ZIP bomb`_ on Wikipedia |
|
* `Configure SAX parsers for secure processing`_ |
|
* `Testing for XML Injection`_ |
|
|
|
.. _defusedxml package: https://github.com/tiran/defusedxml |
|
.. _defusedxml on PyPI: https://pypi.python.org/pypi/defusedxml |
|
.. _defusedexpat package: https://github.com/tiran/defusedexpat |
|
.. _defusedexpat on PyPI: https://pypi.python.org/pypi/defusedexpat |
|
.. _modified expat: https://github.com/tiran/expat |
|
.. _expat parser: http://expat.sourceforge.net/ |
|
.. _Attacking XML Security: https://www.isecpartners.com/media/12976/iSEC-HILL-Attacking-XML-Security-bh07.pdf |
|
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs |
|
.. _XML DoS and Defenses (MSDN): https://msdn.microsoft.com/en-us/magazine/ee335713.aspx |
|
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb |
|
.. _DTD: https://en.wikipedia.org/wiki/Document_Type_Definition |
|
.. _PI: https://en.wikipedia.org/wiki/Processing_Instruction |
|
.. _Avoid the dangers of XPath injection: http://www.ibm.com/developerworks/xml/library/x-xpathinjection/index.html |
|
.. _Configure SAX parsers for secure processing: http://www.ibm.com/developerworks/xml/library/x-tipcfsx/index.html |
|
.. _Testing for XML Injection: https://www.owasp.org/index.php/Testing_for_XML_Injection_(OWASP-DV-008) |
|
.. _Xerces SecurityMananger: https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/util/SecurityManager.html |
|
.. _XML Inclusion: https://www.w3.org/TR/xinclude/#include_element |
|
|
|
Changelog |
|
========= |
|
|
|
defusedxml 0.7.1 |
|
--------------------- |
|
|
|
*Release date: 08-Mar-2021* |
|
|
|
- Fix regression ``defusedxml.ElementTree.ParseError`` (#63) |
|
The ``ParseError`` exception is now the same class object as |
|
``xml.etree.ElementTree.ParseError`` again. |
|
|
|
|
|
defusedxml 0.7.0 |
|
---------------- |
|
|
|
*Release date: 4-Mar-2021* |
|
|
|
- No changes |
|
|
|
|
|
defusedxml 0.7.0rc2 |
|
------------------- |
|
|
|
*Release date: 12-Jan-2021* |
|
|
|
- Re-add and deprecate ``defusedxml.cElementTree`` |
|
- Use GitHub Actions instead of TravisCI |
|
- Restore ``ElementTree`` attribute of ``xml.etree`` module after patching |
|
|
|
defusedxml 0.7.0rc1 |
|
------------------- |
|
|
|
*Release date: 04-May-2020* |
|
|
|
- Add support for Python 3.9 |
|
- ``defusedxml.cElementTree`` is not available with Python 3.9. |
|
- Python 2 is deprecated. Support for Python 2 will be removed in 0.8.0.
|
|
|
|
|
defusedxml 0.6.0 |
|
---------------- |
|
|
|
*Release date: 17-Apr-2019* |
|
|
|
- Increase test coverage. |
|
- Add badges to README. |
|
|
|
|
|
defusedxml 0.6.0rc1 |
|
------------------- |
|
|
|
*Release date: 14-Apr-2019* |
|
|
|
- Test on Python 3.7 stable and 3.8-dev |
|
- Drop support for Python 3.4 |
|
- No longer pass *html* argument to XMLParse. It has been deprecated and |
|
ignored for a long time. The DefusedXMLParser still takes a html argument. |
|
A deprecation warning is issued when the argument is False and a TypeError |
|
when it's True. |
|
- defusedxml now fails early when pyexpat stdlib module is not available or |
|
broken. |
|
- defusedxml.ElementTree.__all__ now lists ParseError as public attribute. |
|
- The defusedxml.ElementTree and defusedxml.cElementTree modules had a typo |
|
and used XMLParse instead of XMLParser as an alias for DefusedXMLParser. |
|
Both the old and fixed name are now available. |
|
|
|
|
|
defusedxml 0.5.0 |
|
---------------- |
|
|
|
*Release date: 07-Feb-2017* |
|
|
|
- No changes |
|
|
|
|
|
defusedxml 0.5.0.rc1 |
|
-------------------- |
|
|
|
*Release date: 28-Jan-2017* |
|
|
|
- Add compatibility with Python 3.6 |
|
- Drop support for Python 2.6, 3.1, 3.2, 3.3 |
|
- Fix lxml tests (XMLSyntaxError: Detected an entity reference loop) |
|
|
|
|
|
defusedxml 0.4.1 |
|
---------------- |
|
|
|
*Release date: 28-Mar-2013* |
|
|
|
- Add more demo exploits, e.g. python_external.py and Xalan XSLT demos. |
|
- Improved documentation. |
|
|
|
|
|
defusedxml 0.4 |
|
-------------- |
|
|
|
*Release date: 25-Feb-2013* |
|
|
|
- As per http://seclists.org/oss-sec/2013/q1/340 please REJECT |
|
CVE-2013-0278, CVE-2013-0279 and CVE-2013-0280 and use CVE-2013-1664, |
|
CVE-2013-1665 for OpenStack/etc. |
|
- Add missing parser_list argument to sax.make_parser(). The argument is |
|
ignored, though. (thanks to Florian Apolloner) |
|
- Add demo exploit for external entity attack on Python's SAX parser, XML-RPC |
|
and WebDAV. |
|
|
|
|
|
defusedxml 0.3 |
|
-------------- |
|
|
|
*Release date: 19-Feb-2013* |
|
|
|
- Improve documentation |
|
|
|
|
|
defusedxml 0.2 |
|
-------------- |
|
|
|
*Release date: 15-Feb-2013* |
|
|
|
- Rename ExternalEntitiesForbidden to ExternalReferenceForbidden |
|
- Rename defusedxml.lxml.check_dtd() to check_docinfo() |
|
- Unify argument names in callbacks |
|
- Add arguments and formatted representation to exceptions |
|
- Add forbid_external argument to all functions and classes |
|
- More tests |
|
- LOTS of documentation |
|
- Add example code for other languages (Ruby, Perl, PHP) and parsers (Genshi) |
|
- Add protection against XML and gzip attacks to xmlrpclib |
|
|
|
defusedxml 0.1 |
|
-------------- |
|
|
|
*Release date: 08-Feb-2013* |
|
|
|
- Initial and internal release for PSRT review |
|
|
|
|
|
|