commit python-beautifulsoup4 for openSUSE:Factory
Hello community,

here is the log from the commit of package python-beautifulsoup4 for openSUSE:Factory checked in at 2013-06-29 19:43:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
 and      /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-beautifulsoup4"

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes	2013-06-18 10:36:16.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes	2013-06-29 22:25:55.000000000 +0200
@@ -1,0 +2,37 @@
+Thu Jun 27 13:32:06 UTC 2013 - speilicke@suse.com
+
+- Update upstream URL
+
+-------------------------------------------------------------------
+Tue Jun 25 11:52:34 UTC 2013 - dmueller@suse.com
+
+- update to 4.2.1:
+  * The default XML formatter will now replace ampersands even if they
+    appear to be part of entities. That is, "&lt;" will become
+    "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+    didn't always turn entities into Unicode characters.
+
+    If you really want the old behavior (maybe because you add new
+    strings to the tree, those strings include entities, and you want
+    the formatter to leave them alone on output), it can be found in
+    EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+  * Gave new_string() the ability to create subclasses of
+    NavigableString. [bug=1181986]
+
+  * Fixed another bug by which the html5lib tree builder could create a
+    disconnected tree. [bug=1182089]
+
+  * The .previous_element of a BeautifulSoup object is now always None,
+    not the last element to be parsed. [bug=1182089]
+
+  * Fixed test failures when lxml is not installed. [bug=1181589]
+
+  * html5lib now supports Python 3. Fixed some Python 2-specific
+    code in the html5lib test suite. [bug=1181624]
+
+  * The html.parser treebuilder can now handle numeric attributes in
+    text when the hexadecimal name of the attribute starts with a
+    capital X. Patch by Tim Shirley. [bug=1186242]
+
+-------------------------------------------------------------------

Old:
----
  beautifulsoup4-4.2.0.tar.gz

New:
----
  beautifulsoup4-4.2.1.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.luDQsR/_old	2013-06-29 22:25:56.000000000 +0200
+++ /var/tmp/diff_new_pack.luDQsR/_new	2013-06-29 22:25:56.000000000 +0200
@@ -16,9 +16,8 @@
 #
-%define _name beautifulsoup4
-Name:           python-%{_name}
-Version:        4.2.0
+Name:           python-beautifulsoup4
+Version:        4.2.1
 Release:        0
 Summary:        HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
 License:        MIT
@@ -26,20 +25,19 @@
 Url:            http://www.crummy.com/software/BeautifulSoup/
 Source:         http://pypi.python.org/packages/source/b/beautifulsoup4/beautifulsoup4-%{version}.tar.gz
 BuildRoot:      %{_tmppath}/%{name}-%{version}-build
-BuildRequires:  python-Sphinx
 BuildRequires:  python-devel >= 2.6
+# Documentation requirements:
+BuildRequires:  python-Sphinx
+# Test requirements:
 BuildRequires:  python-html5lib
 BuildRequires:  python-lxml
 BuildRequires:  python-nose
 Requires:       python-html5lib
 Requires:       python-lxml
-%{py_requires}
-
-# build fails for SLE11 64bit due to 'noarch'
-%if 0%{?suse_version} >= 1140
-BuildArch:      noarch
-%else
+%if 0%{?suse_version} && 0%{?suse_version} <= 1110
 %{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
+%else
+BuildArch:      noarch
 %endif
 
 %description
@@ -79,32 +77,26 @@
 %prep
-%setup -q -n %{_name}-%{version}
+%setup -q -n beautifulsoup4-%{version}
 
 %build
-CFLAGS="%{optflags}" python setup.py build
+python setup.py build
+cd doc && make html
 
 %install
-python setup.py install \
-    --prefix=%{_prefix} \
-    --root=%{buildroot}
-cd doc
-make html
+python setup.py install --prefix=%{_prefix} --root=%{buildroot}
 
-%if 0%{?suse_version} >= 1140
 %check
 nosetests
-%endif
 
 %files
 %defattr(-,root,root)
 %doc AUTHORS.txt COPYING.txt
 %{python_sitelib}/bs4/
-%{python_sitelib}/%{_name}-%{version}-py*.egg-info
+%{python_sitelib}/beautifulsoup4-%{version}-py*.egg-info
 
 %files doc
 %defattr(-,root,root)
-%doc NEWS.txt README.txt TODO.txt
-%doc doc/build/html
+%doc NEWS.txt README.txt TODO.txt doc/build/html
 
 %changelog

++++++ beautifulsoup4-4.2.0.tar.gz -> beautifulsoup4-4.2.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/NEWS.txt new/beautifulsoup4-4.2.1/NEWS.txt
--- old/beautifulsoup4-4.2.0/NEWS.txt	2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/NEWS.txt	2013-05-31 15:49:44.000000000 +0200
@@ -1,3 +1,33 @@
+= 4.2.1 (20130531) =
+
+* The default XML formatter will now replace ampersands even if they
+  appear to be part of entities. That is, "&lt;" will become
+  "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+  didn't always turn entities into Unicode characters.
+
+  If you really want the old behavior (maybe because you add new
+  strings to the tree, those strings include entities, and you want
+  the formatter to leave them alone on output), it can be found in
+  EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+* Gave new_string() the ability to create subclasses of
+  NavigableString. [bug=1181986]
+
+* Fixed another bug by which the html5lib tree builder could create a
+  disconnected tree.
[bug=1182089] + +* The .previous_element of a BeautifulSoup object is now always None, + not the last element to be parsed. [bug=1182089] + +* Fixed test failures when lxml is not installed. [bug=1181589] + +* html5lib now supports Python 3. Fixed some Python 2-specific + code in the html5lib test suite. [bug=1181624] + +* The html.parser treebuilder can now handle numeric attributes in + text when the hexidecimal name of the attribute starts with a + capital X. Patch by Tim Shirley. [bug=1186242] + = 4.2.0 (20130514) = * The Tag.select() method now supports a much wider variety of CSS diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/PKG-INFO new/beautifulsoup4-4.2.1/PKG-INFO --- old/beautifulsoup4-4.2.0/PKG-INFO 2013-05-15 14:43:52.000000000 +0200 +++ new/beautifulsoup4-4.2.1/PKG-INFO 2013-05-31 15:54:14.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: beautifulsoup4 -Version: 4.2.0 +Version: 4.2.1 Summary: UNKNOWN Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/ Author: Leonard Richardson diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/__init__.py new/beautifulsoup4-4.2.1/bs4/__init__.py --- old/beautifulsoup4-4.2.0/bs4/__init__.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/__init__.py 2013-05-31 15:42:38.000000000 +0200 @@ -17,7 +17,7 @@ """ __author__ = "Leonard Richardson (leonardr@segfault.org)" -__version__ = "4.2.0" +__version__ = "4.2.1" __copyright__ = "Copyright (c) 2004-2013 Leonard Richardson" __license__ = "MIT" @@ -201,9 +201,9 @@ """Create a new tag associated with this soup.""" return Tag(None, self.builder, name, namespace, nsprefix, attrs) - def new_string(self, s): + def new_string(self, s, subclass=NavigableString): """Create a new NavigableString associated with this soup.""" - navigable = NavigableString(s) + navigable = subclass(s) navigable.setup() return navigable @@ -245,14 +245,14 @@ o = containerClass(currentData) self.object_was_parsed(o) - def object_was_parsed(self, o, parent=None, previous_element=None): + def object_was_parsed(self, o, parent=None, most_recent_element=None): """Add an object to the parse tree.""" parent = parent or self.currentTag - previous_element = previous_element or self.previous_element - o.setup(parent, previous_element) - if self.previous_element: - self.previous_element.next_element = o - self.previous_element = o + most_recent_element = most_recent_element or self._most_recent_element + o.setup(parent, most_recent_element) + if most_recent_element is not None: + most_recent_element.next_element = o + self._most_recent_element = o parent.contents.append(o) def _popToTag(self, name, nsprefix=None, inclusivePop=True): @@ -297,12 +297,12 @@ return None tag = Tag(self, self.builder, name, namespace, nsprefix, attrs, - self.currentTag, self.previous_element) + self.currentTag, self._most_recent_element) if tag is None: return tag - if self.previous_element: - self.previous_element.next_element = tag - self.previous_element = tag + if self._most_recent_element: + self._most_recent_element.next_element = tag + self._most_recent_element = tag self.pushTag(tag) return tag diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/__init__.py new/beautifulsoup4-4.2.1/bs4/builder/__init__.py --- old/beautifulsoup4-4.2.0/bs4/builder/__init__.py 2013-05-15 14:36:50.000000000 +0200 +++ 
new/beautifulsoup4-4.2.1/bs4/builder/__init__.py 2013-05-20 20:58:23.000000000 +0200 @@ -152,7 +152,7 @@ tag_specific = self.cdata_list_attributes.get( tag_name.lower(), []) for cdata_list_attr in itertools.chain(universal, tag_specific): - if cdata_list_attr in dict(attrs): + if cdata_list_attr in attrs: # Basically, we have a "class" attribute whose # value is a whitespace-separated list of CSS # classes. Split it into a list. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py --- old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py 2013-05-20 18:01:18.000000000 +0200 @@ -131,6 +131,7 @@ old_element = self.element.contents[-1] new_element = self.soup.new_string(old_element + node.element) old_element.replace_with(new_element) + self.soup._most_recent_element = new_element else: self.soup.object_was_parsed(node.element, parent=self.element) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py --- old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py 2013-05-31 15:48:27.000000000 +0200 @@ -58,6 +58,8 @@ # it's fixed. if name.startswith('x'): real_name = int(name.lstrip('x'), 16) + elif name.startswith('X'): + real_name = int(name.lstrip('X'), 16) else: real_name = int(name) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py --- old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py 2013-05-20 15:09:43.000000000 +0200 @@ -3,6 +3,7 @@ 'LXMLTreeBuilder', ] +from io import BytesIO from StringIO import StringIO import collections from lxml import etree @@ -75,7 +76,9 @@ dammit.contains_replacement_characters) def feed(self, markup): - if isinstance(markup, basestring): + if isinstance(markup, bytes): + markup = BytesIO(markup) + elif isinstance(markup, unicode): markup = StringIO(markup) # Call feed() at least once, even if the markup is empty, # or the parser won't be initialized. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/dammit.py new/beautifulsoup4-4.2.1/bs4/dammit.py --- old/beautifulsoup4-4.2.0/bs4/dammit.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/dammit.py 2013-05-20 20:58:23.000000000 +0200 @@ -81,6 +81,8 @@ "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)" ")") + AMPERSAND_OR_BRACKET = re.compile("([<>&])") + @classmethod def _substitute_html_entity(cls, matchobj): entity = cls.CHARACTER_TO_HTML_ENTITY.get(matchobj.group(0)) @@ -134,6 +136,28 @@ def substitute_xml(cls, value, make_quoted_attribute=False): """Substitute XML entities for special XML characters. + :param value: A string to be substituted. The less-than sign + will become <, the greater-than sign will become >, + and any ampersands will become &. If you want ampersands + that appear to be part of an entity definition to be left + alone, use substitute_xml_containing_entities() instead. 
+ + :param make_quoted_attribute: If True, then the string will be + quoted, as befits an attribute value. + """ + # Escape angle brackets and ampersands. + value = cls.AMPERSAND_OR_BRACKET.sub( + cls._substitute_xml_entity, value) + + if make_quoted_attribute: + value = cls.quoted_attribute_value(value) + return value + + @classmethod + def substitute_xml_containing_entities( + cls, value, make_quoted_attribute=False): + """Substitute XML entities for special XML characters. + :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands that are not part of an entity defition will @@ -151,6 +175,7 @@ value = cls.quoted_attribute_value(value) return value + @classmethod def substitute_html(cls, s): """Replace certain Unicode characters with named HTML entities. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/diagnose.py new/beautifulsoup4-4.2.1/bs4/diagnose.py --- old/beautifulsoup4-4.2.0/bs4/diagnose.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/diagnose.py 2013-05-20 17:07:53.000000000 +0200 @@ -4,8 +4,11 @@ from bs4 import BeautifulSoup, __version__ from bs4.builder import builder_registry import os +import random +import time import traceback import sys +import cProfile def diagnose(data): """Diagnostic suite for isolating common problems.""" @@ -70,32 +73,36 @@ class AnnouncingParser(HTMLParser): """Announces HTMLParser parse events, without doing anything else.""" + + def _p(self, s): + print(s) + def handle_starttag(self, name, attrs): - print "%s START" % name + self._p("%s START" % name) def handle_endtag(self, name): - print "%s END" % name + self._p("%s END" % name) def handle_data(self, data): - print "%s DATA" % data + self._p("%s DATA" % data) def handle_charref(self, name): - print "%s CHARREF" % name + self._p("%s CHARREF" % name) def handle_entityref(self, name): - print "%s ENTITYREF" % name + self._p("%s ENTITYREF" % name) def handle_comment(self, data): - print "%s COMMENT" % data + self._p("%s COMMENT" % data) def handle_decl(self, data): - print "%s DECL" % data + self._p("%s DECL" % data) def unknown_decl(self, data): - print "%s UNKNOWN-DECL" % data + self._p("%s UNKNOWN-DECL" % data) def handle_pi(self, data): - print "%s PI" % data + self._p("%s PI" % data) def htmlparser_trace(data): """Print out the HTMLParser events that occur during parsing. @@ -106,5 +113,66 @@ parser = AnnouncingParser() parser.feed(data) +_vowels = "aeiou" +_consonants = "bcdfghjklmnpqrstvwxyz" + +def rword(length=5): + "Generate a random word-like string." + s = '' + for i in range(length): + if i % 2 == 0: + t = _consonants + else: + t = _vowels + s += random.choice(t) + return s + +def rsentence(length=4): + "Generate a random sentence-like string." + return " ".join(rword(random.randint(4,9)) for i in range(length)) + +def rdoc(num_elements=1000): + """Randomly generate an invalid HTML document.""" + tag_names = ['p', 'div', 'span', 'i', 'b', 'script', 'table'] + elements = [] + for i in range(num_elements): + choice = random.randint(0,3) + if choice == 0: + # New tag. + tag_name = random.choice(tag_names) + elements.append("<%s>" % tag_name) + elif choice == 1: + elements.append(rsentence(random.randint(1,4))) + elif choice == 2: + # Close a tag. 
+ tag_name = random.choice(tag_names) + elements.append("</%s>" % tag_name) + return "<html>" + "\n".join(elements) + "</html>" + +def benchmark_parsers(num_elements=100000): + """Very basic head-to-head performance benchmark.""" + print "Comparative parser benchmark on Beautiful Soup %s" % __version__ + data = rdoc(num_elements) + print "Generated a large invalid HTML document (%d bytes)." % len(data) + + for parser in ["lxml", ["lxml", "html"], "html5lib", "html.parser"]: + success = False + try: + a = time.time() + soup = BeautifulSoup(data, parser) + b = time.time() + success = True + except Exception, e: + print "%s could not parse the markup." % parser + traceback.print_exc() + if success: + print "BS4+%s parsed the markup in %.2fs." % (parser, b-a) + + from lxml import etree + a = time.time() + etree.HTML(data) + b = time.time() + print "Raw lxml parsed the markup in %.2fs." % (b-a) + if __name__ == '__main__': diagnose(sys.stdin.read()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/testing.py new/beautifulsoup4-4.2.1/bs4/testing.py --- old/beautifulsoup4-4.2.0/bs4/testing.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/testing.py 2013-05-31 15:46:18.000000000 +0200 @@ -228,12 +228,14 @@ expect = u'<p id="pi\N{LATIN SMALL LETTER N WITH TILDE}ata"></p>' self.assertSoupEquals('<p id="piñata"></p>', expect) self.assertSoupEquals('<p id="piñata"></p>', expect) + self.assertSoupEquals('<p id="piñata"></p>', expect) self.assertSoupEquals('<p id="piñata"></p>', expect) def test_entities_in_text_converted_to_unicode(self): expect = u'<p>pi\N{LATIN SMALL LETTER N WITH TILDE}ata</p>' self.assertSoupEquals("<p>piñata</p>", expect) self.assertSoupEquals("<p>piñata</p>", expect) + self.assertSoupEquals("<p>piñata</p>", expect) self.assertSoupEquals("<p>piñata</p>", expect) def test_quot_entity_converted_to_quotation_mark(self): @@ -246,6 +248,12 @@ self.assertSoupEquals("", expect) self.assertSoupEquals("", expect) + def test_multipart_strings(self): + "Mostly to prevent a recurrence of a bug in the html5lib treebuilder." + soup = self.soup("<html><h2>\nfoo</h2><p></p></html>") + self.assertEqual("p", soup.h2.string.next_element.name) + self.assertEqual("p", soup.p.name) + def test_basic_namespaces(self): """Parsers don't need to *understand* namespaces, but at the very least they should not choke on namespaces or lose @@ -464,6 +472,18 @@ self.assertEqual( soup.encode("utf-8"), markup) + def test_formatter_processes_script_tag_for_xml_documents(self): + doc = """ + <script type="text/javascript"> + </script> +""" + soup = BeautifulSoup(doc, "xml") + # lxml would have stripped this while parsing, but we can add + # it later. 
+ soup.script.string = 'console.log("< < hey > > ");' + encoded = soup.encode() + self.assertTrue(b"< < hey > >" in encoded) + def test_popping_namespaced_tag(self): markup = '<rss xmlns:dc="foo"><dc:creator>b</dc:creator><dc:date>2012-07-02T20:33:42Z</dc:date><dc:rights>c</dc:rights><image>d</image></rss>' soup = self.soup(markup) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py 2013-05-20 15:33:11.000000000 +0200 @@ -69,4 +69,4 @@ </html>''' soup = self.soup(markup) # Verify that we can reach the <p> tag; this means the tree is connected. - self.assertEquals("<p>foo</p>", soup.p.encode()) + self.assertEqual(b"<p>foo</p>", soup.p.encode()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py 2013-05-20 15:14:40.000000000 +0200 @@ -10,6 +10,7 @@ LXML_VERSION = lxml.etree.LXML_VERSION except ImportError, e: LXML_PRESENT = False + LXML_VERSION = (0,) from bs4 import ( BeautifulSoup, @@ -47,7 +48,7 @@ # test if an old version of lxml is installed. @skipIf( - LXML_VERSION < (2,3,5,0), + not LXML_PRESENT or LXML_VERSION < (2,3,5,0), "Skipping doctype test for old version of lxml to avoid segfault.") def test_empty_doctype(self): soup = self.soup("<!DOCTYPE>") @@ -85,4 +86,3 @@ @property def default_builder(self): return LXMLTreeBuilderForXML() - diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py 2013-05-20 20:58:23.000000000 +0200 @@ -125,9 +125,14 @@ def test_xml_quoting_handles_ampersands(self): self.assertEqual(self.sub.substitute_xml("AT&T"), "AT&T") - def test_xml_quoting_ignores_ampersands_when_they_are_part_of_an_entity(self): + def test_xml_quoting_including_ampersands_when_they_are_part_of_an_entity(self): self.assertEqual( self.sub.substitute_xml("ÁT&T"), + "ÁT&T") + + def test_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entity(self): + self.assertEqual( + self.sub.substitute_xml_containing_entities("ÁT&T"), "ÁT&T") def test_quotes_not_html_substituted(self): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py 2013-05-31 15:43:04.000000000 +0200 @@ -689,6 +689,12 @@ self.assertEqual("foo", s) self.assertTrue(isinstance(s, NavigableString)) + def test_new_string_can_create_navigablestring_subclass(self): + soup = self.soup("") + s = soup.new_string("foo", Comment) + self.assertEqual("foo", s) + self.assertTrue(isinstance(s, Comment)) + class TestTreeModification(SoupTest): def test_attribute_modification(self): @@ -1181,7 +1187,6 @@ soup = self.soup("foo<!--IGNORE-->bar") 
self.assertEqual(['foo', 'bar'], list(soup.strings)) - class TestCDAtaListAttributes(SoupTest): """Testing cdata-list attributes like 'class'. @@ -1344,18 +1349,6 @@ encoded = BeautifulSoup(doc).encode() self.assertTrue(b"< < hey > >" in encoded) - def test_formatter_processes_script_tag_for_xml_documents(self): - doc = """ - <script type="text/javascript"> - </script> -""" - soup = BeautifulSoup(doc, "xml") - # lxml would have stripped this while parsing, but we can add - # it later. - soup.script.string = 'console.log("< < hey > > ");' - encoded = soup.encode() - self.assertTrue(b"< < hey > >" in encoded) - def test_prettify_leaves_preformatted_text_alone(self): soup = self.soup("<div> foo <pre> \tbar\n \n </pre> baz ") # Everything outside the <pre> tag is reformatted, but everything diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/doc/source/index.rst new/beautifulsoup4-4.2.1/doc/source/index.rst --- old/beautifulsoup4-4.2.0/doc/source/index.rst 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/doc/source/index.rst 2013-05-20 16:18:05.000000000 +0200 @@ -239,10 +239,10 @@ :kbd:`$ pip install lxml` -If you're using Python 2, another alternative is the pure-Python -`html5lib parser <http://code.google.com/p/html5lib/>`_, which parses -HTML the way a web browser does. Depending on your setup, you might -install html5lib with one of these commands: +Another alternative is the pure-Python `html5lib parser +<http://code.google.com/p/html5lib/>`_, which parses HTML the way a +web browser does. Depending on your setup, you might install html5lib +with one of these commands: :kbd:`$ apt-get install python-html5lib` @@ -270,7 +270,7 @@ | html5lib | ``BeautifulSoup(markup, "html5lib")`` | * Extremely lenient | * Very slow | | | | * Parses pages the same way a | * External Python | | | | web browser does | dependency | -| | | * Creates valid HTML5 | * Python 2 only | +| | | * Creates valid HTML5 | | +----------------------+--------------------------------------------+--------------------------------+--------------------------+ If you can, I recommend you install and use lxml for speed. If you're @@ -1806,6 +1806,20 @@ tag.contents # [u'Hello', u' there'] +If you want to create a comment or some other subclass of +``NavigableString``, pass that class as the second argument to +``new_string()``:: + + from bs4 import Comment + new_comment = soup.new_string("Nice to see you.", Comment) + tag.append(new_comment) + tag + # <b>Hello there<!--Nice to see you.--></b> + tag.contents + # [u'Hello', u' there', u'Nice to see you.'] + +(This is a new feature in Beautiful Soup 4.2.1.) + What if you need to create a whole new tag? The best solution is to call the factory method ``BeautifulSoup.new_tag()``:: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/setup.py new/beautifulsoup4-4.2.1/setup.py --- old/beautifulsoup4-4.2.0/setup.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/setup.py 2013-05-31 15:52:01.000000000 +0200 @@ -7,7 +7,7 @@ from distutils.command.build_py import build_py setup(name="beautifulsoup4", - version = "4.2.0", + version = "4.2.1", author="Leonard Richardson", author_email='leonardr@segfault.org', url="http://www.crummy.com/software/BeautifulSoup/bs4/", -- To unsubscribe, e-mail: opensuse-commit+unsubscribe@opensuse.org For additional commands, e-mail: opensuse-commit+help@opensuse.org
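
An illustrative sketch of the formatter change listed in the 4.2.1 changelog above: substitute_xml() now escapes every ampersand, while the old entity-preserving behaviour survives under the new name substitute_xml_containing_entities(). This assumes beautifulsoup4 4.2.1 is installed as bs4 on Python 2 (the interpreter this package targets); the expected outputs follow the changelog text and the test_soup.py hunk rather than a verified run.

    # Sketch of the 4.2.1 entity-substitution change (assumption:
    # beautifulsoup4 >= 4.2.1 importable as bs4).
    from bs4.dammit import EntitySubstitution

    markup = 'AT&amp;T'   # the ampersand already looks like part of an entity

    # New default in 4.2.1: the ampersand is escaped anyway.
    print(EntitySubstitution.substitute_xml(markup))
    # expected: AT&amp;amp;T

    # Old Beautiful Soup 3 style behaviour, kept under a new name.
    print(EntitySubstitution.substitute_xml_containing_entities(markup))
    # expected: AT&amp;T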
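
The new_string() addition is documented in the doc/source/index.rst hunk above; restated as a small self-contained snippet (parser selection left to the default, as in the upstream docs):

    from bs4 import BeautifulSoup, Comment

    soup = BeautifulSoup("<b>Hello there</b>")
    tag = soup.b

    # Since 4.2.1, new_string() accepts an optional NavigableString subclass.
    new_comment = soup.new_string("Nice to see you.", Comment)
    tag.append(new_comment)

    print(tag)
    # <b>Hello there<!--Nice to see you.--></b>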
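
The html.parser fix credited to Tim Shirley corresponds to the new name.startswith('X') branch in the bs4/builder/_htmlparser.py hunk. A hedged example of what that enables; the capital-X character reference below is inferred from that patch rather than quoted from the test suite.

    from bs4 import BeautifulSoup

    # "&#Xf1;" uses a capital X; before 4.2.1 the html.parser treebuilder
    # only resolved the lowercase "&#xf1;" form of the reference.
    soup = BeautifulSoup("<p>pi&#Xf1;ata</p>", "html.parser")

    # The reference is decoded to LATIN SMALL LETTER N WITH TILDE.
    assert soup.p.string == u"pi\xf1ata"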