commit python-beautifulsoup4 for openSUSE:Factory
Hello community,

here is the log from the commit of package python-beautifulsoup4 for openSUSE:Factory checked in at 2013-06-29 19:43:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
 and      /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-beautifulsoup4"

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes	2013-06-18 10:36:16.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes	2013-06-29 22:25:55.000000000 +0200
@@ -1,0 +2,37 @@
+Thu Jun 27 13:32:06 UTC 2013 - speilicke@suse.com
+
+- Update upstream URL
+
+-------------------------------------------------------------------
+Tue Jun 25 11:52:34 UTC 2013 - dmueller@suse.com
+
+- update to 4.2.1:
+  * The default XML formatter will now replace ampersands even if they
+    appear to be part of entities. That is, "&lt;" will become
+    "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+    didn't always turn entities into Unicode characters.
+
+    If you really want the old behavior (maybe because you add new
+    strings to the tree, those strings include entities, and you want
+    the formatter to leave them alone on output), it can be found in
+    EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+  * Gave new_string() the ability to create subclasses of
+    NavigableString. [bug=1181986]
+
+  * Fixed another bug by which the html5lib tree builder could create a
+    disconnected tree. [bug=1182089]
+
+  * The .previous_element of a BeautifulSoup object is now always None,
+    not the last element to be parsed. [bug=1182089]
+
+  * Fixed test failures when lxml is not installed. [bug=1181589]
+
+  * html5lib now supports Python 3. Fixed some Python 2-specific
+    code in the html5lib test suite. [bug=1181624]
+
+  * The html.parser treebuilder can now handle numeric attributes in
+    text when the hexadecimal name of the attribute starts with a
+    capital X. Patch by Tim Shirley. [bug=1186242]
+
+-------------------------------------------------------------------

Old:
----
  beautifulsoup4-4.2.0.tar.gz

New:
----
  beautifulsoup4-4.2.1.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.luDQsR/_old	2013-06-29 22:25:56.000000000 +0200
+++ /var/tmp/diff_new_pack.luDQsR/_new	2013-06-29 22:25:56.000000000 +0200
@@ -16,9 +16,8 @@
 #
-%define _name beautifulsoup4
-Name:           python-%{_name}
-Version:        4.2.0
+Name:           python-beautifulsoup4
+Version:        4.2.1
 Release:        0
 Summary:        HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
 License:        MIT
@@ -26,20 +25,19 @@
 Url:            http://www.crummy.com/software/BeautifulSoup/
 Source:         http://pypi.python.org/packages/source/b/beautifulsoup4/beautifulsoup4-%{version}.tar.gz
 BuildRoot:      %{_tmppath}/%{name}-%{version}-build
-BuildRequires:  python-Sphinx
 BuildRequires:  python-devel >= 2.6
+# Documentation requirements:
+BuildRequires:  python-Sphinx
+# Test requirements:
 BuildRequires:  python-html5lib
 BuildRequires:  python-lxml
 BuildRequires:  python-nose
 Requires:       python-html5lib
 Requires:       python-lxml
-%{py_requires}
-
-# build fails for SLE11 64bit due to 'noarch'
-%if 0%{?suse_version} >= 1140
-BuildArch:      noarch
-%else
+%if 0%{?suse_version} && 0%{?suse_version} <= 1110
 %{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
+%else
+BuildArch:      noarch
 %endif
 
 %description
@@ -79,32 +77,26 @@
 %prep
-%setup -q -n %{_name}-%{version}
+%setup -q -n beautifulsoup4-%{version}
 
 %build
-CFLAGS="%{optflags}" python setup.py build
+python setup.py build
+cd doc && make html
 
 %install
-python setup.py install \
-    --prefix=%{_prefix} \
-    --root=%{buildroot}
-cd doc
-make html
+python setup.py install --prefix=%{_prefix} --root=%{buildroot}
 
-%if 0%{?suse_version} >= 1140
 %check
 nosetests
-%endif
 
 %files
 %defattr(-,root,root)
 %doc AUTHORS.txt COPYING.txt
 %{python_sitelib}/bs4/
-%{python_sitelib}/%{_name}-%{version}-py*.egg-info
+%{python_sitelib}/beautifulsoup4-%{version}-py*.egg-info
 
 %files doc
 %defattr(-,root,root)
-%doc NEWS.txt README.txt TODO.txt
-%doc doc/build/html
+%doc NEWS.txt README.txt TODO.txt doc/build/html
 
 %changelog

++++++ beautifulsoup4-4.2.0.tar.gz -> beautifulsoup4-4.2.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/NEWS.txt new/beautifulsoup4-4.2.1/NEWS.txt
--- old/beautifulsoup4-4.2.0/NEWS.txt	2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/NEWS.txt	2013-05-31 15:49:44.000000000 +0200
@@ -1,3 +1,33 @@
+= 4.2.1 (20130531) =
+
+* The default XML formatter will now replace ampersands even if they
+  appear to be part of entities. That is, "&lt;" will become
+  "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+  didn't always turn entities into Unicode characters.
+
+  If you really want the old behavior (maybe because you add new
+  strings to the tree, those strings include entities, and you want
+  the formatter to leave them alone on output), it can be found in
+  EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+* Gave new_string() the ability to create subclasses of
+  NavigableString. [bug=1181986]
+
+* Fixed another bug by which the html5lib tree builder could create a
+  disconnected tree.
[bug=1182089] + +* The .previous_element of a BeautifulSoup object is now always None, + not the last element to be parsed. [bug=1182089] + +* Fixed test failures when lxml is not installed. [bug=1181589] + +* html5lib now supports Python 3. Fixed some Python 2-specific + code in the html5lib test suite. [bug=1181624] + +* The html.parser treebuilder can now handle numeric attributes in + text when the hexidecimal name of the attribute starts with a + capital X. Patch by Tim Shirley. [bug=1186242] + = 4.2.0 (20130514) = * The Tag.select() method now supports a much wider variety of CSS diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/PKG-INFO new/beautifulsoup4-4.2.1/PKG-INFO --- old/beautifulsoup4-4.2.0/PKG-INFO 2013-05-15 14:43:52.000000000 +0200 +++ new/beautifulsoup4-4.2.1/PKG-INFO 2013-05-31 15:54:14.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: beautifulsoup4 -Version: 4.2.0 +Version: 4.2.1 Summary: UNKNOWN Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/ Author: Leonard Richardson diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/__init__.py new/beautifulsoup4-4.2.1/bs4/__init__.py --- old/beautifulsoup4-4.2.0/bs4/__init__.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/__init__.py 2013-05-31 15:42:38.000000000 +0200 @@ -17,7 +17,7 @@ """ __author__ = "Leonard Richardson (leonardr@segfault.org)" -__version__ = "4.2.0" +__version__ = "4.2.1" __copyright__ = "Copyright (c) 2004-2013 Leonard Richardson" __license__ = "MIT" @@ -201,9 +201,9 @@ """Create a new tag associated with this soup.""" return Tag(None, self.builder, name, namespace, nsprefix, attrs) - def new_string(self, s): + def new_string(self, s, subclass=NavigableString): """Create a new NavigableString associated with this soup.""" - navigable = NavigableString(s) + navigable = subclass(s) navigable.setup() return navigable @@ -245,14 +245,14 @@ o = containerClass(currentData) self.object_was_parsed(o) - def object_was_parsed(self, o, parent=None, previous_element=None): + def object_was_parsed(self, o, parent=None, most_recent_element=None): """Add an object to the parse tree.""" parent = parent or self.currentTag - previous_element = previous_element or self.previous_element - o.setup(parent, previous_element) - if self.previous_element: - self.previous_element.next_element = o - self.previous_element = o + most_recent_element = most_recent_element or self._most_recent_element + o.setup(parent, most_recent_element) + if most_recent_element is not None: + most_recent_element.next_element = o + self._most_recent_element = o parent.contents.append(o) def _popToTag(self, name, nsprefix=None, inclusivePop=True): @@ -297,12 +297,12 @@ return None tag = Tag(self, self.builder, name, namespace, nsprefix, attrs, - self.currentTag, self.previous_element) + self.currentTag, self._most_recent_element) if tag is None: return tag - if self.previous_element: - self.previous_element.next_element = tag - self.previous_element = tag + if self._most_recent_element: + self._most_recent_element.next_element = tag + self._most_recent_element = tag self.pushTag(tag) return tag diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/__init__.py new/beautifulsoup4-4.2.1/bs4/builder/__init__.py --- old/beautifulsoup4-4.2.0/bs4/builder/__init__.py 2013-05-15 14:36:50.000000000 +0200 +++ 
new/beautifulsoup4-4.2.1/bs4/builder/__init__.py 2013-05-20 20:58:23.000000000 +0200 @@ -152,7 +152,7 @@ tag_specific = self.cdata_list_attributes.get( tag_name.lower(), []) for cdata_list_attr in itertools.chain(universal, tag_specific): - if cdata_list_attr in dict(attrs): + if cdata_list_attr in attrs: # Basically, we have a "class" attribute whose # value is a whitespace-separated list of CSS # classes. Split it into a list. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py --- old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py 2013-05-20 18:01:18.000000000 +0200 @@ -131,6 +131,7 @@ old_element = self.element.contents[-1] new_element = self.soup.new_string(old_element + node.element) old_element.replace_with(new_element) + self.soup._most_recent_element = new_element else: self.soup.object_was_parsed(node.element, parent=self.element) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py --- old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py 2013-05-31 15:48:27.000000000 +0200 @@ -58,6 +58,8 @@ # it's fixed. if name.startswith('x'): real_name = int(name.lstrip('x'), 16) + elif name.startswith('X'): + real_name = int(name.lstrip('X'), 16) else: real_name = int(name) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py --- old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py 2013-05-20 15:09:43.000000000 +0200 @@ -3,6 +3,7 @@ 'LXMLTreeBuilder', ] +from io import BytesIO from StringIO import StringIO import collections from lxml import etree @@ -75,7 +76,9 @@ dammit.contains_replacement_characters) def feed(self, markup): - if isinstance(markup, basestring): + if isinstance(markup, bytes): + markup = BytesIO(markup) + elif isinstance(markup, unicode): markup = StringIO(markup) # Call feed() at least once, even if the markup is empty, # or the parser won't be initialized. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/dammit.py new/beautifulsoup4-4.2.1/bs4/dammit.py --- old/beautifulsoup4-4.2.0/bs4/dammit.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/dammit.py 2013-05-20 20:58:23.000000000 +0200 @@ -81,6 +81,8 @@ "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)" ")") + AMPERSAND_OR_BRACKET = re.compile("([<>&])") + @classmethod def _substitute_html_entity(cls, matchobj): entity = cls.CHARACTER_TO_HTML_ENTITY.get(matchobj.group(0)) @@ -134,6 +136,28 @@ def substitute_xml(cls, value, make_quoted_attribute=False): """Substitute XML entities for special XML characters. + :param value: A string to be substituted. The less-than sign + will become <, the greater-than sign will become >, + and any ampersands will become &. If you want ampersands + that appear to be part of an entity definition to be left + alone, use substitute_xml_containing_entities() instead. 
+ + :param make_quoted_attribute: If True, then the string will be + quoted, as befits an attribute value. + """ + # Escape angle brackets and ampersands. + value = cls.AMPERSAND_OR_BRACKET.sub( + cls._substitute_xml_entity, value) + + if make_quoted_attribute: + value = cls.quoted_attribute_value(value) + return value + + @classmethod + def substitute_xml_containing_entities( + cls, value, make_quoted_attribute=False): + """Substitute XML entities for special XML characters. + :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands that are not part of an entity defition will @@ -151,6 +175,7 @@ value = cls.quoted_attribute_value(value) return value + @classmethod def substitute_html(cls, s): """Replace certain Unicode characters with named HTML entities. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/diagnose.py new/beautifulsoup4-4.2.1/bs4/diagnose.py --- old/beautifulsoup4-4.2.0/bs4/diagnose.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/diagnose.py 2013-05-20 17:07:53.000000000 +0200 @@ -4,8 +4,11 @@ from bs4 import BeautifulSoup, __version__ from bs4.builder import builder_registry import os +import random +import time import traceback import sys +import cProfile def diagnose(data): """Diagnostic suite for isolating common problems.""" @@ -70,32 +73,36 @@ class AnnouncingParser(HTMLParser): """Announces HTMLParser parse events, without doing anything else.""" + + def _p(self, s): + print(s) + def handle_starttag(self, name, attrs): - print "%s START" % name + self._p("%s START" % name) def handle_endtag(self, name): - print "%s END" % name + self._p("%s END" % name) def handle_data(self, data): - print "%s DATA" % data + self._p("%s DATA" % data) def handle_charref(self, name): - print "%s CHARREF" % name + self._p("%s CHARREF" % name) def handle_entityref(self, name): - print "%s ENTITYREF" % name + self._p("%s ENTITYREF" % name) def handle_comment(self, data): - print "%s COMMENT" % data + self._p("%s COMMENT" % data) def handle_decl(self, data): - print "%s DECL" % data + self._p("%s DECL" % data) def unknown_decl(self, data): - print "%s UNKNOWN-DECL" % data + self._p("%s UNKNOWN-DECL" % data) def handle_pi(self, data): - print "%s PI" % data + self._p("%s PI" % data) def htmlparser_trace(data): """Print out the HTMLParser events that occur during parsing. @@ -106,5 +113,66 @@ parser = AnnouncingParser() parser.feed(data) +_vowels = "aeiou" +_consonants = "bcdfghjklmnpqrstvwxyz" + +def rword(length=5): + "Generate a random word-like string." + s = '' + for i in range(length): + if i % 2 == 0: + t = _consonants + else: + t = _vowels + s += random.choice(t) + return s + +def rsentence(length=4): + "Generate a random sentence-like string." + return " ".join(rword(random.randint(4,9)) for i in range(length)) + +def rdoc(num_elements=1000): + """Randomly generate an invalid HTML document.""" + tag_names = ['p', 'div', 'span', 'i', 'b', 'script', 'table'] + elements = [] + for i in range(num_elements): + choice = random.randint(0,3) + if choice == 0: + # New tag. + tag_name = random.choice(tag_names) + elements.append("<%s>" % tag_name) + elif choice == 1: + elements.append(rsentence(random.randint(1,4))) + elif choice == 2: + # Close a tag. 
+ tag_name = random.choice(tag_names) + elements.append("</%s>" % tag_name) + return "<html>" + "\n".join(elements) + "</html>" + +def benchmark_parsers(num_elements=100000): + """Very basic head-to-head performance benchmark.""" + print "Comparative parser benchmark on Beautiful Soup %s" % __version__ + data = rdoc(num_elements) + print "Generated a large invalid HTML document (%d bytes)." % len(data) + + for parser in ["lxml", ["lxml", "html"], "html5lib", "html.parser"]: + success = False + try: + a = time.time() + soup = BeautifulSoup(data, parser) + b = time.time() + success = True + except Exception, e: + print "%s could not parse the markup." % parser + traceback.print_exc() + if success: + print "BS4+%s parsed the markup in %.2fs." % (parser, b-a) + + from lxml import etree + a = time.time() + etree.HTML(data) + b = time.time() + print "Raw lxml parsed the markup in %.2fs." % (b-a) + if __name__ == '__main__': diagnose(sys.stdin.read()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/testing.py new/beautifulsoup4-4.2.1/bs4/testing.py --- old/beautifulsoup4-4.2.0/bs4/testing.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/testing.py 2013-05-31 15:46:18.000000000 +0200 @@ -228,12 +228,14 @@ expect = u'<p id="pi\N{LATIN SMALL LETTER N WITH TILDE}ata"></p>' self.assertSoupEquals('<p id="piñata"></p>', expect) self.assertSoupEquals('<p id="piñata"></p>', expect) + self.assertSoupEquals('<p id="piñata"></p>', expect) self.assertSoupEquals('<p id="piñata"></p>', expect) def test_entities_in_text_converted_to_unicode(self): expect = u'<p>pi\N{LATIN SMALL LETTER N WITH TILDE}ata</p>' self.assertSoupEquals("<p>piñata</p>", expect) self.assertSoupEquals("<p>piñata</p>", expect) + self.assertSoupEquals("<p>piñata</p>", expect) self.assertSoupEquals("<p>piñata</p>", expect) def test_quot_entity_converted_to_quotation_mark(self): @@ -246,6 +248,12 @@ self.assertSoupEquals("", expect) self.assertSoupEquals("", expect) + def test_multipart_strings(self): + "Mostly to prevent a recurrence of a bug in the html5lib treebuilder." + soup = self.soup("<html><h2>\nfoo</h2><p></p></html>") + self.assertEqual("p", soup.h2.string.next_element.name) + self.assertEqual("p", soup.p.name) + def test_basic_namespaces(self): """Parsers don't need to *understand* namespaces, but at the very least they should not choke on namespaces or lose @@ -464,6 +472,18 @@ self.assertEqual( soup.encode("utf-8"), markup) + def test_formatter_processes_script_tag_for_xml_documents(self): + doc = """ + <script type="text/javascript"> + </script> +""" + soup = BeautifulSoup(doc, "xml") + # lxml would have stripped this while parsing, but we can add + # it later. 
+ soup.script.string = 'console.log("< < hey > > ");' + encoded = soup.encode() + self.assertTrue(b"< < hey > >" in encoded) + def test_popping_namespaced_tag(self): markup = '<rss xmlns:dc="foo"><dc:creator>b</dc:creator><dc:date>2012-07-02T20:33:42Z</dc:date><dc:rights>c</dc:rights><image>d</image></rss>' soup = self.soup(markup) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py 2013-05-20 15:33:11.000000000 +0200 @@ -69,4 +69,4 @@ </html>''' soup = self.soup(markup) # Verify that we can reach the <p> tag; this means the tree is connected. - self.assertEquals("<p>foo</p>", soup.p.encode()) + self.assertEqual(b"<p>foo</p>", soup.p.encode()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py 2013-05-20 15:14:40.000000000 +0200 @@ -10,6 +10,7 @@ LXML_VERSION = lxml.etree.LXML_VERSION except ImportError, e: LXML_PRESENT = False + LXML_VERSION = (0,) from bs4 import ( BeautifulSoup, @@ -47,7 +48,7 @@ # test if an old version of lxml is installed. @skipIf( - LXML_VERSION < (2,3,5,0), + not LXML_PRESENT or LXML_VERSION < (2,3,5,0), "Skipping doctype test for old version of lxml to avoid segfault.") def test_empty_doctype(self): soup = self.soup("<!DOCTYPE>") @@ -85,4 +86,3 @@ @property def default_builder(self): return LXMLTreeBuilderForXML() - diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py 2013-05-20 20:58:23.000000000 +0200 @@ -125,9 +125,14 @@ def test_xml_quoting_handles_ampersands(self): self.assertEqual(self.sub.substitute_xml("AT&T"), "AT&T") - def test_xml_quoting_ignores_ampersands_when_they_are_part_of_an_entity(self): + def test_xml_quoting_including_ampersands_when_they_are_part_of_an_entity(self): self.assertEqual( self.sub.substitute_xml("ÁT&T"), + "ÁT&T") + + def test_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entity(self): + self.assertEqual( + self.sub.substitute_xml_containing_entities("ÁT&T"), "ÁT&T") def test_quotes_not_html_substituted(self): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py --- old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py 2013-05-31 15:43:04.000000000 +0200 @@ -689,6 +689,12 @@ self.assertEqual("foo", s) self.assertTrue(isinstance(s, NavigableString)) + def test_new_string_can_create_navigablestring_subclass(self): + soup = self.soup("") + s = soup.new_string("foo", Comment) + self.assertEqual("foo", s) + self.assertTrue(isinstance(s, Comment)) + class TestTreeModification(SoupTest): def test_attribute_modification(self): @@ -1181,7 +1187,6 @@ soup = self.soup("foo<!--IGNORE-->bar") 
self.assertEqual(['foo', 'bar'], list(soup.strings)) - class TestCDAtaListAttributes(SoupTest): """Testing cdata-list attributes like 'class'. @@ -1344,18 +1349,6 @@ encoded = BeautifulSoup(doc).encode() self.assertTrue(b"< < hey > >" in encoded) - def test_formatter_processes_script_tag_for_xml_documents(self): - doc = """ - <script type="text/javascript"> - </script> -""" - soup = BeautifulSoup(doc, "xml") - # lxml would have stripped this while parsing, but we can add - # it later. - soup.script.string = 'console.log("< < hey > > ");' - encoded = soup.encode() - self.assertTrue(b"< < hey > >" in encoded) - def test_prettify_leaves_preformatted_text_alone(self): soup = self.soup("<div> foo <pre> \tbar\n \n </pre> baz ") # Everything outside the <pre> tag is reformatted, but everything diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/doc/source/index.rst new/beautifulsoup4-4.2.1/doc/source/index.rst --- old/beautifulsoup4-4.2.0/doc/source/index.rst 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/doc/source/index.rst 2013-05-20 16:18:05.000000000 +0200 @@ -239,10 +239,10 @@ :kbd:`$ pip install lxml` -If you're using Python 2, another alternative is the pure-Python -`html5lib parser <http://code.google.com/p/html5lib/>`_, which parses -HTML the way a web browser does. Depending on your setup, you might -install html5lib with one of these commands: +Another alternative is the pure-Python `html5lib parser +<http://code.google.com/p/html5lib/>`_, which parses HTML the way a +web browser does. Depending on your setup, you might install html5lib +with one of these commands: :kbd:`$ apt-get install python-html5lib` @@ -270,7 +270,7 @@ | html5lib | ``BeautifulSoup(markup, "html5lib")`` | * Extremely lenient | * Very slow | | | | * Parses pages the same way a | * External Python | | | | web browser does | dependency | -| | | * Creates valid HTML5 | * Python 2 only | +| | | * Creates valid HTML5 | | +----------------------+--------------------------------------------+--------------------------------+--------------------------+ If you can, I recommend you install and use lxml for speed. If you're @@ -1806,6 +1806,20 @@ tag.contents # [u'Hello', u' there'] +If you want to create a comment or some other subclass of +``NavigableString``, pass that class as the second argument to +``new_string()``:: + + from bs4 import Comment + new_comment = soup.new_string("Nice to see you.", Comment) + tag.append(new_comment) + tag + # <b>Hello there<!--Nice to see you.--></b> + tag.contents + # [u'Hello', u' there', u'Nice to see you.'] + +(This is a new feature in Beautiful Soup 4.2.1.) + What if you need to create a whole new tag? The best solution is to call the factory method ``BeautifulSoup.new_tag()``:: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/setup.py new/beautifulsoup4-4.2.1/setup.py --- old/beautifulsoup4-4.2.0/setup.py 2013-05-15 14:36:50.000000000 +0200 +++ new/beautifulsoup4-4.2.1/setup.py 2013-05-31 15:52:01.000000000 +0200 @@ -7,7 +7,7 @@ from distutils.command.build_py import build_py setup(name="beautifulsoup4", - version = "4.2.0", + version = "4.2.1", author="Leonard Richardson", author_email='leonardr@segfault.org', url="http://www.crummy.com/software/BeautifulSoup/bs4/", -- To unsubscribe, e-mail: opensuse-commit+unsubscribe@opensuse.org For additional commands, e-mail: opensuse-commit+help@opensuse.org
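
An illustrative sketch of the formatter change listed in the 4.2.1 changelog above: substitute_xml() now escapes every ampersand, while the old entity-preserving behaviour survives under the new name substitute_xml_containing_entities(). This assumes beautifulsoup4 4.2.1 is installed as bs4 on Python 2 (the interpreter this package targets); the expected outputs follow the changelog text and the test_soup.py hunk rather than a verified run.

    # Sketch of the 4.2.1 entity-substitution change (assumption:
    # beautifulsoup4 >= 4.2.1 importable as bs4).
    from bs4.dammit import EntitySubstitution

    markup = 'AT&amp;T'   # the ampersand already looks like part of an entity

    # New default in 4.2.1: the ampersand is escaped anyway.
    print(EntitySubstitution.substitute_xml(markup))
    # expected: AT&amp;amp;T

    # Old Beautiful Soup 3 style behaviour, kept under a new name.
    print(EntitySubstitution.substitute_xml_containing_entities(markup))
    # expected: AT&amp;T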
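
The new_string() addition is documented in the doc/source/index.rst hunk above; restated as a small self-contained snippet (parser selection left to the default, as in the upstream docs):

    from bs4 import BeautifulSoup, Comment

    soup = BeautifulSoup("<b>Hello there</b>")
    tag = soup.b

    # Since 4.2.1, new_string() accepts an optional NavigableString subclass.
    new_comment = soup.new_string("Nice to see you.", Comment)
    tag.append(new_comment)

    print(tag)
    # <b>Hello there<!--Nice to see you.--></b>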
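
The html.parser fix credited to Tim Shirley corresponds to the new name.startswith('X') branch in the bs4/builder/_htmlparser.py hunk. A hedged example of what that enables; the capital-X character reference below is inferred from that patch rather than quoted from the test suite.

    from bs4 import BeautifulSoup

    # "&#Xf1;" uses a capital X; before 4.2.1 the html.parser treebuilder
    # only resolved the lowercase "&#xf1;" form of the reference.
    soup = BeautifulSoup("<p>pi&#Xf1;ata</p>", "html.parser")

    # The reference is decoded to LATIN SMALL LETTER N WITH TILDE.
    assert soup.p.string == u"pi\xf1ata"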