commit python-lark for openSUSE:Factory
Script 'mail_helper' called by obssrc

Hello community,

here is the log from the commit of package python-lark for openSUSE:Factory
checked in at 2024-10-04 17:08:27
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-lark (Old)
 and      /work/SRC/openSUSE:Factory/.python-lark.new.19354 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-lark"

Fri Oct 4 17:08:27 2024 rev:9 rq:1205380 version:1.2.2

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-lark/python-lark.changes 2024-01-15 22:11:00.316794908 +0100
+++ /work/SRC/openSUSE:Factory/.python-lark.new.19354/python-lark.changes 2024-10-04 17:08:33.983879725 +0200
@@ -1,0 +2,25 @@
+Thu Oct 3 08:30:59 UTC 2024 - Dirk Müller <dmueller@suse.com>
+
+- update to 1.2.2:
+  * Bugfix: Earley now respects ambiguity='resolve' again.
+- update to 1.2.1:
+  * Dropped support for Python versions lower than 3.8
+  * Several bugfixes in the Earley algorithm, related to suppressed ambiguities
+  * Improved performance in `InteractiveParser.accepts()`
+  * Give "Shaping the tree" clear sub-headings
+  * Fix for when providing a transformer with a Token
+  * Pin types-regex to a working version
+  * Add Outlines to list of projects using Lark
+  * Code coverage: Update Python version
+  * Attempt to solve performance problems in accepts()
+  * Docs: Added Indenter
+  * Clean up test_parser.py, use xFail instead of skip where appropriate
+  * Update config and drop python < 3.8
+  * BUGFIX Earley: Now yielding a previously repressed ambiguity
+  * Fix SymbolNode.end for completed tokens
+  * Disable ForestToParseTree cache when ambiguity='resolve'
+  * Bugfix for issue #1434
+
+-------------------------------------------------------------------

Old:
----
  lark-1.1.9.tar.gz

New:
----
  lark-1.2.2.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-lark.spec ++++++
--- /var/tmp/diff_new_pack.OmrZCt/_old  2024-10-04 17:08:34.483900614 +0200
+++ /var/tmp/diff_new_pack.OmrZCt/_new  2024-10-04 17:08:34.483900614 +0200
@@ -18,7 +18,7 @@
 %{?sle15_python_module_pythons}
 Name:           python-lark
-Version:        1.1.9
+Version:        1.2.2
 Release:        0
 Summary:        A parsing library for Python
 License:        MIT

++++++ lark-1.1.9.tar.gz -> lark-1.2.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/.github/workflows/codecov.yml new/lark-1.2.2/.github/workflows/codecov.yml
--- old/lark-1.1.9/.github/workflows/codecov.yml        2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/.github/workflows/codecov.yml        2024-08-13 21:47:06.000000000 +0200
@@ -8,7 +8,7 @@
         os: [ubuntu-latest, macos-latest, windows-latest]
     env:
       OS: ${{ matrix.os }}
-      PYTHON: '3.7'
+      PYTHON: '3.8'
     steps:
     - uses: actions/checkout@v3
       name: Download with submodules
@@ -17,7 +17,7 @@
     - name: Setup Python
      uses: actions/setup-python@v3
      with:
-       python-version: "3.7"
+       python-version: "3.8"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
@@ -35,6 +35,6 @@
        flags: unittests
        env_vars: OS,PYTHON
        name: codecov-umbrella
-       fail_ci_if_error: true
+       fail_ci_if_error: false
        path_to_write_report: ./coverage/codecov_report.txt
        verbose: true
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/.github/workflows/tests.yml new/lark-1.2.2/.github/workflows/tests.yml
--- old/lark-1.1.9/.github/workflows/tests.yml  2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/.github/workflows/tests.yml  2024-08-13 21:47:06.000000000 +0200
@@ -3,11 +3,10 @@
 jobs:
   build:
-    # runs-on: ubuntu-latest
-    runs-on: ubuntu-20.04 # See https://github.com/actions/setup-python/issues/544
+    runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "pypy-3.7"]
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13-dev", "pypy-3.10"]

     steps:
     - uses: actions/checkout@v3
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/README.md new/lark-1.2.2/README.md
--- old/lark-1.1.9/README.md    2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/README.md    2024-08-13 21:47:06.000000000 +0200
@@ -125,7 +125,7 @@
 Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.

-For a more thorough and objective comparison, checkout the [Python Parsing Benchmarks](https://github.com/goodmami/python-parsing-benchmarks) repo.
+For thorough 3rd-party benchmarks, checkout the [Python Parsing Benchmarks](https://github.com/goodmami/python-parsing-benchmarks) repo.

 #### Feature comparison
@@ -164,6 +164,7 @@
 - [harmalysis](https://github.com/napulen/harmalysis) - A language for harmonic analysis and music theory
 - [gersemi](https://github.com/BlankSpruce/gersemi) - A CMake code formatter
 - [MistQL](https://github.com/evinism/mistql) - A query language for JSON-like structures
+ - [Outlines](https://github.com/outlines-dev/outlines) - Structured generation with Large Language Models

 [Full list](https://github.com/lark-parser/lark/network/dependents?package_id=UGFja2FnZS...)
@@ -179,8 +180,8 @@
 Big thanks to everyone who contributed so far:

-<a href="https://github.com/lark-parser/lark/graphs/contributors">
 <img src="https://contributors-img.web.app/image?repo=lark-parser/lark" />
+<a href="https://github.com/lark-parser/lark/graphs/contributors">
 </a>

 ## Sponsor
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/docs/classes.rst new/lark-1.2.2/docs/classes.rst
--- old/lark-1.1.9/docs/classes.rst     2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/docs/classes.rst     2024-08-13 21:47:06.000000000 +0200
@@ -90,3 +90,9 @@
 .. autofunction:: lark.ast_utils.create_transformer

 .. _/examples/advanced/create_ast.py: examples/advanced/create_ast.html
+
+Indenter
+--------
+
+.. autoclass:: lark.indenter.Indenter
+.. autoclass:: lark.indenter.PythonIndenter
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/docs/parsers.md new/lark-1.2.2/docs/parsers.md
--- old/lark-1.1.9/docs/parsers.md      2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/docs/parsers.md      2024-08-13 21:47:06.000000000 +0200
@@ -23,7 +23,7 @@
 2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

-3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves.
+3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. There is also [a 3rd party utility for iterating over the SPPF](https://github.com/chanicpanic/lark-ambig-tools).
 **lexer="dynamic_complete"**
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/docs/tree_construction.md new/lark-1.2.2/docs/tree_construction.md
--- old/lark-1.1.9/docs/tree_construction.md    2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/docs/tree_construction.md    2024-08-13 21:47:06.000000000 +0200
@@ -78,8 +78,9 @@
 Users can alter the automatic construction of the tree using a collection of grammar features.

+### Inlining rules with `_`

-* Rules whose name begins with an underscore will be inlined into their containing rule.
+Rules whose name begins with an underscore will be inlined into their containing rule.

 **Example:**
@@ -94,8 +95,9 @@
     "hello"
     "world"

+### Conditionally inlining rules with `?`

-* Rules that receive a question mark (?) at the beginning of their definition, will be inlined if they have a single child, after filtering.
+Rules that receive a question mark (?) at the beginning of their definition, will be inlined if they have a single child, after filtering.

 **Example:**
@@ -113,7 +115,9 @@
     "world"
     "planet"

-* Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).
+### Pinning rule terminals with `!`
+
+Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).

 ```perl
 !expr: "(" expr ")"
@@ -136,7 +140,9 @@
 Using the `!` prefix is usually a "code smell", and may point to a flaw in your grammar design.

-* Aliases - options in a rule can receive an alias. It will be then used as the branch name for the option, instead of the rule name.
+### Aliasing rules
+
+Aliases - options in a rule can receive an alias. It will be then used as the branch name for the option, instead of the rule name.

 **Example:**
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/__init__.py new/lark-1.2.2/lark/__init__.py
--- old/lark-1.1.9/lark/__init__.py     2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/__init__.py     2024-08-13 21:47:06.000000000 +0200
@@ -14,7 +14,7 @@
 from .utils import logger
 from .visitors import Discard, Transformer, Transformer_NonRecursive, Visitor, v_args

-__version__: str = "1.1.9"
+__version__: str = "1.2.2"

 __all__ = (
     "GrammarError",
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/common.py new/lark-1.2.2/lark/common.py
--- old/lark-1.1.9/lark/common.py       2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/common.py       2024-08-13 21:47:06.000000000 +0200
@@ -8,10 +8,7 @@
     from .lexer import Lexer
     from .grammar import Rule
     from typing import Union, Type
-    if sys.version_info >= (3, 8):
-        from typing import Literal
-    else:
-        from typing_extensions import Literal
+    from typing import Literal
     if sys.version_info >= (3, 10):
         from typing import TypeAlias
     else:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/indenter.py new/lark-1.2.2/lark/indenter.py
--- old/lark-1.1.9/lark/indenter.py     2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/indenter.py     2024-08-13 21:47:06.000000000 +0200
@@ -13,6 +13,19 @@
     pass

 class Indenter(PostLex, ABC):
+    """This is a postlexer that "injects" indent/dedent tokens based on indentation.
+
+    It keeps track of the current indentation, as well as the current level of parentheses.
+    Inside parentheses, the indentation is ignored, and no indent/dedent tokens get generated.
+
+    Note: This is an abstract class. To use it, inherit and implement all its abstract methods:
+        - tab_len
+        - NL_type
+        - OPEN_PAREN_types, CLOSE_PAREN_types
+        - INDENT_type, DEDENT_type
+
+    See also: the ``postlex`` option in `Lark`.
+    """
     paren_level: int
     indent_level: List[int]
@@ -73,35 +86,53 @@
     @property
     @abstractmethod
     def NL_type(self) -> str:
+        "The name of the newline token"
         raise NotImplementedError()

     @property
     @abstractmethod
     def OPEN_PAREN_types(self) -> List[str]:
+        "The names of the tokens that open a parenthesis"
         raise NotImplementedError()

     @property
     @abstractmethod
     def CLOSE_PAREN_types(self) -> List[str]:
+        """The names of the tokens that close a parenthesis
+        """
         raise NotImplementedError()

     @property
     @abstractmethod
     def INDENT_type(self) -> str:
+        """The name of the token that starts an indentation in the grammar.
+
+        See also: %declare
+        """
         raise NotImplementedError()

     @property
     @abstractmethod
     def DEDENT_type(self) -> str:
+        """The name of the token that end an indentation in the grammar.
+
+        See also: %declare
+        """
         raise NotImplementedError()

     @property
     @abstractmethod
     def tab_len(self) -> int:
+        """How many spaces does a tab equal"""
         raise NotImplementedError()

 class PythonIndenter(Indenter):
+    """A postlexer that "injects" _INDENT/_DEDENT tokens based on indentation, according to the Python syntax.
+
+    See also: the ``postlex`` option in `Lark`.
+    """
+
     NL_type = '_NEWLINE'
     OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
     CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/lark.py new/lark-1.2.2/lark/lark.py
--- old/lark-1.1.9/lark/lark.py 2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/lark.py 2024-08-13 21:47:06.000000000 +0200
@@ -12,14 +12,11 @@
     from .parsers.lalr_interactive_parser import InteractiveParser
     from .tree import ParseTree
     from .visitors import Transformer
-    if sys.version_info >= (3, 8):
-        from typing import Literal
-    else:
-        from typing_extensions import Literal
+    from typing import Literal
     from .parser_frontends import ParsingFrontend

 from .exceptions import ConfigurationError, assert_config, UnexpectedInput
-from .utils import Serialize, SerializeMemoizer, FS, isascii, logger
+from .utils import Serialize, SerializeMemoizer, FS, logger
 from .load_grammar import load_grammar, FromPackageLoader, Grammar, verify_used_files, PackageResource, sha256_digest
 from .tree import Tree
 from .common import LexerConf, ParserConf, _ParserArgType, _LexerArgType
@@ -303,7 +300,7 @@
         if isinstance(grammar, str):
             self.source_grammar = grammar
             if self.options.use_bytes:
-                if not isascii(grammar):
+                if not grammar.isascii():
                     raise ConfigurationError("Grammar must be ascii only, when use_bytes=True")

         if self.options.cache:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/earley.py new/lark-1.2.2/lark/parsers/earley.py
--- old/lark-1.1.9/lark/parsers/earley.py       2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/earley.py       2024-08-13 21:47:06.000000000 +0200
@@ -15,7 +15,7 @@
 from ..lexer import Token
 from ..tree import Tree
 from ..exceptions import UnexpectedEOF, UnexpectedToken
-from ..utils import logger, OrderedSet
+from ..utils import logger, OrderedSet, dedup_list
 from .grammar_analysis import GrammarAnalyzer
 from ..grammar import NonTerminal
 from .earley_common import Item
@@ -169,6 +169,7 @@
                 items.append(new_item)

     def _parse(self, lexer, columns, to_scan, start_symbol=None):
+
         def is_quasi_complete(item):
             if item.is_complete:
                 return True
@@ -281,7 +282,7 @@
         # If the parse was successful, the start
         # symbol should have been completed in the last step of the Earley cycle, and will be in
         # this column. Find the item for the start_symbol, which is the root of the SPPF tree.
-        solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
+        solutions = dedup_list(n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0)
         if not solutions:
             expected_terminals = [t.expect.name for t in to_scan]
             raise UnexpectedEOF(expected_terminals, state=frozenset(i.s for i in to_scan))
@@ -293,16 +294,24 @@
             except ImportError:
                 logger.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
             else:
-                debug_walker.visit(solutions[0], "sppf.png")
-
+                for i, s in enumerate(solutions):
+                    debug_walker.visit(s, f"sppf{i}.png")

-        if len(solutions) > 1:
-            assert False, 'Earley should not generate multiple start symbol items!'

         if self.Tree is not None:
             # Perform our SPPF -> AST conversion
-            transformer = ForestToParseTree(self.Tree, self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor(), self.resolve_ambiguity)
-            return transformer.transform(solutions[0])
+            # Disable the ForestToParseTree cache when ambiguity='resolve'
+            # to prevent a tree construction bug. See issue #1283
+            use_cache = not self.resolve_ambiguity
+            transformer = ForestToParseTree(self.Tree, self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor(), self.resolve_ambiguity, use_cache)
+            solutions = [transformer.transform(s) for s in solutions]
+
+            if len(solutions) > 1 and not self.resolve_ambiguity:
+                t: Tree = self.Tree('_ambig', solutions)
+                t.expand_kids_by_data('_ambig')  # solutions may themselves be _ambig nodes
+                return t
+            return solutions[0]

         # return the root of the SPPF
         # TODO return a list of solutions, or join them together somehow
         return solutions[0]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/earley_common.py new/lark-1.2.2/lark/parsers/earley_common.py
--- old/lark-1.1.9/lark/parsers/earley_common.py        2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/earley_common.py        2024-08-13 21:47:06.000000000 +0200
@@ -20,13 +20,13 @@
         self.s = (rule, ptr)
         self.expect = rule.expansion[ptr]
         self.previous = rule.expansion[ptr - 1] if ptr > 0 and len(rule.expansion) else None
-        self._hash = hash((self.s, self.start))
+        self._hash = hash((self.s, self.start, self.rule))

     def advance(self):
         return Item(self.rule, self.ptr + 1, self.start)

     def __eq__(self, other):
-        return self is other or (self.s == other.s and self.start == other.start)
+        return self is other or (self.s == other.s and self.start == other.start and self.rule == other.rule)

     def __hash__(self):
         return self._hash
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/earley_forest.py new/lark-1.2.2/lark/parsers/earley_forest.py
--- old/lark-1.1.9/lark/parsers/earley_forest.py        2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/earley_forest.py        2024-08-13 21:47:06.000000000 +0200
@@ -38,15 +38,15 @@
     Parameters:
         s: A Symbol, or a tuple of (rule, ptr) for an intermediate node.
-        start: The index of the start of the substring matched by this symbol (inclusive).
-        end: The index of the end of the substring matched by this symbol (exclusive).
+        start: For dynamic lexers, the index of the start of the substring matched by this symbol (inclusive).
+        end: For dynamic lexers, the index of the end of the substring matched by this symbol (exclusive).

     Properties:
         is_intermediate: True if this node is an intermediate node.
         priority: The priority of the node's symbol.
     """
     Set: Type[AbstractSet] = set  # Overridden by StableSymbolNode
-    __slots__ = ('s', 'start', 'end', '_children', 'paths', 'paths_loaded', 'priority', 'is_intermediate', '_hash')
+    __slots__ = ('s', 'start', 'end', '_children', 'paths', 'paths_loaded', 'priority', 'is_intermediate')
     def __init__(self, s, start, end):
         self.s = s
         self.start = start
@@ -59,7 +59,6 @@
         # unlike None or float('NaN'), and sorts appropriately.
         self.priority = float('-inf')
         self.is_intermediate = isinstance(s, tuple)
-        self._hash = hash((self.s, self.start, self.end))

     def add_family(self, lr0, rule, start, left, right):
         self._children.add(PackedNode(self, lr0, rule, start, left, right))
@@ -93,14 +92,6 @@
     def __iter__(self):
         return iter(self._children)

-    def __eq__(self, other):
-        if not isinstance(other, SymbolNode):
-            return False
-        return self is other or (type(self.s) == type(other.s) and self.s == other.s and self.start == other.start and self.end is other.end)
-
-    def __hash__(self):
-        return self._hash
-
     def __repr__(self):
         if self.is_intermediate:
             rule = self.s[0]
@@ -618,9 +609,10 @@
             children.append(data.left)
         if data.right is not PackedData.NO_DATA:
             children.append(data.right)
-        if node.parent.is_intermediate:
-            return self._cache.setdefault(id(node), children)
-        return self._cache.setdefault(id(node), self._call_rule_func(node, children))
+        transformed = children if node.parent.is_intermediate else self._call_rule_func(node, children)
+        if self._use_cache:
+            self._cache[id(node)] = transformed
+        return transformed

     def visit_symbol_node_in(self, node):
         super(ForestToParseTree, self).visit_symbol_node_in(node)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/grammar_analysis.py new/lark-1.2.2/lark/parsers/grammar_analysis.py
--- old/lark-1.1.9/lark/parsers/grammar_analysis.py     2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/grammar_analysis.py     2024-08-13 21:47:06.000000000 +0200
@@ -3,7 +3,7 @@
 from collections import Counter, defaultdict
 from typing import List, Dict, Iterator, FrozenSet, Set

-from ..utils import bfs, fzset, classify
+from ..utils import bfs, fzset, classify, OrderedSet
 from ..exceptions import GrammarError
 from ..grammar import Rule, Terminal, NonTerminal, Symbol
 from ..common import ParserConf
@@ -177,13 +177,13 @@
         self.FIRST, self.FOLLOW, self.NULLABLE = calculate_sets(rules)

-    def expand_rule(self, source_rule: NonTerminal, rules_by_origin=None) -> State:
+    def expand_rule(self, source_rule: NonTerminal, rules_by_origin=None) -> OrderedSet[RulePtr]:
         "Returns all init_ptrs accessible by rule (recursive)"
         if rules_by_origin is None:
             rules_by_origin = self.rules_by_origin

-        init_ptrs = set()
+        init_ptrs = OrderedSet[RulePtr]()

         def _expand_rule(rule: NonTerminal) -> Iterator[NonTerminal]:
             assert not rule.is_term, rule
@@ -200,4 +200,4 @@
         for _ in bfs([source_rule], _expand_rule):
             pass

-        return fzset(init_ptrs)
+        return init_ptrs
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/lalr_interactive_parser.py new/lark-1.2.2/lark/parsers/lalr_interactive_parser.py
--- old/lark-1.1.9/lark/parsers/lalr_interactive_parser.py      2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/lalr_interactive_parser.py      2024-08-13 21:47:06.000000000 +0200
@@ -6,6 +6,7 @@

 from lark.exceptions import UnexpectedToken
 from lark.lexer import Token, LexerThread
+from .lalr_parser_state import ParserState

 ###{standalone
@@ -14,7 +15,7 @@
     For a simpler interface, see the ``on_error`` argument to ``Lark.parse()``.
     """
-    def __init__(self, parser, parser_state, lexer_thread: LexerThread):
+    def __init__(self, parser, parser_state: ParserState, lexer_thread: LexerThread):
         self.parser = parser
         self.parser_state = parser_state
         self.lexer_thread = lexer_thread
@@ -63,15 +64,15 @@
         Calls to feed_token() won't affect the old instance, and vice-versa.
         """
+        return self.copy()
+
+    def copy(self, deepcopy_values=True):
         return type(self)(
             self.parser,
-            copy(self.parser_state),
+            self.parser_state.copy(deepcopy_values=deepcopy_values),
             copy(self.lexer_thread),
         )

-    def copy(self):
-        return copy(self)
-
     def __eq__(self, other):
         if not isinstance(other, InteractiveParser):
             return False
@@ -109,7 +110,7 @@
         conf_no_callbacks.callbacks = {}
         for t in self.choices():
             if t.isupper():  # is terminal?
-                new_cursor = copy(self)
+                new_cursor = self.copy(deepcopy_values=False)
                 new_cursor.parser_state.parse_conf = conf_no_callbacks
                 try:
                     new_cursor.feed_token(self.lexer_thread._Token(t, ''))
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/lalr_parser_state.py new/lark-1.2.2/lark/parsers/lalr_parser_state.py
--- old/lark-1.1.9/lark/parsers/lalr_parser_state.py    2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/lalr_parser_state.py    2024-08-13 21:47:06.000000000 +0200
@@ -54,16 +54,16 @@
         return len(self.state_stack) == len(other.state_stack) and self.position == other.position

     def __copy__(self):
+        return self.copy()
+
+    def copy(self, deepcopy_values=True) -> 'ParserState[StateT]':
         return type(self)(
             self.parse_conf,
             self.lexer,  # XXX copy
             copy(self.state_stack),
-            deepcopy(self.value_stack),
+            deepcopy(self.value_stack) if deepcopy_values else copy(self.value_stack),
         )

-    def copy(self) -> 'ParserState[StateT]':
-        return copy(self)
-
     def feed_token(self, token: Token, is_end=False) -> Any:
         state_stack = self.state_stack
         value_stack = self.value_stack
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/parsers/xearley.py new/lark-1.2.2/lark/parsers/xearley.py
--- old/lark-1.1.9/lark/parsers/xearley.py      2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/parsers/xearley.py      2024-08-13 21:47:06.000000000 +0200
@@ -104,7 +104,7 @@
                     token.end_pos = i + 1

                     new_item = item.advance()
-                    label = (new_item.s, new_item.start, i)
+                    label = (new_item.s, new_item.start, i + 1)
                     token_node = TokenNode(token, terminals[token.type])
                     new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, self.SymbolNode(*label))
                     new_item.node.add_family(new_item.s, item.rule, new_item.start, item.node, token_node)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/tools/__init__.py new/lark-1.2.2/lark/tools/__init__.py
--- old/lark-1.1.9/lark/tools/__init__.py       2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/tools/__init__.py       2024-08-13 21:47:06.000000000 +0200
@@ -28,9 +28,8 @@
 lalr_argparser.add_argument('-v', '--verbose', action='count', default=0, help="Increase Logger output level, up to three times")
 lalr_argparser.add_argument('-s', '--start', action='append', default=[])
 lalr_argparser.add_argument('-l', '--lexer', default='contextual', choices=('basic', 'contextual'))
-encoding: Optional[str] = 'utf-8' if sys.version_info > (3, 4) else None
-lalr_argparser.add_argument('-o', '--out', type=FileType('w', encoding=encoding), default=sys.stdout, help='the output file (default=stdout)')
-lalr_argparser.add_argument('grammar_file', type=FileType('r', encoding=encoding), help='A valid .lark file')
+lalr_argparser.add_argument('-o', '--out', type=FileType('w', encoding='utf-8'), default=sys.stdout, help='the output file (default=stdout)')
+lalr_argparser.add_argument('grammar_file', type=FileType('r', encoding='utf-8'), help='A valid .lark file')

 for flag in flags:
     if isinstance(flag, tuple):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/tree.py new/lark-1.2.2/lark/tree.py
--- old/lark-1.1.9/lark/tree.py 2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/tree.py 2024-08-13 21:47:06.000000000 +0200
@@ -9,13 +9,9 @@
         import rich
     except ImportError:
         pass
-    if sys.version_info >= (3, 8):
-        from typing import Literal
-    else:
-        from typing_extensions import Literal
+    from typing import Literal

 ###{standalone
-from collections import OrderedDict

 class Meta:
@@ -140,11 +136,10 @@
         Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG).
         """
         queue = [self]
-        subtrees = OrderedDict()
+        subtrees = dict()
         for subtree in queue:
             subtrees[id(subtree)] = subtree
-            # Reason for type ignore https://github.com/python/mypy/issues/10999
-            queue += [c for c in reversed(subtree.children)  # type: ignore[misc]
+            queue += [c for c in reversed(subtree.children)
                       if isinstance(c, Tree) and id(c) not in subtrees]

         del queue
@@ -242,7 +237,7 @@
     possible attributes, see https://www.graphviz.org/doc/info/attrs.html.
     """

-    import pydot  # type: ignore[import]
+    import pydot  # type: ignore[import-not-found]
     graph = pydot.Dot(graph_type='digraph', rankdir=rankdir, **kwargs)
     i = [0]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/utils.py new/lark-1.2.2/lark/utils.py
--- old/lark-1.1.9/lark/utils.py        2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/utils.py        2024-08-13 21:47:06.000000000 +0200
@@ -68,7 +68,7 @@
         res = {f: _serialize(getattr(self, f), memo) for f in fields}
         res['__type__'] = type(self).__name__
         if hasattr(self, '_serialize'):
-            self._serialize(res, memo)  # type: ignore[attr-defined]
+            self._serialize(res, memo)
         return res

     @classmethod
@@ -89,7 +89,7 @@
             raise KeyError("Cannot find key for class", cls, e)

         if hasattr(inst, '_deserialize'):
-            inst._deserialize()  # type: ignore[attr-defined]
+            inst._deserialize()

         return inst
@@ -141,7 +141,7 @@
         regexp_final = expr
     try:
         # Fixed in next version (past 0.960) of typeshed
-        return [int(x) for x in sre_parse.parse(regexp_final).getwidth()]  # type: ignore[attr-defined]
+        return [int(x) for x in sre_parse.parse(regexp_final).getwidth()]
     except sre_constants.error:
         if not _has_regex:
             raise ValueError(expr)
@@ -188,11 +188,7 @@
     """Given a list (l) will removing duplicates from the list,
       preserving the original order of the list.
       Assumes that the list entries are hashable."""
-    dedup = set()
-    # This returns None, but that's expected
-    return [x for x in l if not (x in dedup or dedup.add(x))]  # type: ignore[func-returns-value]
-    # 2x faster (ordered in PyPy and CPython 3.6+, guaranteed to be ordered in Python 3.7+)
-    # return list(dict.fromkeys(l))
+    return list(dict.fromkeys(l))

 class Enumerator(Serialize):
@@ -234,8 +230,7 @@
     return list(product(*lists))

 try:
-    # atomicwrites doesn't have type bindings
-    import atomicwrites  # type: ignore[import]
+    import atomicwrites
     _has_atomicwrites = True
 except ImportError:
     _has_atomicwrites = False
@@ -251,19 +246,6 @@
     return open(name, mode, **kwargs)

-
-def isascii(s: str) -> bool:
-    """ str.isascii only exists in python3.7+ """
-    if sys.version_info >= (3, 7):
-        return s.isascii()
-    else:
-        try:
-            s.encode('ascii')
-            return True
-        except (UnicodeDecodeError, UnicodeEncodeError):
-            return False
-
-
 class fzset(frozenset):
     def __repr__(self):
         return '{%s}' % ', '.join(map(repr, self))
@@ -359,3 +341,6 @@

     def __len__(self) -> int:
         return len(self.d)
+
+    def __repr__(self):
+        return f"{type(self).__name__}({', '.join(map(repr,self))})"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/lark/visitors.py new/lark-1.2.2/lark/visitors.py
--- old/lark-1.1.9/lark/visitors.py     2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/lark/visitors.py     2024-08-13 21:47:06.000000000 +0200
@@ -158,7 +158,11 @@

     def transform(self, tree: Tree[_Leaf_T]) -> _Return_T:
         "Transform the given tree, and return the final result"
-        return self._transform_tree(tree)
+        res = list(self._transform_children([tree]))
+        if not res:
+            return None  # type: ignore[return-value]
+        assert len(res) == 1
+        return res[0]

     def __mul__(
             self: 'Transformer[_Leaf_T, Tree[_Leaf_U]]',
@@ -470,8 +474,7 @@
     def __init__(self, func: Callable, visit_wrapper: Callable[[Callable, str, list, Any], Any]):
         if isinstance(func, _VArgsWrapper):
             func = func.base_func
-        # https://github.com/python/mypy/issues/708
-        self.base_func = func  # type: ignore[assignment]
+        self.base_func = func
         self.visit_wrapper = visit_wrapper

         update_wrapper(self, func)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/pyproject.toml new/lark-1.2.2/pyproject.toml
--- old/lark-1.1.9/pyproject.toml       2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/pyproject.toml       2024-08-13 21:47:06.000000000 +0200
@@ -17,7 +17,7 @@
     "Topic :: Text Processing :: Linguistic",
     "License :: OSI Approved :: MIT License",
 ]
-requires-python = ">=3.6"
+requires-python = ">=3.8"
 dependencies = []
 dynamic = ["version"]
@@ -41,7 +41,7 @@
 - Import grammars from Nearley.js
 - Extensive test suite
 - And much more!
-Since version 1.0, only Python versions 3.6 and up are supported."""
+Since version 1.2, only Python versions 3.8 and up are supported."""
 content-type = "text/markdown"

 [project.urls]
@@ -76,9 +76,9 @@
 [tool.mypy]
 files = "lark"
-python_version = "3.6"
+python_version = "3.8"
 show_error_codes = true
-enable_error_code = ["ignore-without-code"]
+enable_error_code = ["ignore-without-code", "unused-ignore"]
 exclude = [
     "^lark/__pyinstaller",
 ]
@@ -95,3 +95,11 @@
 ]

 [tool.pyright]
 include = ["lark"]
+
+[tool.pytest.ini_options]
+minversion = 6.0
+addopts = "-ra -q"
+testpaths =[
+    "tests"
+]
+python_files = "__main__.py"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/pytest.ini new/lark-1.2.2/pytest.ini
--- old/lark-1.1.9/pytest.ini   2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/pytest.ini   1970-01-01 01:00:00.000000000 +0100
@@ -1,6 +0,0 @@
-[pytest]
-minversion = 6.0
-addopts = -ra -q
-testpaths =
-    tests
-python_files = __main__.py
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tests/test_cache.py new/lark-1.2.2/tests/test_cache.py
--- old/lark-1.1.9/tests/test_cache.py  2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tests/test_cache.py  2024-08-13 21:47:06.000000000 +0200
@@ -7,17 +7,14 @@
 from lark.lexer import Lexer, Token
 import lark.lark as lark_module

-try:
-    from StringIO import StringIO
-except ImportError:
-    from io import BytesIO as StringIO
+from io import BytesIO

 try:
     import regex
 except ImportError:
     regex = None

-class MockFile(StringIO):
+class MockFile(BytesIO):
     def close(self):
         pass
     def __enter__(self):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tests/test_logger.py new/lark-1.2.2/tests/test_logger.py
--- old/lark-1.1.9/tests/test_logger.py 2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tests/test_logger.py 2024-08-13 21:47:06.000000000 +0200
@@ -3,10 +3,7 @@
 from lark import Lark, logger
 from unittest import TestCase, main, skipIf

-try:
-    from StringIO import StringIO
-except ImportError:
-    from io import StringIO
+from io import StringIO

 try:
     import interegular
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tests/test_parser.py new/lark-1.2.2/tests/test_parser.py
--- old/lark-1.1.9/tests/test_parser.py 2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tests/test_parser.py 2024-08-13 21:47:06.000000000 +0200
@@ -7,15 +7,8 @@
 import sys
 from copy import copy, deepcopy

-from lark.utils import isascii
-
 from lark import Token, Transformer_NonRecursive, LexError

-try:
-    from cStringIO import StringIO as cStringIO
-except ImportError:
-    # Available only in Python 2.x, 3.x only has io.StringIO from below
-    cStringIO = None
 from io import (
         StringIO as uStringIO,
         BytesIO,
@@ -28,6 +21,7 @@
 except ImportError:
     regex = None

+
 import lark
 from lark import logger
 from lark.lark import Lark
@@ -399,6 +393,8 @@
         self.assertEqual( g.parse('abc').children[0], 'abc')

+
+    @unittest.skipIf(LEXER=='basic', "Requires dynamic lexer")
     def test_earley(self):
         g = Lark("""start: A "b" c
                     A: "a"+
@@ -421,8 +417,7 @@
         l = Lark(grammar, parser='earley', lexer=LEXER)
         l.parse(program)

-
-    @unittest.skipIf(LEXER=='dynamic', "Only relevant for the dynamic_complete parser")
+    @unittest.skipIf(LEXER != 'dynamic_complete', "Only relevant for the dynamic_complete parser")
     def test_earley3(self):
         """Tests prioritization and disambiguation for pseudo-terminals (there should be only one result)
@@ -758,6 +753,8 @@
         self.assertEqual(ambig_tree.data, '_ambig')
         self.assertEqual(set(ambig_tree.children), expected)

+
+    @unittest.skipIf(LEXER=='basic', "Requires dynamic lexer")
     def test_fruitflies_ambig(self):
         grammar = """
             start: noun verb noun        -> simple
@@ -828,6 +825,27 @@
         tree = parser.parse(text)
         self.assertEqual(tree.children, ['foo', 'bar'])

+    def test_multiple_start_solutions(self):
+        grammar = r"""
+            !start: a | A
+            !a: A
+            A: "x"
+        """
+
+        l = Lark(grammar, ambiguity='explicit', lexer=LEXER)
+        tree = l.parse('x')
+
+        expected = Tree('_ambig', [
+            Tree('start', ['x']),
+            Tree('start', [Tree('a', ['x'])])]
+        )
+        self.assertEqual(tree, expected)
+
+        l = Lark(grammar, ambiguity='resolve', lexer=LEXER)
+        tree = l.parse('x')
+        assert tree == Tree('start', ['x'])
+
+
     def test_cycle(self):
         grammar = """
             start: start?
@@ -843,16 +861,24 @@

     def test_cycle2(self):
         grammar = """
-            start: _operation
-            _operation: value
-            value: "b"
-                 | "a" value
-                 | _operation
+            start: _recurse
+            _recurse: v
+            v: "b"
+             | "a" v
+             | _recurse
         """

         l = Lark(grammar, ambiguity="explicit", lexer=LEXER)
         tree = l.parse("ab")
-        self.assertEqual(tree, Tree('start', [Tree('value', [Tree('value', [])])]))
+        expected = (
+            Tree('start', [
+                Tree('_ambig', [
+                    Tree('v', [Tree('v', [])]),
+                    Tree('v', [Tree('v', [Tree('v', [])])])
+                ])
+            ])
+        )
+        self.assertEqual(tree, expected)

     def test_cycles(self):
         grammar = """
@@ -912,24 +938,60 @@
         tree = l.parse('');
         self.assertEqual(tree, Tree('a', [Tree('x', [Tree('b', [])])]))

+    @unittest.skipIf(LEXER=='basic', "start/end values work differently for the basic lexer")
+    def test_symbol_node_start_end_dynamic_lexer(self):
+        grammar = """
+            start: "ABC"
+        """
+
+        l = Lark(grammar, ambiguity='forest', lexer=LEXER)
+        node = l.parse('ABC')
+        self.assertEqual(node.start, 0)
+        self.assertEqual(node.end, 3)
+
+        grammar2 = """
+            start: abc
+            abc: "ABC"
+        """
+
+        l = Lark(grammar2, ambiguity='forest', lexer=LEXER)
+        node = l.parse('ABC')
+        self.assertEqual(node.start, 0)
+        self.assertEqual(node.end, 3)
+
     def test_resolve_ambiguity_with_shared_node(self):
+        grammar = """
+            start: (a+)*
+            !a.1: "A" |
+        """
+
+        l = Lark(grammar, ambiguity='resolve', lexer=LEXER)
+        tree = l.parse("A")
+        self.assertEqual(tree, Tree('start', [Tree('a', []), Tree('a', []), Tree('a', ['A'])]))

     def test_resolve_ambiguity_with_shared_node2(self):
+        grammar = """
+            start: _s x _s
+            x: "X"?
+            _s: " "?
+        """
+
+        l = Lark(grammar, ambiguity='resolve', lexer=LEXER)
+        tree = l.parse("")
+        self.assertEqual(tree, Tree('start', [Tree('x', [])]))

-    # @unittest.skipIf(LEXER=='dynamic', "Not implemented in Dynamic Earley yet")  # TODO
-    # def test_not_all_derivations(self):
-    #     grammar = """
-    #     start: cd+ "e"
-
-    #     !cd: "c"
-    #        | "d"
-    #        | "cd"
-
-    #     """
-    #     l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER, earley__all_derivations=False)
-    #     x = l.parse('cde')
-    #     assert x.data != '_ambig', x
-    #     assert len(x.children) == 1

+    def test_consistent_derivation_order1(self):
+        # Should return the same result for any hash-seed
+        parser = Lark('''
+            start: a a
+            a: "." | b
+            b: "."
+        ''', lexer=LEXER)
+
+        tree = parser.parse('..')
+        n = Tree('a', [Tree('b', [])])
+        assert tree == Tree('start', [n, n])

 _NAME = "TestFullEarley" + LEXER.capitalize()
 _TestFullEarley.__name__ = _NAME
@@ -987,7 +1049,7 @@
     def __init__(self, g, *args, **kwargs):
         self.text_lexer = Lark(g, *args, use_bytes=False, **kwargs)
         g = self.text_lexer.grammar_source.lower()
-        if '\\u' in g or not isascii(g):
+        if '\\u' in g or not g.isascii():
             # Bytes re can't deal with uniode escapes
             self.bytes_lark = None
         else:
@@ -996,7 +1058,7 @@

     def parse(self, text, start=None):
         # TODO: Easy workaround, more complex checks would be beneficial
-        if not isascii(text) or self.bytes_lark is None:
+        if not text.isascii() or self.bytes_lark is None:
             return self.text_lexer.parse(text, start)
         try:
             rv = self.text_lexer.parse(text, start)
@@ -1086,11 +1148,6 @@

         assert x.data == 'start' and x.children == ['12', '2'], x

-    @unittest.skipIf(cStringIO is None, "cStringIO not available")
-    def test_stringio_bytes(self):
-        """Verify that a Lark can be created from file-like objects other than Python's standard 'file' object"""
-        _Lark(cStringIO(b'start: a+ b a* "b" a*\n b: "b"\n a: "a" '))
-
     def test_stringio_unicode(self):
         """Verify that a Lark can be created from file-like objects other than Python's standard 'file' object"""
         _Lark(uStringIO(u'start: a+ b a* "b" a*\n b: "b"\n a: "a" '))
@@ -1140,7 +1197,7 @@
         """)
         g.parse('abc')

-    @unittest.skipIf(sys.version_info < (3, 3), "re package did not support 32bit unicode escape sequence before Python 3.3")
+
     def test_unicode_literal_range_escape2(self):
         g = _Lark(r"""start: A+
                       A: "\U0000FFFF".."\U00010002"
@@ -1153,8 +1210,7 @@
         """)
         g.parse('\x01\x02\x03')

-    @unittest.skipIf(sys.version_info[0]==2 or sys.version_info[:2]==(3, 4),
-                     "bytes parser isn't perfect in Python2, exceptions don't work correctly")
+
     def test_bytes_utf8(self):
         g = r"""
         start: BOM? char+
@@ -1305,49 +1361,6 @@
         [list] = r.children
         self.assertSequenceEqual([item.data for item in list.children], ())

-    @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)")
-    def test_single_item_flatten_list(self):
-        g = _Lark(r"""start: list
-                        list: | item "," list
-                        item : A
-                        A: "a"
-                    """)
-        r = g.parse("a,")
-
-        # Because 'list' is a flatten rule it's top-level element should *never* be expanded
-        self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',))
-
-        # Sanity check: verify that 'list' contains exactly the one 'item' we've given it
-        [list] = r.children
-        self.assertSequenceEqual([item.data for item in list.children], ('item',))
-
-    @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)")
-    def test_multiple_item_flatten_list(self):
-        g = _Lark(r"""start: list
-                        #list: | item "," list
-                        item : A
-                        A: "a"
-                    """)
-        r = g.parse("a,a,")
-
-        # Because 'list' is a flatten rule it's top-level element should *never* be expanded
-        self.assertSequenceEqual([subtree.data for subtree in r.children], ('list',))
-
-        # Sanity check: verify that 'list' contains exactly the two 'item's we've given it
-        [list] = r.children
-        self.assertSequenceEqual([item.data for item in list.children], ('item', 'item'))
-
-    @unittest.skipIf(True, "Flattening list isn't implemented (and may never be)")
-    def test_recurse_flatten(self):
-        """Verify that stack depth doesn't get exceeded on recursive rules marked for flattening."""
-        g = _Lark(r"""start: a | start a
-                      a : A
-                      A : "a" """)
-
-        # Force PLY to write to the debug log, but prevent writing it to the terminal (uses repr() on the half-built
-        # STree data structures, which uses recursion).
-        g.parse("a" * (sys.getrecursionlimit() // 4))
-
     def test_token_collision(self):
         g = _Lark(r"""start: "Hello" NAME
                       NAME: /\w/+
@@ -1459,20 +1472,6 @@
         x1 = g.parse("ABBc")
         x2 = g.parse("abdE")

-    # def test_string_priority(self):
-    #     g = _Lark("""start: (A | /a?bb/)+
-    #                  A: "a" """)
-    #     x = g.parse('abb')
-    #     self.assertEqual(len(x.children), 2)
-
-    #     # This parse raises an exception because the lexer will always try to consume
-    #     # "a" first and will never match the regular expression
-    #     # This behavior is subject to change!!
-    #     # This won't happen with ambiguity handling.
-    #     g = _Lark("""start: (A | /a?ab/)+
-    #                  A: "a" """)
-    #     self.assertRaises(LexError, g.parse, 'aab')
-
     def test_rule_collision(self):
         g = _Lark("""start: "a"+ "b"
                             | "a"+ """)
@@ -1561,13 +1560,6 @@
         """)
         x = g.parse('\n')

-
-    # def test_token_recurse(self):
-    #     g = _Lark("""start: A
-    #                  A: B
-    #                  B: A
-    #              """)
-
     @unittest.skipIf(PARSER == 'cyk', "No empty rules")
     def test_empty(self):
         # Fails an Earley implementation without special handling for empty rules,
@@ -1649,13 +1641,6 @@
         tree = l.parse('aA')
         self.assertEqual(tree.children, ['a', 'A'])

-        # g = """!start: "a"i "a"
-        #     """
-        # self.assertRaises(GrammarError, _Lark, g)
-
-        # g = """!start: /a/i /a/
-        #     """
-        # self.assertRaises(GrammarError, _Lark, g)

         g = """start: NAME "," "a"
                NAME: /[a-z_]/i /[a-z0-9_]/i*
@@ -1666,6 +1651,25 @@
         tree = l.parse('AB,a')
         self.assertEqual(tree.children, ['AB'])

+    @unittest.skipIf(LEXER in ('basic', 'custom_old', 'custom_new'), "Requires context sensitive terminal selection")
+    def test_token_flags_collision(self):
+
+        g = """!start: "a"i "a"
+            """
+        l = _Lark(g)
+        self.assertEqual(l.parse('aa').children, ['a', 'a'])
+        self.assertEqual(l.parse('Aa').children, ['A', 'a'])
+        self.assertRaises(UnexpectedInput, l.parse, 'aA')
+        self.assertRaises(UnexpectedInput, l.parse, 'AA')
+
+        g = """!start: /a/i /a/
+            """
+        l = _Lark(g)
+        self.assertEqual(l.parse('aa').children, ['a', 'a'])
+        self.assertEqual(l.parse('Aa').children, ['A', 'a'])
+        self.assertRaises(UnexpectedInput, l.parse, 'aA')
+        self.assertRaises(UnexpectedInput, l.parse, 'AA')
+
     def test_token_flags3(self):
         l = _Lark("""!start: ABC+
                      ABC: "abc"i
@@ -1754,7 +1758,7 @@

         self.assertEqual(len(tree.children), 2)

-    @unittest.skipIf(LEXER != 'basic', "basic lexer prioritization differs from dynamic lexer prioritization")
+    @unittest.skipIf('dynamic' in LEXER, "basic lexer prioritization differs from dynamic lexer prioritization")
     def test_lexer_prioritization(self):
         "Tests effect of priority on result"
@@ -2274,7 +2278,6 @@

-    @unittest.skipIf(PARSER=='earley', "Priority not handled correctly right now")  # TODO XXX
     def test_priority_vs_embedded(self):
         g = """
         A.2: "a"
@@ -2407,7 +2410,7 @@

         parser = _Lark(grammar)

-    @unittest.skipIf(PARSER!='lalr' or 'custom' in LEXER, "Serialize currently only works for LALR parsers without custom lexers (though it should be easy to extend)")
+    @unittest.skipIf(PARSER!='lalr' or LEXER == 'custom_old', "Serialize currently only works for LALR parsers without custom lexers (though it should be easy to extend)")
     def test_serialize(self):
         grammar = """
         start: _ANY b "C"
@@ -2512,7 +2515,7 @@
         """
         self.assertRaises((GrammarError, LexError, re.error), _Lark, g, regex=True)

-    @unittest.skipIf(PARSER!='lalr', "interactive_parser is only implemented for LALR at the moment")
+    @unittest.skipIf(PARSER != 'lalr', "interactive_parser is only implemented for LALR at the moment")
     def test_parser_interactive_parser(self):

         g = _Lark(r'''
@@ -2549,7 +2552,7 @@
         res = ip_copy.feed_eof()
         self.assertEqual(res, Tree('start', ['a', 'b', 'b']))

-    @unittest.skipIf(PARSER!='lalr', "interactive_parser error handling only works with LALR for now")
+    @unittest.skipIf(PARSER != 'lalr', "interactive_parser error handling only works with LALR for now")
     def test_error_with_interactive_parser(self):
         def ignore_errors(e):
             if isinstance(e, UnexpectedCharacters):
@@ -2584,10 +2587,10 @@
         s = "[0 1, 2,@, 3,,, 4, 5 6 ]$"
         tree = g.parse(s, on_error=ignore_errors)

-    @unittest.skipIf(PARSER!='lalr', "interactive_parser error handling only works with LALR for now")
+    @unittest.skipIf(PARSER != 'lalr', "interactive_parser error handling only works with LALR for now")
     def test_iter_parse(self):
         ab_grammar = '!start: "a"* "b"*'
-        parser = Lark(ab_grammar, parser="lalr")
+        parser = _Lark(ab_grammar)
         ip = parser.parse_interactive("aaabb")
         i = ip.iter_parse()
         assert next(i) == 'a'
@@ -2595,7 +2598,7 @@
         assert next(i) == 'a'
         assert next(i) == 'b'

-    @unittest.skipIf(PARSER!='lalr', "interactive_parser is only implemented for LALR at the moment")
+    @unittest.skipIf(PARSER != 'lalr', "interactive_parser is only implemented for LALR at the moment")
     def test_interactive_treeless_transformer(self):
         grammar = r"""
         start: SYM+
@@ -2617,7 +2620,7 @@
         res = ip.feed_eof()
         self.assertEqual(res.children, [1, 2, 1])

-    @unittest.skipIf(PARSER!='lalr', "Tree-less mode is only supported in lalr")
+    @unittest.skipIf(PARSER == 'earley', "Tree-less mode is not supported in earley")
     def test_default_in_treeless_mode(self):
         grammar = r"""
         start: expr
@@ -2643,7 +2646,7 @@
         b = parser.parse(s)
         assert a == b

-    @unittest.skipIf(PARSER!='lalr', "strict mode is only supported in lalr for now")
+    @unittest.skipIf(PARSER != 'lalr', "strict mode is only supported in lalr for now")
     def test_strict(self):
         # Test regex collision
         grammar = r"""
@@ -2687,7 +2690,7 @@
 for _LEXER, _PARSER in _TO_TEST:
     _make_parser_test(_LEXER, _PARSER)

-for _LEXER in ('dynamic', 'dynamic_complete'):
+for _LEXER in ('basic', 'dynamic', 'dynamic_complete'):
     _make_full_earley_test(_LEXER)

 if __name__ == '__main__':
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tests/test_reconstructor.py new/lark-1.2.2/tests/test_reconstructor.py
--- old/lark-1.1.9/tests/test_reconstructor.py  2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tests/test_reconstructor.py  2024-08-13 21:47:06.000000000 +0200
@@ -154,7 +154,6 @@
         for code in examples:
             self.assert_reconstruct(g, code, keep_all_tokens=True)

-    @unittest.skipIf(sys.version_info < (3, 0), "Python 2 does not play well with Unicode.")
     def test_switch_grammar_unicode_terminal(self):
         """
         This test checks that a parse tree built with a grammar containing only ascii characters can be reconstructed
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tests/test_trees.py new/lark-1.2.2/tests/test_trees.py
--- old/lark-1.1.9/tests/test_trees.py  2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tests/test_trees.py  2024-08-13 21:47:06.000000000 +0200
@@ -447,5 +447,20 @@
         with self.assertRaises(AttributeError):
             merge_transformers(T1(), module=T3())

+    def test_transform_token(self):
+        class MyTransformer(Transformer):
+            def INT(self, value):
+                return int(value)
+
+        t = Token('INT', '123')
+        assert MyTransformer().transform(t) == 123
+
+        class MyTransformer(Transformer):
+            def INT(self, value):
+                return Discard
+
+        assert MyTransformer().transform(t) is None
+
+
 if __name__ == '__main__':
     unittest.main()
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/lark-1.1.9/tox.ini new/lark-1.2.2/tox.ini
--- old/lark-1.1.9/tox.ini      2024-01-10 09:30:23.000000000 +0100
+++ new/lark-1.2.2/tox.ini      2024-08-13 21:47:06.000000000 +0200
@@ -1,5 +1,5 @@
 [tox]
-envlist = lint, type, py36, py37, py38, py39, py310, py311, py312, pypy3
+envlist = lint, type, py38, py39, py310, py311, py312, py313, pypy3
 skip_missing_interpreters = true

 [testenv]
@@ -25,8 +25,8 @@
 skip_install = true
 recreate = false
 deps =
-    mypy==0.950
-    interegular>=0.2.4
+    mypy==1.10
+    interegular>=0.3.1,<0.4.0
     types-atomicwrites
     types-regex
     rich<=13.4.1
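
-------------------------------------------------------------------

The Earley changes above (deduplicating start-symbol solutions, wrapping them
in an '_ambig' node, and the ForestToParseTree cache toggle) are easiest to see
end-to-end. A minimal sketch, with the grammar and expected trees taken
verbatim from the new test_multiple_start_solutions test in this diff; the
only other assumption is lark 1.2.2 with its default Earley parser:

    # Sketch: grammar and expectations come from tests/test_parser.py above.
    from lark import Lark, Tree

    grammar = r'''
    !start: a | A
    !a: A
    A: "x"
    '''

    # ambiguity='explicit': both derivations of the start symbol are now kept,
    # wrapped in an _ambig node (1.1.9 asserted this case could not happen).
    tree = Lark(grammar, ambiguity='explicit').parse('x')
    assert tree == Tree('_ambig', [
        Tree('start', ['x']),
        Tree('start', [Tree('a', ['x'])]),
    ])

    # ambiguity='resolve' (the default): a single derivation is chosen again,
    # which is the 1.2.2 headline fix ("Earley now respects ambiguity='resolve'").
    assert Lark(grammar, ambiguity='resolve').parse('x') == Tree('start', ['x'])

The newly documented Indenter/PythonIndenter postlexer can be exercised as
below. The toy grammar is an illustrative assumption (not from the commit);
what the docstrings above do pin down is that PythonIndenter names its newline
terminal _NEWLINE and that _INDENT/_DEDENT must be introduced with %declare:

    # Sketch of the ``postlex`` option; the grammar is hypothetical, modeled
    # on lark's indentation examples.
    from lark import Lark
    from lark.indenter import PythonIndenter

    grammar = r'''
    start: stmt+
    stmt: NAME "=" NAME _NEWLINE
        | "if" NAME ":" _NEWLINE _INDENT stmt+ _DEDENT

    %import common.CNAME -> NAME
    %import common.WS_INLINE
    %ignore WS_INLINE
    %declare _INDENT _DEDENT
    _NEWLINE: /\r?\n[\t ]*/
    '''

    # The postlexer tracks indentation in the _NEWLINE tokens and injects
    # _INDENT/_DEDENT tokens for the parser to consume.
    parser = Lark(grammar, parser='lalr', postlex=PythonIndenter())
    print(parser.parse("if a:\n    b = c\n").pretty())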