Parsers

PyDetex https://github.com/ppizarror/PyDetex

PARSERS. Defines parsers, each of which performs a single task in the removal of LaTeX markup.
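Because most parsers on this page share the shape (s, **kwargs) -> str, they can be applied one after another. The sketch below illustrates that idea only; run_pipeline, replace_tilde and squeeze_spaces are hypothetical stand-ins written for this example and are not part of PyDetex.

```python
import re
from typing import Callable, Iterable


def run_pipeline(s: str, parsers: Iterable[Callable[..., str]], **kwargs) -> str:
    """Apply each parser in order, feeding the output of one into the next."""
    for parser in parsers:
        s = parser(s, **kwargs)
    return s


# Hypothetical stand-ins: each does a single job and tolerates extra kwargs.
def replace_tilde(s: str, **kwargs) -> str:
    """Turn the LaTeX non-breaking space (~) into a plain space."""
    return s.replace("~", " ")


def squeeze_spaces(s: str, **kwargs) -> str:
    """Collapse runs of spaces or tabs and trim both ends."""
    return re.sub(r"[ \t]+", " ", s).strip()


print(run_pipeline("Fig.~1   shows", [replace_tilde, squeeze_spaces], lang="en"))
# Fig. 1 shows
```

Accepting **kwargs in every step means a shared option such as lang can be passed once to the pipeline, and steps that do not need it simply ignore it.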

pydetex.parsers.find_str(s, char)

Finds a sequence within a string and returns its position. Returns -1 if the sequence is not found.

Parameters:
Return type:

int

Returns:

Position
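The contract described above (a position on success, -1 on failure) matches Python's built-in str.find, so a minimal sketch is one line. The name find_str_sketch is hypothetical, and this is not PyDetex's implementation.

```python
def find_str_sketch(s: str, char: str) -> int:
    """Return the index of the first occurrence of ``char`` in ``s``, or -1."""
    # str.find already follows the "-1 if not found" convention.
    return s.find(char)


print(find_str_sketch(r"text \cite{a}", r"\cite"))  # 5
print(find_str_sketch("plain text", r"\cite"))      # -1
```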

pydetex.parsers.process_begin_document(s, **kwargs)

Removes all code outside \begin{document}, if that command is found.

Parameters:

s (str) – Latex code

Return type:

str

Returns:

Code with all data outside the document removed
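One way to picture this step is a plain string search for the document delimiters. The sketch below is an illustration under assumptions, not PyDetex's code: keep_document_body is a hypothetical name, and cutting at \end{document} and trimming whitespace are choices made for this example.

```python
def keep_document_body(s: str) -> str:
    r"""Return only the text between \begin{document} and \end{document}."""
    begin, end = r"\begin{document}", r"\end{document}"
    start = s.find(begin)
    if start == -1:
        # No document environment found: leave the input untouched.
        return s
    start += len(begin)
    stop = s.find(end, start)
    if stop == -1:
        # Assumption: without a closing tag, keep everything to the end.
        stop = len(s)
    return s[start:stop].strip()


source = "\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"
print(keep_document_body(source))  # Hello
```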

pydetex.parsers.process_chars_equations(s, lang, single_only, **kwargs)

Processes single-character equations, removing the symbols.

Parameters:
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

  • single_only (bool) – Only process single-character equations. If False, the equation is replaced by a text label

Return type:

str

Returns:

Code without symbols

pydetex.parsers.process_cite(s, sort_cites=True, compress_cite=True, cite_separator=', ', **kwargs)

Transforms all cites into numbered text. For example, 'This is from \cite{Pizarro}' becomes 'This is from [1]'.

Parameters:
  • s (str) – Latex string code

  • sort_cites (bool) – Sorts the cite numbers

  • compress_cite (bool) – Compress the cite numbers, e.g. [1, 2, 3, 10] to [1-3, 10]

  • cite_separator (str) – Separator of cites, for example [1{sep}2{sep}3]

Return type:

str

Returns:

LaTeX with cites as numbers
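The three options above can be sketched with a regular expression and a dictionary that numbers keys in order of first use. This is an illustration, not PyDetex's implementation: cite_to_numbers and compress_numbers are hypothetical names, and compressing only runs of three or more numbers is an assumption drawn from the [1-3, 10] example.

```python
import re
from typing import Dict, List


def compress_numbers(nums: List[int]) -> List[str]:
    """Collapse runs of three or more consecutive numbers: 1, 2, 3, 10 -> 1-3, 10."""
    out: List[str] = []
    i = 0
    while i < len(nums):
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            out.append(f"{nums[i]}-{nums[j]}")
        else:
            out.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return out


def cite_to_numbers(s: str, sort_cites: bool = True, compress_cite: bool = True,
                    cite_separator: str = ", ") -> str:
    r"""Replace each \cite{...} with bracketed numbers assigned on first use."""
    numbers: Dict[str, int] = {}

    def repl(match: "re.Match[str]") -> str:
        raw = (k.strip() for k in match.group(1).split(","))
        keys = list(dict.fromkeys(k for k in raw if k))  # drop blanks and repeats
        nums = [numbers.setdefault(k, len(numbers) + 1) for k in keys]
        if sort_cites:
            nums = sorted(nums)
        parts = compress_numbers(nums) if compress_cite else [str(n) for n in nums]
        return "[" + cite_separator.join(parts) + "]"

    return re.sub(r"\\cite\{([^}]*)\}", repl, s)


print(cite_to_numbers(r"This is from \cite{Pizarro}"))   # This is from [1]
print(cite_to_numbers(r"\cite{a,b,c} and \cite{d, a}"))  # [1-3] and [1, 4]
```

The sketch only matches the plain \cite{...} form; optional arguments and other cite variants are out of scope here.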

pydetex.parsers.process_citeauthor(s, lang, **kwargs)

Transforms every \citeauthor command into an author tag. For example: 'This is from \citeauthor{Pizarro}, and that is from \citeauthor{cite1, cite2}' to 'This is from [author], and that is from [authors]'.

Parameters:
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

Return type:

str

Returns:

Latex with replaced cites
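The singular/plural behaviour in the example above can be sketched by counting the keys inside each command. The function name citeauthor_to_tag is hypothetical, the lang parameter is left out, and the placeholder strings are copied from the example rather than from PyDetex's source.

```python
import re


def citeauthor_to_tag(s: str) -> str:
    r"""Replace \citeauthor{...} with [author] (one key) or [authors] (several)."""

    def repl(match: "re.Match[str]") -> str:
        keys = [k for k in match.group(1).split(",") if k.strip()]
        return "[authors]" if len(keys) > 1 else "[author]"

    return re.sub(r"\\citeauthor\{([^}]*)\}", repl, s)


text = r"This is from \citeauthor{Pizarro}, and that is from \citeauthor{cite1, cite2}"
print(citeauthor_to_tag(text))
# This is from [author], and that is from [authors]
```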

pydetex.parsers.process_def(s, clear_learned=True, replace=False, **kwargs)

Processes \def commands, storing each definition so that it can be replaced later.

Parameters:
  • s (str) – Latex with definitions

  • clear_learned (bool) – Clear the last learned definitions

  • replace (bool) – Replace instances of learned defs

Return type:

str

Returns:

Latex without definitions

pydetex.parsers.process_inputs(s, clear_not_found_files=False, **kwargs)

Processes inputs: finds the input files and retrieves their contents.

Parameters:
  • s (str) – Latex string code with inputs

  • clear_not_found_files (bool) – Clear the list of files that were not found. Used when changing the path

Return type:

str

Returns:

Text copied with data from inputs

pydetex.parsers.process_items(s, lang, **kwargs)

Processes itemize and enumerate environments.

Parameters:
  • s (str) – Latex string code

  • lang (str) – Language tag

Return type:

str

Returns:

Processed items

pydetex.parsers.process_labels(s, **kwargs)

Removes labels.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

String with no labels
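Label removal can be pictured as a single regular-expression substitution. The sketch below assumes the plain \label{...} form with no nested braces; drop_labels is a hypothetical name and not PyDetex's function.

```python
import re


def drop_labels(s: str) -> str:
    r"""Delete every \label{...} command, leaving the surrounding text as is."""
    return re.sub(r"\\label\{[^}]*\}", "", s)


print(drop_labels(r"\section{Intro}\label{sec:intro} Text"))
# \section{Intro} Text
```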

pydetex.parsers.process_ref(s, **kwargs)

Processes references. As with cites, each reference is replaced by a number.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

String with numbers instead of references.

pydetex.parsers.remove_commands_char(s, chars)

Removes all commands delimited by the given char pairs.

Parameters:
  • s (str) – Latex string code

  • chars (List[Tuple[str, str, bool]]) – Chars that define equations, as [(initial, final, ignore escape), …]

Return type:

str

Returns:

Code with removed chars
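To make the shape of the chars argument concrete, the sketch below deletes every span enclosed by an (initial, final) pair. It is an illustration only: remove_delimited is a hypothetical name, and reading the third tuple element as "a delimiter preceded by a backslash does not count" is an assumption, not something the documentation above confirms.

```python
import re
from typing import List, Tuple


def remove_delimited(s: str, chars: List[Tuple[str, str, bool]]) -> str:
    """Delete each span that starts with ``initial`` and ends with ``final``."""
    for initial, final, ignore_escape in chars:
        # Assumed meaning of the flag: skip delimiters escaped with a backslash.
        guard = r"(?<!\\)" if ignore_escape else ""
        pattern = guard + re.escape(initial) + r".*?" + guard + re.escape(final)
        s = re.sub(pattern, "", s, flags=re.DOTALL)
    return s


print(remove_delimited(r"Cost \$5, where $x=1$ holds", [("$", "$", True)]))
# Cost \$5, where  holds
```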

pydetex.parsers.remove_commands_param(s, lang, invalid_commands=None, **kwargs)

Remove all commands with params.

Parameters:
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

  • invalid_commands (Optional[List[str]]) – Invalid commands that will not call output_text_for_some_commands. If None, the default list is used

Return type:

str

Returns:

Code with removed commands

pydetex.parsers.remove_commands_param_noargv(s, **kwargs)

Remove all commands without arguments.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

Code with removed commands

pydetex.parsers.remove_comments(s, **kwargs)

Remove comments from the text.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

String without comments
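A LaTeX comment runs from an unescaped % to the end of the line, which a negative lookbehind can express. The following is a minimal sketch with a hypothetical name (strip_latex_comments), not PyDetex's implementation; it does not handle edge cases such as a literal double backslash directly before a percent sign or verbatim environments.

```python
import re


def strip_latex_comments(s: str) -> str:
    r"""Drop everything from an unescaped % to the end of each line."""
    # (?<!\\) leaves escaped percent signs such as "50\%" intact.
    lines = [re.sub(r"(?<!\\)%.*", "", line).rstrip() for line in s.split("\n")]
    return "\n".join(lines)


print(strip_latex_comments("a 50\\% rise % reviewer note\nnext line"))
# a 50\% rise
# next line
```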

pydetex.parsers.remove_common_tags(s, replace_tags=None, **kwargs)

Remove common tags from string.

Parameters:
  • s (str) – Latex string code

  • replace_tags (Optional[List]) – List to replace. If None, default will be used

Return type:

str

Returns:

Text without tags

pydetex.parsers.remove_environments(s, env_list=None, **kwargs)

Remove a selection of environments.

Parameters:
  • s (str) – Latex code

  • env_list (Optional[List[str]]) – Environment list. If not defined, the default from PyDetex is used

Return type:

str

Returns:

Code without given environments
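Removing a list of environments can be sketched as one non-greedy substitution per environment name. The function below is a hypothetical illustration (drop_environments is not a PyDetex name); it requires an explicit list instead of a default and does not handle an environment nested inside another of the same name.

```python
import re
from typing import List


def drop_environments(s: str, env_list: List[str]) -> str:
    r"""Delete every \begin{env}...\end{env} block for each env in env_list."""
    for env in env_list:
        name = re.escape(env)  # starred names such as "table*" need escaping
        pattern = r"\\begin\{" + name + r"\}.*?\\end\{" + name + r"\}"
        s = re.sub(pattern, "", s, flags=re.DOTALL)
    return s


print(drop_environments("A\\begin{figure}plot\\end{figure}B", ["figure"]))  # AB
```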

pydetex.parsers.remove_equations(s, **kwargs)

Remove all equations from a string.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

LaTeX without equations

pydetex.parsers.remove_tag(s, tagname)

Removes a LaTeX tag from the code.

Parameters:
  • s (str) – Latex string code

  • tagname (str) – Tag code

Return type:

str

Returns:

String without tags

pydetex.parsers.replace_pydetex_tags(s, cite_format=('[', ']'), **kwargs)

Replaces PyDetex tags with text.

Parameters:
  • s (str) – Latex string code

  • cite_format (Tuple[str, str]) – Cite format

Return type:

str

Returns:

String with no cites

pydetex.parsers.simple_replace(s, **kwargs)

Replace simple tokens.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

String with replaced items

pydetex.parsers.strip_punctuation(s, **kwargs)

Strips whitespace before punctuation. For example, 'mycode :' becomes 'mycode:'.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

String with stripped punctuation
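The 'mycode :' example can be reproduced with a single substitution that removes spaces directly before a punctuation mark. This is a sketch under assumptions: tighten_punctuation is a hypothetical name, and the set of punctuation marks handled here is a guess, not taken from PyDetex.

```python
import re


def tighten_punctuation(s: str) -> str:
    """Remove spaces or tabs that appear directly before a punctuation mark."""
    return re.sub(r"[ \t]+([.,;:!?])", r"\1", s)


print(tighten_punctuation("mycode :"))  # mycode:
print(tighten_punctuation("a , b ."))   # a, b.
```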

pydetex.parsers.unicode_chars_equations(s, **kwargs)

Converts all equations to Unicode.

Parameters:

s (str) – Latex string code

Return type:

str

Returns:

LaTeX with equations converted to Unicode