Parsers

PyDetex https://github.com/ppizarror/PyDetex

PARSERS Defines parsers, each of which performs a single task to remove LaTeX elements.

pydetex.parsers.find_str(s, char)

Finds a sequence within a string and returns its position. If the sequence is not found, returns -1.

Parameters
  • s (str) – String to search

  • char (str) – Sequence to find

Return type

int

Returns

Position
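
The behaviour described here matches Python's built-in str.find. The following is a minimal sketch of that contract, not PyDetex's own implementation; find_str_sketch is a hypothetical name used only for illustration.

```python
def find_str_sketch(s: str, char: str) -> int:
    """Return the index of the first occurrence of char in s, or -1."""
    # str.find already returns -1 when the sequence is absent
    return s.find(char)


print(find_str_sketch('abc \\begin{document}', '\\begin'))  # 4
print(find_str_sketch('abc', 'z'))  # -1
```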

pydetex.parsers.process_begin_document(s, **kwargs)

Removes all code outside the document environment (\begin{document}), if found.

Parameters

s (str) – Latex code

Return type

str

Returns

Latex code with all data outside the document removed
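
As an illustration of the idea, the sketch below keeps only the text between \begin{document} and \end{document}. It is written with the standard library only and is an assumption about the behaviour, not the library's code; keep_document_body is a hypothetical name, and stripping surrounding whitespace is a choice made here.

```python
def keep_document_body(s: str) -> str:
    """Keep only the text inside the document environment, if there is one."""
    begin, end = '\\begin{document}', '\\end{document}'
    i = s.find(begin)
    if i == -1:
        # No document environment found: return the input untouched
        return s
    j = s.find(end, i)
    if j == -1:
        # Missing closing tag: keep everything after the opening tag
        j = len(s)
    return s[i + len(begin):j].strip()


tex = '\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}'
print(keep_document_body(tex))  # Hello
```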

pydetex.parsers.process_chars_equations(s, lang, single_only, **kwargs)

Processes single-character equations, removing the equation symbols.

Parameters
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

  • single_only (bool) – Only process single-character equations. If False, replaces longer equations with a text label

Return type

str

Returns

Code without symbols
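
A rough sketch of how the single_only switch could behave for inline $...$ equations. The function name, the regular expression and the 'EQUATION' placeholder are all assumptions for illustration; the real label text depends on lang and is not documented here.

```python
import re


def unwrap_single_char_equations(s: str, single_only: bool = True,
                                 label: str = 'EQUATION') -> str:
    """Drop the $ delimiters around one-character inline equations."""

    def repl(match):
        body = match.group(1).strip()
        if len(body) == 1:
            return body  # '$x$' becomes 'x'
        # Longer equations are kept as-is, or swapped for a placeholder label
        return match.group(0) if single_only else label

    # Match $...$ pairs whose delimiters are not escaped as \$
    return re.sub(r'(?<!\\)\$([^$]+?)(?<!\\)\$', repl, s)


print(unwrap_single_char_equations('Let $x$ be $a+b$'))
# Let x be $a+b$
print(unwrap_single_char_equations('Let $x$ be $a+b$', single_only=False))
# Let x be EQUATION
```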

pydetex.parsers.process_cite(s, sort_cites=True, compress_cite=True, cite_separator=', ', **kwargs)

Transforms all cites to text-based numbers. For example: 'This is from \cite{Pizarro}' to 'This is from [1]'.

Parameters
  • s (str) – Latex string code

  • sort_cites (bool) – Sort the cite numbers

  • compress_cite (bool) – Compress the cite numbers, e.g. [1, 2, 3, 10] to [1-3, 10]

  • cite_separator (str) – Separator of cites, for example [1{sep}2{sep}3]

Return type

str

Returns

Latex with cite as numbers
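
To make the three options concrete, here is a standard-library sketch of numbering, sorting and compressing cites. It assumes keys are numbered in order of first appearance and that runs of three or more consecutive numbers are compressed; both are assumptions, and number_cites and compress_numbers are hypothetical names, not part of PyDetex.

```python
import re
from typing import Dict, List


def compress_numbers(nums: List[int]) -> List[str]:
    """Collapse runs of three or more consecutive integers into 'a-b'."""
    out, i = [], 0
    while i < len(nums):
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            out.append(f'{nums[i]}-{nums[j]}')
        else:
            out.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return out


def number_cites(s: str, sort_cites: bool = True, compress_cite: bool = True,
                 cite_separator: str = ', ') -> str:
    """Replace each cite command with bracketed numbers."""
    seen: Dict[str, int] = {}  # cite key -> number, in order of appearance

    def repl(match):
        keys = [k.strip() for k in match.group(1).split(',') if k.strip()]
        nums = [seen.setdefault(k, len(seen) + 1) for k in keys]
        nums = list(dict.fromkeys(nums))  # drop repeated keys, keep order
        if sort_cites:
            nums = sorted(nums)
        parts = compress_numbers(nums) if compress_cite else [str(n) for n in nums]
        return '[' + cite_separator.join(parts) + ']'

    return re.sub(r'\\cite\{([^}]*)\}', repl, s)


print(number_cites('This is from \\cite{Pizarro}'))  # This is from [1]
print(compress_numbers([1, 2, 3, 10]))  # ['1-3', '10']
```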

pydetex.parsers.process_citeauthor(s, lang, **kwargs)

Transforms all \citeauthor commands to an author label. For example: 'This is from \citeauthor{Pizarro}, and that is from \citeauthor{cite1, cite2}' to 'This is from [author], and that is from [authors]'.

Parameters
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

Return type

str

Returns

Latex with replaced cites
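
The singular/plural choice in the example can be sketched as follows. This is an illustration only: label_citeauthor is a hypothetical name, and the labels are passed in directly here rather than looked up from lang as the real parser presumably does.

```python
import re


def label_citeauthor(s: str, single: str = '[author]',
                     plural: str = '[authors]') -> str:
    """Replace each citeauthor command with a singular or plural label."""

    def repl(match):
        keys = [k for k in match.group(1).split(',') if k.strip()]
        # More than one key inside the braces selects the plural label
        return plural if len(keys) > 1 else single

    return re.sub(r'\\citeauthor\{([^}]*)\}', repl, s)


text = ('This is from \\citeauthor{Pizarro}, '
        'and that is from \\citeauthor{cite1, cite2}')
print(label_citeauthor(text))
# This is from [author], and that is from [authors]
```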

pydetex.parsers.process_def(s, clear_learned=True, replace=False, **kwargs)

Processes defs: stores each definition and, if requested, replaces instances of the learned definitions.

Parameters
  • s (str) – Latex with definitions

  • clear_learned (bool) – Clear the last learned definitions

  • replace (bool) – Replace instances of learned defs

Return type

str

Returns

Latex without definitions
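
A minimal sketch of the learn-then-replace idea, assuming simple definitions of the form \def\name{value} without nested braces. The module-level dictionary and the name learn_defs are hypothetical and only illustrate how clear_learned and replace might interact.

```python
import re
from typing import Dict

_learned: Dict[str, str] = {}  # definitions remembered between calls


def learn_defs(s: str, clear_learned: bool = True, replace: bool = False) -> str:
    """Remove simple def commands, remember them, optionally expand uses."""
    if clear_learned:
        _learned.clear()

    def store(match):
        _learned[match.group(1)] = match.group(2)
        return ''  # the definition itself is dropped from the text

    s = re.sub(r'\\def\\([A-Za-z]+)\{([^{}]*)\}', store, s)
    if replace:
        for name, value in _learned.items():
            # A function replacement keeps backslashes in value literal
            s = re.sub(r'\\' + name + r'(?![A-Za-z])',
                       lambda m, v=value: v, s)
    return s


print(learn_defs('\\def\\name{PyDetex}Hello \\name!', replace=True))
# Hello PyDetex!
```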

pydetex.parsers.process_inputs(s, clear_not_found_files=False, **kwargs)

Processes inputs and tries to copy in the content of each input file.

Parameters
  • s (str) – Latex string code with inputs

  • clear_not_found_files (bool) – Clear the list of files that were not found. Used when changing the path

Return type

str

Returns

Text copied with data from inputs

pydetex.parsers.process_items(s, **kwargs)

Process itemize and enumerate.

Parameters

s (str) – Latex string code

Return type

str

Returns

Processed items
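
One plausible way to flatten lists into plain text is sketched below. The output format ('-' bullets and '1.' numbering) is an assumption chosen for illustration, nested lists are not handled, and flatten_items is a hypothetical name.

```python
import re


def flatten_items(s: str) -> str:
    """Turn non-nested itemize/enumerate environments into plain lines."""

    def repl(match):
        env, body = match.group(1), match.group(2)
        items = [i.strip() for i in body.split('\\item') if i.strip()]
        lines = []
        for n, item in enumerate(items, start=1):
            marker = f'{n}.' if env == 'enumerate' else '-'
            lines.append(f'{marker} {item}')
        return '\n'.join(lines)

    # \1 makes the closing tag match the opening environment name
    return re.sub(r'\\begin\{(itemize|enumerate)\}(.*?)\\end\{\1\}',
                  repl, s, flags=re.DOTALL)


print(flatten_items('\\begin{enumerate}\\item A\\item B\\end{enumerate}'))
# 1. A
# 2. B
```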

pydetex.parsers.process_labels(s, **kwargs)

Removes labels.

Parameters

s (str) – Latex string code

Return type

str

Returns

String with no labels
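
Label removal can be pictured as a single regular-expression substitution. This is a sketch under the assumption that labels contain no nested braces; drop_labels is a hypothetical name.

```python
import re


def drop_labels(s: str) -> str:
    """Delete every label command together with its argument."""
    return re.sub(r'\\label\{[^}]*\}', '', s)


print(drop_labels('\\section{Intro}\\label{sec:intro} Text'))
# \section{Intro} Text
```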

pydetex.parsers.process_quotes(s, **kwargs)

Process quotes.

Parameters

s (str) – Latex string code

Return type

str

Returns

String with “quotes”
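
A small sketch of quote conversion, assuming the input uses the usual LaTeX convention of two backticks to open and two apostrophes to close a quotation. Whether the parser handles other quote styles is not stated here; curly_quotes is a hypothetical name.

```python
def curly_quotes(s: str) -> str:
    """Convert LaTeX-style double quotes to curly quotation marks."""
    # U+201C is the left and U+201D the right double quotation mark
    return s.replace('``', '\u201c').replace("''", '\u201d')


print(curly_quotes("He said ``quotes''"))
```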

pydetex.parsers.process_ref(s, **kwargs)

Processes references. As with cites, replaces them with numbers.

Parameters

s (str) – Latex string code

Return type

str

Returns

String with numbers instead of references.
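
The numbering can be sketched by assigning each distinct reference key a number on first appearance. That numbering rule is an assumption, and number_refs is a hypothetical name.

```python
import re


def number_refs(s: str) -> str:
    """Replace each ref command with a number assigned on first use."""
    seen = {}  # reference key -> number

    def repl(match):
        key = match.group(1).strip()
        return str(seen.setdefault(key, len(seen) + 1))

    return re.sub(r'\\ref\{([^}]*)\}', repl, s)


print(number_refs('See \\ref{fig:a} and \\ref{fig:b}, then \\ref{fig:a}'))
# See 1 and 2, then 1
```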

pydetex.parsers.remove_commands_char(s, chars)

Remove all char commands.

Parameters
  • s (str) – Latex string code

  • chars (List[Tuple[str, str, bool]]) – Chars that define equations [(initial, final, ignore escape), …]

Return type

str

Returns

Code with removed chars

pydetex.parsers.remove_commands_param(s, lang, invalid_commands=None, **kwargs)

Remove all commands with params.

Parameters
  • s (str) – Latex string code

  • lang (str) – Language tag of the code

  • invalid_commands (Optional[List[str]]) – Invalid commands that will not call output_text_for_some_commands. If None use defaults

Return type

str

Returns

Code with removed commands

pydetex.parsers.remove_commands_param_noargv(s, **kwargs)

Remove all commands without arguments.

Parameters

s (str) – Latex string code

Return type

str

Returns

Code with removed commands

pydetex.parsers.remove_comments(s, **kwargs)

Remove comments from text.

Parameters

s (str) – Latex string code

Return type

str

Returns

String without comments
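
A sketch of comment stripping that leaves escaped percent signs (\%) alone. It is a simplification: it does not treat a double backslash before % specially, and it does not join lines the way TeX does. strip_comments is a hypothetical name.

```python
import re


def strip_comments(s: str) -> str:
    """Remove everything from an unescaped % to the end of each line."""
    # '.' does not match newlines, so each comment stops at its line end
    return re.sub(r'(?<!\\)%.*', '', s)


print(strip_comments('50\\% done % internal note\nnext line'))
```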

pydetex.parsers.remove_common_tags(s, replace_tags=None, **kwargs)

Remove common tags from string.

Parameters
  • s (str) – Latex string code

  • replace_tags (Optional[List]) – List to replace. If None, default will be used

Return type

str

Returns

Text without tags

pydetex.parsers.remove_environments(s, env_list=None, **kwargs)

Remove a selection of environments.

Parameters
  • s (str) – Latex code

  • env_list (Optional[List[str]]) – Environment list. If not defined, uses the default from PyDetex

Return type

str

Returns

Code without given environments
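
Removing whole environments can be sketched with one non-greedy substitution per environment name. The default list used by PyDetex is not given here, so this hypothetical drop_environments requires an explicit list and does not handle nesting of the same environment.

```python
import re
from typing import List


def drop_environments(s: str, env_list: List[str]) -> str:
    """Delete each listed environment, including its begin/end tags."""
    for env in env_list:
        name = re.escape(env)  # handles starred names such as 'figure*'
        pattern = r'\\begin\{' + name + r'\}.*?\\end\{' + name + r'\}'
        s = re.sub(pattern, '', s, flags=re.DOTALL)
    return s


print(drop_environments('A\\begin{figure}x\\end{figure}B', ['figure']))  # AB
```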

pydetex.parsers.remove_equations(s, **kwargs)

Remove all equations from a string.

Parameters

s (str) – Latex string code

Return type

str

Returns

Latex without equations
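
As a rough illustration, the sketch below deletes a few common equation forms. Which delimiters and environments the real parser recognises is not listed here, so the pattern list is an assumption, and drop_equations is a hypothetical name.

```python
import re


def drop_equations(s: str) -> str:
    """Delete display and inline equations written in a few common forms."""
    patterns = [
        r'\$\$.*?\$\$',                                     # $$ ... $$
        r'(?<!\\)\$.*?(?<!\\)\$',                           # $ ... $
        r'\\\[.*?\\\]',                                     # \[ ... \]
        r'\\begin\{equation\*?\}.*?\\end\{equation\*?\}',   # equation env
    ]
    for pattern in patterns:
        s = re.sub(pattern, '', s, flags=re.DOTALL)
    return s


print(drop_equations('Cost is 5\\$ and $x$.'))  # Cost is 5\$ and .
```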

pydetex.parsers.remove_tag(s, tagname)

Removes a latex tag code.

Parameters
  • s (str) – Latex string code

  • tagname (str) – Tag code

Return type

str

Returns

String without tags

pydetex.parsers.replace_pydetex_tags(s, cite_format=('[', ']'), **kwargs)

Replaces font tags to a specific format.

Parameters
  • s (str) – Latex string code

  • cite_format (Tuple[str, str]) – Cite format

Return type

str

Returns

String with no cites

pydetex.parsers.simple_replace(s, **kwargs)

Replace simple tokens.

Parameters

s (str) – Latex string code

Return type

str

Returns

String with replaced items

pydetex.parsers.strip_punctuation(s, **kwargs)

Strips punctuation. For example, 'mycode :' to 'mycode:'.

Parameters

s (str) – Latex string code

Return type

str

Returns

Stripped punctuation
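
The 'mycode :' example can be reproduced with a single substitution that removes spaces before punctuation. The set of punctuation marks below is an assumption, and tighten_punctuation is a hypothetical name.

```python
import re


def tighten_punctuation(s: str) -> str:
    """Remove spaces and tabs that precede a punctuation mark."""
    return re.sub(r'[ \t]+([,.;:!?])', r'\1', s)


print(tighten_punctuation('mycode :'))  # mycode:
```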

pydetex.parsers.unicode_chars_equations(s, **kwargs)

Converts all equations to unicode.

Parameters

s (str) – Latex string code

Return type

str

Returns

Latex with equations converted to unicode
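
To give a feel for the conversion, the sketch below maps a handful of Greek-letter commands inside inline $...$ equations to unicode. The tiny lookup table, the removal of the dollar signs and the name unicode_inline_math are all assumptions for illustration; the real parser covers far more than this.

```python
import re

# Deliberately small table, for illustration only
GREEK = {'alpha': 'α', 'beta': 'β', 'gamma': 'γ', 'pi': 'π'}


def unicode_inline_math(s: str) -> str:
    """Replace known commands inside inline equations with unicode."""

    def convert(match):
        body = match.group(1)
        # Unknown commands are left exactly as written
        return re.sub(r'\\([A-Za-z]+)',
                      lambda m: GREEK.get(m.group(1), m.group(0)), body)

    return re.sub(r'(?<!\\)\$(.+?)(?<!\\)\$', convert, s)


print(unicode_inline_math('Angle $\\alpha + \\beta$'))  # Angle α + β
```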