Utils
PyDetex https://github.com/ppizarror/PyDetex
UTILS: module that contains all utility methods and classes used in parsers and pipelines, covering TeX, language, and low-level helpers.
- pydetex.utils.apply_tag_between_inside_char_command(s, symbols_char, tags)
Apply tag between symbols.
For example, if symbols are ($, $) and tag is [1,2,3,4]:
Input: This is a $formula$ and this is not.
Output: This is a 1$2formula3$4 and this is not
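The tagging behaviour described above can be sketched in a few lines. This is a simplified illustration, not PyDetex's implementation: the helper name `tag_between_symbols` is hypothetical, it takes a single symbol pair rather than whatever structure `symbols_char` really has, and it does not handle escaped symbols such as `\$`.

```python
from typing import Sequence, Tuple


def tag_between_symbols(s: str, symbols: Tuple[str, str], tags: Sequence[str]) -> str:
    """Wrap every `open ... close` span with four tags (hypothetical helper)."""
    open_sym, close_sym = symbols
    t1, t2, t3, t4 = tags
    out = []
    i = 0
    while i < len(s):
        start = s.find(open_sym, i)
        if start == -1:
            break
        end = s.find(close_sym, start + len(open_sym))
        if end == -1:
            break  # unmatched opening symbol: leave the rest untouched
        inner = s[start + len(open_sym):end]
        out.append(s[i:start])
        out.append(f"{t1}{open_sym}{t2}{inner}{t3}{close_sym}{t4}")
        i = end + len(close_sym)
    out.append(s[i:])
    return "".join(out)


print(tag_between_symbols("This is a $formula$ and this is not.", ("$", "$"), ("1", "2", "3", "4")))
```

The four tags land before the opening symbol, after it, before the closing symbol, and after it, matching the order shown in the documented example.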
- pydetex.utils.apply_tag_tex_commands(s, tags)
Apply tag to tex command.
For example, if tag is [1,2,3,4,5]:
Input: This is a \formula{epic} and this is not
Output: This is a 1\formula2{3epic4}5 and this is not
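A minimal sketch of the five-tag placement, assuming commands with exactly one brace argument and no nested braces. The helper `tag_commands` and its regular expression are illustrative assumptions and are not taken from PyDetex.

```python
import re
from typing import Sequence

# Assumption: a command is a backslash, letters, and one {...} group without nested braces.
_COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def tag_commands(s: str, tags: Sequence[str]) -> str:
    """Insert five tags around the parts of each \\name{arg} command (hypothetical helper)."""
    t1, t2, t3, t4, t5 = tags

    def repl(m: "re.Match[str]") -> str:
        name, arg = m.group(1), m.group(2)
        return t1 + "\\" + name + t2 + "{" + t3 + arg + t4 + "}" + t5

    return _COMMAND.sub(repl, s)


print(tag_commands(r"This is a \formula{epic} and this is not", ("1", "2", "3", "4", "5")))
```

Using a replacement function instead of a replacement string avoids having to escape backslashes in the tags.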
- pydetex.utils.apply_tag_tex_commands_no_argv(s, tags)
Apply tag to tex command with no arguments.
For example, if tag is [1,2]:
Input: This is a \formula and this is not.
Output: This is a 1\formula2 and this is not
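For commands without arguments, the same idea reduces to two tags. The sketch below assumes a command counts as argument-free when it is not directly followed by `{` or `[`; both the helper name `tag_bare_commands` and that rule are assumptions for illustration.

```python
import re
from typing import Sequence

# Assumption: "no arguments" means the name is not directly followed by '{' or '['.
# Letters are excluded in the lookahead too, so the match cannot stop mid-name.
_BARE_COMMAND = re.compile(r"\\[A-Za-z]+(?![A-Za-z{\[])")


def tag_bare_commands(s: str, tags: Sequence[str]) -> str:
    """Wrap each argument-free command with two tags (hypothetical helper)."""
    t1, t2 = tags
    return _BARE_COMMAND.sub(lambda m: t1 + m.group(0) + t2, s)


print(tag_bare_commands(r"This is a \formula and this is not.", ("1", "2")))
```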
- pydetex.utils.check_repeated_words(s, lang, min_chars, window, stopwords, stemming, ignore=None, remove_tokens=None, font_tag_format='', font_param_format='', font_normal_format='', tag='repeated')
Check repeated words.
- Parameters:
s (str) – Text
lang (str) – Language code
min_chars (int) – Minimum characters to accept
window (int) – Span of words (window) to check
stopwords (bool) – Use stopwords
stemming (bool) – Use stemming
remove_tokens (Optional[List[str]]) – Keys to remove before checking for repeats
font_tag_format (str) – Tag's format
font_param_format (str) – Param's format
font_normal_format (str) – Normal's format
tag (str) – Tag's name
- Return type:
- Returns:
Text with repeated words marked
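The interplay of `min_chars` and `window` can be illustrated with a simplified sliding-window check. This sketch leaves out stopwords, stemming, and the font formats entirely. The helper name `mark_repeated` and the `<repeated>...</repeated>` marker are assumptions, not the library's output format.

```python
import re
from typing import List


def mark_repeated(text: str, min_chars: int = 4, window: int = 10, tag: str = "repeated") -> str:
    """Mark words that already appeared among the previous `window` words (hypothetical helper)."""
    marked: List[str] = []
    recent: List[str] = []  # normalised forms of the last `window` words
    for token in text.split():
        key = re.sub(r"\W+", "", token).lower()  # ignore case and punctuation
        if len(key) >= min_chars and key in recent:
            marked.append(f"<{tag}>{token}</{tag}>")
        else:
            marked.append(token)
        recent.append(key)
        if len(recent) > window:
            recent.pop(0)
    return " ".join(marked)


print(mark_repeated("The parser reads the file and the parser writes it", min_chars=4, window=10))
```

With a smaller window (for example `window=5`) the second "parser" falls outside the span and is left unmarked, and words shorter than `min_chars` such as "the" are never marked.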
- pydetex.utils.complete_langs_dict(lang)
Completes a language dict. Assumes 'en' is the main language.
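One plausible reading of "completing" a language dict is that entries missing from a language fall back to the English ones. The sketch below assumes a `{language code: {key: text}}` layout and returns a new dict. Both the layout and the helper name `complete_langs` are assumptions, since the documentation does not describe the structure.

```python
from typing import Dict

LangDict = Dict[str, Dict[str, str]]


def complete_langs(lang: LangDict, main: str = "en") -> LangDict:
    """Fill keys missing in each language with the main language's text (hypothetical helper)."""
    base = lang[main]
    # Entries of the language itself win; missing ones come from `base`.
    return {code: {**base, **entries} for code, entries in lang.items()}


langs = {"en": {"open": "Open", "close": "Close"}, "es": {"open": "Abrir"}}
print(complete_langs(langs)["es"])
```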
- pydetex.utils.find_tex_command_char(s, symbols_char)
Find symbols command positions.
Example:
        00000000001111111111....
        01234567890123456789....
Input:  This is a $formula$ and this is not.
Output: ((10, 11, 17, 18), ...)
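The four numbers in each tuple read as: index of the opening symbol, first index of the content, last index of the content, index of the closing symbol. A small sketch that reproduces them for a single symbol pair follows. The helper `find_symbol_spans` is hypothetical, and how PyDetex treats multi-character or escaped symbols is not covered here.

```python
from typing import List, Tuple


def find_symbol_spans(s: str, symbols: Tuple[str, str]) -> Tuple[Tuple[int, int, int, int], ...]:
    """Return (open, content start, content end, close) index tuples (hypothetical helper)."""
    open_sym, close_sym = symbols
    found: List[Tuple[int, int, int, int]] = []
    i = 0
    while True:
        a = s.find(open_sym, i)
        if a == -1:
            break
        d = s.find(close_sym, a + len(open_sym))
        if d == -1:
            break  # unmatched opening symbol
        found.append((a, a + len(open_sym), d - 1, d + len(close_sym) - 1))
        i = d + len(close_sym)
    return tuple(found)


print(find_symbol_spans("This is a $formula$ and this is not.", ("$", "$")))
```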
- pydetex.utils.find_tex_commands(s, offset=0)
Find all tex commands within a code.
         00000000001111111111222
         01234567890123456789012
                 a       b c  d
Example: This is \aCommand{nice}...
Output:  ((8, 16, 18, 21), ...)
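The positions `a`, `b`, `c`, `d` mark the backslash, the last character of the command name, and the first and last characters of the argument. A regex-based sketch that yields the same tuple for simple commands is given below. The helper `find_commands` is hypothetical, nested braces are not handled, and treating `offset` as a value added to every position is an assumption.

```python
import re
from typing import Tuple

# Assumption: one {...} argument with no nested braces.
_COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def find_commands(s: str, offset: int = 0) -> Tuple[Tuple[int, int, int, int], ...]:
    """Return (a, b, c, d) positions for each simple command (hypothetical helper)."""
    return tuple(
        (
            m.start() + offset,       # a: the backslash
            m.end(1) - 1 + offset,    # b: last character of the command name
            m.start(2) + offset,      # c: first character of the argument
            m.end(2) - 1 + offset,    # d: last character of the argument
        )
        for m in _COMMAND.finditer(s)
    )


print(find_commands(r"This is \aCommand{nice}..."))
```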
- pydetex.utils.find_tex_commands_noargv(s)
Find all tex commands with no arguments within a code.
         00000000001111111111222
         01234567890123456789012
                 x       x
Example: This is \aCommand ...
Output:  ((8,16), ...)
- pydetex.utils.find_tex_environments(s)
Find all tex environments within a code.
         0000000000111111111122222222223333333333
         0123456789012345678901234567890123456789
                 a           b        c         d
Example: This is \begin{nice}[cmd]my...\end{nice}
Output:  (('nice', 8, 20, 29, 39, 'parentenv', 0, -1), ...)
This method also returns the name of the parent environment, the depth of the environment, and the depth of the item environment (if itemizable).
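Parent names and depths fall out naturally when the `\begin`/`\end` pairs are matched with a stack. The following is a reduced sketch under that assumption. The helper `find_environments` is hypothetical, it returns an empty string for a missing parent, and it omits the item-environment depth that the real function reports as the last element.

```python
import re
from typing import List, Tuple

_ENV = re.compile(r"\\(begin|end)\{([^{}]+)\}")

Env = Tuple[str, int, int, int, int, str, int]


def find_environments(s: str) -> Tuple[Env, ...]:
    """Return (name, a, b, c, d, parent, depth) per environment (hypothetical helper)."""
    stack: List[Tuple[str, int, int]] = []  # currently open environments: (name, a, b)
    found: List[Env] = []
    for m in _ENV.finditer(s):
        kind, name = m.group(1), m.group(2)
        if kind == "begin":
            stack.append((name, m.start(), m.end()))
        elif stack and stack[-1][0] == name:
            _, a, b = stack.pop()
            parent = stack[-1][0] if stack else ""
            # c: last index before \end, d: last index of \end{name}
            found.append((name, a, b, m.start() - 1, m.end() - 1, parent, len(stack)))
    found.sort(key=lambda env: env[1])  # order by opening position
    return tuple(found)


print(find_environments(r"This is \begin{nice}[cmd]my...\end{nice}"))
```

An `\end` that does not match the innermost open environment is ignored in this sketch.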
- pydetex.utils.get_diff_startend_word(original, new)
Return the difference of the word from start and end, for example:
original  XXXwordYY
new          word
diff = (XXX, YY)
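A short sketch of the prefix/suffix difference, assuming `new` occurs inside `original`. The helper name `diff_start_end` and the choice to return two empty strings when it does not occur are assumptions.

```python
from typing import Tuple


def diff_start_end(original: str, new: str) -> Tuple[str, str]:
    """Return what surrounds `new` inside `original` (hypothetical helper)."""
    i = original.find(new) if new else -1
    if i == -1:
        return "", ""  # assumption: no difference is reported when `new` is absent
    return original[:i], original[i + len(new):]


print(diff_start_end("XXXwordYY", "word"))
```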
- pydetex.utils.get_number_of_day()
Return the number of the current day within the current year.
- Return type:
- Returns:
Day number
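With the standard library, the day number of a date is available as `tm_yday`. The sketch below shows one way to obtain it and is not necessarily how PyDetex does so. The helper `day_of_year` and its optional `today` parameter are hypothetical, added so the result can be checked for a fixed date.

```python
import datetime
from typing import Optional


def day_of_year(today: Optional[datetime.date] = None) -> int:
    """Return the 1-based day number within the year (hypothetical helper)."""
    today = today or datetime.date.today()
    return today.timetuple().tm_yday


print(day_of_year(datetime.date(2024, 3, 1)))  # 31 + 29 + 1 = 61 in a leap year
```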
- pydetex.utils.get_tex_commands_args(s, pos=False)
Get all the arguments from a tex command. Each command argument has a boolean indicating whether it is optional.
Example: This is \aCommand[\label{}]{nice} and...
Output: (('aCommand', ('\label{}', True), ('nice', False)), ...)
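The output shape (command name followed by `(argument, is_optional)` pairs) can be reproduced with a small bracket-matching scanner, where `[...]` groups are optional and `{...}` groups are required. This is a sketch under those assumptions. The helpers `command_args` and `_read_group` are hypothetical, and the `pos` option is not modelled.

```python
import re
from typing import Optional, Tuple

_NAME = re.compile(r"\\([A-Za-z]+)")
_PAIRS = {"{": "}", "[": "]"}


def _read_group(s: str, i: int) -> Optional[Tuple[str, int]]:
    """Return (content, index after the closing delimiter) for the group opening at s[i]."""
    open_ch, close_ch = s[i], _PAIRS[s[i]]
    depth = 0
    for j in range(i, len(s)):
        if s[j] == open_ch:
            depth += 1
        elif s[j] == close_ch:
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    return None  # unbalanced group


def command_args(s: str) -> tuple:
    """Return (name, (arg, is_optional), ...) for each command with arguments."""
    found = []
    i = 0
    while True:
        m = _NAME.search(s, i)
        if m is None:
            break
        i = m.end()
        args = []
        while i < len(s) and s[i] in _PAIRS:
            group = _read_group(s, i)
            if group is None:
                break
            content, nxt = group
            args.append((content, s[i] == "["))  # '[' marks an optional argument
            i = nxt
        if args:
            found.append((m.group(1), *args))
    return tuple(found)


print(command_args(r"This is \aCommand[\label{}]{nice} and..."))
```

Commands nested inside an argument, such as `\label{}` here, are kept as part of the argument text instead of being reported separately.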
- pydetex.utils.get_word_from_cursor(s, pos)
Return the word of a string at a given cursor position.
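One way to picture this is to expand left and right from the cursor until whitespace is reached. In this sketch the helper name `word_at`, the whitespace-only word boundary, and the empty result when the cursor sits on whitespace or out of range are all assumptions.

```python
def word_at(s: str, pos: int) -> str:
    """Return the whitespace-delimited word under index `pos` (hypothetical helper)."""
    if not 0 <= pos < len(s) or s[pos].isspace():
        return ""  # assumption: no word under the cursor
    start = pos
    while start > 0 and not s[start - 1].isspace():
        start -= 1
    end = pos
    while end < len(s) and not s[end].isspace():
        end += 1
    return s[start:end]


print(word_at("This is a test", 6))
```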
- pydetex.utils.split_tags(s, tags)
Split a string based on tags; each resulting line is then tagged.
String format: [TAG1]new line[TAG2]this is[TAG1]very epic
Output: [('TAG1', 'new line'), ('TAG2', 'this is'), ('TAG1', 'very epic'), …]
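The split can be sketched by locating every `[TAG]` marker and pairing it with the text that follows, up to the next marker. The helper `split_by_tags` is hypothetical. It assumes tags appear in square brackets as in the string format above, and it drops any text that precedes the first tag.

```python
import re
from typing import List, Sequence, Tuple


def split_by_tags(s: str, tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (tag, text) pairs for a string such as '[TAG1]text[TAG2]more' (hypothetical helper)."""
    if not tags:
        return []
    pattern = re.compile("|".join(re.escape(f"[{t}]") for t in tags))
    matches = list(pattern.finditer(s))
    out: List[Tuple[str, str]] = []
    for k, m in enumerate(matches):
        end = matches[k + 1].start() if k + 1 < len(matches) else len(s)
        out.append((m.group(0)[1:-1], s[m.end():end]))  # strip the brackets from the tag
    return out


print(split_by_tags("[TAG1]new line[TAG2]this is[TAG1]very epic", ["TAG1", "TAG2"]))
```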