Autoparse Tutorial
Autoparse provides a user-friendly interface for building regular expressions and using them to parse information out of strings. Suppose you have the following string:
>>> mystring = ' ___A_a_ * & b ___c_ C 1d 2 __e_ D__'
We can use autoparse to split the string into its whitespace-separated words:
>>> import autoparse
>>> autoparse.find.split_words(mystring)
('___A_a_', '*', '&', 'b', '___c_', 'C', '1d', '2', '__e_', 'D__')
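For readers who know the standard library, the result above is what a plain `re` search for runs of non-whitespace characters would give. This is only a rough equivalent for comparison, not autoparse's actual implementation:

```python
import re

mystring = ' ___A_a_ * & b ___c_ C 1d 2 __e_ D__'

# r'\S+' matches each maximal run of non-whitespace characters,
# so leading and repeated spaces are skipped automatically.
words = tuple(re.findall(r'\S+', mystring))
print(words)
```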
Or we can capture specific patterns, such as every uppercase letter or the first digit:
>>> pattern = autoparse.pattern.capturing(autoparse.pattern.UPPERCASE_LETTER)
>>> autoparse.find.all_captures(pattern, mystring)
('A', 'C', 'D')
>>> pattern = autoparse.pattern.capturing(autoparse.pattern.DIGIT)
>>> autoparse.find.first_capture(pattern, mystring)
'1'
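To make the idea of a capturing pattern concrete, here is a minimal sketch using only Python's `re` module. The constants `UPPERCASE_LETTER` and `DIGIT` and the helper `capturing` are stand-ins written for this illustration; the regex strings behind the real autoparse names are assumed, not taken from its source:

```python
import re

mystring = ' ___A_a_ * & b ___c_ C 1d 2 __e_ D__'

# Assumed stand-ins for the autoparse pattern constants.
UPPERCASE_LETTER = r'[A-Z]'
DIGIT = r'[0-9]'

def capturing(pattern):
    """Wrap a pattern in parentheses so the matched text is captured."""
    return '(' + pattern + ')'

# With a single capture group, re.findall returns the captured text
# of every non-overlapping match.
all_upper = tuple(re.findall(capturing(UPPERCASE_LETTER), mystring))

# re.search stops at the first match; group(1) is the captured text.
match = re.search(capturing(DIGIT), mystring)
first_digit = match.group(1) if match else None

print(all_upper, first_digit)
```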
We can also use autoparse to check that a string is formatted as expected:
>>> message = 'Greetings, user'
>>> autoparse.find.starts_with('Greet', message)
True
>>> autoparse.find.ends_with('goodbye', message)
False
>>> mynumber = ' 400 '
>>> autoparse.find.is_number(mynumber)
True
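The checks above can be approximated with the standard library. The three functions below are hypothetical local helpers that mimic the behavior shown in the examples; how autoparse implements them internally is not covered here, and its handling of edge cases may differ:

```python
import re

def starts_with(pattern, string):
    """True if the pattern matches at the very start of the string."""
    # re.match only succeeds when the match begins at position 0.
    return re.match(pattern, string) is not None

def ends_with(pattern, string):
    """True if the pattern matches at the very end of the string."""
    # \Z anchors the match to the end of the string; the non-capturing
    # group keeps alternations inside the pattern from escaping the anchor.
    return re.search('(?:' + pattern + r')\Z', string) is not None

def is_number(string):
    """True if the string, ignoring surrounding whitespace, parses as a float."""
    # Assumption: "number" means anything float() accepts, which also
    # includes strings such as 'inf' and 'nan'.
    try:
        float(string.strip())
    except ValueError:
        return False
    return True

message = 'Greetings, user'
print(starts_with('Greet', message))
print(ends_with('goodbye', message))
print(is_number(' 400 '))
```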
Now that you’ve got the basics, let’s see how autoparse can help us parse real data we’ll encounter in AutoMech. First, we define a search pattern that will parse the Cartesian coordinates out of a file:
>>> atom_symbols = (
... autoparse.pattern.LETTER +
... autoparse.pattern.maybe(autoparse.pattern.LETTER)
... )
>>> number = autoparse.pattern.FLOAT
>>> xyz_lines = autoparse.pattern.LINESPACES.join([
... autoparse.pattern.capturing(atom_symbols),
... autoparse.pattern.capturing(number),
... autoparse.pattern.capturing(number),
... autoparse.pattern.capturing(number),
... ])
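It can help to see what kind of raw regular expression a composition like this produces. The sketch below rebuilds a similar pattern by hand; the strings assigned to `LETTER`, `FLOAT`, and `LINESPACES`, and the helpers `maybe` and `capturing`, are assumptions made for illustration and may not match what autoparse defines:

```python
import re

# Assumed stand-ins for the autoparse pattern constants.
LETTER = r'[A-Za-z]'
FLOAT = r'[+-]?\d+\.\d*'
LINESPACES = r'[ \t]+'   # spaces or tabs, but never a newline

def maybe(pattern):
    """Make a pattern optional without creating a capture group."""
    return '(?:' + pattern + ')?'

def capturing(pattern):
    """Wrap a pattern in a capture group."""
    return '(' + pattern + ')'

# One or two letters, e.g. 'C' or 'Cl'.
atom_symbol = LETTER + maybe(LETTER)

# str.join places the separator pattern between the four captured fields.
xyz_line = LINESPACES.join(
    [capturing(atom_symbol)] + [capturing(FLOAT)] * 3
)
print(xyz_line)

match = re.search(xyz_line, 'Cl -1.602333 0.736678 -0.026051')
groups = match.groups()
print(groups)
```

Because the separator excludes newlines, a match can never span two lines, which is what lets the same pattern pick coordinate lines out of a larger file.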
Now we can apply the pattern to an example xyz string:
>>> xyz_string = """
... dummy line 1
... another dummy line
...
... that dummy line was blank
... 6
... charge: 0, mult: 1
... F 1.584823 -0.748487 -0.427122
... C 0.619220 0.190166 -0.271639
... C -0.635731 -0.183914 -0.180364
... Cl -1.602333 0.736678 -0.026051
... H 0.916321 1.229946 -0.227127
... H -0.882300 -1.224388 -0.229636
... let's end on a dummy line
... """
>>> autoparse.cast(autoparse.find.all_captures(xyz_lines, xyz_string))
(('F', 1.584823, -0.748487, -0.427122), ('C', 0.61922, 0.190166, -0.271639), ('C', -0.635731, -0.183914, -0.180364), ('Cl', -1.602333, 0.736678, -0.026051), ('H', 0.916321, 1.229946, -0.227127), ('H', -0.8823, -1.224388, -0.229636))
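Regex captures are always strings, so the casting step is what turns the coordinates into floats. A minimal sketch of that idea follows; the `cast` function here is a hypothetical local helper, and the real `autoparse.cast` may follow different rules:

```python
def cast(value):
    """Recursively convert numeric-looking strings to int or float.

    Tuples and lists are walked element by element; anything that
    cannot be parsed as a number is returned unchanged.
    """
    if isinstance(value, (tuple, list)):
        return tuple(cast(item) for item in value)
    # Try the stricter conversion first so '6' becomes 6, not 6.0.
    for convert in (int, float):
        try:
            return convert(value)
        except (ValueError, TypeError):
            continue
    return value

# Input shaped like the captures from a regex search: all strings.
captures = (
    ('F', '1.584823', '-0.748487', '-0.427122'),
    ('Cl', '-1.602333', '0.736678', '-0.026051'),
)
rows = cast(captures)
print(rows)
```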
Note
Move on to the next tutorial autoread-tutorial-doc to …
Or return to the Tutorial Hub to check out more tutorials.