Commit Graph

1 Commits

Author SHA1 Message Date
sherif-med
31525dfef7 Text file loaders (#3031)
* adding requiered packages for loading pdf, docx, md, tex files (preferably pure python packages)

* adding text file utils providing function to load file based on extension && adding read_text_file command

* adding test cases for text file loading (pdf file creation is hardcoded due to external package requierment for creation (a sample file can be added))

* formatting

* changing command name from 'read_text_file' to 'parse_text_document'

* fallback to txtParser if file extension is not known to read script and code files

* adding extension respective parsers

* adding binary file check function

* adding file existance check && raising valueError for unsupported binary file formats

* adding check file type (binary) in test_parsers for specific extensions && fixing mock pdf generation to include null bytes

* adding .yml extension parser

* removal of .doc parser

* updating file loading commands names

* updating test (removing .doc mock function)

* fix: import sort

* new cassette for mem A

* feat: update Cassettes

* feat: consolidate commands

* feat: linting

* feat: updates to cassettes

---------

Co-authored-by: Reinier van der Leer <github@pwuts.nl>
Co-authored-by: Nicholas Tindle <nick@ntindle.com>
Co-authored-by: k-boikov <64261260+k-boikov@users.noreply.github.com>
2023-05-21 14:48:40 -05:00