mirror of
https://github.com/aljazceru/IngestRSS.git
synced 2025-12-18 06:24:21 +01:00
# After Public Launch
* Monthly Kaggle Dataset Publishing.

* Vector Database Initialization at earlier phase. [ Done ]
* Test out Vector Databases at Small Scale.
* [x] Testing
* [x] Fix OpenAI Error.
* [x] Fix Pinecone Error
* [x] Fix input error.
* [ ] Let it run for a day
* [x] Check OpenAI Bill
* [x] Check Vector Database Bill
* [x] Figure out Vector Database Bug.
* [x] Add Logging to Pinecone
* [x] Run a simple test to see if any logs pop up.
* [x] Check out the logs and figure out what to debug
* [x] Figure out the best way to store articles: in metadata or in S3.
* [x] Turn off the EU
* [ ] Ensure the US data storage for both is working.
* [ ] Decrease the cost of CloudWatch Logs
* [ ] Test out Vector Databases at Scale.
* [ ] Add text cleaning after ingesting an article but before storage.
* [ ] Automate the monthly data ingestion job
* [ ] Lambda Optimization

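
The text-cleaning item above (clean after ingestion, before storage) could look roughly like the sketch below. It is only an illustration: the function name `clean_article_text` and the specific steps (strip HTML tags, decode entities, normalize Unicode, collapse whitespace) are assumptions, not the project's actual pipeline.

```python
import html
import re
import unicodedata

# Assumption: articles arrive as strings that may still contain HTML markup.
_TAG_RE = re.compile(r"<[^>]+>")
_WS_RE = re.compile(r"\s+")


def clean_article_text(raw: str) -> str:
    """Hypothetical cleaner run between ingestion and storage."""
    text = _TAG_RE.sub(" ", raw)                # replace HTML tags with a space
    text = html.unescape(text)                  # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters
    return _WS_RE.sub(" ", text).strip()        # collapse runs of whitespace


cleaned = clean_article_text("<p>Hello&nbsp;&amp; welcome</p>\n\n<b>World</b>")
print(cleaned)
```

Running the cleaner before writing to the vector database or S3 keeps the stored text and the embedded text identical, which makes later debugging simpler.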
* Monthly ingestion job
* Protocol for annotating data.
* [ ] Development
* [ ] Check out Raj's script
* [ ] DSPy Integration
* [ ] LLMRouter integration
* [ ] Annotation Categories
* [ ] Main Topic/Category (list)
* [ ] Writing Style (e.g. Informal, Professional, etc.)
* [ ] Promotional Material (0=Not Promotional, 1=Promotional)
* [ ] News (0=Not News, 1=News)
* [ ] News Topic Lists, i.e. news that is just a list of news topics (0=Not a News Topic List, 1=News Topic List)
* [ ] Annotating Entities (list of key entities with entity-specific sentiment)
* [ ] List of Major Events (e.g. Ukraine War, Israel-Palestine, etc.)
* [ ] List of Minor Events (e.g. Specific Battle, Court Case Step, etc.)
* [ ] Novelty Factor (scale from 0 (Not Interesting) to 100 (Interesting))
* [ ] Annotating Podcast Scripts or Video Scripts (0=Is not a script, 1=Is a script)
* [ ] Political Quadrant (or an eight-dimensional alternative)

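
The annotation categories listed above can be sketched as a single record type. This is a hedged illustration only: the class name `ArticleAnnotation`, the field names, and the choice of a float for entity sentiment are assumptions, and the political-quadrant field is left out because the list does not define it yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArticleAnnotation:
    """Hypothetical record mirroring the annotation categories in the list."""
    main_topics: List[str] = field(default_factory=list)
    writing_style: str = ""          # e.g. "informal", "professional"
    is_promotional: int = 0          # 0=Not Promotional, 1=Promotional
    is_news: int = 0                 # 0=Not News, 1=News
    is_news_topic_list: int = 0      # 1=article is a list of news topics
    # Assumption: sentiment per entity stored as a float; the scale is undecided.
    entity_sentiment: Dict[str, float] = field(default_factory=dict)
    major_events: List[str] = field(default_factory=list)
    minor_events: List[str] = field(default_factory=list)
    novelty: int = 0                 # 0 (Not Interesting) to 100 (Interesting)
    is_script: int = 0               # 1=podcast or video script

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if the record is valid)."""
        errors = []
        for name in ("is_promotional", "is_news", "is_news_topic_list", "is_script"):
            if getattr(self, name) not in (0, 1):
                errors.append(f"{name} must be 0 or 1")
        if not 0 <= self.novelty <= 100:
            errors.append("novelty must be between 0 and 100")
        return errors


example = ArticleAnnotation(main_topics=["politics"], is_news=1, novelty=140)
print(example.validate())
```

Validating each record before it is stored catches out-of-range labels early, whichever model or annotator produced them.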
* Estimation Algorithm for annotation cost.
* Open Source Protocol for running this.
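
For the annotation-cost estimation item above, one simple starting point is a token-based estimate. The sketch below assumes LLM-based annotation billed per 1,000 tokens; the function name, parameters, and the prices in the example call are placeholders, not real provider rates or the project's chosen method.

```python
def estimate_annotation_cost(
    num_articles: int,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_1k: float,
    output_price_per_1k: float,
    calls_per_article: int = 1,
) -> float:
    """Hypothetical estimator: cost = articles * calls * per-call token cost."""
    per_call = (
        (avg_input_tokens / 1000) * input_price_per_1k
        + (avg_output_tokens / 1000) * output_price_per_1k
    )
    return num_articles * calls_per_article * per_call


# Placeholder numbers only; substitute measured token counts and current prices.
total = estimate_annotation_cost(
    num_articles=10_000,
    avg_input_tokens=1_000,
    avg_output_tokens=200,
    input_price_per_1k=0.5,
    output_price_per_1k=1.5,
)
print(f"Estimated cost: {total:.2f}")
```

Measuring average token counts on a small sample of articles first would make the inputs to this estimate far more reliable than guessing them.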