🚀 IngestRSS - 🗞️💵⚖️
IngestRSS is an AWS-based RSS feed processing system that automatically fetches, processes, and stores articles from specified RSS feeds. This project is designed to support social scientists in advancing research on news and media.
🎯 Purpose
The primary goal of IngestRSS is to provide researchers with a robust, scalable solution for collecting and analyzing large volumes of news data. By automating the process of gathering articles from diverse sources, this tool enables social scientists to focus on their research questions and data analysis, rather than the complexities of data collection.
🚀 Getting Started
Prerequisites
- Python 3.12
- AWS account with necessary permissions
- AWS CLI configured with your credentials
Setup
- Clone the repository:
    git clone https://github.com/yourusername/IngestRSS.git
    cd IngestRSS
- Install the required packages:
    python -m pip install -r requirements.txt
- Set up your environment variables:
    - Find the file named template.env in your folder.
    - Make a copy of this file in the same folder.
    - Rename the copy to .env (make sure to include the dot at the start).
    - Open the .env file and fill in your information wherever you see ***.
  Here's what you need to fill in:
    AWS_REGION=***
    AWS_ACCOUNT_ID=***
    AWS_ACCESS_KEY_ID=***
    AWS_SECRET_ACCESS_KEY=***
  The other settings in the file are already set up for you, but you can change them if you need to.
- Launch the application:
    python launch.py
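Before running launch.py, it can help to confirm that the four AWS values are actually filled in. The sketch below is a hypothetical helper, not part of the repository: it assumes .env uses plain KEY=VALUE lines (with optional # comments) and treats a value that is empty or still *** as missing.

```python
from pathlib import Path

# Hypothetical pre-launch check; the four keys are the ones this README asks you to fill in.
REQUIRED_VARS = ("AWS_REGION", "AWS_ACCOUNT_ID", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
PLACEHOLDER = "***"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes, if present.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still set to the placeholder."""
    return [key for key in REQUIRED_VARS if values.get(key, "") in ("", PLACEHOLDER)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read an env file and return the required keys that still need values."""
    return missing_vars(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the repository root returns an empty list once all four values are set; otherwise it names the keys that still need attention.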
🛠️ Configuration
- RSS feeds can be modified in the rss_feeds.json file.
- CloudFormation templates are located in src/infra/cloudformation/.
- Lambda function code is in src/lambda_function/src/.
📊 Monitoring
The Lambda function's logs are still sent to CloudWatch Logs, while metrics are exposed through Prometheus. When the processor runs, it starts a small HTTP server that serves metrics at /metrics (port 8000 by default), which a Prometheus server can scrape for monitoring.
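To check the endpoint by hand, the sketch below downloads /metrics from the default port and turns the Prometheus text format into a dictionary. The helper names and the localhost URL are assumptions; the metric names themselves are not documented here, so the code does not rely on any particular one.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:8000/metrics", timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition served by the processor (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample (name plus any labels) to its value, skipping # HELP / # TYPE lines."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split just after the closing brace.
            key, _, rest = line.rpartition("}")
            key += "}"
        else:
            key, rest = line.split(None, 1)
        # An optional timestamp may follow the value; keep only the value.
        samples[key] = float(rest.split()[0])
    return samples
```

While the processor is running and reachable, parse_metrics(fetch_metrics()) lists every sample it exposes; for continuous monitoring, add the same host and port as a scrape target in your Prometheus configuration instead.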
🤝 Contributing
Contributions are welcome; see the open issues to get started.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
