Data Scientist @ CSE
Developed resource-constrained data pipelines and large-scale scrapers for college sports data in Golang, and optimized the MySQL databases behind them.
- Position: Data Scientist
- Company: CSE
- Location: Colorado Springs, CO
- Employment Period: June 2024 - August 2024
- Industry: Data Science / Technology
Overview
As a Data Scientist at CSE, I was responsible for developing and maintaining data infrastructure focused on college sports analytics. This role required building scalable, resource-efficient solutions that could handle large volumes of data while working within computational constraints.
Key Responsibilities
- Developed resource-constrained data pipelines and large-scale scrapers of college sports data using Golang
  - Built efficient data collection systems that could process millions of records
  - Implemented rate limiting and retry mechanisms to ensure reliable data acquisition
  - Optimized memory usage for processing large datasets on limited hardware
- Optimized and managed MySQL databases
  - Designed and implemented database schemas for efficient data storage
  - Created indexes and optimized queries for faster data retrieval
  - Implemented database backup and recovery procedures
- Fixed and improved Python scripts under time pressure
  - Debugged and refactored legacy data processing scripts
  - Improved script performance and reliability
  - Added error handling and logging for better monitoring
Technical Skills Acquired
- Golang: Advanced proficiency in building concurrent, efficient applications
- MySQL: Database design, optimization, and administration
- Python: Data processing, scripting, and debugging
- SQL: Complex query writing and optimization
- Data Pipeline Development: Building scalable ETL processes
- Web Scraping: Developing robust scrapers with proper error handling
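
As a small example of the concurrent, memory-conscious style of Go listed above, the sketch below streams work through a fixed-size worker pool over bounded channels, so only a limited number of items are in flight at once. It is illustrative only and not code from the CSE pipelines. The name `processAll` and the integer workload are hypothetical. In a real pipeline the producer would read records from a file or from HTTP responses rather than from an in-memory slice.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll sends items through a fixed number of workers and sums the results.
// The channels are buffered to the worker count, which bounds in-flight items.
func processAll(items []int, workers int, process func(int) int) int {
	in := make(chan int, workers)
	out := make(chan int, workers)

	// Start the workers; each one drains the input channel until it is closed.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range in {
				out <- process(item)
			}
		}()
	}

	// Producer: feeds items one at a time and blocks when the buffer is full.
	go func() {
		for _, it := range items {
			in <- it
		}
		close(in)
	}()

	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	total := 0
	for r := range out {
		total += r
	}
	return total
}

func main() {
	sum := processAll([]int{1, 2, 3, 4}, 2, func(n int) int { return n * n })
	fmt.Println(sum) // prints 30 (1 + 4 + 9 + 16)
}
```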
Impact
I built and deployed data pipelines that collected and processed college sports data for analytics. The systems I developed continued to run reliably after my internship ended, supplying data for the company’s analytics products.