Data Scientist at CSE

About this employment

Developed resource-constrained data pipelines and large-scale scrapers of college sports data in Golang, and optimized MySQL databases.

Started: Jun 1, 2024

Position: Data Scientist
Company: CSE
Location: Colorado Springs, CO
Employment Period: June 2024 - August 2024
Industry: Data Science / Technology

Overview

As a Data Scientist at CSE, I was responsible for developing and maintaining data infrastructure focused on college sports analytics. This role required building scalable, resource-efficient solutions that could handle large volumes of data while working within computational constraints.

Key Responsibilities

  • Developed resource-constrained data pipelines and large-scale scrapers of college sports data using Golang

    • Built efficient data collection systems that could process millions of records
    • Implemented rate limiting and retry mechanisms to ensure reliable data acquisition
    • Optimized memory usage for processing large datasets on limited hardware
  • Optimized and managed MySQL databases

    • Designed and implemented database schemas for efficient data storage
    • Created indexes and optimized queries for faster data retrieval
    • Implemented database backup and recovery procedures
  • Fixed and improved Python scripts under time pressure

    • Debugged and refactored legacy data processing scripts
    • Improved script performance and reliability
    • Added error handling and logging for better monitoring
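
To illustrate the rate limiting and retry bullet above, here is a minimal Go sketch of the general pattern. It is not the code used at CSE: the names `retry`, `fetchAll`, and `errTemporary`, the three-attempt limit, the delay values, and the example URLs are all assumptions chosen for illustration, and the fetcher is a stand-in that makes no network calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a hypothetical sentinel error used by the stand-in fetcher.
var errTemporary = errors.New("temporary failure")

// retry calls fn up to attempts times and doubles the delay after each
// failure (exponential backoff). It returns nil on the first success.
func retry(attempts int, baseDelay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := baseDelay
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Sleep only if another attempt will follow.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// fetchAll visits the URLs in order. Each new URL waits for a ticker tick,
// so at most one URL is started per interval; retries for the same URL are
// spaced by the backoff in retry rather than by the ticker.
// It returns the URLs that still failed, keyed to their final error.
func fetchAll(urls []string, interval time.Duration, fetch func(url string) error) map[string]error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	failed := make(map[string]error)
	for _, u := range urls {
		<-ticker.C
		err := retry(3, 10*time.Millisecond, func() error { return fetch(u) })
		if err != nil {
			failed[u] = err
		}
	}
	return failed
}

func main() {
	calls := 0
	// flaky is a stand-in fetcher that fails on its first call only.
	flaky := func(url string) error {
		calls++
		if calls == 1 {
			return errTemporary
		}
		return nil
	}

	urls := []string{"https://example.com/a", "https://example.com/b"}
	failed := fetchAll(urls, 20*time.Millisecond, flaky)
	fmt.Println("calls:", calls, "failed:", len(failed))
}
```

A ticker keeps the sketch dependency-free. A real scraper would more likely need a token-bucket limiter, per-host limits, and context cancellation, none of which are shown here.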

Technical Skills Acquired

  • Golang: Advanced proficiency in building concurrent, efficient applications
  • MySQL: Database design, optimization, and administration
  • Python: Data processing, scripting, and debugging
  • SQL: Complex query writing and optimization
  • Data Pipeline Development: Building scalable ETL processes
  • Web Scraping: Developing robust scrapers with proper error handling

Impact

I built and deployed data pipelines that collected and processed college sports data for analytics. The systems I developed continued to run reliably after my internship ended, supplying data to the company's analytics products.