Scrapy Spiders, Python Processing & Web APIs

Over the past couple of weeks, I’ve spent some time drafting a Web App for Touch Footy results. The App is built on .NET Core, and this gave me a great opportunity to review the new Visual Studio .NET Core tooling. But once I had my bare bones App, I needed some data to play with. Enter Scrapy and Python…

What I wanted was a data set that I could use in the Touch Footy App. I had a good data source and I figured my best bet was a web scraper. Scrapy made it easy for me to scrape together my test data set. It’s built using Python (hence you need some understanding of Python to use it). Python is an interpreted language. It’s great for list processing and it’s easy to read/write.

Scrapy is an open source framework for writing web crawlers, or spiders. It gives you control over how and when you execute the spiders you’ve written. And a great shell as a part of the framework to test/debug commands. After looking at other web scraping options, I decided on Scrapy as a neat way to get my data.

After a few hours coding, I had a crawler that collected the data I wanted, i.e. groups, teams, fixtures and results for my Web App. I wanted to store the data in JSON – for easy processing – and Scrapy made that easy too. It was simple then to write some Python code to process the JSON for groups, teams, fixtures and results.

All good so far, and fun to boot. My next step – how to get the data to the Touch Footy Web App? Well, Visual Studio 2015 tooling for .NET Core makes it easy to add a Web API to an MVC Web Application…. several hours later I had a working spider populating data into my App.

Published by

Unknown's avatar

mytechstuffsite

Chris Cormack is a seasoned professional with 30 years of IT experience, specializing in guiding leadership teams through cloud transformation and AI initiatives. As a trusted advisor to C-suite executives, Chris excels in translating complex strategies—such as cloud implementation and hybrid solutions—into measurable business outcomes. He identifies critical challenges and transforms them into opportunities for innovation and growth. With Chris's expertise, organizations can effectively harness the full potential of cloud and AI technologies, driving innovation and maintaining a competitive edge in an increasingly digital marketplace.

Leave a comment