CitySpire: Our Project

A Team of Data Scientists, Web Developers, & iOS Developers.

Drewrust
5 min read · Apr 23, 2021

For my final project with Lambda School's Data Science program, we worked on developing CitySpire, an app to help people relocate. The larger team was made up of three smaller teams: web developers, iOS developers, and, last but not least, our crew of four Data Science developers. We broke the data work into individual tasks to better tackle them. My task was to access Indeed's job API and add jobs to the database.

Having worked with Twitter's and Spotify's Python developer APIs, I went in with optimism as I started trying to figure out Indeed's API. That optimism dimmed after reading the documentation and finding few other developers who had blogged about the process. Once I learned that I would need a job publisher ID, that applying for one would take a while, and that (I believe) you need to be an employer with jobs to publish, which of course I wasn't, plans changed: the best option became a web scraper.

Here there were plenty of blog posts and "How To's" on the process. I had worked with Python's brilliant and fun Beautiful Soup web scraping library before, tackling an NLP project, so I was excited to begin. We were going to use the FastAPI web framework, which I had used before as well.

The Scraper Code and Implementation

[Code screenshot: the web scraper function, using "bs", the Beautiful Soup library.]
[Code screenshot: the parameters of the FastAPI queries.]
[Code screenshot: returning up to 15 jobs with Beautiful Soup.]
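Since the screenshots don't reproduce well here, below is a minimal sketch of that setup: a plain scraping function plus the FastAPI endpoint that exposes it. The Indeed search URL format and the CSS class names ("jobsearch-SerpJobCard", "title", "company") are assumptions based on how the site was laid out around 2021, and the endpoint path and parameter names are hypothetical, not our exact code.

```python
# A minimal sketch of the scraper and its FastAPI endpoint; the Indeed URL
# format and CSS class names are assumptions based on the site circa 2021.
import requests
from bs4 import BeautifulSoup as bs
from fastapi import APIRouter

router = APIRouter()

def scrape_jobs(job_title: str, city: str, statecode: str, limit: int = 15):
    """Scrape up to `limit` postings from Indeed's search results page."""
    url = f"https://www.indeed.com/jobs?q={job_title}&l={city}%2C+{statecode}"
    soup = bs(requests.get(url).text, "html.parser")

    jobs = []
    # "jobsearch-SerpJobCard" was the result-card class at the time (an assumption).
    for card in soup.find_all("div", class_="jobsearch-SerpJobCard")[:limit]:
        title = card.find("h2", class_="title")
        company = card.find("span", class_="company")
        jobs.append({
            "title": title.get_text(strip=True) if title else None,
            "company": company.get_text(strip=True) if company else None,
            "city": city,
            "state": statecode,
        })
    return jobs

@router.get("/jobs")
async def get_jobs(job_title: str, city: str, statecode: str):
    """Query parameters: job role, city, and two-letter state abbreviation."""
    return {"jobs": scrape_jobs(job_title, city, statecode)}
```

The state abbreviation parameter is what lets the scraper tell apart cities that share a name across states, which comes up again in the roadblocks below.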

Re-evaluating the live scraper in favor of a database

After getting the live scraper up and running on our implementation of FastAPI, I discussed it with our project manager, and he encouraged me to compile jobs for each city we would be using. So, with time winding down, I compiled a CSV of random jobs for each city using the scraper. The code took about 10 minutes to compile roughly 2,500 jobs across all the cities. As with any project, though, I hit some roadblocks and challenges.

[Notebook screenshot: Jupyter notebook compiling about 30 random jobs for each city.]
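The notebook loop looked roughly like the sketch below, reusing the hypothetical scrape_jobs() function from the earlier sketch; the cities list, search term, and CSV filename are all assumptions for illustration.

```python
# A rough sketch of the compilation loop (assumes scrape_jobs() from the
# earlier sketch; the cities list and filename are hypothetical).
import pandas as pd

# One (city, state) pair per CitySpire city; the real project had 397.
cities = [("Albany", "NY"), ("Austin", "TX")]

rows = []
for city, state in cities:
    # A generic search term stands in for "random" jobs in this sketch.
    rows.extend(scrape_jobs("jobs", city, state, limit=30))

pd.DataFrame(rows).to_csv("jobs.csv", index=False)
```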

Roadblocks

After creating the CSV, one of the other data team members uploaded it to the PostgreSQL database, and I wrote the code to access it. The upload was well intentioned but rushed, and for an unknown reason the code didn't acquire jobs for all the cities. Second, some cities share names across different states, so I'd have to address that in the database (the scraper above already handles it with a field for a state abbreviation). Third, when I tried to query the database, a teammate reminded me that I needed to import the function. At that point in the project it was a bit late to rework the code to get jobs for all the cities.
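For context, the query code looked roughly like the sketch below. The jobs table schema, the column names, and the DATABASE_URL environment variable are assumptions, not our exact implementation.

```python
# A minimal sketch of querying the uploaded jobs table with SQLAlchemy.
# The `jobs` table schema and DATABASE_URL variable are assumptions.
import os
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

def fetch_jobs(city: str, statecode: str, limit: int = 15):
    """Return up to `limit` stored postings for a city/state pair."""
    query = text(
        "SELECT title, company FROM jobs "
        "WHERE city = :city AND state = :state "
        "LIMIT :limit"
    )
    with engine.connect() as conn:
        result = conn.execute(
            query, {"city": city, "state": statecode, "limit": limit}
        )
        return [dict(row) for row in result.mappings()]
```

A function like this is also what needed to be imported into the FastAPI router before the endpoint could call it, which was the third roadblock above.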

In conclusion, this was a challenging part of the project, and while it could've gone better, I did get a minimum viable product running, at least locally, with the scraper. My team members were and are amazing, and I learned a lot in the process. I'm grateful for their hard work and the knowledge they shared.

Current State of our App

The current state of the product is, I feel, about as good as we could do on the first try. Knowing what I know now, I could've done much better gathering jobs data. It was a learning experience, though, and I enjoyed working on a team.

[GIF: the data API in action, getting jobs from Indeed's site (courtesy of Giphy).]

The full list of shipped features would be as follows:

  • Scraped jobs from Indeed's site by job role, city, and state.
  • Weather conditions (sunny, cloudy, rainy, and snowy days) for the target city.
  • Monthly weather forecast for the target city.
  • Daily weather forecast for the target city.
  • School district information: number of schools, students, and teachers, and the pupil/teacher ratio in that school district.
  • Housing price averages for each city, for homes and condos, broken down by number of bedrooms.

Future of the Product and Reflections

Future features would include a much more solid collection of data, for sure. My portion of the project would store about 15 random jobs for each city; with 397 cities times 15 jobs, that would put 5,955 jobs in the database. I would also have to write test code to make sure each city had at least 15 jobs in the database. Still, the one issue with that method is that the jobs wouldn't be up to date. As a student, though, I understand that for the sake of learning it's important to know how to get data into a database and then query it. In the end, I would hope to acquire a publisher ID and have up-to-date access to Indeed's API for current job openings.
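That coverage test could be as simple as the sketch below, which reuses the hypothetical jobs table and DATABASE_URL assumption from the earlier query sketch.

```python
# A minimal sketch of the per-city coverage check; table and column
# names are the same assumptions as in the query sketch above.
import os
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

def cities_missing_jobs(min_jobs: int = 15):
    """Return (city, state, count) rows with fewer than `min_jobs` postings.
    Note: cities with zero rows would need a LEFT JOIN against a cities
    table to show up here at all."""
    query = text(
        "SELECT city, state, COUNT(*) AS n FROM jobs "
        "GROUP BY city, state HAVING COUNT(*) < :min_jobs"
    )
    with engine.connect() as conn:
        return conn.execute(query, {"min_jobs": min_jobs}).all()

assert not cities_missing_jobs(), "some cities have fewer than 15 jobs"
```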

I would also try to get the number of jobs currently available for a particular job title in the target city: essentially, how strong the market for that job title is in the entered city. This could also tie into better updating a livability score for each individual user's needs.

Some of the feedback I got from my teammates was to just keep "clacking away." This was a joke made by one of our web developers, who doubled as our motivational speaker and all-around optimist. It meant to just keep working; even if it seems like you're getting nowhere, at some point it pays off. This is probably good advice all around, not just for becoming a better developer but for most things in life you want to get good at: "just keep clacking away." Great advice, Kyla, if you're reading this, and thanks again to all the team members and project leads!


Drewrust

Aspiring Data Science guy and student studying Data Science through Lambda School. Intermediate Web Developer with an interest in Cyber Security.