Professional projects
These are examples of the kinds of projects I've worked on professionally.
Web scraping
I've built a wide range of web scraping systems. Some run on demand, while others run over an entire weekend collecting large volumes of data. Challenges I overcame along the way include:
- A website whose HTML contained a single image tag and almost nothing else. The site appeared to generate the page, take a screenshot of it, and embed that screenshot in the HTML, so the only thing to scrape was one image. I solved this by breaking the image into small chunks and running optical character recognition (OCR) software (Tesseract) on each chunk in turn until I found certain keywords. This let me collect the required data and convert it into a pandas DataFrame.
- Websites that require an account and allow only one login at a time were slow to scrape, because time was spent starting chromedriver, connecting to the site, logging in, and navigating to the right area before any data collection could begin. I overcame this by keeping a pool of accounts permanently logged in as microservices, each waiting to collect data, with every idle microservice reporting its availability to the backend. When the backend sent instructions, a session that was already waiting could start collecting immediately, saving a lot of time.
- A complete lack of standardization made it hard to find the correct product. By scouring the office for the various CSV files holding dozens of rows of data for each product (many of them roughly 7-digit numbers), I gathered enough data to identify the product needed.
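The chunk-by-chunk OCR approach from the first bullet can be sketched as below. This is a minimal illustration, not the original system: the tile size, the keyword list and the `ocr` callable are assumptions, and the OCR step is passed in as a function so the tiling and keyword search can be shown without depending on Tesseract itself.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Yield (left, upper, right, lower) boxes covering the image, row by row."""
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            yield (left, top, min(left + tile_w, width), min(top + tile_h, height))


def find_keyword_tiles(width, height, ocr, keywords, tile_w=200, tile_h=50):
    """OCR one tile at a time and record the first tile where each keyword appears.

    `ocr` is a hypothetical callable that takes a box and returns the text found in it.
    """
    found = {}
    for box in tile_boxes(width, height, tile_w, tile_h):
        text = ocr(box)
        for kw in keywords:
            if kw not in found and kw.lower() in text.lower():
                found[kw] = (box, text)
        if len(found) == len(keywords):
            break  # every keyword located, so skip the remaining tiles
    return found
```

With Pillow and pytesseract installed, the `ocr` argument could be something like `lambda box: pytesseract.image_to_string(img.crop(box))` for an image opened with `Image.open(...)`. One caveat of fixed tiles is that a word straddling a tile boundary can be missed; overlapping the tiles slightly is one possible mitigation.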
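The pool of pre-logged-in sessions from the second bullet can be sketched with threads and queues. This is a simplified, single-process stand-in for the microservice setup: each `ScraperWorker` represents one logged-in account, a shared queue of idle workers plays the role of the backend's availability list, and the login and scraping steps are placeholders (all class names here are hypothetical).

```python
import queue
import threading


class ScraperWorker(threading.Thread):
    """One long-lived, already-logged-in session (simulated)."""

    def __init__(self, name, idle_pool, results):
        super().__init__(name=name, daemon=True)
        self.idle_pool = idle_pool   # shared queue of workers that are ready for a job
        self.inbox = queue.Queue()   # jobs addressed to this worker only
        self.results = results
        # Placeholder: starting chromedriver, logging in and navigating would happen once, here.
        self.session = f"session-for-{name}"

    def run(self):
        while True:
            self.idle_pool.put(self)  # report availability (stands in for messaging the backend)
            job = self.inbox.get()
            if job is None:           # shutdown signal
                break
            # Placeholder scrape that reuses the already-open session.
            self.results.put((self.name, job, f"data for {job}"))


class SessionPool:
    """Hands each incoming job to whichever session is currently idle."""

    def __init__(self, size):
        self.idle = queue.Queue()
        self.results = queue.Queue()
        self.workers = [ScraperWorker(f"worker-{i}", self.idle, self.results) for i in range(size)]
        for worker in self.workers:
            worker.start()

    def dispatch(self, job):
        worker = self.idle.get()  # blocks until some session reports it is free
        worker.inbox.put(job)

    def shutdown(self):
        # Each worker registers as idle at most once at a time, so this reaches every worker.
        for _ in self.workers:
            self.idle.get().inbox.put(None)
        for worker in self.workers:
            worker.join()
```

The key design point is that the expensive setup happens in the constructor, once per session, so `dispatch` only pays for the scrape itself.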
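For the third bullet, one way to use the numeric identifiers is to index them from the reference CSVs and then look for them in the scraped text. This is a hedged sketch: the column names are hypothetical, and treating "roughly 7-digit" codes as runs of 6 to 8 digits is an assumption. A product is only returned when exactly one candidate matches, so ambiguous listings are left unresolved rather than guessed.

```python
import csv
import io
import re

# Assumption: identifiers are standalone runs of roughly 7 digits (6 to 8 here).
CODE_PATTERN = re.compile(r"\b\d{6,8}\b")


def build_code_index(csv_text, id_column, code_columns):
    """Map every numeric code in the reference CSV to the product id(s) it belongs to."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in code_columns:
            code = row[column].strip()
            if code:
                index.setdefault(code, set()).add(row[id_column])
    return index


def identify_product(listing_text, index):
    """Return the single product whose codes appear in the text, or None if zero or several match."""
    candidates = set()
    for code in CODE_PATTERN.findall(listing_text):
        candidates |= index.get(code, set())
    return candidates.pop() if len(candidates) == 1 else None
```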
Other
I have built an assortment of tools for comparing, checking, and splitting Excel files to speed up various tasks around the office. In some cases, I built systems to assign primary keys (capcodes) by comparing dozens of data points per object against millions of other objects to find exact and exclusive matches. These tools made some previously impossible tasks possible and made others over 1000 times faster.
I have built multivariate linear regression models for predicting future car prices. The main difficulty was finding enough data points, since each car is only produced for a set period of time and, when I began working on this, no one was keeping old data for future use. However, the monthly ratebooks that do exist for certain cars typically contained plenty of other data to act as independent variables, such as VAT, BIK, and CO2 output.
I've made many display systems that take their data from the company's P&L or KPIs to display leaderboards and targets for teams around the office. Some of these graphs were built in JavaScript and fed continuously with data via Flask; others used matplotlib with tkinter.
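The "exact and exclusive" matching described above can be sketched with a hash index instead of pairwise comparisons. This is an illustration under assumptions: the key fields, the normalisation (trimming and upper-casing) and the `capcode` field name are hypothetical. Reference rows are grouped by their key, keys shared by more than one capcode are discarded as non-exclusive, and each object is then matched with a single dictionary lookup.

```python
from collections import defaultdict


def make_key(record, key_fields):
    """Normalise the chosen fields so trivially different spellings compare equal."""
    return tuple(str(record[field]).strip().upper() for field in key_fields)


def build_exclusive_index(reference_rows, key_fields):
    """Map each key to its capcode, keeping only keys that point to exactly one capcode."""
    groups = defaultdict(set)
    for row in reference_rows:
        groups[make_key(row, key_fields)].add(row["capcode"])
    return {key: next(iter(codes)) for key, codes in groups.items() if len(codes) == 1}


def assign_capcodes(objects, exclusive_index, key_fields):
    """Return copies of the objects with a capcode where the match is exact and exclusive, else None."""
    return [
        {**obj, "capcode": exclusive_index.get(make_key(obj, key_fields))}
        for obj in objects
    ]
```

Building the index is one pass over the reference data, and each lookup is then constant time on average, which is what makes matching against millions of rows practical compared with comparing every object against every row.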
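A multivariate linear regression of this kind can be sketched with ordinary least squares in NumPy. This only shows the fitting step: the choice of features (for example VAT, BIK and CO2 columns from a ratebook) and any handling of the time dimension are assumptions, since the original models aren't described in detail here.

```python
import numpy as np


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Apply the fitted model to new rows of independent variables."""
    return intercept + np.asarray(X, dtype=float) @ coefs
```

Each row of `X` would hold one car's independent variables and `y` the corresponding prices; `np.linalg.lstsq` is used rather than inverting the normal equations directly because it is more numerically stable.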
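The data side of such a leaderboard can be sketched as a function that aggregates KPI rows into a JSON-ready structure. The row format (`name`, `value`) and the single team target are assumptions for illustration. In a Flask app (version 1.1 or later), a route can return a dict like this directly and Flask serialises it to JSON, which a JavaScript page could then poll to redraw the graphs.

```python
import json


def build_leaderboard(kpi_rows, target):
    """Total each person's KPI values, rank them, and show progress toward a team target."""
    totals = {}
    for row in kpi_rows:
        totals[row["name"]] = totals.get(row["name"], 0) + row["value"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {
        "target": target,
        "leaderboard": [
            {
                "rank": position,
                "name": name,
                "total": total,
                "pct_of_target": round(100 * total / target, 1),
            }
            for position, (name, total) in enumerate(ranked, start=1)
        ],
    }


if __name__ == "__main__":
    rows = [{"name": "Ann", "value": 30}, {"name": "Bob", "value": 50}]
    print(json.dumps(build_leaderboard(rows, target=100), indent=2))
```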