HarvestOrb - Web Scraping: Automated LinkedIn Job Searching Project

I started this project because scrolling through and searching job posts on a job site is time-consuming. More importantly, some job postings are not up to date.

My first approach was to use BeautifulSoup to extract job posts from LinkedIn. It did not work because LinkedIn prevents web crawlers from indexing its job listing pages and blocks most web crawlers.
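
Sites commonly publish their crawler rules in a robots.txt file. As a rough illustration of how such rules can let one crawler in and keep the rest out, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.com` URL, and the `is_allowed` helper are all made up for this example; they are not LinkedIn's actual configuration, and robots.txt may not be the only mechanism a site uses.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; not a copy of any real site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /jobs/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=SAMPLE_ROBOTS_TXT):
    """Return True if the given rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is allowed in; any other user agent is turned away.
print(is_allowed("Googlebot", "https://www.example.com/jobs/view/1"))   # True
print(is_allowed("my-scraper", "https://www.example.com/jobs/view/1"))  # False
```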

After doing some research, I found that Selenium WebDriver does the trick, since Selenium is not a web scraping tool. In fact, Selenium is a purpose-built browser automation tool for testing. Because Selenium drives a real browser, I don't have to worry about making any unauthorized or suspicious requests to a webpage.

As a result, I scraped all the customized data, especially the job posting date, and saved it to an HTML file.
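
Saving the scraped data to an HTML file could look roughly like the sketch below. It is not the project's actual code: the column names (`title`, `company`, `location`, `posted`), the helper names `jobs_to_html` and `save_jobs`, and the `jobs.html` filename are assumptions made for illustration. It uses only the standard library and escapes every value so that characters such as `&` or `<` in a job title do not break the table.

```python
from html import escape

# Hypothetical column names; adjust them to whatever fields you scrape.
COLUMNS = ("title", "company", "location", "posted")


def jobs_to_html(jobs, columns=COLUMNS):
    """Render a list of job dicts as a simple HTML page with one table."""
    header = "".join(f"<th>{escape(col)}</th>" for col in columns)
    rows = []
    for job in jobs:
        # Missing fields become empty cells instead of raising KeyError.
        cells = "".join(
            f"<td>{escape(str(job.get(col, '')))}</td>" for col in columns
        )
        rows.append(f"<tr>{cells}</tr>")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Job list</title></head>\n'
        '<body><table border="1">\n'
        f"<tr>{header}</tr>\n"
        + "\n".join(rows)
        + "\n</table></body></html>\n"
    )


def save_jobs(jobs, path="jobs.html"):
    """Write the rendered table to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(jobs_to_html(jobs))
```

Calling `save_jobs(jobs)` with a list of dictionaries writes the table to `jobs.html`, which can then be opened in any browser.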

Steps Overview

The automation uses Selenium and BeautifulSoup to extract jobs from the job site. Behind the scenes, it performs the following steps:

  1. Visit LinkedIn page
  2. Click 'Sign In' button
  3. Enter account credentials
  4. Click 'Log In' or 'Sign In' button
  5. Click 'Jobs' tab
  6. Enter title and location in the text fields (This can be skipped)
  7. Click 'Search' button
  8. Click 'All filters' button (This can be skipped)
  9. Check any or all preferred boxes (This can be skipped)
  10. Click 'show results' (This can be skipped)
  11. Click job post that shows on the left panel
    1. At the same time, the right panel will show more info about the company, job type, description, and requirements
    2. Repeat steps 11 and 11.1 until the end of the job list
  12. Click 'Reset' button (This can be skipped)
  13. Click 'Sign Out'
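
The loop in step 11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the CSS selectors `JOB_CARD` and `JOB_DETAILS` and the helper names are hypothetical, and the real page needs to be inspected to find working selectors. The locator strategy is passed as the plain string `"css selector"`, which has the same value as Selenium's `By.CSS_SELECTOR`, so the loop can be exercised without a browser.

```python
# Locator strategy string; same value as selenium's By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical selectors: inspect the live page to find the real ones.
JOB_CARD = "ul.jobs-list li.job-card"
JOB_DETAILS = "div.job-details"


def collect_jobs(driver, max_jobs=25):
    """Click each job card in the left panel and read the right panel."""
    jobs = []
    cards = driver.find_elements(CSS, JOB_CARD)
    for card in cards[:max_jobs]:
        card.click()  # step 11: select the post in the left panel
        details = driver.find_element(CSS, JOB_DETAILS)  # step 11.1
        jobs.append({"summary": card.text, "details": details.text})
    return jobs


def run_with_chrome():
    """Drive a real browser; requires the third-party selenium package."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.linkedin.com/jobs/")
        # Steps 2-10 (sign-in, search, filters) are omitted from this sketch.
        return collect_jobs(driver)
    finally:
        driver.quit()  # always close the browser, even on errors
```

On a live page the right panel loads asynchronously, and the job list may re-render after a click, so a real run would likely need explicit waits (for example Selenium's `WebDriverWait`) and may need to re-locate the cards inside the loop.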

Examining the Console Output and Job List in HTML Format

This video shows the console output while the automation runs.

It also shows the table of jobs saved in the HTML file.

Looping through 25 job posts takes some time.

Feel free to skip to 3:30 to view the list of jobs stored in the HTML file.