To realize the *Weather Information* scenario, you need to register on [OpenWeather](https://openweathermap.org/). Once done, set the *OPEN_WEATHER_API* environment variable in .env. The API keys can be found on the [OpenWeather API Page](https://home.openweathermap.org/api_keys).
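To make the .env step concrete, here is a minimal sketch of loading KEY=VALUE pairs from a .env file into the environment and checking that the key is present. This is a hand-rolled stand-in for what a library like python-dotenv does; real projects should prefer that library.

```python
import os
from pathlib import Path

def load_env(env_path: str = ".env") -> None:
    """Minimal .env loader: export KEY=VALUE lines into os.environ.
    A sketch of what python-dotenv's load_dotenv() does."""
    path = Path(env_path)
    if not path.exists():
        return
    for line in path.read_text().splitlines():
        line = line.strip()
        # skip blank lines, comments, and malformed entries
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env()
if "OPEN_WEATHER_API" not in os.environ:
    print("Warning: OPEN_WEATHER_API is not set; add it to .env")
```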
For the URL health checking scenario, the Chrome browser is invoked in non-headless mode. It is recommended to install Chrome on your machine before proceeding to Step (4).
**Step 2**
Trigger the command *make clean* to remove the \_\_pycache\_\_ folder(s) and .pyc files. It also removes the .DS_Store files from the project.
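For readers without make, the cleanup target can be approximated in pure Python. This is a rough sketch of what the target does, not the project's actual Makefile recipe:

```python
import shutil
from pathlib import Path

def clean(root: str = ".") -> None:
    """Rough Python equivalent of the 'make clean' target: removes
    __pycache__ folders, .pyc files, and .DS_Store files under root."""
    root_path = Path(root)
    # remove compiled-bytecode cache folders
    for pycache in list(root_path.rglob("__pycache__")):
        shutil.rmtree(pycache, ignore_errors=True)
    # remove stray .pyc files and macOS .DS_Store files
    for pattern in ("*.pyc", ".DS_Store"):
        for f in list(root_path.rglob(pattern)):
            f.unlink(missing_ok=True)
```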
Follow the steps below to perform scraping on your local machine:
**Step 1**
Set the *EXEC_PLATFORM* environment variable to *local*. Trigger the command ```brew install hyperfine``` on the terminal to install hyperfine on macOS.
Beautiful Soup is a Python library that is primarily used for screen scraping (or web scraping). More information about the library is available on the [Beautiful Soup HomePage](https://www.crummy.com/software/BeautifulSoup/).
The Beautiful Soup (bs4) library is already installed as a part of the *pre-requisite steps*; hence, it is safe to proceed with scraping using Beautiful Soup. [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/) has infinitely scrolling pages, and Selenium is used to scroll to the end of the page so that all the items can be scraped using the said libraries.
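A minimal sketch of the Beautiful Soup side of that flow is shown below. The HTML snippet is a hand-written stand-in for the markup returned after Selenium has scrolled to the bottom of the page; the tag structure and class names are assumptions for illustration, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source after infinite scrolling has finished.
html = """
<div class="post">
  <h4><a href="/item/1">Short Dress</a></h4>
  <h5>$24.99</h5>
</div>
<div class="post">
  <h4><a href="/item/2">Patterned Slacks</a></h4>
  <h5>$29.99</h5>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect (title, price) pairs from every item card.
items = [
    (post.h4.get_text(strip=True), post.h5.get_text(strip=True))
    for post in soup.find_all("div", class_="post")
]
print(items)  # → [('Short Dress', '$24.99'), ('Patterned Slacks', '$29.99')]
```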
The following websites are used for demonstration:
Trigger the command *make fetch-pokemon-names* to fetch the Pokemon names using the [Pokemon APIs](https://pokeapi.co/api/v2/pokemon/), running the code in sync & async mode (using Asyncio in Python).
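The sync-versus-async idea behind that target can be sketched as follows. Real HTTP calls are replaced with ```asyncio.sleep()``` so the snippet runs without network access; the 0.2-second delay is an arbitrary stand-in for one API round trip.

```python
import asyncio
import time

async def fetch_pokemon(name: str) -> str:
    # Stand-in for an HTTP call to the Pokemon API.
    await asyncio.sleep(0.2)
    return name

async def fetch_all(names: list[str]) -> list[str]:
    # asyncio.gather runs all the "requests" concurrently, so the total
    # time is ~one round trip instead of len(names) round trips.
    return list(await asyncio.gather(*(fetch_pokemon(n) for n in names)))

names = ["bulbasaur", "charmander", "squirtle"]
start = time.perf_counter()
results = asyncio.run(fetch_all(names))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # ~0.2s, vs. ~0.6s if fetched sequentially
```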
### Fetching Weather information for certain cities in the US using OpenWeather APIs
As seen from the above screenshots, the content on Pages (1) through (5) of the [LambdaTest E-Commerce Playground](https://ecommerce-playground.lambdatest.io/index.php?route=product/category&path=57) is successfully displayed on the console.
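Iterating over those pages can be sketched by generating the five category URLs up front. The ```page``` query parameter used here is an assumption about how the playground paginates, not something taken from the project's code:

```python
# Base category URL from the demo; the 'page' parameter is an assumption.
BASE = ("https://ecommerce-playground.lambdatest.io/index.php"
        "?route=product/category&path=57")

# Build the URLs for Pages (1) through (5).
page_urls = [f"{BASE}&page={page}" for page in range(1, 6)]
for url in page_urls:
    print(url)
```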
In this scenario, the city names from [Page-1](https://www.latlong.net/category/cities-236-15-1.html) through [Page-15](https://www.latlong.net/category/cities-236-15-13.html) are first scraped using BeautifulSoup.
Now that we have the city names, let's fetch the weather of those cities by providing the Latitude & Longitude to the [OpenWeather API](https://api.openweathermap.org/data/2.5/weather?lat=<LATITUDE>&lon=<LONGITUDE>&appid=<OPEN_WEATHER_API>).
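Filling that endpoint template in code looks roughly like this. The coordinates below are illustrative (New York City); the helper name is hypothetical, not taken from the project:

```python
import os

def weather_url(lat: float, lon: float) -> str:
    """Fill the OpenWeather 'current weather' endpoint template
    with a city's coordinates and the API key from the environment."""
    api_key = os.environ.get("OPEN_WEATHER_API", "<OPEN_WEATHER_API>")
    return ("https://api.openweathermap.org/data/2.5/weather"
            f"?lat={lat}&lon={lon}&appid={api_key}")

# New York City, for example:
print(weather_url(40.7128, -74.0060))
```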
Also, all 60 items on the [Scraping Club Infinite Scroll Website](https://scrapingclub.com/exercise/list_infinite_scroll/) are scraped without any issues.
Trigger the command *make fetch-sync-weather-info* to fetch the weather information of the scraped cities using the [OpenWeather API](https://openweathermap.org/api), with the code running in sync mode.
## Web Scraping using Selenium Cloud Grid and Python
<b>Note</b>: As mentioned earlier, there could be cases where YouTube scraping might fail on the cloud grid (particularly when there are a number of attempts to scrape the content). Since cookies and other settings are cleared (or sanitized) after every test session, YouTube might mistake genuine web scraping for a Bot Attack! In such cases, you might come across the following page, where cookie consent has to be given by clicking on the "Accept all" button.
You can find more information in this insightful [Stack Overflow thread](https://stackoverflow.com/questions/66902404/selenium-python-click-agree-to-youtube-cookie).
Since we are using the LambdaTest Selenium Grid for test execution, it is recommended to create an account on [LambdaTest](https://www.lambdatest.com/?fp_ref=himanshu15) before proceeding with the test execution. Procure the LambdaTest User Name and Access Key by navigating to the [LambdaTest Account Page](https://accounts.lambdatest.com/security).
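Once procured, the credentials are typically combined into the remote grid URL that Selenium connects to. A minimal sketch, assuming the credentials are exported as *LT_USERNAME* and *LT_ACCESS_KEY* (the variable names are assumptions; the hub host is LambdaTest's standard Selenium endpoint):

```python
import os

def grid_url() -> str:
    """Assemble the LambdaTest remote grid URL from environment
    variables. LT_USERNAME/LT_ACCESS_KEY are assumed names."""
    user = os.environ.get("LT_USERNAME", "<username>")
    key = os.environ.get("LT_ACCESS_KEY", "<access-key>")
    return f"https://{user}:{key}@hub.lambdatest.com/wd/hub"

print(grid_url())
```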
As seen above, the content from the LambdaTest YouTube channel and the LambdaTest e-commerce playground is scraped successfully! You can find the status of test execution in the [LambdaTest Automation Dashboard](https://automation.lambdatest.com/build).
As seen above, the status of test execution is "Completed". Since the browser is instantiated in the *Headless* mode, the video recording is not available on the dashboard.
### Web Scraping using Selenium Pytest (Cloud Execution)
The following websites are used for demonstration:
Trigger the command *make fetch-async-weather-info* to fetch the weather information of the scraped cities using the [OpenWeather API](https://openweathermap.org/api), with the code running in async mode (using Asyncio in Python).
As seen above, the content from the LambdaTest YouTube channel and the LambdaTest e-commerce playground is scraped successfully! You can find the status of test execution in the [LambdaTest Automation Dashboard](https://automation.lambdatest.com/build).
### Result: Asyncio - 318.53+ seconds faster than sync execution
As seen above, the status of test execution is "Completed". Since the browser is instantiated in the *Headless* mode, the video recording is not available on the dashboard.
## Have feedback or need assistance?
Feel free to fork the repo and contribute to making it better! Email [himanshu[dot]sheth[at]gmail[dot]com](mailto:himanshu.sheth@gmail.com) for any queries, or ping me on the following social media sites: