Python Selenium webdriver not consistently selecting an element even though it's there
I'm developing a web scraper to collect the src link from a source tag in an HTML file and add it to a list.
The site has the video nested under a load of divs, but all of the pages come down to:
<video type="video/mp4" poster="someimagelink" preload="metadata" crossorigin="anonymous">
    <source type="video/mp4" src="somemp4link">
</video>
My current method is logging into the site, going to the page that links to the video pages, going to each video page one by one, trying to find the source tag, and adding it to a list.
import time
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Firefox()

# bunch of log in stuff and getting the list of video page links, works fine
browser.get('pagewithvideopages')
soup = BeautifulSoup(browser.page_source, 'html.parser')
for i in range(3):
    browser.get(soup('a', {'class': 'subject__item'})[i]['href'])
    vsoup = BeautifulSoup(browser.page_source, 'html.parser')
    print(vsoup('source'))
    # Doesn't add to a list yet; goes to the video page,
    # tries to find the source tag and print it out,
    # then goes back to the original page and starts the loop again.
    browser.get('pagewithvideopages')
What happens is this:
[<source src="themp4link" type="video/mp4"></source>] [] [] []
So the first one works fine, but the rest return blank lists... as if there were no source tag, even though manually checking with the inspector reveals that the source tag is there.
Repeating this, I get:
[<source src="http://themp4link" type="video/mp4"></source>] [] [<source src="http://themp4link" type="video/mp4"></source>]
The site needs JavaScript enabled to load the content (which is why I'm using the webdriver for this)... could it be that?
Any help appreciated!
You need to wait for the web element you are looking for to load. You should explore using WebDriverWait.