How to get a `td` text under a specific `th` in a table with Jsoup?
I have the following table, extracted from a page:
<table class="infobox vevent" style="width:22em"> <caption class="summary">adobe shockwave player</caption> <tr> <td colspan="2" style="text-align:center"><a href="/wiki/file:adobe_shockwave_player_logo.png" class="image"><img alt="adobe shockwave player logo.png" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/64px-adobe_shockwave_player_logo.png" width="64" height="64" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/96px-adobe_shockwave_player_logo.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/128px-adobe_shockwave_player_logo.png 2x" data-file-width="165" data-file-height="165"></a></td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_developer" title="software developer">original author(s)</a></th> <td><a href="/wiki/macromedia" title="macromedia">macromedia</a></td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_developer" title="software developer">developer(s)</a></th> <td><a href="/wiki/adobe_systems" title="adobe systems">adobe systems</a></td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_release_life_cycle" title="software release life cycle">stable release</a></th> <td>12.2.4.194 / 19 february 2016<span class="noprint">; 4 months ago</span><span style="display:none"> (<span class="bday dtstart published updated">2016-02-19</span>)</span><sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup></td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/operating_system" title="operating system">operating system</a></th> <td><a href="/wiki/microsoft_windows" title="microsoft windows">microsoft windows</a>, <a href="/wiki/mac_os_9" title="mac os 9">mac os 9</a>, <a href="/wiki/mac_os_x" class="mw-redirect" title="mac os x">mac os x</a> (universal)</td> </tr> <tr> <th scope="row" style="white-space: 
nowrap;"><a href="/wiki/computing_platform" title="computing platform">platform</a></th> <td><a href="/wiki/web_browsers" class="mw-redirect" title="web browsers">web browsers</a></td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/list_of_software_categories" title="list of software categories">type</a></th> <td>multimedia player / <a href="/wiki/mime" title="mime">mime</a> type: application/x-director</td> </tr> <tr> <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_license" title="software license">license</a></th> <td><a href="/wiki/proprietary_software" title="proprietary software">proprietary</a><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td> </tr> <tr> <th scope="row" style="white-space: nowrap;">website</th> <td><span class="url"><a rel="nofollow" class="external text" href="http://www.adobe.com/products/shockwaveplayer/">www<wbr>.adobe<wbr>.com<wbr>/products<wbr>/shockwaveplayer<wbr>/</a></span></td> </tr> </table>
I'm trying to get:
1. The td's text "12.2.4.194" under the specific th whose text is "stable release".
2. The td's text "microsoft windows" under the specific th whose text is "operating system".
I'm stuck with the code below:
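    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.connect("url").get();
    // Look at every infobox table and keep the one whose caption matches.
    for (Element table : doc.select("table.infobox")) {
        String strName = table.getElementsByTag("caption").text();
        if (strName.toLowerCase().contains("shockwave player")) {
            Elements tRow = table.select("tr");
            System.out.println(tRow); // prints all rows, not the wanted cells
        }
    }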
Try this CSS query:

    table.infobox tr:has(a:containsOwn(stable release)) > td,
    table.infobox tr:has(a:containsOwn(microsoft windows)) > td

Sample code:

    public static String getTdText(Element table, String headerText) {
        // First td of the row that contains an anchor with the given text.
        Element td = table.select("tr:has(a:containsOwn(" + headerText + ")) > td").first();
        if (td == null) {
            throw new RuntimeException("Unable to find text " + headerText);
        }
        // ownText() returns only the td's direct text, not text of nested elements.
        return td.ownText();
    }
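To show how the helper might be called, here is a small usage sketch. It parses a trimmed-down copy of the "stable release" row from the question instead of fetching a URL; the class name `GetTdTextDemo` and the `HTML` constant are made up for this example and are not part of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GetTdTextDemo {
    // Hypothetical fixture: one row copied (and shortened) from the question's table.
    static final String HTML =
          "<table class=\"infobox vevent\">"
        + "<caption class=\"summary\">adobe shockwave player</caption>"
        + "<tr><th scope=\"row\"><a href=\"/wiki/software_release_life_cycle\">stable release</a></th>"
        + "<td>12.2.4.194 / 19 february 2016<span class=\"noprint\">; 4 months ago</span></td></tr>"
        + "</table>";

    // Same helper as in the sample code above.
    public static String getTdText(Element table, String headerText) {
        Element td = table.select("tr:has(a:containsOwn(" + headerText + ")) > td").first();
        if (td == null) {
            throw new RuntimeException("Unable to find text " + headerText);
        }
        return td.ownText();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element table = doc.select("table.infobox").first();

        // The nested <span> is skipped because ownText() only reads direct text nodes.
        System.out.println(getTdText(table, "stable release"));

        // A header that is not in the table triggers the RuntimeException.
        try {
            getTdText(table, "no such header");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this fixture the first call returns the whole direct text of the cell, `12.2.4.194 / 19 february 2016`, so the version number still has to be cut out of it if only that part is wanted.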
Discussion:

    tr                            /* select tr elements ...              */
      :has(                       /* ... having ...                      */
        a                         /* ... an anchor element ...           */
        :containsOwn(headerText)  /* ... containing headerText ...       */
      )
      > td                        /* select td elements, direct children */
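One thing to keep in mind: `ownText()` returns only the text directly inside the `td`. For the stable release row that is the version together with the date, and for the operating system row, where the names sit inside links, it is mostly just the separators between them. Below is a minimal sketch of one way to get the two exact values from the question. It assumes the version is always the part before the first `/` and that the first link in the cell is the operating system you want. The class `InfoboxExample`, the helpers `cellFor`, `stableVersion` and `firstOperatingSystem`, and the shortened HTML fixture are invented for illustration. It also matches on `th:contains(...)` instead of on the anchor, so a plain-text header such as `website` would be found as well.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InfoboxExample {
    // Hypothetical fixture: two rows copied (and shortened) from the question's table.
    static final String HTML =
          "<table class=\"infobox vevent\">"
        + "<caption class=\"summary\">adobe shockwave player</caption>"
        + "<tr><th scope=\"row\"><a href=\"/wiki/software_release_life_cycle\">stable release</a></th>"
        + "<td>12.2.4.194 / 19 february 2016<span class=\"noprint\">; 4 months ago</span></td></tr>"
        + "<tr><th scope=\"row\"><a href=\"/wiki/operating_system\">operating system</a></th>"
        + "<td><a href=\"/wiki/microsoft_windows\">microsoft windows</a>, "
        + "<a href=\"/wiki/mac_os_9\">mac os 9</a> (universal)</td></tr>"
        + "</table>";

    // Returns the td in the same row as the th containing headerText, or null if absent.
    static Element cellFor(Element table, String headerText) {
        return table.select("tr:has(th:contains(" + headerText + ")) > td").first();
    }

    // Assumption: the cell's direct text looks like "<version> / <date>".
    static String stableVersion(Element table) {
        Element td = cellFor(table, "stable release");
        if (td == null) {
            return null;
        }
        return td.ownText().split("/")[0].trim();
    }

    // Assumption: the first link in the cell names the wanted operating system.
    static String firstOperatingSystem(Element table) {
        Element td = cellFor(table, "operating system");
        if (td == null) {
            return null;
        }
        Element link = td.selectFirst("a");
        return link == null ? td.text() : link.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element table = doc.selectFirst("table.infobox");
        System.out.println(stableVersion(table));        // 12.2.4.194
        System.out.println(firstOperatingSystem(table)); // microsoft windows
    }
}
```

If you need every operating system rather than the first, `td.select("a").eachText()` would give the link texts as a list.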