How to get a `td` text under a specific `th` in a table with Jsoup?


I have the following table, extracted from a page:

<table class="infobox vevent" style="width:22em">
 <caption class="summary">adobe shockwave player</caption>
 <tr>
  <td colspan="2" style="text-align:center"><a href="/wiki/file:adobe_shockwave_player_logo.png" class="image"><img alt="adobe shockwave player logo.png" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/64px-adobe_shockwave_player_logo.png" width="64" height="64" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/96px-adobe_shockwave_player_logo.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8e/adobe_shockwave_player_logo.png/128px-adobe_shockwave_player_logo.png 2x" data-file-width="165" data-file-height="165"></a></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_developer" title="software developer">original author(s)</a></th>
  <td><a href="/wiki/macromedia" title="macromedia">macromedia</a></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_developer" title="software developer">developer(s)</a></th>
  <td><a href="/wiki/adobe_systems" title="adobe systems">adobe systems</a></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_release_life_cycle" title="software release life cycle">stable release</a></th>
  <td>12.2.4.194 / 19&nbsp;february 2016<span class="noprint">; 4 months ago</span><span style="display:none">&nbsp;(<span class="bday dtstart published updated">2016-02-19</span>)</span><sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/operating_system" title="operating system">operating system</a></th>
  <td><a href="/wiki/microsoft_windows" title="microsoft windows">microsoft windows</a>, <a href="/wiki/mac_os_9" title="mac os 9">mac os 9</a>, <a href="/wiki/mac_os_x" class="mw-redirect" title="mac os x">mac os x</a> (universal)</td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/computing_platform" title="computing platform">platform</a></th>
  <td><a href="/wiki/web_browsers" class="mw-redirect" title="web browsers">web browsers</a></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/list_of_software_categories" title="list of software categories">type</a></th>
  <td>multimedia player / <a href="/wiki/mime" title="mime">mime</a> type: application/x-director</td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;"><a href="/wiki/software_license" title="software license">license</a></th>
  <td><a href="/wiki/proprietary_software" title="proprietary software">proprietary</a><sup id="cite_ref-2" class="reference"><a href="#cite_note-2">[2]</a></sup></td>
 </tr>
 <tr>
  <th scope="row" style="white-space: nowrap;">website</th>
  <td><span class="url"><a rel="nofollow" class="external text" href="http://www.adobe.com/products/shockwaveplayer/">www<wbr>.adobe<wbr>.com<wbr>/products<wbr>/shockwaveplayer<wbr>/</a></span></td>
 </tr>
</table>

I'm trying to get:

1. The td text "12.2.4.194" under the th whose text is "stable release".
2. The td text "microsoft windows" under the th whose text is "operating system".

I'm stuck with the code below:

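import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("url").get();
for (Element table : doc.select("table.infobox")) {
    // The caption identifies which infobox this is.
    String strName = table.getElementsByTag("caption").text();
    if (strName.toLowerCase().contains("shockwave player")) {
        Elements tRow = table.select("tr");
        System.out.println(tRow);
    }
}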

Try this CSS query:

table.infobox tr:has(a:containsOwn(stable release)) > td,
table.infobox tr:has(a:containsOwn(microsoft windows)) > td


Sample code:

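public static String getTdText(Element table, String headerText) {
    // First td in a row that contains an anchor whose own text includes headerText.
    Element td = table.select("tr:has(a:containsOwn(" + headerText + ")) > td").first();

    if (td == null) {
        throw new RuntimeException("Unable to find text " + headerText);
    } else {
        // ownText() returns only the text directly inside the td,
        // not text inside child elements such as links.
        return td.ownText();
    }
}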
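For the "stable release" row, the cell's own text carries more than the version number (the release date follows the slash). A minimal sketch of trimming it down to "12.2.4.194" with a regular expression is shown here; the class `VersionText` and the method `firstVersion` are hypothetical names, and the input string is assumed to be what the cell's own text looks like for the table above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionText {

    // Dotted numeric version such as 12.2.4.194 (two or more dot-separated groups).
    private static final Pattern VERSION = Pattern.compile("\\d+(?:\\.\\d+)+");

    // Hypothetical helper: returns the first version-like token, or null if there is none.
    public static String firstVersion(String cellText) {
        Matcher m = VERSION.matcher(cellText);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed own text of the "stable release" cell; \u00a0 is the &nbsp; from the markup.
        String ownText = "12.2.4.194 / 19\u00a0february 2016";
        System.out.println(firstVersion(ownText)); // prints 12.2.4.194
    }
}
```

A regular expression is used rather than splitting on "/" so the helper does not depend on the exact separator between version and date.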

Discussion:

tr                           /* select tr elements ...                       */
:has(                        /* ... having ...                               */
   a                         /* ... an anchor element ...                    */
   :containsOwn(headerText)  /* ... containing headerText ...                */
) > td                       /* select td elements that are direct children  */
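Two things about the selector are worth keeping in mind. It looks for an anchor anywhere in the row, and the "website" row in the table above has a plain-text th with no anchor, so an anchor-based lookup would not find it. The sketch below is one possible variant, not the answer's original method: it matches on the th text with `th:contains(...)` instead, and it reads the first link in the cell so that "operating system" yields "microsoft windows". The class `InfoboxLookup`, its two helper methods, and the trimmed-down HTML string are all hypothetical, and the jsoup library is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class InfoboxLookup {

    // Hypothetical helper: full text of the td in the row whose th contains headerText.
    public static String cellText(Element table, String headerText) {
        Element td = table.select("tr:has(th:contains(" + headerText + ")) > td").first();
        if (td == null) {
            throw new IllegalArgumentException("No row with header: " + headerText);
        }
        return td.text(); // text() includes the text of child elements
    }

    // Hypothetical helper: text of the first link inside that td,
    // falling back to the whole cell text when the cell has no link.
    public static String firstLinkText(Element table, String headerText) {
        Element link = table.select("tr:has(th:contains(" + headerText + ")) > td a").first();
        return link != null ? link.text() : cellText(table, headerText);
    }

    public static void main(String[] args) {
        // Reduced copy of the infobox, enough to exercise both helpers.
        String html = "<table class=\"infobox\">"
                + "<tr><th><a href=\"#\">operating system</a></th>"
                + "<td><a href=\"#\">microsoft windows</a>, <a href=\"#\">mac os 9</a></td></tr>"
                + "<tr><th>website</th><td><a href=\"#\">www.adobe.com</a></td></tr>"
                + "</table>";
        Element table = Jsoup.parse(html).select("table.infobox").first();

        System.out.println(firstLinkText(table, "operating system"));
        System.out.println(cellText(table, "website"));
    }
}
```

Both `:contains` and `:containsOwn` match substrings case-insensitively, so a short header text can match more than one row; passing the full header text keeps the match narrow.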
