python - Can't parse a second table with BeautifulSoup even if the first one works?


I am trying to parse tables using BeautifulSoup. The first one on the page is easy, but I cannot parse a similar table on the same page, and I do not understand why.

Here is my code. Thanks in advance for your help.

    import urllib2
    from bs4 import BeautifulSoup

    url = urllib2.urlopen("https://dl.dropboxusercontent.com/u/956261/poftext.html")
    contenthtml = url.read()

    soup = BeautifulSoup(contenthtml)

    # first table
    tableuserdetails = soup.find("table", {"class": "user-details"})

    i = 0
    tableuserdetailslist = []
    for row in tableuserdetails.find_all('tr'):
        for col in row.find_all('td'):
            # only the first child of the cell is looked at
            contenttd = col.contents[0].string.strip()

            if contenttd:
                print "td number %d : %s" % (i, contenttd)
                tableuserdetailslist.append(contenttd)
                i += 1

    # first table ok
    print tableuserdetailslist

    # second table
    tableuserdetails = soup.find("table", {"class": "secondpart"})

    i = 0
    tableuserdetailslist = []
    for row in tableuserdetails.find_all('tr'):
        for col in row.find_all('td'):
            contenttd = col.contents[0].string.strip()

            if contenttd:
                print "td number %d : %s" % (i, contenttd)
                tableuserdetailslist.append(contenttd)
                i += 1

    print tableuserdetailslist  # the list is empty :(

Here is a simplified version of the HTML code I am trying to parse:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>
            french.kiss
            sorties, sport, voyages, nouvelles expériences</title>
    </head>
    <body style='background-color: #fff;' leftmargin='0' topmargin='0' marginwidth='0' marginheight='0' link='#1e55d6' vlink='#1e55d6' text='#6551b0'>

        <table class="user-details">
            <tr>
                <td class="headline txtblue size15" style="width:80px">
                    about
                </td>
                <td style="width:10px">
                    &nbsp;
                </td>
                <td class="txtgrey size15">
                    fume occasionnellement silhouette mince
                </td>
                <td width="25px;">
                    &nbsp;
                </td>
                <td class="headline txtblue size15">
                    city
                </td>
                <td class="txtgrey size15">
                    paris ile-de-france
                </td>
            </tr>
            <tr>
                <td class="headline txtblue size15">
                    details
                </td>
                <td style="width:10px">
                    &nbsp;
                </td>
                <td class="txtgrey size15">
                    26 year old un homme, 185cm, sans religion
                </td>
                <td>
                </td>
                <td class="headline txtblue size15">
                    ethnicity
                </td>
                <td class="txtgrey size15">
                    caucasienne balance châtains
                </td>
            </tr>
            <tr>
                <td class="headline txtblue size15">
                    intent
                </td>
                <td style="width:10px">
                    &nbsp;
                </td>
                <td class="txtgrey size15">
                    french.kiss cherche une relation amoureuse.
                </td>
                <td>
                </td>
                <td class="headline txtblue size15" style="width:90px">
                    education
                </td>
                <td class="txtgrey size15">
                    diplôme universitaire/licence
                </td>
            </tr>
            <tr>
                <td class="headline txtblue size15">
                    personnalité
                </td>
                <td style="width:10px">
                    &nbsp;
                </td>
                <td class="txtgrey size15">
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size15">profession </span>
                </td>
                <td>
                    <span class="txtgrey size15">
                        visioconférence</span>
                </td>
            </tr>
        </table>

        <table width="85%" class="secondpart">
            <tr height="25px">
                <td width="200px">
                    <span class="headline txtblue size14">i seeking a</span>
                </td>
                <td width="300px">
                    <span class="txtgrey size14">
                        une femme</span>
                </td>
                <td width="25px">
                </td>
                <td width="200px">
                    <span class="headline txtblue size14">for</span>
                </td>
                <td width="200px">
                    <span class="txtgrey size14">
                        sorties</span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14"><a href='needs_test.aspx'>needs test</a></span>
                </td>
                <td>
                    <span class="txtgrey size14"><a href='needs_test.aspx'>
                        <a href="needs_view.aspx?id=38028200">view
                            relationship needs</a></a></span>
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size14"><a href='poftest.aspx'>chemistry</a></span>
                </td>
                <td>
                    <span class="txtgrey size14"><a href='poftest.aspx'>
                        <a href="personality.aspx?id=26&user_id=41724176" rel="nofollow">view
                            chemistry results</a></a></span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14">do drink?</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        occasionnellement</span>
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size14">do want children?</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        non divulgué</span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14">marital status</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        célibataire</span>
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size14">do drugs?</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        non</span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14">pets </span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        aucun</span>
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size14">eye color</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        bruns</span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14">do have car? </span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        n/a</span>
                </td>
                <td>
                </td>
                <td>
                    <span class="headline txtblue size14">do have children?</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        non</span>
                </td>
            </tr>
            <tr height="25px">
                <td>
                    <span class="headline txtblue size14">longest relationship</span>
                </td>
                <td>
                    <span class="txtgrey size14">
                        plus de 2 ans</span>
                </td>
                <td>
                </td>
                <td>
                </td>
                <td>
                </td>
            </tr>
        </table>

    </body>
    </html>

Here are the values of tableuserdetails.content, tableuserdetails and tableuserdetailslist for both tables:

* first table *

print tableuserdetails.content = None

print tableuserdetails =

    (the complete <table class="user-details"> element, identical to the first table in the HTML above)

print tableuserdetailslist = [u'about', u'fume occasionnellement silhouette mince', u'city', u'paris ile-de-france', u'details', u'26 year old un homme, 185cm, sans religion', u'ethnicity', u'caucasienne balance ch\xe2tains', u'intent', u'french.kiss cherche une relation amoureuse.', u'education', u'dipl\xf4me universitaire/licence', u'personnalit\xe9']

* second table *

print tableuserdetails.content = None

print tableuserdetails =

    (the complete <table width="85%" class="secondpart"> element, identical to the second table in the HTML above)

print tableuserdetailslist = []

This works:

    i = 0
    tableuserdetailslist = []
    for row in tableuserdetails.find_all('tr'):
        for col in row.find_all('td'):
            # all non-blank strings in the cell, including those inside a <span>
            contents = list(col.stripped_strings)
            if contents:
                contenttd = contents[0]
                print "td number %d : %s" % (i, contenttd)
                tableuserdetailslist.append(contenttd)
                i += 1

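The problem is that the cells of the second table contain spans. The line break before each span is interpreted as content and returned as the first item of the col.contents list, so col.contents[0] is a whitespace-only string that is empty once stripped, and the cell is skipped. col.stripped_strings yields only the non-blank strings of the cell, including those nested inside the span.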
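To see the effect in isolation, here is a minimal sketch. It assumes Python 3 and the built-in "html.parser" backend, and the two-cell markup and the variable names are made up for illustration rather than taken from the page above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one cell with bare text (as in the first table) and
# one cell whose text sits in a <span> preceded by a line break (as in the
# second table).
html = """<table>
<tr>
<td>
    city
</td>
<td>
    <span>une femme</span>
</td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
text_cell, span_cell = soup.find_all("td")

# Bare text: the first child of the cell is the text node itself.
first_of_text_cell = str(text_cell.contents[0]).strip()

# Span: the first child is only the whitespace that precedes the <span>.
first_of_span_cell = str(span_cell.contents[0]).strip()

# stripped_strings walks every descendant string and skips blank ones.
strings_of_span_cell = list(span_cell.stripped_strings)

print(repr(first_of_text_cell))
print(repr(first_of_span_cell))
print(strings_of_span_cell)
```

The second value is an empty string, which is why the `if contenttd` test in the question rejects every cell of the second table.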

It also works for the first table. As Anubhav commented, you should consider iterating over the tables rather than having the same code twice.
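One way to avoid the duplication is sketched below, assuming Python 3 and the "html.parser" backend. The helper name cell_texts and the cut-down markup are hypothetical; only the two class names come from the page in the question.

```python
from bs4 import BeautifulSoup


def cell_texts(table):
    """Return the first non-blank string of every <td> that has one."""
    texts = []
    for row in table.find_all("tr"):
        for col in row.find_all("td"):
            strings = list(col.stripped_strings)
            if strings:
                texts.append(strings[0])
    return texts


# Hypothetical, cut-down markup that reuses the two class names.
html = """
<table class="user-details">
  <tr><td>
      city
  </td><td>
      paris ile-de-france
  </td></tr>
</table>
<table class="secondpart">
  <tr><td>
      <span>pets </span>
  </td><td>
      <span>
          aucun</span>
  </td><td>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Same extraction for each table, keyed by its class name.
results = {}
for class_name in ("user-details", "secondpart"):
    table = soup.find("table", {"class": class_name})
    results[class_name] = cell_texts(table)

print(results)
```

Because the helper relies on stripped_strings, it treats cells with bare text and cells with spans the same way, so one loop covers both tables.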

