java - Create a HashMap with a fixed Key corresponding to a HashSet. point of departure -


My aim is to create a HashMap with a String key, where each entry value is a HashSet of Strings.


Output

This is what the output looks like now:

[hudson+(surname)=[q2720681], hudson,+quebec=[q141445], hudson+(given+name)=[q5928530], hudson,+colorado=[q2272323], hudson,+illinois=[q2672022], hudson,+indiana=[q2710584], hudson,+ontario=[q5928505], hudson,+buenos+aires+province=[q10298710], hudson,+florida=[q768903]]

According to my idea, it should look like this:

[hudson+(surname)=[q2720681,q141445,q5928530,q2272323,q2672022]] 

The purpose is to store a particular name from Wikidata together with all of the Q values associated with its disambiguation. For example:

This is the page for "Bush".

I want Bush to be the key. Then, for all of the different points of departure, that is, all of the different ways that Bush can be associated with a terminal page of Wikidata, I want to store the corresponding "Q value", or unique alphanumeric identifier.

What I'm actually doing is scraping the different names and values from the Wikipedia disambiguation page, and then looking up the unique alphanumeric identifier associated with each value in Wikidata.

For example, with Bush we have:

George H. W. Bush, George W. Bush, Jeb Bush, Bush family, Bush (surname)

Accordingly, the Q values are:

George H. W. Bush (Q23505)

George W. Bush (Q207)

Jeb Bush (Q221997)

Bush family (Q2743830)

Bush (Q1484464)

My idea is that the data structure should be constructed in the following way:

Key: Bush, Entry Set: Q23505, Q207, Q221997, Q2743830, Q1484464

But the code I have now doesn't do that.

It creates a separate entry for each name and Q value, i.e.

Key: Jeb Bush, Entry Set: Q221997

Key: George W. Bush, Entry Set: Q207

and so on.

The full code in all its glory can be seen on my GitHub page, but I'll summarize it below as well.

This is what I'm using to add values to my data structure:

    // Add Q values to their collection in the hash map, under the appropriate entity
    public static HashSet<String> put_to_hash(String key, String value)
    {
        if (!q_valmap.containsKey(key))
        {
            return q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> list = q_valmap.get(key);
        list.add(value);
        return q_valmap.put(key, list);
    }

This is how I fetch the content:

    while ((line_by_line = wiki_data_pagecontent.readLine()) != null)
    {
        // If we can determine it's a disambig page, we need to send it off
        // to get all the possible senses in which it can be used.
        Pattern disambig_pattern = Pattern.compile("<div class=\"wikibase-entitytermsview-heading-description \">Wikipedia disambiguation page</div>");
        Matcher disambig_indicator = disambig_pattern.matcher(line_by_line);
        if (disambig_indicator.matches())
        {
            // Off to get the different usages
            wikipedia_disambig_fetcher.all_possibilities( variable_entity );
        }
        else
        {
            // Get the Q value off the page by matching
            Pattern q_page_pattern = Pattern.compile("<!-- wikibase-toolbar --><span class=\"wikibase-toolbar-container\"><span class=\"wikibase-toolbar-item " +
                    "wikibase-toolbar \">\\[<span class=\"wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit\"><a " +
                    "href=\"/wiki/Special:SetSiteLink/(.*?)\">edit</a></span>\\]</span></span>");

            Matcher match_q_component = q_page_pattern.matcher(line_by_line);
            if ( match_q_component.matches() )
            {
                String q = match_q_component.group(1);

                // 'q' should be appended to a collection, since each entity can hold
                // multiple Q values on the basis of disambiguation
                put_to_hash( variable_entity, q );
            }
        }
    }

And this is how I deal with a disambiguation page:

    public static void all_possibilities( String variable_entity ) throws Exception
    {
        System.out.println("this is a disambig page");
        // If it's a disambig page we know we can go right to Wikipedia

        // Get its normal wiki disambig page
        Document docx = Jsoup.connect( "https://en.wikipedia.org/wiki/" + variable_entity ).get();

        // This can handle the less structured ones.
        Elements linx = docx.select( "p:contains(" + variable_entity + ") ~ ul a:eq(0)" );

        for (Element linq : linx)
        {
            System.out.println(linq.text());
            String linq_nospace = linq.text().replace(' ', '+');
            wikidata_q_reader.getq( linq_nospace );
        }
    }

I was thinking maybe I could pass the key value around, but I really don't know. I'm kind of stuck. Maybe someone can see how I can implement this functionality.

I'm not clear from your question what isn't working, or whether you're seeing actual errors. But while your basic data structure idea (a HashMap of String to Set<String>) is sound, there's a bug in the "add" function.

    public static HashSet<String> put_to_hash(String key, String value)
    {
        if (!q_valmap.containsKey(key))
        {
            return q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> list = q_valmap.get(key);
        list.add(value);
        return q_valmap.put(key, list);
    }

In the case where the key is seen for the first time (if (!q_valmap.containsKey(key))), it vivifies a new HashSet for that key, but it doesn't add value to it before returning. (And the returned value is the old value for that key, so it'll be null.) So you're going to lose one of the Q values for every term.
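To make the effect concrete, here is a small self-contained sketch that copies the original method into a throwaway class and calls it twice for the same key. The class name LostFirstValueDemo and the main method are only for illustration, and the Q values are the ones from the question.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class LostFirstValueDemo {
    static final Map<String, HashSet<String>> q_valmap = new HashMap<>();

    // The original method, unchanged: the first value seen for a key is never stored.
    static HashSet<String> put_to_hash(String key, String value) {
        if (!q_valmap.containsKey(key)) {
            // Creates the empty set and returns the previous mapping, which is null.
            return q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> list = q_valmap.get(key);
        list.add(value);
        return q_valmap.put(key, list);
    }

    public static void main(String[] args) {
        HashSet<String> first = put_to_hash("Bush", "Q23505");
        put_to_hash("Bush", "Q207");
        System.out.println(first);                // null
        System.out.println(q_valmap.get("Bush")); // [Q207]
    }
}
```

The first call only creates the empty set, so Q23505 never reaches the map.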

For multi-layered data structures like this, I usually special-case just the vivification of the intermediate structure, and then do the adding and the return in a single code path. I think this will fix it. (I'm also going to call it valset, because it's a set and not a list. And there's no need to re-add the set to the map each time; it's a reference type and gets added the first time you encounter that key.)

    public static HashSet<String> put_to_hash(String key, String value)
    {
        if (!q_valmap.containsKey(key)) {
            q_valmap.put(key, new HashSet<String>());
        }

        HashSet<String> valset = q_valmap.get(key);
        valset.add(value);
        return valset;
    }
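If Java 8 or later is available (an assumption, since the question doesn't state a version), the same "create on first sight, then add" pattern can be written with Map.computeIfAbsent. This is only a sketch of an alternative spelling. The class name ComputeIfAbsentSketch is made up, and the map is declared here with Set<String> values rather than the HashSet<String> used above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ComputeIfAbsentSketch {
    static final Map<String, Set<String>> q_valmap = new HashMap<>();

    static Set<String> put_to_hash(String key, String value) {
        // Returns the existing set for key, or creates, stores, and returns a new one.
        Set<String> valset = q_valmap.computeIfAbsent(key, k -> new HashSet<>());
        valset.add(value);
        return valset;
    }

    public static void main(String[] args) {
        put_to_hash("Bush", "Q23505");
        Set<String> valset = put_to_hash("Bush", "Q207");
        System.out.println(valset.size()); // 2
    }
}
```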

Also be aware that the Set you return is a reference to the live Set for that key, so you need to be careful about modifying it in callers, and if you're doing multithreading you're going to have concurrent access issues.
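One possible way to address both points, sketched under the assumption of Java 8 or later: keep the map private, hand callers a read-only view instead of the live set, and use the concurrent collections from java.util.concurrent. The class name SafeQValueStore and the get_values method are hypothetical and not part of the original code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SafeQValueStore {
    private static final Map<String, Set<String>> q_valmap = new ConcurrentHashMap<>();

    static void put_to_hash(String key, String value) {
        // ConcurrentHashMap.newKeySet() returns a thread-safe set.
        q_valmap.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(value);
    }

    // Hypothetical accessor: callers can read the values but cannot modify them.
    static Set<String> get_values(String key) {
        Set<String> valset = q_valmap.get(key);
        if (valset == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(valset);
    }

    public static void main(String[] args) {
        put_to_hash("Bush", "Q23505");
        put_to_hash("Bush", "Q207");
        System.out.println(get_values("Bush").size()); // 2
    }
}
```

The returned view still reflects later additions to the underlying set. If callers need a frozen snapshot, copying into a new HashSet before returning is the usual alternative.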

Or just use a Guava Multimap so you don't have to worry about writing the implementation yourself.
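On the question's idea of passing the key around: the separate entries appear because each disambiguation link is looked up and stored under its own title. A rough sketch of the alternative is to carry the root key (for example "Bush") through the lookup and store every Q value under it. Everything here is hypothetical: record_senses, the RootKeySketch class, and the q_by_title map, which stands in for the real Wikidata fetch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RootKeySketch {
    static final Map<String, Set<String>> q_valmap = new HashMap<>();

    static void put_to_hash(String key, String value) {
        q_valmap.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    // Hypothetical: every Q value found for a sense is stored under root_key,
    // not under the sense's own page title.
    static void record_senses(String root_key, List<String> sense_titles,
                              Map<String, String> q_by_title) {
        for (String title : sense_titles) {
            String q = q_by_title.get(title); // stand-in for the Wikidata fetch
            if (q != null) {
                put_to_hash(root_key, q);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> q_by_title = new HashMap<>();
        q_by_title.put("George W. Bush", "Q207");
        q_by_title.put("Jeb Bush", "Q221997");
        record_senses("Bush", Arrays.asList("George W. Bush", "Jeb Bush"), q_by_title);
        System.out.println(q_valmap.keySet());           // [Bush]
        System.out.println(q_valmap.get("Bush").size()); // 2
    }
}
```

In the original code this would roughly correspond to giving the Q-value lookup a second parameter for the key to store under, instead of reusing the link text as the key.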

