Java - Create a HashMap with a fixed key corresponding to a HashSet
Point of departure
My aim is to create a HashMap with a String key whose entry values are a HashSet of Strings.
Output
This is what the output looks like now:
hudson+(surname)=[q2720681], hudson,+quebec=[q141445], hudson+(given+name)=[q5928530], hudson,+colorado=[q2272323], hudson,+illinois=[q2672022], hudson,+indiana=[q2710584], hudson,+ontario=[q5928505], hudson,+buenos+aires+province=[q10298710], hudson,+florida=[q768903]]
According to my idea, it should look like this:
[hudson+(surname)=[q2720681,q141445,q5928530,q2272323,q2672022]]
The purpose is to store a particular name in Wikidata and then all of the Q values associated with its disambiguation. For example:
This is the page for "Bush".
I want Bush to be the key, and then for all of the different points of departure, that is, all of the different ways that Bush could be associated with a terminal page of Wikidata, I want to store the corresponding "Q value", or unique alpha-numeric identifier.
What I'm doing is trying to scrape the different names and values from the Wikipedia disambiguation page, and then look up the unique alpha-numeric identifier associated with each value in Wikidata.
For example, with Bush we have:
George H. W. Bush
George W. Bush
Jeb Bush
Bush family
Bush (surname)
Accordingly, the Q values are:
George H. W. Bush (Q23505)
George W. Bush (Q207)
Jeb Bush (Q221997)
Bush family (Q2743830)
Bush (Q1484464)
My idea is that the data structure should be constructed in the following way:
Key: Bush
Entry set: Q23505, Q207, Q221997, Q2743830, Q1484464
But the code I have doesn't do that.
It creates a separate entry for each name and Q value, i.e.
Key: Jeb Bush
Entry set: Q221997
Key: George W. Bush
Entry set: Q207
And so on.
The full code in all its glory can be seen on my GitHub page, but I'll summarize it below as well.
This is what I'm using to add values to my data structure:
    // Add Q values to the collection in the hash map at the index of the appropriate entity
    public static HashSet<String> put_to_hash(String key, String value) {
        if (!q_valmap.containsKey(key)) {
            return q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> list = q_valmap.get(key);
        list.add(value);
        return q_valmap.put(key, list);
    }
This is how I fetch the content:
    while ((line_by_line = wiki_data_pagecontent.readLine()) != null) {
        // If we can determine it's a disambig page, we need to send it off
        // to get the possible senses in which it can be used.
        Pattern disambig_pattern = Pattern.compile("<div class=\"wikibase-entitytermsview-heading-description \">Wikipedia disambiguation page</div>");
        Matcher disambig_indicator = disambig_pattern.matcher(line_by_line);
        if (disambig_indicator.matches()) {
            // Off to get the different usages
            wikipedia_disambig_fetcher.all_possibilities(variable_entity);
        } else {
            // Get the Q value off the page by matching
            Pattern q_page_pattern = Pattern.compile("<!-- wikibase-toolbar --><span class=\"wikibase-toolbar-container\"><span class=\"wikibase-toolbar-item "
                    + "wikibase-toolbar \">\\[<span class=\"wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit\"><a "
                    + "href=\"/wiki/Special:SetSiteLink/(.*?)\">edit</a></span>\\]</span></span>");
            Matcher match_q_component = q_page_pattern.matcher(line_by_line);
            if (match_q_component.matches()) {
                String q = match_q_component.group(1);
                // 'q' should be appended to a collection, since each entity can hold
                // multiple Q values on the basis of disambig
                put_to_hash(variable_entity, q);
            }
        }
    }
And this is how I deal with a disambiguation page:
    public static void all_possibilities(String variable_entity) throws Exception {
        System.out.println("This is a disambig page");
        // If it's a disambig page, we know we can go right to Wikipedia
        // and get its normal wiki disambig page
        Document docx = Jsoup.connect("https://en.wikipedia.org/wiki/" + variable_entity).get();
        // This can handle the less structured ones.
        Elements linx = docx.select("p:contains(" + variable_entity + ") ~ ul a:eq(0)");
        for (Element linq : linx) {
            System.out.println(linq.text());
            String linq_nospace = linq.text().replace(' ', '+');
            wikidata_q_reader.getq(linq_nospace);
        }
    }
I was thinking maybe I could pass the key value around, but I don't know. I'm kind of stuck. Maybe someone can see how I can implement this functionality.
I'm not clear from your question what isn't working, or whether you're seeing actual errors. But while your basic data structure idea (a HashMap of String to Set<String>) is sound, there's a bug in the "add" function.
    public static HashSet<String> put_to_hash(String key, String value) {
        if (!q_valmap.containsKey(key)) {
            return q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> list = q_valmap.get(key);
        list.add(value);
        return q_valmap.put(key, list);
    }
In the case where a key is seen for the first time (if (!q_valmap.containsKey(key))), it vivifies a new HashSet for that key, but it doesn't add value to it before returning. (And the returned value is the old value for that key, so it'll be null.) So you're going to be losing one of the Q values for every term.
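To see the lost value concretely, here is a minimal sketch that reuses the same logic as the original function. The class name LostFirstValueDemo, the method name buggyPut, and the field qValMap are hypothetical stand-ins, not names from the asker's code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LostFirstValueDemo {
    // Hypothetical stand-in for the q_valmap field from the question.
    static final Map<String, Set<String>> qValMap = new HashMap<>();

    // Same logic as the original put_to_hash: the first call for a key
    // stores an empty set and returns without ever adding the value.
    static Set<String> buggyPut(String key, String value) {
        if (!qValMap.containsKey(key)) {
            return qValMap.put(key, new HashSet<>());
        }
        Set<String> list = qValMap.get(key);
        list.add(value);
        return qValMap.put(key, list);
    }

    public static void main(String[] args) {
        Set<String> first = buggyPut("bush", "Q23505"); // value is dropped here
        buggyPut("bush", "Q207");                       // only this one is stored
        System.out.println(first);               // prints: null
        System.out.println(qValMap.get("bush")); // prints: [Q207]
    }
}
```

The first Q value passed for each key never reaches the set, which matches the "losing one per term" symptom.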
For multi-layered data structures like this, I usually special-case just the vivification of the intermediate structure, and then do the adding and the return in a single code path. I think this would fix it. (I'm also going to call it valset because it's a set and not a list. And there's no need to re-add the set to the map each time; it's a reference type and gets added the first time you encounter that key.)
    public static HashSet<String> put_to_hash(String key, String value) {
        if (!q_valmap.containsKey(key)) {
            q_valmap.put(key, new HashSet<String>());
        }
        HashSet<String> valset = q_valmap.get(key);
        valset.add(value);
        return valset;
    }
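On Java 8 or later, the same vivify-then-add pattern can be written with Map.computeIfAbsent. The sketch below is only one possible shape: the class QValueIndex and its method names are hypothetical, and LinkedHashSet is an assumption made here so the printed order is predictable (a plain HashSet works the same way otherwise).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class QValueIndex {
    private final Map<String, Set<String>> qValMap = new HashMap<>();

    // Create the set only when the key is new, then add on a single code path.
    Set<String> put(String key, String value) {
        Set<String> valset = qValMap.computeIfAbsent(key, k -> new LinkedHashSet<>());
        valset.add(value);
        return valset;
    }

    // Returns the set for a key, or an empty set if the key is unknown.
    Set<String> get(String key) {
        return qValMap.getOrDefault(key, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        QValueIndex index = new QValueIndex();
        index.put("bush", "Q23505");
        index.put("bush", "Q207");
        index.put("bush", "Q207"); // duplicate, ignored by the set
        System.out.println(index.get("bush")); // prints: [Q23505, Q207]
    }
}
```

Note that the values only end up under one key if every call passes that same key, so the lookups for a disambiguation page's entries would need to use the original term rather than each entry's own title.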
Also be aware that the Set you return is a reference to the live Set for that key, so you need to be careful about modifying it in callers, and if you're doing multithreading you're going to have concurrent access issues.
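If either concern applies, one possible approach (a sketch under assumptions, not the only option) is to keep the sets in a ConcurrentHashMap and hand callers a read-only view. The class SafeQValueIndex and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SafeQValueIndex {
    private final ConcurrentHashMap<String, Set<String>> qValMap = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic on ConcurrentHashMap, and newKeySet()
    // returns a set that tolerates concurrent adds.
    void put(String key, String value) {
        qValMap.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(value);
    }

    // Callers get an unmodifiable view, so they cannot change the live set.
    Set<String> get(String key) {
        Set<String> valset = qValMap.get(key);
        if (valset == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(valset);
    }

    public static void main(String[] args) {
        SafeQValueIndex index = new SafeQValueIndex();
        index.put("bush", "Q207");
        try {
            index.get("bush").add("Q1"); // rejected by the read-only view
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view"); // prints: read-only view
        }
    }
}
```

The view still reflects later additions made through put; copy the set instead if callers need a stable snapshot.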
Or just use a Guava Multimap so you don't have to worry about writing the implementation yourself.