Thesis abstract:
Top-K join, also known as rank join, computes the K join results with the highest aggregate scores from the data sources. It manages to report these K results by accessing only a subset of the data, provided the data sources return their data in sorted order. This thesis addresses the problem of computing top-K join results from Web-based data sources, with the main focus on Web services as data sources. The main issue with such Web-based data sources is the time spent on data acquisition. Therefore, in this work we propose the efficient use of parallel data access to reduce the total time needed to compute the desired number of top join results, while minimizing the number of extra data fetches. We also address how already computed join results can be reported to the user efficiently using probabilistic measures, and we find that reporting results with probabilistic guarantees is, on average, much faster than deterministic reporting, at the cost of a small compromise in the quality of the results.
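To make the idea of reporting the top K join results from only a prefix of each sorted source concrete, the following is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the method proposed in the thesis: it reads the two sources in round-robin order rather than in parallel, it assumes the aggregate score is the sum of the two tuple scores, and the name `rank_join` and the sample data are hypothetical. A joined result is reported only once its score is at least as high as the best score that any not yet fetched tuple could still produce.

```python
import heapq
import math


def rank_join(left, right, k):
    """Return (results, fetched) for the k best joins by summed score.

    left, right: lists of (join_key, score), each sorted by score descending.
    results: list of (join_key, total_score), best first.
    fetched: number of tuples read from each source.
    """
    sources = [left, right]
    if not left or not right:
        return [], [0, 0]

    pos = [0, 0]          # tuples fetched so far from each source
    seen = [{}, {}]       # join_key -> scores already fetched, per source
    candidates = []       # max-heap of joined results (scores negated)
    results = []
    turn = 0              # source to read next (round-robin, an assumption)

    def exhausted(i):
        return pos[i] >= len(sources[i])

    def top(i):           # best score of source i, unknown until first fetch
        return sources[i][0][1] if pos[i] > 0 else math.inf

    def last(i):          # most recently fetched score of source i
        return sources[i][pos[i] - 1][1] if pos[i] > 0 else math.inf

    def bound(i):         # best total an unseen tuple of source i could reach
        return -math.inf if exhausted(i) else last(i) + top(1 - i)

    while len(results) < k:
        if exhausted(turn):
            turn = 1 - turn
        if not exhausted(turn):
            key, score = sources[turn][pos[turn]]
            pos[turn] += 1
            seen[turn].setdefault(key, []).append(score)
            # Join the new tuple with matching tuples already seen on the other side.
            for other in seen[1 - turn].get(key, []):
                heapq.heappush(candidates, (-(score + other), key))
            turn = 1 - turn

        # No future join result can score above this threshold.
        threshold = max(bound(0), bound(1))
        while candidates and len(results) < k and -candidates[0][0] >= threshold:
            neg_total, key = heapq.heappop(candidates)
            results.append((key, -neg_total))

        if exhausted(0) and exhausted(1):
            break

    return results, pos


# Hypothetical sample data: (join_key, score), sorted by score descending.
LEFT = [("a", 90), ("b", 80), ("c", 30), ("d", 10)]
RIGHT = [("b", 95), ("a", 70), ("d", 20), ("c", 10)]

results, fetched = rank_join(LEFT, RIGHT, 1)
```

With this sample data and K = 1, the best join result `("b", 175)` is confirmed after fetching only two tuples from each source, which is the saving in data accesses that the sorted-order assumption makes possible.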