Custom Output Columns #7
Hi @amoslai5128, thank you for your kind words about my work and for your question! First, I want to make sure I understand your issue. You are trying to match the points one by one, so that for every original point you get the final matched point in the output without losing the original columns, right?

Technically, this is not possible, because a match, which is a concatenation of road network edges, can contain more or fewer points than the original track. For example, curves contain more points and straight lines contain fewer points than the original track. So how would you output exactly one output point for each input point in such a situation? Here is an example of what I mean:

As you can see, in the curve section in the middle of the track, the match contains many more points than the original track, so for one input point, multiple output points would exist. Furthermore, in the left part of the track on the straight line, the original track contained more points than the match in one part, but the prepared track contained fewer points.

Let me elaborate on this: Currently, the output (the match) is given as a linestring, and because a 1-to-1 mapping is not feasible, the time and speed information is omitted. It would not make much sense anymore in the match, as the points are completely different from the input points, so the correlation would no longer be correct.

Now for the first good part: The trajectory simplification can be completely disabled by applying […]. With the trajectory simplification disabled (or by implementing a trajectory simplification algorithm that does not omit the time information), it would technically be possible to retain the time information throughout the process. However, the first issue with the example image above cannot be solved this way. Outputting the result as a list of points would still not correlate 1-to-1 with the input points, and it would be entirely unclear where to put which timestamp and speed information.
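To make the point-count mismatch concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that each road network edge is a list of (x, y) vertices and that consecutive edges share their junction vertex; the helper `concat_edges` and all coordinates are hypothetical and not taken from this software.

```python
# Hypothetical illustration: the match is assembled from road network edge
# geometries, so its vertex count depends on the edges, not on the GPS track.
XY = tuple[float, float]


def concat_edges(edges: list[list[XY]]) -> list[XY]:
    """Concatenate edge polylines into one polyline, dropping the shared
    junction vertex where one edge ends and the next one begins."""
    match: list[XY] = []
    for edge in edges:
        if match and edge and match[-1] == edge[0]:
            match.extend(edge[1:])  # skip the duplicated junction vertex
        else:
            match.extend(edge)
    return match


# Three recorded GPS points (hypothetical) ...
track = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
# ... matched onto two edges, the first of which is a curve with many vertices.
edges = [
    [(0.0, 0.0), (1.0, 2.0), (2.5, 3.8), (4.0, 4.8), (5.0, 5.0)],
    [(5.0, 5.0), (10.0, 0.0)],
]
match = concat_edges(edges)
print(len(track), len(match))  # prints: 3 6
```

The three track points end up as six match vertices, and a different edge geometry would change that number again, independently of how many GPS points were recorded.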
It would be possible to interpolate the timestamp and speed values, but this would also be an additional assumption (the match itself already is an assumption; we cannot know whether it is correct with respect to the ground truth). Nevertheless, there is information about which point of the match each input point was mapped to. This result can be obtained by applying […]. Here is an example of the […]:

When you disable the trajectory simplification completely, you do in fact get exactly one output point of the match in the candidates.csv file for each input point of the original track, although with worse matching quality, as already explained. Moreover, as we have seen, these points are not necessarily contained in the match. But you get a "mapping" for each point that shows where that point was placed in the match. However, keep in mind that the match (in green) generally has a different number of points than the original track.

With that result, you could theoretically join the results from the candidates.csv file with your input file and append the original timestamp and speed columns. Since no points were removed during the matching, you can simply concatenate the tables, provided that the input table is sorted by id and timestamp (the timestamp is used as the sorting criterion when importing points for matching) and the candidates table is sorted by id, part_index, set_index, and candidate_index. The result of this can look like this:

The code that was used to combine the data is here:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

input_file = "points.csv"
output_file = "merged.csv"
candidates_file = "candidates.csv"

points = pd.read_csv(input_file, parse_dates=["time"])
candidates = pd.read_csv(candidates_file)

# Keep only the candidates that were selected for the final match.
results = candidates[candidates["is_result"] == 1].copy()

# Bring both tables into the same order; the timestamp is the sorting
# criterion used when the points are imported for matching.
points.sort_values(by=["id", "time"], inplace=True)
points.reset_index(drop=True, inplace=True)  # align by position, not by the old index
results.sort_values(by=["id", "part_index", "set_index", "candidate_index"], inplace=True)
results.reset_index(drop=True, inplace=True)

# Only valid when no point was dropped (trajectory simplification disabled).
assert len(points) == len(results)

projections = gpd.GeoDataFrame(results.drop(columns=["projection"]),
                               geometry=gpd.GeoSeries.from_wkt(results["projection"]))

# A projection line runs from the original point (coords[0]) to its position
# on the match (coords[1]); geometries with a single coordinate yield NaN.
projections["pointing_coordinate"] = projections.geometry.apply(
    lambda x: Point(x.coords[1]) if len(x.coords) > 1 else np.nan)

points["pointing_coordinate"] = projections["pointing_coordinate"]
points.to_csv(output_file, index=False)
```

As output, you get a file in which each input point is paired with its output point from the match. With trajectory simplification enabled, this becomes practically impossible, because the median-merge algorithm may introduce new points that were not in the original input track (although this only happens in small, noisy sections), so a 1-to-1 mapping is no longer available. You can try to re-map the prepared track points to the original points by distance, but this might not work well when the track passes through the same position multiple times. You can of course also omit these points and join only the points that were also present in the original track (x.coords[0] contains the starting point of the blue segments, which should lie close to the original point), but this needs additional care because of floating-point inaccuracies between the original input point and the output point in the candidates.csv file.

Please use the new version v1.0.5 if you want to try this out with disabled trajectory simplification. I noticed a bug while preparing the above example and fixed it before posting this reply.

So, TL;DR: We cannot output the original timestamps and speed (or any other columns) from the original track input in the match output, because the input and output differ too much in the number and position of points. There are, however, ways to combine the input and output with certain sacrifices in accuracy and speed, as I have shown above. As this is a very specific and individual use case, I currently don't know how this could be implemented directly in this software in a way that works generally, for all the reasons I have given. But there are workarounds with individual code, as you can see, and the outputs can of course be combined later in other tools. I hope you see it the same way as I do. If not, please let me know!
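The tolerance-based join mentioned above can be sketched as follows. This is a minimal, hypothetical helper (`align_by_tolerance` is not part of this software), assuming both the original points and the x.coords[0] start points are available as (x, y) tuples in track order and that a fixed distance tolerance is acceptable. Scanning forward only, instead of searching for the globally nearest point, is a design choice meant to avoid mapping a revisited location back to an earlier pass.

```python
import math
from typing import Optional

XY = tuple[float, float]


def align_by_tolerance(original: list[XY], starts: list[XY],
                       tol: float = 1e-6) -> list[Optional[int]]:
    """For each start point (x.coords[0] of a projection), return the index of
    the original point it coincides with within `tol`, or None if no remaining
    original point matches (e.g. a point introduced by median-merge).

    Both lists must be in track order. The scan only moves forward, so a track
    that revisits a location is not matched back to an earlier pass.
    """
    result: list[Optional[int]] = []
    i = 0  # first original point that may still be matched
    for s in starts:
        j = i
        while j < len(original) and math.dist(original[j], s) > tol:
            j += 1
        if j < len(original):
            result.append(j)
            i = j + 1  # originals before j were dropped or already used
        else:
            result.append(None)
    return result


# Hypothetical data: the track returns to (0, 0) at the end, (1.0, 0.5) was
# dropped, and (1.5, 0.2) was introduced by the simplification.
original = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (2.0, 0.0), (0.0, 0.0)]
starts = [(0.0, 0.0), (1.0 + 1e-10, 0.0), (1.5, 0.2), (2.0, 0.0), (0.0, 0.0)]
print(align_by_tolerance(original, starts))  # prints: [0, 1, None, 3, 4]
```

Start points that match no original point are returned as None, and original points that were dropped are simply skipped; the tolerance has to be chosen to fit the coordinate system of the data.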
Hi, thank you for clarifying your questions! I hope that I understand them better now, and I will try to respond to them. First, some more comments on your statements:

Concerning your first picture, you get the information from the original geo point to the matched geo point from the […]

Concerning "dropped original points", there are currently two ways in which an original point is "dropped". The first way is via the trajectory simplification algorithms (filter-duplicates, simplify-track, and median-merge). These actually drop points, and median-merge may introduce new ones, as explained in section 2.1 of my article: https://doi.org/10.1111/tgis.13107. See for example this old image from this repository (it shows a track from the Kubicka et al. dataset): […]

So "dropping original points" happens in two ways: one explicit way in the trajectory simplification, and one stochastic, implicit way in the actual matching optimization algorithm, backed by the candidate adoption feature. You can turn off each of these features individually, not only trajectory simplification but also candidate adoption! Then no "dropping" occurs at all. However, deactivating both features drastically reduces the matching quality (and, depending on the options, also the speed), as can also be seen in the benchmark results in section 3.3 of my article: https://doi.org/10.1111/tgis.13107. So the […]

One additional note: The points that you named "generated" are, in this software, in fact referenced from the road network edges. The "improved" points are computed points on these edges (an existing point might also be reused if it is spatially very close); they are contained in the projection column in the candidates.csv file, but they are often removed in the final match (the […])

Concerning your questions:

Q1: I think the only valid way is the one that I explained in my previous post. The Python code outputs the input file with an added column containing the green match points (pointing_coordinate).
This should fulfil your needs, as you have the original red points, the new green match points, and the time and speed information all aligned and correlated in one file. It only works correctly when you disable all trajectory simplification, as explained. To elaborate further: as soon as you use the trajectory simplification algorithms, the time and speed information is dropped in a way that can no longer be restored correctly, as explained in my previous post. I recommend that you follow the example from my previous post and combine the file that you can generate with the […]

Q2: You can use the […]

I hope that I could help you further! Your concepts are very interesting, and I can see all of your ideas working in practice, although in some cases sacrifices in speed and/or quality are needed. The general use case of this software is to find the most accurate match for a given track, so use cases like yours, while very interesting, need a deeper dive and some extra work. That said, I had ongoing research in mind, which is why all of the special options that I explained here (and more) exist. Please let me know if I can be of further help. I hope you understand that passing the time/speed information directly to the result within this software is currently out of scope for what I can do, because of the challenges with the trajectory simplification algorithms, which were designed with spatial but not temporal map matching in mind.
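If a timestamp is needed for every vertex of the match rather than only for the projected input points, one option is linear interpolation along the match, which, as said, is an additional assumption. Below is a minimal NumPy sketch, assuming the match vertices are available as an (n, 2) array in driving order, the distance of each projected input point along the match has already been computed (for example with Shapely's `LineString.project`), and timestamps are plain numbers such as epoch seconds; `interpolate_times` is a hypothetical helper, not part of this software.

```python
import numpy as np


def interpolate_times(match_xy: np.ndarray, point_s: np.ndarray,
                      point_t: np.ndarray) -> np.ndarray:
    """Linearly interpolate a timestamp for every vertex of the match.

    match_xy: (n, 2) array of match vertices in driving order.
    point_s:  distance along the match of each projected input point, in track order.
    point_t:  timestamps of the input points as floats (e.g. epoch seconds).
    """
    # Cumulative arc length at each vertex of the match.
    seg = np.linalg.norm(np.diff(match_xy, axis=0), axis=1)
    vertex_s = np.concatenate(([0.0], np.cumsum(seg)))

    # np.interp needs increasing sample positions; a track that passes the
    # same place twice can break this, so reject it instead of guessing.
    if np.any(np.diff(point_s) < 0):
        raise ValueError("positions along the match must be non-decreasing")

    # Drop repeated positions (e.g. a stationary vehicle), keeping the first.
    s_unique, first = np.unique(point_s, return_index=True)

    # Vertices outside the covered range get the first/last timestamp (clamped).
    return np.interp(vertex_s, s_unique, point_t[first])


match_xy = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])
point_s = np.array([0.0, 1.0, 7.0])
point_t = np.array([100.0, 110.0, 170.0])
print(interpolate_times(match_xy, point_s, point_t))  # prints: [100. 130. 170.]
```

Because `np.interp` clamps instead of extrapolating, vertices before the first or after the last projected point simply inherit the nearest known timestamp.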
Thank you for giving us this amazing map matching tool; it works great.
However, I'm trying to figure out how to customize the headers and columns in the output CSV, since the current output headers are:
id,aborted,duration,track,prepared,match,track_length,prepared_length,match_length
In some cases, I need to match the geo points from the GPS records together with their timestamps in the result, so here is what I want to output:
id, aborted, duration, ..., *point_timestamp*, *speed*
* custom column header
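Since the software itself does not emit these per-point columns in its result CSV, one possible workaround is to attach them afterwards. Below is a minimal pandas sketch, assuming a per-track table keyed by `id` (such as the result CSV) and a per-point table with `id`, `time`, and `speed` columns; the column names are taken from the desired output above and are otherwise assumptions, and `add_point_columns` is a hypothetical helper. The timestamps and speeds are those of the original GPS points, not of the match points.

```python
import pandas as pd


def add_point_columns(tracks: pd.DataFrame, points: pd.DataFrame) -> pd.DataFrame:
    """Append per-track lists of point timestamps and speeds.

    tracks: one row per track, keyed by `id`.
    points: one row per GPS point with columns `id`, `time`, `speed`.
    """
    per_track = (
        points.sort_values(["id", "time"])  # keep the points in time order
        .groupby("id")
        .agg(point_timestamp=("time", list), speed=("speed", list))
        .reset_index()
    )
    # Left join so that tracks without points are kept (with NaN lists).
    return tracks.merge(per_track, on="id", how="left")


tracks = pd.DataFrame({"id": [1, 2], "aborted": [False, False], "duration": [0.5, 0.7]})
points = pd.DataFrame({"id": [1, 1, 2], "time": [20, 10, 30], "speed": [8.0, 5.0, 9.5]})
print(add_point_columns(tracks, points))
```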
For your reference, here's the input dataset: