Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: implementation of non-contiguous group lasso penalty GroupL1nc #94

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

matteofrigo
Copy link

@matteofrigo matteofrigo commented Mar 29, 2021

This PR adds a new class to the copt.penalty module which defines a non-contiguous version of the existing GroupL1 penalty. The main differences between the two are:

  • Parsing of the input. GroupL1nc requires just non-overlapping groups.
  • Definition of B matrix adapted to non-contiguous setting (it's no longer block-diagonal).
  • Evaluation of _prox_gl adapted to non-contiguous setting.
  • Result of simple test adapted to new block matrix.

Minor style/PEP8 improvements have been applied to the modified files.

I've still got a couple of concerns on which I would like to have your opinion @fabianp .

  • The B matrix in prox_factory that I implemented in GroupL1nc is slightly different with respect to the one of GroupL1. The difference is in how the "forgotten" features are encoded in B. Here's a small example. Let n_features = 5 and groups = [(0, 1), (3, 4)], then the B matrix for the two methods is the following:
b_GroupL1 = [[1.0, 1.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, -1.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0, 1.0]]

b_GroupL1nc = [[1.0, 1.0, 0.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0, 1.0, 1.0],
                           [0.0, 0.0, -1.0, 0.0, 0.0]]

I made a few tests and it seems to me that this does not change the result, but I don't have a proof for it. What happens is that we reorder the groups by putting all the "-1" singletons after the groups. Are you aware of any reason why it shouldn't work?

  • My second concern is about the group structure. Are we good with accepting only non-overlapping groups or do we want a more general code that accepts overlapping groups or even tree structures?

After solving these two points we can think of merging GroupL1 and GroupL1nc, as the latter trivially generalizes the former.

This PR addresses #20 .

This commit adds a new class to the copt.penalty module which defines
a non-contiguous version of the existing GroupL1 penalty. The main
differences between the two are:
- Parsing of the input. GroupL1nc requires just non-overlapping groups.
- Definition of block matrix adapted to non-contiguous setting.
- Evaluation of _prox_gl adapted to non-contiguous setting.
- Result of simple test adapted to new block matrix.

Minor style/PEP8 improvements have been applied to the modified files.
@coveralls
Copy link

coveralls commented Mar 29, 2021

Coverage Status

Coverage increased (+0.2%) to 88.959% when pulling be2c4e9 on matteofrigo:groupL1-noncontiguous into e5cb995 on openopt:master.

@fabianp
Copy link
Member

fabianp commented Mar 29, 2021

Thanks for the contribution! This already looks great.

I agree that it would be nice to have GroupL1 and GroupL1nc merged as a single object.

I agree with you that intuitively the order in which the forgotten groups appear should not matter for the penalty. This might change if we allow to have a different per-group regularization, but that's not the case yet.

This last part makes me wonder if we should weight the group regularization by the size of the group, as done for instance in Eq. (1) in https://statweb.stanford.edu/~tibs/ftp/sparse-grlasso.pdf . I think this makes a lot of sense when groups have very different sizes. Do you have an opinion on this @matteofrigo ? Of course this is somewhat orthogonal to your pull request, so feel free to ignore this point or postpone for a later contribution

@fabianp
Copy link
Member

fabianp commented Mar 29, 2021

As for the second concern, I would stop at non-overlapping groups, at least for the pull request. The code for more general objects would be significantly different, and which point I would split it in a different object.

This commit merges the GroupL1 classes for contiguous and non-contiguous
groups, adding the possibility to apply group-specific penalties.

The commit also adds the necessary tests and improves the existing ones.

Minor PEP8 issues are fixed.
@matteofrigo
Copy link
Author

I added the possibility to define group-specific penalties and some more general tests. Also, the two classes are merged back to a single one (GroupL1).

For the moment I added the possibility to use group-wise weights that are like the ones in the paper you mentioned (sqrt(len(g))), it's inverse, and custom weights passed by the user. By default it uses a weight equal to 1 for each group. Anything else that you think it should go there @fabianp ?

@matteofrigo
Copy link
Author

matteofrigo commented Apr 2, 2021

It seems like my changes to how the B matrix is built broke everything. I'll look into it. For the dense matrices case it works, anyway.

This commit adds the OGroupL1 class for the overlapping groups case.
The class is independent w.r.t. GroupL1 as the computation of the
proximal operator in the sparse case is quite delicate to deal with.

The corresponding tests have been adapted and extended where possible.
@matteofrigo
Copy link
Author

The problem was in a test where the group sparsity was used with overlapping groups. I added a separate class for that case to keep the two worlds separate as @fabianp suggested in #94 (comment) .

As for the second concern, I would stop at non-overlapping groups, at least for the pull request. The code for more general objects would be significantly different, and which point I would split it in a different object.

To summarise, here's what we did with this PR until now:

  • Separated overlapping- (OGroupL1) and non-overlapping-groups (GroupL1) cases;
  • Generalised GroupL1 to non-contiguous groups;
  • Improved tests for GroupL1;
  • Adapted tests for overlapping groups;
  • Fixed minor PEP8 issues.

Anything to add?

Copy link
Member

@fabianp fabianp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @matteofrigo ! I'm happy to merge this without the overlapping group lasso part.

I think that the overlapping GL part still requires some discussion that is better to have in a different PR. If you remove that part (leaving it for example for a separa PR) I'm happy to merge this one. Thanks!

copt/penalty.py Outdated
@@ -81,7 +195,6 @@ def __call__(self, x):
def prox(self, x, step_size):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct prox for the overlapping group lasso penalty. The prox of this penalty doesn't have a closed form expression but instead requires an iterative scheme (see for example https://papers.nips.cc/paper/2011/file/03c6b06952c750899bb03d998e631860-Paper.pdf or https://www.researchgate.net/profile/Guillaume-Obozinski/publication/221346424_Group_Lasso_with_overlap_and_graph_Lasso/links/0a85e53b2c72e3d517000000/Group-Lasso-with-overlap-and-graph-Lasso.pdf)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the OGroupL1 class. Thanks for the references!

The current implementation of the overlapping group lass is incorrect.
This commit removes it from the codebase.
The corresponding tests are also removed, with the exception of a test
for the VRTOS solver involving overlapping group lasso penalties, which
is now commented.

Minor PEP8 issues are solved.
@@ -188,37 +188,37 @@ def test_gl(groups):
assert np.linalg.norm(grad_map) < 1e-6


def test_vrtos_ogl():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removing the test? This is testing an overlapping group lasso model by adding to proximal functions (for solvers that allow this)

@fabianp
Copy link
Member

fabianp commented Apr 7, 2021

Thanks Matteo. We're almost there, but there's a test that has been commented out while I think its important to keep it, as it's testing a valid implementation of the overlapping group lasso

@matteofrigo
Copy link
Author

Sorry @fabianp , I've been offline for some days.

I removed the test because I have misunderstood the suggestion to avoid the overlapping groups case.

Speaking about that test, I'm not able to make it pass. I haven't been able to track the bug to its root, but it seems to me that the new definition of the block matrix (in the penalty) changes the definition of the support matrix in a way that makes it wrong. Notice that the only actual difference with the old implementation of the penalty is the order in which we treat the groups. Do you have any low-level way to debug and test the prox_factory function? I haven't found anything except from testing the correctness of the block matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants