OpenGL 2.x support #1

aloznat · 2013-06-08T03:46:17Z

This might be a long shot - more a question than a bug - but how much work would it take to be able to advertise GL 2.x support? I have no idea how many of the extensions missing from OpenGL ES 2 are needed for GL 2, let alone what it'll take to implement them, but from what little I've read - specifically about the Intel GMA950 drivers going from only supporting GL 1.5 to 2.1 - coding support for occlusion queries might make this possible.

Just a hunch from a quick review of the code. I wish I understood half of it...

robclark · 2013-06-08T12:14:52Z

adreno a2xx lacks integer support for shaders, so it would never be able to support enough for gl2 glsl.. a3xx does support integers so that should be possible, although I'm not entirely sure about occlusion query.. that does seem to be supported in gles3 and the hw is gles3 capable but I don't have gles3 drivers yet to see how that works. I'm not sure if there might be something else missing for full gl2 support.

aloznat · 2013-06-08T12:43:35Z

I'm going to assume there's no point doing a float<->int conversion hack (if such a thing is possible) for a2xx since that type of thing is usually very expensive... As for the other features, I guess I'll find out when I tell the driver to lie about its capabilities :P

robclark · 2013-06-08T13:20:07Z

the problem is no bitwise operations, so float<->int conversion isn't sufficient.

You can lie about the caps, it will work up until the point where a shader triggers use of some TGSI instructions that can't be supported. But I ran into a problem w/ some intrinsics compiling into things I can't support without int if I advertise a higher GLSL version.

aloznat · 2013-06-19T01:43:28Z

Well I managed to get glxinfo to report OpenGL 2.1 by adding stub occlusion query support (pretty much copying from the i915 drivers). It should surprise no one that glxgears rendered crescents instead of gears at under 15FPS.
I think I satisfied my curiosity :P

freedreno-zz · 2013-06-19T02:01:05Z

hmm, at least for GL1.4, glxgears needed TGSI_OPCODE_LIT, which I hadn't implemented yet (although es2gears which uses glsl shaders works ok). 15fps seems like it might be software fallback? For es2gears, on HP touchpad, iirc it was around ~100+ fps, nexus4 (a320) is somewhat faster at ~400fps. And I'm sure in both cases there is lots of room for optimization.

aloznat · 2013-06-19T02:16:00Z

I'm pretty sure it's running freedreno and not swrast (OpenGL vendor: freedreno, OpenGL render string: Gallium 0.4 on FD001)... Either way, it seems to be doing most of the work on the CPU, so there's definitely a software fallback happening somewhere.

freedreno-zz · 2013-06-20T00:51:20Z

fwiw, here are the configure flags that I use:

./autogen.sh --prefix=/usr --with-dri-drivers= --with-gallium-drivers=freedreno,swrast --with-egl-platforms=x11 --enable-gles2 --enable-gles1 --enable-debug --enable-gallium-egl --disable-gallium-llvm --enable-xa

aloznat · 2013-06-20T01:32:39Z

My only significant deviation from those flags is I don't think I enabled xa. It probably won't make much of a difference. Real occlusion queries would be nice, but I figured there's no harm in sharing dumb little hacks like this. I sorted out the segfault, and I thought I'd mention es2gears runs at 60fps in near-fullscreen on my TouchPad. I'm crossing my fingers for the rumored Snapdragon Nexus 7...

aloznat · 2013-06-20T01:43:16Z

Yeah, unless you want my stub occlusion query hack (unlikely since you could probably implement it for real in less time than it took me), and/or have another use for my limited skills, you may close this ticket, being that it was more of a feature request than a bug report,

freedreno-zz · 2013-06-20T02:03:03Z

fwiw, the a3xx should support occlusion query, and I suspect the a2xx does as well. But, as far as I can tell the gles2 blob drivers do not implement any occlusion related extension, and I think we only get gles3 on a3xx. So at least for the newer devices, I think we should eventually be able to do this in hw. But I don't have gles3 blob drivers on my nexus4, and I cannot download the latest drivers from qcom without a EULA, so I haven't been able to look into how this works yet.

Anyways, if you have a patch, feel free to post it, or even just send to mesa-devel (whichever you prefer) and I will review it.

The previous code for sRGB overrides assumes that the source and destination formats are equal, other than the color space. This won't be feasible when we add support for format conversions. Here are a few cases, and how the old code handled them: 1. RGB8 -> SRGB8, MSAA ==> SRGB8 -> SRGB8 2. RGB8 -> SRGB8, single ==> RGB8 -> RGB8 3. SRGB8 -> RGB8, MSAA ==> RGB8 -> RGB8 4. SRGB8 -> RGB8, single ==> SRGB8 -> SRGB8 Apparently, preserving the behavior of #1 is important. When doing a multisample to single-sample resolve, blending the samples together in an sRGB correct fashion results in a noticably higher quality image. It also is necessary to pass Piglit's EXT_framebuffer_multisample accuracy color tests. Paul, Eric, Anuj, and I talked about this, and aren't sure that it matters in the other cases. This patch preserves the behavior of #1, but otherwise reverts to doing everything in linear space, changing the behavior of case #4. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>

With the recent SRGB changes all my automated OpenGL llvmpipe tests (piglit, conform, glretrace) start asserting with the backtrace below. I'm hoping this change will fix it. I'm not entirely sure, as this doesn't happen in my development machine (the bug probably depends on the exact X visual). Anyway, it seems the sensible thing to do here. Program terminated with signal 5, Trace/breakpoint trap. #0 _debug_assert_fail (expr=expr@entry=0x7fa324df2ed7 "0", file=file@entry=0x7fa324e3fc30 "src/mesa/state_tracker/st_format.c", line=line@entry=758, function=function@entry=0x7fa324e40160 <__func__.34798> "st_pipe_format_to_mesa_format") at src/gallium/auxiliary/util/u_debug.c:281 #0 _debug_assert_fail (expr=expr@entry=0x7fa324df2ed7 "0", file=file@entry=0x7fa324e3fc30 "src/mesa/state_tracker/st_format.c", line=line@entry=758, function=function@entry=0x7fa324e40160 <__func__.34798> "st_pipe_format_to_mesa_format") at src/gallium/auxiliary/util/u_debug.c:281 No locals. #1 0x00007fa3241d22b3 in st_pipe_format_to_mesa_format (format=format@entry=PIPE_FORMAT_R8G8B8A8_SRGB) at src/mesa/state_tracker/st_format.c:758 __func__ = "st_pipe_format_to_mesa_format" #2 0x00007fa3241c8ec5 in st_new_renderbuffer_fb (format=format@entry=PIPE_FORMAT_R8G8B8A8_SRGB, samples=0, sw=<optimised out>) at src/mesa/state_tracker/st_cb_fbo.c:295 strb = 0x19e8420 #3 0x00007fa32409d355 in st_framebuffer_add_renderbuffer (stfb=stfb@entry=0x19e7fa0, idx=<optimised out>) at src/mesa/state_tracker/st_manager.c:314 rb = <optimised out> format = PIPE_FORMAT_R8G8B8A8_SRGB sw = <optimised out> #4 0x00007fa32409e635 in st_framebuffer_create (st=0x19e7fa0, st=0x19e7fa0, stfbi=0x19e7a30) at src/mesa/state_tracker/st_manager.c:458 stfb = 0x19e7fa0 mode = {rgbMode = 1 '\001', floatMode = 0 '\000', colorIndexMode = 0 '\000', doubleBufferMode = 0, stereoMode = 0, haveAccumBuffer = 0 '\000', haveDepthBuffer = 1 '\001', haveStencilBuffer = 1 '\001', redBits = 8, greenBits = 8, blueBits = 8, alphaBits = 8, redMask = 0, greenMask = 0, blueMask = 0, alphaMask = 0, rgbBits = 32, indexBits = 0, accumRedBits = 0, accumGreenBits = 0, accumBlueBits = 0, accumAlphaBits = 0, depthBits = 24, stencilBits = 8, numAuxBuffers = 0, level = 0, visualRating = 0, transparentPixel = 0, transparentRed = 0, transparentGreen = 0, transparentBlue = 0, transparentAlpha = 0, transparentIndex = 0, sampleBuffers = 0, samples = 0, maxPbufferWidth = 0, maxPbufferHeight = 0, maxPbufferPixels = 0, optimalPbufferWidth = 0, optimalPbufferHeight = 0, swapMethod = 0, bindToTextureRgb = 0, bindToTextureRgba = 0, bindToMipmapTexture = 0, bindToTextureTargets = 0, yInverted = 0, sRGBCapable = 1} idx = <optimised out> #5 st_framebuffer_reuse_or_create (st=st@entry=0x19dfce0, fb=<optimised out>, stfbi=stfbi@entry=0x19e7a30) at src/mesa/state_tracker/st_manager.c:728 No locals. #6 0x00007fa32409e8cc in st_api_make_current (stapi=<optimised out>, stctxi=0x19dfce0, stdrawi=0x19e7a30, streadi=0x19e7a30) at src/mesa/state_tracker/st_manager.c:747 st = 0x19dfce0 stdraw = 0x640064 stread = 0x1300000006 ret = <optimised out> #7 0x00007fa324074a20 in XMesaMakeCurrent2 (c=c@entry=0x195bb00, drawBuffer=0x19e7e90, readBuffer=0x19e7e90) at src/gallium/state_trackers/glx/xlib/xm_api.c:1194 No locals. #8 0x00007fa3240783c8 in glXMakeContextCurrent (dpy=0x194e900, draw=8388610, read=8388610, ctx=0x195bac0) at src/gallium/state_trackers/glx/xlib/glx_api.c:1177 drawBuffer = <optimised out> readBuffer = <optimised out> xmctx = 0x195bb00 glxCtx = 0x195bac0 firsttime = 0 '\000' no_rast = 0 '\000' #9 0x00007fa32407852f in glXMakeCurrent (dpy=<optimised out>, drawable=<optimised out>, ctx=<optimised out>) at src/gallium/state_trackers/glx/xlib/glx_api.c:1211 No locals. Acked-by: Brian Paul <[email protected]>

Fixes: Program received signal SIGSEGV, Segmentation fault. bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430) at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445 445 mask_pic->srf->tex->format); (gdb) bt #0 bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430) at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445 #1 xa_composite_prepare (ctx=0x211430, comp=comp@entry=0x21b054) at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:488 #2 0xb6f454b4 in XAPrepareComposite (op=<optimized out>, pSrcPicture=<optimized out>, pMaskPicture=<optimized out>, pDstPicture=<optimized out>, pSrc=0x5b3ad8, pMask=0x0, pDst=0x5923b8) at msm-exa-xa.c:533 We can't yet handle solid fill mask, so explicitly reject that, rather than segfaulting. Otherwise DDX would need to check XA version to see if solid fill mask were supported. Signed-off-by: Rob Clark <[email protected]>

So when checking/building sse code we have three possibilities: 1 Old compiler, throws an error when using -msse* 2 New compiler, user disables sse* (-mno-sse*) 3 New compiler, user doesn't disable sse The original code, added code for #1 but not #2. Later on we patched around the lack of handling #2 by wrapping the code in __SSE4_1__. Yet it lead to a missing/undefined symbol in case of #1 or #2, which might cause an issue for #2 when using the i965 driver. A bit later we "fixed" the undefined symbol by using #1, rather than updating it to handle #2. With this commit we set things straight :) To top it all up, conventions state that in case of conflicting (-enable-foo -disable-foo) options, the latter one takes precedence. Thus we need to make sure to prepend -msse4.1 to CFLAGS in our test. v2: Clean the #includes. Suggested by Ilia, Matt & Siavash. Cc: "10.3 10.4" <[email protected]> Tested-by: David Heidelberg <[email protected]> Tested-by: Siavash Eliasi <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Emil Velikov <[email protected]>

This is a partial revert of c893069. It split the {start,base}_vertex_location handling into several steps: 1. Set brw->draw.start_vertex_location = prim[i].start and brw->draw.base_vertex_location = prim[i].basevertex. (This happened once per _mesa_prim, in the main drawing loop.) 2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset appropriately. (This happened in brw_prepare_shader_draw_parameters, which was called just after brw_prepare_vertices, as part of state upload, and only happened when BRW_NEW_VERTICES was flagged.) 3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim). If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on the second (or later) primitives, we would do step #1, but not #2. The first _mesa_prim would get correct values, but subsequent ones would only get the first half of the summation. The reason I originally did this was because I needed the value of gl_BaseVertexARB to exist in a buffer object prior to uploading 3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value of 3DPRIMITIVE's "Base Vertex Location" field, which was computed as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) + brw->vb.start_vertex_bias. The latter value wasn't available until after brw_prepare_vertices, and the former weren't available in the state upload code at all. Hence the awkward split. However, I believe that including brw->vb.start_vertex_bias was a mistake. It's an extra bias we apply when uploading vertex data into VBOs, to move [min_index, max_index] to [0, max_index - min_index]. >From the GL_ARB_shader_draw_parameters specification: "<gl_BaseVertexARB> holds the integer value passed to the <baseVertex> parameter to the command that resulted in the current shader invocation. In the case where the command has no <baseVertex> parameter, the value of <gl_BaseVertexARB> is zero." I conclude that gl_BaseVertexARB should only include the baseVertex parameter from glDraw*Elements*, not any internal biases we add for optimization purposes. With that in mind, gl_BaseVertexARB only needs prim[i].start or prim[i].basevertex. We can simply store that, and go back to computing start_vertex_location and base_vertex_location in brw_emit_prim(), like we used to. This is much simpler, and should actually fix two bugs. Fixes missing geometry in Unvanquished. Cc: "10.4 10.3" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529 Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>

This implements a workaround (exact excerpt as a comment in the code). The docs specify [clearly, after you struggle for a while] that the offset isn't relative to state base. This actually makes sense. This fixes hangs on SKL. Buffer #0 is meant to be used for normal uniforms. Buffer #1 is typically used for gather constants when using RS. Buffer #1-#3 could be used to push a bunch of UBO data which would just be somewhere in memory, and not relative to the dynamic state. NOTE: I've moved away from the ternary operator for the new gen9 conditions. Admittedly it's probably not great to do this, but I really want to fix this all up in the subsequent patch and doing it here makes that diff a lot nicer. I want to split out the gen8/9 code to make the function a bit more readable, but to keep this easily cherry-pickable I am doing this fix first. If we decide not to merge the cleanup patch then I can revisit this. Cc: "10.5 10.6" <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Tested-by: Valtteri Rantala <[email protected]>

Equivalent to commit 8ac3b52 but with sel operations. In this case we select the PredCtrl based on the writemask. This patch helps on cases like this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD In this case, cmod propagation can't optimize instruction #2, because instructions #1 and #2 have different writemasks, and we can't update directly instruction #2 writemask because our code thinks that sel at instruction #3 reads all four channels of the flag, when it actually only reads .x. So, with this patch, the previous case becames this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Now only the x channel of the flag is used, allowing dead code eliminate to update the writemask at the second instruction: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD So now cmod propagation can simplify out #2: 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Shader-db numbers: total instructions in shared programs: 6235835 -> 6228008 (-0.13%) instructions in affected programs: 219850 -> 212023 (-3.56%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 1192 HURT: 0

Not 100% sure, but I think being an unsigned literal will help: CID 1358505 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)sign_extension: Suspicious implicit sign extension: load1->def.num_components with type unsigned char (8 bits, unsigned) is promoted in load1->def.num_components * (load1->def.bit_size / 8) to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If load1->def.num_components * (load1->def.bit_size / 8) is greater than 0x7FFFFFFF, the upper bits of the result will all be 1. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Matt Turner <[email protected]>

At this point, it would require a logic error in nir_validate to not have already populated this hashtable entry, but coverity doesn't realize that: CID 1265547 (#1 of 1): Dereference null return value (NULL_RETURNS)3. dereference: Dereferencing a null pointer entry. CID 1271039 (#1 of 1): Dereference null return value (NULL_RETURNS)3. dereference: Dereferencing a null pointer entry. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Matt Turner <[email protected]>

For example, at nir_lower_locals_to_regs.c:365: CID 1358911 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)7. var_deref_op: Dereferencing null pointer deref_ptr. I'm pretty sure that coverity doesn't understand the relationship between arr->data and arr->size. I think throwing in a !null check should do the trick.