Bugs and Debugging

A list of bugs encountered in the wild. Not as much fun as those, though.

๐Ÿž = bug cause

Logical Errors/Wrong numbers/Garbage data #

  • Integer division without casting to float
  • Used wrong units (cm instead of mm) for a physically measured value
  • Pushing elements onto end of std::vector accidentally destroyed temporal ordering when iterating over it
  • OpenCV camera calibration: copy-pasted code, and forgot to update resolution for one of the cameras
  • Imported a Python module repeatedly; each import opened a file, which eventually exhausted the available file descriptors
  • Python program was blocking when it shouldn’t - 🐞 Confused threading.Thread.start() with threading.Thread.run() (run() executes the target in the calling thread)
  • In C++ code, used << instead of <<= in an assignment, which caused a mask to be all 0
  • Forgot that assert() is disabled in CMake non-debug configurations (e.g. RelWithDebInfo), because they define NDEBUG.
  • Did not read documentation carefully and made wrong assumptions about API
  • Mistook two similar variables for each other
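
The start()/run() mix-up from the list above is easy to reproduce. The sketch below (the helper name is made up) compares the thread id seen by the target with the id of the caller: run() is a plain method call that executes the target in the calling thread, while start() spawns a new thread.

```python
import threading


def runs_in_caller_thread(use_start: bool) -> bool:
    """Return True if the thread target executed in the thread calling this function."""
    seen = []
    t = threading.Thread(target=lambda: seen.append(threading.get_ident()))
    if use_start:
        t.start()  # spawns a new thread; the caller continues immediately
        t.join()   # wait only so that the result can be inspected
    else:
        t.run()    # plain method call: the target runs (and blocks) right here
    return seen[0] == threading.get_ident()


if __name__ == "__main__":
    print("run():  ", runs_in_caller_thread(use_start=False))
    print("start():", runs_in_caller_thread(use_start=True))
```

With run() the helper returns True, because no new thread is ever created; any long-running target therefore blocks the caller.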

Shaders #

  • Wrote vec4 frag_color and thus created a local variable instead of setting the global frag_color
  • Wrong loop termination conditions led to a function ending without returning a value even though it should have returned a vec4, so the variable which should have received the return value contained garbage
  • Cast a uniform to GLint instead of its correct type, GLfloat

Segfaults/Memory/Crashes #

  • Returning stack objects by reference
  • Forgetting to resize std::vector led to memory access error
  • A C++ module used by a linked module depended on static initialization, which was not triggered from the first module

General Internet/Servers/Bash #

  • LocomotiveCMS: Nginx max. allowed body size was too small, so form submits with large images failed
  • LocomotiveCMS: could not set date-time fields in :de locale - 🐞 translations for date format were missing time component
  • Logrotate had set wrong user/group for files
  • In a backup script, used sudo without -H option, which caused the script to use the wrong $HOME directory
  • In a certain terminal emulator, some elements of $PATH were missing in comparison to other terminal emulators, because some were running login shells and some were not
  • Invitation email addresses were not normalized (downcased), which led to instances of account registrations not being associated with an invitation
  • IPv6 did not work for a mail server but IPv4 did, so connections from the DigitalOcean network took a long time (until the client decided to try IPv4). Connections from local networks were very fast, because the local network did not have IPv6. DO support: “We currently block all IPv6 SMTP traffic. We suggest using an IPv4 Droplet for hosting a mail server. Alternatively, you can edit the /etc/gai.conf file to prioritize IPv4”.
  • Webservice running inside Docker on a DO droplet was sometimes not reachable. Reachable via IP, but not via DNS - 🐞 DNS returned an IPv6 record and the server was not configured for IPv6.
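
For the IPv4/IPv6 problems above, a quick check is to look at which address families a hostname actually resolves to. The following is only a sketch: the function names and the default port are made up for illustration, and only the standard socket.getaddrinfo() call is assumed.

```python
import socket


def addresses_by_family(addrinfo):
    """Group getaddrinfo() result tuples into IPv4 and IPv6 address lists."""
    result = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore other address families
        if sockaddr[0] not in result[key]:
            result[key].append(sockaddr[0])
    return result


def resolve(hostname, port=443):
    """Resolve a hostname (hypothetical helper; needs working DNS)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return addresses_by_family(infos)
```

If resolve() reports IPv6 addresses for a host whose server only listens on IPv4, clients that prefer IPv6 will see timeouts or slow fallbacks like the ones described above.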

Cloud Infrastructure #

  • Bitbucket Pipelines build teardown step doesn’t record the duration it takes to compress and upload the Docker cache - 🐞 Bug https://jira.atlassian.com/browse/BCLOUD-19821
  • Auto scaling group provisioning on AWS configured using Terraform did not work - 🐞 The AMI was not available in this region.
  • Using the Docker layer cache on Bitbucket Pipelines CI did not work - 🐞 Dockerfile contained a FROM scratch statement, creating an empty layer image during the build. When Pipelines tried to save this, a Docker bug was triggered. https://github.com/moby/moby/issues/38039.
  • Keycloak container was working on AWS Fargate, but got killed every few minutes - 🐞 Load balancer health check failed because the allowed startup time was not generous enough.
  • App was hanging after Keycloak login - 🐞 Tried to fetch grant token from Keycloak after the redirect. However, the frontend (and thus Keycloak) only allowed access from a restricted set of IPs, so the backend private subnet was not able to access it.
  • Docker-compose setup did not work correctly on Windows - 🐞 CRLF broke some of the bash setup scripts that were copied into the Dockerfiles.
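
The CRLF problem from the last item can be detected before the scripts end up in an image. A minimal sketch, assuming the scripts can be read as bytes; the function names and the example path are hypothetical:

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the data contains Windows-style CRLF line endings."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to plain LF."""
    return data.replace(b"\r\n", b"\n")


def check_script(path: str) -> bool:
    """Report whether the file at `path` (e.g. a setup script) contains CRLF."""
    with open(path, "rb") as f:  # binary mode, so newlines are not translated
        return has_crlf(f.read())
```

Reading in binary mode matters here, because Python's text mode would silently translate the line endings.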

Web #

Various #

  • CUDA performance was severely diminished after switching to clang from nvcc - 🐞 nvcc had higher default optimization flags…
  • Drone control surfaces were randomly moving, and motor started to spin, when only part of the plane was powered up - 🐞 a uC which was not supposed to be powered was getting leak current through a pin and sent out random freak PWM signals
  • Sometimes saw huge roundtrip times through MAVLink network - 🐞 mode of measurement was bogus. Packets would sometimes loop around, get duplicated and get returned to origin long after sent, even if the “straight path” packet arrived milliseconds after the request was sent.
  • https://github.com/elastic/elasticsearch/issues/8959 - 🐞 actual bug in Elasticsearch

Unexpected data in test output #

“where did the elevation attribute come from?”

TEST(JsonProcessingTest, Simple) {
    const std::string source = R"({                                                                                                                                                                         "elevation": {                                                                                                                                                                                                          "value": -3.695                                                                                                                                                                                                 },
    "attr": "123123",
    "id": 123
})";

    LOG(ERROR) << jsoncons::pretty_print(Json::parse(source));
}

curiously printed

{
   "elevation":{
      "value":-3.695
   },
   "attr":"123123",
   "id":123
}

๐Ÿž Did not scroll to the right in the editor window to see the random JSON stuff there which came from copying from a tmux window.

Debugging #

Initial debugging checklist #

  • Are you in the right directory?
  • Are you looking at the right source file?
  • Is it the right line of code?
  • Any misspelled variable names?
  • Symbol names confused?
  • Are you actually measuring what you think you are measuring?
  • Improperly initialized variables?
  • Invalid physical units?

Find missing dynamic libs #

readelf -d binary
ldd -v binary
lddtree binary # apt-get install pax-utils
LD_DEBUG=libs ./binary # IMPORTANT: output only shows search directories from RPATH if they actually exist
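
The ldd check can also be scripted, for example in CI. A minimal sketch, assuming ldd is on the PATH and marks unresolved libraries with "=> not found"; the function names are hypothetical:

```python
import subprocess


def missing_libs(ldd_output):
    """Extract the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def check_binary(path):
    """Run ldd on `path` and return the unresolved libraries."""
    proc = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libs(proc.stdout)
```

An empty list from check_binary() only means that ldd resolved everything in the current environment; libraries loaded via dlopen() at runtime are not covered.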

Bazel debugging #

  • show dependencies: bazel query "deps(@my_lib//:lib)"
  • add --verbose to linkopts
  • Show compiler/linker options: bazel run -s ...
  • get output binary path: bazel run --run_under=echo //src:my_target
  • show fully evaluated rule: bazel query --output=build @my_lib//:my_target