Intel® RealSense™ Technology and the Point Cloud

1. Introduction

If you develop graphical applications, at some point in your career you will come across the term “point cloud,” which in 3D programming simply refers to a collection of vectors or points that represent a shape. In traditional 3D rendering, the points by themselves are not sufficient to provide a visual representation of the shape: each one is a single coordinate in space, with no volume and no association with neighboring points that might imply a surface. It is usually the programmer’s job to stitch these points together into polygons, or apply other surface-defining techniques, to produce a solid render of the shape being represented.

Figure 1. A point cloud set representing a 3D donut shape rendered as dots.

Extensive information is available on capturing, manipulating, and representing point cloud datasets, but there is very little specific advice on how to apply the concept when creating Intel® RealSense™ applications.

This article will highlight some basic techniques, advice on APIs, and technology you can research that will hopefully give you a few more tools to add to your skillset. A basic knowledge of the Intel RealSense SDK, 3D programming, and geometry structures is recommended but not essential.

2. Why Is This Important?

When you consider the raw data that comes from a typical depth camera, it is more accurate to describe it as a point cloud aligned within a regular grid rather than a 3D shape. This subtle distinction is key to finding new and innovative solutions to the challenges we face today.

It is also fair to say that we have yet to solve the problem of accurately manipulating a virtual 3D space using just our hands, such as grabbing a virtual ball from the air or sculpting a clay statue. These types of activities seem a natural fit for Intel RealSense technology, but the techniques that would allow them are currently beyond the scope of most SDKs, so it falls to programmers to innovate the solutions.

In addition to the collision possibilities mentioned above, another important aspect of understanding the raw depth data as a point cloud is that it allows us to weld this data together into much more accurate 3D representations of a shape. For example, it’s possible to scan a room from several vantage points and angles, collect the point data, and then stitch it together by detecting common points.

If you are not yet convinced that point clouds are a powerful medium to work in, I invite you to search the Internet for a video called “The Shipping Galleries - A 3D Point Cloud Fly Through” and “Real-time Rendering of Massive Unstructured Raw Point Clouds” and see how the real world can be virtualized.

Now imagine a technology that can work with point cloud data generated in real time, rather than the traditional approach of a static, 100-million-point dataset. Imagine having realistic real-time depictions in your virtual world, controlling your virtual objects from the real world, and creating previously unimagined solutions.

Figure 2. PerceptuCam is an Intel® RealSense™ conferencing app using point data to create a virtual you.

Getting in on the ground floor and understanding all there is to know about point clouds could be very valuable for future projects. How long do you think it will be before we see Google vans driving down the street capturing real-time point cloud data and transmitting it to the cloud for instant digestion by a million users navigating from A to B? How long before all security cameras in every major city integrate deep-scan depth capture equipment and host petabytes of point cloud data on free-to-use server farms? Point clouds will not remain static, and Intel RealSense technology is your ticket to working with real-time point cloud datasets before they become a widespread consumer resource.

3. How to Get Your Point Across

For an in-depth look at capturing, storing, and using 3D data from a depth camera, please refer to my earlier article that covers the subject from the perspective of generating 3D geometry, entitled “Generating 3D From Depth Data” (https://software.intel.com/en-us/articles/perpetual-computing-generating-3d-from-depth-data).

Figure 3. An early prototype showing how to stitch 3D geometry from raw depth data.

The only difference between the 3D geometry created in that article and obtaining your point cloud now is that there is no stitching step. Once you have determined the depth distance of each point (Z) in your fixed scanning grid (XY), the array in which you store these vectors becomes your point cloud dataset, as demonstrated by code examples 1 and 2 below.

CODE EXAMPLE 1. Creating the point cloud data structure

// basic vector structure and point cloud dataset array
struct vec3
{
	float x;
	float y;
	float z;
};
vec3* dataset = new vec3[depthwidth*depthheight];
dataset[(y*depthwidth)+x].x=(float)x;
dataset[(y*depthwidth)+x].y=(float)y;
dataset[(y*depthwidth)+x].z=(float)depthdistance;

As a recap and summary, once you have initialized the depth camera and obtained a depth data stream, you can populate a 2D array of short (16-bit) values that contain the distance from the camera to a detected solid object. The size of the 2D array reflects the resolution of the depth data format you have chosen. At the time of writing, a number of depth cameras are available offering depth resolutions from 320x240 to 640x480 that produce a point cloud count of between 76,800 and 307,200 dots. Since each point consumes 12 bytes (4 bytes per vertex axis), you are looking at roughly 900 KB to 3.6 MB to store a single uncompressed point cloud dataset.
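
To make this concrete, the sketch below shows one way to populate the dataset from a single depth frame. The depthBuffer name and its millimetre-style units are assumptions standing in for whatever your chosen SDK delivers; only the grid-to-array mapping matters here, and it relies on the vec3 structure from code example 1.

CODE EXAMPLE 3. Populating the point cloud dataset from a depth frame (sketch)

#include <cstdint>

// Assumed input: depthBuffer holds one frame of 16-bit depth values,
// row-major, depthwidth*depthheight entries long.
void FillPointCloud(vec3* dataset, const uint16_t* depthBuffer,
                    int depthwidth, int depthheight)
{
	for (int y = 0; y < depthheight; y++)
	{
		for (int x = 0; x < depthwidth; x++)
		{
			int i = (y * depthwidth) + x;
			dataset[i].x = (float)x;                // grid column
			dataset[i].y = (float)y;                // grid row
			dataset[i].z = (float)depthBuffer[i];   // measured depth distance
		}
	}
}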

4. Add a Little Color

One aspect of the data coming from the depth camera that we’ve not covered is that the additional color stream is often a higher-resolution image. The Intel RealSense SDK provides a mapping stream that acts as a look-up to correlate each depth point with the color at that location.

CODE EXAMPLE 2. Expanded data structure with RGB component

// expanded vector structure with RGB and point cloud dataset array
struct vec3
{
	float x;
	float y;
	float z;
	unsigned char red;
	unsigned char green;
	unsigned char blue;
};
vec3* dataset = new vec3[depthwidth*depthheight];
int datasetindex = (y*depthwidth)+x;
dataset[datasetindex].x=(float)x;
dataset[datasetindex].y=(float)y;
dataset[datasetindex].z=(float)depthdistance;
// colorStreamPtr points at the 32-bit color pixel mapped to this depth point
dataset[datasetindex].red  =(unsigned char)(((*(DWORD*)colorStreamPtr)&0xFF0000)>>16);
dataset[datasetindex].green=(unsigned char)(((*(DWORD*)colorStreamPtr)&0x00FF00)>>8);
dataset[datasetindex].blue =(unsigned char)( (*(DWORD*)colorStreamPtr)&0x0000FF);

Increasing your point data structure to include an RGB component allows you to reconstruct not only the shape but also the texture of the object. Only the most expensive LiDAR hardware can capture color as well as laser-accurate depth information, so making use of this extra information from a consumer device is highly recommended if you want to create the best visual representation of what’s in front of the camera.

Figure 4. A snapshot of the color stream with distant depth pixels excluded from the render.

The extra color component adds another 3 bytes per point (often padded to 4 in practice), increasing the memory footprint of your point cloud dataset and the cost of transporting it if you intend to store these point cloud packets. One way to reduce this burden is with simple RGB compression, whether you use a 565 format (5 bits for red, 6 bits for green, 5 bits for blue) or something more aggressive such as a palette and look-up index.
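
As a rough illustration of the 565 idea, the sketch below packs the three 8-bit channels into a single 16-bit value and unpacks them again; the low bits of each channel are simply discarded, which is the price of the compression.

CODE EXAMPLE 4. Packing RGB into a 16-bit 565 value (sketch)

#include <cstdint>

// pack 8-bit RGB into 16 bits: 5 bits red, 6 bits green, 5 bits blue
uint16_t PackRGB565(unsigned char r, unsigned char g, unsigned char b)
{
	return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// unpack back to approximate 8-bit channels (the discarded low bits are lost)
void UnpackRGB565(uint16_t packed, unsigned char& r, unsigned char& g, unsigned char& b)
{
	r = (unsigned char)(((packed >> 11) & 0x1F) << 3);
	g = (unsigned char)(((packed >> 5)  & 0x3F) << 2);
	b = (unsigned char)( (packed        & 0x1F) << 3);
}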

5. Uses for Your Point Cloud Dataset

We will assume you have stored your dataset as a typical vector array and are ready to either visualize or control something. There are a number of techniques you can use, and a few of them are covered here.

Point Cloud API

If you want to hit the ground running, take a look at the well-known open source project PCL (Point Cloud Library; http://pointclouds.org/), which contains a wealth of common point cloud operations categorized into areas of interest. Features include filters, key point detection, tree generation for sorting, segmentation, surface detection, 3D shape recognition, and a number of visualization techniques.

Figure 5. Features available in the Point Cloud Library (Copyright © PCL1).

The specifics of these specialist modules go beyond the scope of this article, but with a little patience and some set-up pain you will be up and running. With a huge number of contributors across the industry and various branches off the main source code trunk, you will not be lacking for support and advice with this invaluable API.
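
As a starting point, and assuming the vec3 dataset from code example 2, the sketch below copies that data into PCL’s own point type so the library’s filters, trees, and visualizers can work on it. Treat it as a minimal sketch rather than production code.

CODE EXAMPLE 5. Copying the raw dataset into a PCL cloud (sketch)

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

// copy the raw dataset into an organized PCL cloud (one point per depth pixel)
pcl::PointCloud<pcl::PointXYZRGB>::Ptr MakePclCloud(const vec3* dataset,
                                                    int depthwidth, int depthheight)
{
	pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
	cloud->width  = depthwidth;
	cloud->height = depthheight;
	cloud->points.resize((size_t)depthwidth * depthheight);
	for (size_t i = 0; i < cloud->points.size(); i++)
	{
		cloud->points[i].x = dataset[i].x;
		cloud->points[i].y = dataset[i].y;
		cloud->points[i].z = dataset[i].z;
		cloud->points[i].r = dataset[i].red;
		cloud->points[i].g = dataset[i].green;
		cloud->points[i].b = dataset[i].blue;
	}
	return cloud;
}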

Physics Manipulation

Many Intel RealSense applications are content to extract something akin to a mouse-pointer coordinate from the hand or head position and require nothing more from the depth data. With the power of point clouds, you can turn the hand into a real physics object, granting your application the same level of virtual control as you would have in the real world.

The technique would involve creating around 76,000 physics spheres (320x240 depth data) and drifting them into the real-time positions of the points in the dataset, eliminating any high energy motions and collisions as part of the process. The result is an accurate physics surface of the visible hand that is able to interact with other physics objects in the virtual world, lifting, pushing, grabbing, hitting, and poking your way through a whole new control system.

You can scale the dataset by controlling the number of data samples collected from the depth data to balance 3D hand resolution with overall processing cost. If you are familiar with modern GPU physics techniques, you can even stream the whole dataset into video memory and have a considerably higher level of granularity for your simulation.
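
A minimal sketch of the “drifting” step is shown below: each sphere eases toward its latest point cloud position rather than teleporting, which keeps high-energy motions out of the physics simulation. The PhysicsSphere structure and the driftRate value are hypothetical placeholders for whatever physics engine and tuning you use, and the code assumes the vec3 structure from code example 2.

CODE EXAMPLE 6. Drifting physics spheres onto the point cloud (sketch)

// hypothetical stand-in for a physics engine's kinematic sphere
struct PhysicsSphere { vec3 position; };

// ease each sphere toward the latest point cloud position; a driftRate of
// around 0.1-0.3 per frame keeps motion smooth, while 1.0 would snap instantly
void DriftSpheresToCloud(PhysicsSphere* spheres, const vec3* dataset,
                         int count, float driftRate)
{
	for (int i = 0; i < count; i++)
	{
		spheres[i].position.x += (dataset[i].x - spheres[i].position.x) * driftRate;
		spheres[i].position.y += (dataset[i].y - spheres[i].position.y) * driftRate;
		spheres[i].position.z += (dataset[i].z - spheres[i].position.z) * driftRate;
	}
}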

Visual Rendering

Rendered in its raw state, point cloud data resembles a series of tiny dots on a relatively empty screen, giving the impression of a ghostly 3D outline. Most applications would not find this method of visualization desirable, so there are a number of ways you can turn it into something solid.

Figure 6. Even with a high concentration of points, the render is still difficult to see (Copyright © PCL1).

The first technique has already been covered in the earlier article entitled “Generating 3D From Depth Data,” and basically involves creating a polygon from the three nearest points and progressing through the mesh in that fashion until all points are stitched together. This technique has a number of advantages and disadvantages, the primary disadvantage being that the process makes no distinction about which shapes are separate. For example, your hand is separate from your head, but the basic stitch method does not understand this and assumes they are a single uninterrupted surface. The other disadvantage is the processing cost of stitching so many points together to make polygons, a step that must be done every cycle and will eat into CPU and GPU resources. The final disadvantage is that your app will need additional memory to store this 3D mesh once generated, because the final size is larger than that of the original point cloud dataset (which must reside in memory as well). The main advantage is that once the 3D mesh is generated, it shares all the benefits of regular geometry and can be textured, lit, and shaded as your application requires.
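
Because the points sit in a regular XY grid, the stitching step can be as simple as turning every 2x2 cell of neighbouring points into two triangles, as in the sketch below. It deliberately omits the depth-discontinuity test that would keep separate shapes (hand versus head) from being welded together.

CODE EXAMPLE 7. Building a triangle index list over the depth grid (sketch)

#include <vector>

// generate two triangles per grid cell; vertex i maps to dataset[i]
std::vector<unsigned int> StitchGrid(int depthwidth, int depthheight)
{
	std::vector<unsigned int> indices;
	for (int y = 0; y < depthheight - 1; y++)
	{
		for (int x = 0; x < depthwidth - 1; x++)
		{
			unsigned int i = (unsigned int)((y * depthwidth) + x);
			// triangle 1: top-left, top-right, bottom-left
			indices.push_back(i);
			indices.push_back(i + 1);
			indices.push_back(i + depthwidth);
			// triangle 2: top-right, bottom-right, bottom-left
			indices.push_back(i + 1);
			indices.push_back(i + depthwidth + 1);
			indices.push_back(i + depthwidth);
		}
	}
	return indices;
}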

A more experimental technique is to sort the point cloud data into a search tree (see PCL again for details on these techniques), deleting the real-time point cloud data after it’s processed. Similar to the way voxel technology works, you render your shape in screen space, with each pixel triggering a read into the sorted point cloud search tree. If your view position and angle are fixed, this search can be very quick, and with the addition of a more aggressive ray-casting search you can render the point cloud shape from any angle and position. Also, as the search can return the nearest viable point for each screen pixel, the gaps that normally accompany a raw rendering of the point cloud get filled in. For more information on KD-trees and a starting point for researching this subject, check out the white paper entitled “The Quantized KD-Tree” (http://research.edm.uhasselt.be/tmertens/papers/qkdtree.pdf).
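
To give a flavour of the search-tree approach, the sketch below uses PCL’s KD-tree to find the stored point nearest to an arbitrary query position. In a real renderer you would build the tree once per frame and then issue one query per screen pixel or ray sample; this sketch builds and queries in one call purely for brevity.

CODE EXAMPLE 8. Nearest-point lookup with a PCL KD-tree (sketch)

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <vector>

// build a KD-tree over the cloud, then look up the nearest stored point
pcl::PointXYZ NearestCloudPoint(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                                const pcl::PointXYZ& query)
{
	pcl::KdTreeFLANN<pcl::PointXYZ> tree;
	tree.setInputCloud(cloud);          // in practice, build once and reuse

	std::vector<int> indices(1);
	std::vector<float> sqrDistances(1);
	if (tree.nearestKSearch(query, 1, indices, sqrDistances) > 0)
		return cloud->points[indices[0]];
	return query;                       // empty cloud: nothing found
}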

Automatic Point Cloud Mapping

Another exciting usage scenario for point clouds is that you can accurately detect unique markers within a single snapshot of a point cloud dataset and use each marker as an anchor to stitch on a second point cloud. Running in real time with automatic marker detection, you could create an application that learns more about a shape the longer it remains in front of the camera. Take a mug, for example. A single snapshot of the front reveals at best 50% of the surface, leaving the reverse side devoid of point data. The snapshot is handed to another thread that begins the work of identifying markers within the point cloud (handle, rim, indentations, planes, cylinders, etc.) and stores them for future reference.

Figure 7. Point data can be reduced to simple planes and cylinders, making ideal markers (Copyright © PCL1).

Subsequent deliveries of point cloud data repeat this process, and a second process begins to match up markers. Only those that exhibit a high correlation are used to stitch the new data onto the real-time point cloud. In theory, when you lift up any object and show the computer all sides, your software will build a complete understanding of the entire shape.

3D Photocopier

With the availability of consumer 3D printing devices, it is now possible to turn a virtual 3D mesh into an object in the real world by printing it out as an actual solid item. Using the above technique, almost any handheld object can be scanned in seconds by the depth camera, inspected in the software, and then converted to a format ready for your 3D printer. Imagine losing a piece from your favorite chess set, holding up the matching opposing piece, and scanning it while you turn it around in your hand.

Figure 8. Just hold the object out in front of you, and let the computer do the rest.

For additional fidelity, your software would discriminate between the colors of your hand, fingers, and face and the predominant backdrop hues. Perhaps when the app begins, it could include a short “calibration step,” taking the form of a “wake-up wave” to the camera.

In practice, this type of uncontrolled, free-form scanning will never produce a result as accurate as a professional scanning lab or a simple potter’s-wheel turntable, but by employing point cloud filters (see PCL for more information on outlier and noise-reduction filters) you can produce a good quality sealed 3D mesh given a sufficient number of samples. This process is helped greatly by the fact that the depth camera can stream up to 60 frames per second when the color stream is deactivated, producing many micro-snapshots and more opportunities for a clever marker-detection algorithm to do its work.
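
As an example of the kind of noise-reduction filter mentioned above, the sketch below runs PCL’s statistical outlier removal over a captured cloud before any meshing step; the meanK and threshold values are illustrative starting points rather than recommended settings.

CODE EXAMPLE 9. Removing stray points with a PCL outlier filter (sketch)

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/statistical_outlier_removal.h>

// cull points that sit unusually far from their neighbours
pcl::PointCloud<pcl::PointXYZ>::Ptr RemoveOutliers(
	const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
{
	pcl::PointCloud<pcl::PointXYZ>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZ>);
	pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
	sor.setInputCloud(cloud);
	sor.setMeanK(50);               // examine 50 nearest neighbours per point
	sor.setStddevMulThresh(1.0);    // reject points beyond 1 standard deviation
	sor.filter(*filtered);
	return filtered;
}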

3D Shape Detection

Taking this technique a bit further, once you have a database of marker references and enough of the associated point clouds to reconstruct the 3D object, you have the makings of a 3D object detection system. You can create software that caches a full shape, and the next time that shape is shown to the camera, a hundred red flags go up as the computer recognizes the many markers it has on record for that object.

Figure 9. Given enough markers, you can detect this mug from the handle and cylinder (Copyright © PCL1).

The opportunities available to software engineers once we solve the challenge of instant object detection will be immense. Right now, the computer is able to know only a few words and gestures, like a well-trained but half-blind dog. With the power to distinguish between objects, and using context to label those observations, your computer transitions from a crude device that draws generalized conclusions to a smart device capable of being very specific, opening the possibility of some very interesting projects.

Moving past the detection of a cup, the software could detect facial features. The longer a person sits in front of the computer and the more markers are associated with that “object,” the faster the computer will be able to recognize that person.

6. Tricks and Tips

Do’s

  • Before implementing any point cloud dataset techniques, always implement a method to render the raw point cloud dots on screen, as this acts as a good debug view during development and as confirmation that the camera is producing what it should.
  • If you decide to use the Point Cloud Library, you should first set up and compile the provided examples and become confident that the library calls are working. The PCL library has many dependencies, so it may be easier to port your Intel RealSense application into an existing PCL example instead of the other way around.
  • Before embarking on writing your own point-cloud-to-3D-visual technique, spend a few hours researching the wealth of very good techniques that already exist such as Delaunay Triangulation, Ball-Pivoting Algorithm, Poisson Surface Reconstruction, and Alpha Shapes.
  • MeshLab is another good open source tool you can use to convert point clouds into 3D meshes, and it comes with a host of very cool clean-up algorithms to help you seal and polish your resulting meshes.

Don’ts

  • Do not attempt to process real-time point cloud datasets with intensive tree-generation code designed to make subsequent point access quick. These structures are ideally built in a pre-bake step and are not suitable for real-time processing. Where possible, try out the generation technique separately in a prototype before relying on it in your main software.
  • Do not try to create point cloud data within a mutex lock, as stalled threads will slow down the performance of the whole application. It is best to create a chain of point cloud allocations that allows the depth stream to produce them as fast as possible, and use a second thread to perform any intense data manipulation on the dataset elsewhere.
  • Do not lose track of the memory and file storage your application will start to demand. Typical point cloud applications consume vast quantities of both memory and hard drive space, which can easily get out of control. Plan your resource budget in advance.

7. Summary

Perhaps in the not too distant future, we will walk into our office/bedroom/study and sit down to be greeted with “Hello Lee, welcome back. You’re not wearing your glasses today; do you WANT eyestrain?” the voice says testily. “Computer, I AM wearing my glasses,” I reply. “Yes, I see them now, sorry about that,” the computer intones.

For many years now, I’ve had a strong belief that computers and robots can be trained in some small way. Right now we essentially kill our computing devices at the end of each day: we power them down, wipe their brains, and start them up again the next day. We may load and unload the many burdens we want our electronic donkey to carry, but the donkey itself is a mindless drone incapable of remembering who its owner is, and caring even less. Could we not teach it to recognize us with a mere glance, passively soak up and build a neural net of point cloud references, and connect those with its other senses such as time, location, and what the device is doing? How easy then to program it with some basic human idiosyncrasies, such as recognizing a well-trodden pattern: “Hey Lee, do you realize you’ve been wearing the same shirt for seven days? I’m glad I don’t have a nose!” How much easier still to replace traditional programming with direct visual communication: “No computer, THIS is a tablet, not a phone,” you respond. “Show me properly!” requests the computer.

All this may sound like science fiction and one programmer’s wild imaginings, and a few years ago I would have agreed with you. The difference today is that we have the cloud storage capacity to record years of computer experience, both individually and collectively. We now have sensors that allow the computer to see you and hear you, and we have the processing power to make it all happen. The only missing ingredient now is the intrepid pioneer who, having read this, does not say “that’s totally crazy” or “it’ll never happen,” but instead thinks, “don’t mind if I do.”

About The Author
When not writing articles, Lee Bamber is the CEO of The Game Creators (http://www.thegamecreators.com), a British company that specializes in the development and distribution of game creation tools. Established in 1999, the company and surrounding community of game makers are responsible for many popular brands including Dark Basic, The 3D Game Maker, FPS Creator, App Game Kit (AGK) and most recently, Guru.

1A special thanks to POINTCLOUDS.ORG for sharing their website images under the Creative Commons Attribution 3.0 license (http://creativecommons.org/licenses/by/3.0/). No changes were made to the images carrying a PCL copyright.

Notices
Intel, the Intel logo, and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2015 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.

